Computer vs Human Learning
It seems obvious that, despite three or four decades of research on Machine Learning, the ability of computers to learn remains far inferior to that of humans. As a result, it seems natural to set matching human performance as a goal for learning algorithms.
However, is it really fair to compare computer learning with human learning? Or, more precisely, how should they be compared in order to obtain an objective assessment of their respective abilities?
Here are some remarks to keep in mind before making such a comparison:
First of all, it is well known that, theoretically speaking, there is no single best algorithm: if you look at all possible problems and compare any two algorithms, the first will do better than the second on about one half of the problems and worse on the other half (I am not formalizing these notions here, but precise statements exist, the so-called no-free-lunch theorems). In other words, no algorithm systematically outperforms all other algorithms on all problems. It is thus pointless to compare learning algorithms in general.
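To make this concrete, here is a minimal simulation sketch of that idea, under some simplifying assumptions of my own: a five-point input space, exhaustive enumeration of all binary labellings, and two arbitrary toy "algorithms" (the names and rules below are purely illustrative, not part of any formal statement). Averaged over every possible labelling, both rules get exactly 50% accuracy on the points outside the training set.

```python
# Toy "no free lunch" simulation: a five-point input space, all possible
# binary labellings, and two arbitrary prediction rules. Both rules depend
# only on the training data, so averaged over all labellings they achieve
# exactly 0.5 accuracy on the off-training-set points.
from itertools import product

X = list(range(5))      # the whole (tiny) input space
train_idx = [0, 1, 2]   # points whose labels the algorithm gets to see
test_idx = [3, 4]       # off-training-set points

def majority_rule(train):
    """Predict the majority label of the training set everywhere."""
    ones = sum(y for _, y in train)
    return 1 if 2 * ones >= len(train) else 0

def copy_first_rule(train):
    """A different arbitrary rule: predict the label of the first training point."""
    return train[0][1]

def off_training_set_accuracy(rule):
    correct, total = 0, 0
    for labels in product([0, 1], repeat=len(X)):   # every possible target function
        train = [(x, labels[x]) for x in train_idx]
        prediction = rule(train)                    # both rules predict a single constant
        for x in test_idx:
            correct += int(prediction == labels[x])
            total += 1
    return correct / total

print(off_training_set_accuracy(majority_rule))    # 0.5
print(off_training_set_accuracy(copy_first_rule))  # 0.5
```

Any other rule that looks only at the training data gives the same number; preferring one algorithm over another only makes sense once the class of problems is restricted.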
So in order to make a meaningful comparison, one should first select a limited number of problems of interest. The natural class of problems in our case consists of the learning problems that humans cope with: speech recognition, object recognition, motor learning, and so on. A list of such problems would make a reasonable benchmark on which learning algorithms could be evaluated and compared to human performance.
Of course, on all these tasks the algorithms we have built so far are unable to match (or, in certain cases, even get close to) human performance.
Does this mean our algorithms are bad?
I would tend to think the opposite: they are very good, and much better than humans in many respects. In my opinion, the reason they do not get good success rates on problems that are trivially solved by humans is that they don't have access to the same information and they are not given the right "priors".
Human brains are genetically designed to solve certain types of learning problems. For example, there is a great deal of hard-wiring in the visual system that makes object recognition easy for humans.
Also, for most "human learning" tasks, when we build a database of examples to give to a computer, we provide the computer with only a very limited amount of information. For example, when humans start to learn to recognize handwritten characters, they already have a visual system that has been extensively trained to recognize all sorts of shapes. Imagine someone blind from birth who suddenly recovers sight at the age of 6, is put directly in front of handwritten characters on a screen, and is asked to classify them. I suspect they would not reach a better accuracy than existing learning algorithms.
A similar but more realistic example can be constructed as follows. Imagine you are presented with images of handwritten characters whose pixels have been shuffled in a deterministic way. For vector-based learning algorithms this would make no difference (they would still reach good predictive accuracy), while for humans it would be completely impossible to reach any reasonable level of generalization.
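As a rough illustration (not an experiment from this text), here is what such a test might look like with scikit-learn's built-in 8x8 digits dataset and a plain logistic regression; the dataset, model, and permutation seed are my own assumptions.

```python
# Sketch of the "deterministically shuffled pixels" experiment, assuming
# scikit-learn and its small built-in digits dataset (8x8 handwritten digits).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)

# One fixed, deterministic permutation applied to the pixels of every image.
perm = np.random.RandomState(0).permutation(X.shape[1])
X_shuffled = X[:, perm]

for name, data in [("original pixels", X), ("shuffled pixels", X_shuffled)]:
    X_tr, X_te, y_tr, y_te = train_test_split(data, y, random_state=0)
    clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
    print(name, round(clf.score(X_te, y_te), 3))
# The two accuracies are essentially identical: the model only ever sees a
# vector of numbers, so a consistent reordering of the features changes
# nothing, whereas a human shown the shuffled images would see only noise.
```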
Yet another example illustrating the kind of task that computers are faced with is spam classification. It is clear that humans are better at classifying spam vs non-spam emails. The reason is that they "understand" what the emails mean, thanks to years of language training. So now, imagine you are given 200 emails written in Chinese, among which 100 are spam and 100 are not. Do you think someone who has never had any training in the Chinese language would be able to reach the kind of generalization accuracy a computer would?
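The point is that the computer never needed to understand the language in the first place: a simple bag-of-character-n-grams classifier works purely from surface statistics. Here is a hedged sketch, with a purely hypothetical placeholder corpus (real data and accuracies would of course differ).

```python
# A classifier that works on text it does not "understand": character n-gram
# counts plus naive Bayes. The tiny corpus below is placeholder data only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["免费 中奖 点击 链接",   # placeholder "spam"
          "会议 明天 上午 十点",   # placeholder "non-spam"
          "限时 优惠 立即 购买",   # placeholder "spam"
          "请 查收 附件 报告"]     # placeholder "non-spam"
labels = [1, 0, 1, 0]             # 1 = spam, 0 = non-spam

model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(1, 2)),  # raw character statistics
    MultinomialNB(),
)
model.fit(emails, labels)
print(model.predict(["点击 领取 免费 优惠"]))  # prediction from surface statistics alone
```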
The examples above illustrate how little information computers have when they are faced with a supervised learning task. It seems reasonable to assume that humans faced with similar tasks would not do much better.
The above discussion aims to emphasize the importance of "having the right prior". To some extent, building an algorithm essentially means designing a prior, and designing a prior can only be done with respect to a class of problems (there is no "universal" prior). Designing a prior (which also means choosing an appropriate representation of the data) is what allows one to introduce a large amount of information into the learning algorithm. Most human learning tasks require a lot of such information, and this is why computers usually fail on them.
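To illustrate how much information a representation can carry, here is a small sketch on a synthetic toy problem of my own choosing (two concentric rings): the same linear learner goes from chance-level to near-perfect accuracy simply because the radial feature encodes a rotational-invariance prior.

```python
# Sketch: "designing a prior = choosing a representation", on a synthetic
# two-ring toy problem (inner ring = class 0, outer ring = class 1).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
n = 500
radius = np.concatenate([rng.uniform(0.0, 1.0, n), rng.uniform(2.0, 3.0, n)])
angle = rng.uniform(0.0, 2.0 * np.pi, 2 * n)
X_raw = np.c_[radius * np.cos(angle), radius * np.sin(angle)]  # raw (x, y) coordinates
y = np.concatenate([np.zeros(n), np.ones(n)])

# Same data, same learner, but a representation that builds in rotational invariance.
X_radial = np.linalg.norm(X_raw, axis=1, keepdims=True)

for name, X in [("raw coordinates", X_raw), ("radial feature", X_radial)]:
    score = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    print(name, round(score, 2))
# Raw coordinates: near chance (no linear boundary separates the rings).
# Radial feature: near-perfect, because the representation already contains
# the relevant prior knowledge about the problem.
```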

