Neural networks have strongly stimulated the development of pattern classifiers. Their architecture supports the implementation of almost any classification function, and the wide range of possibilities for training them allows for almost any training strategy. Even the most degenerate neural network, the single perceptron, can be trained according to various strategies, such as the minimization of the probability of error, of the mean square error, or of the description length, simply by playing with targets and step sizes and by neglecting small weights.
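To illustrate how far the training strategy of even a single perceptron can be varied by such choices, the sketch below runs one and the same update loop in which only the error signal (targets), the step size and a weight-pruning threshold differ; the data, parameter values and pruning rule are hypothetical and are not taken from the experiments reported here.

```python
# Minimal sketch: one perceptron, several training strategies (assumes NumPy and
# toy two-class data X, y with labels in {-1, +1}).
import numpy as np

def train_perceptron(X, y, strategy="perceptron", step=0.1, epochs=100, prune=0.0):
    X = np.hstack([X, np.ones((len(X), 1))])      # append a bias input
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            out = xi @ w
            if strategy == "perceptron":          # error-counting flavour:
                if yi * out <= 0:                 # update only on misclassification
                    w += step * yi * xi
            elif strategy == "mse":               # mean-square-error flavour:
                w += step * (yi - out) * xi       # LMS / Widrow-Hoff update
    if prune > 0.0:                               # "neglecting small weights",
        w[np.abs(w) < prune] = 0.0                # a crude description-length idea
    return w
```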
What has been learned from studying neural network training and behavior is that very large nonlinear machines can be handled using a broad set of regularization tools such as early stopping, weight decay and noise injection. As a result, neural networks much larger than necessary for the problem can still find good solutions. Having a sufficiently small machine that by its nature damps all noise is therefore not a necessary condition for good generalization. It can also be observed that during neural network training the influence of remote objects is diminished. Inspired by this, we developed a support vector perceptron technique. The initial experiments reported here are encouraging. Support vector methods in general should be welcomed for their weak dependence on the feature size. They are thereby expected to suffer little or not at all from the peaking phenomenon for increasing resolutions, as shown in Fig. 2.
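To make the three regularization tools concrete, the following sketch shows where each would enter an ordinary gradient-descent loop; the gradient function, loss function, validation split and all parameter values are placeholders and do not describe the networks or the procedure used in this paper.

```python
# Minimal sketch of early stopping, weight decay and noise injection in one
# gradient-descent loop (assumes NumPy, a hypothetical grad_fn/loss_fn for some
# network, and a held-out validation set X_val, y_val).
import numpy as np

def train_regularized(w, grad_fn, loss_fn, X, y, X_val, y_val,
                      step=0.01, decay=1e-4, noise=0.05, patience=10):
    best_w, best_val, stall = w.copy(), np.inf, 0
    while stall < patience:                              # early stopping criterion
        X_noisy = X + noise * np.random.randn(*X.shape)  # noise injection on the inputs
        g = grad_fn(w, X_noisy, y)
        w -= step * (g + decay * w)                      # weight decay term added to the gradient
        val = loss_fn(w, X_val, y_val)
        if val < best_val:
            best_w, best_val, stall = w.copy(), val, 0   # remember the best validation point
        else:
            stall += 1
    return best_w                                        # weights at the early-stopping point
```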
The performance of neural network techniques themselves is always somewhat disappointing in relation to their computational effort, compared with more dedicated techniques and with their eternal competitor, the nearest neighbor rule.
We now have some powerful possibilities for attacking the very small sample size problem (sample sizes smaller than the dimensionality). On the other hand, the large sample size problem certainly needs attention: researchers like Le Cun [20] use larger and larger networks in order to exploit all separation possibilities offered by the data. Support vector machines, in their nonlinear form, have difficulty handling large data sizes. A solution might be to combine classifiers trained on moderate sample sizes. This technique is widely studied now and may be a fruitful option for training support vector machines on large data sizes.
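A minimal sketch of this combining idea is given below, assuming scikit-learn's SVC as the nonlinear support vector classifier; the chunk size and the majority vote are illustrative choices, not the combining rules studied in the literature.

```python
# Combine support vector classifiers trained on moderate-size subsets of a large dataset.
import numpy as np
from sklearn.svm import SVC

def train_combined_svm(X, y, chunk_size=2000, **svc_params):
    idx = np.random.permutation(len(X))
    members = []
    for start in range(0, len(X), chunk_size):            # moderate sample size per member
        part = idx[start:start + chunk_size]
        members.append(SVC(**svc_params).fit(X[part], y[part]))
    return members

def predict_combined(members, X):
    votes = np.stack([m.predict(X) for m in members])     # one row of votes per member
    preds = []
    for col in votes.T:                                    # majority vote per sample
        labels, counts = np.unique(col, return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)
```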