Next: 5 Discussion Up: A Comparative Study of Previous: 3 Data collection and

4 Results

4.1 Snakes

The output of the "snake" method is the percentage of the double edge that was found in each object. To turn this into a pollen/non-pollen decision, a threshold is applied to that percentage. Figure 3 shows the levels of true positives and false positives for edge-percentage thresholds from 0 to 100%.

   
Figure 3: Snake results for set B, Percentages of true and false positives

The number of true positives (pollen recognised as pollen) is fairly constant up to about 70% edge identification and includes most (90%) of the pollen. This indicates that the model was correct, in that most of the pollen in this sample has a strong double edge. However, the results also show that a considerable proportion of the non-pollen objects fit this model as well, resulting in a high number of false positives.
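The thresholding step described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function names, the `>=` comparison and the 10% step size are assumptions, and the inputs are taken to be one edge percentage and one ground-truth label per object.

```python
from typing import List, Sequence, Tuple


def rates_at_threshold(
    edge_pct: Sequence[float], is_pollen: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Return (true-positive %, false-positive %) for one threshold.

    An object is called "pollen" when its edge percentage is at or above
    the threshold (assumed decision rule).
    """
    n_pollen = sum(1 for p in is_pollen if p)
    n_other = len(is_pollen) - n_pollen
    tp = sum(1 for e, p in zip(edge_pct, is_pollen) if p and e >= threshold)
    fp = sum(1 for e, p in zip(edge_pct, is_pollen) if not p and e >= threshold)
    # Guard against empty groups so the sketch never divides by zero.
    tp_rate = 100.0 * tp / n_pollen if n_pollen else 0.0
    fp_rate = 100.0 * fp / n_other if n_other else 0.0
    return tp_rate, fp_rate


def sweep(
    edge_pct: Sequence[float], is_pollen: Sequence[bool], step: int = 10
) -> List[Tuple[int, float, float]]:
    """Evaluate thresholds 0, step, ..., 100 and return (threshold, TP %, FP %)."""
    return [
        (t, *rates_at_threshold(edge_pct, is_pollen, t))
        for t in range(0, 101, step)
    ]
```

Plotting the second and third columns of `sweep(...)` against the first would give curves of the kind shown in Figure 3.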

4.2 Paradise

The Paradise network was trained using the data from set A, which was presented to the network twice. During this training phase the network is able to reject objects that do not have enough features to allow a sufficiently accurate internal representation. As a result, 405 objects were classified by the network. (The large number of rejected objects is entirely due to the automatic method used to select the objects initially.)

Figure 4 shows the most important classes created by the Paradise network during the training run. For each class the class number is given first, and the value in brackets is the number of objects represented by this class. All classes containing more than 5 objects are shown, and these classes account for 74% of the objects classified from set A. The objects are shown in order of recognition quality, i.e. those on the left are best represented by the class. Where a class is too large to display in full, only three sections, from the beginning, middle and end of the class, are shown (e.g. class 0).

The meta-class of each of the 90 classes created was determined by expert examination. This resulted in 36 pollen classes, 50 non-pollen classes (mostly debris) and 4 classes which contained a mixture of pollen and non-pollen. The diversity of the debris means that 90% of the debris classes recognised only a single object from the set. When this type of system is used in a recall mode it is often possible to remove these "singleton" classes, as they represent outliers of the data set.
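Removing singleton classes before recall could look like the sketch below. It is illustrative only: the mapping from object identifiers to class numbers and the `min_size` parameter are hypothetical and do not describe the Paradise network's actual interface.

```python
from collections import Counter
from typing import Dict, Hashable, Set


def classes_to_keep(assignments: Dict[Hashable, int], min_size: int = 2) -> Set[int]:
    """Return the class numbers that represent at least `min_size` objects.

    `assignments` maps each training object to the class that recognised it
    (hypothetical data layout). With min_size=2, singleton classes are dropped.
    """
    sizes = Counter(assignments.values())
    return {cls for cls, n in sizes.items() if n >= min_size}
```

Objects that would only have matched a removed class are then left unclassified in recall mode rather than being assigned to an outlier class.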

   
Figure 4: The most important Paradise classes from the training run on set A

   
Figure 5: Objects from the validation run on set B which were recognised by the classes given in Figure 4

 

Table 1: Paradise classification results for set B

                 Recognised as
Object type      Pollen   Non-pollen   Unsure   Unclassified
Pollen           79.1%    8.9%         7.1%     4.9%
Non-pollen       4%       22%          2%       72%

 

Table 1 shows the results of the validation run using set B, and Figure 5 shows the objects recognised by the classes given in Figure 4. In this run no new classes were allowed to be created, so a presented object can remain unclassified. The network recognised nearly 80% of the pollen as pollen and misclassified only 4% of the debris as pollen. This performance is estimated to be at least comparable with that of a human operator. The low classification rate and the large amount of unclassified material for non-pollen are due to the variability of the debris, which was identified during the training run on set A.
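A table of the same shape as Table 1 can be derived from per-object recall results as sketched below. The data layout is assumed for illustration: each object has a true type and either a class number or `None` when no existing class recognised it, and `meta_class` maps class numbers to the expert-assigned label. Treating mixed classes as "unsure" is also an assumption.

```python
from collections import Counter
from typing import Dict, Optional, Sequence

OUTCOMES = ("pollen", "non-pollen", "unsure", "unclassified")


def recognition_table(
    true_types: Sequence[str],
    assigned_classes: Sequence[Optional[int]],
    meta_class: Dict[int, str],
) -> Dict[str, Dict[str, float]]:
    """Return, per true object type, the percentage falling in each outcome."""
    counts: Dict[str, Counter] = {}
    for truth, cls in zip(true_types, assigned_classes):
        # No new classes are created in recall mode, so None means unclassified.
        outcome = "unclassified" if cls is None else meta_class[cls]
        counts.setdefault(truth, Counter())[outcome] += 1
    table: Dict[str, Dict[str, float]] = {}
    for truth, c in counts.items():
        total = sum(c.values())
        table[truth] = {o: 100.0 * c[o] / total for o in OUTCOMES}
    return table
```

Each row is normalised by the number of objects of that true type, so the four percentages in a row sum to 100, as they do in Table 1.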




Mr I France
Mon Jul 7 13:24:58 BST 1997