In the last year, the use of digital photography has spread beyond the bounds of high-profile media companies and into the home. Inexpensive, high-quality digital cameras are now widely available and, in many situations, are being used in preference to chemical-film systems. The change in attitude has been fuelled by the increasing availability of personal computers and affordable colour printers, which enable users to view, manipulate and take hard copies of their own images. The result of this move is that many people's home photograph collections will be held on digital media such as CD-ROM. The drawback is that the user must rely on software search facilities, having lost the ability to browse rapidly through prints by ``flicking through'' them by hand. Current search techniques for commercial database indexing systems generally employ text-based searches, requiring a human user to tag each and every image with an appropriate key. This can lead to problems of subjective tagging: a tag may fail to portray an image's content fully, causing searches to fail. Although text-based keys are appropriate for purely textual databases, they are less useful when addressing image databases. An alternative approach is therefore required, one which employs information held within the image itself.
Previous work has recognised the need for such an approach and has tackled it with a number of techniques. Examples of systems which use global image content include Query By Image Content (QBIC), as described by Flickner et al [7], PhotoBook (Picard and Liu [9]) and ART-MUSEUM (Kato et al [4]). Global image content information can be employed to provide syntactic pictorial indices, as illustrated by De Marsicoi et al [3]. These usually employ feature types such as colour, texture, shape and spatial relation.
A method of representing colour is presented by Chang and Yang [2]. An image information measure is given by the minimum number of pixel grey-level changes required to convert the colour image to one of constant grey level. Images with differing content will result in differing image information measures, which can be employed as search keys. QBIC approaches the search task by analysing colour, texture and shape in the image [7]. Texture is examined on a global scale, extracting direction, coarseness and contrast, which are used as comparative measures. Colour is handled by generating a histogram for the entire image, to which fuzzy K-means clustering is applied in order to identify similarities in image colour balance. An interesting approach to texture is employed by PhotoBook, developed at the MIT Media Lab. Based on a new image model using the Wold decomposition for regular stochastic processes [3], this technique orders images according to the textural distance between their Wold components. This can be employed as a quick, first-step match in an on-going query.
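To make the global colour comparison concrete, the following is a minimal sketch of the kind of whole-image colour histogram such systems build. The quantisation depth and the simple L1 comparison are illustrative assumptions, not QBIC's exact method, which applies fuzzy K-means to the histograms rather than a direct bin-by-bin distance.

\begin{verbatim}
import numpy as np

def colour_histogram(image, bins=8):
    """Quantise an RGB image (H x W x 3, values 0-255) into a
    bins**3-bucket colour histogram, normalised to sum to one."""
    q = (image.astype(int) // (256 // bins)).reshape(-1, 3)
    index = (q[:, 0] * bins + q[:, 1]) * bins + q[:, 2]
    hist = np.bincount(index, minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def histogram_distance(h1, h2):
    """L1 distance between two normalised histograms; smaller
    values indicate a more similar overall colour balance."""
    return float(np.abs(h1 - h2).sum())
\end{verbatim}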
So far, only those techniques which examine the image on a global scale have been considered. This provides a useful tool for grouping images according to their overall appearance, but it is less appropriate for finding specific image content whose surroundings may differ significantly. To do this, local, regional image content must be described. This requires the image to be broken up, or segmented, into meaningful regions from which feature data can be extracted. This data can be employed in a similar manner to the global data already discussed, but on a regional level.
In order to make the system as ergonomic as possible, the process of segmentation would have to be automatic, since manual segmentation is a laborious and painstaking task. Automatic segmentation of images is a complex problem to address and is difficult to implement. One technique is described by Samet [11], whereby the image is subdivided according to the presence of relevant points within a given quadrant, forming a hierarchical data structure based on quad-trees. Although there are overheads associated with this technique, it can be efficient for the accurate analysis of areas of the image such as the intersections of region boundaries. QBIC approaches this problem by employing the user to help with the segmentation. As well as the more conventional colour-boundary flood-fill, segmentations are produced by having the user identify rough regions-of-interest and then using an edge-detection algorithm to pull boundaries into more exact shapes by searching for high-gradient colour changes near the user's sketch [7]. ART-MUSEUM uses a similar approach [4].
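As an illustration of the hierarchical subdivision underlying such quad-tree structures, the sketch below recursively splits a greyscale image into quadrants until each is approximately homogeneous. The split criterion used here (pixel standard deviation against a threshold) is an assumption chosen for illustration; Samet's formulation subdivides according to the presence of relevant points within a quadrant.

\begin{verbatim}
import numpy as np

def quadtree_split(region, threshold=10.0, min_size=8):
    """Recursively subdivide a greyscale region (2-D array) into
    quadrants until each is homogeneous (standard deviation below
    threshold) or too small to split. Returns a nested dict."""
    if region.std() <= threshold or min(region.shape) <= min_size:
        return {"leaf": True, "mean": float(region.mean()),
                "shape": region.shape}
    h2, w2 = region.shape[0] // 2, region.shape[1] // 2
    return {"leaf": False, "children": [
        quadtree_split(region[:h2, :w2], threshold, min_size),  # NW
        quadtree_split(region[:h2, w2:], threshold, min_size),  # NE
        quadtree_split(region[h2:, :w2], threshold, min_size),  # SW
        quadtree_split(region[h2:, w2:], threshold, min_size),  # SE
    ]}
\end{verbatim}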
Once a usable segmentation has been established, feature data can be extracted and used to carry out region-based database queries. This allows information about specific objects within images to be represented and searched for, enabling far more precise queries.
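A minimal sketch of extracting a feature vector from one segmented region follows; the particular features chosen (mean colour and relative area) are assumptions for illustration only, not the feature set used in the work described here, which would also include texture, shape and position descriptors.

\begin{verbatim}
import numpy as np

def region_features(image, mask):
    """Extract a simple feature vector from one segmented region:
    its mean colour plus its relative area. image is H x W x 3;
    mask is a boolean H x W array marking the region's pixels."""
    pixels = image[mask]                  # N x 3 region pixels
    mean_colour = pixels.mean(axis=0)     # average R, G, B
    area = mask.sum() / mask.size         # fraction of the image
    return np.concatenate([mean_colour, [area]])
\end{verbatim}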
Previous work in the Advanced Computing Research Centre has shown that neural networks can be used with a high degree of success in the classification of objects held within 2D images. This was demonstrated by Campbell et al [1], where a Multi-Layer Perceptron (MLP) network was used to classify a closed-domain set of images with fixed characteristics. Unfortunately, the nature of MLPs means that a large volume of classified training data is required in order to produce such a result.
When considering the open-ended scenario of a photographic database, especially one built from a home collection, the domain is not closed and the number of region classes is potentially large. Further problems include the variable quality and physical characteristics of the images themselves and the subjective nature of the queries. Moreover, the data contained in the database will not be tagged, since the user will not wish to perform a large-volume manual classification in order to carry out subsequent queries. Any training data must therefore be obtained on-line from information provided by the user during queries, and consequently the data available for network training will be minimal. In the analysis of input data, MLPs use hyperplanes to subdivide feature space into segments representing meaningful classifications. If it is assumed that the data cluster in this manner, then it is reasonable to expect points which lie close together in feature space to represent the same class. The problem of segmenting feature space using hyperplanes can be overcome by assuming that a region given as a search key has a feature vector lying at the centre of a given class cluster, and that the data points closest to it belong to the same class, at least in the early stages of the database's use. This is the foundation of a Radial Basis Function (RBF) node, which defines the likelihood of class membership as an n-D Gaussian distribution about the node's central point. As more data become available, the distribution of the initial node can be adjusted and additional nodes added to refine the description of class membership in feature space. Further nodes can also be defined, centred on known clusters of feature vectors which do not belong to the required class (as indicated by the user during classification). This extra data can be gathered by asking the user to accept or reject regions from images obtained by employing the initial node to select nearby data points.
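A minimal sketch of such a node is given below, assuming an isotropic n-D Gaussian with a single width parameter sigma; the discussion above does not fix the covariance structure, so this is an illustrative simplification rather than the system's definitive form.

\begin{verbatim}
import numpy as np

class RBFNode:
    """A radial basis function node: confidence of class
    membership falls off as an n-D Gaussian about the node's
    centre in feature space."""

    def __init__(self, centre, sigma=1.0):
        self.centre = np.asarray(centre, dtype=float)
        self.sigma = float(sigma)

    def activation(self, feature_vector):
        """Gaussian activation in (0, 1]; equal to 1 at the
        centre and decaying with squared distance from it."""
        d2 = np.sum((np.asarray(feature_vector, dtype=float)
                     - self.centre) ** 2)
        return float(np.exp(-d2 / (2.0 * self.sigma ** 2)))
\end{verbatim}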
RBFs were first incorporated into the field of neural networks principally as a result of the work of Powell [10] and Micchelli [8]. An RBF node is placed at the centroid of each cluster and trained to identify classes (which in the case of our search will simply be `Yes' or `No'). A node is supplied with the feature-space position of a given region under consideration and outputs its activation, computed by an n-D Gaussian distribution function, as the confidence value for that region's membership of the node's class. A sample vector is assigned to the class whose node responds with the highest activation. Resulting images are ranked according to the level of this activation, and the highest-ranked are returned to the user.
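To show how such activations might drive a query, the sketch below ranks candidate regions by a positive node's response and returns the strongest matches. The region representation (a features attribute holding a precomputed feature vector) is a hypothetical name introduced for illustration, not part of the system described above.

\begin{verbatim}
def rank_regions(query_node, regions, top_k=10):
    """Rank candidate regions by the query node's activation and
    return the top_k strongest matches, highest confidence first.
    Each region is assumed to carry a precomputed feature vector
    in a 'features' attribute (a hypothetical representation)."""
    scored = sorted(regions,
                    key=lambda r: query_node.activation(r.features),
                    reverse=True)
    return scored[:top_k]
\end{verbatim}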