Abstract
A new system named RAIDER has been designed in an attempt to combat the inadequacies and inefficiencies of current image database management systems. RAIDER consists of two sections: content based annotation and content based retrieval. A system overview is given in this paper together with retrieval results on a database of over 1300 images from 44 Brodatz texture classes. Two retrieval methods are presented both of which extract rotation invariant texture features from the images. The first, a multichannel filtering technique, is highly accurate and the second, a method based on histogramming edge information gathered by a well known edge operator, is both simple and efficient. Finally the two methods' resistance to noise is measured by adding Gaussian noise to 6600 query images. An interactive object selection tool based on colour and texture is described. Experimental results are given where applicable.
Image databases are becoming more widespread as a result of the advent of digital cameras and the increased availability of inexpensive storage media. Image retrieval from such databases is a major concern as current technology is extremely restrictive. Keyword searches are the traditional retrieval method; each image is associated with a list of words which is accessed during a database search. The word list can never be comprehensive enough to cover every conceivable search pattern, and its compilation is extremely time consuming. Intelligent image databases have therefore become a major research focus. Content based image retrieval is a widely accepted solution to the problem: it enables users to search the actual images rather than a list of words. General systems under development include QBIC [1] and Photobook [2]. More specialised examples are I2C [10], a system for indexing, storing and retrieving medical images, and MARCO [11], a system for retrieving maps by image content. The problem also extends to video databases, where millions of images are stored for each film; systems such as JACOB [12] are currently under development to browse and query such databases.
In this paper we describe an alternative image database management system (IDBMS), the RAIDER system, which exploits content based image retrieval and annotation for increased flexibility and efficiency. A modular approach is adopted in the implementation of RAIDER. The paper focuses on an important but hitherto overlooked problem in image databases and texture analysis [6]: the rotation invariant retrieval of texture images. Two methods of accomplishing rotation invariance are presented: a multichannel filtering approach and an edge operator based method. Both are incorporated into RAIDER for rotation invariant annotation and retrieval of images. The inclusion of rotation invariance distinguishes RAIDER from other image database systems. It also makes the system more consistent with human visual annotation and retrieval of images, as image recognition by the human visual system is clearly rotation invariant.
The remainder of the paper is organised as follows. The annotation and retrieval parts of the RAIDER system are presented first. The rotation invariant texture analysis methods used in RAIDER are then explained, along with classification experiments and results. The methods are incorporated into RAIDER for image retrieval and their resistance to Gaussian noise is studied. Finally, the most appropriate method is applied, together with colour analysis, to object selection for use in image annotation.
RAIDER comprises two sections, namely image annotation and image retrieval. The following sections explain the two parts in more detail.
Text based queries such as 'Find me a picture of a house' pose no problem for traditional methods if two conditions hold: 1) the label 'house' is included in the database, and 2) all instances of houses have been labelled. In this example it is quite likely that many images possess the 'house' label, so a correct result will be seen. It is, however, unlikely that condition two also holds.
A solution is to incorporate definitions of all objects in terms of colour, texture, shape etc. into the system. This is a content based solution as the system can analyse each image in the database and use the descriptions to locate the required objects. Even assuming this approach succeeds in practice, a lack of flexibility remains: if an object is not known to the system, condition one will fail. An efficient method of dynamically increasing the system's world knowledge is required. The content based annotation section of RAIDER attempts to solve the problem at the database population stage. The annotation process is an interactive content based procedure designed to continually update and add to existing object descriptions in the system. Picard and Minka [3] introduce the idea of label propagation, which plays an important role in the annotation system.
Figure 1 Illustration of interactive content based annotation
Figure 1 illustrates the process undertaken when an image is added to the database. The system first attempts to label all objects in the scene via the use of colour, shape, texture and their combinations. The labels are then verified by the user and deselected as necessary. The user selects and labels the remaining objects via an interactive object selection process (detailed in Section 7). The label is propagated through the image (intra-frame propagation) and, if required, through the rest of the database (inter-frame propagation) via the use of colour and texture classification and segmentation techniques. As time progresses RAIDER's knowledge increases and the user's workload therefore decreases. In more specialised databases (e.g. when all possible objects are known in advance) annotation can become a fully automatic process.
Image annotation is traditionally inadequate and tedious. RAIDER's annotation method combats both of these problems and the process becomes quick, easy and effective.
RAIDER accommodates two methods of query formation, namely text and content based queries, which can be mixed and matched as required. Text based queries include object searches, e.g. "Find me a picture of a house", and image descriptions, e.g. "Find me a picture of a bright sunny day". The former was discussed in the previous section; the latter is addressed in Section 9. Figure 2a shows the current version of the retrieval section of RAIDER. Content queries can be based on colour, texture, shape, detail areas and similar image properties, explained below:
Colour: Either single or multiple search colours can be specified via the use of a colour wheel or predefined colour samples.
Texture: A texture selection tool enables the user to search for textures from a system library (i.e. textures RAIDER possesses knowledge of). User defined textures can be added to the library via the use of a filename entry box.
Shape: A drawing area is available for specifying shapes on which to search.
Detail area: The detail area drawing tool may be used in conjunction with the colour, texture and shape tools to specify areas in which the properties should occur. It can also be used separately to define required areas of dense image detail.
Similar Image Properties: Figure 2b shows the result of a 'similar image' search where a query image (shown at the top of the window) is presented to the system. Image features are computed and compared to those from all the images in the database. The n closest matches are selected and the corresponding images displayed as thumbnails in the lower half of the window. This method of retrieving images is the main focus of current work. The features used to retrieve the images are derived via texture analysis. The techniques used are described in Section 4 .
Figure 2 Content based image retrieval in RAIDER (click on images to enlarge)
A test database was created for texture classification and image retrieval experiments. The database consists of 44 Brodatz [5] texture classes shown in Figure 3. Each texture was randomly rotated and cropped to 128*128 pixels. The resulting images were subjected to histogram equalisation to prevent bias towards images with similar grey levels. A total of 1320 images (30 from each texture class) were obtained.
Figure 3 The 44 Brodatz texture classes contained in the test database (click on image to enlarge)
During classification experiments half the database was used as training data. Exemplar feature vectors were obtained by averaging the feature vectors of each texture class. The second half of the database was used for testing, where a Euclidean distance classifier assigned each image to the class of its nearest exemplar.
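The classification scheme just described, with class exemplars formed by averaging training feature vectors and a minimum Euclidean distance decision, can be sketched as follows. This is an illustrative reconstruction (the function names and the use of NumPy are ours, not the authors' implementation):

```python
import numpy as np

def train_exemplars(features, labels):
    """Average the training feature vectors of each class to form exemplars."""
    classes = sorted(set(labels))
    return {c: np.mean([f for f, l in zip(features, labels) if l == c], axis=0)
            for c in classes}

def classify(exemplars, query):
    """Assign the class whose exemplar is closest in Euclidean distance."""
    return min(exemplars, key=lambda c: np.linalg.norm(query - exemplars[c]))
```

A query feature vector is simply compared against every class exemplar, so classification cost grows with the number of classes rather than the number of training images.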
Texture analysis has been a major research area for decades and many established methods exist for the classification of textured images. Unfortunately most techniques assume that the textures are uniformly presented and captured from the same viewpoint; this is an unrealistic assumption in the real world [6]. For applications such as content based image retrieval, texture analysis often needs to be invariant to viewpoint. Genuine viewpoint invariance is extremely difficult to obtain. Rotation invariance (an important aspect of the general viewpoint invariance problem) is a practical starting position and forms the main goal of the studies in this paper. In this section two novel algorithms are described for extracting rotation invariant texture features. They are used in the RAIDER system for rotation invariant texture classification and for the retrieval of texture images.
A multichannel filtering technique based on Gabor filters in the frequency domain is used to acquire rotation invariant texture features. The definition of a Gabor filter is given in Equation 1:

h(x,y) = g(x,y) exp[2 pi j f (x cos theta + y sin theta)]    (1)
where g(x,y) is a symmetrical Gaussian of the form:

g(x,y) = (1 / (2 pi sigma^2)) exp[-(x^2 + y^2) / (2 sigma^2)]    (2)
This function can be split into two parts, the even and odd filters he(x,y) and ho(x,y), also known as the symmetric and antisymmetric filters respectively. These filter pairs are given in Equation 3 and are used in the multichannel method of rotation invariant texture analysis:

he(x,y) = g(x,y) cos[2 pi f (x cos theta + y sin theta)]
ho(x,y) = g(x,y) sin[2 pi f (x cos theta + y sin theta)]    (3)
The Fourier transform of the filters is taken and the output images obtained via the FFT. For example:

pe(x,y) = FFT^-1 [ P(u,v) He(u,v) ]
where P(u,v) is the Fourier transform of the input image p(x,y) and He(u,v) is the Fourier transform of the filter he(x,y). The outputs of the two filters are combined using the following equation to obtain a single value at each pixel (see [13-14] for a justification of this combination):

q(x,y) = sqrt[ pe(x,y)^2 + po(x,y)^2 ]
Two input parameters specify a filter's location, namely the radial frequency (f) and the orientation (theta). For each radial frequency, filters are positioned at, and sampled around, a circle of radius f. Since conjugate symmetry is exploited, 180/delta-theta filters are required per frequency, where delta-theta is the sampling interval. The energy values of the filtered images form a periodic function of theta with period pi, and a rotation of the input image corresponds to a translation of this function. n rotation invariant features are obtained from the first n magnitudes of the periodic function's Fourier coefficients. The process is repeated for each of x frequencies, resulting in an xn-dimensional feature vector for use in classification. Further details of the method may be found in [8]; similar rotation invariant features are proposed in [4] and [9].

A sampling interval of 10 degrees was used and 3 features were retained per radial frequency (f = 2, 4, 8, 16, 32, 64) [7]. Each image in the database was presented to the classification system, which returned a texture classification. A 94% overall correct recognition rate was obtained. Table 1 shows the average recognition rate for each texture class.
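The key invariance step, that the magnitudes of the Fourier coefficients of the orientation-energy sequence are unchanged when the sequence is circularly shifted (as happens when the image is rotated), can be sketched as follows. This is an illustrative fragment, not the authors' implementation; it assumes the per-orientation filter energies for one radial frequency have already been computed:

```python
import numpy as np

def rotation_invariant_features(energies, n):
    """energies: filter-output energies sampled at equally spaced
    orientations for one radial frequency (a periodic function of theta).
    A rotation of the image circularly shifts this sequence, so the
    magnitudes of its Fourier coefficients are rotation invariant."""
    spectrum = np.fft.fft(np.asarray(energies, dtype=float))
    return np.abs(spectrum[:n])  # keep the first n magnitudes
```

Repeating this for each of the x radial frequencies and concatenating the results yields the xn-dimensional feature vector described above.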
In the second method a Sobel edge operator is used to generate gradient direction and magnitude images of the input texture. The gradient directions (theta) at all pixels are then histogrammed, weighted by the corresponding gradient magnitudes. The resulting histograms are spiky; spurious spikes are removed by smoothing. Normalisation is required to remove the undesirable effects of different illuminations. The following equation defines the normalisation technique used:

B(theta) = h b(theta) / m

where h is the desired height of the histograms, m is the largest histogram value, and B(theta) and b(theta) are the normalised and original values at histogram bin theta respectively.
The cyclic direction histogram so formed can be regarded as a periodic function of theta with period 2 pi, where a rotation of the image results in a translation of this function. The Fourier transform of the periodic function is taken; the magnitudes of its Fourier coefficients are invariant to rotation, and the first n magnitudes form an n-dimensional feature vector for use in classification.
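The Sobel pipeline just described (gradients, magnitude-weighted direction histogram, normalisation, Fourier magnitudes) can be sketched as below. This is an illustrative reconstruction assuming a greyscale image as a NumPy array; the bin count and feature count are our own choices, and the smoothing step is omitted for brevity:

```python
import numpy as np

def sobel_direction_features(img, bins=36, n=4, h=1.0):
    """Rotation invariant features from a magnitude-weighted histogram
    of Sobel gradient directions (smoothing omitted for brevity)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    def corr(k):
        # 'valid' correlation with a 3x3 mask, implemented by slicing
        out = np.zeros((img.shape[0] - 2, img.shape[1] - 2))
        for i in range(3):
            for j in range(3):
                out += k[i, j] * img[i:i + out.shape[0], j:j + out.shape[1]]
        return out
    gx, gy = corr(kx), corr(ky)
    mag = np.hypot(gx, gy)                       # gradient magnitude
    theta = np.arctan2(gy, gx)                   # gradient direction
    hist, _ = np.histogram(theta, bins=bins, range=(-np.pi, np.pi),
                           weights=mag)          # magnitude-weighted
    hist = h * hist / (hist.max() + 1e-12)       # normalise: B = h*b/m
    return np.abs(np.fft.fft(hist))[:n]          # rotation invariant part
```

Because a rotation of the image circularly shifts the direction histogram, the returned Fourier magnitudes are (up to binning effects) unchanged under rotation.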
Each texture from the image database is presented to the classifier for classification. The method achieved an overall recognition rate of 53% on the test database using 4 features. A breakdown of this result into individual texture classes is shown in Table 1.
Table 1 Classification rates for the Gabor and Sobel method
The Sobel method was found to be less accurate than the Gabor method. Its main attractions are simplicity and efficiency; its execution time is a fraction of the Gabor method's. The Sobel method is also an automatic process requiring no input or tuning parameters. These qualities render it suitable for image database applications. In contrast, the Gabor method is highly accurate: 99% of all textures are correctly identified within three guesses, compared to 81% for the Sobel operator. Four input parameters are required in the Gabor method: sigma, theta, the radial frequencies, and the number of features to be kept per frequency. Experimental evidence suggests that optimal values for these parameters can be established and reused. The method's execution time can be decreased by applying the filters to the image in parallel.
Both methods were included in RAIDER and image retrieval experiments performed. The experiments are based on similar image properties, i.e. a query image is presented to RAIDER, which returns the n most similar images from the database according to a criterion c. Textures from the database described in Section 3 were used as query images and were presented to RAIDER in turn. The closest five images from the database were returned per search. Euclidean distance was used as the similarity measure.
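The similar-image search amounts to a nearest-neighbour query over stored feature vectors. A minimal sketch (the dictionary-based database and function name are our own assumptions, not RAIDER's interface):

```python
import numpy as np

def retrieve(database, query_vec, n=5):
    """Return the names of the n database entries whose feature
    vectors are closest to the query in Euclidean distance."""
    dists = [(np.linalg.norm(vec - query_vec), name)
             for name, vec in database.items()]
    return [name for _, name in sorted(dists)[:n]]
```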
Using the Gabor method an average of 98% of the images returned by RAIDER are of the same Brodatz texture class as the query image. This compares to 63% for the Sobel method. The averages are decomposed into individual texture classes in Table 2.
Table 2 The mean retrieval results for each texture class using both methods
17 texture classes gave perfect retrieval results for the Gabor method, compared to 1 for the Sobel method. D6 (a highly regular texture) proved to be the most successful class obtaining a combined 100% retrieval rate. Textures D21 and D68 were also very successful in both methods. The results for D105 are the lowest using the Gabor method but are surprisingly high using the Sobel method.
Another practically important but often overlooked issue in image databases and texture analysis is the noise robustness of texture features. In this section we outline our studies on the noise robustness of the rotation invariant features in the context of image retrieval. For this purpose various levels of Gaussian noise (sigma=0-90) were added to each image in the database.
Figure 4 The addition of Gaussian noise to texture D104
The resultant 6600 noisy images were then used to query the database. A total of five images were returned per search as before. Figure 4 shows texture D104 with the addition of various levels of noise.
Figure 5 shows the probability that an image returned from a search is of the same texture class as the query image. It can be seen that, for the Gabor features, noise with a sigma of 38 (shown in Figure 4) can be added to the query images before the retrieval rate drops to that achieved by the Sobel method on clean images. It is also at this noise level that the Sobel curve begins to level off as retrieval becomes random. The shallower gradient of the Gabor curve suggests that the method is more resistant to noise than the Sobel method.
Figure 5 Results from the Sobel and Gabor methods of content based image retrieval.
It is interesting to note that, for an average of one correct texture class returned per search, the Sobel and Gabor methods tolerate noise levels of 26 and 74 respectively. Example images containing such levels of noise are presented in Figure 4.
In the previous sections we have discussed rotation invariant texture features and their use in RAIDER for rotation invariant image retrieval. In this section we describe our initial work on content based image annotation in RAIDER. We focus on object selection as it is essential during annotation (intra- and inter-frame propagation). Manual object selection is a painstaking experience (especially for complicated objects such as trees which could take hours to outline with a mouse) and fully automatic object selection (image segmentation) is generally beyond the state-of-the-art in image processing and computer vision. Therefore a semi-automatic method has been developed. At present object selection is based either on colour or texture. In both cases the user draws a dragbox over the object to be labelled (see Figure 6) which is later extended automatically.
The HSV colour space is used in order to minimise errors due to brightness differences. A colour histogram (hue-saturation-number of pixels) of the marked area is compiled. A region growing technique is employed which takes the central pixel of the area as a seed. During region growing the smoothed hue and saturation values of each test pixel are used to index the colour histogram; if the relevant bin contains entries, the pixel is accepted as part of the region. Figure 6 shows object selection examples on natural scenes.
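The histogram-indexed region growing can be sketched as below. This is an illustrative reconstruction under our own assumptions (4-connected flood fill, hue and saturation as floats in [0, 1], an arbitrary bin count, and no smoothing), not RAIDER's implementation:

```python
import numpy as np
from collections import deque

def grow_region(hue, sat, box, bins=16):
    """Grow a region outward from the centre of the user's drag-box,
    accepting pixels whose (hue, saturation) values fall in a histogram
    bin that the marked area populated."""
    r0, r1, c0, c1 = box
    # quantise hue and saturation into histogram bin indices
    hq = np.clip((hue * bins).astype(int), 0, bins - 1)
    sq = np.clip((sat * bins).astype(int), 0, bins - 1)
    # hue-saturation histogram of the marked area
    hist = np.zeros((bins, bins), dtype=int)
    for r in range(r0, r1):
        for c in range(c0, c1):
            hist[hq[r, c], sq[r, c]] += 1
    # flood fill from the central pixel of the box
    seed = ((r0 + r1) // 2, (c0 + c1) // 2)
    mask = np.zeros(hue.shape, dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < hue.shape[0] and 0 <= cc < hue.shape[1]
                    and not mask[rr, cc] and hist[hq[rr, cc], sq[rr, cc]] > 0):
                mask[rr, cc] = True
                queue.append((rr, cc))
    return mask
```

The returned boolean mask marks the selected object; growth stops wherever a pixel's colour falls in a histogram bin the marked area never populated.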
Figure 6 Object selection using the multi-channel colour method (click on images to enlarge)
Texture is often a more appropriate segmentation feature than colour, and texture analysis is required when colour segmentation fails. The method developed is based on multichannel Gabor filtering, as explained in Section 4.1, with the addition of a dynamic frequency detection stage. Filters must be positioned at areas of high activity; peaks in the image's power spectrum are therefore located, and these x frequencies are selected, along with a sampling angle of 10 degrees, for filter placement.
The filtered images are analysed and rotation invariant features extracted at each pixel. Object selection then continues as in Section 7.1 using the rotation invariant features. Examples are given in Figure 7 (one is a synthetic image and the other is a natural image showing a shirt flanked by a jacket). Images in (a) are the original with user selected regions; those in (b) the final selection based on colour; and those in (c) the final selection based on texture. The results show that the use of colour fails to locate the desired object and that the texture based method is invariant to image rotation.
Once the entire image has been segmented and the objects located, attributes such as the rotation invariant texture features discussed in Section 4 can be computed for each segmented/located region/object. Such attributes will subsequently be used in label propagation.
Figure 7 Object selection based on texture analysis
RAIDER has been introduced as an IDBMS which exploits content based annotation and retrieval for increased flexibility and efficiency of database searches. An overview of the two sections of RAIDER was given. Two rotation invariant texture analysis techniques have been explained and classification results on a database of over 1300 images presented. Correct classification rates of 94% and 55% (99% and 83% within three returns) were obtained for the multichannel Gabor and Sobel methods respectively. The techniques were incorporated into RAIDER and retrieval experiments conducted: 98% and 63% of all textures returned from a search were of the correct texture class for the Gabor and Sobel methods respectively. 6600 query images containing various levels of Gaussian noise were used to test the methods' resistance to noise. Noise standard deviations of 26 and 74 respectively were reached before only one of the five images retrieved per search was, on average, correct.
The Gabor method was also applied to object selection for use in the annotation process. Rotation invariant texture segmentation of both synthetic and natural images was obtained. A multi-channel colour histogramming method of segmentation was also successfully applied to object selection.
The implementation of the RAIDER system is in its infancy and a great deal of work remains to be done. Future work includes geometric invariant texture analysis and the combination of texture and colour for image retrieval and label propagation. The implementation of an object hierarchy system to advance query processing to new levels of intelligence would be extremely beneficial.
[1] M. Flickner et al (1995), Query By Image and Video Content: The QBIC system, IEEE Computer Vol. 28 No. 9.
[2] A. Pentland et al (1994), Photobook: Content-Based Manipulation of Image Databases, M.I.T. Media Laboratory, Technical Report No. 255 .
[3] R. Picard and T. Minka (1995), Vision Texture for Annotation, Multimedia Systems 3:3-14.
[4] B. S. Manjunath et al (1995), Rotation Invariant Texture Classification using Modified Gabor Filters, Procs of IEEE ICIP95 , pp262-265.
[5] P. Brodatz (1966), Textures: A Photographic Album for Artists and Designers, Dover, NY.
[6] T. N. Tan (1995), Geometric Invariant Texture Analysis, Procs SPIE Vol. 2488 pp475- 485.
[7] S. R. Fountain and T. N. Tan (1997), Extraction of Noise Robust Rotation Invariant Texture Features Via Multichannel Filtering, Submitted to IEEE ICIP97 .
[8] T. N. Tan (1994), Noise Robust and Rotation Invariant Texture Classification, Procs of EUSIPCO-94 , pp1377-1380.
[9] H. Greenspan et al. (1994), Rotation Invariant Texture Recognition Using a Steerable Pyramid, Procs of ICPR94, pp162-167.
[10] S. C. Orphanoudakis et al (1994), I2C: A system for The Indexing, Storage, and Retrieval of Medical Images by Content , Med. Inform., Vol. 19, No. 2, pp102-122.
[11] H. Samet (1996), MARCO: MAp Retrieval by Content, IEEE Transactions on Pattern Analysis and Machine Intelligence , Vol. 18, No. 8.
[12] M. Cascia and E. Ardizzone (1996), JACOB: Just Another Content-Based Query System for Video Databases, Procs of ICASSP-96 , May 7-10, Atlanta.
[13] D. A. Pollen and S. F. Ronner (1983), Visual Cortical Neurons as Localised Spatial Frequency Filters, IEEE Trans. SMC, Vol. 13, pp907-916.
[14] T. N. Tan (1992), Texture Feature Extraction via Visual Cortical Channel Modelling, Proc. ICPR, C607-C610.