Next: References Up: Comparing image resamplers via Previous: Results

Discussion

For the bridge and face images, Methods 5 (B-spline interpolation) and 8 (edge-enhanced zooming) were preferred by the human panel, and had low visual difference scores. Furthermore, the panel's choices would not have been predicted by signal-to-error ratio. For the text image the visual difference scores did not predict the panel's preference. This may be because human observers are judging readability rather than image quality. We also found that a very simple method based on truncating the DCT coefficients [7] was quite effective, so for applications where computational complexity is important this may be a good choice.
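The DCT-domain method of [7] is not reproduced here, but the underlying idea can be illustrated: resampling a signal by truncating its DCT coefficients (to shrink) or zero-padding them (to enlarge). The following is a minimal one-dimensional sketch under that assumption; the function names are ours, and a practical implementation would use a fast transform and operate blockwise in two dimensions.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a list of samples (direct O(n^2) evaluation)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        a = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(a * s)
    return out

def idct(X, m):
    """Orthonormal inverse DCT (DCT-III) evaluated on m output samples."""
    out = []
    for i in range(m):
        s = 0.0
        for k in range(len(X)):
            a = math.sqrt(1.0 / m) if k == 0 else math.sqrt(2.0 / m)
            s += a * X[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * m))
        out.append(s)
    return out

def dct_resample(x, m):
    """Resample x to length m by zero-padding (zoom) or truncating (shrink)
    its DCT coefficients."""
    X = dct(x)
    n = len(x)
    if m >= n:
        X = X + [0.0] * (m - n)   # zero-pad high frequencies to enlarge
    else:
        X = X[:m]                 # discard high frequencies to shrink
    # rescale so mean intensity is preserved under the orthonormal transforms
    return [v * math.sqrt(m / n) for v in idct(X, m)]
```

The only per-sample work beyond the transform itself is the final rescaling, which is why this style of resampler is attractive when computational cost matters.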

It is of interest to formally compare the visual difference scores to the human panel. We do this via the robust Spearman rank correlation coefficient, r_s [14]. Each observer produced a separate ranking, so we have computed r_s between each observer and each of the error measures. Table 4 gives the mean of the rank correlation coefficient over all 13 observers.
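For untied rankings, Spearman's coefficient reduces to a closed form in the rank differences. A minimal sketch of the per-observer computation follows; the example rankings are hypothetical, not data from the experiment.

```python
def spearman(rank_a, rank_b):
    """Spearman rank correlation r_s for two untied rankings of the same items.

    r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the difference
    between the two ranks assigned to item i.
    """
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Hypothetical example: one observer's ranking of eight resampling methods
# versus the ranking induced by an error metric (position i = rank of method i).
observer = [1, 2, 3, 4, 5, 6, 7, 8]
metric   = [2, 1, 3, 4, 5, 6, 8, 7]
r_s = spearman(observer, metric)
```

Identical rankings give r_s = 1 and exactly reversed rankings give r_s = -1, so averaging r_s over observers summarises how consistently a metric reproduces the panel's ordering.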

   
Table 4: Mean Spearman rank correlation coefficients for each image and error metric.

Under the null hypothesis of zero correlation, the test statistic for the rank correlation coefficient is distributed according to a Student's t distribution with 6 degrees of freedom. We find that a correlation of 0.28 is roughly at the significance level, which means that all the correlations in Table 4, apart from those marked with a *, are significantly different from zero.
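Assuming the usual transform for testing a correlation coefficient against zero, t = r * sqrt((n - 2) / (1 - r^2)), the 6 degrees of freedom imply that n = 8 methods were ranked. A sketch of that transform, under those assumptions:

```python
import math

def t_statistic(r, n):
    """t statistic for testing a correlation r against zero, with n - 2
    degrees of freedom (valid for |r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# With 6 degrees of freedom the rankings cover n = 8 items.
t = t_statistic(0.28, 8)
```

The resulting t value is then compared against the chosen quantile of the Student's t distribution with n - 2 degrees of freedom.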

The visual difference score appears to predict human performance for a variety of grey-scale images (in this paper we show only two of the images we have tried), but it does not work well for images where observers may use additional interpretation to assess image quality. The text image presented here is an example of this, but we have also found that some face images produce a disparity between the visual difference score and the human scores.

A further problem when testing resampling methods is the provenance of the test images. All of the images shown here were generated using known methods, but we have noted inconsistent results when using well-known test images. We suspect that several of these images have previously been resampled.

Currently we are repeating the human tests using what we think is an improved method, in which observers are presented with image pairs for a short duration and asked to select the better of the two. A ranking is then produced by sorting these pairwise comparisons.
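The paper does not specify how the pairwise judgements are sorted into a ranking; one simple possibility is a Copeland-style tally, ordering methods by how often each wins a comparison. A sketch under that assumption, with hypothetical method names and judgements:

```python
from collections import Counter

def rank_from_pairs(methods, preferences):
    """Rank methods by wins in pairwise comparisons (a Copeland-style tally).

    `preferences` is a list of (winner, loser) pairs pooled over observers.
    """
    wins = Counter({m: 0 for m in methods})
    for winner, _loser in preferences:
        wins[winner] += 1
    # most-preferred method first; ties broken by name for a stable order
    return sorted(methods, key=lambda m: (-wins[m], m))

# Hypothetical judgements for three resampling methods.
prefs = [("bspline", "pixel_rep"), ("edge_enh", "pixel_rep"),
         ("bspline", "edge_enh"), ("bspline", "pixel_rep")]
ranking = rank_from_pairs(["bspline", "edge_enh", "pixel_rep"], prefs)
```

Pooling many short-duration forced choices in this way avoids asking observers to hold a full ordering of all the methods in mind at once.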




Stephen King ESE PG
Thu Jul 10 15:27:29 BST 1997