Improved Bilinear Pooling with CNNs
Tsung-Yu Lin and Subhransu Maji
Abstract
Bilinear pooling of Convolutional Neural Network (CNN) features [22, 23], and its
compact variants [10], have been shown to be effective at fine-grained recognition, scene
categorization, texture recognition, and visual question-answering tasks among others.
The resulting representation captures second-order statistics of convolutional features in
a translationally invariant manner. In this paper we investigate various ways of normalizing these statistics to improve their representation power. In particular we find that
the matrix square-root normalization offers significant improvements and outperforms
alternative schemes such as the matrix logarithm normalization when combined with elementwise square-root and ℓ2 normalization. This improves the accuracy by 2-3% on a
range of fine-grained recognition datasets leading to a new state of the art.
We also investigate how the accuracy of matrix function computations affects network
training and evaluation.
In particular we compare against a technique for estimating
matrix square-root gradients via solving a Lyapunov equation that is more numerically
accurate than computing gradients via a Singular Value Decomposition (SVD). We find
that while SVD gradients are numerically inaccurate, the overall effect on the final accuracy is negligible once boundary cases are handled carefully. We present an alternative
scheme for computing gradients that is faster and still offers improvements over the
baseline model.
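The normalization pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the averaging over locations, the `eps` constants, and the use of an eigendecomposition to form the square root and to solve the Lyapunov equation are all assumptions made for illustration. The sketch pools features into a second-order matrix A, takes its matrix square root Z, applies elementwise signed square-root and ℓ2 normalization, and obtains the gradient with respect to A by solving Z X + X Z = dL/dZ.

```python
import numpy as np

def bilinear_pool(features):
    """Average outer product of local features.

    features: array of shape (n_locations, channels).
    Returns a symmetric PSD matrix of shape (channels, channels).
    """
    n = features.shape[0]
    return features.T @ features / n

def matrix_sqrt(A):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T

def improved_bilinear_descriptor(features, eps=1e-12):
    """Matrix sqrt, then elementwise signed sqrt, then l2 normalization."""
    Z = matrix_sqrt(bilinear_pool(features))
    z = Z.reshape(-1)
    z = np.sign(z) * np.sqrt(np.abs(z))
    return z / (np.linalg.norm(z) + eps)

def matrix_sqrt_grad(A, grad_Z, eps=1e-12):
    """Gradient of a loss w.r.t. A given its gradient w.r.t. Z = A^(1/2).

    Solves the Lyapunov equation Z X + X Z = grad_Z in the eigenbasis of A.
    eps is a hypothetical safeguard for near-zero eigenvalues.
    """
    w, V = np.linalg.eigh(A)
    s = np.sqrt(np.clip(w, 0.0, None))
    G = V.T @ grad_Z @ V
    X = G / (s[:, None] + s[None, :] + eps)
    return V @ X @ V.T
```

For a feature map with 50 locations and 4 channels, `improved_bilinear_descriptor` returns a unit-norm vector of length 16. Dedicated Lyapunov solvers avoid the explicit eigendecomposition used here; the eigenbasis form is chosen only because it keeps the sketch short.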
Session
Posters
DOI
10.5244/C.31.117
https://dx.doi.org/10.5244/C.31.117
Citation
Tsung-Yu Lin and Subhransu Maji. Improved Bilinear Pooling with CNNs. In T.K. Kim, S. Zafeiriou, G. Brostow and K. Mikolajczyk, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 117.1-117.12. BMVA Press, September 2017.
Bibtex
@inproceedings{BMVC2017_117,
title={Improved Bilinear Pooling with CNNs},
author={Tsung-Yu Lin and Subhransu Maji},
year={2017},
month={September},
pages={117.1-117.12},
articleno={117},
numpages={12},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Tae-Kyun Kim and Stefanos Zafeiriou and Gabriel Brostow and Krystian Mikolajczyk},
doi={10.5244/C.31.117},
isbn={1-901725-60-X},
url={https://dx.doi.org/10.5244/C.31.117}
}