Oracle Performance for Visual Captioning

Li Yao, Nicolas Ballas, Kyunghyun Cho, John Smith and Yoshua Bengio

Abstract

The task of associating images and videos with natural language descriptions has attracted a great deal of attention recently. State-of-the-art results on some of the standard datasets have been pushed into a regime where it has become increasingly difficult to make significant improvements. Instead of proposing new models, this work investigates the performance that an oracle can obtain. In order to disentangle the contribution of the visual model from that of the language model, our oracle assumes that a high-quality visual concept extractor is available and focuses only on the language part. We demonstrate the construction of such oracles on MS-COCO, YouTube2Text and LSMDC (a combination of M-VAD and MPII-MD). Surprisingly, despite the simplicity of the model and the training procedure, we show that current state-of-the-art models fall short when compared with the learned oracle. Furthermore, this suggests that current models are unable to capture important visual concepts in captioning tasks.
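The oracle described above replaces the visual model with a high-quality concept extractor so that only the language side is evaluated. The sketch below is an illustration under stated assumptions and not necessarily the construction used in the paper: it simulates a "perfect" extractor by reading concepts directly off the ground-truth captions of an image or video and encoding them as a multi-hot vector. The stopword list, the regex tokenizer, and the names `build_concept_vocab` and `oracle_concepts` are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Hypothetical stopword list (an assumption for this sketch); a fuller pipeline
# might keep only nouns, verbs and adjectives using a part-of-speech tagger.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "on", "in", "with", "and", "to", "at"}


def tokenize(caption: str) -> list[str]:
    """Lowercase a caption and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", caption.lower())


def build_concept_vocab(all_captions: list[str], min_count: int = 2) -> list[str]:
    """Collect non-stopword tokens seen at least `min_count` times, sorted."""
    counts = Counter(
        tok for cap in all_captions for tok in tokenize(cap) if tok not in STOPWORDS
    )
    return sorted(tok for tok, c in counts.items() if c >= min_count)


def oracle_concepts(reference_captions: list[str], vocab: list[str]) -> list[int]:
    """Multi-hot vector marking each vocabulary concept that appears in any
    ground-truth caption of one item: the simulated 'perfect extractor' output."""
    present = {tok for cap in reference_captions for tok in tokenize(cap)}
    return [1 if concept in present else 0 for concept in vocab]


if __name__ == "__main__":
    refs = ["A man is riding a horse.", "A person rides a brown horse on the beach."]
    corpus = refs + ["A dog runs on the beach."]
    vocab = build_concept_vocab(corpus, min_count=1)
    print(vocab)
    # ['beach', 'brown', 'dog', 'horse', 'man', 'person', 'rides', 'riding', 'runs']
    print(oracle_concepts(refs, vocab))
    # [1, 1, 0, 1, 1, 1, 1, 1, 0]
```

A caption generator would then be trained conditioned on these vectors in place of features from a learned visual model; that language-model part is omitted here. Note that no stemming is applied, so "rides" and "riding" count as separate concepts in this sketch.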

Session

Recognition, Optimisation and Performance Evaluation

Files

Extended Abstract (PDF, 83K)
Paper (PDF, 761K)

DOI

10.5244/C.30.141
https://dx.doi.org/10.5244/C.30.141

Citation

Li Yao, Nicolas Ballas, Kyunghyun Cho, John Smith and Yoshua Bengio. Oracle Performance for Visual Captioning. In Richard C. Wilson, Edwin R. Hancock and William A. P. Smith, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 141.1-141.13. BMVA Press, September 2016.

Bibtex

        @inproceedings{BMVC2016_141,
        	title={Oracle Performance for Visual Captioning},
        	author={Li Yao and Nicolas Ballas and Kyunghyun Cho and John Smith and Yoshua Bengio},
        	year={2016},
        	month={September},
        	pages={141.1--141.13},
        	articleno={141},
        	numpages={13},
        	booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
        	publisher={BMVA Press},
        	editor={Richard C. Wilson and Edwin R. Hancock and William A. P. Smith},
        	doi={10.5244/C.30.141},
        	isbn={1-901725-59-6},
        	url={https://dx.doi.org/10.5244/C.30.141}
        }