Multiple Instance Visual-Semantic Embedding
Zhou Ren, Hailin Jin, Zhe Lin, Chen Fang and Alan Yuille
Abstract
Visual-semantic embedding models have recently been proposed and shown to be
effective for image classification and zero-shot learning. The key idea is that by directly
learning a mapping from images into a semantic label space, the algorithm can generalize
to a large number of unseen labels. However, existing approaches are limited
to single-label embedding; handling images with multiple labels remains an open
problem, mainly due to the complex underlying correspondence between an image and
its labels. In this work, we present a novel Multiple Instance Visual-Semantic Embedding
(MIVSE) model for multi-label images. Instead of embedding a whole image into the
semantic space, our model characterizes the subregion-to-label correspondence: it
discovers semantically meaningful image subregions and maps them to their corresponding labels.
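As a rough illustration of the subregion-to-label idea described in the abstract, the Python sketch below scores each label by its best-matching subregion instead of by a single whole-image embedding. It is not the authors' implementation: the function names, the use of cosine similarity, the "best subregion wins" rule, and the toy vectors are all assumptions made for illustration, and the subregion embeddings are assumed to have already been mapped into the semantic label space by some model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_labels(region_embeddings, label_embeddings):
    """Rank labels by their best-matching subregion (hypothetical helper).

    region_embeddings: list of vectors, one per image subregion, assumed to be
        already embedded in the semantic label space.
    label_embeddings: dict mapping label name -> vector in the same space.
    Returns a list of (label, best_region_index, similarity), best first.
    """
    results = []
    for label, label_vec in label_embeddings.items():
        sims = [cosine(region, label_vec) for region in region_embeddings]
        # Multiple-instance step (assumed): a label is explained by its closest subregion.
        best_region = max(range(len(sims)), key=sims.__getitem__)
        results.append((label, best_region, sims[best_region]))
    return sorted(results, key=lambda item: item[2], reverse=True)


# Toy data (made up): two subregions and three candidate labels.
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
labels = {
    "dog": [0.9, 0.1, 0.0],
    "ball": [0.2, 0.8, 0.0],
    "car": [0.0, 0.0, 1.0],
}
ranking = rank_labels(regions, labels)
for label, region_index, similarity in ranking:
    print(f"{label}: region {region_index}, similarity {similarity:.3f}")
```

In this toy setup, each of the two labels that are actually present is matched to a different subregion, while the unrelated label receives a low score from every subregion. That is the behavior a whole-image embedding cannot express directly.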
Session
Orals - Scene Understanding
Files
Paper (PDF)
Supplementary (PDF)
DOI
10.5244/C.31.89
https://dx.doi.org/10.5244/C.31.89
Citation
Zhou Ren, Hailin Jin, Zhe Lin, Chen Fang and Alan Yuille. Multiple Instance Visual-Semantic Embedding. In T.K. Kim, S. Zafeiriou, G. Brostow and K. Mikolajczyk, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 89.1-89.12. BMVA Press, September 2017.
Bibtex
@inproceedings{BMVC2017_89,
title={Multiple Instance Visual-Semantic Embedding},
author={Zhou Ren and Hailin Jin and Zhe Lin and Chen Fang and Alan Yuille},
year={2017},
month={September},
pages={89.1--89.12},
articleno={89},
numpages={12},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Tae-Kyun Kim and Stefanos Zafeiriou and Gabriel Brostow and Krystian Mikolajczyk},
doi={10.5244/C.31.89},
isbn={1-901725-60-X},
url={https://dx.doi.org/10.5244/C.31.89}
}