Fine-Grained Image Retrieval: the Text/Sketch Input Dilemma

Jifei Song, Yi-zhe Song, Tony Xiang and Timothy Hospedales

Abstract

Fine-grained image retrieval (FGIR) enables a user to search for a photo of an object instance based on a mental picture. Depending on how the user describes the object, two general approaches exist: sketch-based FGIR and text-based FGIR, each with its own pros and cons. However, no attempt has been made to systematically investigate how informative each of these two input modalities is and, more importantly, whether they are complementary to each other and thus should be modelled jointly. In this work, for the first time, we introduce a multi-modal FGIR dataset with both sketches and sentence descriptions provided as query modalities. A multi-modal quadruplet deep network is formulated to jointly model the sketch and text input modalities as well as the photo output modality.

Session

Posters

Files

Paper (PDF)

Supplementary (PDF)

DOI

10.5244/C.31.45
https://dx.doi.org/10.5244/C.31.45

Citation

Jifei Song, Yi-zhe Song, Tony Xiang and Timothy Hospedales. Fine-Grained Image Retrieval: the Text/Sketch Input Dilemma. In T.K. Kim, S. Zafeiriou, G. Brostow and K. Mikolajczyk, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 45.1-45.12. BMVA Press, September 2017.

Bibtex

            @inproceedings{BMVC2017_45,
                title={Fine-Grained Image Retrieval: the Text/Sketch Input Dilemma},
                author={Jifei Song and Yi-zhe Song and Tony Xiang and Timothy Hospedales},
                year={2017},
                month={September},
                pages={45.1-45.12},
                articleno={45},
                numpages={12},
                booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
                publisher={BMVA Press},
                editor={Tae-Kyun Kim and Stefanos Zafeiriou and Gabriel Brostow and Krystian Mikolajczyk},
                doi={10.5244/C.31.45},
                isbn={1-901725-60-X},
                url={https://dx.doi.org/10.5244/C.31.45}
            }