Cross-modal Retrieval via Memory Network
Ge Song and Xiaoyang Tan
Abstract
With the explosive growth of multimedia data on the Internet, cross-modal retrieval has attracted a great deal of attention in the computer vision and multimedia communities. However, this task is very challenging due to the heterogeneity gap between different modalities. Current approaches typically involve a common representation learning process that maps different data into a common space by linear or nonlinear functions. Yet most of them 1) only handle the dual-modal situation and generalize poorly to more complex cases; 2) require example-level alignment of training data, which is often prohibitively expensive in practical applications; and 3) do not fully exploit prior knowledge about different modalities during the mapping process. In this paper, we address the above issues by casting common representation learning as a Question Answering problem via a cross-modal memory neural network (CMMN). Specifically, the raw features of all modalities are regarded as 'Questions', and an extra discriminator is exploited to select high-quality ones as 'Statements' for storage, whereby the common features are the desired 'Answers'.
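The abstract only sketches the Question/Statement/Answer analogy and gives no code. Below is a minimal, hypothetical PyTorch sketch of what a memory-style common-representation module along these lines could look like: modality-specific encoders produce the 'Question', a soft-attention read over stored memory slots plays the role of the 'Statements', and the read-out is the common-space 'Answer'. The class name, dimensions, and read mechanism are illustrative assumptions, not the authors' actual CMMN.

# Hypothetical sketch of a memory-network-style common representation module.
# Names and shapes are assumptions for illustration, not the released CMMN code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalMemorySketch(nn.Module):
    def __init__(self, img_dim=4096, txt_dim=300, common_dim=128, memory_size=64):
        super().__init__()
        # Modality-specific encoders map raw features ("Questions") into a shared space.
        self.img_encoder = nn.Linear(img_dim, common_dim)
        self.txt_encoder = nn.Linear(txt_dim, common_dim)
        # Learnable memory slots stand in for the stored "Statements".
        self.memory_keys = nn.Parameter(torch.randn(memory_size, common_dim))
        self.memory_values = nn.Parameter(torch.randn(memory_size, common_dim))

    def forward(self, features, modality):
        # Encode the query ("Question") with its own modality's encoder.
        encoder = self.img_encoder if modality == "image" else self.txt_encoder
        query = encoder(features)                                # (batch, common_dim)
        # Soft addressing over memory, as in end-to-end memory networks.
        attn = F.softmax(query @ self.memory_keys.t(), dim=-1)   # (batch, memory_size)
        read = attn @ self.memory_values                         # (batch, common_dim)
        # Query plus memory read-out forms the common representation ("Answer").
        return F.normalize(query + read, dim=-1)

# Usage: embed an image and a caption, then compare with cosine similarity.
model = CrossModalMemorySketch()
img = torch.randn(2, 4096)
txt = torch.randn(2, 300)
sim = model(img, "image") @ model(txt, "text").t()               # (2, 2) similarity matrix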
Session
Posters
Files
Paper (PDF)
DOI
10.5244/C.31.178
https://dx.doi.org/10.5244/C.31.178
Citation
Ge Song and Xiaoyang Tan. Cross-modal Retrieval via Memory Network. In T.K. Kim, S. Zafeiriou, G. Brostow and K. Mikolajczyk, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 178.1-178.12. BMVA Press, September 2017.
Bibtex
@inproceedings{BMVC2017_178,
title={Cross-modal Retrieval via Memory Network},
author={Ge Song and Xiaoyang Tan},
year={2017},
month={September},
pages={178.1-178.12},
articleno={178},
numpages={12},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Tae-Kyun Kim and Stefanos Zafeiriou and Gabriel Brostow and Krystian Mikolajczyk},
doi={10.5244/C.31.178},
isbn={1-901725-60-X},
url={https://dx.doi.org/10.5244/C.31.178}
}