A Convolutional Temporal Encoder for Video Caption Generation
Qingle Huang and Zicheng Liao
Abstract
We propose a convolutional temporal encoding network for video sequence embedding and caption generation. Mainstream video captioning work is based on recurrent encoders of various forms (e.g., LSTMs and hierarchical encoders). In this work, a multi-layer convolutional neural network encoder is proposed. At the core of this encoder is a gated linear unit (GLU) that performs a linear convolutional transformation of the input modulated by a nonlinear gate, a mechanism that has demonstrated strong performance in natural language modeling. Our model is built on top of this unit for video encoding and integrates several recent techniques, including batch normalization, skip connections and soft attention.
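The GLU building block the abstract describes can be sketched as follows. This is a minimal NumPy illustration of a 1-D convolutional GLU over a sequence of frame features, not the authors' implementation; the function name, tensor shapes, and kernel width are hypothetical choices for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glu_conv1d(X, W_a, W_b, b_a, b_b):
    """One GLU layer over a temporal sequence (illustrative sketch).

    X: (T, d_in) per-frame features.
    W_a, W_b: (k, d_in, d_out) filters for the linear and gate paths.
    Returns (T - k + 1, d_out): linear convolution gated elementwise
    by a sigmoid of a second convolution, i.e. A * sigmoid(B).
    """
    k = W_a.shape[0]
    T = X.shape[0]
    out = []
    for t in range(T - k + 1):
        window = X[t:t + k]                             # (k, d_in)
        a = np.einsum('ki,kio->o', window, W_a) + b_a   # linear path A
        b = np.einsum('ki,kio->o', window, W_b) + b_b   # gate path B
        out.append(a * sigmoid(b))                      # GLU output
    return np.stack(out)

# Toy example: 10 frames of 4-d features, kernel width 3, 5 output channels.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 4))
W_a = rng.standard_normal((3, 4, 5))
W_b = rng.standard_normal((3, 4, 5))
Y = glu_conv1d(X, W_a, W_b, np.zeros(5), np.zeros(5))
print(Y.shape)  # (8, 5): one output per valid temporal window
```

Stacking several such layers (with the batch normalization and skip connections mentioned above) yields a feed-forward temporal encoder in which, unlike an LSTM, all positions can be computed in parallel.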
Session
Posters
Files
Paper (PDF)
DOI
10.5244/C.31.126
https://dx.doi.org/10.5244/C.31.126
Citation
Qingle Huang and Zicheng Liao. A Convolutional Temporal Encoder for Video Caption Generation. In T.K. Kim, S. Zafeiriou, G. Brostow and K. Mikolajczyk, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 126.1-126.12. BMVA Press, September 2017.
Bibtex
@inproceedings{BMVC2017_126,
title={A Convolutional Temporal Encoder for Video Caption Generation},
author={Qingle Huang and Zicheng Liao},
year={2017},
month={September},
pages={126.1-126.12},
articleno={126},
numpages={12},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Tae-Kyun Kim, Stefanos Zafeiriou, Gabriel Brostow and Krystian Mikolajczyk},
doi={10.5244/C.31.126},
isbn={1-901725-60-X},
url={https://dx.doi.org/10.5244/C.31.126}
}