End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos
Shyamal Buch, Victor Escorcia, Bernard Ghanem and Juan Carlos Niebles
Abstract
In this work, we present a new intuitive, end-to-end approach for temporal action detection in untrimmed videos. We introduce our new architecture for Single-Stream Temporal Action Detection (SS-TAD), which effectively integrates joint action detection with its semantic sub-tasks in a single unifying end-to-end framework. We develop a method for training our deep recurrent architecture based on enforcing semantic constraints on intermediate modules that are gradually relaxed as learning progresses. We find that such a dynamic learning scheme enables SS-TAD to achieve higher overall detection performance, with fewer training epochs.
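The abstract describes a training scheme in which semantic constraints on intermediate modules are gradually relaxed as learning progresses. As a rough illustration only, and not the authors' released code, the sketch below shows one way such a relaxation could be written in PyTorch: a decaying weight on intermediate proposal and classification losses added to the final detection loss. All function and variable names here are hypothetical.

import torch

# Hypothetical sketch: combine a final detection loss with intermediate
# "semantic" losses whose weights are gradually relaxed over training.
# This is an illustrative assumption, not the paper's implementation.

def constraint_weight(epoch, total_epochs, w_start=1.0, w_end=0.0):
    """Linearly decay the weight placed on the intermediate semantic losses."""
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    return w_start + frac * (w_end - w_start)

def combined_loss(detection_loss, proposal_loss, classification_loss,
                  epoch, total_epochs):
    """Final loss plus the (gradually relaxed) intermediate constraint losses."""
    w = constraint_weight(epoch, total_epochs)
    return detection_loss + w * (proposal_loss + classification_loss)

if __name__ == "__main__":
    # Dummy scalar losses, just to show how the weighting changes per epoch.
    det = torch.tensor(0.8)
    prop = torch.tensor(0.5)
    cls = torch.tensor(0.6)
    for epoch in range(0, 10, 3):
        total = combined_loss(det, prop, cls, epoch, total_epochs=10)
        print(f"epoch {epoch}: total loss = {total.item():.3f}")

Under this reading, early epochs supervise the intermediate modules strongly, and later epochs let the end-to-end detection objective dominate.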
Session
Orals - Action Recognition
Files
Paper (PDF)
DOI
10.5244/C.31.93
https://dx.doi.org/10.5244/C.31.93
Citation
Shyamal Buch, Victor Escorcia, Bernard Ghanem and Juan Carlos Niebles. End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos. In T.K. Kim, S. Zafeiriou, G. Brostow and K. Mikolajczyk, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 93.1-93.12. BMVA Press, September 2017.
Bibtex
@inproceedings{BMVC2017_93,
title={End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos},
author={Shyamal Buch and Victor Escorcia and Bernard Ghanem and Juan Carlos Niebles},
year={2017},
month={September},
pages={93.1--93.12},
articleno={93},
numpages={12},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Tae-Kyun Kim and Stefanos Zafeiriou and Gabriel Brostow and Krystian Mikolajczyk},
doi={10.5244/C.31.93},
isbn={1-901725-60-X},
url={https://dx.doi.org/10.5244/C.31.93}
}