End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos

Shyamal Buch, Victor Escorcia, Bernard Ghanem and Juan Carlos Niebles

Abstract

In this work, we present a new intuitive, end-to-end approach for temporal action detection in untrimmed videos. We introduce our new architecture for Single-Stream Temporal Action Detection (SS-TAD), which effectively integrates joint action detection with its semantic sub-tasks in a single unifying end-to-end framework. We develop a method for training our deep recurrent architecture based on enforcing semantic constraints on intermediate modules that are gradually relaxed as learning progresses. We find that such a dynamic learning scheme enables SS-TAD to achieve higher overall detection performance, with fewer training epochs.
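To make the dynamic learning scheme mentioned in the abstract concrete, below is a minimal, hypothetical sketch (PyTorch) of one way such training could look: an auxiliary semantic constraint on an intermediate module is enforced through a weighted loss term whose coefficient is gradually relaxed over training epochs. This is not the authors' implementation; all module names, dimensions, and the linear decay schedule are illustrative assumptions only.

    # Hypothetical sketch (not the authors' code): a recurrent detector with an
    # intermediate head and a final detection head, trained with an auxiliary
    # loss whose weight is gradually relaxed ("annealed") over epochs.
    import torch
    import torch.nn as nn

    class ToyDetector(nn.Module):
        """Toy stand-in for a recurrent detection architecture with an
        intermediate sub-task head and a final detection head."""
        def __init__(self, feat_dim=128, hidden_dim=64, num_classes=20):
            super().__init__()
            self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
            self.intermediate_head = nn.Linear(hidden_dim, 1)         # e.g., per-step actionness
            self.detection_head = nn.Linear(hidden_dim, num_classes)  # e.g., per-step class scores

        def forward(self, x):
            h, _ = self.rnn(x)
            return self.intermediate_head(h), self.detection_head(h)

    def constraint_weight(epoch, num_epochs, w0=1.0):
        # Assumed linear decay: the constraint on the intermediate module is
        # enforced strongly at first and relaxed toward zero as training proceeds.
        return w0 * max(0.0, 1.0 - epoch / num_epochs)

    model = ToyDetector()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()

    num_epochs = 10
    for epoch in range(num_epochs):
        # Dummy batch: 4 videos, 16 time steps, 128-dim features per step.
        feats = torch.randn(4, 16, 128)
        actionness_gt = torch.randint(0, 2, (4, 16, 1)).float()
        class_gt = torch.randint(0, 20, (4, 16))

        inter_logits, det_logits = model(feats)
        w = constraint_weight(epoch, num_epochs)
        loss = ce(det_logits.reshape(-1, 20), class_gt.reshape(-1)) \
               + w * bce(inter_logits, actionness_gt)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

In this sketch the only moving part specific to the scheme is constraint_weight: early epochs emphasize the intermediate sub-task, later epochs let the final detection objective dominate.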

Session

Orals - Action Recognition

Files

Paper (PDF)

DOI

10.5244/C.31.93
https://dx.doi.org/10.5244/C.31.93

Citation

Shyamal Buch, Victor Escorcia, Bernard Ghanem and Juan Carlos Niebles. End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos. In T.K. Kim, S. Zafeiriou, G. Brostow and K. Mikolajczyk, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 93.1-93.12. BMVA Press, September 2017.

Bibtex

@inproceedings{BMVC2017_93,
    title={End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos},
    author={Shyamal Buch and Victor Escorcia and Bernard Ghanem and Juan Carlos Niebles},
    year={2017},
    month={September},
    pages={93.1--93.12},
    articleno={93},
    numpages={12},
    booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
    publisher={BMVA Press},
    editor={Tae-Kyun Kim and Stefanos Zafeiriou and Gabriel Brostow and Krystian Mikolajczyk},
    doi={10.5244/C.31.93},
    isbn={1-901725-60-X},
    url={https://dx.doi.org/10.5244/C.31.93}
}