Deep Sign: Hybrid CNN-HMM for Continuous Sign Language Recognition

Oscar Koller, Sepehr Zargaran, Hermann Ney and Richard Bowden

Abstract

This paper introduces the end-to-end embedding of a CNN into an HMM, while interpreting the outputs of the CNN in a Bayesian fashion. The hybrid CNN-HMM combines the strong discriminative abilities of CNNs with the sequence modelling capabilities of HMMs. Most current approaches in the field of gesture and sign language recognition disregard the necessity of dealing with sequence data both for training and evaluation. With our presented end-to-end embedding we are able to improve over the state-of-the-art on three challenging benchmark continuous sign language recognition tasks by between 15% and 38% relative and up to 13.3% absolute.
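The "Bayesian fashion" mentioned above refers to the standard hybrid trick of reinterpreting the CNN's softmax posteriors p(state | frame) as scaled likelihoods p(frame | state) by dividing out the state priors, so they can serve as HMM emission scores. A minimal sketch of that conversion (illustrative names and toy numbers, not the paper's actual implementation):

```python
import numpy as np

def posteriors_to_scaled_log_likelihoods(log_posteriors, log_priors, prior_scale=1.0):
    """Turn per-frame log posteriors (frames x states) into scaled
    log likelihoods usable as HMM emission scores:
        log p(x|s) ~ log p(s|x) - prior_scale * log p(s).
    prior_scale is a common tunable exponent on the prior."""
    return log_posteriors - prior_scale * log_priors

# Toy example: 2 frames, 3 HMM states.
posteriors = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.6, 0.3]])   # CNN softmax outputs per frame
priors = np.array([0.5, 0.3, 0.2])          # state priors, e.g. from alignments

scores = posteriors_to_scaled_log_likelihoods(np.log(posteriors), np.log(priors))
```

The resulting `scores` matrix can be plugged into Viterbi decoding in place of conventional likelihoods, since the constant p(x) cancels when comparing paths.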

Session

Face and Gesture

Files

Extended Abstract (PDF, 3M)
Paper (PDF, 3M)

DOI

10.5244/C.30.136
https://dx.doi.org/10.5244/C.30.136

Citation

Oscar Koller, Sepehr Zargaran, Hermann Ney and Richard Bowden. Deep Sign: Hybrid CNN-HMM for Continuous Sign Language Recognition. In Richard C. Wilson, Edwin R. Hancock and William A. P. Smith, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 136.1-136.12. BMVA Press, September 2016.

Bibtex

        @inproceedings{BMVC2016_136,
        	title={Deep Sign: Hybrid CNN-HMM for Continuous Sign Language Recognition},
        	author={Oscar Koller and Sepehr Zargaran and Hermann Ney and Richard Bowden},
        	year={2016},
        	month={September},
        	pages={136.1-136.12},
        	articleno={136},
        	numpages={12},
        	booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
        	publisher={BMVA Press},
        	editor={Richard C. Wilson and Edwin R. Hancock and William A. P. Smith},
        	doi={10.5244/C.30.136},
        	isbn={1-901725-59-6},
        	url={https://dx.doi.org/10.5244/C.30.136}
        }