Lip Reading in Profile

Joon Son Chung and Andrew Zisserman

Abstract

There has recently been a quantum leap in the performance of automated lip reading due to the application of neural network sequence models trained on a very large corpus of aligned text and face videos. However, this advance has only been demonstrated for frontal or near-frontal faces, and so the question remains: can lips be read in profile to the same standard? The objective of this paper is to answer that question. We make three contributions: first, we obtain a new large aligned training corpus that contains profile faces, and select these using a face pose regressor network; second, we propose a curriculum learning procedure that is able to extend SyncNet [10] (a network to synchronize face movements and speech) progressively from frontal to profile faces; third, we demonstrate lip reading in profile for unseen videos.
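
The abstract does not specify how the curriculum is scheduled, so the following is only a minimal sketch of one way a frontal-to-profile curriculum could be organised: clips are admitted to training in stages, with the allowed head yaw widening at each stage. The function name `curriculum_stages`, the yaw thresholds, and the example clip IDs and angles are all hypothetical; in practice the yaw values would come from a face pose regressor such as the one the abstract mentions.

```python
from typing import Dict, Iterator, List, Tuple


def curriculum_stages(
    yaw_by_clip: Dict[str, float],
    max_yaw_per_stage: Tuple[float, ...] = (15.0, 45.0, 90.0),  # assumed thresholds
) -> Iterator[Tuple[float, List[str]]]:
    """Yield (yaw_limit, clip_ids), widening the admitted yaw range each stage."""
    for limit in max_yaw_per_stage:
        # Keep every clip whose absolute yaw is within the current limit,
        # so the more frontal clips remain in the pool at later stages.
        selected = sorted(
            clip for clip, yaw in yaw_by_clip.items() if abs(yaw) <= limit
        )
        yield limit, selected


# Hypothetical yaw estimates in degrees (0 = frontal, +/-90 = full profile).
example = {"clip_a": 3.0, "clip_b": -30.0, "clip_c": 85.0, "clip_d": 12.0}
stages = list(curriculum_stages(example))

for limit, clips in stages:
    print(f"|yaw| <= {limit:g} degrees: {clips}")
```

Under these assumptions a model would be trained (or fine-tuned) on each stage's clip list in turn, so that it sees near-frontal faces first and full profiles last.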

Session

Posters

Files

Paper (PDF)
Supplementary (PDF)

DOI

10.5244/C.31.155
https://dx.doi.org/10.5244/C.31.155

Citation

Joon Son Chung and Andrew Zisserman. Lip Reading in Profile. In T.K. Kim, S. Zafeiriou, G. Brostow and K. Mikolajczyk, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 155.1-155.11. BMVA Press, September 2017.

Bibtex

            @inproceedings{BMVC2017_155,
                title={Lip Reading in Profile},
                author={Joon Son Chung and Andrew Zisserman},
                year={2017},
                month={September},
                pages={155.1-155.11},
                articleno={155},
                numpages={11},
                booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
                publisher={BMVA Press},
                editor={Tae-Kyun Kim and Stefanos Zafeiriou and Gabriel Brostow and Krystian Mikolajczyk},
                doi={10.5244/C.31.155},
                isbn={1-901725-60-X},
                url={https://dx.doi.org/10.5244/C.31.155}
            }