Lip Reading in Profile
Joon Son Son and Andrew Zisserman
Abstract
There has been a quantum leap in the performance of automated lip reading recently
due to the application of neural network sequence models trained on a very large corpus
of aligned text and face videos. However, this advance has only been demonstrated for
frontal or near-frontal faces, and so the question remains: can lips be read in profile to
the same standard?
The objective of this paper is to answer that question. We make three contributions:
first, we obtain a new large aligned training corpus that contains profile faces, and select
these using a face pose regressor network; second, we propose a curriculum learning
procedure that is able to extend SyncNet [10] (a network to synchronize face movements
and speech) progressively from frontal to profile faces; third, we demonstrate lip reading
in profile for unseen videos.
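The abstract does not give the pose thresholds or the stage schedule used for the curriculum, so what follows is only a minimal Python sketch of the general idea, under explicit assumptions. Each clip is assumed to carry a yaw angle in degrees predicted by a face pose regressor (0 for frontal, ±90 for full profile). The yaw bin edges in `POSE_BINS` are hypothetical. Stage k is assumed to train on all clips whose pose falls in bin k or below. All names (`Clip`, `POSE_BINS`, `pose_bin`, `curriculum_stages`, `run_curriculum`) are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical yaw bins in degrees, ordered frontal -> profile; NOT the paper's values.
POSE_BINS: List[Tuple[float, float]] = [(0.0, 30.0), (30.0, 60.0), (60.0, 90.0)]


@dataclass(frozen=True)
class Clip:
    clip_id: str
    # Assumed output of a face pose regressor: 0 = frontal, +/-90 = full profile.
    yaw_deg: float


def pose_bin(yaw_deg: float, bins: Sequence[Tuple[float, float]] = POSE_BINS) -> int:
    """Return the index of the yaw bin containing |yaw_deg| (left and right profiles pooled)."""
    a = abs(yaw_deg)
    last = len(bins) - 1
    for i, (lo, hi) in enumerate(bins):
        # Bins are half-open [lo, hi); the last bin also includes its upper edge.
        if lo <= a < hi or (i == last and a == hi):
            return i
    raise ValueError(f"yaw {yaw_deg} is outside the configured bins")


def curriculum_stages(clips: Sequence[Clip],
                      bins: Sequence[Tuple[float, float]] = POSE_BINS) -> List[List[Clip]]:
    """Stage k holds every clip whose pose bin is <= k, so the set grows from frontal to profile."""
    labels = [pose_bin(c.yaw_deg, bins) for c in clips]
    return [[c for c, b in zip(clips, labels) if b <= k] for k in range(len(bins))]


def run_curriculum(clips: Sequence[Clip],
                   train_stage: Callable[[int, List[Clip]], None],
                   bins: Sequence[Tuple[float, float]] = POSE_BINS) -> None:
    """Call a user-supplied training routine once per stage, in frontal-to-profile order.

    `train_stage` is a hypothetical callback; it is expected to keep updating the same
    model, so each stage starts from the weights learned on the easier poses.
    """
    for k, stage_clips in enumerate(curriculum_stages(clips, bins)):
        train_stage(k, stage_clips)


if __name__ == "__main__":
    demo = [Clip("a", 5.0), Clip("b", -45.0), Clip("c", 80.0), Clip("d", 20.0)]
    for k, stage in enumerate(curriculum_stages(demo)):
        print(k, [c.clip_id for c in stage])
```

Keeping the easier, more frontal clips in every later stage (rather than training on the newest bin alone) is one possible design choice, intended to avoid losing performance on frontal faces while the model adapts to profile views; the schedule actually used in the paper may differ.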
Session
Posters
DOI
10.5244/C.31.155
https://dx.doi.org/10.5244/C.31.155
Citation
Joon Son Son and Andrew Zisserman. Lip Reading in Profile. In T.K. Kim, S. Zafeiriou, G. Brostow and K. Mikolajczyk, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 155.1-155.11. BMVA Press, September 2017.
Bibtex
@inproceedings{BMVC2017_155,
title={Lip Reading in Profile},
author={Joon Son Son and Andrew Zisserman},
year={2017},
month={September},
pages={155.1--155.11},
articleno={155},
numpages={11},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Tae-Kyun Kim and Stefanos Zafeiriou and Gabriel Brostow and Krystian Mikolajczyk},
doi={10.5244/C.31.155},
isbn={1-901725-60-X},
url={https://dx.doi.org/10.5244/C.31.155}
}