END-TO-END MULTI-VIEW LIPREADING
Stavros Petridis, Yujiang Wang, Zuwei Li and Maja Pantic
Abstract
Non-frontal lip views contain useful information which can be used to enhance the
performance of frontal view lipreading. However, the vast majority of recent lipreading
works, including the deep learning approaches which significantly outperform traditional
approaches, have focused on frontal mouth images. As a consequence, research on joint
learning of visual features and speech classification from multiple views is limited. In
this work, we present an end-to-end multi-view lipreading system based on Bidirectional
Long Short-Term Memory (BLSTM) networks. To the best of our knowledge, this is the first
model that simultaneously learns to extract features directly from the pixels and to perform visual speech classification from multiple views, while also achieving state-of-the-art
performance. The model consists of multiple identical streams, one for each view, which
extract features directly from different poses of mouth images. The temporal dynamics
in each stream/view are modelled by a BLSTM and the fusion of multiple streams/views
takes place via another BLSTM. An absolute average improvement of 3% and 3.8% over
the frontal view performance is reported on the OuluVS2 database when the best two
(frontal and profile) and three views (frontal, profile, 45°) are combined, respectively.
The best three-view model results in a 10.5% absolute improvement over the current
multi-view state-of-the-art performance on OuluVS2, without using external databases
for training, achieving a maximum classification accuracy of 96.
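The architecture described above (one stream per view, a BLSTM per stream, and a further BLSTM for fusion) can be sketched roughly as follows. This is a minimal illustration in PyTorch, not the authors' implementation: the single linear encoder, the layer sizes, the use of separate weights per stream, the concatenation of stream outputs before fusion, classification from the last time step, and the names `ViewStream` and `MultiViewLipreader` are all assumptions made for the example.

```python
import torch
from torch import nn


class ViewStream(nn.Module):
    """One per-view stream: a frame encoder followed by a BLSTM."""

    def __init__(self, frame_pixels, feat_dim, hidden):
        super().__init__()
        # Assumed encoder: one linear layer over flattened mouth-region pixels.
        self.encoder = nn.Sequential(nn.Linear(frame_pixels, feat_dim), nn.ReLU())
        self.blstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, frames):
        # frames: (batch, time, frame_pixels)
        out, _ = self.blstm(self.encoder(frames))
        return out  # (batch, time, 2 * hidden)


class MultiViewLipreader(nn.Module):
    """Streams with identical architecture, fused by another BLSTM."""

    def __init__(self, num_views, frame_pixels, feat_dim=32, hidden=16, num_classes=10):
        super().__init__()
        # Same architecture for every view; weights are not shared here (assumption).
        self.streams = nn.ModuleList(
            [ViewStream(frame_pixels, feat_dim, hidden) for _ in range(num_views)]
        )
        # The fusion BLSTM sees the per-view outputs concatenated along the feature axis.
        self.fusion = nn.LSTM(
            num_views * 2 * hidden, hidden, batch_first=True, bidirectional=True
        )
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, views):
        # views: list of tensors, one per view, each (batch, time, frame_pixels)
        per_view = [stream(v) for stream, v in zip(self.streams, views)]
        fused, _ = self.fusion(torch.cat(per_view, dim=-1))
        # Assumed read-out: classify from the last time step of the fused sequence.
        return self.classifier(fused[:, -1])
```

Calling `MultiViewLipreader(num_views=3, frame_pixels=600)` on a list of three tensors of shape `(batch, time, 600)` returns one row of class logits per sequence. Because every component is differentiable, the encoder, the per-view BLSTMs and the fusion BLSTM can be trained jointly from pixels to labels, which is the end-to-end property the abstract refers to.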
Session
Posters
DOI
10.5244/C.31.161
https://dx.doi.org/10.5244/C.31.161
Citation
Stavros Petridis, Yujiang Wang, Zuwei Li and Maja Pantic. END-TO-END MULTI-VIEW LIPREADING. In T.K. Kim, S. Zafeiriou, G. Brostow and K. Mikolajczyk, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 161.1-161.12. BMVA Press, September 2017.
Bibtex
@inproceedings{BMVC2017_161,
title={END-TO-END MULTI-VIEW LIPREADING},
author={Stavros Petridis and Yujiang Wang and Zuwei Li and Maja Pantic},
year={2017},
month={September},
pages={161.1--161.12},
articleno={161},
numpages={12},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Tae-Kyun Kim and Stefanos Zafeiriou and Gabriel Brostow and Krystian Mikolajczyk},
doi={10.5244/C.31.161},
isbn={1-901725-60-X},
url={https://dx.doi.org/10.5244/C.31.161}
}