STAR-Net: A SpaTial Attention Residue Network for Scene Text Recognition

Wei Liu, Chaofeng Chen, Kwan-Yee K. Wong, Zhizhong Su and Junyu Han

Abstract

In this paper, we present a novel SpaTial Attention Residue Network (STAR-Net) for recognising scene text. STAR-Net is equipped with a spatial attention mechanism that employs a spatial transformer to remove the distortion of text in natural images. This allows the subsequent feature extractor to focus on the rectified text region without being sidetracked by the distortion. STAR-Net also exploits residue convolutional blocks to build a very deep feature extractor, which is essential to the successful extraction of discriminative text features for this fine-grained recognition task. Combining the spatial attention mechanism with the residue convolutional blocks, STAR-Net is the deepest end-to-end trainable neural network for scene text recognition. Experiments on five public benchmark datasets show that STAR-Net achieves performance comparable to state-of-the-art methods on scene text with little distortion, and outperforms these methods on scene text with considerable distortion.
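
The two ingredients named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract gives no layer details, so the affine parameterisation, the align-corners coordinate convention, and every name below (`affine_grid`, `bilinear_sample`, `residual_block`, `theta`) are assumptions made for illustration. In a real network the transform parameters would be predicted by a learned localisation network and the residual branch would be a stack of convolutions.

```python
import math


def affine_grid(theta, out_h, out_w):
    """Map every output pixel to source coordinates in normalised [-1, 1] space.

    theta is an assumed 2x3 affine matrix [[a, b, tx], [c, d, ty]].
    """
    grid = []
    for i in range(out_h):
        y = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            xs = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            ys = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append((xs, ys))
        grid.append(row)
    return grid


def bilinear_sample(image, grid):
    """Sample a 2-D image (list of rows) at the grid points, zero outside."""
    h, w = len(image), len(image[0])
    out = []
    for row in grid:
        out_row = []
        for xs, ys in row:
            # Convert normalised coordinates back to pixel coordinates.
            px = (xs + 1.0) * (w - 1) / 2.0
            py = (ys + 1.0) * (h - 1) / 2.0
            x0, y0 = math.floor(px), math.floor(py)
            wx, wy = px - x0, py - y0
            value = 0.0
            # Blend the four neighbouring pixels, skipping out-of-range ones.
            for yy, weight_y in ((y0, 1.0 - wy), (y0 + 1, wy)):
                for xx, weight_x in ((x0, 1.0 - wx), (x0 + 1, wx)):
                    if 0 <= yy < h and 0 <= xx < w:
                        value += weight_y * weight_x * image[yy][xx]
            out_row.append(value)
        out.append(out_row)
    return out


def residual_block(x, transform):
    """Return x + F(x): the identity shortcut that keeps deep stacks trainable."""
    return [xi + fi for xi, fi in zip(x, transform(x))]


image = [[1, 2, 3], [4, 5, 6]]
identity = [[1, 0, 0], [0, 1, 0]]
flip = [[-1, 0, 0], [0, 1, 0]]  # mirror the image horizontally
rectified = bilinear_sample(image, affine_grid(identity, 2, 3))
mirrored = bilinear_sample(image, affine_grid(flip, 2, 3))
```

Because both the sampling and the shortcut addition are differentiable, a rectification stage and a deep feature extractor of this kind can be trained jointly, which is the end-to-end property the abstract emphasises.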

Session

Posters 1

Files

Extended Abstract (PDF, 171K)

Paper (PDF, 1M)

DOI

10.5244/C.30.43
https://dx.doi.org/10.5244/C.30.43

Citation

Wei Liu, Chaofeng Chen, Kwan-Yee K. Wong, Zhizhong Su and Junyu Han. STAR-Net: A SpaTial Attention Residue Network for Scene Text Recognition. In Richard C. Wilson, Edwin R. Hancock and William A. P. Smith, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 43.1-43.13. BMVA Press, September 2016.

Bibtex

        @inproceedings{BMVC2016_43,
        	title={STAR-Net: A SpaTial Attention Residue Network for Scene Text Recognition},
        	author={Wei Liu and Chaofeng Chen and Kwan-Yee K. Wong and Zhizhong Su and Junyu Han},
        	year={2016},
        	month={September},
        	pages={43.1--43.13},
        	articleno={43},
        	numpages={13},
        	booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
        	publisher={BMVA Press},
        	editor={Richard C. Wilson and Edwin R. Hancock and William A. P. Smith},
        	doi={10.5244/C.30.43},
        	isbn={1-901725-59-6},
        	url={https://dx.doi.org/10.5244/C.30.43}
        }