Learning Neural Network Architectures using Backpropagation
Suraj Srinivas and Venkatesh Babu
Abstract
Deep neural networks with millions of parameters are at the heart of many state-of-the-art machine learning models today. However, recent works have shown that models with far fewer parameters can perform just as well. In this work, we introduce the problem of architecture learning, i.e., learning the architecture of a neural network along with its weights. We start with a large neural network and then learn which neurons to prune. To this end, we introduce a new trainable parameter called the Tri-State ReLU, which helps prune unnecessary neurons. We also propose a smooth regularizer that encourages the total number of neurons remaining after elimination to be small. The resulting objective is differentiable and simple to optimize. We experimentally validate our method on both small and large networks, and show that it can learn models with considerably fewer parameters without affecting prediction accuracy.
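The abstract leaves the mechanics implicit, so here is a minimal PyTorch sketch of the idea, not the paper's implementation: it assumes a single trainable gate per neuron (the class name TriStateReLU mirrors the paper's term, but the initialization, the clamping to [0, 1], and the exact regularizer form w(1-w) + w are assumptions made for illustration).

import torch
import torch.nn as nn

class TriStateReLU(nn.Module):
    # Hypothetical per-neuron gate w: w -> 0 prunes the neuron,
    # w -> 1 recovers a standard ReLU.
    def __init__(self, num_neurons):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_neurons))  # assumed init: all gates open

    def forward(self, x):
        # Clamping keeps each gate in [0, 1] during training (an assumption,
        # standing in for the paper's binarization scheme).
        return self.w.clamp(0.0, 1.0) * torch.relu(x)

def gate_penalty(gated_layers, lam=1e-3):
    # Assumed smooth regularizer: w*(1-w) pushes each gate towards {0, 1},
    # while the sum of gates penalizes the number of surviving neurons.
    w = torch.cat([m.w.clamp(0.0, 1.0) for m in gated_layers])
    return lam * (w * (1.0 - w) + w).sum()

In use, gate_penalty(...) would be added to the task loss during training; afterwards, neurons whose gates fall below a threshold can be removed outright, shrinking the architecture.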
Session
Posters 2
Files
Extended Abstract (PDF, 113K)
Paper (PDF, 215K)
Supplemental Materials (ZIP, 133K)
DOI
10.5244/C.30.104
https://dx.doi.org/10.5244/C.30.104
Citation
Suraj Srinivas and Venkatesh Babu. Learning Neural Network Architectures using Backpropagation. In Richard C. Wilson, Edwin R. Hancock and William A. P. Smith, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 104.1-104.11. BMVA Press, September 2016.
Bibtex
@inproceedings{BMVC2016_104,
title={Learning Neural Network Architectures using Backpropagation},
author={Suraj Srinivas and Venkatesh Babu},
year={2016},
month={September},
pages={104.1--104.11},
articleno={104},
numpages={11},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Richard C. Wilson and Edwin R. Hancock and William A. P. Smith},
doi={10.5244/C.30.104},
isbn={1-901725-59-6},
url={https://dx.doi.org/10.5244/C.30.104}
}