Learning Neural Network Architectures using Backpropagation
Suraj Srinivas and Venkatesh Babu
Abstract
Deep neural networks with millions of parameters are at the heart of many state-of-the-art machine learning models today. However, recent works have shown that models with far fewer parameters can perform just as well. In this work, we introduce the problem of architecture learning, i.e., learning the architecture of a neural network along with its weights. We start with a large neural network and then learn which neurons to prune. To this end, we introduce a new trainable parameter called the Tri-State ReLU, which helps prune unnecessary neurons. We also propose a smooth regularizer that encourages the total number of neurons remaining after elimination to be small. The resulting objective is differentiable and simple to optimize. We experimentally validate our method on both small and large networks, and show that it can learn models with considerably fewer parameters without affecting prediction accuracy.
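The abstract leaves the mechanics implicit, so here is a minimal PyTorch sketch of the idea, not the paper's implementation: it assumes a single trainable gate per neuron (the class name TriStateReLU mirrors the paper's term, but the initialization, the clamping to [0, 1], and the exact regularizer form w(1-w) + w are assumptions made for illustration).

import torch
import torch.nn as nn

class TriStateReLU(nn.Module):
    # Hypothetical per-neuron gate w: w -> 0 prunes the neuron,
    # w -> 1 recovers a standard ReLU.
    def __init__(self, num_neurons):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_neurons))  # assumed init: all gates open

    def forward(self, x):
        # Clamping keeps each gate in [0, 1] during training (an assumption,
        # standing in for the paper's binarization scheme).
        return self.w.clamp(0.0, 1.0) * torch.relu(x)

def gate_penalty(gated_layers, lam=1e-3):
    # Assumed smooth regularizer: w*(1-w) pushes each gate towards {0, 1},
    # while the sum of gates penalizes the number of surviving neurons.
    w = torch.cat([m.w.clamp(0.0, 1.0) for m in gated_layers])
    return lam * (w * (1.0 - w) + w).sum()

In use, gate_penalty(...) would be added to the task loss during training; afterwards, neurons whose gates fall below a threshold can be removed outright, shrinking the architecture.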
Session
Posters 2
Files
Extended Abstract (PDF, 113K)
Paper (PDF, 215K)
Supplemental Materials (ZIP, 133K)
DOI
10.5244/C.30.104
https://dx.doi.org/10.5244/C.30.104
Citation
Suraj Srinivas and Venkatesh Babu. Learning Neural Network Architectures using Backpropagation. In Richard C. Wilson, Edwin R. Hancock and William A. P. Smith, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 104.1-104.11. BMVA Press, September 2016.
Bibtex
@inproceedings{BMVC2016_104,
title={Learning Neural Network Architectures using Backpropagation},
author={Suraj Srinivas and Venkatesh Babu},
year={2016},
month={September},
pages={104.1--104.11},
articleno={104},
numpages={11},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Richard C. Wilson and Edwin R. Hancock and William A. P. Smith},
doi={10.5244/C.30.104},
isbn={1-901725-59-6},
url={https://dx.doi.org/10.5244/C.30.104}
}