Research Paper: DNN one feature at a time

A summary of an interesting DNN research paper we are reading…

“Building effective deep neural network architectures one feature at a time”
Martin Mundt, Tobias Weis, Kishore Konda, Visvanathan Ramesh
https://arxiv.org/pdf/1705.06778.pdf

This 2017 paper looks at adding one feature map at a time to a CNN.

One Feature at a Time summary:

  • Instead of starting with an arbitrary number of feature maps at every level of a CNN, they start with one, then add one feature map at a time as determined by a metric.
  • The end state of their model challenges the long-standing “rule of thumb” that one should monotonically increase feature maps at the higher levels (shown in red in the bar graph above). The final state of the feature-at-a-time approach (shown in green) has a very different profile.
  • The metric used is how much a feature has changed with respect to its initial state, i.e. features whose structure has changed substantially from their initial state are more likely to play a vital role (see the sketch after this list).
  • Growing one feature at a time comes at a lower computational cost than training an architecture that is too large.
  • More effective CNN architectures should grow the number of features as one moves toward the middle layers (reaching a higher-dimensional state) and then reduce the number of features to push toward the better aggregation needed at the final layers.
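
To make the growth rule concrete, here is a minimal sketch of how a metric-driven loop might decide to add another feature map, assuming a simple normalized-cross-correlation-style change score. The helper names (change_from_init, should_grow_layer) and the threshold are hypothetical illustrations, not the paper's exact formulation.

```python
import numpy as np

def change_from_init(w_init, w_now, eps=1e-8):
    """Per-feature-map change score: 1 - |normalized cross-correlation|
    between a filter's current weights and its weights at initialization.
    Near 0 means the filter barely moved from its random init; near 1
    means it has learned a very different structure."""
    a = (w_init - w_init.mean()) / (w_init.std() + eps)
    b = (w_now - w_now.mean()) / (w_now.std() + eps)
    return 1.0 - abs(float(np.mean(a * b)))

def should_grow_layer(init_filters, current_filters, threshold=0.1):
    """Keep adding feature maps to a layer while the most recently added
    filter is still changing substantially from its initial state."""
    scores = [change_from_init(wi, wc)
              for wi, wc in zip(init_filters, current_filters)]
    return scores[-1] > threshold

# Toy example: two 3x3 filters, both still drifting away from their init.
rng = np.random.default_rng(0)
init = [rng.standard_normal((3, 3)) for _ in range(2)]
now = [w + rng.standard_normal((3, 3)) for w in init]
print(should_grow_layer(init, now))  # True: the newest filter is still learning, so grow the layer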

The arbitrary numbers of feature maps have always bothered me. I find this paper quite inspiring.

As a crude quick experiment, we tried having more features in the middle of some of our TensorFlow CNN models, and it improved our test accuracy (and reduced our cross-entropy loss). Now we need to try their full feature-at-a-time approach!  …Is any of this code posted on GitHub?
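
For anyone who wants to try the same crude experiment, here is a minimal Keras sketch of a CNN whose filter counts peak in the middle layers instead of growing monotonically. The function name, filter counts, and input shape are assumptions for illustration only, not the architecture we actually trained and not the network grown in the paper.

```python
import tensorflow as tf

# Hypothetical filter counts that peak in the middle instead of
# growing monotonically (e.g. the usual 32 -> 64 -> 128 -> 256 pattern).
MIDDLE_HEAVY_FILTERS = (32, 96, 128, 64)

def build_middle_heavy_cnn(input_shape=(32, 32, 3), num_classes=10,
                           filters_per_layer=MIDDLE_HEAVY_FILTERS):
    """Small CNN whose feature-map counts bulge in the middle layers."""
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    for f in filters_per_layer:
        x = tf.keras.layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = tf.keras.layers.MaxPooling2D(2)(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_middle_heavy_cnn()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()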

The only thing that bothers me is that there must be a slightly more universal metric than the amount of change from the initial state, because what if the initialization happened to land on a very good state? I completely buy their argument that this is unlikely for random initialization, but what if other techniques are used to initialize? I'm going to spend some time thinking about alternatives.

All the best to the authors!

This is definitely an interesting area of research that our team is thinking more about…
