A summary of an interesting DNN research paper we are reading…
“Multi-Scale Context Aggregation by Dilated Convolutions”
Fisher Yu, Vladlen Koltun
https://arxiv.org/pdf/1511.07122.pdf
This 2016 paper introduced dilated convolutions, and we expect it to get wider attention.
Dilated Convolutions summary:
- basically Conv filters with gaps between the filter elements
- adds a level of scale invariance
- broader view of the input, can capture multi-scale contextual information
- no need to lose resolution or to analyze rescaled images
Typically, when we talk about strides in convolutions for DNNs, we mean sliding the filter over the input and applying it to adjacent input elements: between applications, one skips some number of input elements and then reapplies the filter. In a dilated filter, by contrast, consecutive filter elements are applied to non-adjacent input elements, as the sketch below illustrates.
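To make the index arithmetic concrete, here is a minimal 1D sketch of our own (not code from the paper): a 3-tap filter with dilation d reads input positions i, i+d, i+2d, while a strided filter still reads adjacent positions but jumps between applications.

```python
import numpy as np

x = np.arange(12)          # toy 1D input: [0, 1, 2, ..., 11]
w = np.array([1, 1, 1])    # 3-tap filter (just sums the inputs it touches)

def conv1d(x, w, stride=1, dilation=1):
    """Plain 1D convolution, written out to show which inputs each tap reads."""
    span = (len(w) - 1) * dilation + 1              # footprint of the filter
    out = []
    for i in range(0, len(x) - span + 1, stride):
        taps = [i + j * dilation for j in range(len(w))]  # input indices read
        out.append(sum(w[j] * x[t] for j, t in enumerate(taps)))
    return np.array(out)

print(conv1d(x, w, stride=2))    # strided: adjacent taps, jumps between windows
print(conv1d(x, w, dilation=2))  # dilated: taps at i, i+2, i+4, dense windows
```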
In dilated convolutions the filter is applied with defined gaps. For example, if a 3×3 filter is applied to a 2D image, a dilation of k=1 is just a normal convolution; with k=2 one skips one pixel between sampled input elements, so the filter values are spread out over the input; k=4 means skipping 3 pixels.
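In a modern framework this is just a parameter. A minimal PyTorch sketch (our own illustration, not from the paper) applying the same 3×3 filter at dilations k=1, 2, and 4:

```python
import torch
import torch.nn as nn

# One input channel, one output channel, 3x3 filter, no bias.
# The same 9 weights are used at each dilation; only the gaps change.
x = torch.randn(1, 1, 32, 32)  # (batch, channels, height, width)

for dilation in (1, 2, 4):
    conv = nn.Conv2d(1, 1, kernel_size=3, dilation=dilation, bias=False)
    y = conv(x)
    # Footprint of the 3x3 filter on the input: (kernel_size - 1) * dilation + 1
    footprint = (3 - 1) * dilation + 1
    print(f"dilation={dilation}: footprint {footprint}x{footprint}, "
          f"output shape {tuple(y.shape)}")
```

Note that the parameter count is identical at every dilation; only the footprint of the filter on the input grows.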
Figure 1 below (from the paper) illustrates dilated convolution on 2D data.
The caption has confused some people on the web, so we offer the following additional explanation. The red dots show which 3×3 filter elements are applied to which input elements: (a) shows a normal 3×3 filter, with the 9 filter elements applied to consecutive elements of the input; (b) shows the same 3×3 filter, but note that the same 9 elements are now applied to input points with a gap of 1 between them (dilation k=2); (c) shows the same 3×3 filter applied to input points with a gap of 3 (dilation k=4). The blue/green shading shows the receptive field seen by a single element of the next layer.
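The payoff the figure illustrates is that stacking 3×3 filters with dilations 1, 2, 4, … grows the receptive field exponentially with depth while the parameter count per layer stays fixed. The arithmetic, using the standard receptive-field recurrence:

```python
# Receptive field after stacking 3x3 convolutions with doubling dilations,
# as in Figure 1. Recurrence: rf += (kernel_size - 1) * dilation.
rf = 1
for layer, dilation in enumerate((1, 2, 4), start=1):
    rf += (3 - 1) * dilation
    print(f"layer {layer} (dilation {dilation}): receptive field {rf}x{rf}")
# Prints 3x3, 7x7, 15x15 -- the shaded regions in panels (a), (b), (c).
```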
It gets us thinking about new ways to add scale and rotational invariance to DNNs.