If you visualize the filters of the first convolutional layer of a neural network, you can see that this layer looks for oriented edges.
Why does visualizing the filter tell you what the filter is looking for?
This intuition comes from template matching and the inner product.
Imagine you have some template vector, and you compute a scalar output by taking the inner product between that template vector and some arbitrary piece of data. Then, under a norm constraint on the input, the input that maximizes that activation is exactly the one that matches the template, up to scaling.
So, in that sense, whenever you take an inner product, the input that excites it maximally is a copy of the thing you are taking the inner product with. That's why we can visualize these weights, and why they show us what the first layer is looking for.
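The claim above follows from the Cauchy-Schwarz inequality: among all unit-norm inputs, none scores a higher inner product with a template than the normalized template itself. Here is a minimal sketch of that idea, using a random vector as a stand-in for a learned filter (the vectors and sizes are illustrative, not from any real network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a learned first-layer filter, flattened to a vector.
template = rng.normal(size=16)
template_unit = template / np.linalg.norm(template)

# A batch of random unit-norm candidate inputs.
candidates = rng.normal(size=(1000, 16))
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)

# Inner products act as the "activations" for each candidate.
scores = candidates @ template

# By Cauchy-Schwarz, the normalized template achieves the maximum
# possible activation (equal to ||template||) among unit-norm inputs.
template_score = template_unit @ template
assert template_score >= scores.max()
```

Because the best-matching input is just a scaled copy of the filter, rendering the filter weights as an image directly shows the pattern that drives the activation highest.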