1. Feature Visualization

One way to make progress in rendering deep artificial neural networks intelligible is to figure out what is going on inside their hidden layers. ‘Hidden’ in this context just means located between the input and output layers¹. Drawing inspiration from neurophysiological techniques applied to the brains of biological organisms, one approach to this problem involves narrowing one’s focus to single neurons, channels, or layers². By paying close attention to how the activity of these individual model components changes in response to varying the inputs to a neural network, in silico neurophysiologists hope to draw useful inferences about the functional roles that these components play.

The neurons in modern artificial neural networks are not typically Boolean, but instead produce graded responses to inputs as a result of their activation functions [1]. This provides an avenue of attack for researchers hoping to uncover the roles they play. Leveraging optimization techniques, the method of feature visualization involves iteratively tweaking an image (typically initialised as random noise) until it maximises a neuron’s response. The aim is to discover what, if anything, that neuron is specialised for.

The way this works is that, after the response of the neuron to an initial image is observed, the values of the image’s pixels are nudged in whichever direction increases the activation of the neuron. At each step, a pixel might get a little brighter, or a little greener; whatever makes the target neuron more active than it was at the previous step. Repeating this process many times for each pixel can ultimately lead to the generation of meaningful, sometimes rather psychedelic reflections of what a neuron has learned to respond to.
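The procedure just described can be sketched in a few lines. The following is only a minimal illustration under strong assumptions: the ‘network’ is a single toy neuron (a sigmoid applied to a weighted sum of pixels), its weights are hand-picked rather than learned, and the gradient is written out by hand. All names (`neuron_activation`, `visualize_feature`, the edge-detector weights) are hypothetical. Real feature visualization obtains the gradient by backpropagating through a trained deep network.

```python
import numpy as np

def neuron_activation(image, weights, bias=0.0):
    """Toy 'neuron': a sigmoid applied to a weighted sum of pixel values."""
    z = np.sum(weights * image) + bias
    return 1.0 / (1.0 + np.exp(-z))

def activation_gradient(image, weights, bias=0.0):
    """Gradient of the activation with respect to every pixel."""
    a = neuron_activation(image, weights, bias)
    return a * (1.0 - a) * weights

def visualize_feature(weights, steps=500, step_size=1.0, seed=0):
    """Gradient ascent on the pixels, starting from near-grey random noise."""
    rng = np.random.default_rng(seed)
    image = rng.uniform(0.4, 0.6, size=weights.shape)
    history = []
    for _ in range(steps):
        history.append(neuron_activation(image, weights))
        # Nudge each pixel in the direction that increases the activation.
        image = image + step_size * activation_gradient(image, weights)
        # Keep pixel values in the valid range [0, 1].
        image = np.clip(image, 0.0, 1.0)
    history.append(neuron_activation(image, weights))
    return image, history

# Assumed weights for illustration: negative on the left half of an 8x8
# image, positive on the right half (a crude vertical-edge detector).
weights = np.zeros((8, 8))
weights[:, :4] = -0.1
weights[:, 4:] = 0.1

image, history = visualize_feature(weights)
print("activation before:", history[0])
print("activation after: ", history[-1])
```

Under these assumptions the optimised image ends up dark on the left and bright on the right, mirroring the sign pattern of the weights. That is the sense in which the image ‘reflects’ what the neuron responds to.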

The images that are produced through feature visualization can be striking (see [2] for many examples). Lower layers and their neurons, bearing resemblance to the biological visual system of mammals, seem to be tuned to detect edges, lines, or textures³. As we move further up the processing hierarchy those simple features are composed first into shapes, then simple objects, and finally exemplars of the classes the network is capable of detecting in its output layer (such as ‘balls’, ‘buildings’, or ‘Batman’). These visualizations do not typically have the photographic sharpness of natural images, but instead have a dream-like, hallucinatory quality⁴.

Visualised features provide compelling evidence for an intuitive theoretical perspective on why deep hierarchical convolutional neural networks perform well at image classification. The idea is that the multi-layer architecture of deep neural networks felicitously matches the hierarchical, compositional structure of natural images⁵. This suggests an intelligible answer to questions such as ‘how does a deep neural network recognize Batman in an image?’ That answer being: ‘Through recursive composition of initially simple, human-interpretable features!’

2. Two Kinds of Iterative Tweaking

Above we used the phrase ‘iterative tweaking’ to describe the way feature visualization works. However, ‘iterative tweaking’ actually applies at two distinct levels of the image generation process⁶. Straightforwardly, the gradient-based methods used to optimize images rely on iteratively tweaking the values of pixels until some objective is reached. This is how a randomly initialised image can, after several thousand steps, come to reflect the features to which a given neuron is sensitive, be they textures, shapes, or dogs. However, there is also a meta-level of iterative tweaking going on in the methodology of feature visualization.

Just as the images which visualize features are iteratively tweaked with respect to some objective function, so the objective functions which guide that process, along with other relevant parameters, are iteratively tweaked by the researchers implementing the algorithms. Feature visualization research does not have a culture of pre-registration, in which researchers would specify in advance which procedures will be applied to produce visualizations. It is, instead, an exploratory exercise in which researchers draw on a host of theoretical and intuitive ideas to edge closer to a goal, with human evaluation of the visualizations providing the feedback signal. But what is the objective the researchers are trying to optimize? And how should this alter the way we interpret the images published in feature visualization research?

Feature visualization procedures, such as those in the influential work of Chris Olah and colleagues at Distill [3], are implicitly optimized for human intelligibility⁷. That is, tweaks made to the image-producing procedure are kept or built upon further only if the visualizations produced are more intelligible to researchers than those produced by the pre-tweak method. The way iterative tweaking has been applied to optimize visualizations for human intelligibility is clearly laid out in the publications of Olah and his collaborators. For instance:

“If you want to get useful visualizations, you need to impose a more natural structure using some kind of prior, regularizer, or constraint.” [3]

Much of the iterative tweaking performed by feature visualization researchers aims to rid the images of high-frequency patterns⁸. These are regions of images in which a pattern, such as stripes, is present at a small spatial scale, often imperceptible to humans. For some reason or another, the neurons of trained neural networks are often very fond of these high-frequency patterns. This means optimising to maximise a neuron’s activation in a straightforward way often leads to images consisting wholly of such patterns. This gives us a clear-cut example of how optimizing for intelligibility involves biasing feature visualizations away from what ‘naturally’ maximises unit activations. While it may turn out to be true that such high-frequency patterns are not present in natural images, and therefore cannot be employed by classifiers in real-world settings, this has not been conclusively proven and should not be assumed [7].
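To make the idea of a regularizer concrete, here is a hedged sketch of one common way to discourage high-frequency patterns: subtracting a total-variation penalty (the summed absolute difference between neighbouring pixels) from the activation being maximised. This is offered as an illustration of the general idea, not as the specific procedure used in any particular publication. The function names and the `penalty_weight` value are assumptions.

```python
import numpy as np

def total_variation(image):
    """Sum of absolute differences between neighbouring pixels."""
    vertical = np.abs(np.diff(image, axis=0)).sum()
    horizontal = np.abs(np.diff(image, axis=1)).sum()
    return vertical + horizontal

def regularized_objective(activation, image, penalty_weight=0.01):
    """Hypothetical objective: activation minus a high-frequency penalty."""
    return activation - penalty_weight * total_variation(image)

# One-pixel-wide vertical stripes (high frequency) versus a smooth
# left-to-right ramp (low frequency), both 8x8 with values in [0, 1].
stripes = np.tile(np.array([0.0, 1.0]), (8, 4))
ramp = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))

# Stripes: 7 unit jumps per row across 8 rows. Ramp: each row rises by 1 in total.
print("TV of stripes:", total_variation(stripes))
print("TV of ramp:   ", total_variation(ramp))

# Suppose (purely for illustration) both images excite the neuron equally.
same_activation = 0.9
print("objective, stripes:", regularized_objective(same_activation, stripes))
print("objective, ramp:   ", regularized_objective(same_activation, ramp))
```

With the penalty in place, the smoother image scores higher even though the neuron responds identically to both. The choice of penalty, and of its weight, is exactly the kind of parameter that researchers adjust by hand in pursuit of more intelligible images.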

In the following sections we will discuss what this means for the interpretation of feature visualization research. For now, just notice that the word ‘useful’ in the passage quoted above is employed in an instrumental sense. A visualization is only useful (or useless) relative to some goal. In this case, that goal is human intelligibility.

3. Iterative Tweaking, Confirmation, and Exploration

All scientific research involves a degree of iterative tweaking. Models are constructed, their fit to data is assessed, and modifications are made on the basis of those assessments. Methods for measurement are developed, their agreement with pre-existing methods is tested, and improvements are made where possible. Why is the instantiation of iterative tweaking in feature visualization worthy of any special attention?

In section 2 I described the iterative tweaking of feature visualization methods as exploratory. That description calls to mind a distinction that is central to good practice in psychology (and other quantitative experimental sciences): the distinction between confirmatory and exploratory analyses. Pre-specifying the analyses to be performed on collected data is absolutely essential for testing hypotheses (confirmatory analysis). Otherwise, the degrees of freedom available to researchers in how they perform analyses make multiple comparisons problems very difficult to avoid [4]. However, running additional analyses on that data is perfectly acceptable, so long as the exploratory nature of that analysis is made clear. The reason it is necessary to explicate which analyses are exploratory is that it affects how results should be interpreted; exploratory analyses do not constitute rigorous statistical tests of hypotheses and should not be interpreted as doing so. This distinction has become particularly relevant in the last decade as the replication crisis has highlighted the questionable methodological practices plaguing much published research [5]. In the terminology of this article, these questionable practices are something like iterative tweaking with respect to the goal of finding significant, and therefore publishable, effects⁹.
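The arithmetic behind the multiple comparisons problem can be illustrated with a short calculation. The sketch below makes a simplifying assumption: the analyses are statistically independent and each is tested at the same threshold. Real analytic choices are rarely independent, and the forking-paths concern discussed in [4] is subtler than this, so the numbers are indicative only. The function name is hypothetical.

```python
def familywise_error_rate(num_analyses, alpha=0.05):
    """Chance of at least one false positive across independent tests.

    Assumes every null hypothesis is true and the tests are independent.
    """
    return 1.0 - (1.0 - alpha) ** num_analyses

for k in (1, 5, 20):
    rate = familywise_error_rate(k)
    print(f"{k:2d} analyses -> P(at least one false positive) = {rate:.3f}")
```

Under these assumptions, a researcher who tries twenty variations of an analysis is more likely than not to find at least one ‘significant’ result by chance alone.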

Returning to the domain of feature visualization¹⁰, what is the relevance of the distinction between confirmation and exploration? First, let’s make the analogy clear. The ‘data’ of scientific experiments correspond to the properties of the artificial neural network being investigated in feature visualization, the statistical analyses correspond to the procedure of producing visualizations, the results of statistical analyses correspond to the images produced, and the statistical significance threshold corresponds to the subjective sense of intelligibility which researchers optimize their approaches to maximise.

With the analogies between the circumstances clear, we can move on to the implications. If we want to test hypotheses, such as, ‘do the units of artificial neural networks learn human-intelligible features?’, or, ‘are the hidden layers of an artificial neural network interpretable?’, then we need to specify the analyses we will perform ahead of time. In other words, the iterative tweaking methodology used in feature visualization is incompatible with, and should not be taken to implement, the testing of these hypotheses.

4. What Should We Learn from Visualizations Produced by Iterative Tweaking?

Feature visualizations are the outcome of exploratory research optimized for human intelligibility, not the outcome of pre-registered confirmatory analysis. This means we are not justified in treating these images as verifying our intuitive hypotheses about what particular units in a neural network are doing. Further, we must be careful not to infer the intelligibility of neural networks directly from the intelligibility of feature visualizations produced in this manner.

Despite the foregoing considerations, there are many important lessons to learn from feature visualization research. The baby should not be thrown out with the bathwater! Exploratory research is a necessary first step in investigating the complex internal structure of trained neural networks, particularly when clear hypotheses cannot be easily formulated. Further, with regard to the goal of making neural networks intelligible, this avenue of research provides a number of insights which may be implemented directly at the design stage, essentially forcing neural networks only to pick up on human-interpretable patterns.

The way individual units contribute to the function learned by a deep neural network is complicated and multifaceted; it may not be reasonable to expect to capture it in a single image. What feature visualization optimized for human intelligibility does is shine a light on the most intelligible facet of that contribution. So long as this is kept in mind when interpreting research of this kind, it can still have a positive impact on how we think about the internal structure and organisation of deep neural networks.



1 The qualifier ‘deep’ is generally used for any neural network with at least one hidden layer.

2 In what follows, I will speak mainly of individual neurons, but the points are general.

3 Note that the hierarchical organisation of the mammalian visual system consists of a series of connected neuroanatomical regions. While the mammalian cortex does have a layered structure, those layers are not what the layers of an artificial neural network are typically designed to approximate.

4 Indeed, the original method to produce images of this nature was called DeepDream.

5 The convolutional structure of these networks is also a critical part of this fit, allowing lower layers to detect things in local regions of an image, while higher ones utilize this information to detect what an image depicts as a whole.

6 ‘Iterative tweaking’ is the term used in [3].

7 I use the word ‘implicitly’ here to contrast this form of optimization with the formal methods of machine learning, not to suggest that the researchers in question are deceptive about the nature of their practices; they aren’t.

8 These high-frequency patterns are often described as ‘noise’, but I use the term ‘pattern’ to avoid the implication that they do nothing more than obscure the ‘real’ function of the units under consideration.

9 For example, subtly altering the exclusion criteria for participants in a study may allow researchers to tweak their way below a desired statistical threshold, such as a p-value lower than 0.05.

10 See [6] for discussion of the problem of treating exploratory analysis as confirmatory in the domain of visualization-based analytics.


[1] J. Lederer, ‘Activation Functions in Artificial Neural Networks: A Systematic Overview’, ArXiv, 2021.

[2] C. Olah, A. Mordvintsev, and L. Schubert, ‘Feature Visualization’, Distill, vol. 2, no. 11, p. e7, Nov. 2017, doi: 10.23915/distill.00007.

[3] C. Olah, A. Mordvintsev, and L. Schubert, ‘Feature Visualization’, Distill, vol. 2, no. 11, p. e7, Nov. 2017, doi: 10.23915/distill.00007.

[4] A. Gelman and E. Loken, ‘The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time’, 2019. https://www.semanticscholar.org/paper/The-garden-of-forking-paths-%3A-Why-multiple-can-be-a-Gelman-Loken/b63e25900013605c16f4ad74c636cfbd8e9a3e8e (accessed Jul. 13, 2021).

[5] J. P. A. Ioannidis, ‘Why Most Published Research Findings Are False’, PLOS Medicine, vol. 2, no. 8, p. e124, Aug. 2005, doi: 10.1371/journal.pmed.0020124.

[6] X. Pu and M. Kay, ‘The Garden of Forking Paths in Visualization: A Design Space for Reliable Exploratory Visual Analytics: Position Paper’, in 2018 IEEE Evaluation and Beyond – Methodological Approaches for Visualization (BELIV), Oct. 2018, pp. 37–45. doi: 10.1109/BELIV.2018.8634103.

[7] A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry, ‘Adversarial Examples Are Not Bugs, They Are Features’, arXiv:1905.02175 [cs, stat], Aug. 2019, Accessed: Jul. 13, 2021. [Online]. Available: http://arxiv.org/abs/1905.02175