With the increasing demand for explaining black-box models, many interpretability methods have been developed. One of them is the Concept Activation Vector (CAV), a concept-based approach that interprets the internal state of a neural network in terms of concepts easily understandable by humans. CAVs are used as part of Testing with CAVs (TCAV), which gives a quantitative explanation of how important a user-defined concept was for a prediction of a trained model, even if the concept was not part of the training. A concept can be any user-defined abstraction, such as shapes, gender, colour, or race.
There are four main goals pursued by the authors:
1) Accessibility: Requires little to no ML expertise from the user.
2) Customization: Adapts to any concept (e.g., gender) and is not limited to concepts considered during training.
3) Plug-in readiness: Works without any retraining or modification of the ML model.
4) Global quantification: Interprets entire classes or sets of examples with a single quantitative measure, rather than just explaining individual inputs.
1. Concept Activation Vector (CAV)
This methodology uses directional derivatives to quantify the degree to which a user-defined concept is important for a classification result. A CAV is simply a numerical representation of a concept in the activation space of a neural network layer. To calculate the CAV of a concept, two datasets have to be prepared: a set of examples representing the concept to be tested, and a set of random images. The activations produced by the concept images at a chosen layer of the model are then separated from the activations produced by the random set using a binary classifier. The CAV is defined as the coefficient vector of this binary classifier, i.e. the normal to the decision boundary separating random images from concept images.
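As a rough illustration, the following sketch trains a linear classifier on pre-extracted layer activations and takes its coefficient vector as the CAV. The variable names (`concept_acts`, `random_acts`) and the choice of scikit-learn's logistic regression are illustrative assumptions; the paper only requires some linear classifier on the activations.

```python
# Minimal sketch of CAV computation. Assumes activations at the chosen
# layer have already been extracted and flattened into 2-D arrays; the
# input names and the use of logistic regression are assumptions for
# illustration, not prescribed by the TCAV paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    """Return a unit-length CAV separating concept from random activations."""
    X = np.vstack([concept_acts, random_acts])           # (n_samples, n_units)
    y = np.concatenate([np.ones(len(concept_acts)),      # 1 = concept example
                        np.zeros(len(random_acts))])     # 0 = random example
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_.ravel()                              # normal of the decision boundary
    return cav / np.linalg.norm(cav)
```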
2. Testing with CAVs (TCAV)
This second methodology computes a quantitative explanation of each tested concept for the neural network. TCAV uses CAVs to measure sensitivity to a concept across entire classes of inputs: whereas conceptual sensitivity is defined for a single data point (as the directional derivative of the class logit along the CAV), TCAV aggregates it into a global explanation for an entire class. The TCAV score is the ratio of inputs with positive conceptual sensitivity to the total number of inputs of a class. For example, a TCAV score of 0.7 indicates that 70% of the predictions for a class are positively influenced by the concept. What remains unclear at this point is whether a CAV yields high sensitivity merely by chance, since it is trained on a user-selected concept dataset and a random dataset; with poorly chosen training data, the generated explanations can be misleading. The methodology therefore trains many CAVs instead of a single one, keeping the concept dataset fixed while varying the random dataset. For a meaningful concept, the resulting distribution of TCAV scores has to be statistically different from the distribution obtained with random concepts.
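The snippet below sketches the scoring and testing steps under stated assumptions: `layer_grads` is assumed to already hold, for each input of the class, the gradient of the class logit with respect to the layer activations (obtainable e.g. via autograd; computing it is outside this sketch), and the helper names are hypothetical. The TCAV score is the fraction of positive directional derivatives, and the significance check compares scores from concept CAVs with scores from CAVs trained on random data only.

```python
# Sketch of the TCAV score and its significance test. `layer_grads` has
# shape (n_inputs, n_units): per input, the gradient of the class logit
# with respect to the activations at the chosen layer. The function names
# are illustrative, not from a library.
import numpy as np
from scipy.stats import ttest_ind

def tcav_score(layer_grads, cav):
    """Fraction of inputs whose directional derivative along the CAV is
    positive, i.e. whose prediction is positively concept-sensitive."""
    sensitivities = layer_grads @ cav        # dot product = directional derivative
    return float(np.mean(sensitivities > 0))

def is_significant(concept_scores, random_scores, alpha=0.05):
    """Two-sided t-test: TCAV scores of concept CAVs (trained against
    different random sets) vs. scores of random-vs-random CAVs."""
    _, p_value = ttest_ind(concept_scores, random_scores)
    return p_value < alpha
```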
3. Advantages
First of all, TCAV enables a human-friendly way to understand the internal state of ML models. Furthermore, it does not require users to have machine learning expertise, since they only have to collect example data for the concepts of interest. Compared to other XAI methods, this approach generates global rather than only local explanations. A global explanation can deliver insight into whether a model behaves properly overall, which local explanations usually cannot; this can be used to identify ill-learned concepts and to improve the model. An additional unique characteristic is its adaptability to users' interests: users can investigate any concept as long as there are data points defining it.
4. Disadvantages
Although users do not have to be machine learning experts, applying the method is not trivial, and it is doubtful whether laypeople without any machine learning skills would be able to use it. Users also have to specify the concepts of interest, which is only sensible if they know which concepts are relevant for a given classification task. It is also difficult to apply TCAV to concepts that are too abstract or too general, because it is then hard to assemble a meaningful training dataset.