LIME (an acronym for Local Interpretable Model-Agnostic Explanations) is one of the most cited techniques in the Explainable Artificial Intelligence (XAI) debate (according to Google Scholar, it has been cited around 5.5k times). Indeed, the whole (renewed) XAI debate seems to have taken off roughly around the time the paper describing LIME (“‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier”, Ribeiro et al. 2016) was first published. In their paper, the inventors of LIME highlight the importance of being able to trust black-box models and the need for explanations in order to establish such trust. To this day, trust is one of the most frequently mentioned desiderata that XAI is supposed to deliver. Since LIME has had such a huge impact on the XAI debate, it is worth understanding the rough workings of this approach. In this spirit, this article aims to shed some light on LIME and illuminate its functioning for a broad public.
Roughly speaking, the idea of LIME is to use Machine Learning (ML) to explain ML (or opaque algorithms in general). In the debates surrounding XAI, a prominent distinction is drawn between ante-hoc and post-hoc explainability. Roughly, ante-hoc explainable systems are simple enough for humans to understand their inner workings directly, without additional methods to extract information. Post-hoc explainable systems, on the other hand, need such additional methods. LIME is designed as a post-hoc approach that produces an ante-hoc explainable model which locally approximates the behavior of the original black-box system. In this regard, LIME aims to explain single predictions based on any kind of input. In the following, we will first restrict our discussion of LIME to image classification and subsequently illuminate how other kinds of data can be handled, as well as the problems associated with this.
The first step of LIME in image classification is to find interpretable superpixels (i.e., clusters of pixels that ‘have something in common’, e.g., they are connected and of similar color) in the originally classified picture. Such segmentation is a common task in computer graphics, and many algorithms are readily available for it. Parameters that play a role in these algorithms are, for instance, the maximal segment size and the edge-detector sensitivity (i.e., how fast or abrupt a color change must be to count as an edge). In the next step, the original image is copied many times (roughly 100 or more). These copies are not exact copies, as some randomly chosen superpixels are “disabled” in each of them. Disabling here means graying or blacking them out (Ribeiro et al. do not specify a particular color). In other words, after the second step we have a set of images that differ in which superpixels are disabled. The third step consists in classifying each of these images with the original classifier and comparing the result with the original prediction. So, if the original prediction is “toucan” with a confidence of X, each modified image is assessed by the confidence Y that the classifier assigns to it depicting a toucan. Intuitively, when the disabled superpixels are those relevant to the “toucan” classification, such as the beak, we expect the confidence Y to drop; the drop X − Y thus indicates how important the disabled superpixels were. Concretely, the confidences obtained for the perturbed images (together with weights that favor images close to the original) are used to train another, ante-hoc explainable model (e.g., a sparse linear model) whose inputs simply record which superpixels were enabled. In our image example, the largest coefficients of such a model correspond to the superpixels that contributed the most to the “toucan” classification. Overall, we can see that, for example, the toucan’s beak was most conducive to the prediction (and judge that the model works reasonably well, at least for toucan predictions similar to the one we tested).
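To make these steps concrete, the following is a minimal sketch of the pipeline in Python (written from scratch, independent of the official lime package). The image array and the predict_toucan_prob function are hypothetical placeholders for the input picture and the black box’s “toucan” confidence, and the segmentation, sample-size, and kernel settings are merely illustrative defaults, not the values used by Ribeiro et al.

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.linear_model import Ridge

def lime_image_sketch(image, predict_toucan_prob,
                      n_segments=50, n_samples=200, kernel_width=0.25):
    """Toy LIME-style explanation of a single 'toucan' prediction.

    `image` is assumed to be an (H, W, 3) float array in [0, 1];
    `predict_toucan_prob(batch)` is assumed to return the black box's
    'toucan' confidence for a batch of images. Both are placeholders.
    """
    # Step 1: segment the image into interpretable superpixels.
    segments = slic(image, n_segments=n_segments, compactness=10)
    seg_ids = np.unique(segments)

    # Step 2: create perturbed copies by "disabling" random superpixels
    # (here: painting them a neutral gray).
    masks = np.random.randint(0, 2, size=(n_samples, len(seg_ids)))
    masks[0, :] = 1                     # keep one unperturbed copy as reference
    perturbed = []
    for mask in masks:
        img = image.copy()
        for seg_id, keep in zip(seg_ids, mask):
            if not keep:
                img[segments == seg_id] = 0.5
        perturbed.append(img)

    # Step 3: query the black box for its "toucan" confidence on every copy.
    confidences = predict_toucan_prob(np.stack(perturbed))

    # Weight each copy by its closeness to the original image
    # (fraction of superpixels left intact, passed through an RBF kernel).
    distances = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)

    # Step 4: fit an interpretable linear surrogate that predicts the
    # confidence from which superpixels were enabled. Its largest
    # coefficients mark the superpixels most conducive to "toucan".
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, confidences, sample_weight=weights)
    return segments, surrogate.coef_
```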
LIME as such can only explain single predictions – in our example, the prediction that a certain image depicts a toucan. Ribeiro et al. also suggest how LIME could be used to explain an opaque model as a whole (and thereby provide what is often called a global explanation). Their hypothesis is that such a global explanation can be given by presenting explanations of representative single predictions (i.e., predictions that showcase as many features as possible that are conducive to the predicted class). To this end, they assume that the user has a certain budget B, expressed as the number of single-prediction explanations they are willing to inspect. So, a user with a budget B = 10 will receive one explanation for each of 10 different, representative predictions. Ribeiro et al. also suggest a way to pick these representative instances: submodular pick. The idea is to greedily select the B examples that, taken together, cover as many globally important features as possible: a new example is added to the list of representative examples if (a) it explains features that have not been explained before, and (b) these features are as globally relevant as possible across all examples. In user studies, Ribeiro et al. show that submodular pick fares better than picking instances at random.
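To illustrate the greedy selection, here is a simplified sketch (again in Python, and again a reconstruction rather than the reference implementation). It assumes that the per-instance explanations have already been computed and collected into a matrix W whose entry (i, j) is the non-negative importance of feature j in the explanation of instance i; the square-root dampening of the global feature importance follows the coverage idea described in the paper.

```python
import numpy as np

def submodular_pick_sketch(W, budget):
    """Greedily pick `budget` representative instances to explain.

    W is an (n_instances, n_features) matrix of non-negative importance
    scores, e.g. the absolute coefficients of the per-instance explanations.
    """
    # Global importance of each feature: features that matter for many
    # instances count more (the square root dampens very frequent ones).
    global_importance = np.sqrt(W.sum(axis=0))

    chosen = []
    covered = np.zeros(W.shape[1], dtype=bool)
    for _ in range(min(budget, W.shape[0])):
        best_gain, best_i = -1.0, None
        for i in range(W.shape[0]):
            if i in chosen:
                continue
            # Coverage gain: globally important features this instance
            # would explain that are not covered by the picks so far.
            newly_covered = (W[i] > 0) & ~covered
            gain = global_importance[newly_covered].sum()
            if gain > best_gain:
                best_gain, best_i = gain, i
        chosen.append(best_i)
        covered |= W[best_i] > 0
    return chosen
```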
Nevertheless, LIME is not without problems. In our example, we have limited ourselves to predictions on images. But what about other input data? What constitutes a superpixel in other cases? For texts, LIME arguably also works quite well: here, one can take the words of a ‘bag of words’ representation (a technical name for an unordered collection of words) as the superpixel-equivalent. In other cases, this is not as straightforward. Picture a system that takes patient data and predicts, for instance, which illness the patient might have. What should be taken as the superpixel-equivalent here? Even more interestingly, what does it mean in such cases to “disable” certain values? For images, we can gray out the superpixels; for texts, we may replace all letters in a word by one specific letter; but we cannot set blood pressure to, say, zero – this would drastically alter the prediction. Plausibly, we have to find default values. But which values work? Different age groups, for instance, have different standard blood pressures. All of this makes it plausible that the model resulting from LIME may not be faithful to the original model. The same problem can arise when we take a step back and look again at our image classification: using black to disable the superpixels already has non-trivial implications, for the body of the toucan is black. Because of this, LIME may attribute an inappropriately low importance to the toucan’s body, so that our result (the beak being the most conducive feature) may not represent the original model’s reasoning.
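A small, entirely made-up example may help to see why the choice of default values matters. The patient record and the reference values below are invented for illustration only:

```python
import numpy as np

# Hypothetical patient record: age, systolic blood pressure, blood sugar.
patient = np.array([67.0, 150.0, 6.1])

# "Disabling" a tabular feature means replacing it with some default.
# Zero is clearly wrong for blood pressure, and a single population mean
# ignores, e.g., age-specific norms. (All numbers here are made up.)
population_default = np.array([45.0, 125.0, 5.0])
age_group_default  = np.array([72.0, 135.0, 5.4])

def disable(record, feature_idx, defaults):
    perturbed = record.copy()
    perturbed[feature_idx] = defaults[feature_idx]
    return perturbed

# The same "disabled" record looks quite different depending on the chosen
# defaults, and so will the black box's prediction and, in turn, the
# explanation that LIME produces.
print(disable(patient, 1, population_default))  # blood pressure -> 125.0
print(disable(patient, 1, age_group_default))   # blood pressure -> 135.0
```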
The idea behind LIME is as simple as it is ingenious: explain a prediction by identifying the parts of the input that are most conducive to it. However, the execution of LIME rests on non-trivial background assumptions that may render the explanation unfaithful to the original model. The idea that a whole model can be explained by picking a limited number of instances is even more questionable. Overall, LIME may mark the beginning of the renewed interest in XAI, but it clearly does not mark its end.