Author: Dániel Darabos
In Graph Convolutional Networks and Explanations, I introduced our neural network model, its applications, the challenge of its “black box” nature, the tools we can use to better understand it, and the datasets we can use to validate those tools. I presented the first tool in Feature Visualization on a Graph Convolutional Network. In this last article in the series, I will explain the second tool, attribution, and look at its application to synthetic examples and real-world demographic prediction problems.
Feature visualization gave us examples of concepts that the neural network has learned. We can use it to better understand the network in general. But when we have a specific input and the network makes a classification, we can ask more specific questions. Attribution is the question of which parts of the input have led to that classification.
Chris Olah et al. have fantastic illustrations and examples of attribution for image classifiers in The Building Blocks of Interpretability. In one example they show an input image and a map that shows which parts of the input are associated with the activations of groups of neurons:
Green is associated with the “Labrador Retriever” and “Beagle” classes, blue is just “Labrador Retriever”, while yellow is more related to “tiger cat” and, as it happens, “lynx”.
We applied the same idea to the less fluffy domain of human social networks.
Attribution by integrated gradients is based on interpolating from a neutral baseline input to the actual input. For images the baseline input would be a fully black image. For a GCN we interpolate from all-zero vertex feature vectors to the actual vertex feature vectors. Gradients are propagated back from the output class to the input elements at each step of the interpolation. As we go, we add up all these gradient values to get the attribution score for each part of the input.
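To make the interpolation concrete, here is a minimal sketch in Python with NumPy. It is not our implementation. The "model" is a hypothetical hand-built stand-in (a sigmoid over the number of rockstar neighbors) with an analytic gradient, where a real GCN would use backpropagation. The names score, score_grad and integrated_gradients, the weights, and the toy graph are all assumptions made for illustration. The sketch follows the usual formulation of integrated gradients, in which the averaged gradients are multiplied by the difference between the input and the baseline.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def score(adj, x, target):
    # Hypothetical stand-in for a trained network: the positive-class score
    # of `target` rises sharply once it has about three rockstar neighbors.
    return sigmoid(4.0 * (adj[target] @ x - 2.5))

def score_grad(adj, x, target):
    # Analytic gradient of score() with respect to every vertex feature.
    # A real GCN would obtain this by backpropagation instead.
    s = score(adj, x, target)
    return 4.0 * s * (1.0 - s) * adj[target]

def integrated_gradients(adj, x, target, steps=200):
    baseline = np.zeros_like(x)           # all-zero vertex features
    total = np.zeros_like(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps         # midpoint of each interpolation step
        point = baseline + alpha * (x - baseline)
        total += score_grad(adj, point, target)
    # Average gradient along the path, scaled by the input difference.
    return (x - baseline) * total / steps

# Toy graph: vertex 0 is the target. Vertices 1, 2, 3 are rockstar neighbors,
# vertex 4 is a non-rockstar neighbor, vertex 5 is a rockstar two steps away.
adj = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5)]:
    adj[a, b] = adj[b, a] = 1.0
x = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 1.0])  # 1.0 marks a rockstar

attributions = integrated_gradients(adj, x, target=0)
print(np.round(attributions, 3))
```

In this toy setup only vertices 1, 2 and 3 receive non-zero attribution (about 0.29 each). The attributions also add up to approximately the difference between the model output on the real input and on the baseline, which is a handy sanity check for any implementation.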
First, we wanted to validate this approach and our implementation via the synthetic datasets introduced in Graph Convolutional Networks and Explanations.
In the “three rockstar friends” case we are looking at an example vertex in the social network (circled in red) and its immediate neighborhood. This vertex satisfies the rule of having at least three rockstar neighbors (golden pegs). (It has six of them. Plus the vertex itself is a rockstar.) The neural network correctly predicted the positive label. What part of the input caused this prediction?
If we calculate the attribution scores for the positive label class and depict them as vertex sizes, we can see that all the neighboring rockstars get high attribution:
We see exactly what we were hoping for. Immediate rockstar neighbors get high attribution. Further rockstars (not depicted), non-rockstar neighbors, and the vertex itself get zero attribution. (Zero attribution is depicted as the minimal vertex size. Otherwise we would not see the vertices.)
We get similar results for the “two steps from a rockstar” dataset. We are looking at a vertex (circled in red) that is exactly two steps away from the nearest rockstar. (Actually three rockstars in this example.) The neural network predicts a positive label. What caused this prediction? The three rockstars:
The pictures only show the relevant subgraphs, but in practice these datasets have more than 25,000 vertices. In both cases the algorithm has highlighted all the right vertices and none of the wrong vertices.
We of course get more interesting results when we look at attribution scores on the real dataset of the gender and age prediction problems.
Many examples of both problems look like this one:
The network correctly predicts “female” (purple) for a vertex (circled in red) where the majority of neighbors are also female.
Attribution could tell us more here, but these cases are not very interesting: we intuitively understand the predictions anyway. We want to know how the correct prediction is produced in more challenging cases, where the neural network’s correct prediction goes against the majority label of the neighborhood.
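One way to pick out such cases can be sketched as follows. This is an illustrative Python snippet, not the code we used. The function names, the dictionary-based graph representation and the tiny example graph are assumptions. Ties in the neighborhood vote are broken by whichever label is encountered first.

```python
from collections import Counter

def majority_label(neighbors, labels, vertex):
    # Most common true label among the immediate neighbors of `vertex`.
    counts = Counter(labels[n] for n in neighbors[vertex])
    return counts.most_common(1)[0][0] if counts else None

def correct_against_majority(neighbors, labels, predictions):
    # Vertices that are predicted correctly even though the prediction
    # disagrees with the majority label of their neighborhood.
    hits = []
    for v in neighbors:
        majority = majority_label(neighbors, labels, v)
        if (majority is not None
                and predictions[v] == labels[v]
                and predictions[v] != majority):
            hits.append(v)
    return hits

# Hypothetical example: v0 is male, but two of his three neighbors are female.
neighbors = {
    "v0": ["a", "b", "c"],
    "a": ["v0", "b", "d"],
    "b": ["v0", "a", "d"],
    "c": ["v0"],
    "d": ["a", "b"],
}
labels = {"v0": "male", "a": "female", "b": "female", "c": "male", "d": "female"}
predictions = dict(labels)  # pretend the model got every vertex right

print(correct_against_majority(neighbors, labels, predictions))
```

In this example only v0 is returned: every other vertex agrees with the majority of its own neighborhood.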
Here is an example where the network correctly predicts “male” (blue) for the circled vertex while the majority of his neighbors are female:
It is hard to see the female majority in the image. There are fewer male neighbors (believe me!), but due to the higher attribution values they are rendered taller. Meanwhile, the low-attribution females hide below the dense network of edges.
It is easy to build a theory for what happened here. Let’s call the circled vertex “Jimmy”. The network has identified a number of Jimmy’s neighbors, most prominently the four males in the middle left, as being most relevant to his gender. Their effect on the prediction dominates the effect from Jimmy’s female neighbors.
This immediately brings us to the follow-up question: why these four guys?
We could interrogate the neural network further, but we wanted to see if they are outliers in any classical graph metrics. We found that these four friends have a particularly low clustering coefficient within the subgraph that is the immediate neighborhood of Jimmy. In other words, they know many of Jimmy’s friends who don’t know each other. They may know him from work and soccer and school. That is a plausible explanation for why Jimmy’s gender would more likely match theirs than that of friends that are less universally embedded in his life.
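The metric can be sketched in a few lines of Python. This is a minimal illustration rather than our actual analysis code. The function name ego_clustering, the dictionary-of-sets graph representation and the toy friendship graph are assumptions, and the sketch assumes the neighborhood subgraph excludes the central vertex itself.

```python
from itertools import combinations

def ego_clustering(neighbors, ego):
    # Local clustering coefficient of each friend of `ego`, computed inside
    # the subgraph induced by ego's immediate neighbors (ego excluded).
    friends = set(neighbors[ego])
    result = {}
    for v in friends:
        local = neighbors[v] & friends    # v's contacts among ego's friends
        k = len(local)
        if k < 2:
            result[v] = 0.0               # undefined for k < 2, use 0 here
            continue
        links = sum(1 for a, b in combinations(local, 2) if b in neighbors[a])
        result[v] = 2.0 * links / (k * (k - 1))
    return result

# Hypothetical example: "a" knows Jimmy's work friends (w1, w2) and his
# soccer friends (s1, s2), but the two groups do not know each other.
edges = [("jimmy", f) for f in ["a", "w1", "w2", "s1", "s2"]]
edges += [("a", "w1"), ("a", "w2"), ("a", "s1"), ("a", "s2"),
          ("w1", "w2"), ("s1", "s2")]
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

coefficients = ego_clustering(neighbors, "jimmy")
print(sorted(coefficients.items()))
```

Here "a" gets a coefficient of 1/3, because only two of the six possible pairs among his contacts are connected, while each work or soccer friend gets 1.0. A low value marks a friend who bridges otherwise separate parts of the neighborhood.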
We have also investigated a curious case in the age prediction problem:
Here the neural network predicts the 29–69 age bucket (purple) for the circled vertex, even though the majority of their neighbors are in the 10–20 age bucket (brown). And the prediction is correct.
This time it does not look like the prediction is due to the overwhelming relevance of a few fellow purple vertices. In fact the picture is dominated by the brown vertices. The prediction here is not in spite of the majority of 10–20 neighbors, but because of them.
A plausible theory is that the tightly connected group of teenagers is a class of high school students and the circled vertex is a teacher. The dataset is entirely anonymized and lacks further attributes, so such a theory cannot be verified. But it is almost certain that such groups would be well represented in the training dataset. It is likely that the model could learn to recognize such patterns, and then we would see attributions not unlike this one.
The research so far has encouraged us to look at more applications of attribution:
This is the last entry in our blog post series about Lynx Analytics at Spark+AI Summit 2018. Find us on Twitter (@LynxAnalytics) or subscribe to our newsletter below if you want to get in touch or keep tabs on our research!