Interpreting Machine Learning Models and Application of Homotopy Methods

Neural networks have been criticized for their lack of easy interpretation, which undermines confidence in their use for important applications. We show that a trained neural network can be interpreted using flip points. A flip point is any point that lies on the boundary between two output classes. For example, for a neural network with a binary yes/no output, a flip point is any input that generates equal scores for "yes" and "no". The flip point closest to a given input is of particular importance, and this point is the solution to a well-posed optimization problem. We show that computing closest flip points allows us, for example, to systematically investigate the decision boundaries of trained networks, to interpret and audit them with respect to individual inputs and entire datasets, and to find vulnerabilities to adversarial attacks. We demonstrate that flip points can help identify mistakes made by a model, improve its accuracy, and reveal the most influential features for classifications. We also show that some common assumptions about the decision boundaries of neural networks can be unreliable. Additionally, we present methods for designing the structure of feed-forward networks using matrix conditioning. Finally, we investigate an unsupervised learning method, the Gaussian graphical model, and provide mathematical tools for its interpretation.
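To make the closest-flip-point problem concrete, the sketch below assumes a binary classifier with scores s_yes(x) and s_no(x) and looks for the point x nearest to an input x_hat that satisfies g(x) = s_yes(x) - s_no(x) = 0, that is, it minimizes ||x - x_hat||^2 subject to g(x) = 0. It is an illustration only and is not the homotopy-based algorithm developed in this work. The function names are hypothetical. For a linear scorer the answer is the orthogonal projection onto a hyperplane. For a general differentiable g, the sketch uses a plain linearize-and-project iteration, which, like Newton's method, can fail to converge when g is strongly nonlinear.

```python
import numpy as np


def closest_flip_point_linear(x_hat, w_yes, b_yes, w_no, b_no):
    """Closest flip point for a linear two-class scorer (hypothetical helper).

    Scores are s_yes = w_yes @ x + b_yes and s_no = w_no @ x + b_no.
    The flip boundary {x : s_yes(x) = s_no(x)} is the hyperplane
    w @ x + b = 0 with w = w_yes - w_no and b = b_yes - b_no, so the
    closest flip point is the orthogonal projection of x_hat onto it.
    """
    w = np.asarray(w_yes, dtype=float) - np.asarray(w_no, dtype=float)
    b = b_yes - b_no
    x_hat = np.asarray(x_hat, dtype=float)
    return x_hat - (w @ x_hat + b) / (w @ w) * w


def closest_flip_point(x_hat, g, grad_g, tol=1e-10, max_iter=100):
    """Approximate closest point to x_hat on {g(x) = 0} (hypothetical helper).

    g(x) is the score difference (yes minus no) and grad_g its gradient.
    Each step linearizes g at the current iterate x_k and projects x_hat
    onto the hyperplane g(x_k) + grad_g(x_k) @ (x - x_k) = 0. A fixed point
    satisfies g(x) = 0 with x_hat - x parallel to grad_g(x), which is the
    first-order optimality condition for min ||x - x_hat||^2 s.t. g(x) = 0.
    Assumption: g is smooth and the start x_hat is close enough to converge.
    """
    x_hat = np.asarray(x_hat, dtype=float)
    x = x_hat.copy()
    for _ in range(max_iter):
        gx, dg = g(x), grad_g(x)
        # Project x_hat onto the linearized boundary at x.
        x_new = x_hat - (gx + dg @ (x_hat - x)) / (dg @ dg) * dg
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("flip point iteration did not converge")


# Linear model with boundary x0 + x1 = 1: the closest flip point to (2, 2) is (0.5, 0.5).
p = closest_flip_point_linear([2.0, 2.0], [1.0, 1.0], 0.0, [0.0, 0.0], 1.0)

# Curved boundary x @ x = 1: the closest flip point to (2, 0) is (1, 0).
q = closest_flip_point([2.0, 0.0], lambda x: x @ x - 1.0, lambda x: 2.0 * x)
```

The displacement x - x_hat shows which input features have to change, and by how much, to flip the decision. This is the kind of information the abstract relates to influential features and adversarial vulnerability. For a trained network, g and grad_g would come from the model's two output scores and their gradient, for example via automatic differentiation.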