Fast optimization methods for machine learning, and game-theoretic models of cultural evolution

Thumbnail Image


Publication or External Link





This thesis has two parts. In the first part, we explore fast stochastic optimization methods for machine learning.

Mathematical optimization is a backbone of modern machine learning. Most machine learning problems require optimizing some objective function that measures how well a model matches a data set, with the intention of drawing patterns and making decisions on new unseen data. The success of optimization algorithms in solving these problems is critical to the success of machine learning, and has enabled the research community to explore more complex machine learning problems that require bigger models and larger datasets.

Stochastic gradient descent (SGD) has become the standard optimization routine in machine learning, and in particular in deep neural networks, due to its impressive performance across a wide variety of tasks and models. SGD, however, can often be slow for neural networks with many layers and typically requires careful user oversight for setting hyperparameters properly. While innovations such as batch normalization and skip connections have helped alleviate some of these issues, why such innovations are required eludes full understanding, and it is worthwhile to gain deeper theoretical insights into these problems and to consider more advanced optimization methods specifically tailored towards training large complex models.

In this part of the thesis, we review and analyze some of the recent progress made in this direction, and develop new optimization algorithms that are provably fast, significantly easier to train, and require less user oversight. Then, we will discuss the theory of quantized networks, which use low-precision weights to compress and accelerate neural networks, and when/why they are trainable. Finally, we discuss some recent results on how the convergence of SGD is affected by the architecture of neural nets, and we show using theoretical analysis that wide networks train faster than narrow nets, and deeper networks train slower than shallow nets - an effect often observed in practice.

In the second part of the thesis, we study the evolution of cultural norms in human societies using game-theoretic models, drawing from research in cross-cultural psychology. Understanding human behavior and modeling how cultural norms evolve in different human societies is vital for designing policies and avoiding conflicts around the world. In this part, we explore ways to use computational game-theoretic techniques, and in particular evolutionary game-theoretic (EGT) models, to gain insight into why different human societies have different norms and behaviors.

We first describe an evolutionary game-theoretic model to study how norms change in a society, based on the idea that different strength of norms in societies translate to different game-theoretic interaction structures and incentives. We identify conditions that determine when societies change their existing norms, when they are resistant to such change, and how this depends on the strength of norms in a society.

Next, we extend this study to analyze the evolutionary relationships between the tendency to conform and how quickly a population reacts when conditions make a change in norm desirable. Our analysis identifies conditions when a tipping point is reached in a population, causing norms to change rapidly.

Next we study conditions that affect the existence of group-biased behavior among humans (i.e., favoring others from the same group, and being hostile towards others from different groups). Using an evolutionary game-theoretic model, we show that out-group hostility is dramatically reduced by mobility. Technological and societal advances over the past centuries have greatly increased the degree to which humans change physical locations, and our results show that in highly mobile societies, one's choice of action is more likely to depend on what individual one is interacting with, rather than the group to which the individual belongs.