Multi-Agent Autonomous Decision Making in Artificial Intelligence
Advisor: Goldstein, Tom
Abstract
Multi-Agent Autonomous Decision Making, especially Multi-Agent Reinforcement Learning (MARL), is an emerging area of Artificial Intelligence (AI) in which autonomous agents interact with one another, fostering competition and/or cooperation. These AI agents can help solve real-world problems in augmented reality, recommender systems, supply chain orchestration, climate conservation, self-driving cars, sports, interdiction games, and real-time guidance of cooking, education, manufacturing, and robotic tasks. Key challenges for AI agents include scaling efficiently to many agents, solving coordination problems, and understanding agentic behavior.
The objective of my Ph.D. thesis is to develop and deploy efficient multi-agent AI algorithms for real-world decision-making problems. To begin with, a multi-agent approach can be used to model the Human-AI Alignment problem, a major obstacle to rapidly deploying AI models. Misalignment challenges exist in current AI models such as ChatGPT, which struggle with planning and reasoning tasks as simple as multiplying two 4-digit integers. Concepts from MARL, such as ad-hoc coordination, help humans and autonomous agents communicate the goals of each planned step correctly and reason explicitly about the strategies humans may use when attempting to shape the behavior of AI agents. Such communication is often neither efficient nor robust as the number of AI agents grows, motivating theoretical results for efficient agentic behavior. I have provided formal guarantees for successful and reliable cooperation of AI agents with populations of socially intelligent agents, defining agentic behavior through game-theoretic notions of consistency and compatibility. In this setting, AI agents cooperate with populations of socially intelligent agents that are individually rational while also reliably coordinating with other group members in a general-sum Bayesian game. The AI agents face the challenge of generalizing from previous interactions to cooperate with a new partner drawn from such a population. I show theoretically that these assumptions alone are insufficient to select an AI agent strategy that achieves zero-shot coordination with every member of the socially intelligent population. This can be addressed with a proven upper bound on the sample complexity of learning a successful cooperation strategy from observed interactions among members of the target population. I also derive lower bounds showing when the multi-agent cooperation setting is necessary, in terms of the population's trajectories, the state space, and the length of the learning episodes. Under the assumptions of consistency and compatibility, these bounds are provably stronger than a "naive" reduction of the cooperation problem to Imitation Learning.
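As a rough formalization of this setting (the notation here is mine and may differ from the thesis's), the cooperation problem can be written as a general-sum Bayesian game with partner types drawn from a population prior:

```latex
% A general-sum Bayesian game: players N, actions A_i, types Theta_i drawn
% from a population prior p, and per-player utilities u_i.
\[
G = \langle N,\ \{A_i\}_{i \in N},\ \{\Theta_i\}_{i \in N},\ p,\ \{u_i\}_{i \in N} \rangle,
\qquad \theta \sim p, \qquad
u_i : A_1 \times \cdots \times A_n \times \Theta \to \mathbb{R}.
\]
% Zero-shot coordination asks for one AI strategy \pi that performs well
% against the strategy \sigma_\theta induced by any sampled partner type:
\[
\pi^\star \in \arg\max_{\pi}\; \mathbb{E}_{\theta \sim p}\big[\, U(\pi, \sigma_\theta) \,\big].
\]
```

The sample complexity question above then asks how many observed interactions among population members are needed before such a \(\pi^\star\) can be identified.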
My thesis then shows that such alignment of AI agents with human goals has real-world applications in augmented reality and self-driving cars. Multimodal vision-language AI agents can assist humans proactively by determining when and how to autonomously intervene in real time to cooperatively solve day-to-day tasks. Augmented Reality (AR) devices with distributed edge computing, be it a smartphone or a wearable, can substantially improve the user experience in procedural day-to-day tasks by giving AI agents egocentric multimodal (audio and video) observational capabilities. These AR capabilities let AI agents see and hear users' actions, mirroring the multimodal capabilities of human users. Current AI agents, whether Large Language Models (LLMs) or multimodal Vision-Language Models (VLMs), are mostly reactive: the model cannot act without waiting for the human user's vision-language prompt. Proactive AI agents help human users detect and correct task mistakes by providing more autonomous assistance, encouraging users when they perform tasks correctly, or simply engaging them in conversation, akin to one human teaching or helping another. I have created the YET to Intervene (YETI) multimodal agent, which focuses on the research question of identifying, in real time, the circumstances that may require an AI agent to proactively intervene. The trained YETI agent can recognize when to intervene in a conversation with human users to help them correct mistakes on tasks such as cooking, using Augmented Reality. YETI learns scene-understanding signals based on the interpretable notion of Structural Similarity (SSIM) between consecutive observed video frames. It also learns an alignment signal that identifies whether the video frames corresponding to the user's actions are consistent with the expected actions for the task. The agent uses these signals to decide when it should proactively intervene. I compare YETI's proactive-intervention results against the HoloAssist multimodal benchmark for an expert agent guiding a user through procedural tasks. Control problems for autonomous AI agents, especially safety-critical applications such as autonomous vehicle control, require robust decision-making frameworks to ensure safe navigation in complex and dynamic environments. This necessitates approaches such as agentic Model Predictive Control (MPC), which can anticipate future problems and plan for them accordingly. I introduce a novel framework that integrates MPC with multimodal VLMs to enhance the ability of autonomous vehicles to navigate and respond to real-world scenarios with fine-grained actions.
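To make the SSIM-based scene-understanding signal described above concrete, here is a minimal Python sketch of a frame-change detector in that spirit; the function names, threshold, and intervention rule are my own illustrative assumptions, not YETI's actual implementation.

```python
# Hypothetical sketch of an SSIM-based scene-change signal for proactive
# intervention (names and threshold are illustrative, not YETI's code).
import numpy as np
from skimage.metrics import structural_similarity as ssim

def scene_change_signal(prev_frame: np.ndarray, cur_frame: np.ndarray) -> float:
    """Return 1 - SSIM between consecutive frames; higher means more change.
    Frames are assumed to be uint8 RGB arrays of shape (H, W, 3)."""
    score = ssim(prev_frame, cur_frame, channel_axis=2)
    return 1.0 - score

def should_intervene(frames: list[np.ndarray], change_threshold: float = 0.35) -> bool:
    """Flag a candidate intervention when consecutive frames change sharply,
    e.g., when the user moves to a new step of a cooking task."""
    signals = [scene_change_signal(a, b) for a, b in zip(frames, frames[1:])]
    return max(signals, default=0.0) > change_threshold
```

In the actual agent, a learned alignment signal would be combined with this change signal rather than relying on a fixed threshold alone.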
Multi-agent AI can be pervasive in real-world applications, given that humans and other technological agents routinely interact, strategize, and perform tasks together. In my thesis, I show that MARL can help devise mitigation strategies for climate conservation problems such as deforestation, by improving the prediction of deforestation hotspots in Indonesia, home to one of the largest rainforest regions in the world. I also model another multi-agent AI collaboration in Supply Chain Orchestration, creating a simulated environment that is cognizant of seasonal demand and cold chains, with improved exploration of strategies to maximize profit. I have created a new intrinsic reward signal that reduces unnecessary interactions among AI agents planning inventory in supply chain warehouses.
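A hedged sketch of how such an intrinsic communication penalty could be shaped (the names and coefficient below are illustrative assumptions, not the thesis's exact signal):

```python
# Hypothetical reward shaping that discourages unnecessary inter-agent
# messages in a supply-chain inventory environment (illustrative only).
def shaped_reward(env_reward: float,
                  messages_sent: int,
                  messages_useful: int,
                  comm_penalty: float = 0.1) -> float:
    """Profit-based reward minus a cost for messages that did not change
    any partner agent's inventory decision."""
    wasted = max(messages_sent - messages_useful, 0)
    return env_reward - comm_penalty * wasted
```

The design intent is that agents keep messages that actually alter a partner's decision while learning to drop redundant coordination traffic.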
These real-world applications motivate the need to understand why AI agents behave the way they do. I address this with Explainable AI (XAI) agents, asking which XAI methods should be recommended given a user agent's goals. Explaining the behavior of AI models matters because factors such as training and inference speed can determine whether end users prefer one AI model over another. I apply MARL to XAI problems using a multi-agent recommender system (RecSys) that recommends XAI outputs for different AI models, serving the objectives of the models' users and building trustworthy, safe AI. Goal-conditioned RL can model AI users learning XAI outcomes according to their preferences. Research on learning to visualize semantic representations that satisfy user objectives motivates improving the visualization of XAI methods, treating the objectives of different users as targets within MARL representations. To represent such MARL targets, much of the control problem can be abstracted into a much simpler game-theoretic problem for deployment in real-world settings such as interdiction games. Multi-agent AI algorithms also help prune AI model parameters across model layers for efficient learning.
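As a toy illustration of goal-conditioned recommendation of XAI methods (the method list, feature values, and scoring rule below are hypothetical, not the thesis's system):

```python
# Illustrative one-step goal-conditioned policy over XAI methods: score each
# method's (hypothetical) features against a user goal vector and pick the best.
import numpy as np

XAI_METHODS = ["LIME", "SHAP", "GradCAM", "Integrated Gradients"]

# Each row: hypothetical (faithfulness, speed, visual_quality) features.
METHOD_FEATURES = np.array([
    [0.6, 0.9, 0.4],   # LIME
    [0.8, 0.5, 0.5],   # SHAP
    [0.5, 0.8, 0.9],   # GradCAM
    [0.7, 0.4, 0.6],   # Integrated Gradients
])

def recommend_xai(goal_weights: np.ndarray) -> str:
    """Pick the method whose features best match the user's goal weights."""
    scores = METHOD_FEATURES @ goal_weights
    return XAI_METHODS[int(np.argmax(scores))]

# A latency-sensitive user weights speed highest:
print(recommend_xai(np.array([0.2, 0.7, 0.1])))  # -> "LIME"
```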
As with humans, a large number of AI agents can take a long time to learn strategies jointly; Multi-Agent RL slows down considerably as the number of agents grows. To address this, it is shown that the JAXMARL library leverages JAX-enabled hardware acceleration to achieve up to 12,500x speedups over existing libraries across 8 popular MARL environments. Together, the proposed Multi-Agent Reinforcement Learning, Imitation Learning, Model Predictive Control, and computational game-theoretic algorithms improve the effectiveness of AI agents in solving problems in real-world and simulation environments.
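Speedups of this kind come from running many environments in lockstep on a single accelerator. A minimal illustration of that vectorization pattern, using a toy environment of my own rather than the JAXMARL API:

```python
# Toy illustration of the JAX vectorization pattern behind such speedups
# (the environment here is a stand-in; this is not the JAXMARL API).
import jax
import jax.numpy as jnp

def step(state: jnp.ndarray, action: jnp.ndarray) -> jnp.ndarray:
    """A trivial per-environment transition: move the state by the action, clipped."""
    return jnp.clip(state + action, -1.0, 1.0)

# vmap runs thousands of environments in lockstep on one accelerator;
# jit compiles the batched step into a single fused computation.
batched_step = jax.jit(jax.vmap(step))

states = jnp.zeros((8192, 4))            # 8192 parallel envs, 4-dim state each
key = jax.random.PRNGKey(0)
actions = jax.random.uniform(key, (8192, 4), minval=-0.1, maxval=0.1)
states = batched_step(states, actions)   # one device-wide transition
```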