TOWARDS AUTOMATED CONTRACT ANALYSIS: APPLYING LANGUAGE MODELS TO RISK IDENTIFICATION IN THE CONTEXT OF PUBLIC-PRIVATE PARTNERSHIPS
Abstract
Risk management is critical to project success, especially in public-private partnerships (P3s), which feature long-term relationships, uncertainty, and complexity. Poorly handled risk management, especially regarding risk transfer, can lead to incentive distortion, disputes, or even project failure. The contract, serving as the formal and enforceable legal agreement binding on the public and private partners, plays a vital role in transferring risks associated with P3s. Risk identification is an important step in contract risk analysis, since overlooking specific risk clauses may have detrimental consequences for contracting parties, such as revenue loss, unexpected financial liabilities, and legal disputes. Previous research has extensively examined the identification and allocation of project risks between contracting parties, predominantly employing questionnaire surveys, interviews, or content analysis methods. These studies depict common practices of risk identification and allocation, with some addressing risks stipulated in contracts. Nonetheless, there are notable limitations. Firstly, the findings derived from these traditional approaches often lack replicability. Secondly, given the typically lengthy nature of P3 contracts, conventional methods for analyzing risk-related contract content are labor-intensive and time-consuming. Thirdly, most of the studies do not offer a means to retrieve specific provisions for nuanced scrutiny. Addressing these limitations requires innovative approaches that yield more granular and replicable results in risk-related contract analysis. The ideal solution should allow for the effortless and consistent extraction of specific contractual provisions related to project risks, providing a microscopic lens on risk allocation practices.
With the recent advancements in natural language processing (NLP), especially transformer-based pre-trained language models (PLMs) and cutting-edge large language models (LLMs), there has been a significant breakthrough in the efficiency of processing and extracting information from textual data. Motivated by both the pivotal yet complicated nature of contract documents and the increasingly mature NLP techniques that create new opportunities for text analysis, this research aims to utilize NLP to automate the identification of risk-related aspects in contract documents. Firstly, a risk-related framework of P3 contracts is developed based on a literature review and a contract review. Based on that framework, a series of NLP-based tools is developed for the automated identification of risk-related contract language, including 1) a rule-based model for contingency liability identification with a weighted F1-score of 88.9%, 2) a fine-tuned PLM (particularly the BERT family) for risk type and allocation identification with weighted F1-scores of 80.6% and 80.5%, respectively, and 3) a prompt design with an LLM (particularly GPT-3.5) for risk type and allocation identification with weighted F1-scores of 64.1% and 72.1%, respectively. Next, the effectiveness of these different approaches is compared. Finally, the tools are applied to real contract documents to offer risk profiles of P3 contracts. The goal is to foster a more efficient, precise, and in-depth understanding of contract risks by leveraging the capabilities of NLP technologies.
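
The three approaches above are all reported in terms of a weighted F1-score. As a minimal sketch of how such a score is conventionally computed (per-class F1 averaged with weights proportional to each class's share of the true labels), the following may be helpful. The function name `weighted_f1` and the example labels ("public", "private", "shared") are illustrative assumptions only; they are not the label scheme, data, or evaluation code of this work.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores.

    Each class's F1 is weighted by its share of the true labels, so
    frequent classes contribute more to the final score.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Hypothetical allocation labels for six clauses (illustrative only).
y_true = ["public", "public", "private", "private", "private", "shared"]
y_pred = ["public", "private", "private", "private", "shared", "shared"]
print(f"weighted F1 = {weighted_f1(y_true, y_pred):.3f}")
```

Because the weights follow class frequency, a model that performs well on common risk categories but poorly on rare ones can still obtain a high weighted score, which is worth keeping in mind when comparing the figures reported above.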