Statistical and Machine Learning Approaches for Estimating Optimal Solution Values in Combinatorial Optimization: Applications to the Traveling Salesman and Vehicle Routing Problems
Files
Publication or External Link
Date
Authors
Advisor
Citation
DRUM DOI
Abstract
Solving combinatorial optimization problems such as the Traveling Salesman Problem (TSP) and Vehicle Routing Problem (VRP) to optimality is computationally challenging. This motivates the development of efficient methods to estimate optimal solution values for combinatorial optimization problems without solving them exactly. Such estimation approaches offer several benefits. First, the estimated tour lengths can serve as benchmarks to evaluate heuristic algorithms, ensuring solution quality on new instances. Additionally, logistics companies can leverage these estimates in operations to assess potential savings from split deliveries, helping them determine whether to implement the classic VRP scenario or adopt the Split Delivery Vehicle Routing Problem (SDVRP) senario.
This dissertation introduces and rigorously evaluates statistical and machine learning models designed to estimate the optimal tour lengths for the TSP, VRP, and SDVRP with high accuracy. By integrating problem-specific insights with data-driven approaches, we construct predictive models that consistently estimate the optimal solution values within a few percent of the true optimum across diverse problem instances.
A key contribution of this research is a comprehensive comparative analysis of multiple predictive methodologies, including linear regression, random forest regression, multilayer perceptron (MLP) neural networks, and the recently introduced Kolmogorov–Arnold Network (KAN). Our systematic evaluation demonstrates the strengths and weaknesses of each approach. We offer practical insights into the trade-offs among accuracy, interpretability, and generalizability across different modeling paradigms. Linear regression, while highly interpretable and computationally efficient, may lack the flexibility to capture more complex relationships. Conversely, neural network models, particularly KAN, provide a balance of interpretability and accuracy but require more sophisticated modeling and training processes. These insights serve as valuable guidance for practitioners and researchers, facilitating informed decisions about appropriate methods for estimating optimal solution values in logistics planning and research.
Additionally, we present novel high-accuracy estimation models tailored specifically for the TSP, VRP, and SDVRP. These models leverage innovative feature extraction techniques, such as moment statistics derived from random heuristic solutions, to effectively capture the underlying structure of the problems. To further explore the solution space of complex routing problems, we develop a modified Clarke & Wright algorithm with a split deliveries heuristic for the SDVRP. Additionally, for the VRP, we provide an example illustrating how estimation models can be applied to improve the heuristic algorithm for the problem. Extensive empirical experiments highlight the practical utility of our models in accurately estimating optimal values.