STATISTICAL AND OPTIMAL LEARNING WITH APPLICATIONS IN BUSINESS ANALYTICS
Statistical learning is widely used in business analytics to discover structure or exploit patterns in historical data, and to build models that capture relationships between an outcome of interest and a set of variables. Optimal learning, on the other hand, addresses the operational side of the problem by iterating between decision making and data acquisition/learning. The two problems often go hand in hand, forming a feedback loop between statistics and optimization.
We apply this combined statistical/optimal learning approach to a fundraising marketing campaign problem that arises in many non-profit organizations. Many such organizations use direct-mail marketing to cultivate one-time donors and convert them into recurring contributors. Cultivated donors generate much more revenue than new donors, but they also lapse over time, making it important to steadily draw in new cultivations. The direct-mail budget is limited, but better-designed mailings can improve success rates without increasing costs.
We first apply statistical learning to analyze the effectiveness of several design approaches used in practice, based on a massive dataset covering 8.6 million direct-mail communications with donors to the American Red Cross during 2009-2011. We find evidence that mailed appeals are more effective when they emphasize disaster preparedness and training efforts over post-disaster cleanup. Including small cards that affirm donors' identity as Red Cross supporters is an effective strategy, while including gift items such as address labels is not. Finally, very recently acquired donors are more likely to respond to appeals that ask them to contribute an amount similar to their most recent donation, but this approach has an adverse effect on donors with a longer history. We show via simulation that a simple design strategy based on these insights has the potential to improve success rates from 5.4% to 8.1%.
When a new scenario arises, however, new data must be acquired to update our model and decisions; this problem is studied under the optimal learning framework. The goal becomes discovering a sequential information collection strategy that learns the best campaign design alternative as quickly as possible. A regression structure is used to learn about a set of unknown parameters, and this learning alternates with optimization to design new data points. Such problems have been extensively studied in the ranking and selection (R&S) community, but traditional R&S procedures incur high computational costs when the decision space grows combinatorially. We present a value-of-information procedure for simultaneously learning unknown regression parameters and unknown sampling noise. We then develop an approximate version of the procedure, based on a semidefinite programming relaxation, that retains good performance and scales better to large problems. We also prove the asymptotic consistency of the algorithm in the parametric model, a result that has not previously been available even for the known-variance case.