An Expectation Maximization Approach to Revenue Management on Rail Ticket Data

In unregulated markets for perishable commodities, competition drives cut-throat pricing and heavy discounting. Although this benefits customers, the companies selling the commodity must take care that these discounts and cut-throat prices do not eat into their profits. The science of managing revenue in such settings is loosely termed Revenue Management (RM). RM has its roots in the competition that arose in the American airline industry after deregulation. It has since spread to virtually every industry that deals in perishable commodities, including hotels and hospitality, vehicle rentals, all forms of long-distance public transportation, and even freight.

The commodities in these industries are the items offered for sale: in a hotel, rooms of different classes and sizes; in vehicle rentals, cars; and in all forms of long-distance transportation, seats. These commodities are perishable in the simple sense that after a certain date they are no longer available. In long-distance transportation, the seats on a vehicle (plane, bus, train or ferry) can no longer be sold once the vehicle has departed. Similarly, hotel rooms and rental cars lose value the longer they remain empty or unused. The goal of modern RM is therefore to sell such commodities profitably, at better rates than the competition.

This thesis applies the Expectation Maximization (EM) algorithm to purchase data from the railway industry in an attempt to improve on the existing pricing logic. The EM algorithm used here was developed by Dr. Kalyan Talluri and Dr. Garrett van Ryzin in their seminal 2004 paper. In that paper, the authors develop the algorithm, derive the mathematics behind it, and apply it to test datasets to show that it outperforms the current industry standard. However, the method had never been applied to a real dataset, and doing so is the goal of this thesis.
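
To make the idea concrete, here is a minimal sketch under an assumed model; it is not the thesis's code. We assume a discrete-time setting in which, in each booking period, at most one customer arrives with probability lambda (`lam` in the code). An arriving customer chooses among the offered products according to a multinomial logit (MNL) model whose no-purchase weight is fixed at 1. Because a period with no sale may mean either that nobody arrived or that an arriving customer declined to buy, arrivals are latent, and this is the gap EM fills. The E-step estimates the probability that an arrival occurred in each no-sale period. The M-step re-estimates `lam` and the MNL weights `v` from these expected counts. For `v`, the sketch swaps in a single minorize-maximize update instead of an exact maximization. This makes it a generalized EM, which still never decreases the observed-data likelihood. All function names and the data layout are hypothetical.

```python
import math
import random

# Hypothetical sketch of an assumed one-arrival-per-period model with MNL choice.
# A period is (offer_set, choice): offer_set is a list of product indices and
# choice is the index of the product bought, or None if no sale was recorded.


def mnl_probs(v, offer_set):
    """MNL purchase probabilities, with the no-purchase weight fixed at 1."""
    denom = 1.0 + sum(v[j] for j in offer_set)
    return {j: v[j] / denom for j in offer_set}, 1.0 / denom


def observed_loglik(lam, v, periods):
    """Log-likelihood of the observed sales, with unobserved arrivals summed out."""
    ll = 0.0
    for offer_set, choice in periods:
        probs, p0 = mnl_probs(v, offer_set)
        if choice is None:
            # No sale: either nobody arrived, or an arrival chose not to buy.
            ll += math.log(1.0 - lam + lam * p0)
        else:
            ll += math.log(lam * probs[choice])
    return ll


def em_mnl(periods, n_products, iters=100):
    """Generalized EM for the arrival probability lam and the MNL weights v."""
    lam, v = 0.5, [1.0] * n_products
    for _ in range(iters):
        # E-step: probability that a customer arrived in each period.
        weights = []
        for offer_set, choice in periods:
            if choice is not None:
                weights.append(1.0)  # a sale implies an arrival
            else:
                _, p0 = mnl_probs(v, offer_set)
                weights.append(lam * p0 / (1.0 - lam + lam * p0))
        # M-step for lam: expected number of arrivals per period.
        new_lam = sum(weights) / len(periods)
        # Partial M-step for v: one minorize-maximize update of the
        # arrival-weighted MNL log-likelihood (the update uses the old v).
        counts = [0.0] * n_products
        exposure = [0.0] * n_products
        for (offer_set, choice), w in zip(periods, weights):
            denom = 1.0 + sum(v[j] for j in offer_set)
            if choice is not None:
                counts[choice] += w
            for j in offer_set:
                exposure[j] += w / denom
        v = [counts[j] / exposure[j] if exposure[j] > 0 else v[j]
             for j in range(n_products)]
        lam = new_lam
    return lam, v


def simulate_periods(lam, v, offer_sets, n_periods, seed=0):
    """Draw synthetic sales data from the assumed model (for testing only)."""
    rng = random.Random(seed)
    periods = []
    for t in range(n_periods):
        offer_set = offer_sets[t % len(offer_sets)]
        choice = None
        if rng.random() < lam:  # a customer arrives
            probs, _ = mnl_probs(v, offer_set)
            u, acc = rng.random(), 0.0
            for j in offer_set:
                acc += probs[j]
                if u < acc:
                    choice = j
                    break
        periods.append((offer_set, choice))
    return periods


if __name__ == "__main__":
    data = simulate_periods(0.6, [1.0, 0.5, 0.3], [[0, 1, 2], [1, 2], [2], [0]], 2000)
    lam_hat, v_hat = em_mnl(data, n_products=3)
    print(lam_hat, v_hat)
```

The synthetic data deliberately rotates through several offer sets. Variation in what is offered is what lets the sketch separate the arrival rate from the purchase probabilities.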

We find, and document here, the issues that arose from applying the EM algorithm directly to the data. Chiefly, the assumptions underlying the EM algorithm required heavy data cleanup, after which the results were found to be neither satisfactory nor useful. The reasons for the model's failure are examined in detail, the primary one being a lack of identifiability in the data. We conclude that the EM algorithm needs either substantial modification, to relax certain debilitating assumptions and make it more general, or additional data, to reduce the identifiability problem in the data.
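
As a generic illustration of what a lack of identifiability can mean, consider an assumed one-arrival-per-period model with MNL choice. This is not a reconstruction of the specific problem found in the rail data. If the same offer set is shown in every period, the data reveal only the sale probabilities lambda * v_j / (1 + V), where V is the sum of the weights v_j. That gives n observable quantities for n + 1 unknowns. Scaling every weight by c and setting lambda' = lambda (1 + cV) / (c (1 + V)) leaves every observed probability unchanged, provided lambda' <= 1. For example, lambda = 0.5 with v = (1, 1) and lambda = 5/12 with v = (2, 2) both give each product a sale probability of 1/6. The helper names below are hypothetical.

```python
# Hypothetical illustration: an assumed one-arrival-per-period MNL model in which
# every product is offered in every period (a single, fixed offer set).


def observed_probs(lam, v):
    """Probability of each product's sale, and of 'no sale', in one period."""
    denom = 1.0 + sum(v)
    sales = [lam * vj / denom for vj in v]
    return sales, 1.0 - sum(sales)


def rescaled_params(lam, v, c):
    """Scale all MNL weights by c and adjust lam so observed probabilities match.

    Solves lam2 * c * v_j / (1 + c * V) = lam * v_j / (1 + V), where V = sum(v).
    The result is a valid model only if lam2 <= 1.
    """
    total = sum(v)
    lam2 = lam * (1.0 + c * total) / (c * (1.0 + total))
    return lam2, [c * vj for vj in v]


if __name__ == "__main__":
    lam, v = 0.5, [1.0, 1.0]
    lam2, v2 = rescaled_params(lam, v, 2.0)
    # Different parameters, identical distribution of what can be observed.
    print(lam2, v2)
    print(observed_probs(lam, v))
    print(observed_probs(lam2, v2))
```

Because the two parameter sets produce the same distribution over observable outcomes, no estimator can tell them apart from such data, however much data there is. Varying the offer sets, or adding information on arrivals, is one way to break the tie.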