USING DEEP GENERATIVE MACHINE LEARNING METHODS TO GENERATE SYNTHETIC POPULATION

dc.contributor.advisor: Cinzia, Cirillo (en_US)
dc.contributor.author: Yang, Zhichao (en_US)
dc.contributor.department: Civil Engineering (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2023-02-01T06:47:55Z
dc.date.available: 2023-02-01T06:47:55Z
dc.date.issued: 2022 (en_US)
dc.description.abstract: Population synthesis is an important area of research that aims to generate synthetic data about households and individuals that are representative of large real populations. Scholars in many fields have worked on synthetic population generation: statisticians, computer scientists, economists, social scientists, and engineers. In transportation modeling, synthetic agents are a key input for agent-based models, which are gradually replacing zone-based aggregate four-step models. Traditional methods for population synthesis include Iterative Proportional Fitting (IPF), which reweights sample data until the marginals for the variables of interest match official statistics (often from the census) for a given geographical area. Recently, machine learning algorithms have been tested and compared to IPF, which suffers from several well-known limitations. In this M.S. thesis, advanced deep generative machine learning methods, namely CTGAN and TVAE, are applied to generate synthetic populations. CTGAN is an advanced GAN algorithm that models the distribution of tabular data and samples rows from the learned distribution. It has been shown that CTGAN can address issues that challenge conventional GAN models, including mixed data types, non-Gaussian distributions, multimodal distributions, learning from sparse one-hot-encoded vectors, and highly imbalanced categorical columns. TVAE is likewise an advanced VAE model that adapts the VAE to tabular data through preprocessing and a modified loss function. As a case study, this research applies these two machine learning methods to generate a synthetic population based on a sample from the American Community Survey for the State of Maryland.
To demonstrate the performance of the proposed methods, we compare our results to those obtained with IPF and a Bayesian network, using metrics that evaluate the ability of the population synthesizer to reproduce the dependency structure and the marginals of the real population and to solve the zero-cell problem in IPF. (en_US)
dc.identifier: https://doi.org/10.13016/f3mx-txhl
dc.identifier.uri: http://hdl.handle.net/1903/29649
dc.language.iso: en (en_US)
dc.subject.pqcontrolled: Transportation (en_US)
dc.title: USING DEEP GENERATIVE MACHINE LEARNING METHODS TO GENERATE SYNTHETIC POPULATION (en_US)
dc.type: Thesis (en_US)
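
The abstract describes IPF as reweighting sample data until the marginals match official statistics. A minimal sketch of that idea for a two-way table is given below. It is not taken from the thesis: the function name `ipf`, the seed table, and the target totals are all hypothetical values chosen for illustration, and real applications typically involve more than two dimensions.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Rescale a 2-D seed table until its row and column sums match the targets.

    Illustrative sketch only; names and defaults are assumptions.
    """
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Row step: scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                factor = target / total
                table[i] = [v * factor for v in table[i]]
        # Column step: scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                factor = target / total
                for row in table:
                    row[j] *= factor
        # Columns match after the column step, so check the rows for convergence.
        gap = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if gap < tol:
            break
    return table


# Hypothetical sample cross-tabulation (rows: age group, columns: household size).
seed = [[10, 5, 0],
        [4, 8, 3]]
row_targets = [600, 400]       # hypothetical official row marginals
col_targets = [500, 400, 100]  # hypothetical official column marginals

fitted = ipf(seed, row_targets, col_targets)
```

Because IPF only multiplies existing cells by scaling factors, a combination that is absent from the sample (the zero in the first row of `seed`) stays at zero in the fitted table. This is the zero-cell problem that the abstract refers to.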

Files

Original bundle

Name: Yang_umd_0117N_23067.pdf
Size: 755.74 KB
Format: Adobe Portable Document Format