Theses and Dissertations from UMD

Permanent URI for this community: http://hdl.handle.net/1903/2

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay in the appearance of a given thesis/dissertation in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Search Results

  • INFERENCE AND CONTROL IN NETWORKS FAR FROM EQUILIBRIUM.
    (2022) Sharma, Siddharth; Levy, Doron Prof.; Biophysics (BIPH); Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This thesis focuses on two problems in biophysics: (1) inference in networks far from equilibrium, and (2) optimal transitions between network steady-states of unequal dimensions. The system used for development of the theory and design of computational algorithms is the fully connected and asymmetric version of the widely used Ising model. We begin with the basic concepts of biological networks and their emergence as an analytical paradigm over the last two decades due to advancements in high-throughput experimental methods. Biological systems are open and exchange both energy and matter with their environment. Their dynamics are far from equilibrium and do not have well-characterized steady-state distributions. This is in stark contrast to equilibrium dynamics, where the Maxwell-Boltzmann distribution describes the histogram of microstates. The development of inference and control algorithms in this work is for nonequilibrium steady-states without detailed balance. Inferring the Ising model far from equilibrium requires solving the inverse problem in statistical mechanics. As opposed to using a known Hamiltonian to solve for the macroscopic averages, we calculate the couplings and fields, i.e., model parameters, given the microstates or stochastic snapshots as inputs. We first demonstrate a time-series calculation for the inverse problem and use Poisson and Polya-Gamma latent variables to construct a quadratic likelihood function, which is then maximized using the expectation-maximization algorithm. In addition to the main calculation, properties of the Polya-Gamma variables are used to solve logistic regression on a Gaussian mixture. This has applications to problems like clustering and community detection. Not all available data in biology is time-ordered. In fact, for some systems, e.g., gene-regulatory networks, most of the data is not in time-series.
The solution to the inverse problem for such systems (data) is qualitatively different as it involves solving for the thermodynamic arrow of time. The present work uses the definition of a sufficient statistic based on equivalence classes to design a likelihood function through the disjoint cycles of the permutation group. The geometric intuition is provided using the dihedral group of the same order. We state and prove that our likelihood function is minimally sufficient and present an optimization algorithm with computational results. The second problem, i.e., optimal network control, is solved using optimal transport. We recognize that biological networks have the property to grow and shrink while remaining functional and robust. Recent works that have continued the progress made by earlier seminal results have concentrated on systems which do not undergo transitions that alter their dimensions, for example, a network increasing or decreasing its number of nodes. The connection between thermodynamics and optimal transport is well established through the Wasserstein metric being the minimal dissipation for stochastic dynamics. This result depends on narrow convergence, which requires that the system size remains the same. The recently introduced Gromov-Wasserstein (GW) metric, defined on a space of metric measure spaces, makes it possible to design optimal paths between probability distributions of different sizes. In the context of networks, the GW metric can define geodesics between two network nonequilibrium steady-states with different numbers of vertices. The last two chapters discuss the mathematical concepts and results that are required to develop the GW metric on networks and the computational algorithms that follow as a result. We define the probability measures and loss functions as per the physical properties of the Ising model and demonstrate a geodesic calculation between two networks of different sizes.
  • An Investigation of the Relationship Between Automated Machine Translation Evaluation Metrics and User Performance on an Information Extraction Task
    (2007-12-04) Tate, Calandra Rilette; Slud, Eric V; Dorr, Bonnie J; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This dissertation applies nonparametric statistical techniques to Machine Translation (MT) Evaluation using data from an MT Evaluation experiment conducted through a joint Army Research Laboratory (ARL) and Center for the Advanced Study of Language (CASL) project. In particular, the relationship between human task performance on an information extraction task with translated documents and well-known automated translation evaluation metric scores for those documents is studied. Findings from a correlation analysis of the connection between autometrics and task-based metrics are presented and contrasted with current strategies for evaluating translations. A novel idea for assessing partial rank correlation in the presence of grouping factors is also introduced. Lastly, this dissertation presents a framework for task-based MT evaluation and predictive modeling of task responses that gives new information about the relative predictive strengths of the different autometrics (and re-coded variants of them) within the statistical Generalized Linear Models developed in analyses of the Information Extraction Task data. This work shows that current autometrics are inadequate with respect to the prediction of task performance, but that near adequacy can be accomplished through the use of re-coded autometrics in a logistic regression setting. As a result, a class of automated metrics that are best suited for predicting performance is established, and suggestions are offered about how to utilize metrics to supplement expensive and time-consuming experiments with human participants. Now users can begin to tie the intrinsic automated metrics to the extrinsic metrics for the tasks they perform. The bottom line is that there is a need to average away MT dependence (averaged metrics perform better in overall predictions than original autometrics). Moreover, combinations of re-coded metrics performed better than any individual metric.
Ultimately, MT evaluation methodology is extended to create new metrics especially relevant to task-based comparisons. A formal method to establish that differences among metrics as predictors are strong enough not to be due to chance remains as future work. Given the lack of connection in the field of MT Evaluation between task utility and the interpretation of automated evaluation metrics, as well as the absence of solid statistical reasoning in evaluating MT, there is a need to bring innovative and interdisciplinary analytical techniques to this problem. Because there are no papers in the MT evaluation literature that have done statistical modeling before or that have linked automated metrics with how well MT supports human tasks, this work is unique and has high potential for benefiting the Machine Translation research community.