Mathematics
Permanent URI for this community: http://hdl.handle.net/1903/2261
Item Combining Evidence from Unconstrained Spoken Term Frequency Estimation for Improved Speech Retrieval (2008-11-21)
Olsson, James Scott; Oard, Douglas W; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
This dissertation considers the problem of information retrieval in speech. Today's speech retrieval systems generally use a large vocabulary continuous speech recognition system to first hypothesize the words which were spoken. Because these systems have a predefined lexicon, words which fall outside of the lexicon can significantly reduce search quality, as measured by Mean Average Precision (MAP). This is particularly important because these Out-Of-Vocabulary (OOV) words are often rare and therefore good discriminators for topically relevant speech segments. The focus of this dissertation is on handling these out-of-vocabulary query words. The approach is to combine results from a word-based speech retrieval system with those from vocabulary-independent ranked utterance retrieval. The goal of ranked utterance retrieval is to rank speech utterances by the system's confidence that they contain a particular spoken word, which is accomplished by ranking the utterances by the estimated frequency of the word in the utterance. Several new approaches for estimating this frequency are considered, which are motivated by the disparity between reference and errorfully hypothesized phoneme sequences. The first method learns alternate pronunciations or degradations from actual recognition hypotheses and incorporates these variants into a new generative estimator for term frequency. A second method learns transformations of several easily computed features in a discriminative model for the same task. Both methods significantly improved ranked utterance retrieval in an experimental validation on new speech.
The best of these ranked utterance retrieval methods is then combined with a word-based speech retrieval system. The combination approach uses a normalization learned in an additive model, which maps the retrieval status values from each system into estimated probabilities of relevance that are easily combined. Using this combination, much of the MAP lost because of OOV words is recovered. Evaluated on a collection of spontaneous, conversational speech, the system recovers 57.5% of the MAP lost on short (title-only) queries and 41.3% on longer (title plus description) queries.

Item Estimation theory of a location parameter in small samples (2008-04-22)
Yu, Tinghui; Kagan, Abram M; Mathematical Statistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
The topic of this thesis is estimation of a location parameter in small samples. Chapter 1 is an overview of the general theory of statistical estimates of parameters, with special attention to the Fisher information, the Pitman estimator, and their polynomial versions. The new results are in Chapters 2 and 3, where the following inequality is proved for the variance of the Pitman estimator t_n from a sample of size n from a population F(x − θ): nVar(t_n) >= (n+1)Var(t_{n+1}) for any n >= 1, only under the condition of finite second moments (even the absolute continuity of F is not assumed). The result is much stronger than the known Var(t_n) >= Var(t_{n+1}). Among other new results are (i) superadditivity of 1/Var(t_n) with respect to the sample size: 1/Var(t_{m+n}) >= 1/Var(t_m) + 1/Var(t_n), proved as a corollary of a more general result; (ii) superadditivity of Var(t_n) for a fixed n with respect to additive perturbations; (iii) monotonicity of Var(t_n) with respect to the scale parameter of an additive perturbation when the latter belongs to the class of self-decomposable random variables.
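For reference, the two sample-size inequalities stated above can be written in display form. This is a sketch in standard notation, not text taken from the thesis: the representation of t_n through the sample mean and the residuals is the usual form of the Pitman estimator under finite second moments, and the symbols \bar{x}_n and E_0 (expectation under θ = 0) are assumptions introduced here for illustration.

```latex
% Assumed notation: \bar{x}_n is the sample mean of x_1, ..., x_n drawn from F(x - \theta),
% and E_0 denotes expectation when \theta = 0.
\[
  t_n = \bar{x}_n - E_0\!\left(\bar{x}_n \,\middle|\, x_1 - \bar{x}_n, \dots, x_n - \bar{x}_n\right)
\]

% Main inequality, and the step showing that it implies the known monotonicity.
\[
  n \operatorname{Var}(t_n) \ge (n+1) \operatorname{Var}(t_{n+1})
  \quad\Longrightarrow\quad
  \operatorname{Var}(t_{n+1}) \le \frac{n}{n+1} \operatorname{Var}(t_n) \le \operatorname{Var}(t_n),
  \qquad n \ge 1.
\]

% Superadditivity of the reciprocal variance in the sample size.
\[
  \frac{1}{\operatorname{Var}(t_{m+n})} \ge \frac{1}{\operatorname{Var}(t_m)} + \frac{1}{\operatorname{Var}(t_n)}.
\]
```

As a sanity check on the first inequality: for a normal population the Pitman estimator is the sample mean, so Var(t_n) = σ²/n and nVar(t_n) = σ² for every n, which gives equality.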
The technically most difficult result is an inequality for Var(t_n), which is a stronger version of the classical Stam inequality for the Fisher information. As a corollary, an interesting property of the conditional expectation of the sample mean given the residuals is discovered. Some analytical problems arising in connection with the Pitman estimators are studied. Among them, a new version of the Cauchy type functional equation is solved. All results are extended to the case of polynomial Pitman estimators and to the case of multivariate parameters. In Chapter 4 we collect some open problems related to the theory of location parameters.

Item Exploring and Modeling Online Auctions Using Functional Data Analysis (2007-05-08)
Wang, Shanshan; Jank, Wolfgang; Smith, Paul J.; Mathematics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
In recent years, the increasing popularity of eCommerce, and particularly of online auctions, has stirred a great amount of scholarly research, especially in information systems, economics, and marketing, but it has received little or no attention from statistics. eCommerce arrives with enormous amounts of rich and clean data as well as statistical challenges. eCommerce not only creates new data challenges, it also motivates the need for innovative models. While there exist many theories about the economic behavior of participants in market exchanges, many of these theories were developed before the appearance of the World Wide Web and often are not appropriate for explaining modern economic behavior in eCommerce. This calls for new models that describe not only the evolution of a process, but also its dynamics. This research takes a different look at online auctions and proposes to study an auction's price evolution and associated price dynamics from different points of view using functional data analysis techniques.
In this dissertation, we develop novel dynamic modeling procedures applicable to online auctions. First, we develop a dynamic forecasting system to predict the price of an ongoing auction. By dynamic we mean that the model can predict the price of an auction "in progress" and can update its prediction based on newly arriving information. Our dynamic forecasting model accounts for the special features of online auction data by using modern functional data analysis techniques. We also use the functional context to systematically describe the empirical regularities of auction dynamics. Second, we propose a family of differential equation models to capture the dynamics in online auctions. A novel multiple comparisons test is proposed to compare dynamics models of auction sub-populations. We accomplish the modeling task within the framework of principal differential analysis and functional models. Third, we propose Model-based Functional Differential Equation Trees to better incorporate the different characteristics of the auction, item, bidders and seller into the differential equation. We compare this new tree method with trees based on either high-dimensional multivariate responses or functional responses. We apply our methods to a novel set of Harry Potter and Microsoft Xbox data for model validation and comparison of methods.