Theses and Dissertations from UMD
Permanent URI for this community: http://hdl.handle.net/1903/2
New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date, so there may be up to a four-month delay before a given thesis/dissertation appears in DRUM.
More information is available at Theses and Dissertations at University of Maryland Libraries.
Item: Evaluating Machine Intelligence with Question Answering (2021)
Rodriguez, Pedro; Boyd-Graber, Jordan; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Humans ask questions to learn about the world and to test knowledge understanding. The ability to ask questions combines aspects of intelligence unique to humans: language understanding, knowledge representation, and reasoning. Thus, building systems capable of intelligent question answering (QA) is a grand goal of natural language processing (NLP). To measure progress in NLP, we create "exams" for computer systems and compare their effectiveness against a reference point, often based on humans. How precisely we measure progress depends on whether we are building computer systems that optimize human satisfaction in information-seeking tasks or that measure progress toward intelligent QA. In the first part of this dissertation, we explore each goal in turn, describe how they differ, and relate each to QA formats. As an example of an information-seeking evaluation, we introduce a new dialog QA task paired with a new evaluation method. Afterward, we turn our attention to using QA to evaluate machine intelligence.

A good evaluation should be able to discriminate between less and more capable QA models. This dissertation explores three ways to improve the discriminative power of QA evaluations: (1) dynamic weighting of test questions, (2) a format that by construction tests multiple levels of knowledge, and (3) evaluation data created through human-computer collaboration. By dynamically weighting test questions, we challenge a foundational assumption of the de facto standard in QA evaluation: the leaderboard. Namely, we contend that, contrary to nearly all QA and NLP evaluations, which implicitly assign equal weight to examples by averaging scores, examples are not equally useful for estimating machine (or human) QA ability.
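The unequal usefulness of test questions can be illustrated with a toy sketch of the two-parameter logistic (2PL) model from Item Response Theory, which the abstract proposes for scoring; the function name and parameter values here are hypothetical, chosen only to show why two items of equal average difficulty can differ sharply in how well they separate abilities:

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: P(correct) for a subject of ability
    theta on an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Two items with the same difficulty (b = 0) but different discrimination.
# Compare how much each separates two subjects of ability +0.5 and -0.5.
sharp_item = p_correct(0.5, a=2.0, b=0.0) - p_correct(-0.5, a=2.0, b=0.0)
flat_item = p_correct(0.5, a=0.2, b=0.0) - p_correct(-0.5, a=0.2, b=0.0)

# The high-discrimination item separates the two abilities far better,
# so an IRT-weighted leaderboard would rely on it more.
print(sharp_item > flat_item)  # True
```

A leaderboard that averages scores treats both items identically; an IRT-based score effectively upweights the discriminative one.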
As any student can tell you, not all questions on an exam are equally difficult, and in the worst case some are unsolvable. Drawing on decades of research in educational testing, we propose adopting an alternative evaluation methodology, Item Response Theory, that is widely used to score human exams (e.g., the SAT). We show that dynamically weighting questions improves the reliability of leaderboards in discriminating between models of differing QA ability, while also helping in the construction of new evaluation datasets. Having improved the scoring of models, we next turn to improving the format and data in QA evaluations. Our idea is simple: in most QA tasks (e.g., Jeopardy!), each question tests a single level of knowledge; in our task (the trivia game Quizbowl), each question tests multiple levels of knowledge. Because each question tests multiple levels of knowledge, it is less likely that we learn nothing about the difference between two models (i.e., that both are correct or both are wrong), which substantially increases discriminative power. Despite the improved format, we next show that while our QA models defeat accomplished trivia players, they are overly reliant on brittle pattern matching, which indicates a failure to intelligently answer questions. To mitigate this problem, we introduce a new framework for building evaluation data in which humans and machines cooperatively craft trivia questions that are difficult to answer through clever pattern-matching tricks alone, while being no harder for humans. We conclude by sketching a broader vision for QA evaluation that combines the three components of evaluation we improve (scoring, format, and data) to create living evaluations and re-imagine the role of leaderboards.

Item: Radio Analytics for Human Computer Interaction (2021)
Regani, Sai Deepika; Liu, K.J. Ray; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

WiFi, as we know it, is no longer a mere means of communication. Recent advances in research and industry have unleashed the sensing potential of wireless signals. With the constantly expanding availability of the radio frequency spectrum for WiFi, we now envision a future where wireless communication and sensing systems co-exist and continue to facilitate human lives. Radio signals are currently being used to "sense" or monitor various human activities and vital signs. As Human-Computer Interaction (HCI) continues to form a considerable part of daily activities, it is interesting to investigate the potential of wireless sensing in designing practical HCI applications. This dissertation aims to study and design three HCI applications that leverage radio signals: (i) in-car driver authentication, (ii) device-free gesture recognition through the wall, and (iii) handwriting tracking.

In the first part of this dissertation, we introduce the idea of in-car driver authentication using wireless sensing and develop a system that can recognize drivers automatically. The proposed system recognizes humans by identifying the unique radio biometric information embedded in the wireless channel state information (CSI) through multipath propagation. However, because environmental information is also captured in the CSI, radio biometric recognition performance may be degraded by a changing physical environment. To this end, we address the problem of "in-car changing environments," where existing wireless sensing-based human identification systems fail. We build a long-term driver radio biometric database consisting of radio biometrics of multiple people collected over two months. Machine learning (ML) models built on this database make the proposed system adaptive to new in-car environments.
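The core matching idea behind CSI-based driver recognition can be sketched as a nearest-template search: compare a fresh CSI snapshot against enrolled radio biometric templates and pick the closest driver. This is a minimal toy sketch with synthetic data and hypothetical names; the actual system uses richer ML models trained on a long-term database and exploits multi-antenna and frequency diversity.

```python
import numpy as np

def cosine_sim(x, y):
    """Magnitude of the normalized inner product between two complex CSI vectors."""
    return float(np.abs(np.vdot(x, y)) / (np.linalg.norm(x) * np.linalg.norm(y)))

rng = np.random.default_rng(0)

# Enrolled templates: one complex CSI vector per driver over 64 subcarriers (toy data).
templates = {
    "driver_a": rng.standard_normal(64) + 1j * rng.standard_normal(64),
    "driver_b": rng.standard_normal(64) + 1j * rng.standard_normal(64),
}

# A new measurement: driver_a's CSI plus a small environmental perturbation.
probe = templates["driver_a"] + 0.1 * (rng.standard_normal(64) + 1j * rng.standard_normal(64))

# Identify the driver whose template is most similar to the probe.
best = max(templates, key=lambda d: cosine_sim(probe, templates[d]))
print(best)  # driver_a
```

The "in-car changing environments" problem arises because the perturbation term is not always small: a changed cabin (passengers, seat positions) shifts the CSI, which is why the dissertation trains adaptive ML models rather than relying on a fixed similarity threshold.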
The performance of the in-car driver authentication system is shown to improve when multi-antenna and frequency diversities are exploited. Long-term experiments demonstrate the feasibility and accuracy of the proposed system: in the two-driver scenario, accuracy reaches up to 99.13% in the best case, compared to 87.7% achieved by previous work.

In the second part, we propose GWrite, a device-free gesture recognition system that works in through-the-wall scenarios. The sequence of physical perturbations induced by hand movement influences the multipath propagation and is reflected in the CSI time series corresponding to the gesture. Leveraging the statistical properties of EM wave propagation, we derive a relationship between the similarity of CSIs within the time series and the relative distance moved by the hand. Feature extraction modules built on this relation extract features characteristic of the gesture shapes. We built a prototype of GWrite on commercial WiFi devices and achieved a classification accuracy of 90.1% on a set of 15 gesture shapes drawn from the uppercase English alphabet. We demonstrate that a broader set of gestures can be defined and classified using GWrite, in contrast to existing systems that operate over a limited gesture set.

In the final part of this dissertation, we present mmWrite, the first high-precision passive handwriting tracking system using a single commodity millimeter wave (mmWave) radio. Leveraging the short wavelength and large bandwidth of 60 GHz signals and the radar-like capabilities enabled by the large phased array, mmWrite transforms any flat region into an interactive writing surface that supports handwriting tracking at millimeter accuracy. mmWrite employs an end-to-end signal processing pipeline to enhance the range and spatial resolution limited by the hardware, boost the coverage, and suppress interference from backgrounds and irrelevant objects.
Experiments using a commodity 60 GHz device show that mmWrite can track a finger or pen with a median error of 2.8 mm close to the device, and thus can reproduce handwritten characters as small as 1 cm x 1 cm, with coverage of up to 8 m^2. With minimal infrastructure needed, mmWrite promises ubiquitous handwriting tracking for new applications in HCI.