Computer Science Theses and Dissertations
Permanent URI for this collection: http://hdl.handle.net/1903/2756
Search Results (2 items)
Item: AI Empowered Music Education (2024)
Shrestha, Snehesh; Aloimonos, Yiannis; Fermüller, Cornelia; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Learning a musical instrument is a complex process involving years of practice and feedback. However, dropout rates in music programs, particularly among violin students, remain high due to socio-economic barriers and the difficulty of mastering the instrument. This work explores the feasibility of accelerating learning and leveraging technology in music education, focusing on bowed string instruments, specifically the violin. My research identifies workflow gaps and challenges for the stakeholders, aiming not only to improve learning outcomes but also to provide opportunities for socioeconomically challenged students. Three key areas are emphasized: designing user studies and creating a comprehensive violin dataset, developing tools and deep learning algorithms for accurate performance assessment, and crafting a practice platform for student feedback.

Three fundamental perspectives were essential: a) understanding the stakeholders and their specific challenges, b) understanding how the instrument operates and what actions the player must master to control its functions, and c) addressing the technical challenges of building and deploying detection and feedback systems.

Existing datasets were inadequate for analyzing violin playing, primarily because they lacked diversity in body types and skill levels, as well as well-synchronized and calibrated video data with corresponding ground-truth 3D poses and musical events. Our experiment design ensured that the collected data would be suitable for subsequent downstream tasks; these considerations largely determined the metrics used to evaluate the accuracy of the data and the success metrics for those tasks.

At the foundation of movement analysis lies 3D human pose estimation. Unfortunately, current state-of-the-art algorithms struggle to estimate monocular 3D poses accurately during instrument playing, owing to factors such as occlusions, partial views, human-object interactions, limited viewing angles, pixel density, and camera sampling rates. To address these issues, we developed a novel 3D pose estimation algorithm based on the insight that the music produced by the violin is a direct result of the corresponding motions. Our algorithm integrates visual observations with audio inputs to generate precise, high-resolution 3D pose estimates that are temporally consistent and suitable for downstream tasks.

Providing effective feedback to learners is a nuanced process that requires balancing encouragement with challenge. Without a user-friendly interface and a motivational strategy, feedback risks being counterproductive. While current systems excel at detecting pitch and temporal misalignments and displaying them visually for analysis, they often overwhelm players. In this dissertation, we introduce two novel feedback systems. The first is a visual-haptic feedback system that overlays simple augmented cues on the user's body, gently guiding the player back to the correct posture. The second is a haptic band synchronized with the music, enhancing students' perception of rhythmic timing and bowing intensities.
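As a rough illustration of the audio-visual fusion idea described above, the sketch below uses audio note onsets as temporal anchors when smoothing noisy per-frame 3D pose estimates, on the assumption that bow-direction changes coincide with onsets and should not be smoothed away. The function names, the moving-average smoother, and the blending rule are all hypothetical; the abstract does not detail the dissertation's actual algorithm.

```python
# Hypothetical sketch: refine noisy monocular 3D pose estimates using
# audio note onsets as temporal anchors. Names and the fusion rule are
# illustrative assumptions, not the dissertation's actual method.
import numpy as np

def smooth_poses(poses: np.ndarray, window: int = 5) -> np.ndarray:
    """Moving-average smoothing of per-frame joint positions (T x J x 3)."""
    kernel = np.ones(window) / window
    out = np.empty_like(poses)
    for j in range(poses.shape[1]):
        for c in range(3):
            out[:, j, c] = np.convolve(poses[:, j, c], kernel, mode="same")
    return out

def fuse_audio_visual(poses: np.ndarray, onset_frames: np.ndarray,
                      onset_weight: float = 0.8) -> np.ndarray:
    """Blend smoothed poses back toward the raw visual detections at
    audio onsets, so rapid motion at note boundaries is preserved."""
    fused = smooth_poses(poses)
    for t in onset_frames:
        fused[t] = onset_weight * poses[t] + (1 - onset_weight) * fused[t]
    return fused

# Toy usage: 100 frames, 17 joints, onsets detected from the audio track.
rng = np.random.default_rng(0)
noisy_poses = rng.normal(size=(100, 17, 3))
onsets = np.array([10, 35, 60, 85])
refined = fuse_audio_visual(noisy_poses, onsets)
print(refined.shape)  # (100, 17, 3)
```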
Additionally, we developed an intuitive user interface for real-time feedback during practice sessions and performance reviews. These data can be shared with teachers, giving them deeper insight into students' struggles and a way to track progress. This research aims to empower both students and teachers. By providing students with feedback during individual practice sessions, and by equipping teachers with tools to monitor and tailor AI interventions according to their preferences, this work serves as a valuable teaching assistant. By taking on tasks that teachers may prefer not to perform or cannot perform in person, such as personalized feedback and progress tracking, this research endeavors to democratize access to high-quality music education and mitigate dropout rates in music programs.

Item: Towards Multimodal and Context-Aware Emotion Perception (2023)
Mittal, Trisha; Manocha, Dinesh; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Human emotion perception is part of affective computing, a branch of computing that studies and develops systems and devices that can recognize, interpret, process, and simulate human affects. Research in human emotion perception, however, has been mostly restricted to psychology literature that explores the theoretical aspects of emotion perception but does not touch upon its practical applications. In practice, human emotion perception plays a pivotal role in an extensive array of intelligent systems, spanning domains such as behavior prediction, social robotics, medicine, surveillance, and entertainment. To deploy emotion perception in these applications, systems must reflect how humans actually perceive emotion: extensive research in psychology has demonstrated that humans not only perceive emotions and behavior through diverse human modalities but also glean insights from situational and contextual cues.

This dissertation enhances the capabilities of existing human emotion perception systems and forges novel connections between emotion perception and multimedia analysis, social media analysis, and multimedia forensics. Specifically, this work introduces two novel algorithms for constructing human emotion perception models. These algorithms are then applied to detect falsified multimedia, understand human behavior and psychology on social media networks, and extract the intricate array of emotions evoked by movies.

In the first part of this dissertation, we present two approaches to advancing emotion perception models. The first capitalizes on multiple modalities to perceive human emotion. The second leverages contextual information, such as the background scene, the diverse modalities of the human subject, and socio-dynamic inter-agent interactions. These elements converge to predict perceived emotions with better accuracy, culminating in context-aware human emotion perception models.

In the second part of this dissertation, we forge connections between emotion perception and three prominent domains of artificial intelligence applications: video manipulation and deepfake detection, multimedia content analysis, and user behavior analysis on social media platforms. Drawing inspiration from emotion perception, we conceptualize enriched solutions that push conventional boundaries and redefine the possibilities within these domains.
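To make the two modeling ideas above concrete, here is a minimal, hypothetical sketch of context-aware multimodal fusion: per-modality encoders (for example, face and speech features) are combined under weights gated by a context encoding (for example, background-scene features). The layer sizes, modality choices, and gating scheme are illustrative assumptions, not the models proposed in the dissertation.

```python
# Hypothetical sketch of context-aware multimodal fusion for
# perceived-emotion prediction. Architecture details are illustrative
# assumptions, not the dissertation's actual models.
import torch
import torch.nn as nn

class ContextAwareEmotionModel(nn.Module):
    def __init__(self, dims: dict[str, int], ctx_dim: int, n_emotions: int):
        super().__init__()
        # One small encoder per human-centric modality (e.g. face, speech).
        self.encoders = nn.ModuleDict(
            {m: nn.Sequential(nn.Linear(d, 64), nn.ReLU()) for m, d in dims.items()}
        )
        # Context encoder (e.g. background-scene features) gates each modality.
        self.ctx = nn.Sequential(nn.Linear(ctx_dim, 64), nn.ReLU())
        self.gate = nn.Linear(64, len(dims))
        self.head = nn.Linear(64, n_emotions)

    def forward(self, inputs: dict[str, torch.Tensor], context: torch.Tensor):
        feats = torch.stack([self.encoders[m](x) for m, x in inputs.items()], dim=1)
        ctx = self.ctx(context)                        # (B, 64)
        weights = torch.softmax(self.gate(ctx), dim=-1)  # per-modality weights
        fused = (weights.unsqueeze(-1) * feats).sum(dim=1) + ctx
        return self.head(fused)                       # emotion logits

# Toy usage with random face/speech features and scene-context features.
model = ContextAwareEmotionModel({"face": 128, "speech": 40}, ctx_dim=512, n_emotions=6)
logits = model({"face": torch.randn(2, 128), "speech": torch.randn(2, 40)},
               torch.randn(2, 512))
print(logits.shape)  # torch.Size([2, 6])
```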
The experiments in this dissertation have been conducted on state-of-the-art emotion perception datasets, including IEMOCAP, CMU-MOSEI, EMOTIC, SENDv1, MovieGraphs, LIRIS-ACCEDE, DF-TIMIT, DFDC, Intentonomy, MDID, and MET-Meme. We also contribute three datasets to this list: GroupWalk, VideoSham, and IntentGram. In addition to quantitative results validating our claims, we conduct user evaluations where applicable, which further corroborate our experimental findings.