Computer Science Theses and Dissertations
Permanent URI for this collection: http://hdl.handle.net/1903/2756
4 results
Search Results
Item Towards Multimodal and Context-Aware Emotion Perception (2023) Mittal, Trisha; Manocha, Dinesh Dr.; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Human emotion perception is a part of affective computing, a branch of computing that studies and develops systems and devices that can recognize, interpret, process, and simulate human affects. Research in human emotion perception, however, has been mostly restricted to psychology-based literature which explores the theoretical aspects of emotion perception, but does not touch upon its practical applications. For instance, human emotion perception plays a pivotal role in an extensive array of sophisticated intelligent systems, encompassing domains such as behavior prediction, social robotics, medicine, surveillance, and entertainment. In order to deploy emotion perception in these applications, extensive research in psychology has demonstrated that humans not only perceive emotions and behavior through diverse human modalities but also glean insights from situational and contextual cues. This dissertation not only enhances the capabilities of existing human emotion perception systems but also forges novel connections between emotion perception and multimedia analysis, social media analysis, and multimedia forensics. Specifically, this work introduces two innovative algorithms that revolutionize the construction of human emotion perception models. These algorithms are then applied to detect falsified multimedia, understand human behavior and psychology on social media networks, and extract the intricate array of emotions evoked by movies. In the first part of this dissertation, we delve into two unique approaches to advance emotion perception models. The first approach capitalizes on the power of multiple modalities to perceive human emotion.
The second approach leverages contextual information, such as the background scene, diverse modalities of the human subject, and intricate socio-dynamic inter-agent interactions. These elements converge to predict perceived emotions with better accuracy, culminating in the development of context-aware human emotion perception models. In the second part of this thesis, we forge connections between emotion perception and three prominent domains of artificial intelligence applications. These domains include video manipulations and deepfake detection, multimedia content analysis, and user behavior analysis on social media platforms. Drawing inspiration from emotion perception, we conceptualize enriched solutions that push the conventional boundaries and redefine the possibilities within these domains. Experiments in this dissertation were conducted on state-of-the-art emotion perception datasets, including IEMOCAP, CMU-MOSEI, EMOTIC, SENDv1, MovieGraphs, LIRIS-ACCEDE, DF-TIMIT, DFDC, Intentonomy, MDID, and MET-Meme. We also contribute three new datasets: GroupWalk, VideoSham, and IntentGram. In addition to providing quantitative results to validate our claims, we conduct user evaluations where applicable.

Item Fusing Multimedia Data Into Dynamic Virtual Environments (2018) Du, Ruofei; Varshney, Amitabh; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
In spite of the dramatic growth of virtual and augmented reality (VR and AR) technology, content creation for immersive and dynamic virtual environments remains a significant challenge. In this dissertation, we present our research in fusing multimedia data, including text, photos, panoramas, and multi-view videos, to create rich and compelling virtual environments.
First, we present Social Street View, which renders geo-tagged social media in its natural geo-spatial context provided by 360° panoramas. Our system takes into account visual saliency and uses maximal Poisson-disc placement with spatiotemporal filters to render social multimedia in an immersive setting. We also present a novel GPU-driven pipeline for saliency computation in 360° panoramas using spherical harmonics (SH). Our spherical residual model can be applied to virtual cinematography in 360° videos. We further present Geollery, a mixed-reality platform to render an interactive mirrored world in real time with three-dimensional (3D) buildings, user-generated content, and geo-tagged social media. Our user study has identified several use cases for these systems, including immersive social storytelling, experiencing the culture, and crowd-sourced tourism. We next present Video Fields, a web-based interactive system to create, calibrate, and render dynamic videos overlaid on 3D scenes. Our system renders dynamic entities from multiple videos, using early and deferred texture sampling. Video Fields can be used for immersive surveillance in virtual environments. Furthermore, we present the VRSurus and ARCrypt projects to explore the applications of gesture recognition, haptic feedback, and visual cryptography for virtual and augmented reality. Finally, we present our work on Montage4D, a real-time system for seamlessly fusing multi-view video textures with dynamic meshes. We use geodesics on meshes with view-dependent rendering to mitigate spatial occlusion seams while maintaining temporal consistency. Our experiments show significant enhancement in rendering quality, especially for salient regions such as faces.
We believe that Social Street View, Geollery, Video Fields, and Montage4D will greatly facilitate several applications such as virtual tourism, immersive telepresence, and remote education.

Item Statistical Methods for Analyzing Time Series Data Drawn from Complex Social Systems (2015) Darmon, David; Girvan, Michelle; Rand, William; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
The rise of human interaction in digital environments has led to an abundance of behavioral traces. These traces allow for model-based investigation of human-human and human-machine interaction "in the wild." Stochastic models allow us to both predict and understand human behavior. In this thesis, we present statistical procedures for learning such models from the behavioral traces left in digital environments. First, we develop a non-parametric method for smoothing time series data corrupted by serially correlated noise. The method determines the simplest smoothing of the data that simultaneously gives the simplest residuals, where simplicity of the residuals is measured by their statistical complexity. We find that complexity regularized regression outperforms generalized cross validation in the presence of serially correlated noise. Next, we cast the task of modeling individual-level user behavior on social media into a predictive framework. We demonstrate the performance of two contrasting approaches, computational mechanics and echo state networks, on a heterogeneous data set drawn from user behavior on Twitter. We demonstrate that the behavior of users can be well-modeled as processes with self-feedback. We find that the two modeling approaches perform very similarly for most users, but that users where the two methods differ in performance highlight the challenges faced in applying predictive models to dynamic social data.
We then expand the predictive problem of the previous work to modeling the aggregate behavior of large collections of users. We use three models, corresponding to seasonal, aggregate autoregressive, and aggregation-of-individual approaches, and find that the performance of the methods at predicting times of high activity depends strongly on the tradeoff between true and false positives, with no method dominating. Our results highlight the challenges and opportunities involved in modeling complex social systems, and demonstrate how influencers interested in forecasting potential user engagement can use complexity modeling to make better decisions. Finally, we turn from a predictive to a descriptive framework, and investigate how well user behavior can be attributed to time of day, self-memory, and social inputs. The models allow us to describe how a user processes their past behavior and their social inputs. We find that despite the diversity of observed user behavior, most models inferred fall into a small subclass of all possible finitary processes. Thus, our work demonstrates that user behavior, while quite complex, belies simple underlying computational structures.

Item PREDICTION IN SOCIAL MEDIA FOR MONITORING AND RECOMMENDATION (2012) Wu, Shanchan; Raschid, Louiqa; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Social media including blogs and microblogs provide a rich window into user online activity. Monitoring social media datasets can be expensive due to the scale and inherent noise in such data streams. Monitoring and prediction can provide significant benefit for many applications including brand monitoring and making recommendations. Consider a focal topic and posts on multiple blog channels on this topic. Being able to target a few potentially influential blog channels which will contain relevant posts is valuable.
Once these channels have been identified, a user can proactively join the conversation themselves to encourage positive word-of-mouth and to mitigate negative word-of-mouth. Links between different blog channels, and retweets and mentions between different microblog users, are a proxy for information flow and influence. When trying to monitor where information will flow and who will be influenced by a focal user, it is valuable to predict future links, retweets, and mentions. Predictions of users who will post on a focal topic or who will be influenced by a focal user can yield valuable recommendations. In this thesis we address the problem of prediction in social media to select social media channels for monitoring and recommendation. Our analysis focuses on individual authors and linkers. We address a series of prediction problems, including the future author prediction problem and the future link prediction problem in the blogosphere, as well as prediction in microblogs such as Twitter. For future author prediction in the blogosphere, where there are network properties and content properties, we develop prediction methods inspired by information retrieval approaches that use historical posts in the blog channel for prediction. We also train a ranking support vector machine (SVM) to solve the problem, considering both network properties and content properties. We identify a number of features that affect prediction accuracy. For future link prediction in the blogosphere, we compare multiple link prediction methods, and show that our proposed solution, which combines the network properties of the blog with content properties, does better than methods which examine network properties or content properties in isolation. Most of the previous work has only looked at either one or the other.
For prediction in microblogs, where there are follower, retweet, and mention networks, we propose a prediction model that uses the hybrid network. In this model, we define a potential function that reflects the likelihood of a candidate user having a specific type of link to a focal user in the future, and we formulate an optimization problem based on the principle of maximum likelihood to determine the parameters of the model. We propose different approximate approaches based on the prediction model. Our approaches outperform baseline methods that consider only one network or combine hybrid networks in a naive way. The prediction model can be applied to other similar problems where hybrid networks exist.
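
As a rough illustration of the maximum-likelihood idea in the abstract above, the sketch below fits a potential function over features drawn from three networks. The abstract does not give the form of the potential function, so everything here is an assumption made for illustration: the logistic form, the binary feature encoding (follows, retweeted, mentioned), the gradient-ascent fit, the toy data, and all function and variable names. It is not the thesis's actual model.

```python
import math


def potential(weights, bias, features):
    """Assumed logistic potential: a score in (0, 1) that a candidate links to the focal user."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def log_likelihood(weights, bias, examples):
    """Bernoulli log-likelihood of the observed link outcomes."""
    total = 0.0
    for features, label in examples:
        p = min(max(potential(weights, bias, features), 1e-12), 1.0 - 1e-12)
        total += math.log(p) if label else math.log(1.0 - p)
    return total


def fit(examples, steps=500, learning_rate=0.1):
    """Maximize the log-likelihood with plain gradient ascent."""
    n = len(examples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n, 0.0
        for features, label in examples:
            error = label - potential(weights, bias, features)
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w + learning_rate * g / len(examples)
                   for w, g in zip(weights, grad_w)]
        bias += learning_rate * grad_b / len(examples)
    return weights, bias


# Made-up training data: (follows, retweeted, mentioned) -> did a link form later?
EXAMPLES = [
    ((1, 1, 1), 1), ((1, 1, 0), 1), ((1, 0, 1), 1), ((0, 1, 0), 1),
    ((1, 0, 0), 0), ((0, 0, 1), 0), ((0, 0, 0), 0), ((0, 0, 0), 0),
    ((1, 0, 0), 1), ((0, 1, 1), 1),
]

initial_ll = log_likelihood([0.0, 0.0, 0.0], 0.0, EXAMPLES)
weights, bias = fit(EXAMPLES)
fitted_ll = log_likelihood(weights, bias, EXAMPLES)

# Rank hypothetical candidate users by their fitted potential.
candidates = {"alice": (1, 1, 1), "bob": (1, 0, 0), "carol": (0, 0, 0)}
scores = {name: potential(weights, bias, f) for name, f in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Combining the three networks in one feature vector lets a single set of parameters weigh each link type against the others, which is one simple way to read "hybrid network"; a single-network baseline would correspond to keeping only one feature.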