Computer Science Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/2756


Search Results

Now showing 1 - 10 of 11
  • Efficient Rendering, Display, and Compression Techniques for Virtual and Augmented Reality
    (2024) Jabbireddy, Susmija; Varshney, Amitabh; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Virtual and augmented reality (VR and AR) are bridging the gap between the physical and the virtual worlds. The ultimate goal of VR and AR technology is to present three-dimensional (3D) images at high frame rates for realistic, immersive, and interactive viewing experiences. As the demand for higher resolution in VR and AR devices increases, so do the computational complexity and the data requirements. This burdens critical underlying resources, such as memory, processing time, and energy, which are essential for storing, rendering, processing, and displaying information. To address these challenges, this research explores methods that harness the inherent structure and redundancy present in the data. By focusing on three key areas – rendering, displays, and compression – this dissertation aims to enable efficient AR/VR systems that improve resource utilization without compromising the user experience. First, we focus on developing computationally efficient rendering techniques. We begin by discussing various foveated rendering approaches. With the advent of real-time eye-tracking systems and the increase in resolution and field of view of modern AR and VR headsets, foveated rendering has become crucial for achieving real-time rendering. We review the current state of the field and provide a taxonomy of foveation techniques that can serve as a guide for developing foveated rendering methods. Then, we investigate methods to improve image quality from sparse Monte Carlo samples in volumetric rendering. Monte Carlo path tracing has the potential to create stunning visualizations of volumetric data, but the computational cost of achieving noise-free images is extremely high due to the large number of samples required per pixel. We show how deep-learning-based denoising techniques can be integrated with Monte Carlo volumetric rendering to achieve high-quality images at interactive rates. Next, we present our research on energy-efficient holographic displays. Holographic displays are considered true 3D displays, with the potential to emulate all the depth cues of human vision. The nanophotonic phased array (NPA) is an emerging technology for holographic displays with compact size and very high refresh rates; however, building a large-scale NPA is limited by significant power consumption and circuit complexity. We present algorithms for generating sparse holographic patterns and show that we can produce high-resolution images of high visual quality using as few as 10% of the display elements. Finally, we explore techniques for efficient compression of multi-view images. As the quantity of 3D data being acquired and processed continues to grow, storage demands and data-transmission challenges grow with it. We present a deep-learning-based multi-view image compression framework that integrates a novel view-aware entropy model with recent advances in single-view image compression. By achieving superior compression performance, our approach enables more efficient use of memory when dealing with multi-view data.
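    As a rough illustration of the Monte Carlo-plus-denoising idea discussed above (a minimal sketch with hypothetical names; a Gaussian filter stands in for the learned denoiser, which in the dissertation is a trained network):

      import numpy as np
      from scipy.ndimage import gaussian_filter

      def render_pixel(rng, spp):
          # Stand-in for tracing `spp` paths through a volume: each sample is a
          # noisy radiance estimate; the mean converges as O(1/sqrt(spp)).
          true_radiance = 0.7
          return (true_radiance + rng.normal(0.0, 0.5, size=spp)).mean()

      rng = np.random.default_rng(0)
      H, W, spp = 64, 64, 4  # few samples per pixel: fast but noisy
      noisy = np.array([[render_pixel(rng, spp) for _ in range(W)] for _ in range(H)])

      # A learned denoiser maps the sparse-sample image (plus auxiliary features)
      # to a clean one at interactive rates; the blur below is only a placeholder.
      denoised = gaussian_filter(noisy, sigma=1.5)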
  • Enabling On-body Computing Using a Track-Based Wearable
    (2021) Sathya Sai Kumar, Anup; Peng, Huaishu; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    We are seeing an increasing number of computing devices attached to the body, providing a wealth of data along with additional interaction mechanisms and haptic feedback at these locations. Although this provides more computing locations, the devices remain confined to those particular locations. We believe that relocatable wearables can reduce the number of devices the user has to keep track of, while also providing dynamic data by moving around the body. Some attempts have been made to build relocatable wearables, but these are either too bulky or rely on unreliable and slow locomotion mechanisms. In this thesis, we present Calico, a miniature wearable robot system with fast and precise locomotion for on-body sensing, actuation, and interaction. Calico consists of a two-wheel robot and an on-cloth track system, or "railway," on which the robot travels. The robot packs an IMU, a battery, and an onboard microcontroller that supports wireless BLE communication. The track system has multiple branches that extend to key areas of the human body, allowing the robot to reach the front and back of the torso, the shoulders and arms, and even lower-body areas such as the legs. The track system also includes rotational switches, or "railway turntables," enabling complex routing options where tracks diverge. We introduce the system design of Calico and report a series of technical evaluations of system performance. To illustrate potential use cases, we present a suite of applications, including remote medical use for chest auscultation, body-posture tracking for training and exercise, a wearable walkie-talkie, and a robot companion living on the body.
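    A toy sketch of routing on such a branched track (hypothetical layout and names; the abstract does not give Calico's actual routing code). Segments are graph nodes, turntables join branches, and a shortest path gives the fewest segments to traverse:

      from collections import deque

      # Hypothetical track graph: body-site segments joined by a turntable.
      TRACK = {
          "chest": ["turntable_1"],
          "turntable_1": ["chest", "left_shoulder", "right_shoulder", "back"],
          "left_shoulder": ["turntable_1", "left_arm"],
          "right_shoulder": ["turntable_1", "right_arm"],
          "back": ["turntable_1", "lower_body"],
          "left_arm": ["left_shoulder"],
          "right_arm": ["right_shoulder"],
          "lower_body": ["back"],
      }

      def route(start, goal):
          # Breadth-first search for the fewest-segment path between body sites.
          frontier, seen = deque([[start]]), {start}
          while frontier:
              path = frontier.popleft()
              if path[-1] == goal:
                  return path
              for nxt in TRACK[path[-1]]:
                  if nxt not in seen:
                      seen.add(nxt)
                      frontier.append(path + [nxt])

      print(route("chest", "lower_body"))
      # ['chest', 'turntable_1', 'back', 'lower_body']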
  • Information Olfactation: Theory, Design, and Evaluation
    (2019) Patnaik, Biswaksen; Elmqvist, Niklas; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Olfactory feedback for analytical tasks is a virtually unexplored area, in spite of the advantages it offers for information recall, feature identification, and location detection. Here we introduce the concept of ‘Information Olfactation’ as the fragrant sibling of information visualization and discuss how scent can be used to convey data. Building on a review of the human olfactory system and mirroring common visualization practice, we propose olfactory marks, the substrate in which they exist, and the olfactory channels that are available to designers. To exemplify this idea, we present ‘viScent(1.0)’: a six-scent stereo olfactory display capable of conveying olfactory glyphs of varying temperature and direction, as well as a corresponding software system that integrates the display with a traditional visualization display. We also conduct a comprehensive perceptual experiment on Information Olfactation: the use of olfactory marks and channels to convey data. More specifically, following the example of graphical perception studies, we design an experiment that studies the perceptual accuracy of four "olfactory channels" – scent type, scent intensity, airflow, and temperature – for conveying three different types of data: nominal, ordinal, and quantitative. We also present details of an advanced 24-scent olfactory display, ‘viScent(2.0)’, and the software framework we designed in order to run this experiment. Our results yield a ranking of olfactory channels for each data type that follows similar principles as rankings for visual channels, such as those derived by Mackinlay, Cleveland & McGill, and Bertin.
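    A schematic sketch of the encoding idea, pairing each data type with one of the four olfactory channels studied (an illustrative pairing with made-up ranges and scent names, not the experiment's prescribed mapping):

      def encode_nominal(category, categories, scents=("lemon", "lavender", "peppermint")):
          # Nominal data -> scent type: each category gets a distinct scent.
          return scents[categories.index(category)]

      def encode_ordinal(rank, n_levels):
          # Ordinal data -> scent intensity: higher rank, stronger diffusion (0..1).
          return rank / (n_levels - 1)

      def encode_quantitative(x, lo, hi, channel="airflow"):
          # Quantitative data -> airflow or temperature: linear map onto the
          # channel's physical range (temperature range here is hypothetical).
          t = (x - lo) / (hi - lo)
          return t if channel == "airflow" else 20.0 + 10.0 * t  # degrees C

      print(encode_nominal("B", ["A", "B", "C"]))  # lavender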
  • Towards Robust, Interpretable and Scalable Visual Representations
    (2017) Li, Ang; Davis, Larry S; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Visual representation is one of the central problems in computer vision. The essential problem is to develop a unified representation that effectively encodes both visual appearance and spatial information so that it can be easily applied to various vision applications such as face recognition, image matching, and multimodal image retrieval. Over the history of computer vision research, four major levels of visual representation have emerged: geometric, low-level, mid-level, and high-level. This dissertation comprises four works studying effective visual representations at these four levels, proposing multiple approaches that aim to improve the robustness, interpretability, and scalability of visual representations. Geometric features are effective for matching images under spatial transformations; however, their performance is sensitive to noise. In the first part, we model the uncertainty of a geometric representation based on line segments and equip these features with uncertainty modeling so that they can be robustly applied to image-based geolocation. In the second part, we study the robustness of feature encoding to noisy keypoints. We show that traditional feature encoding is sensitive to background or noisy features, and we propose the Selective Encoding framework, which learns the relevance distribution of each codeword and incorporates that information into the original codebook model. Our approach is more robust to localization errors and uncertainty in active face authentication. The mission of visual understanding is to express and describe image content, which essentially means relating images to human language; that typically involves finding a common representation inferable from both domains of data. In the third part, we propose a framework that extracts a mid-level spatial representation directly from language descriptions and matches such spatial layouts to detected object bounding boxes, retrieving indoor-scene images from user text queries. Modern high-level visual features are typically learned from supervised datasets, whose scalability is largely limited by the need for dedicated human annotation. In the last part, we propose to learn visual representations from large-scale weakly supervised data for a large number of natural-language concepts, i.e., n-gram phrases. We propose the differentiable Jelinek-Mercer smoothing loss and train a deep convolutional neural network on images with associated user comments. We show that the learned model can predict a large number of phrase-based concepts from images, can be effectively applied to image-captioning applications, and transfers well to other visual recognition datasets.
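    The Jelinek-Mercer smoothing behind the proposed loss interpolates a per-image phrase model with a background model; a minimal differentiable sketch (shapes and names are assumptions, not the dissertation's code):

      import numpy as np

      def jm_smoothed_nll(logits, background, targets, lam=0.5):
          # Jelinek-Mercer smoothing: p(w|image) = lam * softmax(logits)[w]
          #                                        + (1 - lam) * p_background(w).
          # Differentiable in `logits`, so it can train a CNN on weak labels.
          z = logits - logits.max(axis=1, keepdims=True)   # stable softmax
          p_model = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
          p = lam * p_model + (1.0 - lam) * background     # smoothed phrase probs
          return -np.log(p[np.arange(len(targets)), targets]).mean()

      rng = np.random.default_rng(0)
      logits = rng.normal(size=(4, 1000))       # 4 images, vocabulary of 1000 n-grams
      background = np.full(1000, 1.0 / 1000)    # corpus-level prior over phrases
      targets = rng.integers(0, 1000, size=4)   # observed n-gram per image
      print(jm_smoothed_nll(logits, background, targets))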
  • Leveraging Structure in Activity Recognition: Context and Spatiotemporal Dynamics
    (2015) Khamis, Sameh; Davis, Larry S; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Activity recognition is one of the fundamental problems of computer vision. An activity recognition system aims to identify the actions of humans from an image or a video. This problem has historically been approached in isolation, typically as one stage of a multi-stage system in which tracking, for instance, is another stage. However, recent work sheds light on how activity recognition is in fact entangled with other fundamental problems in the field. Tracking is one such instance, where the identity of each person is maintained across a video sequence. Scene classification is another, where scene properties are identified from image data. Affordance reasoning is yet another, where the objects in the scene are assigned labels representing the types of actions that can be performed upon them. In this thesis we build a joint formulation for activity recognition, modeling the aforementioned coupled problems as latent variables. Optimizing the objective function of this formulation allows us to recover a more accurate solution to activity recognition and, simultaneously, solutions to problems like tracking and scene classification. We first introduce a model that jointly solves tracking and activity recognition from video. Instead of establishing tracks in a preprocessing step, the model solves a joint optimization problem, recovering actions and identities for every person in a video sequence. We then extend this model to include frame-level cues, where activity labels assigned to people in the same scene are made inter-compatible through a scene-level label. In the second half of the thesis we look at an alternative formulation of the same problem based on probabilistic logic. This new model leverages the same temporal and spatial cues through soft logic rules, and the joint formulation can be solved efficiently, recovering both action labels and tracks. We then introduce another model that reformulates action recognition in the multi-label setting, where each person can be performing more than one action at the same time. In this setting, a joint formulation can solve for all the likely actions of a person through explicit modeling of action-label correlations. Finally, we conclude with a discussion of several challenges and how they can motivate viable future extensions.
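    Schematically, the joint formulations above minimize a single objective over action labels and the latent variables; one plausible form (notation assumed here for illustration, not taken from the thesis) is

      \min_{a,\,t,\,s}\; \sum_i \phi(a_i \mid x_i)
        + \sum_{i<j} \psi_{\mathrm{sp}}(a_i, a_j \mid t)
        + \sum_i \psi_{\mathrm{tmp}}(a_i \mid t_i)
        + \sum_i \chi(a_i, s),

    where a denotes per-person action labels, t the latent tracks, s a scene-level label, the phi terms score per-person image evidence, and the psi and chi terms encode spatial, temporal, and scene compatibility.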
  • Affine Loop Optimization Based on Modulo Unrolling in Chapel
    (2014) Sharma, Aroon; Barua, Rajeev; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This work presents modulo unrolling without unrolling (modulo unrolling WU), a message-aggregation method for parallel loops with affine array accesses in message-passing programs written in Chapel, a Partitioned Global Address Space (PGAS) parallel programming language. Messages incur a non-trivial run-time overhead, a significant component of which is independent of the size of the message; therefore, aggregating messages improves performance. Our optimization is based on a technique known as modulo unrolling, pioneered by Barua [1], whose purpose was to ensure a statically predictable single tile number for each memory reference on tiled architectures such as the MIT Raw Machine [2]. Modulo unrolling WU applies to data distributed in a cyclic or block-cyclic manner. In this thesis, we adapt the modulo unrolling technique to the difficult problem of efficiently compiling PGAS languages to message-passing architectures. When applied to loops over cyclically or block-cyclically distributed data, modulo unrolling WU can decide when to aggregate messages, thereby reducing the overall message count and run time for a particular loop. Compared to other methods, modulo unrolling WU greatly simplifies the complex problem of automatically generating message-passing code, and it yields substantial performance improvements in both run time and communication over the non-optimized Chapel compiler. To implement this optimization in Chapel, we modify the Cyclic distribution module's follower iterator and the Block Cyclic distribution module's leader and follower iterators, rather than creating a traditional compiler transformation. We collected results comparing the performance of Chapel programs optimized with modulo unrolling WU against Chapel programs using the existing Chapel data distributions. Data collected on a ten-locale cluster show that, on average, modulo unrolling WU used with Chapel's Cyclic distribution results in 64 percent fewer messages and a 36 percent decrease in run time for our suite of benchmarks. Similarly, modulo unrolling WU used with Chapel's Block Cyclic distribution results in 72 percent fewer messages and a 53 percent decrease in run time. Finally, the results of three scaling experiments suggest that the greatest improvements from modulo unrolling WU occur when parallel follower-iterator chunks of work contain the greatest number of data elements.
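    A toy sketch of the idea in Python (an illustration, not Chapel's iterator code): with a cyclic distribution over L locales, element i lives on locale i % L, so unrolling a loop by a factor of L makes the owning locale of each unrolled reference statically known and lets remote elements be fetched in one aggregated message per locale instead of one message per element.

      L, N = 4, 16
      owner = lambda i: i % L        # cyclic distribution: element i -> locale i % L

      # Naive loop: for i in range(N): use(A[i])  -> up to N fine-grained messages.
      # Modulo unrolling by L: iteration i = L*k + r always touches locale r, so
      # each residue class can be fetched with a single bulk message.
      aggregated = {r: list(range(r, N, L)) for r in range(L)}
      for r, indices in aggregated.items():
          assert all(owner(i) == r for i in indices)   # locale statically known
          # one message to locale r requesting all of `indices`, then compute
      print(aggregated)   # {0: [0, 4, 8, 12], 1: [1, 5, 9, 13], ...}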
  • Fast Numerical and Machine Learning Algorithms for Spatial Audio Reproduction
    (2014) Luo, Yuancheng; Duraiswami, Ramani; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Audio reproduction technologies have undergone several revolutions, from a purely mechanical, to an electromagnetic, to a digital process. These changes have brought steady improvements in the objective quality of sound capture and playback on increasingly portable devices. However, most mobile playback devices remove important spatial-directional components of externalized sound that are natural to the subjective experience of human hearing. Fortunately, the missing spatial-directional parts can be integrated back into audio through a combination of computational methods and physical knowledge of how sound scatters off the listener's anthropometry in the sound-field. The former employs signal-processing techniques for rendering the sound-field; the latter employs approximations of the sound-field through the measurement of so-called Head-Related Impulse Responses/Transfer Functions (HRIRs/HRTFs). This dissertation develops several numerical and machine learning algorithms for accelerating and personalizing spatial audio reproduction in light of available mobile computing power. First, spatial audio synthesis between a sound source and the sound-field requires fast convolution between the audio stream and the HRIRs. We introduce a novel sparse decomposition algorithm for HRIRs, based on non-negative matrix factorization, that allows for faster time-domain convolution than frequency-domain fast-Fourier-transform variants. Second, the full sound-field over the spherical coordinate domain must be efficiently approximated from a finite collection of HRTFs. We develop a joint spatial-frequency covariance model for Gaussian process regression (GPR) and sparse-GPR methods that support fast interpolation and data fusion of HRTFs across multiple datasets. Third, the direct measurement of HRTFs requires specialized equipment that is unsuited for widespread acquisition. We "bootstrap" the human ability to localize sound in listening tests with Gaussian process active-learning techniques over graphical user interfaces that allow listeners to infer their own HRTFs. Experiments are conducted on publicly available HRTF datasets and with human listeners.
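    A rough sketch of how a non-negative factorization can cheapen HRIR convolution (illustrative shapes and toy data; the dissertation's decomposition and fast time-domain scheme are more involved): if the HRIRs for many directions factor as H ≈ W B with few basis filters, each direction's output becomes a small weighted mix of shared basis convolutions.

      import numpy as np

      rng = np.random.default_rng(0)
      D, T, K = 100, 256, 8          # directions, HRIR taps, basis filters (K << D)
      W = rng.random((D, K))         # assume an NMF H ~= W @ B has been computed:
      B = rng.random((K, T))         # W holds weights, B the non-negative bases

      x = rng.standard_normal(4096)  # incoming audio stream

      # Convolve once per basis filter and reuse across all D directions:
      # K convolutions total instead of one full convolution per direction.
      basis_out = np.stack([np.convolve(x, B[k]) for k in range(K)])
      y_dir42 = W[42] @ basis_out    # rendered output for direction 42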
  • The Image Torque Operator for Mid-Level Vision: Theory and Experiment
    (2012) Nishigaki, Morimichi; Aloimonos, Yiannis; Fermuller, Cornelia; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    A problem central to visual scene understanding and computer vision is to extract semantically meaningful parts of images. A visual scene consists of objects, and the objects and parts of objects are delineated from their surroundings by closed contours. In this thesis, a new bottom-up visual operator, called the Torque operator, which captures the concept of closed contours, is introduced. Its computation is inspired by the mechanical definition of torque, or moment of force, and is applied to image edges. It takes edges as input and computes, over regions of different sizes, a measure of how well the edges are aligned to form a closed, convex contour. The Torque operator is by definition scale-independent and can be seen as a mid-level vision operator that captures the organizational concept of 'closure' and the grouping mechanism of edges. In this thesis, fundamental properties of the torque measure are studied, and experiments are performed to demonstrate and verify that it can be made a useful tool for a variety of applications, including visual attention, segmentation, and boundary edge detection.
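    In the spirit of its mechanical namesake, the torque contributed by an edge point can be written as a cross product of a lever arm and the edge direction; a sketch of the definition (notation assumed here, with details as in the thesis): for a patch A centered at p, each edge point q with unit tangent e_q contributes

      \tau_p(q) = \vec{r}_{pq} \times \hat{e}_q = \lVert \vec{r}_{pq} \rVert \sin\theta_{pq},

    and the patch value aggregates these contributions normalized by the patch area,

      \tau_p = \frac{1}{|A|} \sum_{q \in \mathrm{edges}(A)} \tau_p(q),

    so edges that consistently circulate around p, as on a closed convex contour, yield a large magnitude, while randomly oriented edges cancel.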
  • Enhancing Productivity and Performance Portability of General-Purpose Parallel Programming
    (2012) Tzannes, Alexandros; Barua, Rajeev; Vishkin, Uzi; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This work focuses on compiler and run-time techniques for improving the productivity and performance portability of general-purpose parallel programming. More specifically, we focus on shared-memory task-parallel languages, where the programmer explicitly exposes parallelism in the form of short tasks that may outnumber the cores by orders of magnitude. The compiler, the run-time, and the platform (henceforth the system) are responsible for harnessing this unpredictable amount of parallelism, which can vary from none to excessive, toward efficient execution. The challenge arises from the aspiration to support fine-grained irregular computations and nested parallelism. This work is more ambitious still, aspiring also to lay the foundations for efficiently supporting declarative code, where the programmer exposes all available parallelism using high-level language constructs such as parallel loops, reducers, or futures. The appeal of declarative code for general-purpose programming is twofold: it is often easier for the programmer, who does not have to worry about the granularity of the exposed parallelism, and it achieves better performance portability by avoiding overfitting to the small range of platforms and inputs for which the programmer would otherwise coarsen. Furthermore, PRAM algorithms, an important class of parallel algorithms, naturally lend themselves to declarative programming, so supporting it is a necessary condition for capitalizing on the wealth of PRAM theory. Unfortunately, declarative codes often expose such an overwhelming number of fine-grained tasks that existing systems fail to deliver performance. Our contributions can be partitioned into three components. First, we tackle the issue of coarsening, which declarative code leaves to the system. We identify two goals of coarsening and advocate tackling them separately, using static compiler transformations for one and dynamic run-time approaches for the other. Additionally, we present evidence that the current practice of burdening the programmer with coarsening leads either to codes with poor performance portability or to significantly increased programming effort; this is a "show-stopper" for general-purpose programming. To compare performance portability among approaches, we define an experimental framework and two metrics, and we demonstrate that our approaches are preferable. We close the chapter on coarsening by presenting compiler transformations that automatically coarsen some types of very fine-grained codes. Second, we propose Lazy Scheduling, an innovative run-time scheduling technique that infers the platform load at run time using information that is already maintained. Based on the inferred load, Lazy Scheduling adapts the amount of available parallelism it exposes for parallel execution and thus avoids the parallelism overheads that existing approaches pay. We implement Lazy Scheduling and present experimental results on four different platforms. The results show that Lazy Scheduling is vastly superior for declarative codes and competitive, if not better, for coarsened codes. Moreover, Lazy Scheduling is also superior in terms of performance portability, supporting our thesis that it is possible to achieve reasonable efficiency and performance portability with declarative codes. Finally, we also implement Lazy Scheduling on XMT, an experimental manycore platform developed at the University of Maryland, which was designed to support codes derived from PRAM algorithms.
    On XMT, we manage to harness the existing hardware support for scheduling flat parallelism and compose it with Lazy Scheduling, which supports nested parallelism. In the resulting hybrid scheduler, the hardware and software work in synergy to overcome each other's weaknesses. We show the performance composability of the hardware and software schedulers, both in an abstract cost model and experimentally, as the hybrid always performs better than the software scheduler alone. Furthermore, the cost model is validated by using it to predict whether it is preferable to execute a code sequentially, with outer parallelism, or with nested parallelism, depending on the input, the available hardware parallelism, and the calling context of the parallel code.
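    A minimal sketch of the kind of load test Lazy Scheduling performs (the names and the deque-emptiness heuristic are assumptions for illustration; the thesis specifies the precise policy): a worker exposes a task for parallel execution only when its local state suggests other workers may be idle, and otherwise runs the work inline, avoiding task-creation overhead.

      import collections

      class Worker:
          def __init__(self):
              self.deque = collections.deque()  # already maintained for work stealing

          def spawn(self, task):
              # Lazy decision: unstolen work in our deque implies the system is
              # saturated, so run inline; an empty deque implies thieves may be
              # idle, so expose the task for stealing.
              if self.deque:
                  task()                        # serialize: no task-creation cost
              else:
                  self.deque.append(task)       # make parallelism available

      w = Worker()
      w.spawn(lambda: print("exposed for stealing"))  # deque empty -> queued
      w.spawn(lambda: print("ran inline"))            # deque non-empty -> inline
      while w.deque:                                  # drain the exposed task
          w.deque.popleft()()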
  • Optimizing for a Many-Core Architecture without Compromising Ease-of-Programming
    (2011) Caragea, George Constantin; Vishkin, Uzi; Barua, Rajeev; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Faced with nearly stagnant clock-speed advances, chip manufacturers have turned to parallelism as the source of continuing performance improvements. But even though numerous parallel architectures have already been brought to market, a universally accepted methodology for programming them for general-purpose applications has yet to emerge. Existing solutions tend to be hardware-specific, rendering them difficult to use for the majority of application programmers and domain experts, and offering no scalability guarantees for future generations of the hardware. This dissertation advances the validation of the following thesis: it is possible to develop efficient general-purpose programs for a many-core platform using a model recognized for its simplicity. To prove this thesis, we refer to the eXplicit Multi-Threading (XMT) architecture designed and built at the University of Maryland. XMT is an attempt at re-inventing parallel computing with a solid theoretical foundation and an aggressively scalable design. Algorithmically, XMT is inspired by the PRAM (Parallel Random Access Machine) model, and the architecture design focuses on reducing inter-task communication and synchronization overheads and on providing an easy-to-program parallel model. This thesis builds upon the existing XMT infrastructure to improve support for efficient execution with a focus on ease of programming. Our contributions aim at reducing the programmer's effort in developing XMT applications and improving overall performance. More concretely, we: (1) present a workflow guiding programmers to produce efficient parallel solutions starting from a high-level problem; (2) introduce an analytical performance model for XMT programs and provide a methodology for projecting running time from an implementation; (3) propose and evaluate RAP, an improved resource-aware compiler loop-prefetching algorithm targeted at fine-grained many-core architectures, demonstrating performance improvements of up to 34.79% on average over the GCC loop-prefetching implementation and up to 24.61% on average over a simple hardware prefetching scheme; and (4) implement a number of parallel benchmarks and evaluate the overall performance of XMT relative to existing serial and parallel solutions, showing speedups of up to 13.89x vs. a serial processor and 8.10x vs. parallel code optimized for an existing many-core platform (a GPU). We also discuss the implementation and optimization of the Max-Flow algorithm on XMT, a problem that is among the more advanced in terms of complexity, benchmarking, and research interest in the parallel algorithms community, and we demonstrate better speedups over the best serial solution than previous attempts on other parallel platforms achieved.
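    Analytical models for PRAM-inspired platforms such as XMT typically build on a work-depth bound of the Brent type; schematically (not the thesis's exact model, which accounts for XMT-specific overheads),

      T_p \lesssim \frac{W}{p} + D,

    where W is the total work of the parallel program, D its depth (critical path), and p the number of hardware thread contexts; projecting running time then amounts to estimating W and D from the implementation and adding measured per-construct overheads.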