Theses and Dissertations from UMD

Permanent URI for this community: http://hdl.handle.net/1903/2

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be a delay of up to four months before a given thesis/dissertation appears in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Search Results

Now showing 1 - 8 of 8
  • Item
    The First Principles of Deep Learning and Compression
    (2022) Ehrlich, Max Donohue; Shrivastava, Abhinav; Davis, Larry S; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The deep learning revolution incited by the 2012 AlexNet paper has been transformative for the field of computer vision. Many problems which were severely limited using classical solutions are now seeing unprecedented success. The rapid proliferation of deep learning methods has led to a sharp increase in their use in consumer and embedded applications. A consequence of these consumer and embedded applications is lossy multimedia compression, which is required to engineer the efficient storage and transmission of data in real-world scenarios. As such, there has been increased interest in a deep learning solution for multimedia compression which would allow for higher compression ratios and increased visual quality. The deep learning approach to multimedia compression, so-called Learned Multimedia Compression, involves computing a compressed representation of an image or video using a deep network for the encoder and the decoder. While these techniques have enjoyed impressive academic success, their industry adoption has been essentially non-existent. Classical compression techniques like JPEG and MPEG are too entrenched in modern computing to be easily replaced. This dissertation takes an orthogonal approach and leverages deep learning to improve the compression fidelity of these classical algorithms. This allows the incredible advances in deep learning to be used for multimedia compression without threatening the ubiquity of the classical methods. The key insight of this work is that methods which are motivated by first principles, i.e., the underlying engineering decisions that were made when the compression algorithms were developed, are more effective than general methods. By encoding prior knowledge into the design of the algorithm, the flexibility, performance, and/or accuracy are improved at the cost of generality. While this dissertation focuses on compression, the high-level idea can be applied to many different problems with success.
Four completed works in this area are reviewed. The first work, which is foundational, unifies the disjoint mathematical theories of compression and deep learning allowing deep networks to operate on compressed data directly. The second work shows how deep learning can be used to correct information loss in JPEG compression over a wide range of compression quality, a problem that is not readily solvable without a first principles approach. This allows images to be encoded at high compression ratios while still maintaining visual fidelity. The third work examines how deep learning based inferencing tasks, like classification, detection, and segmentation, behave in the presence of classical compression and how to mitigate performance loss. As in the previous work, this allows images to be compressed further but this time without accuracy loss on downstream learning tasks. Finally, these ideas are extended to video compression by developing an algorithm to correct video compression artifacts. By incorporating bitstream metadata and mimicking the decoding process with deep learning, the method produces more accurate results with higher throughput than general methods. This allows deep learning to improve the rate-distortion of classical MPEG codecs and competes with fully deep learning based codecs but with a much lower barrier-to-entry.
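The first-principles framing rests on the block-DCT-plus-quantization structure of JPEG. As a rough illustration of where the information loss that the second work corrects comes from, here is a minimal NumPy sketch of one 8x8 JPEG round trip (the function names are my own; the quantization table is the standard JPEG luminance table at roughly quality 50):

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis: D @ x applies the 1-D DCT to a length-n vector.
    k = np.arange(n)
    D = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    D[0] /= np.sqrt(2)
    return D * np.sqrt(2.0 / n)

# Standard JPEG luminance quantization table (approximately quality 50).
Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=float)

def jpeg_round_trip(block, D, Q):
    # Forward 2-D DCT, quantize (the lossy step), dequantize, inverse DCT.
    coeffs = D @ (block - 128.0) @ D.T
    quantized = np.round(coeffs / Q)
    return D.T @ (quantized * Q) @ D + 128.0
```

The rounding step is the sole source of loss; a restoration network that knows the quantization table therefore knows exact bounds on each coefficient, which is the kind of prior a first-principles method can exploit.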
  • Item
    Visualizing Transmedia Networks: Links, Paths and Peripheries
    (2012) Ruppel, Marc; Kirschenbaum, Matthew G.; English Language and Literature; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    "Visualizing Transmedia Networks: Links, Paths and Peripheries" examines the increasingly complex rhetorical intersections between narrative and media ("old" and "new") in the creation of transmedia fictions, loosely defined as multisensory and multimodal stories told extensively across a diverse media set. In order to locate the "language" of transmedia expressions, this project calls attention to the formally locatable network structures placed by transmedia producers in disparate media like film, the print novel and video games. Using network visualization software and computational metrics, these structures can be used as data to graph these fictions for both quantitative and qualitative analysis. This study also, however, examines the limits to this approach, arguing that the process of transremediation, where redundancy and multiformity take precedence over networked connection, forms a second axis for understanding transmedia practices, one equally bound to the formation of new modes of meaning and literacy.
  • Item
    Digital Multimedia Forensics and Anti-Forensics
    (2012) Stamm, Matthew Christopher; Liu, K. J. Ray; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    As the use of digital multimedia content such as images and video has increased, so have the means and the incentive to create digital forgeries. Presently, powerful editing software allows forgers to create perceptually convincing digital forgeries. Accordingly, there is a great need for techniques capable of authenticating digital multimedia content. In response to this, researchers have begun developing digital forensic techniques capable of identifying digital forgeries. These forensic techniques operate by detecting imperceptible traces left by editing operations in digital multimedia content. In this dissertation, we propose several new digital forensic techniques to detect evidence of editing in digital multimedia content. We begin by identifying the fingerprints left by pixel value mappings and show how these can be used to detect the use of contrast enhancement in images. We use these fingerprints to perform a number of additional forensic tasks such as identifying cut-and-paste forgeries, detecting the addition of noise to previously JPEG compressed images, and estimating the contrast enhancement mapping used to alter an image. Additionally, we consider the problem of multimedia security from the forger's point of view. We demonstrate that an intelligent forger can design anti-forensic operations to hide editing fingerprints and fool forensic techniques. We propose an anti-forensic technique to remove compression fingerprints from digital images and show that this technique can be used to fool several state-of-the-art forensic algorithms. We examine the problem of detecting frame deletion in digital video and develop both a technique to detect frame deletion and an anti-forensic technique to hide frame deletion fingerprints. We show that this anti-forensic operation leaves behind fingerprints of its own and propose a technique to detect the use of frame deletion anti-forensics.
The ability of a forensic investigator to detect both editing and the use of anti-forensics results in a dynamic interplay between the forger and forensic investigator. We develop a game-theoretic framework to analyze this interplay and identify the set of actions that each party will rationally choose. Additionally, we show that anti-forensics can be used to protect against reverse engineering. To demonstrate this, we propose an anti-forensic module that can be integrated into digital cameras to protect color interpolation methods.
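The pixel-value-mapping fingerprint can be illustrated with a toy experiment: integer-valued mappings such as contrast enhancement leave periodic peaks and gaps in the intensity histogram, which appear as high-frequency energy in the histogram's DFT. The sketch below is a simplified illustration of that idea, not the dissertation's actual detector; the cutoff and the synthetic data are arbitrary choices of mine:

```python
import numpy as np

def histogram_hf_energy(pixels, cutoff=32):
    # Pixel-value mappings applied to integer data leave periodic peaks
    # and gaps in the intensity histogram; these show up as high-frequency
    # energy in the histogram's DFT.
    hist, _ = np.histogram(pixels, bins=256, range=(0, 256))
    spectrum = np.abs(np.fft.fft(hist / hist.sum()))
    return spectrum[cutoff:128].sum()  # energy well away from DC

rng = np.random.default_rng(0)
original = rng.normal(120, 30, 100_000).clip(0, 255).astype(np.uint8)
# A simulated "forgery": gamma correction followed by re-quantization
# to integers, which makes some output values over- or under-populated.
enhanced = (255.0 * (original / 255.0) ** 0.6).astype(np.uint8)
```

Comparing the two energies separates the unaltered pixels from the gamma-corrected ones, which is the intuition behind detecting contrast enhancement from histograms alone.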
  • Item
    Digital Poetry: Comparative Textual Performances in Trans-medial Spaces
    (2011) Magearu, Mirona; Harrison, Regina; Carlorosi, Silvia; Comparative Literature; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This study extends work on notions of space and performance developed by media and poetry theorists. I particularly analyze how contemporary technologies re-define the writing space of digital poetry making by investigating the configuration and the function of this space in the writing of the digital poem. Thus, I employ Jay David Bolter's concept of "topographic" digital writing and propose the term "trans-medial" space to describe the computer space in which the digital poem exists, emerges, and is experienced. With origins in Italian Futurism, the literary avant-garde of the first half of the twentieth century, digital poetry extends the creative repertoire of this experimental poetry tradition using computers in the composition, generation, or presentation of texts. Because these poems convey a perception of space as changeable and multiple (made of computer screen and code spaces), this "trans-medial" space is both self-transformative (forms itself as it self-transforms) and transforming (transforms what it contains). Media scholars such as Espen Aarseth and Stephanie Strickland often explain how computer programming makes such digital works become sites of encounter between agencies such as author, text, or readers. Conversely, I show that this "trans-medial" space is also a mediating agent in the performance of the text along with its readers in the sense that it engages in and with the performance of text. I examine three forms of digital poetry: Gianni Toti's video-poetry, Caterina Davinio's net-poetry, and Loss Pequeno Glazier's JavaScript-based poetry. These Italian and United States poet-scholars are leading figures in digital poetry. As scholars, they articulate the theoretical frameworks of this genre in landmark anthologies. As poets, their digital works are similar in that they are indebted to Italian Futurism; and yet they represent distinct visions of and about poetry in new media spaces.
I use their works to think through video-graphic spaces, networked spaces, and scripting spaces as expressions of trans-medial space. In this respect, my comparative analysis opens up new venues for the reading of digital poetry by re-fashioning the concept and the function of the writing space of our digitized world.
  • Item
    Multimedia Social Networks: Game Theoretic Modeling and Equilibrium Analysis
    (2011) Chen, Yan; Liu, K. J. Ray; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Multimedia content sharing and distribution over multimedia social networks is more popular now than ever before: we download music from Napster, share our images on Flickr, view user-created video on YouTube, and watch peer-to-peer television using Coolstreaming, PPLive and PPStream. Within these multimedia social networks, users share, exchange, and compete for scarce resources such as multimedia data and bandwidth, and thus influence each other's decisions and performance. Therefore, to provide fundamental guidelines for better system design, it is important to analyze the users' behaviors and interactions in a multimedia social network, i.e., how users interact with and respond to each other. Game theory is a mathematical tool that analyzes the strategic interactions among multiple decision makers. It is ideal and essential for studying, analyzing, and modeling the users' behaviors and interactions in social networking. In this thesis, game theory will be used to model users' behaviors in social networks and analyze the corresponding equilibria. Specifically, in this thesis, we first illustrate how to use game theory to analyze and model users' behaviors in multimedia social networks by discussing the following three different scenarios. In the first scenario, we consider a non-cooperative multimedia social network where users in the social network compete for the same resource. We use a multiuser rate allocation social network as an example for this scenario. In the second scenario, we consider a cooperative multimedia social network where users in the social network cooperate with each other to obtain the content. We use a cooperative peer-to-peer streaming social network as an example for this scenario. In the third scenario, we consider how to use the indirect reciprocity game to stimulate cooperation among users. We use the packet forwarding social network as an example.
Moreover, the concept of "multimedia social networks" can be applied to the field of signal and image processing. If each pixel/sample is treated as a user, then the whole image/signal can be regarded as a multimedia social network. From such a perspective, we introduce a new paradigm for signal and image processing, and develop generalized and unified frameworks for classical signal and image problems. In this thesis, we use image denoising and image interpolation as examples to illustrate how to use game theory to re-formulate the classical signal and image processing problems.
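The pixel-as-player view can be made concrete with a toy best-response scheme. In the sketch below (my own simplification, not the thesis's formulation), each pixel's cost trades off fidelity to its noisy observation against disagreement with its four neighbors; with quadratic costs each best response has a closed form, and iterated best-response play converges to the Nash equilibrium of this game, which acts as a denoiser:

```python
import numpy as np

def best_response_denoise(noisy, lam=1.0, iters=50):
    # Each pixel (i, j) is a player minimizing
    #   (x_ij - y_ij)^2 + lam * sum over neighbors n of (x_ij - x_n)^2,
    # where y is the observed noisy image. Sweeping best responses is
    # equivalent to Gauss-Seidel iteration on the joint quadratic energy.
    x = noisy.astype(float).copy()
    h, w = x.shape
    for _ in range(iters):
        for i in range(h):
            for j in range(w):
                nbrs = []
                if i > 0:     nbrs.append(x[i - 1, j])
                if i < h - 1: nbrs.append(x[i + 1, j])
                if j > 0:     nbrs.append(x[i, j - 1])
                if j < w - 1: nbrs.append(x[i, j + 1])
                # Closed-form best response of player (i, j).
                x[i, j] = (noisy[i, j] + lam * sum(nbrs)) / (1 + lam * len(nbrs))
    return x
```

At the equilibrium no pixel can unilaterally lower its own cost, and the resulting image is a smoothed compromise between the observation and neighbor agreement.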
  • Item
    Haunting Images: Differential Perception and Emotional Response to the Archetypes of News Photography: A Study of Visual Reception Factored by Gender and Expertise
    (2011) Emmett, Arielle Susan; Brown, John H; Beasley, Maurine; Journalism; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This dissertation explores how and why certain news photographs become memorable. Although researchers believe news photos count as forms of media expression, no one knows how influential these images really are in shaping societal attitudes. Social constructionist critics have argued that iconic images are pervasive markers of American collective memory. While icons have become the subject of intense media study, critics have ignored the presence of image archetypes that fall outside of the boundaries of the American iconic canon. They have also followed a top-down procedure of interpretation rather than a bottom-up method of collecting data from actual subjects. As I define it, the news image archetype is an authentically captured image of a human predicament of the greatest magnitude and seriousness showing conflict, tragedy, and occasionally, triumph. Visually these images communicate through physical gestures and facial expressions either directly, when faces are visible, or by implication in panoramic shots. Archetypal images can be iconic but need not be. Whereas icons are presumed to appeal to "everybody" by modeling ideology and "civic performance," archetypes need not exhibit any particular ideology. The common thread is more universally human than political. For this reason their appeal tends to be trans-cultural. This mixed-method study tests audience response to 41 outstanding news photographs including iconic, archetypal and ordinary examples. The purpose is to ascertain whether archetypal images can be distinguished and recalled as outstanding exemplars outside the iconic category; whether image quality preferences vary by visual expertise and gender; and how study subjects "read" the archetype. Using a 2×2 ANOVA design, I studied four independent groups: male/female, visual expert/visual non-expert; n = 113. Study data indicate a convergence of ranking preference for some non-iconic archetypes that were rated as highly as famous icons.
However, the strongest results show a convergence as to which image qualities (e.g., aesthetics, newsworthiness, emotional arousal etc.) were most important to viewers. The study found statistically significant differences of judgment on image qualities factored by gender and expertise. Qualitative results provided rich insights on factors affecting viewer response while composite data suggest multiple lines of future research.
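The 2×2 factorial analysis can be sketched for a balanced toy dataset. This is a generic fixed-effects computation, not the study's own analysis (the actual n = 113 sample is unbalanced across cells and would call for regression-based sums of squares):

```python
import numpy as np

def two_way_anova(y, a, b):
    # Balanced 2x2 fixed-effects ANOVA. `a` and `b` are 0/1 factor labels
    # (e.g., gender and expertise); equal cell sizes are assumed.
    # Returns F statistics for factor a, factor b, and the interaction.
    y, a, b = map(np.asarray, (y, a, b))
    grand = y.mean()
    n_cell = len(y) / 4
    mean_a = [y[a == i].mean() for i in (0, 1)]
    mean_b = [y[b == j].mean() for j in (0, 1)]
    ss_a = ss_b = ss_ab = ss_e = 0.0
    for i in (0, 1):
        ss_a += 2 * n_cell * (mean_a[i] - grand) ** 2
        ss_b += 2 * n_cell * (mean_b[i] - grand) ** 2
        for j in (0, 1):
            cell = y[(a == i) & (b == j)]
            ss_ab += n_cell * (cell.mean() - mean_a[i] - mean_b[j] + grand) ** 2
            ss_e += ((cell - cell.mean()) ** 2).sum()
    mse = ss_e / (len(y) - 4)  # error df = N - (number of cells)
    return ss_a / mse, ss_b / mse, ss_ab / mse
```

Each F statistic compares the variance explained by one factor (or the interaction) against the within-cell error variance, which is how judgments of image quality can be tested for gender and expertise effects separately.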
  • Item
    Techniques For Video Surveillance: Automatic Video Editing And Target Tracking
    (2009) El-Alfy, Hazem Mohamed; Davis, Larry S; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Typical video surveillance control rooms include a collection of monitors connected to a large camera network, with many fewer operators than monitors. The cameras are usually cycled through the monitors, with provisions for manual over-ride to display a camera of interest. In addition, cameras are often provided with pan, tilt and zoom capabilities to capture objects of interest. In this dissertation, we develop novel ways to control the limited resources by focusing them into acquiring and visualizing the critical information contained in the surveyed scenes. First, we consider the problem of cropping surveillance videos. This process chooses a trajectory that a small sub-window can take through the video, selecting the most important parts of the video for display on a smaller monitor area. We model the information content of the video simply, by whether the image changes at each pixel. Then we show that we can find the globally optimal trajectory for a cropping window by using a shortest path algorithm. In practice, we can speed up this process without affecting the results, by stitching together trajectories computed over short intervals. This also reduces system latency. We then show that we can use a second shortest path formulation to find good cuts from one trajectory to another, improving coverage of interesting events in the video. We describe additional techniques to improve the quality and efficiency of the algorithm, and show results on surveillance videos. Second, we turn our attention to the problem of tracking multiple agents moving amongst obstacles, using multiple cameras. Given an environment with obstacles, and many people moving through it, we construct a separate narrow field of view video for as many people as possible, by stitching together video segments from multiple cameras over time. We employ a novel approach to assign cameras to people as a function of time, with camera switches when needed. 
The problem is modeled as a bipartite graph and the solution corresponds to a maximum matching. As people move, the solution is efficiently updated by computing an augmenting path rather than by solving for a new matching. This reduces computation time by an order of magnitude. In addition, solving for the shortest augmenting path minimizes the number of camera switches at each update. When not all people can be covered by the available cameras, we cluster as many people as possible into small groups, then assign cameras to groups using a minimum cost matching algorithm. We test our method using numerous runs from different simulators. Third, we relax the restriction of using fixed cameras in tracking agents. In particular, we study the problem of maintaining a good view of an agent moving amongst obstacles by a moving camera, possibly fixed to a pursuing robot. This is known as a two-player pursuit evasion game. Using a mesh discretization of the environment, we develop an algorithm that determines, given initial positions of both pursuer and evader, if the evader can take any moving strategy to go out of sight of the pursuer, and thus win the game. If it is decided that there is no winning strategy for the evader, we also compute a pursuer's trajectory that keeps the evader within sight, for every trajectory that the evader can take. We study the effect of varying the mesh size on both the efficiency and accuracy of our algorithm. Finally, we show some earlier work that has been done in the domain of anomaly detection. Based on modeling co-occurrence statistics of moving objects in time and space, experiments are described on synthetic data, in which time intervals and locations of unusual activity are identified.
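The augmenting-path machinery behind the camera-to-person assignment can be sketched with a plain maximum bipartite matching (Kuhn's algorithm). This is a generic illustration, not the dissertation's implementation; its efficiency argument is that when people move, a single augmenting-path search repairs the matching instead of re-solving from scratch:

```python
def max_matching(adj, n_people):
    # adj[u] lists the people that camera u can currently cover.
    # Returns match[v]: the camera assigned to person v, or -1 if uncovered.
    match = [-1] * n_people

    def try_assign(cam, seen):
        # DFS for an augmenting path starting at `cam`: take a free person,
        # or recursively re-route the camera currently holding one.
        for person in adj[cam]:
            if person not in seen:
                seen.add(person)
                if match[person] == -1 or try_assign(match[person], seen):
                    match[person] = cam
                    return True
        return False

    for cam in range(len(adj)):
        try_assign(cam, set())
    return match
```

In the incremental setting described above, only the cameras whose coverage sets changed need a new `try_assign` call, which is where the order-of-magnitude speedup over recomputing the full matching comes from.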
  • Item
    Cross-Layer Resource Allocation Protocols for Multimedia CDMA Networks
    (2004-11-11) Kwasinski, Andres; Farvardin, Nariman; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The design of mechanisms to efficiently allow many users to maintain simultaneous communications while sharing the same transmission medium is a crucial step during a wireless network design. The resource allocation process needs to meet numerous requirements that are sometimes conflicting, such as high efficiency, network utilization, flexibility, and good communication quality. Due to limited resources, wireless cellular networks are normally seen as having some limit on the network capacity, in terms of the maximum number of calls that may be supported. Being able to dynamically extend network operation beyond the set limit at the cost of a smooth and small increase in distortion is a valuable and useful idea because it provides the means to flexibly adjust the network to situations where it is more important to service a call rather than to guarantee the best quality. In this thesis we study designs for resource allocation in CDMA networks carrying conversational-type calls. The designs are based on a cross-layer approach where the source encoder, the channel encoder and, in some cases, the processing gains are adapted. The primary focus of the study is on optimally multiplexing multimedia sources. Therefore, we study optimal resource allocation to resolve interference-generated congestion for an arbitrary set of real-time variable-rate source encoders in a multimedia CDMA network. Importantly, we show that the problem can be viewed as one of statistical multiplexing in source-adapted multimedia CDMA. We present analysis and optimal solutions for different system setups. The result is a flexible system that sets an efficient tradeoff between end-to-end distortion and number of users. Because in the presented cross-layer designs channel-induced errors are kept at a subjectively acceptable level, the proposed designs are able to outperform equivalent CDMA systems where capacity is increased in the traditional way, by allowing a reduction in SINR.
An important application, and part of this study, is the use of the proposed designs to extend operation of the CDMA network beyond a defined congestion operating point. Also, the general framework for statistical multiplexing in CDMA is used to study some issues in integrated real-time/data networks.
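The "smooth degradation beyond capacity" idea can be illustrated with a toy distortion-rate model. The sketch below assumes a unit-variance Gaussian source with distortion-rate function D(r) = 2^(-2r) and equal rate sharing among calls; this is my simplification for intuition, not the thesis's actual cross-layer optimization:

```python
def per_user_distortion(n_users, total_rate):
    # Toy model: a shared rate budget is split equally; each call's source
    # coder then operates at rate r with distortion D(r) = 2 ** (-2 * r)
    # (the distortion-rate function of a unit-variance Gaussian source).
    r = total_rate / n_users
    return 2.0 ** (-2.0 * r)
```

Admitting an extra call past the nominal limit shrinks everyone's rate slightly and raises distortion gradually rather than dropping the call, which is the tradeoff the cross-layer designs manage explicitly.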