Theses and Dissertations from UMD

Permanent URI for this community: http://hdl.handle.net/1903/2

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay in the appearance of a given thesis/dissertation in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Search Results

Now showing 1 - 2 of 2
  • Item
    Towards Effective Temporal Modeling for Video Understanding
    (2023) He, Bo; Shrivastava, Abhinav; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The exponential increase in available video content has led to a growing need for advanced systems capable of autonomously analyzing and interpreting video data. Video understanding has become a fundamental research topic in computer vision, focusing on the effective extraction and analysis of information from videos. Unlike images, videos carry additional temporal dependencies, which provide crucial clues to what happens across time. Therefore, effectively modeling temporal relationships is of vital importance for video understanding research. In this thesis, we aim to advance the field of temporal modeling, making video understanding systems more reliable, accurate, and flexible for various applications. In the first part, we introduce three strategies to model temporal dependencies in different downstream tasks and scenarios, including action recognition, temporal action localization, and video summarization. In the second part, we present a comprehensive large multimodal model for video understanding, constructed using recent advanced large language models. This model is capable of handling various video understanding tasks within a unified and integrated framework. Specifically, we first propose a Global Temporal Attention (GTA) network, which models global temporal relationships through a decoupled spatial and temporal attention operation. This approach significantly enhances the model's capability to recognize actions in videos with reduced computational cost. However, raw videos in the real world are untrimmed and contain many background frames, so before recognizing and classifying actions, we need to detect and localize the actions of interest in long untrimmed videos. Therefore, we introduce the ASM-Loc framework to effectively localize actions of interest under the weakly-supervised setting, which significantly reduces the need for labor-intensive manual labeling. Then, given that real-world data often comprises multiple modalities of information, such as video and text, we present A2Summ, which tackles the challenge of summarizing both long video and text sequences with time correspondence, providing a solution for summarizing multimodal data. In the first part of this thesis, we focus on developing specialized models for individual video understanding tasks. Each model is specifically designed for a particular task, which limits its ability to generalize to other areas and makes it less practical for diverse real-world applications. To address this limitation, in the second part of the thesis, we further present a unified large multimodal model capable of handling multiple video understanding tasks. This model is built upon the foundation of powerful large language models, making it adaptable to a wide range of video understanding tasks, including video classification, online action prediction, video question answering, and video captioning. This method offers a more versatile and general solution, significantly enhancing the applicability of our models in real-world video understanding scenarios.
  • Item
    Barcoded Silica Nanotubes for Bioanalysis
    (2007-09-25) He, Bo; Lee, Sang Bok; Chemistry; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Analysis of the chemical/biological species involved in health care is the most important step in disease diagnosis and new drug screening. Barcoded nano/microparticles are attracting more and more interest for the simultaneous detection and identification of multiplexed chemical/biological species. However, the development of barcoded particles is still at an early stage. To solve problems with current barcoded particles, such as spectral overlap and degradation of materials, our group has invented barcoded silica nanotubes (SNTs) and applied them as coding materials to multiplexed immunoassays and cancer marker detection. Barcoded SNTs are fabricated by a multistep anodization template synthesis method. Each barcoded SNT has several segments with different reflectance values depending on their diameters and wall thicknesses. Therefore, the barcode of each SNT can be "read out" with a conventional optical microscope. Barcoded SNTs have shown high stability and dispersibility in aqueous buffer media. Suspension arrays with barcoded SNTs have shown high sensitivity and high selectivity for the detection of multianalytes in multiplexed immunoassays. Magnetic field separation is one promising technique to replace tedious filtration or centrifugation separation for rapid, gentle, and reliable isolation of target analytes. Barcoded SNTs have been coupled with magnetic bead (MB) separation for protein detection and analysis. The species and number of the SNTs finally collected represent the types and amounts of analyte proteins, respectively. By using barcoded SNTs instead of fluorescence as signals, these suspension arrays overcome the problems of current MB suspension arrays, such as fluorescence quenching and interference from the MBs' autofluorescence. Barcoded magnetic nanotubes (BMNTs) have also been successfully fabricated as dual-functional microcarriers for multiplexed immunoassays and cancer biomarker detection with magnetic separation. BMNTs combine the shape variety of barcoded SNTs with the superparamagnetic properties of magnetic nanotubes. BMNTs overcome the problems of existing dual-functional particles. The iron oxide nanocrystals are evenly dispersed in the inner void of the tubular structures without interfering with the optical barcode patterns. BMNTs have shown high selectivity when applied in multiplexed assays and cancer biomarker detection. The identification of BMNTs with software shows promising results for rapid data analysis. The dual-functional BMNTs provide a promising way to achieve ultrafast, gentle, efficient, and automated detection of target chemical/biochemical molecules for diagnosis and drug screening.