FLEXIBLE INPUT FOR NOVEL VIEW SYNTHESIS

dc.contributor.advisor: Zwicker, Matthias
dc.contributor.author: Lin, Geng
dc.contributor.department: Computer Science
dc.contributor.publisher: Digital Repository at the University of Maryland
dc.contributor.publisher: University of Maryland (College Park, Md.)
dc.date.accessioned: 2025-09-15T05:45:09Z
dc.date.issued: 2025
dc.description.abstract: Synthesizing novel views of a scene from a limited number of input images is a long-standing problem with enormous practical applications, such as virtual museums, online meetings, and sports event streaming. The emergence of virtual reality technology has also made it easier for people to observe and interact with virtual environments, highlighting the need for robust and efficient view synthesis. In recent years, this field has seen major advances with neural radiance fields (NeRFs) and 3D Gaussian Splatting (3DGS). Although these methods achieve efficient, high-quality view synthesis, they require careful and dense capture in controlled environments. Much research effort has therefore been devoted to relaxing these requirements, for example by allowing dynamic scenes and handling variations in lighting. This thesis presents our work along this path of enabling novel view synthesis from challenging inputs. In particular, we focus on three scenarios.

First, we tackle the problem of unwanted foreground objects, such as moving people or vehicles in front of a building. Because these objects cast shadows and reflections, naively masking them leaves artifacts in the background reconstruction. We propose a method that decomposes foreground objects, together with their cast effects, into separate 2D layers and a clean 3D background layer.

Second, we address view synthesis from very few inputs. With as few as three input views, we leverage recent developments in large image and video generation priors to interpolate in-between views that better supervise scene reconstruction. To improve both efficiency and quality, we use a feedforward geometry foundation model to obtain a dense point cloud that provides conditioning images for the image priors. In addition, we introduce optimizable image warps and a robust view sampling strategy to handle inconsistencies in the generated images.

Lastly, we consider the extended problem of inverse rendering, which decomposes the scene into geometry, material properties, and environment lighting. It not only enables synthesis of novel views via rendering, but also provides extended capabilities such as scene editing and relighting. We propose a simple capture procedure in which the object is rotated several times while photos are taken. With this setup, we show that artifacts caused by ambiguity can be drastically reduced. We model the scene with 2D Gaussian primitives for computational efficiency, and we use a proxy geometry as well as a residual constraint to further improve the handling of global illumination.

The works presented in this thesis improve the quality and robustness of novel view synthesis with challenging input data. Further research along these lines can enable casual capture and lower the barriers to creating and sharing digital content from the real world.
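To make the first scenario concrete: the abstract describes compositing 2D foreground layers, which carry their cast shadows and reflections, over a rendering of a clean 3D background. The sketch below is a minimal illustration of standard front-to-back alpha compositing under that layered model; the function name, array shapes, and layer ordering are assumptions for illustration, not the decomposition method developed in the thesis.

```python
import numpy as np

def composite_layers(background, layers):
    """Composite RGBA foreground layers over an RGB background render.

    background: (H, W, 3) float array in [0, 1], e.g. a novel view rendered
                from the clean 3D background representation.
    layers:     list of (H, W, 4) float arrays (RGB + alpha), ordered back to
                front, e.g. one layer per foreground object together with its
                cast shadow or reflection.
    """
    out = background.copy()
    for layer in layers:
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        # Standard "over" operator: blend this layer onto the running image.
        out = alpha * rgb + (1.0 - alpha) * out
    return out

if __name__ == "__main__":
    H, W = 4, 4
    bg = np.zeros((H, W, 3))                 # placeholder background render
    fg = np.zeros((H, W, 4))
    fg[1:3, 1:3] = [1.0, 0.0, 0.0, 0.8]      # hypothetical foreground layer
    print(composite_layers(bg, [fg]).shape)  # (4, 4, 3)
```

Under such a model, masking only the object while leaving the composite untouched would leave its shadow and reflection baked into the background, which is why the thesis decomposes the cast effects into the foreground layers as well.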
dc.identifier: https://doi.org/10.13016/icgn-5dos
dc.identifier.uri: http://hdl.handle.net/1903/34698
dc.language.iso: en
dc.subject.pqcontrolled: Computer science
dc.subject.pquncontrolled: inverse rendering
dc.subject.pquncontrolled: novel view synthesis
dc.title: FLEXIBLE INPUT FOR NOVEL VIEW SYNTHESIS
dc.type: Dissertation

Files

Original bundle

Name: Lin_umd_0117E_25533.pdf
Size: 86.23 MB
Format: Adobe Portable Document Format