FLEXIBLE INPUT FOR NOVEL VIEW SYNTHESIS

dc.contributor.advisor: Zwicker, Matthias
dc.contributor.author: Lin, Geng
dc.contributor.department: Computer Science
dc.contributor.publisher: Digital Repository at the University of Maryland
dc.contributor.publisher: University of Maryland (College Park, Md.)
dc.date.accessioned: 2025-09-15T05:45:09Z
dc.date.issued: 2025
dc.description.abstract: Synthesizing novel views of a scene from a limited number of input images is a long-standing problem with enormous practical applications, such as virtual museums, online meetings, and sports event streaming. The emergence of virtual reality technology has also made it easier for people to observe and interact with virtual environments, highlighting the need for robust and efficient view synthesis. In recent years, this field has seen major advances with neural radiance fields (NeRFs) and 3D Gaussian Splatting (3DGS). Although these methods achieve efficient, high-quality view synthesis, they require careful and dense capture in controlled environments. Much research effort has therefore been devoted to relaxing these requirements, for example by allowing dynamic scenes and handling variations in lighting. This thesis presents our work along this path of enabling novel view synthesis from challenging inputs. In particular, we focus on three scenarios.

First, we tackle the problem of unwanted foreground objects, such as moving people or vehicles in front of a building. Because these objects cast shadows and reflections, naively masking them leaves artifacts in the background reconstruction. We propose a method that decomposes foreground objects, together with their cast effects, into separate 2D layers and a clean 3D background layer.

Second, we address view synthesis from very few inputs. With as few as three input views, we leverage recent developments in large image and video generation priors to interpolate in-between views that better supervise scene reconstruction. To improve both efficiency and quality, we use a feedforward geometry foundation model to obtain a dense point cloud that provides conditioning images for the image priors. In addition, we introduce optimizable image warps and a robust view sampling strategy to handle inconsistencies in the generated images.

Lastly, we consider the extended problem of inverse rendering, which decomposes the scene into geometry, material properties, and environment lighting. It not only enables synthesis of novel views via rendering, but also provides extended capabilities such as scene editing and relighting. We propose a simple capture procedure in which the object is rotated several times while photos are taken. With this setup, we show that artifacts caused by ambiguity can be drastically reduced. We model the scene with 2D Gaussian primitives for computational efficiency, and we use a proxy geometry as well as a residual constraint to further improve the handling of global illumination.

The works presented in this thesis improve the quality and robustness of novel view synthesis with challenging input data. Further research along these lines can enable casual capture and lower the barriers to creating and sharing digital content from the real world.
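To make the first scenario concrete: the abstract describes compositing 2D foreground layers, which carry their cast shadows and reflections, over a rendering of a clean 3D background. The sketch below is a minimal illustration of standard front-to-back alpha compositing under that layered model; the function name, array shapes, and layer ordering are assumptions for illustration, not the decomposition method developed in the thesis.

```python
import numpy as np

def composite_layers(background, layers):
    """Composite RGBA foreground layers over an RGB background render.

    background: (H, W, 3) float array in [0, 1], e.g. a novel view rendered
                from the clean 3D background representation.
    layers:     list of (H, W, 4) float arrays (RGB + alpha), ordered back to
                front, e.g. one layer per foreground object together with its
                cast shadow or reflection.
    """
    out = background.copy()
    for layer in layers:
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        # Standard "over" operator: blend this layer onto the running image.
        out = alpha * rgb + (1.0 - alpha) * out
    return out

if __name__ == "__main__":
    H, W = 4, 4
    bg = np.zeros((H, W, 3))                 # placeholder background render
    fg = np.zeros((H, W, 4))
    fg[1:3, 1:3] = [1.0, 0.0, 0.0, 0.8]      # hypothetical foreground layer
    print(composite_layers(bg, [fg]).shape)  # (4, 4, 3)
```

Under such a model, masking only the object while leaving the composite untouched would leave its shadow and reflection baked into the background, which is why the thesis decomposes the cast effects into the foreground layers as well.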
dc.identifier: https://doi.org/10.13016/icgn-5dos
dc.identifier.uri: http://hdl.handle.net/1903/34698
dc.language.iso: en
dc.subject.pqcontrolled: Computer science
dc.subject.pquncontrolled: inverse rendering
dc.subject.pquncontrolled: novel view synthesis
dc.title: FLEXIBLE INPUT FOR NOVEL VIEW SYNTHESIS
dc.type: Dissertation

Files

Original bundle

Name: Lin_umd_0117E_25533.pdf
Size: 86.23 MB
Format: Adobe Portable Document Format