Enhancing Visual and Gestural Fidelity for Effective Virtual Environments

A challenge facing the virtual reality (VR) industry is that VR is not yet immersive enough to give people a genuine sense of presence: low frame rates lead to dizziness, and the lack of human body visualization limits human-computer interaction. In this dissertation, I present our research on enhancing visual and gestural fidelity in virtual environments.

First, I present a new foveated rendering technique: Kernel Foveated Rendering (KFR), which parameterizes foveated rendering by embedding polynomial kernel functions in log-polar space. This GPU-driven technique uses parameterized foveation that mimics the distribution of photoreceptors in the human retina. I present a two-pass kernel foveated rendering pipeline that maps well onto modern GPUs. I have carried out user studies to empirically identify the KFR parameters and have observed a 2.8x-3.2x speedup in rendering on 4K displays.
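The kernel log-polar idea can be illustrated with a minimal sketch. It is not the dissertation's implementation: the power kernel K(s) = s**alpha, the function names, the buffer sizes, and the default alpha are all assumptions chosen for illustration. The first function maps a screen pixel into a reduced log-polar buffer around the gaze point, and the second maps a buffer coordinate back to the screen, which is roughly what the two passes of such a pipeline would need.

```python
import math


def _log_max_radius(gaze, size):
    """Log of the largest distance from the gaze point to any screen corner."""
    gx, gy = gaze
    W, H = size
    return math.log(max(math.hypot(cx - gx, cy - gy)
                        for cx in (0, W) for cy in (0, H)))


def to_log_polar(x, y, gaze, size, buf, alpha=4.0):
    """Map screen pixel (x, y) to reduced-buffer coordinates (u, v).

    Assumes a hypothetical power kernel K(s) = s**alpha, so the
    inverse kernel is K^-1(t) = t**(1/alpha).
    """
    gx, gy = gaze
    w, h = buf
    dx, dy = x - gx, y - gy
    r = max(math.hypot(dx, dy), 1.0)          # clamp so that log(r) >= 0
    t = math.log(r) / _log_max_radius(gaze, size)  # normalized log radius in [0, 1]
    u = (t ** (1.0 / alpha)) * w              # inverse kernel stretches the fovea
    theta = math.atan2(dy, dx) % (2.0 * math.pi)   # angle in [0, 2*pi)
    v = theta / (2.0 * math.pi) * h
    return u, v


def from_log_polar(u, v, gaze, size, buf, alpha=4.0):
    """Map reduced-buffer coordinates (u, v) back to a screen position."""
    gx, gy = gaze
    w, h = buf
    r = math.exp(_log_max_radius(gaze, size) * (u / w) ** alpha)  # apply K
    theta = v / h * 2.0 * math.pi
    return gx + r * math.cos(theta), gy + r * math.sin(theta)


# Round trip for one pixel on a 1920x1080 screen with a 640x360 buffer.
u, v = to_log_polar(1500, 300, (960, 540), (1920, 1080), (640, 360))
x, y = from_log_polar(u, v, (960, 540), (1920, 1080), (640, 360))
```

With alpha greater than 1 in this sketch, buffer columns near u = 0 cover only a small radius around the gaze point, so more of the reduced buffer is spent on the fovea than on the periphery.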

Second, I explore accelerating rendering through foveation for 4D light fields, which capture both the spatial and angular distribution of rays and thus enable free-viewpoint rendering and custom selection of the focal plane. I optimize the KFR algorithm by adjusting the weight of each slice in the light field, so that it automatically selects the optimal foveation parameters for different images according to the gaze position. I have validated our approach on the rendering of light fields by carrying out both quantitative experiments and user studies. Our method achieves speedups of 3.47x-7.28x for different levels of foveation and different rendering resolutions.

Third, I present a simple yet effective technique for further reducing the cost of foveated rendering by leveraging ocular dominance, the tendency of the human visual system to prefer scene perception from one eye over the other. Our new approach, eye-dominance-guided foveated rendering (EFR), renders the scene at a lower foveation level (with higher detail) for the dominant eye than for the non-dominant eye. Compared with traditional foveated rendering, EFR can be expected to provide superior rendering performance while preserving the same level of perceived visual quality.
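As a rough illustration of why a per-eye foveation level can save work, the sketch below assigns each eye its own reduced render buffer. It assumes, purely for illustration, that the foveation level is a scalar sigma equal to the ratio of full-resolution width to reduced-buffer width; the function names and the example values are hypothetical and are not taken from the dissertation.

```python
def eye_buffer_sizes(full_size, sigma_dominant, sigma_non_dominant):
    """Return the reduced buffer size for each eye.

    sigma is a hypothetical foveation level: full width / reduced width.
    The dominant eye gets the lower level, hence the larger buffer.
    """
    if sigma_dominant > sigma_non_dominant:
        raise ValueError("dominant eye should use the lower foveation level")
    W, H = full_size

    def reduced(sigma):
        return max(1, round(W / sigma)), max(1, round(H / sigma))

    return {
        "dominant": reduced(sigma_dominant),
        "non_dominant": reduced(sigma_non_dominant),
    }


def shaded_fraction(full_size, buf):
    """Fraction of full-resolution pixels that are actually shaded."""
    return (buf[0] * buf[1]) / (full_size[0] * full_size[1])


sizes = eye_buffer_sizes((1920, 1080), sigma_dominant=2.0, sigma_non_dominant=3.0)
cost = {eye: shaded_fraction((1920, 1080), buf) for eye, buf in sizes.items()}
```

Under these assumed values the non-dominant eye shades fewer pixels than the dominant eye, which is where the additional saving over a single shared foveation level would come from.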

Finally, I present an approach that uses an end-to-end convolutional neural network, consisting of an encoder followed by a decoder, to reconstruct a 3D model of a human hand from a single RGB image. Previous work on hand mesh reconstruction suffers from a lack of training data. To train the networks with full supervision, we fit a parametric hand model to 3D annotations and train the networks on RGB images, using the fitted parametric model as supervision. Our approach leads to significantly improved quality compared to state-of-the-art hand mesh reconstruction techniques.