Computer Vision for Scene Text Analysis
Date
2004-08-06
Authors
Zandifar, Ali
Advisor
Chellappa, Rama
Abstract
The motivation of this dissertation is to develop a 'Seeing-Eye' video-based
interface for the visually impaired to access environmental text information. We
are concerned with the daily activities of people with low vision that involve
interpreting 'environmental text' or 'scene text', e.g., reading newspapers, can labels,
and street signs.
First, we discuss the development of such a video-based interface. In this
interface, the processed image of scene text is read by off-the-shelf OCR and
converted to speech by Text-to-Speech (TTS) software. Our challenge is to feed
a high-quality image of scene text to off-the-shelf OCR software under a general
pose of the surface on which the text is printed. To achieve this, various problems
related to feature detection, mosaicing, auto-focus, zoom, and systems integration
were solved in the development of the system, and these are described.
We employ the video-based interface for the analysis of video of lectures/posters.
In this application, the text is assumed to lie on a plane. Automatic analysis of
the video content requires additional modules such as enhancement, text
segmentation, preprocessing, metric rectification, etc. We provide qualitative results
to justify the algorithm and the system integration.
For more general classes of surfaces on which the text is printed, such as bent or
warped paper, we develop a novel method for 3D structure recovery and unwarping.
Deformed paper is isometric with a plane, and the Gaussian curvature
vanishes at every point on the surface. We show that these constraints lead to a
closed set of equations that allow the recovery of the full geometric structure from a
single image. We prove that these partial differential equations can be reduced to the
Hopf equation that arises in non-linear wave propagation, and that deformations of the
paper can be interpreted in terms of the characteristics of this equation. A new exact
integration of these equations relates the 3D structure of the surface to an image of
the paper. In addition, we can generate such surfaces using the underlying equations.
This method only uses information derived from the image of the boundary.
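The constraints above can be restated schematically as follows (a sketch in assumed notation, not the dissertation's own derivation): a paper surface is developable, so its Gaussian curvature vanishes, and the Hopf equation that the dissertation arrives at has straight-line characteristics, mirroring the straight rulings of a developable surface.

```latex
% Schematic sketch; the graph parametrization z = f(x,y) is an assumption.
% Isometry with the plane forces developability, i.e., zero Gaussian curvature:
\[
  K \;=\; \frac{f_{xx}\, f_{yy} - f_{xy}^{2}}
               {\bigl(1 + f_x^{2} + f_y^{2}\bigr)^{2}} \;=\; 0 .
\]
% The Hopf equation from non-linear wave propagation,
\[
  u_t + u\, u_x = 0 ,
\]
% is solved by transport along straight characteristics:
\[
  x(t) = x_0 + u(x_0)\, t , \qquad u\bigl(x(t), t\bigr) = u(x_0, 0) .
\]
```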
Furthermore, we employ the shape-from-texture method as an alternative to
the method above to infer the 3D structure of the surface. We show that, for
consistency of the normal vector field, we need to add extra conditions based on the
surface model. These conditions are isometry and zero Gaussian curvature of the surface.
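As a schematic illustration of why such consistency conditions arise (the notation below is assumed, not taken from the dissertation): a normal field estimated from texture corresponds to an actual surface only if it is integrable, and the paper model further constrains it to be developable.

```latex
% Schematic; assumed graph model z = f(x,y) with unit normals n = (n_1, n_2, n_3),
% n proportional to (-f_x, -f_y, 1). Such a surface exists only if the field
% is integrable:
\[
  \frac{\partial}{\partial y}\!\left(\frac{n_1}{n_3}\right)
  \;=\;
  \frac{\partial}{\partial x}\!\left(\frac{n_2}{n_3}\right) ,
\]
% and the paper (developability) model adds the zero-curvature constraint
\[
  f_{xx}\, f_{yy} - f_{xy}^{2} \;=\; 0 .
\]
```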
The theory underlying the method is novel, and it raises new open research
issues in the area of 3D reconstruction from single views. The novel contributions
are: first, it is shown that certain linear and non-linear clues (contour knowledge
information) are sufficient to recover the 3D structure of scene text; second, that
with a priori page-layout information, we can reconstruct a fronto-parallel view
of a deformed page from differential geometric properties of a surface; third, that
with a known camera model we can recover the 3D structure of a bent surface; fourth, we
present an integrated framework for analysis and rectification of scene text from
single views in general format; fifth, we provide a comparison with the shape-from-texture
approach; and finally, this work can be integrated as a visual prosthesis for
the visually impaired.
Our work has many applications in computer vision and computer graphics.
The applications are diverse, e.g., a generalized scanning device, digital flattening
of creased documents, 3D reconstruction when correspondence fails, 3D
reconstruction from single old photos, bending and creasing of virtual paper, object
classification, semantic extraction, scene description, and so on.