Information Retrieval on the World Wide Web and Active Logic: A Survey and Problem Definition

Date
2002-08-30
Author
Barfourosh, A. Abdollahzadeh
Nezhad, H. R. Motahary
Anderson, M. L.
Perlis, D.
Abstract
As more information becomes available on the World Wide Web (there are
currently over 4 billion pages covering most areas of human endeavor), it
becomes more difficult to provide effective search tools for information
access. Today, people access web information through two main kinds of
search interface: browsers (clicking and following hyperlinks) and query
engines (queries in the form of a set of keywords describing the topic of
interest). The first process is trial-and-error and time-consuming, and the
second may not satisfy the user because it returns many inaccurate and
irrelevant results. Web search tools need better support both for expressing
a user's information need and for returning high-quality search results.
There appears to be a need for systems that reason under uncertainty and are
flexible enough to recover from the contradictions, inconsistencies, and
irregularities that such reasoning involves.
Active Logic is a formalism that has been developed with real-world
applications and their challenges in mind. Its design is motivated by the
observation that human reasoning is flexible in part because it takes place
step-wise, in time. Active Logic is one of a family of inference engines
(step-logics) that explicitly reason in time and incorporate a history of
their reasoning as they run. This characteristic makes Active Logic systems
more flexible than traditional AI systems, and therefore more suitable for
commonsense, real-world reasoning.
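To make "reasoning step-wise, in time" concrete, the following is a minimal toy sketch, assuming a propositional encoding invented for illustration: atoms are strings, a leading "~" marks negation, and a tuple (A, B) stands for the rule "A implies B". It is not the authors' implementation. The helper names `step` and `run`, and the `Now(t)` and `Contra(P,t)` string forms, are hypothetical. Each call to `step` performs exactly one round of inference, so a conclusion appears one time step after its premises; the clock belief is replaced rather than inherited; and a direct contradiction between P and ~P is recorded with its time stamp while neither contradictand is carried forward.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# Hypothetical toy encoding: an atom is a string ("~" prefix = negation),
# and a rule "A implies B" is the tuple ("A", "B").
Formula = Union[str, Tuple[str, str]]


def negate(atom: str) -> str:
    """Return the complementary literal of an atom."""
    return atom[1:] if atom.startswith("~") else "~" + atom


def is_meta(b: Formula) -> bool:
    """True for bookkeeping beliefs (the clock and contradiction records)."""
    return isinstance(b, str) and (b.startswith("Now(") or b.startswith("Contra("))


def step(beliefs: FrozenSet[Formula], t: int) -> FrozenSet[Formula]:
    """Compute the belief set at time t+1 from the belief set at time t."""
    atoms = {b for b in beliefs if isinstance(b, str) and not is_meta(b)}
    # Direct contradictions at time t: both P and ~P are believed.
    contradicted = {a for a in atoms if not a.startswith("~") and negate(a) in atoms}
    distrusted = contradicted | {negate(a) for a in contradicted}
    trusted = atoms - distrusted

    new: Set[Formula] = set()
    for b in beliefs:
        if isinstance(b, str) and b.startswith("Now("):
            continue  # the clock is replaced below, never inherited
        if b in distrusted:
            continue  # contradictands are not inherited
        new.add(b)
    # One round of modus ponens per step, using only trusted atoms.
    for b in beliefs:
        if isinstance(b, tuple) and b[0] in trusted and b[1] not in distrusted:
            new.add(b[1])
    # Record each detected contradiction together with the time it was seen.
    for a in sorted(contradicted):
        new.add(f"Contra({a},{t})")
    new.add(f"Now({t + 1})")
    return frozenset(new)


def run(initial: Set[Formula], steps: int) -> List[FrozenSet[Formula]]:
    """Run the toy step-logic, keeping the belief set at every time step."""
    history = [frozenset(initial | {"Now(0)"})]
    for t in range(steps):
        history.append(step(history[-1], t))
    return history


if __name__ == "__main__":
    for t, beliefs in enumerate(run({"A", ("A", "B"), ("B", "C"), "P", "~P"}, 2)):
        print(t, sorted(map(str, beliefs)))
```

In this sketch, C is derived only at step 2, after B has appeared at step 1, and the clash between P and ~P at step 0 leaves only a time-stamped `Contra(P,0)` record at step 1. Real Active Logic systems use far richer first-order languages and mechanisms such as reinstatement of distrusted beliefs. The sketch only illustrates the time-stepped, history-keeping style of inference.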
In this report we mainly survey recent advances in machine learning and in
crawling problems related to the web. We review the continuum from
supervised to semi-supervised to unsupervised learning problems, highlight
the specific challenges that distinguish information retrieval in the
hypertext domain, and summarize the key areas of recent and ongoing
research. We concentrate on topic-specific search engines and focused
crawling, and finally propose an Information Integration Environment based
on the Active Logic framework.
Keywords: Web Information Retrieval, Web Crawling, Focused Crawling, Machine
Learning, Active Logic
(Also UMIACS-TR-2001-69)