University of Maryland DRUM
Digital Repository at the University of Maryland

Please use this identifier to cite or link to this item: http://hdl.handle.net/1903/1153

Title: Information Retrieval on the World Wide Web and Active Logic: A Survey and Problem Definition
Authors: Barfourosh, A. Abdollahzadeh
Nezhad, H. R. Motahary
Anderson, M. L.
Perlis, D.
Type: Technical Report
Issue Date: 30-Aug-2002
Series/Report no.: UM Computer Science Department; CS-TR-4291
UMIACS; UMIACS-TR-2001-69
Abstract: As more information becomes available on the World Wide Web (there are currently over 4 billion pages covering most areas of human endeavor), it becomes more difficult to provide effective search tools for information access. Today, people access web information through two main kinds of search interfaces: browsers (clicking and following hyperlinks) and query engines (submitting a set of keywords that describes the topic of interest). The first process is tentative and time-consuming, and the second may not satisfy the user because it returns many inaccurate and irrelevant results. Web search tools need better support for expressing one's information need and for returning high-quality search results. There appears to be a need for systems that reason under uncertainty and are flexible enough to recover from the contradictions, inconsistencies, and irregularities that such reasoning involves. Active Logic is a formalism that has been developed with real-world applications and their challenges in mind. Its design is motivated by the thought that one factor supporting the flexibility of human reasoning is that it takes place step-wise, in time. Active Logic is one of a family of inference engines (step-logics) that explicitly reason in time and incorporate a history of their reasoning as they run. This characteristic makes Active Logic systems more flexible than traditional AI systems and therefore more suitable for commonsense, real-world reasoning. In this report we mainly survey recent advances in machine learning and crawling problems related to the web. We review the continuum of supervised to semi-supervised to unsupervised learning problems, highlight the specific challenges that distinguish information retrieval in the hypertext domain, and summarize the key areas of recent and ongoing research. We concentrate on topic-specific search engines and focused crawling, and finally propose an Information Integration Environment based on the Active Logic framework.
Keywords: Web Information Retrieval, Web Crawling, Focused Crawling, Machine Learning, Active Logic
(Also UMIACS-TR-2001-69)
URI: http://hdl.handle.net/1903/1153
Appears in Collections: Technical Reports of the Computer Science Department
Technical Reports from UMIACS

Files in This Item:

File: CS-TR-4291.pdf
Size: 305.26 kB
Format: Adobe PDF
No. of Downloads: 1218

All items in DRUM are protected by copyright, with all rights reserved.

 
