Information Retrieval on the World Wide Web and Active Logic: A Survey and Problem Definition

dc.contributor.author: Barfourosh, A. Abdollahzadeh [en_US]
dc.contributor.author: Nezhad, H. R. Motahary [en_US]
dc.contributor.author: Anderson, M. L. [en_US]
dc.contributor.author: Perlis, D. [en_US]
dc.date.accessioned: 2004-05-31T23:13:04Z
dc.date.available: 2004-05-31T23:13:04Z
dc.date.created: 2002-08 [en_US]
dc.date.issued: 2002-08-30 [en_US]
dc.description.abstract: As more information becomes available on the World Wide Web (there are currently over 4 billion pages covering most areas of human endeavor), it becomes more difficult to provide effective search tools for information access. Today, people access web information through two main kinds of search interface: browsers (clicking and following hyperlinks) and query engines (queries in the form of a set of keywords describing the topic of interest). The first process is exploratory and time-consuming, and the second may not satisfy the user because it returns many inaccurate and irrelevant results. Web search tools need better support both for expressing one's information need and for returning high-quality search results. There appears to be a need for systems that reason under uncertainty and are flexible enough to recover from the contradictions, inconsistencies, and irregularities that such reasoning involves. Active Logic is a formalism that has been developed with real-world applications and their challenges in mind. Motivating its design is the thought that one of the factors supporting the flexibility of human reasoning is that it takes place step-wise, in time. Active Logic is one of a family of inference engines (step-logics) that explicitly reason in time and incorporate a history of their reasoning as they run. This characteristic makes Active Logic systems more flexible than traditional AI systems and therefore more suitable for commonsense, real-world reasoning. In this report we mainly survey recent advances in machine learning and crawling problems related to the web. We review the continuum of supervised, semi-supervised, and unsupervised learning problems, highlight the specific challenges that distinguish information retrieval in the hypertext domain, and summarize the key areas of recent and ongoing research.
We concentrate on topic-specific search engines and focused crawling, and finally propose an Information Integration Environment based on the Active Logic framework. Keywords: Web Information Retrieval, Web Crawling, Focused Crawling, Machine Learning, Active Logic (Also UMIACS-TR-2001-69) [en_US]
dc.format.extent: 312585 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/1903/1153
dc.language.iso: en_US
dc.relation.isAvailableAt: Digital Repository at the University of Maryland [en_US]
dc.relation.isAvailableAt: University of Maryland (College Park, Md.) [en_US]
dc.relation.isAvailableAt: Tech Reports in Computer Science and Engineering [en_US]
dc.relation.isAvailableAt: UMIACS Technical Reports [en_US]
dc.relation.ispartofseries: UM Computer Science Department; CS-TR-4291 [en_US]
dc.relation.ispartofseries: UMIACS; UMIACS-TR-2001-69 [en_US]
dc.title: Information Retrieval on the World Wide Web and Active Logic: A Survey and Problem Definition [en_US]
dc.type: Technical Report [en_US]

Files

Original bundle

Name: CS-TR-4291.pdf
Size: 305.26 KB
Format: Adobe Portable Document Format