Information Studies Research Works

Permanent URI for this collection: http://hdl.handle.net/1903/1632


Search Results

Now showing 1 - 10 of 33
  • PubMed related articles: a probabilistic topic-based model for content similarity
    (Springer Nature, 2007-10-30) Lin, Jimmy; Wilbur, W John
    We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed. Whether or not a document is about a particular topic is computed from term frequencies, modeled as Poisson distributions. Unlike previous probabilistic retrieval models, we do not attempt to estimate relevance; rather, our focus is "relatedness", the probability that a user would want to examine a particular document given known interest in another. We also describe a novel technique for estimating parameters that does not require human relevance judgments; instead, the process is based on the existence of MeSH® terms in MEDLINE®. The pmra retrieval model was compared against bm25, a competitive probabilistic model that shares theoretical similarities. Experiments using the test collection from the TREC 2005 genomics track show a small but statistically significant improvement of pmra over bm25 in terms of precision. Our experiments suggest that the pmra model provides an effective ranking algorithm for related article search. (An illustrative sketch of the Poisson-based relatedness idea appears after this list.)
  • PageRank without hyperlinks: Reranking with PubMed related article networks for biomedical text retrieval
    (Springer Nature, 2008-06-06) Lin, Jimmy
    Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships from manually created hyperlinks. We consider the application of these techniques to biomedical text retrieval. In the current PubMed® search interface, a MEDLINE® citation is connected to a number of related citations, which are in turn connected to other citations. Thus, a MEDLINE record represents a node in a vast content-similarity network. This article explores the hypothesis that these networks can be exploited for text retrieval, in the same manner as hyperlink graphs on the Web. We conducted a number of reranking experiments using the TREC 2005 genomics track test collection in which scores extracted from PageRank and HITS analysis were combined with scores returned by an off-the-shelf retrieval engine. Experiments demonstrate that incorporating PageRank scores yields significant improvements in terms of standard ranked-retrieval metrics. The link structure of content-similarity networks can be exploited to improve the effectiveness of information retrieval systems. These results generalize the applicability of graph analysis algorithms to text retrieval in the biomedical domain. (A sketch of this score-combination approach appears after this list.)
  • Is searching full text more effective than searching abstracts?
    (Springer Nature, 2009-02-03) Lin, Jimmy
    With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, metadata) to directly access full-text content. Motivated by this emerging trend, I posed the following question: is searching full text more effective than searching abstracts? This question is answered by comparing text retrieval algorithms on MEDLINE® abstracts, full-text articles, and spans (paragraphs) within full-text articles using data from the TREC 2007 genomics track evaluation. Two retrieval models are examined: bm25 and the ranking algorithm implemented in the open-source Lucene search engine. Experiments show that treating an entire article as an indexing unit does not consistently yield higher effectiveness compared to abstract-only search. However, retrieval based on spans, or paragraph-sized segments of full-text articles, consistently outperforms abstract-only search. Results suggest that highest overall effectiveness may be achieved by combining evidence from spans and full articles. Users searching full text are more likely to find relevant articles than those searching only abstracts. This finding affirms the value of full-text collections for text retrieval and provides a starting point for future work in exploring algorithms that take advantage of rapidly growing digital archives. Experimental results also highlight the need to develop distributed text retrieval algorithms, since full-text articles are significantly longer than abstracts and may require the computational resources of multiple machines in a cluster. The MapReduce programming model provides a convenient framework for organizing such computations. (A sketch of combining span and article evidence with bm25 appears after this list.)
  • Harpoon After-Action Report (Vampire I)
    (1985-01-02) Bond, Larry
    Images for Matthew Kirschenbaum's contribution to the MediaCommons/New Everyday Rough Cuts: Media and Design in Process Collection, edited and curated by Kari Kraus and Amalia Levi.
  • National Online Information Meeting
    (1980-04) Hahn, Trudi Bellardo
    Reports on events at the National Online Information Meeting, held March 25-27, 1980, in New York City.
  • Accreditation for Information Science: Has the Time Finally Come?
    (1985-02) Hahn, Trudi Bellardo; Davis, Charles H.
    In September 1984, representatives from 17 American and Canadian library and information science associations met in Chicago, Illinois, to "examine the scope, structure, and costs of accreditation" of library and information science programs.
  • User Interfaces for Online Public Access Catalogs: A Research Workshop
    (1992-04) Hahn, Trudi Bellardo
    Describes a workshop held at the Library of Congress in fall 1991 on the design of user interfaces for online library catalogs.
  • On-Line Bibliographic System Instruction
    (1978) Hahn, Trudi Bellardo; Kennedy, Gail; Tremoulet, Gretchen
    A course in on-line bibliographic systems was introduced into the curriculum of the College of Library Science at the University of Kentucky. It was taught in five-week sections by three instructors, practicing librarians who were each an expert in one type of bibliographic network: OCLC, MEDLINE, or Lockheed DIALOG. Library space, equipment, and materials were utilized. The overall goals of the course were to develop terminal skills and related proficiencies and to instill a knowledge of the administrative considerations relevant to various kinds of networks. Despite problems related to class size, scheduling, theft of equipment, and supplementary readings, the students evaluated the course highly and the instructors felt it was an overall success worth repeating.
  • Education and Training for On-Line Searching: A Bibliography
    (1979) Hahn, Trudi Bellardo; Jackson, M. Virginia; Pikoff, Howard
    This annotated bibliography is intended to be used by searchers, educators, library administrators, and other reference department staff who must plan or provide for the training and continuing education of on-line searchers. It was compiled for the MARS Committee on the Education and Training of Search Analysts.
  • Text Retrieval Online: Historical Perspective on Web Search Engines
    (American Society for Information Science, 1998-04) Hahn, Trudi Bellardo
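The pmra abstract above describes deciding whether a document is about a topic from its term frequencies, modeled as Poisson distributions. The Python sketch below illustrates that idea with a two-Poisson ("elite" vs. non-elite) mixture and an idf-weighted relatedness sum; the rate parameters, the eliteness prior, and the combination rule are illustrative assumptions, not the published pmra estimates.

```python
# Minimal sketch of a Poisson "eliteness" estimate in the spirit of pmra.
# All parameter values here are illustrative assumptions.
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def p_elite_given_tf(tf: int, lam_elite: float, lam_nonelite: float, prior_elite: float) -> float:
    """Posterior probability that a document is 'about' a term, given its term
    frequency, under a two-Poisson mixture (illustrative parameters)."""
    e = prior_elite * poisson_pmf(tf, lam_elite)
    n = (1.0 - prior_elite) * poisson_pmf(tf, lam_nonelite)
    return e / (e + n)

def relatedness(doc_a_tf: dict, doc_b_tf: dict, idf: dict,
                lam_elite: float = 3.0, lam_nonelite: float = 0.3,
                prior_elite: float = 0.05) -> float:
    """Toy relatedness score: over shared terms, multiply the two documents'
    eliteness probabilities and weight by idf. This mirrors the spirit of
    topic-based relatedness but is NOT the exact pmra formula."""
    score = 0.0
    for term in doc_a_tf.keys() & doc_b_tf.keys():
        pa = p_elite_given_tf(doc_a_tf[term], lam_elite, lam_nonelite, prior_elite)
        pb = p_elite_given_tf(doc_b_tf[term], lam_elite, lam_nonelite, prior_elite)
        score += idf.get(term, 1.0) * pa * pb
    return score

if __name__ == "__main__":
    a = {"poisson": 4, "retrieval": 2, "topic": 3}
    b = {"poisson": 3, "topic": 1, "genomics": 2}
    idf = {"poisson": 2.5, "retrieval": 1.2, "topic": 1.8, "genomics": 3.0}
    print(round(relatedness(a, b, idf), 4))
```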
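The PageRank reranking abstract above combines scores from graph analysis of PubMed's related-article network with scores from an off-the-shelf retrieval engine. The sketch below shows one plausible way to do that: iterative PageRank over a toy related-article graph, followed by linear interpolation with normalized retrieval scores. The graph, the damping factor, and the interpolation weight alpha are assumptions for illustration, not the paper's experimental configuration.

```python
# Minimal sketch: PageRank over a content-similarity graph, then score fusion.
# Graph, damping, and alpha are illustrative assumptions.

def pagerank(graph: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Simple iterative PageRank. graph maps a node to the list of nodes it links to."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if not targets:
                continue
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

def rerank(retrieval_scores: dict, graph: dict, alpha: float = 0.8) -> list:
    """Combine normalized retrieval scores with normalized PageRank scores by
    linear interpolation (alpha is an illustrative weight), then sort."""
    pr = pagerank(graph)
    max_ret = max(retrieval_scores.values())
    max_pr = max(pr.values())
    combined = {
        doc: alpha * (score / max_ret) + (1 - alpha) * (pr.get(doc, 0.0) / max_pr)
        for doc, score in retrieval_scores.items()
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    # Toy "related article" network among five MEDLINE-like citations.
    graph = {"d1": ["d2", "d3"], "d2": ["d1"], "d3": ["d1", "d4"], "d4": ["d3"], "d5": ["d1"]}
    retrieval = {"d1": 2.1, "d2": 1.9, "d3": 1.5, "d4": 0.7, "d5": 0.4}
    for doc, score in rerank(retrieval, graph):
        print(doc, round(score, 3))
```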
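The full-text retrieval abstract above reports that paragraph-sized spans outperform abstract-only search and suggests combining evidence from spans and full articles. The sketch below scores spans and whole articles with standard bm25 and mixes the full-article score with the best span score; the bm25 parameters, the toy tokenized data, and the equal weighting are illustrative assumptions, not the paper's tuned settings.

```python
# Minimal sketch: bm25 scoring of spans and full articles, with evidence combination.
# Parameters and weighting are illustrative assumptions.
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, df, n_docs, avgdl, k1=1.2, b=0.75):
    """Standard bm25 score of a document (given as a token list) for a query."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        if term not in tf:
            continue
        idf = math.log((n_docs - df.get(term, 0) + 0.5) / (df.get(term, 0) + 0.5) + 1.0)
        norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc_terms) / avgdl))
        score += idf * norm
    return score

def article_score(query_terms, article_tokens, span_token_lists, df, n_docs, avgdl, weight=0.5):
    """Combine full-article bm25 evidence with the best span's bm25 evidence
    (equal weighting is an illustrative choice)."""
    full = bm25_score(query_terms, article_tokens, df, n_docs, avgdl)
    best_span = max(bm25_score(query_terms, span, df, n_docs, avgdl) for span in span_token_lists)
    return weight * full + (1 - weight) * best_span

if __name__ == "__main__":
    spans = [["gene", "expression", "in", "yeast"], ["unrelated", "background", "text"]]
    article = [tok for span in spans for tok in span]
    df = {"gene": 120, "expression": 200, "yeast": 80}
    print(round(article_score(["gene", "yeast"], article, spans, df, n_docs=1000, avgdl=6.0), 3))
```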