IuK2001 The 7th Annual Meeting of the IuK Initiative
Information and Communication of the Learned Societies in Germany
»Cooperative Systems«
Trier, March 11 - 14, 2001
 


Creating a virtual library with HPSearch and Mops

Gerd Hoff and Martin Mundhenk

Universität Trier
FB IV - Informatik
D-54286 Trier (Germany)
hoffg@uni-trier.de
mundhenk@ti.uni-trier.de

The fast dissemination of new research results on the world-wide web poses a new challenge for search engines. In many research areas, scientists make their newest results available electronically on their web sites long before the results appear in conference proceedings or journals. A decade ago, the state of the art in a research area could be determined by reading conference proceedings and journals in the local library; nowadays it is also necessary to find the newest related electronic publications on the web, that is, to maintain a virtual library of not-yet-printed literature. Traditional search engines do not help with this task. For example, they do not index PostScript documents, although PostScript is the electronic format of many preprints on the web. The few existing searchable indices for PostScript documents either cover fields that are too broad to be really helpful (all of computer science, for example), or depend on a submission procedure that delays the appearance of the documents on the web.

We present a new approach for constructing a virtual library of scientific papers that is specialized to a relatively small research area and makes it possible to find the newest documents.

  • In the first step, we want to find the places on the web where we expect interesting documents to appear.

    Unlike other approaches, we do not search for web pages that contain certain keywords; instead, we search for web pages created by scientists who are active in the research area under consideration. For personal virtual bookshelves, this information can, for example, be edited by hand. For a larger virtual library, we prefer an automated approach and obtain the scientists' names from computer science bibliographies on the web, namely from Michael Ley's DBLP server (http://dblp.uni-trier.de/). This makes it possible to find the names of scientists who have published at certain specialized conferences or in specialized journals; the names found can therefore be regarded as "certified".
    Using these names, our HPSearch system (http://pranger.uni-trier.de/hp/) searches for the scientists' Home Pages.
    Locating these Home Pages is difficult because there are no fixed rules for how such pages are constructed. We have determined about 500 characteristics that control the search for the Home Pages. Maintaining this information is another primary task of HPSearch.

  • In the second step, a virtual library is created from the scientific papers found near the scientists' Home Pages.

    This is done by our search engine Mops (http://mops.uni-trier.de/). Mops builds an index of these papers and makes it accessible on a web server. While the search index is administered on the Mops server, the scientific papers from which it is extracted remain on their owners' servers. In this way, a virtual, distributed library is created.
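
The first step can be illustrated with a small Python sketch. It is not HPSearch itself: the record type `Publication`, the functions `certified_names`, `homepage_score` and `best_homepage`, the example names and URLs, and the five weighted features are all hypothetical stand-ins for the DBLP data and for HPSearch's roughly 500 characteristics. The sketch only shows the idea of selecting "certified" names by venue and then ranking candidate pages for each name.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    """Hypothetical bibliography record (not DBLP's actual data format)."""
    authors: tuple
    venue: str


def certified_names(pubs, venues, min_papers=1):
    """Authors with at least `min_papers` publications at the given venues."""
    counts = Counter(
        author for pub in pubs if pub.venue in venues for author in pub.authors
    )
    return {name for name, n in counts.items() if n >= min_papers}


def homepage_score(name, url, title):
    """Score a candidate page with a few made-up, hand-weighted features.

    HPSearch uses about 500 characteristics; these five are only illustrative.
    """
    parts = name.lower().split()
    first, last = parts[0], parts[-1]
    title_l, url_l = title.lower(), url.lower()
    score = 0.0
    if last in title_l:
        score += 2.0  # surname in the page title
    if first in title_l:
        score += 1.0  # first name in the page title
    if "home page" in title_l or "homepage" in title_l:
        score += 1.5  # page calls itself a home page
    if "~" + last in url_l or "/" + last in url_l:
        score += 1.0  # personal directory such as /~surname/
    if url_l.endswith((".ps", ".ps.gz", ".pdf")):
        score -= 3.0  # a paper file, not a home page
    return score


def best_homepage(name, candidates):
    """Best-scoring URL from (url, title) pairs, or None if none scores above 0."""
    best_score, best_url = 0.0, None
    for url, title in candidates:
        s = homepage_score(name, url, title)
        if s > best_score:
            best_score, best_url = s, url
    return best_url


# Hypothetical example data.
pubs = [
    Publication(("Ada Example", "Bob Sample"), "CCC"),
    Publication(("Carl Other",), "SIGGRAPH"),
]
names = certified_names(pubs, venues={"CCC"})
candidates = [
    ("http://www.example.org/~example/", "Ada Example's Home Page"),
    ("http://www.example.org/~example/papers/example.ps", "Ada Example: a paper"),
]
print(sorted(names))
print(best_homepage("Ada Example", candidates))
```

Here only the two authors of the CCC record are kept, and the page whose title calls itself a home page outranks the PostScript file in the same directory. A hand-weighted linear score keeps each characteristic easy to inspect and adjust, which matters when the rule set has to be maintained over time.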
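
The second step can be sketched as collecting paper files located near a Home Page and building an inverted index that stores only URLs, so that the papers themselves stay on their owners' servers. This is a minimal sketch under stated assumptions, not the Mops implementation: the notion of "near" (same host, at or below the Home Page's directory), the file suffixes in `PAPER_SUFFIXES`, the names `nearby_papers`, `tokenize` and `VirtualLibraryIndex`, and the example URLs are hypothetical, and text extraction from PostScript is assumed to have been done already.

```python
import re
from collections import defaultdict
from urllib.parse import urljoin, urlparse

# Assumption: which file types count as papers.
PAPER_SUFFIXES = (".ps", ".ps.gz", ".pdf")


def nearby_papers(homepage_url, links):
    """Paper links on the Home Page's host, at or below its directory."""
    home = urlparse(homepage_url)
    base_dir = home.path.rsplit("/", 1)[0] + "/"
    found = []
    for link in links:
        url = urljoin(homepage_url, link)  # resolve relative links
        parsed = urlparse(url)
        if (parsed.netloc == home.netloc
                and parsed.path.startswith(base_dir)
                and parsed.path.lower().endswith(PAPER_SUFFIXES)):
            found.append(url)
    return found


def tokenize(text):
    """Lower-case alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class VirtualLibraryIndex:
    """Inverted index mapping terms to paper URLs.

    Only URLs are stored: the papers themselves stay on their owners' servers.
    """

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, url, text):
        # `text` is assumed to be already extracted from the PostScript/PDF file.
        for term in set(tokenize(text)):
            self._postings[term].add(url)

    def search(self, query):
        """URLs whose text contains every query term (conjunctive query)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], ()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


# Hypothetical example data.
home = "http://www.example.org/~example/index.html"
links = ["papers/bdd.ps.gz", "cv.html", "../~other/y.pdf", "http://other.org/x.pdf"]
papers = nearby_papers(home, links)

index = VirtualLibraryIndex()
index.add(papers[0], "Ordered binary decision diagrams and their complexity")
index.add("http://www.example.net/~sample/sat.pdf", "The complexity of decision problems")
print(papers)
print(sorted(index.search("decision complexity")))
print(sorted(index.search("binary decision")))
```

Only the gzipped PostScript file below `/~example/` is collected; the CV, the file in another user's directory, and the link to another host are skipped. A conjunctive query such as `index.search("binary decision")` returns only URLs whose text contains both terms; a real search engine would add ranking on top of this.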

In this project, we developed and implemented HPSearch (http://pranger.uni-trier.de/hp/) and Mops (http://mops.uni-trier.de/). We tested our approach by creating two example indices: one for complexity theory and one for BDDs (binary decision diagrams, a data structure used in VLSI design and verification). Both indices are widely used in their respective research communities. The entire software runs on standard PCs.
We conclude that such focused crawling is very effective for building high-quality virtual libraries on ordinary desktop hardware.

A more detailed description of the system can be found at http://www.informatik.uni-trier.de/~mundhenk/virt-lib/.

 

The IuK 2001 is organized by the German Psychological Association (DGPs) and the Institute for Psychology Information (ZPID).

Last updated: February 7, 2001 · info@zpid.de · URL: http://www.zpid.de/iuk2001/