

With the number of World Wide Web sites growing every day, the problem is not just finding information but locating the right piece of it. The two current approaches to structuring information, subject hierarchies and general search engines, each have advantages and drawbacks. Subject hierarchies are precise because documents are classified manually. However, they usually return few results for an average query, because the number of documents they index is small. General World Wide Web search engines, which index most of the Web, return long lists of documents, but often at the expense of precision. The results are then barely usable, because the many answers span different domains and topics; only complex queries may, in a given situation, narrow them down to a limited number of potentially relevant documents. To make searching more efficient and useful for ordinary users, we need intelligent, specialised search engines on the Net.

The primary objective of the MARVIN project (Multi-Agent Retrieval Vagabond on Information Networks), started in January 1996, was twofold: to reduce the search space by filtering Web pages and indexing only those belonging to a given field, and to support the multilingual nature of the Web. MARVIN, HON's own Web spider, was first applied to the medical domain. Armed with a dictionary of medical terms, MARVIN tirelessly skims the Web for new sources of medical information. MARVIN feeds and constantly updates MedHunt, HON's medical and health search engine. On 16 November 2000, MedHunt recorded 2,000 visits (from distinct computers) and 8,000 accesses, which shows the effectiveness and usefulness of MARVIN and MedHunt as a complementary pair.

MARVIN and MedHunt have been developed and are the property of HON.

How does it work?

MARVIN (Multi-Agent Retrieval Vagabond on Information Networks) searches the Web and selects only documents that are relevant to a specific, chosen domain. Document relevance is computed with a formula that takes into account how many words from a glossary of significant terms MARVIN finds in the document, as well as where those words appear in it. MARVIN was first applied to healthcare.
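The exact formula is not given here. As a minimal sketch, assuming single-word glossary terms and hypothetical position weights (a term in the title counts three times, in a heading twice, in the body once), a position-aware glossary score could be computed like this:

```python
import re

# Hypothetical weights: where a glossary term appears changes how much it counts.
POSITION_WEIGHTS = {"title": 3.0, "headings": 2.0, "body": 1.0}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into runs of letters (any alphabet)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def relevance_score(sections: dict[str, str], glossary: set[str]) -> float:
    """Add one point per glossary-term occurrence, scaled by the section's weight."""
    score = 0.0
    for section, text in sections.items():
        weight = POSITION_WEIGHTS.get(section, 1.0)  # unknown sections count as body text
        hits = sum(1 for token in tokenize(text) if token in glossary)
        score += weight * hits
    return score


page = {
    "title": "Diabetes care",
    "headings": "Insulin therapy",
    "body": "Insulin helps control diabetes; ask your doctor.",
}
glossary = {"diabetes", "insulin", "therapy"}
print(relevance_score(page, glossary))  # 3*1 + 2*2 + 1*2 = 9.0
```

Multi-word glossary entries would need phrase matching rather than the single-token lookup used here.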

MARVIN stores the selected documents in a database that users can then query; MedHunt, HON's own medical search engine, is one example. MARVIN is also applied to a variety of scientific domains, such as molecular biology and 2-D electrophoresis, constantly feeding and updating the corresponding databases.


MARVIN was designed as a multi-agent softbot (Fig. 1). Each agent has filtering capabilities: it downloads Web pages and computes a medical "score" for each page, using a glossary of medical terms to measure how frequently glossary words appear in the page.

Categorising documents: medical or not?

The score computed by MARVIN determines whether or not a Web page is medical or health-related. It is obtained by adding up the medical terms found in the document, taking into account their different translations and the weight that the built-in glossary assigns to each medical term.
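A minimal sketch of such a multilingual, weighted score follows. The glossary entries, the concept identifiers, the weights and the `THRESHOLD` value are all hypothetical; the point is only that every translation of a term maps to one concept and contributes that concept's weight.

```python
import re

# Hypothetical glossary: each surface form, in any language, maps to one concept.
TRANSLATIONS = {
    "heart": "HEART", "cœur": "HEART", "herz": "HEART",
    "fever": "FEVER", "fièvre": "FEVER", "fieber": "FEVER",
    "insulin": "INSULIN", "insuline": "INSULIN",
}
# Hypothetical per-concept weights and medical/non-medical cut-off.
CONCEPT_WEIGHTS = {"HEART": 1.5, "FEVER": 2.0, "INSULIN": 3.0}
THRESHOLD = 4.0


def medical_score(text: str) -> float:
    """Sum the concept weight of every glossary word found in the text."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return sum((CONCEPT_WEIGHTS[TRANSLATIONS[t]] for t in tokens if t in TRANSLATIONS), 0.0)


def is_medical(text: str) -> bool:
    """Classify a page as medical when its score reaches the threshold."""
    return medical_score(text) >= THRESHOLD


print(is_medical("Fièvre et insuline"))     # 2.0 + 3.0 = 5.0, so True
print(is_medical("The heart of the city"))  # 1.5, so False
```

Mapping translations to a shared concept keeps the weights language-independent: adding a new language only means adding entries to `TRANSLATIONS`.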

In the medical domain, many thesauri and glossaries already existed, such as MeSH (Medical Subject Headings) from the National Library of Medicine (NLM) and the glossary in nine European languages developed at the Heymans Institute of Pharmacology, University of Ghent, Belgium, within the framework of a European project. For our application, HON built its own thesaurus by compiling several of these sources. Starting from 12,000 bilingual (English/French) medical terms, the thesaurus was expanded with Danish, Dutch, German, Italian, Portuguese and Spanish, resulting in 20,000 multilingual medical terms (not counting the 33,000 MeSH terms).
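The merge step can be pictured with a small dictionary keyed by English headword. This is a sketch under assumptions: the source formats, the choice of English as the shared key and the rule that a new language only adds translations of known headwords are illustrative, not a description of how HON actually compiled its thesaurus.

```python
def merge_thesaurus(
    base_pairs: list[tuple[str, str]],
    extra_sources: dict[str, list[tuple[str, str]]],
) -> dict[str, dict[str, str]]:
    """Merge an English/French base list with other languages, keyed by English headword."""
    thesaurus: dict[str, dict[str, str]] = {}
    for en, fr in base_pairs:
        thesaurus[en] = {"en": en, "fr": fr}
    for lang, pairs in extra_sources.items():
        for en, term in pairs:
            # Assumption: a new language only adds translations of existing headwords.
            if en in thesaurus:
                thesaurus[en][lang] = term
    return thesaurus


def count_terms(thesaurus: dict[str, dict[str, str]]) -> int:
    """Count distinct (language, term) entries across all concepts."""
    return len({(lang, term) for entry in thesaurus.values() for lang, term in entry.items()})


base = [("fever", "fièvre"), ("heart", "cœur")]
extra = {
    "de": [("fever", "Fieber"), ("heart", "Herz")],
    "es": [("fever", "fiebre"), ("kidney", "riñón")],  # "kidney" is not in the base list
}
thesaurus = merge_thesaurus(base, extra)
print(thesaurus["fever"])      # {'en': 'fever', 'fr': 'fièvre', 'de': 'Fieber', 'es': 'fiebre'}
print(count_terms(thesaurus))  # 7
```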

Studies were undertaken to estimate the relative importance of a term in a document and in a collection of documents, so that each term in our medical glossary could be weighted. We analysed 1,000 documents known to be medical or health-related and 1,000 documents from other, non-medical domains, and evaluated the medical terms found in each Web page. This study, combined with other techniques such as the formula of Wilbur and Yang ("An analysis of statistical term strength and its use in the indexing and retrieval of molecular biology texts", Comput. Biol. Med. 26(3), pp. 209-222, 1996), allowed us to define a threshold for each term in our medical glossary.
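The Wilbur and Yang formula itself is not reproduced here. As a simpler, swapped-in illustration of weighting terms from a medical and a non-medical corpus, the sketch below scores each glossary term by a smoothed log ratio of the fraction of documents containing it in each corpus; the tiny corpora are made up for the example.

```python
import math


def term_weights(medical_docs: list[str], other_docs: list[str], glossary: set[str]) -> dict[str, float]:
    """Weight each glossary term by how much more often it occurs in medical pages.

    Illustrative stand-in, NOT the Wilbur and Yang term-strength formula:
    a log ratio of document frequencies with add-one smoothing.
    """
    med_sets = [set(doc.lower().split()) for doc in medical_docs]
    other_sets = [set(doc.lower().split()) for doc in other_docs]
    weights = {}
    for term in glossary:
        # Smoothed fraction of documents in each corpus that contain the term.
        p_med = (sum(term in d for d in med_sets) + 1) / (len(med_sets) + 2)
        p_other = (sum(term in d for d in other_sets) + 1) / (len(other_sets) + 2)
        weights[term] = math.log(p_med / p_other)
    return weights


medical = ["insulin lowers blood sugar", "blood pressure and heart disease"]
other = ["the heart of the city", "blood orange juice recipe"]
weights = term_weights(medical, other, {"insulin", "blood", "heart"})
for term in sorted(weights, key=weights.get, reverse=True):
    print(term, round(weights[term], 3))
```

Here "heart" gets a weight of zero because it occurs equally often in both toy corpora, so on its own it says little about whether a page is medical.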

Using our multilingual medical thesaurus of 50,000 terms, MARVIN downloads Web pages, calculates a score from each page's content, and builds a classical inverted index, in which each word is associated with the list of documents containing it. Matching query terms against the index is then a simple and efficient task.
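A classical inverted index is easy to sketch. The version below is a simplification that assumes whitespace tokenisation and stores each word's documents as a set rather than a sorted posting list; a query returns the documents containing all of its words.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: map each word to the set of document ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the ids of documents that contain every word of the query."""
    postings = [index.get(word, set()) for word in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)


docs = {
    "p1": "insulin and diabetes",
    "p2": "diabetes diet",
    "p3": "heart surgery",
}
index = build_index(docs)
print(sorted(search(index, "diabetes")))          # ['p1', 'p2']
print(sorted(search(index, "Insulin diabetes")))  # ['p1']
print(search(index, "cancer"))                    # set()
```

Because each query word is a single dictionary lookup followed by a set intersection, answering a query does not require rescanning the stored pages.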


Fig. 1 MARVIN multi-agent architecture
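The multi-agent design can be approximated with a few worker threads sharing a queue of URLs, each one downloading and scoring pages independently. This is only a sketch: `FAKE_WEB` stands in for real HTTP downloads, link following is omitted, and the glossary and threshold are hypothetical.

```python
import queue
import threading

# Hypothetical stand-in for the Web: a real agent would fetch pages over HTTP.
FAKE_WEB = {
    "http://a.example/": "insulin therapy for diabetes",
    "http://b.example/": "football results and weather",
    "http://c.example/": "fever and heart rate in children",
}
GLOSSARY = {"insulin", "diabetes", "fever", "heart", "therapy"}
THRESHOLD = 2  # hypothetical minimum number of glossary hits


def agent(urls: "queue.Queue[str]", selected: list[str], lock: threading.Lock) -> None:
    """One filtering agent: take URLs, 'download' and score them, keep medical pages."""
    while True:
        try:
            url = urls.get_nowait()
        except queue.Empty:
            return  # no work left for this agent
        text = FAKE_WEB.get(url, "")
        score = sum(word in GLOSSARY for word in text.split())
        if score >= THRESHOLD:
            with lock:
                selected.append(url)


def run_agents(start_urls: list[str], n_agents: int = 3) -> list[str]:
    """Start several agents on a shared URL queue and collect the selected pages."""
    urls: "queue.Queue[str]" = queue.Queue()
    for url in start_urls:
        urls.put(url)
    selected: list[str] = []
    lock = threading.Lock()
    workers = [threading.Thread(target=agent, args=(urls, selected, lock)) for _ in range(n_agents)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sorted(selected)


print(run_agents(list(FAKE_WEB)))  # ['http://a.example/', 'http://c.example/']
```

The shared queue lets agents pick up work at their own pace, so a slow download in one agent does not stall the others.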


What is it used for?

The Health On the Net Foundation (HON) and the Swiss Institute of Bioinformatics (SIB) at Geneva University Hospital developed MARVIN (Multi-Agent Retrieval Vagabond on Information Networks), a robot that searches sites and documents. MARVIN robots are already in use for health and medicine, as well as for other domains such as molecular biology.

Medical and health domain

Other domains of application

BioHunt: Molecular Biology search engine

2DHunt: 2D Electrophoresis search engine

MARVIN was supported in part by the Swiss National Fund for Scientific Research under grant # 21-43501.95.
