e-infrastructure Roadmap for Open Science in Agriculture

A bibliometric study

The e-ROSA project seeks to build a shared vision of a future sustainable e-infrastructure for research and education in agriculture in order to promote Open Science in this field and as such contribute to addressing related societal challenges. In order to achieve this goal, e-ROSA’s first objective is to bring together the relevant scientific communities and stakeholders and engage them in the process of coelaboration of an ambitious, practical roadmap that provides the basis for the design and implementation of such an e-infrastructure in the years to come.

This website highlights the results of a bibliometric analysis conducted at a global scale in order to identify key scientists and associated research performing organisations (e.g. public research institutes, universities, Research & Development departments of private companies) that work in the field of agricultural data sources and services. If you have any comment or feedback on the bibliometric study, please use the online form.

You can access and play with the graphs:

Discover all records
Home page


A random walk on an ontology: Using thesaurus structure for automatic subject indexing


Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structures. We introduce a methodology for the analysis and modeling of the indexing process based on a weighted random walk algorithm. The primary goal of this research is the analysis of the contribution of thesaurus structure to the indexing process. The resulting models are evaluated in the context of automatic subject indexing using four collections of documents pre-indexed with 4 different thesauri (AGROVOC [UN Food and Agriculture Organization], high-energy physics taxonomy [HEP], National Agricultural Library Thesaurus [NALT], and medical subject headings [MeSH]). We also introduce a thesaurus-centric matching algorithm intended to improve the quality of candidate concepts. In all cases, the weighted random walk improves automatic indexing performance over matching alone with an increase in average precision (AP) of 9% for HEP, 11% for MeSH, 35% for NALT, and 37% for AGROVOC. The results of the analysis support our hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process. The amount that the vocabulary structure contributes was found to differ among the 4 thesauri, possibly due to the vocabulary used in the corresponding thesauri and the structural relationships between the terms. Each of the thesauri and the manual indexing associated with it

  • US
  • Univ_Illinois_Urbana_Champaign (US)
  • Univ_N_Carolina_Chapel_Hill (US)
Data keywords
  • ontology
  • vocabulary
Agriculture keywords
  • agriculture
Data topic
  • semantics
Document type

Inappropriate format for Document type, expected simple value but got array, please use list format

Institutions 10 co-publis
  • Univ_Illinois_Urbana_Champaign (US)
Powered by Lodex 8.20.3
logo commission europeenne
e-ROSA - e-infrastructure Roadmap for Open Science in Agriculture has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 730988.
Disclaimer: The sole responsibility of the material published in this website lies with the authors. The European Union is not responsible for any use that may be made of the information contained therein.