e-infrastructure Roadmap for Open Science in Agriculture

A bibliometric study

The e-ROSA project seeks to build a shared vision of a sustainable future e-infrastructure for research and education in agriculture, in order to promote Open Science in this field and thereby help address related societal challenges. To achieve this goal, e-ROSA’s first objective is to bring together the relevant scientific communities and stakeholders and engage them in co-elaborating an ambitious, practical roadmap that provides the basis for designing and implementing such an e-infrastructure in the years to come.

This website presents the results of a bibliometric analysis conducted at a global scale to identify key scientists and their associated research-performing organisations (e.g. public research institutes, universities, Research & Development departments of private companies) working in the field of agricultural data sources and services. If you have any comments or feedback on the bibliometric study, please use the online form.



A Support Vector Machine based method to distinguish proteobacterial proteins from eukaryotic plant proteins


Background: Members of the phylum Proteobacteria are the most prominent among the bacteria that cause plant diseases, which reduce the quantity and quality of food produced by agriculture. To limit these losses, infections need to be identified at an early stage. Recent developments in next-generation nucleic acid sequencing and mass spectrometry open the door to screening plants by the sequences of their macromolecules. Such an approach requires the ability to recognize the organismal origin of unknown DNA or peptide fragments. There are many ways to approach this problem, but none has emerged as the best protocol. Here we attempt a systematic way to determine the organismal origin of peptides using a machine learning algorithm, the Support Vector Machine (SVM).

Results: The amino acid compositions of proteobacterial proteins were found to differ from those of plant proteins. We developed SVM models based on amino acid and dipeptide compositions to distinguish a proteobacterial protein from a plant protein. The amino acid composition (AAC) based SVM model had an accuracy of 92.44% with a Matthews correlation coefficient (MCC) of 0.85, while the dipeptide composition (DC) based SVM model had a maximum accuracy of 94.67% and an MCC of 0.89. We also developed SVM models based on a hybrid approach (AAC and DC), which gave a maximum accuracy of 94.86% and an MCC of 0.90. The models were tested on unseen datasets (not used in training) to assess their validity.

Conclusion: The results indicate that the SVM based on the hybrid AAC and DC approach can be used to distinguish proteobacterial from plant protein sequences.
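The features and the metric named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact feature definitions, so this assumes the usual conventions: AAC as the fraction of each of the 20 standard residues, DC as the fraction of each of the 400 adjacent residue pairs, and the hybrid vector as their concatenation. The function names (`aac`, `dc`, `hybrid`, `mcc`) are hypothetical and are not taken from the paper; the resulting vectors would then be passed to an SVM implementation of the reader's choice.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs, in a fixed order (assumed feature layout).
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dc(seq):
    """Dipeptide composition: fraction of each adjacent pair (400 values).

    Pairs containing a non-standard residue are skipped (an assumption).
    """
    seq = seq.upper()
    counts = {d: 0 for d in DIPEPTIDES}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def hybrid(seq):
    """Hybrid feature vector: AAC followed by DC (420 values)."""
    return aac(seq) + dc(seq)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Returning 0.0 for a zero denominator is a common convention.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), which is why it is often reported alongside accuracy for two-class problems such as this one.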

  • US
  • Oklahoma_State_Univ_Stillwater (US)
Data keywords
  • machine learning
Agriculture keywords
  • agriculture
Data topic
  • big data
  • information systems
  • modeling
Institutions 10 co-publis
  • Oklahoma_State_Univ_Stillwater (US)
e-ROSA - e-infrastructure Roadmap for Open Science in Agriculture has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 730988.
Disclaimer: The sole responsibility of the material published in this website lies with the authors. The European Union is not responsible for any use that may be made of the information contained therein.