The e-ROSA project seeks to build a shared vision of a future sustainable e-infrastructure for research and education in agriculture in order to promote Open Science in this field and as such contribute to addressing related societal challenges. In order to achieve this goal, e-ROSA’s first objective is to bring together the relevant scientific communities and stakeholders and engage them in the process of coelaboration of an ambitious, practical roadmap that provides the basis for the design and implementation of such an e-infrastructure in the years to come.
This website highlights the results of a bibliometric analysis conducted at a global scale in order to identify key scientists and associated research performing organisations (e.g. public research institutes, universities, Research & Development departments of private companies) that work in the field of agricultural data sources and services. If you have any comment or feedback on the bibliometric study, please use the online form.
You can access and play with the graphs:
- Evolution of the number of publications between 2005 and 2015
- Map of most publishing countries between 2005 and 2015
- Network of country collaborations
- Network of institutional collaborations (+10 publications)
- Network of keywords relating to data - Link
Background: Genomic selection has gained much attention and the main goal is to increase the predictive accuracy and the genetic gain in livestock using dense marker information. Most methods dealing with the large p (number of covariates) small n (number of observations) problem have dealt only with continuous traits, but there are many important traits in livestock that are recorded in a discrete fashion (e. g. pregnancy outcome, disease resistance). It is necessary to evaluate alternatives to analyze discrete traits in a genome-wide prediction context. Methods: This study shows two threshold versions of Bayesian regressions (Bayes A and Bayesian LASSO) and two machine learning algorithms (boosting and random forest) to analyze discrete traits in a genome-wide prediction context. These methods were evaluated using simulated and field data to predict yet-to-be observed records. Performances were compared based on the models' predictive ability. Results: The simulation showed that machine learning had some advantages over Bayesian regressions when a small number of QTL regulated the trait under pure additivity. However, differences were small and disappeared with a large number of QTL. Bayesian threshold LASSO and boosting achieved the highest accuracies, whereas Random Forest presented the highest classification performance. Random Forest was the most consistent method in detecting resistant and susceptible animals, phi correlation was up to 81% greater than Bayesian regressions. Random Forest outperformed other methods in correctly classifying resistant and susceptible animals in the two pure swine lines evaluated. Boosting and Bayes A were more accurate with crossbred data. Conclusions: The results of this study suggest that the best method for genome-wide prediction may depend on the genetic basis of the population analyzed. All methods were less accurate at correctly classifying intermediate animals than extreme animals. Among the different alternatives proposed to analyze discrete traits, machine-learning showed some advantages over Bayesian regressions. Boosting with a pseudo Huber loss function showed high accuracy, whereas Random Forest produced more consistent results and an interesting predictive ability. Nonetheless, the best method may be case-dependent and a initial evaluation of different methods is recommended to deal with a particular problem.
Inappropriate format for Document type, expected simple value but got array, please use list format