Bioprospecting Research

Status: completed. Started: 2023-06

Bioprospecting Research - Migenpro

The increasing availability of microbial genomic data and recent advances in machine learning and AI methods create a unique opportunity to establish associations between genetic information and phenotypes.

Research Overview

Migenpro is a framework for generating machine learning models that predict microbial traits from genome sequences. Microbial whole-genome sequences were annotated consistently, and the resulting genomic features were stored in a semantic framework using the Genome Biology Ontology Language (GBOL).

Phenotype data corresponding to the available genome sequences were retrieved from BacDive, and the associations between phenotype and genotype were used to train machine learning models.
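As an illustration of how genotype and phenotype records might be joined into a training table, the sketch below builds a presence/absence matrix from per-genome feature sets and attaches a trait label to each row. It is not Migenpro's actual code: the function name `build_training_table`, the genome identifiers, and the feature names (`feat_A`, `feat_B`, `feat_C`) are hypothetical placeholders, and real input would come from the GBOL annotations and BacDive records rather than from hard-coded dictionaries.

```python
from typing import Dict, List, Set, Tuple


def build_training_table(
    genome_features: Dict[str, Set[str]],
    phenotypes: Dict[str, str],
) -> Tuple[List[str], List[str], List[List[int]], List[str]]:
    """Join annotated genomes with phenotype labels into a 0/1 feature matrix."""
    # Keep only genomes that have both annotations and a phenotype label.
    genomes = sorted(g for g in genome_features if g in phenotypes)
    # A sorted vocabulary keeps the column order reproducible.
    vocabulary = sorted(set().union(*(genome_features[g] for g in genomes)))
    # One row per genome, one column per feature: 1 = present, 0 = absent.
    matrix = [
        [1 if feature in genome_features[g] else 0 for feature in vocabulary]
        for g in genomes
    ]
    labels = [phenotypes[g] for g in genomes]
    return genomes, vocabulary, matrix, labels


# Hypothetical toy input (placeholder identifiers, not real annotations).
genome_features = {
    "genome_1": {"feat_A", "feat_B"},
    "genome_2": {"feat_B"},
    "genome_3": {"feat_A", "feat_C"},
    "genome_4": {"feat_C"},  # no phenotype record, so it is dropped
}
phenotypes = {
    "genome_1": "motile",
    "genome_2": "non-motile",
    "genome_3": "motile",
}

genomes, vocabulary, matrix, labels = build_training_table(genome_features, phenotypes)
for genome, row, label in zip(genomes, matrix, labels):
    print(genome, row, label)
```

The resulting `matrix` and `labels` are in the shape most classifiers expect, so any model could be trained on them. Genomes without a phenotype record are excluded rather than given a guessed label.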

Key Results

Successful Predictions

Our approach successfully predicts traits such as motility, Gram stain, optimal growth temperature range, and sporulation capabilities.

Robust Validation

To assess robustness, five-fold cross-validation was performed. Model performance was consistent across folds, with no indication of overfitting.
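The five-fold procedure can be sketched in a few lines of standard-library Python. The samples are shuffled and split into five folds, each fold is held out once, and the per-fold accuracies are compared. The classifier here is a toy 1-nearest-neighbour model on Hamming distance, chosen only to keep the example self-contained. It is an assumption and not the model type used in Migenpro, and the data are synthetic.

```python
import random
from statistics import mean, stdev
from typing import List, Sequence


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_neighbour_predict(
    train_X: Sequence[Sequence[int]], train_y: Sequence[str], x: Sequence[int]
) -> str:
    """Toy classifier: return the label of the closest training row (Hamming)."""
    distances = [sum(a != b for a, b in zip(row, x)) for row in train_X]
    return train_y[distances.index(min(distances))]


def cross_validate(
    X: Sequence[Sequence[int]], y: Sequence[str], k: int = 5, seed: int = 0
) -> List[float]:
    """Return one held-out accuracy per fold."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            nearest_neighbour_predict(train_X, train_y, X[i]) == y[i]
            for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return scores


# Synthetic data: three binary features, with the label tied to the first one.
X = [[i % 2, (i // 2) % 2, (i // 4) % 2] for i in range(20)]
y = ["motile" if row[0] else "non-motile" for row in X]

scores = cross_validate(X, y, k=5, seed=0)
print("per-fold accuracy:", scores)
print("mean:", mean(scores), "stdev:", stdev(scores))
```

A small spread between folds suggests that performance does not depend on one particular split. Comparing training accuracy with held-out accuracy is the more direct check for overfitting.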

Comparative Analysis

The framework’s effectiveness was further validated through comparison with previously published models, showing comparable accuracy, with modest variations attributed to differences in datasets rather than methodology.

Feature Analysis

The classification models can be further explored using feature importance characterization to identify biologically relevant genomic features. Migenpro provides an interoperable framework to predict phenotypes from genomic data.
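One generic way to characterize feature importance is permutation importance: shuffle one feature column at a time and measure how much the accuracy drops. The sketch below uses it purely as an illustration; the source does not state which importance measure Migenpro applies. The toy model, feature names, and data are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def accuracy(
    predict: Callable[[Sequence[int]], str],
    X: Sequence[Sequence[int]],
    y: Sequence[str],
) -> float:
    """Fraction of rows for which the prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(
    predict: Callable[[Sequence[int]], str],
    X: Sequence[Sequence[int]],
    y: Sequence[str],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean accuracy drop per feature after shuffling that feature's column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the matrix with only this column permuted.
            X_perm = [
                list(row[:col]) + [value] + list(row[col + 1:])
                for row, value in zip(X, shuffled)
            ]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy setup: the "model" looks only at the first feature.
feature_names = ["feat_A", "feat_B"]
X = [[1, 0], [1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1]]
y = ["motile"] * 4 + ["non-motile"] * 4


def toy_model(row: Sequence[int]) -> str:
    return "motile" if row[0] == 1 else "non-motile"


importances = permutation_importance(toy_model, X, y)
for name, value in zip(feature_names, importances):
    print(name, round(value, 3))
```

Shuffling `feat_B` leaves the accuracy unchanged because the toy model never reads it, so its importance is zero, while shuffling `feat_A` degrades the predictions. With a real model, features with large drops would be the candidates to inspect for biological relevance.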

Research Outputs

Published Preprint: DOI: 10.1101/2025.08.21.671437

This publication presents our methodology, results, and discussion of how machine learning can be applied to microbial genomics for bioprospecting applications.

Impact

Bioprospecting is the discovery of valuable genetic and biochemical resources from biodiversity, with considerable potential for pharmaceutical, agricultural, and environmental applications. Migenpro aims to make this process more efficient by using machine learning to analyze biological data.

Tags

bioinformatics, machine-learning, genomics, GWAS