Bioprospecting Research

Microbial Genome Prospecting (Migenpro)

Computational framework for predicting microbial traits from genome sequences using machine learning

Research Overview

The increasing availability of microbial genomic data and the recent development of machine learning and AI methods create a unique opportunity to establish associations between genetic information and phenotypes. Here we present a computational framework for Microbial Genome Prospecting (Migenpro) that combines phenotype and genomic linked data.

Migenpro serves as a framework for generating machine learning models that predict microbial traits from genome sequences. Microbial whole-genome sequences were consistently annotated, and the genomic features were stored in a semantic framework using the Genome Biology Ontology Language (GBOL). Phenotype data corresponding to the available genome sequences were retrieved from BacDive, and the associations between phenotype and genotype were used to train machine learning models.
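
The genotype-to-phenotype training step described above can be illustrated with a small, self-contained sketch. It is not Migenpro's implementation: the genome IDs, feature names (`feat_1` and so on), and trait labels are invented, the features are assumed to be presence/absence of annotated genomic features, and a one-feature decision stump stands in for whatever models the framework actually trains.

```python
# Illustrative sketch only: all identifiers and data below are hypothetical.

# Assumed input 1: genome ID -> set of annotated genomic features.
genome_features = {
    "genome_A": {"feat_1", "feat_2"},
    "genome_B": {"feat_1", "feat_3"},
    "genome_C": {"feat_2", "feat_4"},
    "genome_D": {"feat_3", "feat_4"},
    "genome_E": {"feat_2"},  # no phenotype record, so it is dropped in the join
}

# Assumed input 2: genome ID -> binary trait label from a phenotype resource.
phenotypes = {
    "genome_A": True,
    "genome_B": True,
    "genome_C": False,
    "genome_D": False,
}


def build_matrix(features_by_genome, labels_by_genome):
    """Join features and labels on genome ID into a presence/absence matrix."""
    genomes = sorted(set(features_by_genome) & set(labels_by_genome))
    names = sorted(set().union(*(features_by_genome[g] for g in genomes)))
    X = [[int(f in features_by_genome[g]) for f in names] for g in genomes]
    y = [labels_by_genome[g] for g in genomes]
    return genomes, names, X, y


def fit_stump(names, X, y):
    """Pick the single feature whose presence or absence best matches the trait."""
    best = None
    for j, name in enumerate(names):
        # Number of genomes where "feature present" equals "trait present".
        agree = sum((row[j] == 1) == label for row, label in zip(X, y))
        # A feature may predict the trait by its presence or by its absence.
        for positive_when_present, correct in ((True, agree), (False, len(y) - agree)):
            if best is None or correct > best[0]:
                best = (correct, name, positive_when_present)
    correct, name, positive_when_present = best
    return name, positive_when_present, correct / len(y)


def predict(model, features):
    """Predict the trait for a new genome from its set of annotated features."""
    name, positive_when_present, _ = model
    return (name in features) == positive_when_present


genomes, names, X, y = build_matrix(genome_features, phenotypes)
model = fit_stump(names, X, y)
```

With this toy data, `predict(model, {"feat_1", "feat_9"})` classifies an unseen genome from its feature set alone. The training accuracy returned by `fit_stump` says nothing about generalisation; a real analysis would evaluate on held-out genomes.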

Workflow Overview