Recently developed bioinformatics tools achieve strain-level resolution by identifying accessory genes or capturing single-nucleotide polymorphisms (SNPs). Yet these tools are limited by the available reference genomes, which are far from covering all microbial variability. Building catalogues of non-redundant genes by de novo assembly, then clustering co-abundant genes, reveals part of the microbial dark matter by reconstituting the gene repertoires of potentially unknown species. While existing methods accurately identify core genes present in all strains of a species, they miss many accessory genes or split them into small gene groups that remain unassociated with core genomes. Capturing these accessory genes is nonetheless essential in clinical research and epidemiology, because they provide strain-specific functions such as pathogenicity or antibiotic resistance.
We propose MSPminer, a new bioinformatics method that, by combining information from hundreds of metagenomic samples, reconstructed the gene repertoires of more than 1600 species of the human intestinal microbiome, more than 70% of which were previously unknown. The Metagenomic Species Pan-genomes (MSPs) capture and distinguish not only the core genes but also the accessory genes of microbial species.
This work provides a detailed map of the composition of the gut microbiome and its inter-individual variability, opening the way to a better understanding of the link between the microbiota and disease.