Human reference gut microbiome catalog including newly assembled genomes from under-represented Asian metagenomes

Chan Yeong Kim, Muyoung Lee, Sunmo Yang, Kyungnam Kim, Dongeun Yong, Hye Ryun Kim, Insuk Lee

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)


Background: Metagenome sampling bias for geographical location and lifestyle is partially responsible for the incomplete catalog of reference genomes of gut microbial species. Thus, genome assembly from currently under-represented populations may effectively expand the reference gut microbiome and improve taxonomic and functional profiling.

Methods: We assembled genomes using public whole-metagenomic shotgun sequencing (WMS) data for 110 and 645 fecal samples from India and Japan, respectively. In addition, we assembled genomes from newly generated WMS data for 90 fecal samples collected from Korea. Because genome assembly for low-abundance species may require much deeper sequencing than is usually employed, we performed ultra-deep WMS (> 30 Gbp or > 100 million read pairs) for the fecal samples from Korea. We consequently assembled 29,082 prokaryotic genomes from 845 fecal metagenomes from the three under-represented Asian countries and combined them with the Unified Human Gastrointestinal Genome (UHGG) to generate an expanded catalog, the Human Reference Gut Microbiome (HRGM).

Results: HRGM contains 232,098 non-redundant genomes for 5414 representative prokaryotic species including 780 that are novel, > 103 million unique proteins, and > 274 million single-nucleotide variants. This is an increase of over 10% from the UHGG. The 780 new species were enriched for the Bacteroidaceae family, including species associated with high-fiber and seaweed-rich diets. Single-nucleotide variant density was positively associated with the speciation rate of gut commensals. We found that ultra-deep sequencing facilitated the assembly of genomes for low-abundance taxa, and deep sequencing (e.g., > 20 million read pairs) may be needed for the profiling of low-abundance taxa. Importantly, the HRGM significantly improved the taxonomic and functional classification of sequencing reads from fecal samples. Finally, analysis of human self-antigen homologs in the HRGM species genomes suggested that bacterial taxa with high cross-reactivity potential may contribute more to the pathogenesis of gut microbiome-associated diseases than those with low cross-reactivity potential by promoting inflammatory conditions.

Conclusions: By including gut metagenomes from the previously under-represented Asian countries Korea, India, and Japan, we developed a substantially expanded microbiome catalog, HRGM. Information on the microbial genomes and coding genes is publicly available. HRGM will facilitate the identification and functional analysis of disease-associated gut microbiota.

Original language: English
Article number: 134
Journal: Genome Medicine
Issue number: 1
Publication status: Published - 2021 Dec

Bibliographical note

Funding Information:
This research was supported by the National Research Foundation funded by the Ministry of Science and ICT (2018R1A5A2025079, 2018M3C9A5064709, 2019M3A9B6065192) to IL. The work was supported in part by the Brain Korea 21 (BK21) FOUR program. We also appreciate the assistance from the KOBIC Research Support Program.

Publisher Copyright:
© 2021, The Author(s).

All Science Journal Classification (ASJC) codes

  • Molecular Medicine
  • Molecular Biology
  • Genetics
  • Genetics (clinical)

