Multi-method gene clusters at species-level resolution for 125 prokaryotic pangenomes
- 1. Barcelona Supercomputing Centre (BSC-CNS) - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain.
- 2. Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain.
- 3. Centro de Biotecnología y Genómica de Plantas (UPM-INIA)
Description
This dataset contains 9 sets of species-level gene clusters and high-resolution species trees for 125 representative bacterial and archaeal species, encompassing a total of 6,851 nearly complete genomes. Each set represents a different approach to homology-, orthology-, and synteny-based gene clustering as implemented by 6 popular tools for comparative genomics and pangenome analysis (Roary, panX, OrthoFinder, MMseqs2/PanACoTa, CD-HIT, and eggNOG-mapper).
For Escherichia coli, Cutibacterium acnes, Bacteroides uniformis, and Staphylococcus epidermidis, we provide additional sets that combine high-quality genomes with different proportions of medium- and low-quality metagenome-assembled genomes (MAGs).
This dataset is a resource for benchmarking gene clustering tools and pangenome analysis workflows, and for testing their robustness to incomplete or contaminated genome assemblies.
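As a rough illustration of what such a benchmark can look like, the sketch below compares two clusterings of the same genes by the Jaccard index of their co-clustered gene pairs. It is a minimal example under stated assumptions, not the method used in the reference: the function names (`coclustered_pairs`, `pair_jaccard`), the in-memory list-of-lists representation, and the gene identifiers are all hypothetical, and the on-disk format of the cluster files is not assumed here.

```python
from itertools import combinations


def coclustered_pairs(clusters):
    """Return the set of unordered gene pairs that share a cluster.

    `clusters` is an iterable of gene-ID collections (hypothetical
    in-memory form; the dataset's file format is not assumed).
    """
    pairs = set()
    for members in clusters:
        # Sort so each unordered pair has a single canonical form.
        for gene_a, gene_b in combinations(sorted(set(members)), 2):
            pairs.add((gene_a, gene_b))
    return pairs


def pair_jaccard(clusters_a, clusters_b):
    """Jaccard index of co-clustered gene pairs between two clusterings."""
    pairs_a = coclustered_pairs(clusters_a)
    pairs_b = coclustered_pairs(clusters_b)
    union = pairs_a | pairs_b
    if not union:
        # Both clusterings contain only singletons: treat as identical.
        return 1.0
    return len(pairs_a & pairs_b) / len(union)


# Toy example with made-up gene identifiers.
tool_a = [["g1", "g2", "g3"], ["g4"]]
tool_b = [["g1", "g2"], ["g3", "g4"]]
score = pair_jaccard(tool_a, tool_b)
```

Enumerating pairs scales quadratically with cluster size, so this form suits small examples; large gene families would call for a counting-based measure instead.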
Reference: Manzano-Morales S, Liu Y, González-Bodí S, Huerta-Cepas J, Iranzo J. 2022. Comparison of gene clustering criteria reveals intrinsic uncertainty in pangenome analyses. bioRxiv doi: 10.1101/2022.09.25.509376
Files
Previewed file: README.txt
Total size: 20.4 GB

MD5 checksum | Size |
---|---|
md5:27cda0233a5e576fb549868ffab9f347 | 2.2 MB |
md5:51572ca67755de3013fd100b79119456 | 288.2 kB |
md5:deea8501aa724082e0d61bfa76c70a4a | 201.1 MB |
md5:9d6f690879a5c7c89773412bcfd7e7af | 743.6 MB |
md5:7b44a91a9fd2a5d0b9de8b258b11cf90 | 4.4 GB |
md5:79b2743e31821baebee090ecdf09a87f | 102.4 kB |
md5:0c74fa9aa1b4f572d643cdec1094bdcc | 918.9 MB |
md5:17f9e9f349821dceeb50234185c01e17 | 5.2 GB |
md5:110fb0c069766b018de45ae9400264f0 | 9.0 GB |
md5:901da7f8d427879735904106fe9fc4e9 | 14.7 kB |
md5:36591f098b09190d9b9750dd0448a364 | 18.5 kB |
md5:03be0a97e5fb0efb800baaccd333c146 | 368.6 kB |
Additional details
Related works
- Is supplement to
- Preprint: 10.1101/2022.09.25.509376 (DOI)