
Data resources

Taxonomy classifiers for use with q2-feature-classifier

Taxonomy classifiers have been moved to the external page https://resources.qiime2.org.

Marker gene reference databases

These marker gene reference databases are formatted for use with QIIME 1 and QIIME 2. If you’re using these databases with QIIME 2, you’ll need to import them into artifacts before using them.
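Reference databases formatted for QIIME 1 usually pair a FASTA file of sequences with a headerless, tab-separated taxonomy file, where each line holds a feature ID followed by a semicolon-delimited lineage. Before importing such files, it can help to confirm that every sequence has a taxonomy entry. The sketch below is a minimal, standard-library-only illustration of that check under this file-layout assumption. The function names and example IDs are hypothetical. The import itself is done with QIIME 2's own tools; headerless taxonomy files are typically imported with the `HeaderlessTSVTaxonomyFormat` input format.

```python
import io
from typing import Dict, List, TextIO


def read_headerless_taxonomy(handle: TextIO) -> Dict[str, List[str]]:
    """Parse 'ID<TAB>rank1; rank2; ...' lines into {ID: [rank1, rank2, ...]}.

    Assumes a QIIME 1-style file with no header row.
    """
    taxonomy: Dict[str, List[str]] = {}
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {line_number}: expected 'ID<TAB>taxonomy'")
        feature_id, lineage = fields[0].strip(), fields[1]
        taxonomy[feature_id] = [rank.strip() for rank in lineage.split(";")]
    return taxonomy


def read_fasta_ids(handle: TextIO) -> List[str]:
    """Return record IDs (the first word after '>') from a FASTA file."""
    ids: List[str] = []
    for line in handle:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.append(words[0])
    return ids


def ids_missing_taxonomy(fasta_ids: List[str], taxonomy: Dict[str, List[str]]) -> List[str]:
    """List sequence IDs that have no entry in the taxonomy file."""
    return [seq_id for seq_id in fasta_ids if seq_id not in taxonomy]


# Hypothetical example data; use open(...) instead of io.StringIO for real files.
taxonomy_text = "seq1\tk__Bacteria; p__Phylum_A\nseq2\tk__Bacteria; p__Phylum_B\n"
fasta_text = ">seq1\nACGT\n>seq2\nACGT\n>seq3\nACGT\n"

taxonomy = read_headerless_taxonomy(io.StringIO(taxonomy_text))
fasta_ids = read_fasta_ids(io.StringIO(fasta_text))
print(ids_missing_taxonomy(fasta_ids, taxonomy))  # ['seq3']
```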

Greengenes (16S rRNA)

Find more information about Greengenes in the DeSantis et al. (2006), McDonald et al. (2012), and McDonald et al. (2023) papers.

License information can be found on the Greengenes website (for releases prior to 2022) or on the Greengenes2 FTP site. Greengenes data released prior to 2022 are distributed under a Creative Commons Attribution-ShareAlike 3.0 License. Greengenes2 data (2022 onward) are released under a BSD 3-Clause license.

Silva (16S/18S rRNA)

QIIME-compatible SILVA releases (up to release 132), as well as the licensing information for commercial and non-commercial use, are available at https://www.arb-silva.de/download/archive/qiime.

We also provide pre-formatted SILVA reference sequence and taxonomy files here that were processed using RESCRIPt. See licensing information below if you use these files.

Please cite the following references if you use any of these pre-formatted files:

  • Michael S Robeson II, Devon R O’Rourke, Benjamin D Kaehler, Michal Ziemski, Matthew R Dillon, Jeffrey T Foster, Nicholas A Bokulich. RESCRIPt: Reproducible sequence taxonomy reference database management for the masses. bioRxiv 2020.10.05.326504; doi: https://doi.org/10.1101/2020.10.05.326504

  • See the SILVA website for the latest citation information for SILVA.

Note

The Silva reference files provided here include species-level taxonomy. Although Silva annotations include species, Silva does not curate the species-level taxonomy, so this information may be unreliable. In a future version of QIIME 2, we will no longer include species-level information in our Silva reference files. This is discussed on the QIIME 2 Forum (see "Species-labels: caveat emptor!").
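If you prefer not to rely on the uncurated species labels, one option is to truncate each taxonomy label to genus level before using it. The sketch below is a hypothetical helper, not part of any QIIME 2 plugin. It assumes seven-rank, semicolon-delimited labels ordered from domain to species, and the taxon names in the example are placeholders.

```python
from typing import List


def truncate_lineage(lineage: str, max_ranks: int = 6, sep: str = "; ") -> str:
    """Keep only the first `max_ranks` ranks of a semicolon-delimited lineage.

    For seven-rank labels (domain through species), the default of 6 drops
    the species rank and keeps everything down to genus.
    """
    ranks: List[str] = [rank.strip() for rank in lineage.split(";") if rank.strip()]
    return sep.join(ranks[:max_ranks])


# Placeholder taxon names, for illustration only.
label = ("d__Bacteria; p__Phylum_A; c__Class_A; o__Order_A; "
         "f__Family_A; g__Genus_A; s__Genus_A_species_B")
print(truncate_lineage(label))
# d__Bacteria; p__Phylum_A; c__Class_A; o__Order_A; f__Family_A; g__Genus_A
```

Labels with fewer than `max_ranks` ranks are returned unchanged, apart from normalized spacing around the separators.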

License information:

The pre-formatted SILVA reference sequence and taxonomy files above are available under a Creative Commons Attribution 4.0 License (CC-BY 4.0). See the SILVA license for more information.

The files above were downloaded from the SILVA 138 release and processed using the RESCRIPt plugin and q2-feature-classifier. Sequences were downloaded, reverse-transcribed (converted from RNA to DNA), and filtered to remove sequences based on length, the presence of ambiguous nucleotides, and/or homopolymers. Taxonomy was parsed to generate uniform seven-rank taxonomic labels, including species labels. Sequences and taxonomies were then dereplicated using RESCRIPt. Finally, sequences and taxonomies representing the 515F/806R region of the 16S SSU rRNA gene were extracted with q2-feature-classifier and dereplicated again with RESCRIPt.
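The reverse-transcription, filtering, and dereplication steps described above can be sketched in plain Python to show what each one does. This is a minimal illustration, not RESCRIPt's implementation. The thresholds (minimum length 1200 nt, at most 5 degenerate bases, no homopolymer longer than 7 nt) are assumptions chosen for the example, not the exact settings used to build these files. The `Records` layout and all function names are hypothetical. Dereplication here keeps one representative per unique sequence-and-taxonomy pair; RESCRIPt offers several dereplication modes that may handle conflicting taxonomies differently.

```python
from typing import Dict, Tuple

# Hypothetical record layout: {sequence_id: (sequence, taxonomy_label)}
Records = Dict[str, Tuple[str, str]]

# IUPAC ambiguity codes (anything other than A, C, G, T)
DEGENERATE_BASES = set("RYSWKMBDHVN")


def reverse_transcribe(seq: str) -> str:
    """Convert an RNA sequence to DNA by replacing U with T."""
    return seq.upper().replace("U", "T")


def longest_homopolymer(seq: str) -> int:
    """Return the length of the longest run of one repeated base."""
    longest = current = 0
    previous = ""
    for base in seq:
        current = current + 1 if base == previous else 1
        previous = base
        longest = max(longest, current)
    return longest


def passes_filters(seq: str, min_length: int = 1200,
                   max_degenerates: int = 5, max_homopolymer: int = 7) -> bool:
    """Illustrative length, ambiguity, and homopolymer filters (thresholds are assumptions)."""
    degenerates = sum(1 for base in seq if base in DEGENERATE_BASES)
    return (len(seq) >= min_length
            and degenerates <= max_degenerates
            and longest_homopolymer(seq) <= max_homopolymer)


def dereplicate(records: Records) -> Records:
    """Keep the first ID seen for each unique (sequence, taxonomy) pair."""
    seen = set()
    kept: Records = {}
    for seq_id, (seq, taxon) in records.items():
        if (seq, taxon) not in seen:
            seen.add((seq, taxon))
            kept[seq_id] = (seq, taxon)
    return kept


def clean(raw: Records, **filter_options) -> Records:
    """Reverse-transcribe, filter, then dereplicate a set of records."""
    dna = {i: (reverse_transcribe(s), t) for i, (s, t) in raw.items()}
    kept = {i: (s, t) for i, (s, t) in dna.items() if passes_filters(s, **filter_options)}
    return dereplicate(kept)


# Toy records with placeholder taxonomy; min_length is lowered for the example.
raw = {
    "a": ("ACGUACGUAC", "d__Bacteria; g__X"),
    "b": ("ACGTACGTAC", "d__Bacteria; g__X"),               # same as "a" after U -> T
    "c": ("ACGT" + "N" * 6 + "ACGT", "d__Bacteria; g__Y"),  # 6 degenerate bases
    "d": ("A" * 9 + "CGT", "d__Bacteria; g__Z"),            # 9-nt homopolymer
}
print(list(clean(raw, min_length=10)))  # ['a']
```

Primer-based extraction of the 515F/806R region, which was done with q2-feature-classifier for these files, is not shown in this sketch.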

UNITE (fungal ITS)

All releases are available for download at https://unite.ut.ee/repository.php.

Find more information about UNITE at https://unite.ut.ee.

Microbiome bioinformatics benchmarking

Many microbiome bioinformatics benchmarking studies use mock communities (artificial communities constructed by pooling isolated microorganisms together in known abundances). For example, see Bokulich et al. (2013) and Caporaso et al. (2011). Public mock community data can be downloaded from mockrobiota, which is described in Bokulich et al. (2016).
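Because the composition of a mock community is known, a pipeline's output can be scored against the expected abundances. The sketch below shows one simple way to do this, using Bray-Curtis dissimilarity between expected and observed relative abundances. It is an illustrative choice, not the specific evaluation used in the cited studies or by mockrobiota. The function names and taxon names are hypothetical.

```python
from typing import Dict


def relative_abundances(counts: Dict[str, float]) -> Dict[str, float]:
    """Scale counts so they sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {taxon: value / total for taxon, value in counts.items()}


def bray_curtis(expected: Dict[str, float], observed: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles.

    0 means identical composition; 1 means no shared taxa.
    Taxa missing from one profile are treated as abundance 0.
    """
    p = relative_abundances(expected)
    q = relative_abundances(observed)
    taxa = set(p) | set(q)
    numerator = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)
    denominator = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in taxa)
    return numerator / denominator


expected = {"taxon_A": 50, "taxon_B": 50}                     # known mock composition
observed = {"taxon_A": 600, "taxon_B": 300, "taxon_C": 100}   # hypothetical pipeline counts
print(round(bray_curtis(expected, observed), 3))  # 0.2
```

Lower values indicate that the observed profile is closer to the known composition; other metrics can be swapped in depending on what aspect of accuracy matters for a given benchmark.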

Public microbiome data

Qiita provides access to many public microbiome datasets. If you’re looking for microbiome data for testing or for meta-analyses, Qiita is a good place to start.

SEPP reference databases

The following databases are intended for use with q2-fragment-insertion, and are constructed directly from the SEPP-Refs project.