
Data resources

Taxonomy classifiers for use with q2-feature-classifier


Pre-trained classifiers that can be used with q2-feature-classifier currently present a security risk. If using a pre-trained classifier such as the ones provided here, you should trust the person who trained the classifier and the person who provided you with the qza file. This security risk will be addressed in a future version of q2-feature-classifier.
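Because a pre-trained classifier must come from a trusted source, one practical precaution is to confirm that the .qza file you downloaded is byte-for-byte identical to the one the provider published, by comparing checksums. The sketch below assumes the provider publishes a SHA-256 digest for the file (an assumption; use whatever hash the provider actually lists). The helper names `file_sha256` and `verify_classifier` are hypothetical and not part of QIIME 2.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_classifier(path, expected_sha256):
    """Raise ValueError unless the file matches the expected (published) digest."""
    actual = file_sha256(path)
    # Compare case-insensitively, since digests are often published in upper case.
    if actual.lower() != expected_sha256.strip().lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
    return True
```

For example, `verify_classifier("classifier.qza", published_digest)` (the file name is hypothetical) raises an error if the file was altered or corrupted. Note that a matching checksum only shows the file is the one the provider published; it does not make a classifier from an untrusted source safe to load.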


These classifiers were trained using scikit-learn 0.22.1, and therefore can only be used with scikit-learn 0.22.1. If you are using a native installation of QIIME 2, run the following command before using these classifiers to ensure that you have the correct version of scikit-learn installed. If you are using a QIIME 2020.2 virtual machine, scikit-learn 0.22.1 is already installed and you do not need to run this command. The scikit-learn version restriction will be relaxed in a future version of q2-feature-classifier.

conda install --override-channels -c defaults scikit-learn=0.22.1
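To check which scikit-learn version is active before loading one of these classifiers, a quick check like the following can help. It is a minimal sketch that assumes Python 3.8 or later (for `importlib.metadata`); on older interpreters, `import sklearn; print(sklearn.__version__)` reports the same information. The function name `sklearn_version_ok` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Version of scikit-learn the pre-trained classifiers were built with.
REQUIRED_SKLEARN = "0.22.1"


def sklearn_version_ok(required=REQUIRED_SKLEARN):
    """Return (ok, installed) where installed is the version string or None."""
    try:
        installed = version("scikit-learn")
    except PackageNotFoundError:
        # scikit-learn is not installed in this environment at all.
        return False, None
    # The classifiers need this exact version, so compare for equality.
    return installed == required, installed
```

For example, `ok, installed = sklearn_version_ok()` returns `ok == False` when a different version is installed, in which case the conda command above installs the required one.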


Taxonomic classifiers perform best when they are trained for your specific sample preparation and sequencing parameters, including the primers used for amplification and the length of your sequence reads. Therefore, in general, you should follow the instructions in Training feature classifiers with q2-feature-classifier to train your own taxonomic classifiers (for example, from the marker gene reference databases below).
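To see why primers and read length matter, compare what a classifier trained on full-length reference sequences has seen with what your reads actually cover. The sketch below trims one reference sequence to the region between a forward and a reverse primer, then truncates it to a read length. It is illustrative only: the helper names are hypothetical, it uses exact matching (real primers are often degenerate and may bind with mismatches), and for actual training you should use the extract-reads action of q2-feature-classifier as described in the training tutorial.

```python
# Complement table for unambiguous bases; other characters are left unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def extract_amplicon(ref_seq, f_primer, r_primer, trunc_len=None):
    """Hypothetical helper: return the region of ref_seq between the primers.

    Primer sequences themselves are excluded. Returns None if either
    primer site is not found by exact matching.
    """
    seq = ref_seq.upper()
    start = seq.find(f_primer.upper())
    if start == -1:
        return None
    start += len(f_primer)  # the region begins right after the forward primer
    # The reverse primer binds the opposite strand, so on this strand its
    # site appears as the primer's reverse complement.
    end = seq.find(reverse_complement(r_primer), start)
    if end == -1:
        return None
    amplicon = seq[start:end]
    if trunc_len is not None:
        amplicon = amplicon[:trunc_len]  # mimic reads of a fixed length
    return amplicon
```

A classifier trained on the output of such a step only ever sees the region your primers amplify, at the length your reads cover, which is the reason the text above recommends training your own classifier.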

Naive Bayes classifiers trained on:

Marker gene reference databases

These marker gene reference databases are formatted for use with QIIME 1 and QIIME 2. If you’re using these databases with QIIME 2, you’ll need to import them into artifacts before using them.
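Before importing, it can help to inspect a reference taxonomy file. QIIME-formatted releases such as Greengenes typically pair a FASTA file of sequences with a headerless, two-column, tab-separated taxonomy file (a sequence identifier, then a semicolon-delimited lineage); treat this layout as an assumption and check the files you actually download. In QIIME 2 these are usually imported as `FeatureData[Sequence]` and `FeatureData[Taxonomy]` artifacts. The `read_taxonomy` function below is a hypothetical helper for a quick look at the file, not part of QIIME 2.

```python
def read_taxonomy(lines):
    """Parse 'ID<TAB>lineage' lines into {feature_id: [rank, rank, ...]}.

    Blank lines and lines starting with '#' are skipped. Assumes there is
    no header row (a header would be parsed as an ordinary entry).
    """
    taxonomy = {}
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {number}: expected ID<TAB>lineage")
        feature_id, lineage = fields[0], fields[1]
        # Split the lineage on ';' and drop surrounding whitespace.
        taxonomy[feature_id] = [r.strip() for r in lineage.split(";") if r.strip()]
    return taxonomy
```

For example, `with open("taxonomy.tsv") as handle: lineages = read_taxonomy(handle)` (the file name is hypothetical) lets you count identifiers or check lineage depth before running the import.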

Greengenes (16S rRNA)

Find more information about Greengenes in the DeSantis (2006) and McDonald (2012) papers.

Silva (16S/18S rRNA)

QIIME-compatible SILVA releases, as well as the licensing information for commercial and non-commercial use, are available at

UNITE (fungal ITS)

All releases are available for download at

Find more information about UNITE at

Microbiome bioinformatics benchmarking

Many microbiome bioinformatics benchmarking studies use mock communities (artificial communities constructed by pooling isolated microorganisms in known abundances); see, for example, Bokulich et al. (2013) and Caporaso et al. (2011). Public mock community data can be downloaded from mockrobiota, which is described in Bokulich et al. (2016).
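Because the composition of a mock community is known, a method's output can be scored against the expected profile. The sketch below computes the Bray-Curtis dissimilarity between expected and observed relative abundances (0 means identical, 1 means no shared taxa). Bray-Curtis is an illustrative choice here, not necessarily the metric used in the studies cited above or by mockrobiota, and the function names are hypothetical.

```python
def relative_abundance(counts):
    """Convert {taxon: count} into {taxon: fraction of the total}."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("profile has no counts")
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(expected, observed):
    """Bray-Curtis dissimilarity between two {taxon: count} profiles.

    Both profiles are normalized to sum to 1 first, in which case
    BC = 1 - sum over taxa of min(p, q).
    """
    p = relative_abundance(expected)
    q = relative_abundance(observed)
    taxa = set(p) | set(q)  # taxa missing from one profile count as 0
    shared = sum(min(p.get(t, 0.0), q.get(t, 0.0)) for t in taxa)
    return 1.0 - shared
```

For instance, an expected profile of two taxa at 50% each compared with an observed 75%/25% split gives a dissimilarity of 0.25.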

Public microbiome data

Qiita provides access to many public microbiome datasets. If you’re looking for microbiome data for testing or for meta-analyses, Qiita is a good place to start.

SEPP reference databases

The following databases are intended for use with q2-fragment-insertion, and are constructed directly from the SEPP-Refs project.