Taxonomy classifiers for use with q2-feature-classifier
Pre-trained classifiers that can be used with q2-feature-classifier currently present a security risk. If using a pre-trained classifier such as the ones provided here, you should trust the person who trained the classifier and the person who provided you with the qza file. This security risk will be addressed in a future version of
These classifiers were trained using scikit-learn 0.18.1, and therefore can only be used with scikit-learn 0.18.1. If you are using a native installation of QIIME, before using these classifiers you should run the following to ensure that you are using the correct version of scikit-learn. If you are using a QIIME 2.0.6 virtual machine, scikit-learn 0.18.1 will be installed and you do not need to run this command. The scikit-learn version restriction will be relaxed in a future version of
conda install --override-channels -c defaults scikit-learn=0.18.1
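To confirm which scikit-learn version is active after running the command above, a check along the following lines may help. This is a minimal sketch, assuming a `python` executable on the PATH belongs to the environment QIIME uses; the `version_matches` helper is a hypothetical name introduced here for illustration.

```shell
# Version the pre-trained classifiers require (taken from the text above).
required="0.18.1"

# Hypothetical helper: succeeds only when the two version strings are identical.
version_matches() {
    [ "$1" = "$2" ]
}

# Ask Python for the installed scikit-learn version; fall back to a marker
# string if Python or scikit-learn is unavailable in this environment.
installed="$(python -c 'import sklearn; print(sklearn.__version__)' 2>/dev/null || echo "not installed")"

if version_matches "$installed" "$required"; then
    echo "scikit-learn $installed matches the required version"
else
    echo "scikit-learn is '$installed' but $required is required"
fi
```

The comparison is deliberately an exact string match, because the text states that the classifiers only work with the exact version they were trained with.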
Taxonomic classifiers perform best when they are trained on data that matches your specific sample preparation and sequencing parameters, including the primers that were used for amplification and the length of your sequence reads. Therefore, in general, you should follow the instructions in Training feature classifiers with q2-feature-classifier to train your own taxonomic classifiers (for example, from the marker gene reference databases below). We provide some common classifiers because training can require a large amount of memory. For example, the Silva classifier linked here required approximately 30 GB of RAM to train.
Naive Bayes classifiers trained on:
- Silva 119 99% OTUs full-length sequences (MD5:
- Silva 119 99% OTUs from 515F/806R region of sequences (MD5:
- Greengenes 13_8 99% OTUs full-length sequences (MD5:
- Greengenes 13_8 99% OTUs from 515F/806R region of sequences (MD5:
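The MD5 checksums listed with each classifier can be used to confirm that a download is intact. The following is a minimal sketch, assuming either GNU `md5sum` or the BSD/macOS `md5` utility is available; `verify_md5` is a hypothetical helper name, and the file name and checksum in the usage comment are placeholders to be replaced with your downloaded file and the value published for it.

```shell
# Hypothetical helper: verify_md5 FILE EXPECTED_MD5
# Succeeds (exit status 0) only if FILE's MD5 digest equals EXPECTED_MD5.
verify_md5() {
    file="$1"
    expected="$2"
    if command -v md5sum >/dev/null 2>&1; then
        # GNU coreutils prints "<digest>  <file>"; keep the first field.
        actual="$(md5sum "$file" | awk '{print $1}')"
    else
        # BSD/macOS md5 prints only the digest with -q.
        actual="$(md5 -q "$file")"
    fi
    [ "$actual" = "$expected" ]
}

# Usage (placeholder names, not real values):
#   verify_md5 downloaded-classifier.qza "<published MD5>" && echo "checksum OK"
```

A matching checksum only shows that the file was not corrupted or altered in transit. It does not remove the need to trust whoever trained and distributed the classifier.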
Marker gene reference databases
These marker gene reference databases are formatted for use with QIIME 1 and QIIME 2. If you’re using these databases with QIIME 2, you’ll need to import them into artifacts before using them.
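As an illustration of the import step, reference sequences in FASTA format can be imported with `qiime tools import`. This is a sketch under stated assumptions: the file names below are hypothetical, the guard simply skips the command when `qiime` or the input file is missing, and the accompanying taxonomy file would need its own import, with options that may differ between QIIME 2 releases.

```shell
# Hypothetical file names; substitute the FASTA file from the database you downloaded.
ref_fasta="reference_otus.fasta"
ref_qza="reference_otus.qza"

# Only attempt the import when QIIME 2 is installed and the input file exists.
if command -v qiime >/dev/null 2>&1 && [ -f "$ref_fasta" ]; then
    # Import the reference sequences as a FeatureData[Sequence] artifact.
    qiime tools import \
      --type 'FeatureData[Sequence]' \
      --input-path "$ref_fasta" \
      --output-path "$ref_qza"
else
    echo "skipping import: qiime or $ref_fasta not found"
fi
```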
Greengenes (16S rRNA)
Find more information about Greengenes at http://greengenes.secondgenome.com.
Silva (16S/18S rRNA)
QIIME-compatible SILVA releases, as well as the licensing information for commercial and non-commercial use, are available at http://www.arb-silva.de/download/archive/qiime.
Microbiome bioinformatics benchmarking
Many microbiome bioinformatics benchmarking studies use mock communities (artificial communities constructed by pooling isolated microorganisms together in known abundances). For example, see Bokulich et al. (2013) and Caporaso et al. (2011). Public mock community data can be downloaded from mockrobiota, which is described in Bokulich et al. (2016).