Training feature classifiers with q2-feature-classifier

Note

This guide assumes you have installed QIIME 2 using one of the procedures in the install documents.

This tutorial will demonstrate how to train q2-feature-classifier for a particular dataset. We will train the Naive Bayes classifier using Greengenes reference sequences and classify the representative sequences from the Moving Pictures dataset.

We will download and create several files, so first create a working directory.

mkdir training-feature-classifiers
cd training-feature-classifiers


Obtaining and importing reference data sets

Two elements are required for training the classifier: the reference sequences and the corresponding taxonomic classifications. To reduce computation time for this tutorial, we will use the relatively small Greengenes 13_8 85% OTU data set.

We will also download the representative sequences from the Moving Pictures tutorial to test our classifier. Each file below is shown with both a wget and a curl command; you only need to run one of them.

Save as: 85_otus.fasta

wget -O "85_otus.fasta" "https://data.qiime2.org/2017.2/tutorials/training-feature-classifiers/85_otus.fasta"
curl -sL "https://data.qiime2.org/2017.2/tutorials/training-feature-classifiers/85_otus.fasta" > "85_otus.fasta"

Save as: 85_otu_taxonomy.txt

wget -O "85_otu_taxonomy.txt" "https://data.qiime2.org/2017.2/tutorials/training-feature-classifiers/85_otu_taxonomy.txt"
curl -sL "https://data.qiime2.org/2017.2/tutorials/training-feature-classifiers/85_otu_taxonomy.txt" > "85_otu_taxonomy.txt"

Save as: rep-seqs.qza

wget -O "rep-seqs.qza" "https://data.qiime2.org/2017.2/tutorials/training-feature-classifiers/rep-seqs.qza"
curl -sL "https://data.qiime2.org/2017.2/tutorials/training-feature-classifiers/rep-seqs.qza" > "rep-seqs.qza"
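Because the reference sequences and their taxonomy are paired by feature ID, it can be worth checking that every sequence in 85_otus.fasta has a row in 85_otu_taxonomy.txt before importing. The sketch below assumes a headerless, tab-separated taxonomy file with the ID in the first column (the layout of the Greengenes file used here); the function name missing_taxonomy_ids is hypothetical and not part of QIIME 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QIIME 2).
# Prints FASTA record IDs that have no matching ID in the first column of a
# headerless, tab-separated taxonomy file. Empty output means every sequence
# has a taxonomy entry.
missing_taxonomy_ids() {
  local fasta=$1 taxonomy=$2
  # IDs are taken from header lines: drop the leading '>' and anything after
  # the first whitespace. comm -23 keeps IDs found only in the first list.
  LC_ALL=C comm -23 \
    <(grep '^>' "$fasta" | sed -E 's/^>([^[:space:]]+).*/\1/' | LC_ALL=C sort -u) \
    <(cut -f1 "$taxonomy" | LC_ALL=C sort -u)
}
```

For example, running missing_taxonomy_ids 85_otus.fasta 85_otu_taxonomy.txt in the working directory prints nothing when every reference sequence is annotated.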

Next we import these data into QIIME 2 Artifacts.

qiime tools import \
--type 'FeatureData[Sequence]' \
--input-path 85_otus.fasta \
--output-path 85_otus.qza

qiime tools import \
--type 'FeatureData[Taxonomy]' \
--input-path 85_otu_taxonomy.txt \
--output-path ref-taxonomy.qza


Output artifacts: 85_otus.qza, ref-taxonomy.qza
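The classifier can only predict labels that occur in the reference taxonomy. Greengenes lineages are written as semicolon-separated ranks with one-letter prefixes (for example k__Bacteria; p__Firmicutes; c__...), and a rank without a name is left as a bare prefix such as g__. As a rough sketch, the hypothetical function below (not part of QIIME 2) reads the raw 85_otu_taxonomy.txt file rather than the ref-taxonomy.qza artifact and tallies the deepest named rank of each reference, which gives a sense of how specific the training labels are.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QIIME 2).
# Reads a headerless "ID<TAB>lineage" file where the lineage looks like
# "k__Bacteria; p__Firmicutes; c__; ..." and prints, for each one-letter rank
# prefix, how many IDs have that rank as their deepest named rank.
deepest_rank_counts() {
  awk -F'\t' '{
    n = split($2, ranks, "; *")
    deepest = "none"
    for (i = 1; i <= n; i++) {
      # A bare prefix such as g__ carries no name; g__Bacillus does.
      if (ranks[i] ~ /^[a-z]__./) deepest = substr(ranks[i], 1, 1)
    }
    count[deepest]++
  }
  END { for (r in count) print r "\t" count[r] }' "$1" | LC_ALL=C sort
}
```

For example, deepest_rank_counts 85_otu_taxonomy.txt prints one line per rank prefix with its count; the lines are sorted alphabetically, not in rank order.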

It has been shown that taxonomic classification accuracy improves when a Naive Bayes classifier is trained on only the region of the target sequences that was sequenced (Werner et al., 2012). We know from the Moving Pictures tutorial that the sequence reads that we’re trying to classify are 100-base single-end reads that were amplified with the 515F/806R primer pair. We optimize for that here by extracting reads from the reference database based on matches to this primer pair, and then slicing the result to 100 bases.

qiime feature-classifier extract-reads \
--i-sequences 85_otus.qza \
--p-f-primer GTGCCAGCMGCCGCGGTAA \
--p-r-primer GGACTACHVGGGTWTCTAAT \
--p-length 100 \
--o-reads ref-seqs.qza

Output artifacts: ref-seqs.qza
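The two primers above contain IUPAC ambiguity codes (M, H, V and W each stand for more than one base), and the reverse primer binds the opposite strand, so in a reference sequence it appears as its reverse complement. The following is a deliberately simplified sketch of that idea, assuming exact matches on the forward strand only. The helper names iupac_to_regex, revcomp and amplicon are hypothetical, and extract-reads itself is more elaborate than this, so treat it as an illustration rather than a reimplementation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating in-silico primer matching; they are not
# part of QIIME 2 and only handle exact matches on the given strand.

# Translate IUPAC ambiguity codes into POSIX bracket expressions.
iupac_to_regex() {
  printf '%s\n' "$1" | sed \
    -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Reverse-complement a (possibly degenerate) primer.
revcomp() {
  local seq=$1 out="" i
  for (( i = ${#seq} - 1; i >= 0; i-- )); do
    out+=${seq:i:1}
  done
  printf '%s\n' "$out" | tr 'ACGTMRWSYKVHDBN' 'TGCAKYWSRMBDHVN'
}

# Print the bases between the forward primer and the reverse-complemented
# reverse primer, truncated to at most LEN bases. In this sketch the bases
# between the primers must be A/C/G/T. Returns 1 if no amplicon is found.
amplicon() {
  local seq=$1 f_primer=$2 r_primer=$3 len=$4 pattern
  pattern="$(iupac_to_regex "$f_primer")([ACGT]*)$(iupac_to_regex "$(revcomp "$r_primer")")"
  [[ $seq =~ $pattern ]] || return 1
  printf '%s\n' "${BASH_REMATCH[1]:0:$len}"
}
```

For instance, with a single reference sequence stored in $seq, amplicon "$seq" GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT 100 prints at most 100 bases of the region between the 515F and 806R sites, mirroring the --p-length 100 slicing described above.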

Train the classifier

We can now train a Naive Bayes classifier as follows, using the reference reads and taxonomy that we just created.

qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads ref-seqs.qza \
--i-reference-taxonomy ref-taxonomy.qza \
--o-classifier classifier.qza


Output artifacts: classifier.qza
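For intuition only: a multinomial Naive Bayes classifier that represents each read by its k-mer counts assigns the taxon with the highest posterior score. The formula below states that general model; how q2-feature-classifier extracts features and which defaults it uses are not covered here, so treat the k-mer representation as an assumption.

$$
\hat{t} = \arg\max_{t} \Big[ \log P(t) + \sum_{w} n_w(s) \, \log P(w \mid t) \Big]
$$

Here s is the query read, n_w(s) is the number of times k-mer w occurs in s, P(t) is the prior for taxon t, and P(w | t) is estimated from the reference reads labelled t. Training on reads trimmed to the sequenced region, as done above, makes P(w | t) depend only on k-mers from the amplified region, which is the same region the query reads cover.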

Test the classifier

Finally, we verify that the classifier works by classifying the representative sequences from the Moving Pictures tutorial and visualizing the resulting taxonomic assignments.

qiime feature-classifier classify \
--i-classifier classifier.qza \