Taxonomic annotation of observed sequences#
In this chapter we’ll perform annotation of the features that were observed in this study by performing taxonomic classification of the sequences. This allows us to assess what organisms are represented by the ASV sequences that we observed in this study.
Taxonomy assignment#
We’ll start with taxonomic classification. In this step we’ll use a pre-trained Naive Bayes taxonomic classifier. This particular classifier was trained on the Greengenes 13-8 database, where sequences were trimmed to represent only the region between the 515F / 806R primers.
You can download pre-trained classifiers from the QIIME 2 documentation Data Resources page.
First, obtain the taxonomic classifier.
url = 'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/020-taxonomy/gg-13-8-99-515-806-nb-classifier.qza'
fn = 'gg-13-8-99-515-806-nb-classifier.qza'
request.urlretrieve(url, fn)
gg_13_8_99_515_806_nb_classifier = Artifact.load(fn)
url <- 'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/020-taxonomy/gg-13-8-99-515-806-nb-classifier.qza'
fn <- 'gg-13-8-99-515-806-nb-classifier.qza'
request$urlretrieve(url, fn)
gg_13_8_99_515_806_nb_classifier <- Artifact$load(fn)
wget \
-O 'gg-13-8-99-515-806-nb-classifier.qza' \
'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/020-taxonomy/gg-13-8-99-515-806-nb-classifier.qza'
def classifier_factory():
from urllib import request
from qiime2 import Artifact
fp, _ = request.urlretrieve(
'https://data.qiime2.org/classifiers/sklearn-1.4.2/greengenes/gg-13-8-99-515-806-nb-classifier.qza',
)
return Artifact.load(fp)
classifier = use.init_artifact('gg-13-8-99-515-806-nb-classifier', classifier_factory)
- Using the
Upload Data
tool: On the first tab (Regular), press the
Paste/Fetch
data button at the bottom.Set “Name” (first text-field) to:
gg-13-8-99-515-806-nb-classifier.qza
In the larger text-area, copy-and-paste: https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/020-taxonomy/gg-13-8-99-515-806-nb-classifier.qza
(“Type”, “Genome”, and “Settings” can be ignored)
Press the
Start
button at the bottom.
Next, use that classifier to assign taxonomic information to the ASV sequences.
import qiime2.plugins.feature_classifier.actions as feature_classifier_actions
taxonomy, = feature_classifier_actions.classify_sklearn(
classifier=gg_13_8_99_515_806_nb_classifier,
reads=filtered_sequences_1,
)
feature_classifier_actions <- import("qiime2.plugins.feature_classifier.actions")
action_results <- feature_classifier_actions$classify_sklearn(
classifier=gg_13_8_99_515_806_nb_classifier,
reads=filtered_sequences_1,
)
taxonomy <- action_results$classification
qiime feature-classifier classify-sklearn \
--i-classifier gg-13-8-99-515-806-nb-classifier.qza \
--i-reads filtered-sequences-1.qza \
--o-classification taxonomy.qza
taxonomy, = use.action(
use.UsageAction(plugin_id='feature_classifier', action_id='classify_sklearn'),
use.UsageInputs(classifier=classifier, reads=filtered_sequences_1),
use.UsageOutputNames(classification='taxonomy'),
)
- Using the
qiime2 feature-classifier classify-sklearn
tool: Set “reads” to
#: filtered-sequences-1.qza
Set “classifier” to
#: gg-13-8-99-515-806-nb-classifier.qza
Press the
Execute
button.
- Once completed, for the new entry in your history, use the
Edit
button to set the name as follows: (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name
“Name” to set (be sure to press
Save
)#: qiime2 feature-classifier classify-sklearn [...] : classification.qza
taxonomy.qza
Finally, generate a human-readable summary of the taxonomic annotations.
taxonomy_as_md_md = taxonomy.view(Metadata)
taxonomy_viz, = metadata_actions.tabulate(
input=taxonomy_as_md_md,
)
taxonomy_as_md_md <- taxonomy$view(Metadata)
action_results <- metadata_actions$tabulate(
input=taxonomy_as_md_md,
)
taxonomy_viz <- action_results$visualization
qiime metadata tabulate \
--m-input-file taxonomy.qza \
--o-visualization taxonomy.qzv
taxonomy_as_md = use.view_as_metadata('taxonomy_as_md', taxonomy)
use.action(
use.UsageAction(plugin_id='metadata', action_id='tabulate'),
use.UsageInputs(input=taxonomy_as_md),
use.UsageOutputNames(visualization='taxonomy'),
)
- Using the
qiime2 metadata tabulate
tool: For “input”:
Perform the following steps.
Change to
Metadata from Artifact
Set “Metadata Source” to
taxonomy.qza
Press the
Execute
button.
- Once completed, for the new entry in your history, use the
Edit
button to set the name as follows: (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name
“Name” to set (be sure to press
Save
)#: qiime2 metadata tabulate [...] : visualization.qzv
taxonomy.qzv
Filtering filters based on their taxonomy#
Taxonomic annotations provide useful information that can also be used in quality filtering of our data. A common step in 16S analysis is to remove sequences from an analysis that aren’t assigned to a phylum. In a human microbiome study such as this, these may for example represent reads of human genome sequence that were unintentionally sequences.
This filtering can be applied as follows by providing the feature table and the
taxonomic annotations that were just created. The include
parameter here
specifies that an annotation must contain the text p__
, which in the
Greengenes taxonomy is the prefix for all phylum-level taxonomy assignments.
Taxonomic labels that don’t contain p__
therefore were maximally assigned to
the domain (i.e., kingdom) level. This will also remove features that are
annotated with p__;
(which means that no named phylum was assigned to the
feature), as well as annotations containing Chloroplast
or Mitochondria
(i.e., organelle 16S sequences).
import qiime2.plugins.taxa.actions as taxa_actions
filtered_table_3, = taxa_actions.filter_table(
table=filtered_table_2,
taxonomy=taxonomy,
mode='contains',
include='p__',
exclude='p__;,Chloroplast,Mitochondria',
)
taxa_actions <- import("qiime2.plugins.taxa.actions")
action_results <- taxa_actions$filter_table(
table=filtered_table_2,
taxonomy=taxonomy,
mode='contains',
include='p__',
exclude='p__;,Chloroplast,Mitochondria',
)
filtered_table_3 <- action_results$filtered_table
qiime taxa filter-table \
--i-table filtered-table-2.qza \
--i-taxonomy taxonomy.qza \
--p-mode contains \
--p-include p__ \
--p-exclude 'p__;,Chloroplast,Mitochondria' \
--o-filtered-table filtered-table-3.qza
filtered_table_3, = use.action(
use.UsageAction(plugin_id='taxa', action_id='filter_table'),
use.UsageInputs(table=filtered_table_2, taxonomy=taxonomy, mode='contains',
include='p__', exclude='p__;,Chloroplast,Mitochondria'),
use.UsageOutputNames(filtered_table='filtered_table_3')
)
- Using the
qiime2 taxa filter-table
tool: Set “table” to
#: filtered-table-2.qza
Set “taxonomy” to
#: taxonomy.qza
Expand the
additional options
sectionSet “include” to
p__
Set “exclude” to
p__;,Chloroplast,Mitochondria
Leave “mode” as its default value of
contains
Press the
Execute
button.
- Once completed, for the new entry in your history, use the
Edit
button to set the name as follows: (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name
“Name” to set (be sure to press
Save
)#: qiime2 taxa filter-table [...] : filtered_table.qza
filtered-table-3.qza
Filtering samples with low sequence counts#
You may have noticed when looking at feature table summaries earlier that some of the samples contained very few ASV sequences. These often represent samples which didn’t amplify or sequence well, and when we start visualizing our data low numbers of sequences can cause misleading results, because the observed composition of the sample may not be reflective of the sample’s actual composition. For this reason it can be helpful to exclude samples with low ASV sequence counts from our samples. Here, we’ll filter out samples from which we have obtained fewer than 10,000 sequences.
filtered_table_4, = feature_table_actions.filter_samples(
table=filtered_table_3,
min_frequency=10000,
)
action_results <- feature_table_actions$filter_samples(
table=filtered_table_3,
min_frequency=10000L,
)
filtered_table_4 <- action_results$filtered_table
qiime feature-table filter-samples \
--i-table filtered-table-3.qza \
--p-min-frequency 10000 \
--o-filtered-table filtered-table-4.qza
filtered_table_4, = use.action(
use.UsageAction(plugin_id='feature_table', action_id='filter_samples'),
use.UsageInputs(table=filtered_table_3, min_frequency=10000),
use.UsageOutputNames(filtered_table='filtered_table_4')
)
- Using the
qiime2 feature-table filter-samples
tool: Set “table” to
#: filtered-table-3.qza
Expand the
additional options
sectionSet “min_frequency” to
10000
Press the
Execute
button.
- Once completed, for the new entry in your history, use the
Edit
button to set the name as follows: (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name
“Name” to set (be sure to press
Save
)#: qiime2 feature-table filter-samples [...] : filtered_table.qza
filtered-table-4.qza
After filtering ASVs that were not assigned a phylum, and filtering samples with low ASV sequence counts, we can remove ASV sequences that are no longer represented in our table from the collection of ASV sequences by filtering to only the features that are contained in the feature table. This step isn’t required, but can help to speed up some downstream steps.
filtered_sequences_2, = feature_table_actions.filter_seqs(
data=rep_seqs,
table=filtered_table_4,
)
action_results <- feature_table_actions$filter_seqs(
data=rep_seqs,
table=filtered_table_4,
)
filtered_sequences_2 <- action_results$filtered_data
qiime feature-table filter-seqs \
--i-data rep-seqs.qza \
--i-table filtered-table-4.qza \
--o-filtered-data filtered-sequences-2.qza
filtered_sequences_2, = use.action(
use.UsageAction(plugin_id='feature_table', action_id='filter_seqs'),
use.UsageInputs(data=feature_sequences, table=filtered_table_4),
use.UsageOutputNames(filtered_data='filtered_sequences_2')
)
- Using the
qiime2 feature-table filter-seqs
tool: Set “data” to
#: rep-seqs.qza
Expand the
additional options
sectionSet “table” to
#: filtered-table-4.qza
Press the
Execute
button.
- Once completed, for the new entry in your history, use the
Edit
button to set the name as follows: (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name
“Name” to set (be sure to press
Save
)#: qiime2 feature-table filter-seqs [...] : filtered_data.qza
filtered-sequences-2.qza
Try summarizing the feature table that was created by this round of filtering. Expand this box if you need help.
Solution to
filtered_table_4_summ_exercise_viz, = feature_table_actions.summarize(
table=filtered_table_4,
sample_metadata=sample_metadata_md,
)
action_results <- feature_table_actions$summarize(
table=filtered_table_4,
sample_metadata=sample_metadata_md,
)
filtered_table_4_summ_exercise_viz <- action_results$visualization
qiime feature-table summarize \
--i-table filtered-table-4.qza \
--m-sample-metadata-file sample-metadata.tsv \
--o-visualization filtered-table-4-summ-exercise.qzv
use.action(
use.UsageAction(plugin_id='feature_table', action_id='summarize'),
use.UsageInputs(table=filtered_table_4, sample_metadata=sample_metadata),
use.UsageOutputNames(visualization='filtered_table_4_summ_exercise'),
)
- Using the
qiime2 feature-table summarize
tool: Set “table” to
#: filtered-table-4.qza
Expand the
additional options
sectionFor “sample_metadata”:
Press the
+ Insert sample_metadata
button to set up the next steps.Leave as
Metadata from TSV
Set “Metadata Source” to
sample-metadata.tsv
Press the
Execute
button.
- Once completed, for the new entry in your history, use the
Edit
button to set the name as follows: (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name
“Name” to set (be sure to press
Save
)#: qiime2 feature-table summarize [...] : visualization.qzv
filtered-table-4-summ-exercise.qzv
Generate taxonomic composition barplots#
We’ll now get one of our first views of our microbiome sample compositions using a taxonomic barplot. This can be generated with the following command.
taxa_bar_plots_1_viz, = taxa_actions.barplot(
table=filtered_table_4,
taxonomy=taxonomy,
metadata=sample_metadata_md,
)
action_results <- taxa_actions$barplot(
table=filtered_table_4,
taxonomy=taxonomy,
metadata=sample_metadata_md,
)
taxa_bar_plots_1_viz <- action_results$visualization
qiime taxa barplot \
--i-table filtered-table-4.qza \
--i-taxonomy taxonomy.qza \
--m-metadata-file sample-metadata.tsv \
--o-visualization taxa-bar-plots-1.qzv
use.action(
use.UsageAction(plugin_id='taxa', action_id='barplot'),
use.UsageInputs(table=filtered_table_4, taxonomy=taxonomy,
metadata=sample_metadata),
use.UsageOutputNames(visualization='taxa_bar_plots_1'),
)
- Using the
qiime2 taxa barplot
tool: Set “table” to
#: filtered-table-4.qza
Expand the
additional options
sectionSet “taxonomy” to
#: taxonomy.qza
For “metadata”:
Press the
+ Insert metadata
button to set up the next steps.Leave as
Metadata from TSV
Set “Metadata Source” to
sample-metadata.tsv
Press the
Execute
button.
- Once completed, for the new entry in your history, use the
Edit
button to set the name as follows: (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name
“Name” to set (be sure to press
Save
)#: qiime2 taxa barplot [...] : visualization.qzv
taxa-bar-plots-1.qzv