Taxonomic annotation of observed sequences#

In this chapter we’ll perform annotation of the features that were observed in this study by performing taxonomic classification of the sequences. This allows us to assess what organisms are represented by the ASV sequences that we observed in this study.

Taxonomy assignment#

We’ll start with taxonomic classification. In this step we’ll use a pre-trained Naive Bayes taxonomic classifier. This particular classifier was trained on the Greengenes 13-8 database, where sequences were trimmed to represent only the region between the 515F / 806R primers.

You can download pre-trained classifiers from the QIIME 2 documentation Data Resources page.

First, obtain the taxonomic classifier.

url = 'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/020-taxonomy/gg-13-8-99-515-806-nb-classifier.qza'
fn = 'gg-13-8-99-515-806-nb-classifier.qza'
request.urlretrieve(url, fn)
gg_13_8_99_515_806_nb_classifier = Artifact.load(fn)
url <- 'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/020-taxonomy/gg-13-8-99-515-806-nb-classifier.qza'
fn <- 'gg-13-8-99-515-806-nb-classifier.qza'
request$urlretrieve(url, fn)
gg_13_8_99_515_806_nb_classifier <- Artifact$load(fn)
wget \
  -O 'gg-13-8-99-515-806-nb-classifier.qza' \
  'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/020-taxonomy/gg-13-8-99-515-806-nb-classifier.qza'
def classifier_factory():
    from urllib import request
    from qiime2 import Artifact
    fp, _ = request.urlretrieve(
        'https://data.qiime2.org/classifiers/sklearn-1.4.2/greengenes/gg-13-8-99-515-806-nb-classifier.qza',
    )

    return Artifact.load(fp)

classifier = use.init_artifact('gg-13-8-99-515-806-nb-classifier', classifier_factory)
Using the Upload Data tool:
  1. On the first tab (Regular), press the Paste/Fetch data button at the bottom.

    1. Set “Name” (first text-field) to: gg-13-8-99-515-806-nb-classifier.qza

    2. In the larger text-area, copy-and-paste: https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/020-taxonomy/gg-13-8-99-515-806-nb-classifier.qza

    3. (“Type”, “Genome”, and “Settings” can be ignored)

  2. Press the Start button at the bottom.

Next, use that classifier to assign taxonomic information to the ASV sequences.

import qiime2.plugins.feature_classifier.actions as feature_classifier_actions

taxonomy, = feature_classifier_actions.classify_sklearn(
    classifier=gg_13_8_99_515_806_nb_classifier,
    reads=filtered_sequences_1,
)
feature_classifier_actions <- import("qiime2.plugins.feature_classifier.actions")

action_results <- feature_classifier_actions$classify_sklearn(
    classifier=gg_13_8_99_515_806_nb_classifier,
    reads=filtered_sequences_1,
)
taxonomy <- action_results$classification
qiime feature-classifier classify-sklearn \
  --i-classifier gg-13-8-99-515-806-nb-classifier.qza \
  --i-reads filtered-sequences-1.qza \
  --o-classification taxonomy.qza
taxonomy, = use.action(
    use.UsageAction(plugin_id='feature_classifier', action_id='classify_sklearn'),
    use.UsageInputs(classifier=classifier, reads=filtered_sequences_1),
    use.UsageOutputNames(classification='taxonomy'),
)
Using the qiime2 feature-classifier classify-sklearn tool:
  1. Set “reads” to #: filtered-sequences-1.qza

  2. Set “classifier” to #: gg-13-8-99-515-806-nb-classifier.qza

  3. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name

“Name” to set (be sure to press Save)

#: qiime2 feature-classifier classify-sklearn [...] : classification.qza

taxonomy.qza

Finally, generate a human-readable summary of the taxonomic annotations.

taxonomy_as_md_md = taxonomy.view(Metadata)
taxonomy_viz, = metadata_actions.tabulate(
    input=taxonomy_as_md_md,
)
taxonomy_as_md_md <- taxonomy$view(Metadata)
action_results <- metadata_actions$tabulate(
    input=taxonomy_as_md_md,
)
taxonomy_viz <- action_results$visualization
qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv
taxonomy_as_md = use.view_as_metadata('taxonomy_as_md', taxonomy)

use.action(
    use.UsageAction(plugin_id='metadata', action_id='tabulate'),
    use.UsageInputs(input=taxonomy_as_md),
    use.UsageOutputNames(visualization='taxonomy'),
)
Using the qiime2 metadata tabulate tool:
  1. For “input”:

    • Perform the following steps.

      1. Change to Metadata from Artifact

      2. Set “Metadata Source” to taxonomy.qza

  2. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name

“Name” to set (be sure to press Save)

#: qiime2 metadata tabulate [...] : visualization.qzv

taxonomy.qzv

Filtering filters based on their taxonomy#

Taxonomic annotations provide useful information that can also be used in quality filtering of our data. A common step in 16S analysis is to remove sequences from an analysis that aren’t assigned to a phylum. In a human microbiome study such as this, these may for example represent reads of human genome sequence that were unintentionally sequences.

This filtering can be applied as follows by providing the feature table and the taxonomic annotations that were just created. The include parameter here specifies that an annotation must contain the text p__, which in the Greengenes taxonomy is the prefix for all phylum-level taxonomy assignments. Taxonomic labels that don’t contain p__ therefore were maximally assigned to the domain (i.e., kingdom) level. This will also remove features that are annotated with p__; (which means that no named phylum was assigned to the feature), as well as annotations containing Chloroplast or Mitochondria (i.e., organelle 16S sequences).

import qiime2.plugins.taxa.actions as taxa_actions

filtered_table_3, = taxa_actions.filter_table(
    table=filtered_table_2,
    taxonomy=taxonomy,
    mode='contains',
    include='p__',
    exclude='p__;,Chloroplast,Mitochondria',
)
taxa_actions <- import("qiime2.plugins.taxa.actions")

action_results <- taxa_actions$filter_table(
    table=filtered_table_2,
    taxonomy=taxonomy,
    mode='contains',
    include='p__',
    exclude='p__;,Chloroplast,Mitochondria',
)
filtered_table_3 <- action_results$filtered_table
qiime taxa filter-table \
  --i-table filtered-table-2.qza \
  --i-taxonomy taxonomy.qza \
  --p-mode contains \
  --p-include p__ \
  --p-exclude 'p__;,Chloroplast,Mitochondria' \
  --o-filtered-table filtered-table-3.qza
filtered_table_3, = use.action(
    use.UsageAction(plugin_id='taxa', action_id='filter_table'),
    use.UsageInputs(table=filtered_table_2, taxonomy=taxonomy, mode='contains',
                    include='p__', exclude='p__;,Chloroplast,Mitochondria'),
    use.UsageOutputNames(filtered_table='filtered_table_3')
)
Using the qiime2 taxa filter-table tool:
  1. Set “table” to #: filtered-table-2.qza

  2. Set “taxonomy” to #: taxonomy.qza

  3. Expand the additional options section

    1. Set “include” to p__

    2. Set “exclude” to p__;,Chloroplast,Mitochondria

    3. Leave “mode” as its default value of contains

  4. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name

“Name” to set (be sure to press Save)

#: qiime2 taxa filter-table [...] : filtered_table.qza

filtered-table-3.qza

Filtering samples with low sequence counts#

You may have noticed when looking at feature table summaries earlier that some of the samples contained very few ASV sequences. These often represent samples which didn’t amplify or sequence well, and when we start visualizing our data low numbers of sequences can cause misleading results, because the observed composition of the sample may not be reflective of the sample’s actual composition. For this reason it can be helpful to exclude samples with low ASV sequence counts from our samples. Here, we’ll filter out samples from which we have obtained fewer than 10,000 sequences.

filtered_table_4, = feature_table_actions.filter_samples(
    table=filtered_table_3,
    min_frequency=10000,
)
action_results <- feature_table_actions$filter_samples(
    table=filtered_table_3,
    min_frequency=10000L,
)
filtered_table_4 <- action_results$filtered_table
qiime feature-table filter-samples \
  --i-table filtered-table-3.qza \
  --p-min-frequency 10000 \
  --o-filtered-table filtered-table-4.qza
filtered_table_4, = use.action(
    use.UsageAction(plugin_id='feature_table', action_id='filter_samples'),
    use.UsageInputs(table=filtered_table_3, min_frequency=10000),
    use.UsageOutputNames(filtered_table='filtered_table_4')
    )
Using the qiime2 feature-table filter-samples tool:
  1. Set “table” to #: filtered-table-3.qza

  2. Expand the additional options section

    • Set “min_frequency” to 10000

  3. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name

“Name” to set (be sure to press Save)

#: qiime2 feature-table filter-samples [...] : filtered_table.qza

filtered-table-4.qza

After filtering ASVs that were not assigned a phylum, and filtering samples with low ASV sequence counts, we can remove ASV sequences that are no longer represented in our table from the collection of ASV sequences by filtering to only the features that are contained in the feature table. This step isn’t required, but can help to speed up some downstream steps.

filtered_sequences_2, = feature_table_actions.filter_seqs(
    data=rep_seqs,
    table=filtered_table_4,
)
action_results <- feature_table_actions$filter_seqs(
    data=rep_seqs,
    table=filtered_table_4,
)
filtered_sequences_2 <- action_results$filtered_data
qiime feature-table filter-seqs \
  --i-data rep-seqs.qza \
  --i-table filtered-table-4.qza \
  --o-filtered-data filtered-sequences-2.qza
filtered_sequences_2, = use.action(
    use.UsageAction(plugin_id='feature_table', action_id='filter_seqs'),
    use.UsageInputs(data=feature_sequences, table=filtered_table_4),
    use.UsageOutputNames(filtered_data='filtered_sequences_2')
    )
Using the qiime2 feature-table filter-seqs tool:
  1. Set “data” to #: rep-seqs.qza

  2. Expand the additional options section

    • Set “table” to #: filtered-table-4.qza

  3. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name

“Name” to set (be sure to press Save)

#: qiime2 feature-table filter-seqs [...] : filtered_data.qza

filtered-sequences-2.qza

Exercise 4

Try summarizing the feature table that was created by this round of filtering. Expand this box if you need help.

Generate taxonomic composition barplots#

We’ll now get one of our first views of our microbiome sample compositions using a taxonomic barplot. This can be generated with the following command.

taxa_bar_plots_1_viz, = taxa_actions.barplot(
    table=filtered_table_4,
    taxonomy=taxonomy,
    metadata=sample_metadata_md,
)
action_results <- taxa_actions$barplot(
    table=filtered_table_4,
    taxonomy=taxonomy,
    metadata=sample_metadata_md,
)
taxa_bar_plots_1_viz <- action_results$visualization
qiime taxa barplot \
  --i-table filtered-table-4.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization taxa-bar-plots-1.qzv
use.action(
    use.UsageAction(plugin_id='taxa', action_id='barplot'),
    use.UsageInputs(table=filtered_table_4, taxonomy=taxonomy,
                    metadata=sample_metadata),
    use.UsageOutputNames(visualization='taxa_bar_plots_1'),
)
Using the qiime2 taxa barplot tool:
  1. Set “table” to #: filtered-table-4.qza

  2. Expand the additional options section

    1. Set “taxonomy” to #: taxonomy.qza

    2. For “metadata”:

      • Press the + Insert metadata button to set up the next steps.

        1. Leave as Metadata from TSV

        2. Set “Metadata Source” to sample-metadata.tsv

  3. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name

“Name” to set (be sure to press Save)

#: qiime2 taxa barplot [...] : visualization.qzv

taxa-bar-plots-1.qzv