Taxonomic annotation of observed sequences

Taxonomic annotation of observed sequences#

In this chapter we’ll perform annotation of the features that were observed in this study by performing taxonomic classification of the sequences. This allows us to assess what organisms are represented by the ASV sequences that we observed in this study.

Taxonomy assignment#

We’ll start with taxonomic classification. In this step we’ll use a pre-trained Naive Bayes taxonomic classifier. This particular classifier was trained on the Greengenes 13-8 database, where sequences were trimmed to represent only the region between the 515F / 806R primers.

You can download pre-trained classifiers from the QIIME 2 documentation Data Resources page.

First, obtain the taxonomic classifier.

url = 'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/020-taxonomy/gg-13-8-99-515-806-nb-classifier.qza'
fn = 'gg-13-8-99-515-806-nb-classifier.qza'
request.urlretrieve(url, fn)
gg_13_8_99_515_806_nb_classifier = Artifact.load(fn)

url <- 'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/020-taxonomy/gg-13-8-99-515-806-nb-classifier.qza'
fn <- 'gg-13-8-99-515-806-nb-classifier.qza'
request$urlretrieve(url, fn)
gg_13_8_99_515_806_nb_classifier <- Artifact$load(fn)

wget \
  -O 'gg-13-8-99-515-806-nb-classifier.qza' \
  'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/020-taxonomy/gg-13-8-99-515-806-nb-classifier.qza'

def classifier_factory():
    from urllib import request
    from qiime2 import Artifact
    fp, _ = request.urlretrieve(
        'https://data.qiime2.org/classifiers/sklearn-1.4.2/greengenes/gg-13-8-99-515-806-nb-classifier.qza',
    )

    return Artifact.load(fp)

classifier = use.init_artifact('gg-13-8-99-515-806-nb-classifier', classifier_factory)

Using the Upload Data tool:

On the first tab (Regular), press the Paste/Fetch data button at the bottom.
1. Set “Name” (first text-field) to: gg-13-8-99-515-806-nb-classifier.qza
2. In the larger text-area, copy-and-paste: https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/020-taxonomy/gg-13-8-99-515-806-nb-classifier.qza
3. (“Type”, “Genome”, and “Settings” can be ignored)
Press the Start button at the bottom.

gg-13-8-99-515-806-nb-classifier.qza | view | download

Next, use that classifier to assign taxonomic information to the ASV sequences.

import qiime2.plugins.feature_classifier.actions as feature_classifier_actions

taxonomy, = feature_classifier_actions.classify_sklearn(
    classifier=gg_13_8_99_515_806_nb_classifier,
    reads=filtered_sequences_1,
)

feature_classifier_actions <- import("qiime2.plugins.feature_classifier.actions")

action_results <- feature_classifier_actions$classify_sklearn(
    classifier=gg_13_8_99_515_806_nb_classifier,
    reads=filtered_sequences_1,
)
taxonomy <- action_results$classification

qiime feature-classifier classify-sklearn \
  --i-classifier gg-13-8-99-515-806-nb-classifier.qza \
  --i-reads filtered-sequences-1.qza \
  --o-classification taxonomy.qza

taxonomy, = use.action(
    use.UsageAction(plugin_id='feature_classifier', action_id='classify_sklearn'),
    use.UsageInputs(classifier=classifier, reads=filtered_sequences_1),
    use.UsageOutputNames(classification='taxonomy'),
)

Using the qiime2 feature-classifier classify-sklearn tool:

Set “reads” to #: filtered-sequences-1.qza
Set “classifier” to #: gg-13-8-99-515-806-nb-classifier.qza
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 feature-classifier classify-sklearn [...] : classification.qza`	`taxonomy.qza`

taxonomy.qza | view | download

Finally, generate a human-readable summary of the taxonomic annotations.

taxonomy_as_md_md = taxonomy.view(Metadata)
taxonomy_viz, = metadata_actions.tabulate(
    input=taxonomy_as_md_md,
)

taxonomy_as_md_md <- taxonomy$view(Metadata)
action_results <- metadata_actions$tabulate(
    input=taxonomy_as_md_md,
)
taxonomy_viz <- action_results$visualization

qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv

taxonomy_as_md = use.view_as_metadata('taxonomy_as_md', taxonomy)

use.action(
    use.UsageAction(plugin_id='metadata', action_id='tabulate'),
    use.UsageInputs(input=taxonomy_as_md),
    use.UsageOutputNames(visualization='taxonomy'),
)

Using the qiime2 metadata tabulate tool:

For “input”:
- Perform the following steps.
  1. Change to Metadata from Artifact
  2. Set “Metadata Source” to taxonomy.qza
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 metadata tabulate [...] : visualization.qzv`	`taxonomy.qzv`

taxonomy.qzv | view | download

Filtering filters based on their taxonomy#

Taxonomic annotations provide useful information that can also be used in quality filtering of our data. A common step in 16S analysis is to remove sequences from an analysis that aren’t assigned to a phylum. In a human microbiome study such as this, these may for example represent reads of human genome sequence that were unintentionally sequences.

This filtering can be applied as follows by providing the feature table and the taxonomic annotations that were just created. The include parameter here specifies that an annotation must contain the text p__, which in the Greengenes taxonomy is the prefix for all phylum-level taxonomy assignments. Taxonomic labels that don’t contain p__ therefore were maximally assigned to the domain (i.e., kingdom) level. This will also remove features that are annotated with p__; (which means that no named phylum was assigned to the feature), as well as annotations containing Chloroplast or Mitochondria (i.e., organelle 16S sequences).

import qiime2.plugins.taxa.actions as taxa_actions

filtered_table_3, = taxa_actions.filter_table(
    table=filtered_table_2,
    taxonomy=taxonomy,
    mode='contains',
    include='p__',
    exclude='p__;,Chloroplast,Mitochondria',
)

taxa_actions <- import("qiime2.plugins.taxa.actions")

action_results <- taxa_actions$filter_table(
    table=filtered_table_2,
    taxonomy=taxonomy,
    mode='contains',
    include='p__',
    exclude='p__;,Chloroplast,Mitochondria',
)
filtered_table_3 <- action_results$filtered_table

qiime taxa filter-table \
  --i-table filtered-table-2.qza \
  --i-taxonomy taxonomy.qza \
  --p-mode contains \
  --p-include p__ \
  --p-exclude 'p__;,Chloroplast,Mitochondria' \
  --o-filtered-table filtered-table-3.qza

filtered_table_3, = use.action(
    use.UsageAction(plugin_id='taxa', action_id='filter_table'),
    use.UsageInputs(table=filtered_table_2, taxonomy=taxonomy, mode='contains',
                    include='p__', exclude='p__;,Chloroplast,Mitochondria'),
    use.UsageOutputNames(filtered_table='filtered_table_3')
)

Using the qiime2 taxa filter-table tool:

Set “table” to #: filtered-table-2.qza
Set “taxonomy” to #: taxonomy.qza
Expand the additional options section
1. Set “include” to p__
2. Set “exclude” to p__;,Chloroplast,Mitochondria
3. Leave “mode” as its default value of contains
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 taxa filter-table [...] : filtered_table.qza`	`filtered-table-3.qza`

filtered-table-3.qza | view | download

Filtering samples with low sequence counts#

You may have noticed when looking at feature table summaries earlier that some of the samples contained very few ASV sequences. These often represent samples which didn’t amplify or sequence well, and when we start visualizing our data low numbers of sequences can cause misleading results, because the observed composition of the sample may not be reflective of the sample’s actual composition. For this reason it can be helpful to exclude samples with low ASV sequence counts from our samples. Here, we’ll filter out samples from which we have obtained fewer than 10,000 sequences.

filtered_table_4, = feature_table_actions.filter_samples(
    table=filtered_table_3,
    min_frequency=10000,
)

action_results <- feature_table_actions$filter_samples(
    table=filtered_table_3,
    min_frequency=10000L,
)
filtered_table_4 <- action_results$filtered_table

qiime feature-table filter-samples \
  --i-table filtered-table-3.qza \
  --p-min-frequency 10000 \
  --o-filtered-table filtered-table-4.qza

filtered_table_4, = use.action(
    use.UsageAction(plugin_id='feature_table', action_id='filter_samples'),
    use.UsageInputs(table=filtered_table_3, min_frequency=10000),
    use.UsageOutputNames(filtered_table='filtered_table_4')
    )

Using the qiime2 feature-table filter-samples tool:

Set “table” to #: filtered-table-3.qza
Expand the additional options section
- Set “min_frequency” to 10000
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 feature-table filter-samples [...] : filtered_table.qza`	`filtered-table-4.qza`

filtered-table-4.qza | view | download

After filtering ASVs that were not assigned a phylum, and filtering samples with low ASV sequence counts, we can remove ASV sequences that are no longer represented in our table from the collection of ASV sequences by filtering to only the features that are contained in the feature table. This step isn’t required, but can help to speed up some downstream steps.

filtered_sequences_2, = feature_table_actions.filter_seqs(
    data=rep_seqs,
    table=filtered_table_4,
)

action_results <- feature_table_actions$filter_seqs(
    data=rep_seqs,
    table=filtered_table_4,
)
filtered_sequences_2 <- action_results$filtered_data

qiime feature-table filter-seqs \
  --i-data rep-seqs.qza \
  --i-table filtered-table-4.qza \
  --o-filtered-data filtered-sequences-2.qza

filtered_sequences_2, = use.action(
    use.UsageAction(plugin_id='feature_table', action_id='filter_seqs'),
    use.UsageInputs(data=feature_sequences, table=filtered_table_4),
    use.UsageOutputNames(filtered_data='filtered_sequences_2')
    )

Using the qiime2 feature-table filter-seqs tool:

Set “data” to #: rep-seqs.qza
Expand the additional options section
- Set “table” to #: filtered-table-4.qza
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 feature-table filter-seqs [...] : filtered_data.qza`	`filtered-sequences-2.qza`

filtered-sequences-2.qza | view | download

Exercise 4

Try summarizing the feature table that was created by this round of filtering. Expand this box if you need help.

Solution to Exercise 4

filtered_table_4_summ_exercise_viz, = feature_table_actions.summarize(
    table=filtered_table_4,
    sample_metadata=sample_metadata_md,
)

action_results <- feature_table_actions$summarize(
    table=filtered_table_4,
    sample_metadata=sample_metadata_md,
)
filtered_table_4_summ_exercise_viz <- action_results$visualization

qiime feature-table summarize \
  --i-table filtered-table-4.qza \
  --m-sample-metadata-file sample-metadata.tsv \
  --o-visualization filtered-table-4-summ-exercise.qzv

use.action(
    use.UsageAction(plugin_id='feature_table', action_id='summarize'),
    use.UsageInputs(table=filtered_table_4, sample_metadata=sample_metadata),
    use.UsageOutputNames(visualization='filtered_table_4_summ_exercise'),
)

Using the qiime2 feature-table summarize tool:

Set “table” to #: filtered-table-4.qza
Expand the additional options section
- For “sample_metadata”:
  - Press the + Insert sample_metadata button to set up the next steps.
    1. Leave as Metadata from TSV
    2. Set “Metadata Source” to sample-metadata.tsv
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 feature-table summarize [...] : visualization.qzv`	`filtered-table-4-summ-exercise.qzv`

filtered-table-4-summ-exercise.qzv | view | download

Generate taxonomic composition barplots#

We’ll now get one of our first views of our microbiome sample compositions using a taxonomic barplot. This can be generated with the following command.

taxa_bar_plots_1_viz, = taxa_actions.barplot(
    table=filtered_table_4,
    taxonomy=taxonomy,
    metadata=sample_metadata_md,
)

action_results <- taxa_actions$barplot(
    table=filtered_table_4,
    taxonomy=taxonomy,
    metadata=sample_metadata_md,
)
taxa_bar_plots_1_viz <- action_results$visualization

qiime taxa barplot \
  --i-table filtered-table-4.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization taxa-bar-plots-1.qzv

use.action(
    use.UsageAction(plugin_id='taxa', action_id='barplot'),
    use.UsageInputs(table=filtered_table_4, taxonomy=taxonomy,
                    metadata=sample_metadata),
    use.UsageOutputNames(visualization='taxa_bar_plots_1'),
)

Using the qiime2 taxa barplot tool:

Set “table” to #: filtered-table-4.qza
Expand the additional options section
1. Set “taxonomy” to #: taxonomy.qza
2. For “metadata”:
  - Press the + Insert metadata button to set up the next steps.
    1. Leave as Metadata from TSV
    2. Set “Metadata Source” to sample-metadata.tsv
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 taxa barplot [...] : visualization.qzv`	`taxa-bar-plots-1.qzv`

taxa-bar-plots-1.qzv | view | download

Taxonomic annotation of observed sequences

Contents

Taxonomic annotation of observed sequences#

Taxonomy assignment#

Filtering filters based on their taxonomy#

Filtering samples with low sequence counts#

Generate taxonomic composition barplots#