Denoising sequence data with DADA2

Performing sequence quality control (i.e., denoising)

Next, we’ll perform quality control (i.e., denoising) of the sequence data with DADA2 (Callahan et al. [CMR+16]), which is accessible through the q2-dada2 plugin. Since our reads are paired-end, we’ll use the denoise_paired action in that plugin. This action performs quality filtering, chimera checking, and paired-end read joining.

The denoise_paired action requires a few parameters that you’ll set based on the sequence quality score plots you generated previously in the summary of the demultiplexed reads. Review those plots and identify where quality begins to decrease, and use that information to set the truncation positions for the forward and reverse reads with the trunc_len_f and trunc_len_r parameters, respectively. If you notice a region of lower quality at the beginning of the forward and/or reverse reads, you can optionally trim bases from the start of the reads using the trim_left_f and trim_left_r parameters, respectively.

Spend a couple of minutes reviewing the quality score plots and think about where you might want to truncate the forward and reverse reads, and if you’d like to trim any bases from the beginnings.
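To make the meaning of these parameters concrete: trim_left_* removes bases from the 5' end of each read, and trunc_len_* truncates each read at the given position (reads shorter than the truncation length are discarded). The following is a purely illustrative Python sketch of that slicing logic, using made-up parameter values and a made-up read; it is not part of the tutorial workflow, since DADA2 applies this trimming internally along with its quality filtering.

# Illustration only: how trim_left_* and trunc_len_* relate to read positions.
trim_left_f = 1      # hypothetical value: drop the first base of each forward read
trunc_len_f = 204    # hypothetical value: keep only the first 204 bases

forward_read = "ACGT" * 60   # a made-up 240-base forward read

if len(forward_read) < trunc_len_f:
    retained = None          # reads shorter than trunc_len would be discarded
else:
    retained = forward_read[trim_left_f:trunc_len_f]

print(len(retained))         # 203 bases of this read would enter denoising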

Python API:

import qiime2.plugins.dada2.actions as dada2_actions

feature_table_0, asv_sequences_0, dada2_stats = dada2_actions.denoise_paired(
    demultiplexed_seqs=demultiplexed_sequences,
    trunc_len_f=204,
    trim_left_r=1,
    trunc_len_r=205,
)
R API (via reticulate):

dada2_actions <- import("qiime2.plugins.dada2.actions")

action_results <- dada2_actions$denoise_paired(
    demultiplexed_seqs=demultiplexed_sequences,
    trunc_len_f=204L,
    trim_left_r=1L,
    trunc_len_r=205L,
)
asv_sequences_0 <- action_results$representative_sequences
feature_table_0 <- action_results$table
dada2_stats <- action_results$denoising_stats
Command line:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demultiplexed-sequences.qza \
  --p-trunc-len-f 204 \
  --p-trim-left-r 1 \
  --p-trunc-len-r 205 \
  --o-representative-sequences asv-sequences-0.qza \
  --o-table feature-table-0.qza \
  --o-denoising-stats dada2-stats.qza
Usage API:

asv_sequences_0, feature_table_0, dada2_stats = use.action(
    use.UsageAction(plugin_id='dada2', action_id='denoise_paired'),
    use.UsageInputs(demultiplexed_seqs=demultiplexed_sequences,
                    trunc_len_f=204,
                    trim_left_r=1, trunc_len_r=205,),
    use.UsageOutputNames(representative_sequences='asv_sequences_0',
                        table='feature_table_0',
                        denoising_stats='dada2_stats')
)
Galaxy:

Using the qiime2 dada2 denoise-paired tool:
  1. Set “demultiplexed_seqs” to #: demultiplexed-sequences.qza

  2. Set “trunc_len_f” to 204

  3. Set “trunc_len_r” to 205

  4. Expand the additional options section

    • Set “trim_left_r” to 1

  5. Press the Execute button.

Once completed, for each new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name → “Name” to set (be sure to press Save):

• #: qiime2 dada2 denoise-paired [...] : table.qza → feature-table-0.qza

• #: qiime2 dada2 denoise-paired [...] : representative_sequences.qza → asv-sequences-0.qza

• #: qiime2 dada2 denoise-paired [...] : denoising_stats.qza → dada2-stats.qza

Reviewing the DADA2 run statistics

The first output of DADA2 that we’ll look at is the run statistics. You can generate a viewable summary of it as follows. This summary will tell you how many reads were filtered from each sample and why (for example, during quality filtering, read merging, or chimera removal).

Python API:

from qiime2 import Metadata
import qiime2.plugins.metadata.actions as metadata_actions

stats_dada2_md = dada2_stats.view(Metadata)
dada2_stats_summ_viz, = metadata_actions.tabulate(
    input=stats_dada2_md,
)
R API (via reticulate):

metadata_actions <- import("qiime2.plugins.metadata.actions")
Metadata <- import("qiime2")$Metadata

stats_dada2_md <- dada2_stats$view(Metadata)
action_results <- metadata_actions$tabulate(
    input=stats_dada2_md,
)
dada2_stats_summ_viz <- action_results$visualization
Command line:

qiime metadata tabulate \
  --m-input-file dada2-stats.qza \
  --o-visualization dada2-stats-summ.qzv
Usage API:

stats_as_md = use.view_as_metadata('stats_dada2_md', dada2_stats)

use.action(
    use.UsageAction(plugin_id='metadata', action_id='tabulate'),
    use.UsageInputs(input=stats_as_md),
    use.UsageOutputNames(visualization='dada2_stats_summ')
)
Galaxy:

Using the qiime2 metadata tabulate tool:
  1. For “input”:

    • Perform the following steps.

      1. Change to Metadata from Artifact

      2. Set “Metadata Source” to dada2-stats.qza

  2. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name → “Name” to set (be sure to press Save):

• #: qiime2 metadata tabulate [...] : visualization.qzv → dada2-stats-summ.qzv
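If you’re also working with the Python API, you can inspect these statistics programmatically in addition to the tabulated visualization. The following is a minimal sketch, assuming the dada2_stats artifact created above; the column names checked here ('input' and 'non-chimeric') are typical of DADA2’s denoising stats, but should be confirmed against your own output.

# Sketch: view DADA2's denoising stats as a pandas DataFrame.
from qiime2 import Metadata

df = dada2_stats.view(Metadata).to_dataframe()
print(df.columns.tolist())  # confirm the exact column names in your version

# Hypothetical check: fraction of input reads that survived the full pipeline.
if {'input', 'non-chimeric'}.issubset(df.columns):
    survived = df['non-chimeric'] / df['input']
    print(survived.sort_values().head())  # samples that lost the most reads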

Generating and reviewing summaries of the feature table and feature data

The next two outputs of DADA2, used in connection with your sample metadata, will form the basis of most of the microbiome analyses that you’ll run: the feature table and the feature data. The feature table describes which amplicon sequence variants (ASVs) were observed in which samples, and how many times each ASV was observed in each sample. The feature data in this case is the sequence that defines each ASV. Generate and explore the summaries of each of these files.

Python API:

import qiime2.plugins.feature_table.actions as feature_table_actions

feature_table_0_summ_viz, = feature_table_actions.summarize(
    table=feature_table_0,
    sample_metadata=sample_metadata_md,
)
asv_sequences_0_summ_viz, = feature_table_actions.tabulate_seqs(
    data=asv_sequences_0,
)
R API (via reticulate):

feature_table_actions <- import("qiime2.plugins.feature_table.actions")

action_results <- feature_table_actions$summarize(
    table=feature_table_0,
    sample_metadata=sample_metadata_md,
)
feature_table_0_summ_viz <- action_results$visualization
action_results <- feature_table_actions$tabulate_seqs(
    data=asv_sequences_0,
)
asv_sequences_0_summ_viz <- action_results$visualization
Command line:

qiime feature-table summarize \
  --i-table feature-table-0.qza \
  --m-sample-metadata-file sample-metadata.tsv \
  --o-visualization feature-table-0-summ.qzv
qiime feature-table tabulate-seqs \
  --i-data asv-sequences-0.qza \
  --o-visualization asv-sequences-0-summ.qzv
Usage API:

use.action(
    use.UsageAction(plugin_id='feature_table', action_id='summarize'),
    use.UsageInputs(table=feature_table_0, sample_metadata=sample_metadata),
    use.UsageOutputNames(visualization='feature_table_0_summ'),
)

use.action(
    use.UsageAction(plugin_id='feature_table', action_id='tabulate_seqs'),
    use.UsageInputs(data=asv_sequences_0),
    use.UsageOutputNames(visualization='asv_sequences_0_summ'),
)
Galaxy:

Using the qiime2 feature-table summarize tool:
  1. Set “table” to #: feature-table-0.qza

  2. Expand the additional options section

    • For “sample_metadata”:

      • Press the + Insert sample_metadata button to set up the next steps.

        1. Leave as Metadata from TSV

        2. Set “Metadata Source” to sample-metadata.tsv

  3. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name → “Name” to set (be sure to press Save):

• #: qiime2 feature-table summarize [...] : visualization.qzv → feature-table-0-summ.qzv

Using the qiime2 feature-table tabulate-seqs tool:
  1. Set “data” to #: asv-sequences-0.qza

  2. Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name → “Name” to set (be sure to press Save):

• #: qiime2 feature-table tabulate-seqs [...] : visualization.qzv → asv-sequences-0-summ.qzv
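If you’d like to poke at these artifacts interactively as well, the following is a minimal Python sketch, assuming the feature_table_0 and asv_sequences_0 artifacts created above and that pandas views are available for these artifact types in your QIIME 2 installation.

import pandas as pd

# Feature table: rows are samples, columns are ASVs, values are counts.
table_df = feature_table_0.view(pd.DataFrame)
print(table_df.shape)        # (number of samples, number of ASVs)
print(table_df.sum(axis=1))  # total observed sequences per sample

# Feature data: one DNA sequence per ASV, indexed by the ASV identifier.
seqs = asv_sequences_0.view(pd.Series)
print(seqs.head())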

Note

We’ve now reached the end of the upstream tutorial. When we begin working on the downstream tutorial, we’ll work with larger feature table and feature data artifacts representing many more samples. The samples that we worked with in this tutorial are a small subset of what we’ll work with in the downstream tutorial.