Denoising sequence data with DADA2

Performing sequence quality control (i.e., denoising)

Next, we’ll perform quality control, or denoising, of the sequence data with DADA2 (Callahan et al. [CMR+16]), which is accessible through the q2-dada2 plugin. Since our reads are paired-end, we’ll use the denoise_paired action in the q2-dada2 plugin. This action performs quality filtering, chimera checking, and paired-end read joining.

The denoise_paired action requires a few parameters that you’ll set based on the sequence quality score plots that you previously generated in the summary of the demultiplexed reads. Review those plots, identify the position where the quality begins to decrease, and use that position to set the trunc_len_* parameters: trunc_len_f for the forward reads and trunc_len_r for the reverse reads. If you notice a region of lower quality at the beginning of the forward and/or reverse reads, you can optionally trim bases from the start of the reads using the trim_left_f and trim_left_r parameters, respectively.
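To make "where the quality begins to decrease" concrete, here is a minimal sketch in plain Python. It is not part of q2-dada2: the function name suggest_trunc_len, the quality threshold of 30, the window of 3 positions, and the median quality values are all assumptions made up for illustration. In practice you would read these values off the interactive quality plot and apply your own judgment.

```python
from typing import Sequence


def suggest_trunc_len(
    median_quality: Sequence[int],
    min_quality: int = 30,   # assumed threshold, not a QIIME 2 rule
    window: int = 3,         # assumed run length that counts as a "drop"
) -> int:
    """Return how many leading bases to keep.

    Finds the first run of `window` consecutive positions whose median
    quality is below `min_quality` and returns the 0-based index where
    that run starts, which equals the number of bases retained.
    If no such run exists, the full read length is returned.
    """
    run = 0
    for pos, quality in enumerate(median_quality):
        run = run + 1 if quality < min_quality else 0
        if run == window:
            return pos - window + 1
    return len(median_quality)


# Hypothetical per-position median quality scores for forward reads.
forward_medians = [34] * 10 + [36] * 190 + [33] * 4 + [28, 27, 25, 24, 22, 20]

print(suggest_trunc_len(forward_medians))
```

With these made-up scores, quality stays at or above 30 for the first 204 positions and falls below it afterward, so the sketch suggests keeping 204 bases. A single low-quality position does not trigger truncation because the window requires several consecutive low positions.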

Spend a couple of minutes reviewing the quality score plots and thinking about where you might want to truncate the forward and reverse reads, and whether you’d like to trim any bases from the beginnings.
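Because denoise_paired also joins the forward and reverse reads, the truncated reads still need to overlap. The sketch below is a rough sanity check under stated assumptions, not something q2-dada2 provides: the function names are hypothetical, the amplicon length of 390 is invented for illustration (substitute the expected length for your own primers), and the minimum overlap of 12 bases is assumed here, so confirm it against the denoise-paired help text for your version.

```python
def expected_overlap(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Bases shared by the truncated forward and reverse reads.

    The forward read covers positions [0, trunc_len_f) of the amplicon and
    the reverse read covers the last trunc_len_r positions, so the shared
    region is trunc_len_f + trunc_len_r - amplicon_len (negative means a gap).
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def can_join(
    trunc_len_f: int,
    trunc_len_r: int,
    amplicon_len: int,
    min_overlap: int = 12,  # assumed minimum; check your q2-dada2 version
) -> bool:
    """True if the truncated reads overlap by at least `min_overlap` bases."""
    return expected_overlap(trunc_len_f, trunc_len_r, amplicon_len) >= min_overlap


# Truncation lengths from this tutorial with a hypothetical amplicon length.
print(expected_overlap(204, 205, amplicon_len=390))
print(can_join(204, 205, amplicon_len=390))
```

If a check like this fails for your chosen values, truncating less aggressively is one option, at the cost of keeping more low-quality bases.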

import qiime2.plugins.dada2.actions as dada2_actions

feature_table_0, asv_sequences_0, dada2_stats = dada2_actions.denoise_paired(
    demultiplexed_seqs=demultiplexed_sequences,
    trunc_len_f=204,
    trim_left_r=1,
    trunc_len_r=205,
)

dada2_actions <- import("qiime2.plugins.dada2.actions")

action_results <- dada2_actions$denoise_paired(
    demultiplexed_seqs=demultiplexed_sequences,
    trunc_len_f=204L,
    trim_left_r=1L,
    trunc_len_r=205L
)
asv_sequences_0 <- action_results$representative_sequences
feature_table_0 <- action_results$table
dada2_stats <- action_results$denoising_stats

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demultiplexed-sequences.qza \
  --p-trunc-len-f 204 \
  --p-trim-left-r 1 \
  --p-trunc-len-r 205 \
  --o-representative-sequences asv-sequences-0.qza \
  --o-table feature-table-0.qza \
  --o-denoising-stats dada2-stats.qza

asv_sequences_0, feature_table_0, dada2_stats = use.action(
    use.UsageAction(plugin_id='dada2', action_id='denoise_paired'),
    use.UsageInputs(demultiplexed_seqs=demultiplexed_sequences,
                    trunc_len_f=204,
                    trim_left_r=1,
                    trunc_len_r=205),
    use.UsageOutputNames(representative_sequences='asv_sequences_0',
                        table='feature_table_0',
                        denoising_stats='dada2_stats')
)

Using the qiime2 dada2 denoise-paired tool:

Set “demultiplexed_seqs” to #: demultiplexed-sequences.qza
Set “trunc_len_f” to 204
Set “trunc_len_r” to 205
Expand the additional options section
- Set “trim_left_r” to 1
Press the Execute button.

Once completed, for each new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 dada2 denoise-paired [...] : table.qza`	`feature-table-0.qza`
`#: qiime2 dada2 denoise-paired [...] : representative_sequences.qza`	`asv-sequences-0.qza`
`#: qiime2 dada2 denoise-paired [...] : denoising_stats.qza`	`dada2-stats.qza`

Reviewing the DADA2 run statistics

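The first output of DADA2 that we’ll look at is the run statistics. You can generate a viewable summary using the following command. The resulting visualization reports, for each sample, how many reads were input and how many remained after each processing step, so you can see how many reads were filtered out and at which step.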

from qiime2 import Metadata
import qiime2.plugins.metadata.actions as metadata_actions

stats_dada2_md_md = dada2_stats.view(Metadata)
dada2_stats_summ_viz, = metadata_actions.tabulate(
    input=stats_dada2_md_md,
)

Metadata <- import("qiime2")$Metadata
metadata_actions <- import("qiime2.plugins.metadata.actions")

stats_dada2_md_md <- dada2_stats$view(Metadata)
action_results <- metadata_actions$tabulate(
    input=stats_dada2_md_md
)
dada2_stats_summ_viz <- action_results$visualization

qiime metadata tabulate \
  --m-input-file dada2-stats.qza \
  --o-visualization dada2-stats-summ.qzv

stats_as_md = use.view_as_metadata('stats_dada2_md', dada2_stats)

use.action(
    use.UsageAction(plugin_id='metadata', action_id='tabulate'),
    use.UsageInputs(input=stats_as_md),
    use.UsageOutputNames(visualization='dada2_stats_summ')
)

Using the qiime2 metadata tabulate tool:

For “input”:
- Perform the following steps.
  1. Change to Metadata from Artifact
  2. Set “Metadata Source” to dada2-stats.qza
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 metadata tabulate [...] : visualization.qzv`	`dada2-stats-summ.qzv`

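When reading the run statistics, it can help to convert the per-step read counts into losses at each step. The following is a minimal sketch in plain Python rather than a QIIME 2 API: the function names are hypothetical, the counts are invented, and the column names (input, filtered, denoised, merged, non-chimeric) are assumed to match those shown in your own dada2-stats-summ.qzv, so adjust them if your table differs.

```python
from typing import Dict

# Assumed order of the read-count columns in the DADA2 stats table.
STEPS = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def losses_by_step(row: Dict[str, int]) -> Dict[str, int]:
    """Reads lost at each step, keyed by the step where they were lost."""
    return {cur: row[prev] - row[cur] for prev, cur in zip(STEPS, STEPS[1:])}


def percent_retained(row: Dict[str, int]) -> float:
    """Percentage of input reads that survive every step."""
    if row["input"] == 0:
        return 0.0
    return 100.0 * row["non-chimeric"] / row["input"]


# Invented counts for one hypothetical sample.
example_row = {
    "input": 10000,
    "filtered": 8500,
    "denoised": 8300,
    "merged": 7600,
    "non-chimeric": 7200,
}

print(losses_by_step(example_row))
print(percent_retained(example_row))
```

A large loss at the filtering step often points to the truncation settings, while a large loss at the merging step suggests that the truncated reads may not overlap enough to be joined.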

Generating and reviewing summaries of the feature table and feature data

The next two outputs of DADA2, the feature table and the feature data, will form the basis of the majority of the microbiome analyses that you’ll run, in connection with your sample metadata. The feature table describes which amplicon sequence variants (ASVs) were observed in which samples, and how many times each ASV was observed in each sample. The feature data in this case is the sequence that defines each ASV. Generate and explore the summaries of each of these artifacts.

import qiime2.plugins.feature_table.actions as feature_table_actions

feature_table_0_summ_viz, = feature_table_actions.summarize(
    table=feature_table_0,
    sample_metadata=sample_metadata_md,
)
asv_sequences_0_summ_viz, = feature_table_actions.tabulate_seqs(
    data=asv_sequences_0,
)

feature_table_actions <- import("qiime2.plugins.feature_table.actions")

action_results <- feature_table_actions$summarize(
    table=feature_table_0,
    sample_metadata=sample_metadata_md
)
feature_table_0_summ_viz <- action_results$visualization
action_results <- feature_table_actions$tabulate_seqs(
    data=asv_sequences_0
)
asv_sequences_0_summ_viz <- action_results$visualization

qiime feature-table summarize \
  --i-table feature-table-0.qza \
  --m-sample-metadata-file sample-metadata.tsv \
  --o-visualization feature-table-0-summ.qzv
qiime feature-table tabulate-seqs \
  --i-data asv-sequences-0.qza \
  --o-visualization asv-sequences-0-summ.qzv

use.action(
    use.UsageAction(plugin_id='feature_table', action_id='summarize'),
    use.UsageInputs(table=feature_table_0, sample_metadata=sample_metadata),
    use.UsageOutputNames(visualization='feature_table_0_summ'),
)

use.action(
    use.UsageAction(plugin_id='feature_table', action_id='tabulate_seqs'),
    use.UsageInputs(data=asv_sequences_0),
    use.UsageOutputNames(visualization='asv_sequences_0_summ'),
)

Using the qiime2 feature-table summarize tool:

Set “table” to #: feature-table-0.qza
Expand the additional options section
- For “sample_metadata”:
  - Press the + Insert sample_metadata button to set up the next steps.
    1. Leave as Metadata from TSV
    2. Set “Metadata Source” to sample-metadata.tsv
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 feature-table summarize [...] : visualization.qzv`	`feature-table-0-summ.qzv`

Using the qiime2 feature-table tabulate-seqs tool:

Set “data” to #: asv-sequences-0.qza
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 feature-table tabulate-seqs [...] : visualization.qzv`	`asv-sequences-0-summ.qzv`

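To make the relationship between the feature table and the feature data concrete, here is a toy sketch in plain Python. It does not use QIIME 2: the sample names, ASV identifiers, counts, sequences, and function names are all invented for illustration, and real artifacts are far larger.

```python
from collections import Counter
from typing import Dict

# Toy feature table: sample -> {ASV identifier -> times observed}.
feature_table: Dict[str, Dict[str, int]] = {
    "sample-A": {"ASV1": 120, "ASV3": 15},
    "sample-B": {"ASV1": 40, "ASV2": 7},
}

# Toy feature data: ASV identifier -> the sequence that defines it.
feature_data: Dict[str, str] = {
    "ASV1": "ACGTACGT",
    "ASV2": "ACGTTCGT",
    "ASV3": "TTGTACGA",
}


def reads_per_sample(table: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Total number of observations in each sample."""
    return {sample: sum(counts.values()) for sample, counts in table.items()}


def samples_per_feature(table: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Number of samples in which each ASV was observed at least once."""
    observed = Counter()
    for counts in table.values():
        for asv, count in counts.items():
            if count > 0:
                observed[asv] += 1
    return dict(observed)


print(reads_per_sample(feature_table))
print(samples_per_feature(feature_table))
```

Every ASV identifier in the table has a matching entry in the feature data, which is what lets later analyses connect a count in a sample back to the underlying sequence.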

Note

We’ve now reached the end of the upstream tutorial. When we begin working on the downstream tutorial, we’ll work with larger feature table and feature data artifacts representing many more samples. The samples that we worked with in this tutorial are a small subset of what we’ll work with in the downstream tutorial.
