Denoising sequence data with DADA2
Performing sequence quality control (i.e., denoising)
Next, we’ll perform quality control, or denoising, of the sequence data with DADA2 (Callahan et al. [CMR+16]), which is accessible through the q2-dada2 plugin.
Since our reads are paired-end, we’ll use the denoise_paired action in the q2-dada2 plugin. This performs quality filtering, chimera checking, and paired-end read joining.
The denoise_paired action requires a few parameters that you’ll set based on the sequence quality score plots that you previously generated in the summary of the demultiplexed reads. You should review those plots, identify where the quality begins to decrease, and use that information to set the trunc_len_* parameters. You’ll set these for the forward and reverse reads using the trunc_len_f and trunc_len_r parameters, respectively. If you notice a region of lower quality at the beginning of the forward and/or reverse reads, you can optionally trim bases from the beginning of the reads using the trim_left_f and trim_left_r parameters for the forward and reverse reads, respectively.
Spend a couple of minutes reviewing the quality score plots and think about where you might want to truncate the forward and reverse reads, and if you’d like to trim any bases from the beginnings.
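One way to turn what you see in the plots into a number is sketched below. This is only an illustration, not how q2-dada2 or the quality plots work internally: the helper suggest_trunc_len, the per-position scores, and the threshold of 25 are all hypothetical stand-ins for values you would read off your own quality plots.

```python
def suggest_trunc_len(quality_by_position, min_quality=25):
    """Return how many leading positions stay at or above min_quality.

    quality_by_position: per-position summary quality scores (for example,
    the median read off the quality plot), where index 0 is the first base.
    min_quality: hypothetical threshold; choose one that suits your data.
    """
    for position, quality in enumerate(quality_by_position):
        if quality < min_quality:
            # Truncating to this length keeps positions 0..position-1.
            return position
    # Quality never dropped below the threshold: keep the full read.
    return len(quality_by_position)


# Hypothetical per-position scores for a short 12-base read (not real data).
forward_quality = [34, 36, 37, 37, 36, 35, 33, 30, 27, 24, 20, 15]
reverse_quality = [33, 35, 35, 34, 31, 28, 24, 21, 18, 14, 12, 10]

print(suggest_trunc_len(forward_quality))  # 9
print(suggest_trunc_len(reverse_quality))  # 6
```

The returned lengths are the kind of values you would consider for trunc_len_f and trunc_len_r. In this toy example the reverse read degrades earlier than the forward read, which is a common reason for truncating the two reads at different positions.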
import qiime2.plugins.dada2.actions as dada2_actions

feature_table_0, asv_sequences_0, dada2_stats = dada2_actions.denoise_paired(
    demultiplexed_seqs=demultiplexed_sequences,
    trunc_len_f=204,
    trim_left_r=1,
    trunc_len_r=205,
)

dada2_actions <- import("qiime2.plugins.dada2.actions")

action_results <- dada2_actions$denoise_paired(
    demultiplexed_seqs=demultiplexed_sequences,
    trunc_len_f=204L,
    trim_left_r=1L,
    trunc_len_r=205L,
)
asv_sequences_0 <- action_results$representative_sequences
feature_table_0 <- action_results$table
dada2_stats <- action_results$denoising_stats

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demultiplexed-sequences.qza \
  --p-trunc-len-f 204 \
  --p-trim-left-r 1 \
  --p-trunc-len-r 205 \
  --o-representative-sequences asv-sequences-0.qza \
  --o-table feature-table-0.qza \
  --o-denoising-stats dada2-stats.qza

asv_sequences_0, feature_table_0, dada2_stats = use.action(
    use.UsageAction(plugin_id='dada2', action_id='denoise_paired'),
    use.UsageInputs(demultiplexed_seqs=demultiplexed_sequences,
                    trunc_len_f=204,
                    trim_left_r=1, trunc_len_r=205,),
    use.UsageOutputNames(representative_sequences='asv_sequences_0',
                         table='feature_table_0',
                         denoising_stats='dada2_stats')
)
- Using the qiime2 dada2 denoise-paired tool:
  - Set “demultiplexed_seqs” to #: demultiplexed-sequences.qza
  - Set “trunc_len_f” to 204
  - Set “trunc_len_r” to 205
  - Expand the additional options section
    - Set “trim_left_r” to 1
  - Press the Execute button.
- Once completed, for each new entry in your history, use the Edit button to set the name as follows: (Renaming is optional, but it will make any subsequent steps easier to complete.)
  History Name | “Name” to set (be sure to press Save)
  #: qiime2 dada2 denoise-paired [...] : table.qza | feature-table-0.qza
  #: qiime2 dada2 denoise-paired [...] : representative_sequences.qza | asv-sequences-0.qza
  #: qiime2 dada2 denoise-paired [...] : denoising_stats.qza | dada2-stats.qza
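Because denoise_paired joins the forward and reverse reads, the truncation lengths you choose must leave the two reads overlapping. The sketch below checks this under stated assumptions: expected_overlap and can_join are hypothetical helpers, the amplicon length of 253 bases is an illustrative value rather than a property of this data set, and the minimum overlap of 12 bases reflects our understanding of DADA2's default, which you should confirm for your version.

```python
def expected_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases shared by the truncated forward and reverse reads.

    The forward read covers the first trunc_len_f positions of the amplicon
    and the reverse read covers the last trunc_len_r positions, so they share
    trunc_len_f + trunc_len_r - amplicon_len bases (negative means a gap).
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def can_join(trunc_len_f, trunc_len_r, amplicon_len, min_overlap=12):
    # min_overlap=12 is an assumption about the DADA2 default; adjust as needed.
    overlap = expected_overlap(trunc_len_f, trunc_len_r, amplicon_len)
    return overlap >= min_overlap


# Truncation lengths from the tutorial command; the amplicon length is hypothetical.
print(expected_overlap(204, 205, amplicon_len=253))  # 156
print(can_join(204, 205, amplicon_len=253))          # True
print(can_join(120, 120, amplicon_len=253))          # False
```

This simple arithmetic ignores natural variation in amplicon length, so in practice it is safer to leave a comfortable margin above the minimum rather than truncating as aggressively as the quality plots alone might suggest.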
Reviewing the DADA2 run statistics
The first output of DADA2 that we’ll look at is the run statistics. You can generate a viewable summary using the following command. This file will tell you how many reads were filtered from each sample and why.
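# Needed only if not already imported earlier in the tutorial.
from qiime2 import Metadata
import qiime2.plugins.metadata.actions as metadata_actions

stats_dada2_md_md = dada2_stats.view(Metadata)
dada2_stats_summ_viz, = metadata_actions.tabulate(
    input=stats_dada2_md_md,
)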

stats_dada2_md_md <- dada2_stats$view(Metadata)
action_results <- metadata_actions$tabulate(
    input=stats_dada2_md_md,
)
dada2_stats_summ_viz <- action_results$visualization

qiime metadata tabulate \
  --m-input-file dada2-stats.qza \
  --o-visualization dada2-stats-summ.qzv

stats_as_md = use.view_as_metadata('stats_dada2_md', dada2_stats)
use.action(
    use.UsageAction(plugin_id='metadata', action_id='tabulate'),
    use.UsageInputs(input=stats_as_md),
    use.UsageOutputNames(visualization='dada2_stats_summ')
)
- Using the qiime2 metadata tabulate tool:
  - For “input”, perform the following steps:
    - Change to Metadata from Artifact
    - Set “Metadata Source” to dada2-stats.qza
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows: (Renaming is optional, but it will make any subsequent steps easier to complete.)
  History Name | “Name” to set (be sure to press Save)
  #: qiime2 metadata tabulate [...] : visualization.qzv | dada2-stats-summ.qzv
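To make the idea of reads being lost at different stages concrete, the sketch below computes the percentage of input reads that survive each stage and identifies where the largest loss occurs. The sample IDs and counts are invented, and the stage names (input, filtered, denoised, merged, non-chimeric) are our assumption about the columns of the statistics table; check them against your own dada2-stats-summ.qzv.

```python
# Hypothetical per-sample read counts after each DADA2 stage (not real data).
# The stage names are assumed to match the columns of the statistics table.
STAGES = ["input", "filtered", "denoised", "merged", "non-chimeric"]

stats = {
    "sample-a": {"input": 10000, "filtered": 8500, "denoised": 8300,
                 "merged": 7600, "non-chimeric": 7200},
    "sample-b": {"input": 4000, "filtered": 1200, "denoised": 1100,
                 "merged": 900, "non-chimeric": 880},
}


def percent_retained(counts):
    """Percentage of input reads remaining after each stage."""
    total = counts["input"]
    return {stage: round(100 * counts[stage] / total, 1) for stage in STAGES}


def largest_loss(counts):
    """Name of the stage that removed the most reads."""
    losses = {
        later: counts[earlier] - counts[later]
        for earlier, later in zip(STAGES, STAGES[1:])
    }
    return max(losses, key=losses.get)


for sample_id, counts in stats.items():
    print(sample_id, percent_retained(counts), largest_loss(counts))
```

In this invented example, sample-b loses most of its reads at the filtering stage. In real data, a pattern like that may suggest that the truncation lengths extend into low-quality regions, while a large loss at the merging stage may suggest that the truncated reads no longer overlap enough to be joined.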
Generating and reviewing summaries of the feature table and feature data
The next two outputs of DADA2 will form the basis of the majority of the microbiome analyses that you’ll run, in connection with your sample metadata. These are the feature table and the feature data. The feature table describes which amplicon sequence variants (ASVs) were observed in which samples, and how many times each ASV was observed in each sample. The feature data in this case is the sequence that defines each ASV. Generate and explore the summaries of each of these files.
import qiime2.plugins.feature_table.actions as feature_table_actions

feature_table_0_summ_viz, = feature_table_actions.summarize(
    table=feature_table_0,
    sample_metadata=sample_metadata_md,
)
asv_sequences_0_summ_viz, = feature_table_actions.tabulate_seqs(
    data=asv_sequences_0,
)

feature_table_actions <- import("qiime2.plugins.feature_table.actions")

action_results <- feature_table_actions$summarize(
    table=feature_table_0,
    sample_metadata=sample_metadata_md,
)
feature_table_0_summ_viz <- action_results$visualization
action_results <- feature_table_actions$tabulate_seqs(
    data=asv_sequences_0,
)
asv_sequences_0_summ_viz <- action_results$visualization

qiime feature-table summarize \
  --i-table feature-table-0.qza \
  --m-sample-metadata-file sample-metadata.tsv \
  --o-visualization feature-table-0-summ.qzv
qiime feature-table tabulate-seqs \
  --i-data asv-sequences-0.qza \
  --o-visualization asv-sequences-0-summ.qzv

use.action(
    use.UsageAction(plugin_id='feature_table', action_id='summarize'),
    use.UsageInputs(table=feature_table_0, sample_metadata=sample_metadata),
    use.UsageOutputNames(visualization='feature_table_0_summ'),
)
use.action(
    use.UsageAction(plugin_id='feature_table', action_id='tabulate_seqs'),
    use.UsageInputs(data=asv_sequences_0),
    use.UsageOutputNames(visualization='asv_sequences_0_summ'),
)
- Using the qiime2 feature-table summarize tool:
  - Set “table” to #: feature-table-0.qza
  - Expand the additional options section
    - For “sample_metadata”:
      - Press the + Insert sample_metadata button to set up the next steps.
      - Leave as Metadata from TSV
      - Set “Metadata Source” to sample-metadata.tsv
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows: (Renaming is optional, but it will make any subsequent steps easier to complete.)
  History Name | “Name” to set (be sure to press Save)
  #: qiime2 feature-table summarize [...] : visualization.qzv | feature-table-0-summ.qzv
- Using the qiime2 feature-table tabulate-seqs tool:
  - Set “data” to #: asv-sequences-0.qza
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows: (Renaming is optional, but it will make any subsequent steps easier to complete.)
  History Name | “Name” to set (be sure to press Save)
  #: qiime2 feature-table tabulate-seqs [...] : visualization.qzv | asv-sequences-0-summ.qzv
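As a mental model for the two artifacts summarized in this section, a feature table can be pictured as counts indexed by sample and ASV, and the feature data as a mapping from each ASV identifier to its sequence. The sketch below uses plain Python dictionaries with invented identifiers, counts, and sequences; it does not reflect how QIIME 2 stores these artifacts, and the helper functions are hypothetical.

```python
# Feature table: sample ID -> {ASV ID -> times observed}. Invented data.
feature_table = {
    "sample-a": {"asv-1": 120, "asv-2": 30},
    "sample-b": {"asv-1": 45, "asv-3": 10},
    "sample-c": {"asv-2": 5},
}

# Feature data: ASV ID -> the sequence that defines it. Invented sequences.
feature_data = {
    "asv-1": "TACGGAGGATCC",
    "asv-2": "TACGTAGGGTGC",
    "asv-3": "GACGGAGGATGC",
}


def reads_per_sample(table):
    """Total observations in each sample."""
    return {sample: sum(counts.values()) for sample, counts in table.items()}


def samples_per_feature(table):
    """Number of samples in which each ASV was observed at least once."""
    prevalence = {}
    for counts in table.values():
        for asv_id, count in counts.items():
            if count > 0:
                prevalence[asv_id] = prevalence.get(asv_id, 0) + 1
    return prevalence


print(reads_per_sample(feature_table))
# {'sample-a': 150, 'sample-b': 55, 'sample-c': 5}
print(samples_per_feature(feature_table))
# {'asv-1': 2, 'asv-2': 2, 'asv-3': 1}

# Every ASV in the table should have a defining sequence in the feature data.
observed = {asv_id for counts in feature_table.values() for asv_id in counts}
print(observed <= set(feature_data))  # True
```

Per-sample totals and per-feature prevalence are similar in spirit to the quantities you will find when exploring the feature table summary, and the identifier-to-sequence mapping mirrors what the tabulated sequences let you look up.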
Note
We’ve now reached the end of the upstream tutorial. When we begin working on the downstream tutorial, we’ll work with larger feature table and feature data artifacts representing many more samples. The samples that we worked with in this tutorial are a small subset of what we’ll work with in the downstream tutorial.