Docstring:
Usage: qiime dada2 denoise-ccs [OPTIONS]
This method denoises single-end PacBio CCS sequences, dereplicates them, and
filters chimeras. Tutorial and workflow:
https://github.com/benjjneb/LRASManuscript
Inputs:
--i-demultiplexed-seqs ARTIFACT SampleData[SequencesWithQuality]
The single-end demultiplexed PacBio CCS sequences
to be denoised. [required]
Parameters:
--p-front TEXT Sequence of an adapter ligated to the 5' end. The
adapter and any preceding bases are trimmed. Can
contain IUPAC ambiguous nucleotide codes. Note,
primer direction is 5' to 3'. Primers are removed
before trim and filter step. Reads that do not
contain the primer are discarded. Each read is
re-oriented if the reverse complement of the read is
a better match to the provided primer sequence. This
is recommended for PacBio CCS reads, which come in a
random mix of forward and reverse-complement
orientations. [required]
--p-adapter TEXT Sequence of an adapter ligated to the 3' end. The
adapter and any following bases are trimmed. Can
contain IUPAC ambiguous nucleotide codes. Note,
primer direction is 5' to 3'. Primers are removed
before trim and filter step. Reads that do not
contain the primer are discarded. [optional]
--p-max-mismatch INTEGER
The number of mismatches to tolerate when matching
reads to primer sequences - see
http://benjjneb.github.io/dada2/ for complete
details. [default: 2]
--p-indels / --p-no-indels
Allow insertions or deletions of bases when
matching adapters. Note that primer matching can be
significantly slower when indels are allowed,
currently about 4x slower.
[default: False]
--p-trunc-len INTEGER Position at which sequences should be truncated due
to decrease in quality. This truncates the 3' end of
the input sequences, which will be the bases that
were sequenced in the last cycles. Reads that are
shorter than this value will be discarded. If 0 is
provided, no truncation or length filtering will be
performed. Note: Since PacBio CCS sequences normally
have very high quality scores, there is no need to
truncate them.
[default: 0]
--p-trim-left INTEGER Position at which sequences should be trimmed due
to low quality. This trims the 5' end of the input
sequences, which will be the bases that were
sequenced in the first cycles. [default: 0]
--p-max-ee NUMBER Reads with number of expected errors higher than
this value will be discarded. [default: 2.0]
--p-trunc-q INTEGER Reads are truncated at the first instance of a
quality score less than or equal to this value. If
the resulting read is then shorter than `trunc-len`,
it is discarded. [default: 2]
--p-min-len INTEGER Remove reads with length less than this value. The
minimum length is enforced after trimming and
truncation. For 16S PacBio CCS, suggest 1000.
[default: 20]
--p-max-len INTEGER Remove reads prior to trimming or truncation which
are longer than this value. If 0 is provided, no
reads will be removed based on length. For 16S
PacBio CCS, suggest 1600. [default: 0]
--p-pooling-method TEXT Choices('independent', 'pseudo')
The method used to pool samples for denoising.
"independent": Samples are denoised independently.
"pseudo": The pseudo-pooling method is used to
approximate pooling of samples. In short, samples
are denoised independently once, ASVs detected in at
least 2 samples are recorded, and samples are
denoised independently a second time, but this time
with prior knowledge of the recorded ASVs and thus
higher sensitivity to those ASVs.
[default: 'independent']
--p-chimera-method TEXT Choices('consensus', 'none', 'pooled')
The method used to remove chimeras. "none": No
chimera removal is performed. "pooled": All reads
are pooled prior to chimera detection. "consensus":
Chimeras are detected in samples individually, and
sequences found chimeric in a sufficient fraction of
samples are removed. [default: 'consensus']
--p-min-fold-parent-over-abundance NUMBER
The minimum abundance of potential parents of a
sequence being tested as chimeric, expressed as a
fold-change versus the abundance of the sequence
being tested. Values should be greater than or equal
to 1 (i.e. parents should be more abundant than the
sequence being tested). Suggest 3.5. This parameter
has no effect if chimera-method is "none".
[default: 3.5]
--p-allow-one-off / --p-no-allow-one-off
Bimeras that are one-off from exact are also
identified if the `allow-one-off` argument is True.
If True, a sequence will be identified as bimera if
it is one mismatch or indel away from an exact
bimera. [default: False]
--p-n-threads NTHREADS The number of threads to use for multithreaded
processing. If 0 is provided, all available cores
will be used. [default: 1]
--p-n-reads-learn INTEGER
The number of reads to use when training the error
model. Smaller numbers will result in a shorter run
time but a less reliable error model.
[default: 1000000]
--p-hashed-feature-ids / --p-no-hashed-feature-ids
If true, the feature ids in the resulting table
will be presented as hashes of the sequences
defining each feature. The hash will always be the
same for the same sequence so this allows feature
tables to be merged across runs of this method. You
should only merge tables if the exact same
parameters are used for each run. [default: True]
--p-retain-all-samples / --p-no-retain-all-samples
If True, all samples input to dada2 will be retained
in the output of dada2. If False, samples with zero
total frequency are removed from the table.
[default: True]
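
The primer handling described for --p-front (IUPAC codes, a mismatch budget, and re-orientation of reverse-complement reads) can be illustrated with a small sketch. This is not DADA2's implementation: it is a simplified, ungapped matcher (comparable to --p-no-indels), and the function names `find_primer` and `orient_and_trim` are hypothetical helpers introduced only for this example.

```python
# Simplified illustration of IUPAC-aware primer matching and read re-orientation.
# Assumption: ungapped matching only; this is NOT the DADA2 algorithm itself.

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly ambiguous) DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(primer, window):
    """Count positions where the read base is not allowed by the primer code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, window))


def find_primer(read, primer, max_mismatch=2):
    """Return (start, mismatch_count) of the best primer hit, or None."""
    best = None
    for start in range(len(read) - len(primer) + 1):
        mm = mismatches(primer, read[start:start + len(primer)])
        if mm <= max_mismatch and (best is None or mm < best[1]):
            best = (start, mm)
    return best


def orient_and_trim(read, front, max_mismatch=2):
    """Pick the orientation that matches the primer best, then remove the
    primer and any preceding bases. Return None if the read is discarded."""
    candidates = []
    for oriented in (read, reverse_complement(read)):
        hit = find_primer(oriented, front, max_mismatch)
        if hit is not None:
            start, mm = hit
            candidates.append((mm, oriented[start + len(front):]))
    if not candidates:
        return None  # primer not found in either orientation
    return min(candidates, key=lambda c: c[0])[1]


front = "AGRGTT"                           # toy primer, R matches A or G
read = "CC" + "AGAGTT" + "ACGTACGT"        # two leading bases, primer, insert
trimmed_forward = orient_and_trim(read, front)
trimmed_reverse = orient_and_trim(reverse_complement(read), front)
no_primer = orient_and_trim("CCCCCCCCCCCC", front)
```

Both `trimmed_forward` and `trimmed_reverse` recover the same insert, which is the point of re-orientation; `no_primer` is None because a read without the primer is dropped.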
Outputs:
--o-table ARTIFACT FeatureTable[Frequency]
The resulting feature table. [required]
--o-representative-sequences ARTIFACT FeatureData[Sequence]
The resulting feature sequences. Each feature in
the feature table will be represented by exactly one
sequence. [required]
--o-denoising-stats ARTIFACT SampleData[DADA2Stats]
[required]
Miscellaneous:
--output-dir PATH Output unspecified results to a directory
--verbose / --quiet Display verbose output to stdout and/or stderr
during execution of this action. Or silence output
if execution is successful (silence is golden).
--example-data PATH Write example data and exit.
--citations Show citations and exit.
--use-cache DIRECTORY Specify the cache to be used for the intermediate
work of this action. If not provided, the default
cache under $TMP/qiime2/ will be used.
IMPORTANT FOR HPC USERS: If you are on an HPC system
and are using parallel execution it is important to
set this to a location that is globally accessible
to all nodes in the cluster.
--help Show this message and exit.
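
As a rough illustration of how the options above combine, the following sketch assembles a command line in Python without executing it. The file names, the output prefix, and the primer sequences are placeholder assumptions (the primers shown are a commonly used full-length 16S pair, not values taken from this help text); only the flag names come from the listing above.

```python
import shlex


def build_denoise_ccs_command(seqs, front, adapter=None, min_len=1000,
                              max_len=1600, n_threads=1, out_prefix="dada2"):
    """Assemble an argument list for `qiime dada2 denoise-ccs`.

    The output file names derived from `out_prefix` are hypothetical.
    """
    cmd = [
        "qiime", "dada2", "denoise-ccs",
        "--i-demultiplexed-seqs", seqs,
        "--p-front", front,
        "--p-min-len", str(min_len),   # 1000 is the suggestion for 16S CCS
        "--p-max-len", str(max_len),   # 1600 is the suggestion for 16S CCS
        "--p-n-threads", str(n_threads),
        "--o-table", f"{out_prefix}-table.qza",
        "--o-representative-sequences", f"{out_prefix}-rep-seqs.qza",
        "--o-denoising-stats", f"{out_prefix}-stats.qza",
    ]
    if adapter is not None:
        cmd += ["--p-adapter", adapter]   # optional 3' primer
    return cmd


command = build_denoise_ccs_command(
    "demux-ccs.qza",                      # placeholder input artifact
    front="AGRGTTYGATYMTGGCTCAG",         # example primer, an assumption
    adapter="RGYTACCTTGTTACGACTT",        # example primer, an assumption
    n_threads=4,
)
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run(command, check=True)` on a system where QIIME 2 is installed.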
Import:
from qiime2.plugins.dada2.methods import denoise_ccs
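
A minimal sketch of calling the imported method from Python follows. The helper names `build_denoise_ccs_kwargs` and `run_denoise_ccs` are hypothetical and the input path is a placeholder; the keyword names, the allowed choices, and the suggested 16S length limits are taken from the parameter documentation for this action. QIIME 2 is imported lazily so that the argument check can be tried without an installation.

```python
ALLOWED_POOLING = ("independent", "pseudo")
ALLOWED_CHIMERA = ("consensus", "none", "pooled")


def build_denoise_ccs_kwargs(front, adapter=None, **overrides):
    """Collect keyword arguments and check the documented constraints."""
    kwargs = {
        "front": front,
        "min_len": 1000,                        # suggested for 16S PacBio CCS
        "max_len": 1600,                        # suggested for 16S PacBio CCS
        "pooling_method": "independent",
        "chimera_method": "consensus",
        "min_fold_parent_over_abundance": 3.5,
    }
    if adapter is not None:
        kwargs["adapter"] = adapter
    kwargs.update(overrides)

    if kwargs["pooling_method"] not in ALLOWED_POOLING:
        raise ValueError(f"pooling_method must be one of {ALLOWED_POOLING}")
    if kwargs["chimera_method"] not in ALLOWED_CHIMERA:
        raise ValueError(f"chimera_method must be one of {ALLOWED_CHIMERA}")
    if kwargs["min_fold_parent_over_abundance"] < 1:
        raise ValueError("min_fold_parent_over_abundance should be >= 1")
    return kwargs


def run_denoise_ccs(demux_path, **kwargs):
    """Load the demultiplexed reads and run the action (needs QIIME 2)."""
    from qiime2 import Artifact
    from qiime2.plugins.dada2.methods import denoise_ccs

    demux = Artifact.load(demux_path)
    results = denoise_ccs(demultiplexed_seqs=demux, **kwargs)
    return (results.table,
            results.representative_sequences,
            results.denoising_stats)


kwargs = build_denoise_ccs_kwargs("AGRGTTYGATYMTGGCTCAG",  # example primer
                                  pooling_method="pseudo")
# table, rep_seqs, stats = run_denoise_ccs("demux-ccs.qza", **kwargs)
```
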
Docstring:
Denoise and dereplicate single-end PacBio CCS
This method denoises single-end PacBio CCS sequences, dereplicates them,
and filters chimeras. Tutorial and workflow:
https://github.com/benjjneb/LRASManuscript
Parameters
----------
demultiplexed_seqs : SampleData[SequencesWithQuality]
The single-end demultiplexed PacBio CCS sequences to be denoised.
front : Str
Sequence of an adapter ligated to the 5' end. The adapter and any
preceding bases are trimmed. Can contain IUPAC ambiguous nucleotide
codes. Note, primer direction is 5' to 3'. Primers are removed before
trim and filter step. Reads that do not contain the primer are
discarded. Each read is re-oriented if the reverse complement of the
read is a better match to the provided primer sequence. This is
recommended for PacBio CCS reads, which come in a random mix of forward
and reverse-complement orientations.
adapter : Str, optional
Sequence of an adapter ligated to the 3' end. The adapter and any
following bases are trimmed. Can contain IUPAC ambiguous nucleotide
codes. Note, primer direction is 5' to 3'. Primers are removed before
trim and filter step. Reads that do not contain the primer are
discarded.
max_mismatch : Int, optional
The number of mismatches to tolerate when matching reads to primer
sequences - see http://benjjneb.github.io/dada2/ for complete details.
indels : Bool, optional
Allow insertions or deletions of bases when matching adapters. Note
that primer matching can be significantly slower when indels are
allowed, currently about 4x slower.
trunc_len : Int, optional
Position at which sequences should be truncated due to decrease in
quality. This truncates the 3' end of the input sequences, which will
be the bases that were sequenced in the last cycles. Reads that are
shorter than this value will be discarded. If 0 is provided, no
truncation or length filtering will be performed. Note: Since PacBio
CCS sequences normally have very high quality scores, there is no need
to truncate them.
trim_left : Int, optional
Position at which sequences should be trimmed due to low quality. This
trims the 5' end of the input sequences, which will be the bases that
were sequenced in the first cycles.
max_ee : Float, optional
Reads with number of expected errors higher than this value will be
discarded.
trunc_q : Int, optional
Reads are truncated at the first instance of a quality score less than
or equal to this value. If the resulting read is then shorter than
`trunc_len`, it is discarded.
min_len : Int, optional
Remove reads with length less than this value. The minimum length is
enforced after trimming and truncation. For 16S PacBio CCS, suggest
1000.
max_len : Int, optional
Remove reads prior to trimming or truncation which are longer than this
value. If 0 is provided, no reads will be removed based on length. For
16S PacBio CCS, suggest 1600.
pooling_method : Str % Choices('independent', 'pseudo'), optional
The method used to pool samples for denoising. "independent": Samples
are denoised independently. "pseudo": The pseudo-pooling method is used
to approximate pooling of samples. In short, samples are denoised
independently once, ASVs detected in at least 2 samples are recorded,
and samples are denoised independently a second time, but this time
with prior knowledge of the recorded ASVs and thus higher sensitivity
to those ASVs.
chimera_method : Str % Choices('consensus', 'none', 'pooled'), optional
The method used to remove chimeras. "none": No chimera removal is
performed. "pooled": All reads are pooled prior to chimera detection.
"consensus": Chimeras are detected in samples individually, and
sequences found chimeric in a sufficient fraction of samples are
removed.
min_fold_parent_over_abundance : Float, optional
The minimum abundance of potential parents of a sequence being tested
as chimeric, expressed as a fold-change versus the abundance of the
sequence being tested. Values should be greater than or equal to 1
(i.e. parents should be more abundant than the sequence being tested).
Suggest 3.5. This parameter has no effect if chimera_method is "none".
allow_one_off : Bool, optional
Bimeras that are one-off from exact are also identified if the
`allow_one_off` argument is True. If True, a sequence will be
identified as bimera if it is one mismatch or indel away from an exact
bimera.
n_threads : Threads, optional
The number of threads to use for multithreaded processing. If 0 is
provided, all available cores will be used.
n_reads_learn : Int, optional
The number of reads to use when training the error model. Smaller
numbers will result in a shorter run time but a less reliable error
model.
hashed_feature_ids : Bool, optional
If true, the feature ids in the resulting table will be presented as
hashes of the sequences defining each feature. The hash will always be
the same for the same sequence so this allows feature tables to be
merged across runs of this method. You should only merge tables if the
exact same parameters are used for each run.
retain_all_samples : Bool, optional
If True, all samples input to dada2 will be retained in the output of
dada2. If False, samples with zero total frequency are removed from the
table.
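
To make the `max_ee` filter concrete, the sketch below computes the expected number of errors in a read from its Phred quality scores using the standard relation EE = sum(10^(-Q/10)). The helper names are hypothetical and the reads are synthetic; this only illustrates the arithmetic, not DADA2's filtering code.

```python
def expected_errors(phred_scores):
    """Sum of per-base error probabilities: EE = sum(10 ** (-Q / 10))."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_max_ee(phred_scores, max_ee=2.0):
    """A read is kept when its expected errors do not exceed max_ee."""
    return expected_errors(phred_scores) <= max_ee


# Synthetic full-length 16S reads with uniform quality.
q30_read = [30] * 1500   # 1500 bases * 0.001 = about 1.5 expected errors
q20_read = [20] * 1500   # 1500 bases * 0.01  = about 15 expected errors

keep_q30 = passes_max_ee(q30_read)   # within the default max_ee of 2.0
keep_q20 = passes_max_ee(q20_read)   # far above the default, discarded
```

Because expected errors accumulate with read length, a threshold that is permissive for short reads can be strict for reads of roughly 1500 bases, which is worth keeping in mind when choosing `max_ee` for long amplicons.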
Returns
-------
table : FeatureTable[Frequency]
The resulting feature table.
representative_sequences : FeatureData[Sequence]
The resulting feature sequences. Each feature in the feature table will
be represented by exactly one sequence.
denoising_stats : SampleData[DADA2Stats]
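
The `hashed_feature_ids` behaviour described above (the same sequence always yields the same id, so tables from separate runs can be merged) can be sketched as follows. The use of an MD5 hex digest here is an assumption made for illustration, since the documentation only says "hashes of the sequences"; the tables are toy dictionaries, not FeatureTable artifacts.

```python
import hashlib
from collections import Counter


def hashed_feature_id(sequence):
    """Deterministic id for a sequence (MD5 is an assumption, see lead-in)."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


def merge_tables(*tables):
    """Sum per-feature counts across tables keyed by feature id."""
    merged = Counter()
    for table in tables:
        merged.update(table)   # Counter.update adds counts for shared keys
    return dict(merged)


# Toy per-run tables: feature id -> total count.
run1 = {hashed_feature_id("ACGTACGT"): 10, hashed_feature_id("TTTTACGT"): 3}
run2 = {hashed_feature_id("ACGTACGT"): 5}

merged = merge_tables(run1, run2)
shared_count = merged[hashed_feature_id("ACGTACGT")]   # 10 + 5
```

The shared sequence collapses into a single feature because both runs derive the id from the sequence itself rather than from a run-specific counter.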