denoise-paired: Denoise and dereplicate paired-end sequences

Docstring:

Usage: qiime dada2 denoise-paired [OPTIONS]

  This method denoises paired-end sequences, dereplicates them, and filters
  chimeras.

Inputs:
  --i-demultiplexed-seqs ARTIFACT SampleData[PairedEndSequencesWithQuality]
                         The paired-end demultiplexed sequences to be
                         denoised.                                  [required]
Parameters:
  --p-trunc-len-f INTEGER
                         Position at which forward read sequences should be
                         truncated due to decrease in quality. This truncates
                         the 3' end of the input sequences, which will
                         be the bases that were sequenced in the last cycles.
                         Reads that are shorter than this value will be
                         discarded. After this parameter is applied there must
                         still be at least a 12 nucleotide overlap between the
                         forward and reverse reads. If 0 is provided, no
                         truncation or length filtering will be performed.
                                                                    [required]
  --p-trunc-len-r INTEGER
                         Position at which reverse read sequences should be
                         truncated due to decrease in quality. This truncates
                         the 3' end of the input sequences, which will
                         be the bases that were sequenced in the last cycles.
                         Reads that are shorter than this value will be
                         discarded. After this parameter is applied there must
                         still be at least a 12 nucleotide overlap between the
                         forward and reverse reads. If 0 is provided, no
                         truncation or length filtering will be performed.
                                                                    [required]
  --p-trim-left-f INTEGER
                         Position at which forward read sequences should be
                         trimmed due to low quality. This trims the 5' end of
                         the input sequences, which will be the bases that
                         were sequenced in the first cycles.      [default: 0]
  --p-trim-left-r INTEGER
                         Position at which reverse read sequences should be
                         trimmed due to low quality. This trims the 5' end of
                         the input sequences, which will be the bases that
                         were sequenced in the first cycles.      [default: 0]
  --p-max-ee-f NUMBER    Forward reads with number of expected errors higher
                         than this value will be discarded.     [default: 2.0]
  --p-max-ee-r NUMBER    Reverse reads with number of expected errors higher
                         than this value will be discarded.     [default: 2.0]
  --p-trunc-q INTEGER    Reads are truncated at the first instance of a
                         quality score less than or equal to this value. If
                         the resulting read is then shorter than `trunc-len-f`
                         or `trunc-len-r` (depending on the direction of the
                         read) it is discarded.                   [default: 2]
  --p-min-overlap INTEGER
    Range(4, None)       The minimum length of the overlap required for
                         merging the forward and reverse reads.  [default: 12]
  --p-pooling-method TEXT Choices('independent', 'pseudo')
                         The method used to pool samples for denoising.
                         "independent": Samples are denoised independently.
                         "pseudo": The pseudo-pooling method is used to
                         approximate pooling of samples. In short, samples are
                         denoised independently once, ASVs detected in at
                         least 2 samples are recorded, and samples are
                         denoised independently a second time, but this time
                         with prior knowledge of the recorded ASVs and thus
                         higher sensitivity to those ASVs.
                                                      [default: 'independent']
  --p-chimera-method TEXT Choices('consensus', 'none', 'pooled')
                         The method used to remove chimeras. "none": No
                         chimera removal is performed. "pooled": All reads are
                         pooled prior to chimera detection. "consensus":
                         Chimeras are detected in samples individually, and
                         sequences found chimeric in a sufficient fraction of
                         samples are removed.           [default: 'consensus']
  --p-min-fold-parent-over-abundance NUMBER
                         The minimum abundance of potential parents of a
                         sequence being tested as chimeric, expressed as a
                         fold-change versus the abundance of the sequence
                         being tested. Values should be greater than or equal
                         to 1 (i.e. parents should be more abundant than the
                         sequence being tested). This parameter has no effect
                         if chimera-method is "none".           [default: 1.0]
  --p-allow-one-off / --p-no-allow-one-off
                         If True, bimeras that are one-off from exact are also
                         identified: a sequence is flagged as a bimera if it
                         is one mismatch or indel away from an exact bimera.
                                                              [default: False]
  --p-n-threads INTEGER  The number of threads to use for multithreaded
                         processing. If 0 is provided, all available cores
                         will be used.                            [default: 1]
  --p-n-reads-learn INTEGER
                         The number of reads to use when training the error
                         model. Smaller numbers will result in a shorter run
                         time but a less reliable error model.
                                                            [default: 1000000]
  --p-hashed-feature-ids / --p-no-hashed-feature-ids
                         If true, the feature ids in the resulting table will
                         be presented as hashes of the sequences defining each
                         feature. The hash will always be the same for the
                         same sequence so this allows feature tables to be
                         merged across runs of this method. You should only
                         merge tables if the exact same parameters are used
                         for each run.                         [default: True]
Outputs:
  --o-table ARTIFACT FeatureTable[Frequency]
                         The resulting feature table.               [required]
  --o-representative-sequences ARTIFACT FeatureData[Sequence]
                         The resulting feature sequences. Each feature in the
                         feature table will be represented by exactly one
                         sequence, and these sequences will be the joined
                         paired-end sequences.                      [required]
  --o-denoising-stats ARTIFACT SampleData[DADA2Stats]
                                                                    [required]
Miscellaneous:
  --output-dir PATH      Output unspecified results to a directory
  --verbose / --quiet    Display verbose output to stdout and/or stderr
                         during execution of this action. Or silence output if
                         execution is successful (silence is golden).
  --example-data PATH    Write example data and exit.
  --citations            Show citations and exit.
  --help                 Show this message and exit.

Examples:
  # ### example: denoise paired
  qiime dada2 denoise-paired \
    --i-demultiplexed-seqs demux-paired.qza \
    --p-trunc-len-f 150 \
    --p-trunc-len-r 140 \
    --o-representative-sequences representative-sequences.qza \
    --o-table table.qza \
    --o-denoising-stats denoising-stats.qza
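
The truncation lengths in the example above have to leave enough overlap for the reads to merge. A minimal sketch of that arithmetic follows, assuming the forward and reverse reads cover the amplicon from opposite ends. The helper names and the amplicon lengths (253 and 290) are hypothetical values chosen for illustration and are not part of the plugin.

```python
def merge_overlap(trunc_len_f, trunc_len_r, amplicon_length):
    """Bases shared by the truncated forward and reverse reads.

    Assumes both reads span the amplicon from opposite ends, so the overlap
    is whatever the two truncated lengths cover beyond the amplicon length.
    """
    return trunc_len_f + trunc_len_r - amplicon_length


def can_merge(trunc_len_f, trunc_len_r, amplicon_length, min_overlap=12):
    """True if the truncated reads still overlap by at least min_overlap."""
    overlap = merge_overlap(trunc_len_f, trunc_len_r, amplicon_length)
    return overlap >= min_overlap


# Truncation lengths from the example; amplicon lengths are hypothetical.
print(merge_overlap(150, 140, 253))  # 37
print(can_merge(150, 140, 253))      # True
print(can_merge(150, 140, 290))      # False
```

If the check fails for the expected amplicon length, the truncation positions are too aggressive for merging and most reads are likely to be lost at the merge step.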

Import:

from qiime2.plugins.dada2.methods import denoise_paired
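
A minimal sketch of calling the method from Python, mirroring the command-line example. The wrapper name `denoise_paired_from_file` and its default truncation lengths are assumptions made for illustration. The QIIME 2 imports are placed inside the function so the sketch can be loaded on a machine without QIIME 2 installed.

```python
def denoise_paired_from_file(demux_path, trunc_len_f=150, trunc_len_r=140):
    """Load a demultiplexed paired-end artifact and denoise it.

    Hypothetical convenience wrapper; only the two required parameters are
    passed, so every optional parameter keeps its documented default.
    """
    # Imported lazily so this module loads without QIIME 2 available.
    from qiime2 import Artifact
    from qiime2.plugins.dada2.methods import denoise_paired

    demux = Artifact.load(demux_path)
    results = denoise_paired(
        demultiplexed_seqs=demux,
        trunc_len_f=trunc_len_f,
        trunc_len_r=trunc_len_r,
    )
    # Result fields use the output names documented for this method.
    return (
        results.table,
        results.representative_sequences,
        results.denoising_stats,
    )
```

Calling `denoise_paired_from_file("demux-paired.qza")` in an environment with QIIME 2 and the dada2 plugin installed should return three artifacts, each of which can be written to disk with its `save` method, for example `table.save("table.qza")`.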

Docstring:

Denoise and dereplicate paired-end sequences

This method denoises paired-end sequences, dereplicates them, and filters
chimeras.

Parameters
----------
demultiplexed_seqs : SampleData[PairedEndSequencesWithQuality]
    The paired-end demultiplexed sequences to be denoised.
trunc_len_f : Int
    Position at which forward read sequences should be truncated due to
    decrease in quality. This truncates the 3' end of the input
    sequences, which will be the bases that were sequenced in the last
    cycles. Reads that are shorter than this value will be discarded. After
    this parameter is applied there must still be at least a 12 nucleotide
    overlap between the forward and reverse reads. If 0 is provided, no
    truncation or length filtering will be performed.
trunc_len_r : Int
    Position at which reverse read sequences should be truncated due to
    decrease in quality. This truncates the 3' end of the input
    sequences, which will be the bases that were sequenced in the last
    cycles. Reads that are shorter than this value will be discarded. After
    this parameter is applied there must still be at least a 12 nucleotide
    overlap between the forward and reverse reads. If 0 is provided, no
    truncation or length filtering will be performed.
trim_left_f : Int, optional
    Position at which forward read sequences should be trimmed due to low
    quality. This trims the 5' end of the input sequences, which will be
    the bases that were sequenced in the first cycles.
trim_left_r : Int, optional
    Position at which reverse read sequences should be trimmed due to low
    quality. This trims the 5' end of the input sequences, which will be
    the bases that were sequenced in the first cycles.
max_ee_f : Float, optional
    Forward reads with number of expected errors higher than this value
    will be discarded.
max_ee_r : Float, optional
    Reverse reads with number of expected errors higher than this value
    will be discarded.
trunc_q : Int, optional
    Reads are truncated at the first instance of a quality score less than
    or equal to this value. If the resulting read is then shorter than
    `trunc_len_f` or `trunc_len_r` (depending on the direction of the read)
    it is discarded.
min_overlap : Int % Range(4, None), optional
    The minimum length of the overlap required for merging the forward and
    reverse reads.
pooling_method : Str % Choices('independent', 'pseudo'), optional
    The method used to pool samples for denoising. "independent": Samples
    are denoised independently. "pseudo": The pseudo-pooling method is used
    to approximate pooling of samples. In short, samples are denoised
    independently once, ASVs detected in at least 2 samples are recorded,
    and samples are denoised independently a second time, but this time
    with prior knowledge of the recorded ASVs and thus higher sensitivity
    to those ASVs.
chimera_method : Str % Choices('consensus', 'none', 'pooled'), optional
    The method used to remove chimeras. "none": No chimera removal is
    performed. "pooled": All reads are pooled prior to chimera detection.
    "consensus": Chimeras are detected in samples individually, and
    sequences found chimeric in a sufficient fraction of samples are
    removed.
min_fold_parent_over_abundance : Float, optional
    The minimum abundance of potential parents of a sequence being tested
    as chimeric, expressed as a fold-change versus the abundance of the
    sequence being tested. Values should be greater than or equal to 1
    (i.e. parents should be more abundant than the sequence being tested).
    This parameter has no effect if chimera_method is "none".
allow_one_off : Bool, optional
    If True, bimeras that are one-off from exact are also identified: a
    sequence is flagged as a bimera if it is one mismatch or indel away
    from an exact bimera.
n_threads : Int, optional
    The number of threads to use for multithreaded processing. If 0 is
    provided, all available cores will be used.
n_reads_learn : Int, optional
    The number of reads to use when training the error model. Smaller
    numbers will result in a shorter run time but a less reliable error
    model.
hashed_feature_ids : Bool, optional
    If true, the feature ids in the resulting table will be presented as
    hashes of the sequences defining each feature. The hash will always be
    the same for the same sequence so this allows feature tables to be
    merged across runs of this method. You should only merge tables if the
    exact same parameters are used for each run.
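
The reason hashed feature ids make tables mergeable can be illustrated with a short sketch. It assumes an MD5 hex digest of the sequence as the hash, which is an assumption of this example and not something the docstring states. The table is also simplified to a single sample's counts keyed by feature id, and both helper names are hypothetical.

```python
import hashlib


def feature_id(sequence):
    """Hex digest of the sequence (MD5 is an assumption of this sketch)."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


def merge_tables(*tables):
    """Sum per-feature counts across tables keyed by feature id."""
    merged = {}
    for table in tables:
        for fid, count in table.items():
            merged[fid] = merged.get(fid, 0) + count
    return merged


# Two hypothetical runs that both observed the sequence ACGTACGT.
run_a = {feature_id("ACGTACGT"): 10, feature_id("TTGGCCAA"): 3}
run_b = {feature_id("ACGTACGT"): 5}

merged = merge_tables(run_a, run_b)
print(merged[feature_id("ACGTACGT")])  # 15
print(len(merged))                     # 2
```

Because the id depends only on the sequence, the same sequence lands in the same row in both runs. Sequences that differ in length because of different truncation settings hash to different ids, which is one way to read the advice to merge only runs that used identical parameters.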

Returns
-------
table : FeatureTable[Frequency]
    The resulting feature table.
representative_sequences : FeatureData[Sequence]
    The resulting feature sequences. Each feature in the feature table will
    be represented by exactly one sequence, and these sequences will be the
    joined paired-end sequences.
denoising_stats : SampleData[DADA2Stats]
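
The read-level filters described under Parameters (`trunc_q`, `trunc_len_f`/`trunc_len_r`, `max_ee_f`/`max_ee_r`) can be sketched on a list of Phred quality scores. This is an illustration, not the plugin's implementation. It assumes the usual Phred relation p = 10^(-Q/10) for the expected-error sum, and the order in which the three filters are applied here is an assumption. All function names are hypothetical.

```python
def expected_errors(quality_scores):
    """Sum of per-base error probabilities, with p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in quality_scores)


def truncate_at_quality(quality_scores, trunc_q=2):
    """Keep the scores before the first one that is <= trunc_q."""
    for i, q in enumerate(quality_scores):
        if q <= trunc_q:
            return quality_scores[:i]
    return quality_scores


def passes_filter(quality_scores, trunc_len, max_ee=2.0, trunc_q=2):
    """Apply trunc_q, then the length check and truncation, then max_ee.

    The ordering is an assumption of this sketch. A trunc_len of 0
    disables truncation and length filtering, as documented.
    """
    kept = truncate_at_quality(quality_scores, trunc_q)
    if trunc_len > 0:
        if len(kept) < trunc_len:
            return False  # shorter than the truncation position: discarded
        kept = kept[:trunc_len]
    return expected_errors(kept) <= max_ee


clean = [30] * 100               # 100 bases at Q30
dipped = [30] * 50 + [2] + [30] * 49  # one Q2 base at position 51
noisy = [10] * 100               # 100 bases at Q10

print(passes_filter(clean, trunc_len=100))   # True
print(passes_filter(dipped, trunc_len=100))  # False
print(passes_filter(noisy, trunc_len=100))   # False
```

The second read fails because truncation at the Q2 base leaves only 50 bases, fewer than the requested truncation length. The third fails because 100 bases at Q10 carry about 10 expected errors, well above the default of 2.0.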