denoise-pyro: Denoise and dereplicate single-end pyrosequences

Docstring:

Usage: qiime dada2 denoise-pyro [OPTIONS]

  This method denoises single-end pyrosequencing sequences, dereplicates
  them, and filters chimeras.

Options:
  --i-demultiplexed-seqs ARTIFACT PATH SampleData[SequencesWithQuality]
                                  The single-end demultiplexed pyrosequencing
                                  sequences (e.g. 454, IonTorrent) to be
                                  denoised.  [required]
  --p-trunc-len INTEGER           Position at which sequences should be
                                  truncated due to decrease in quality. This
                                  truncates the 3' end of the input
                                  sequences, which will be the bases that were
                                  sequenced in the last cycles. Reads that are
                                  shorter than this value will be discarded.
                                  If 0 is provided, no truncation or length
                                  filtering will be performed.  [required]
  --p-trim-left INTEGER           Position at which sequences should be
                                  trimmed due to low quality. This trims the
                                  5' end of the input sequences, which
                                  will be the bases that were sequenced in the
                                  first cycles.  [default: 0]
  --p-max-ee FLOAT                Reads with number of expected errors higher
                                  than this value will be discarded.
                                  [default: 2.0]
  --p-trunc-q INTEGER             Reads are truncated at the first instance of
                                  a quality score less than or equal to this
                                  value. If the resulting read is then shorter
                                  than `trunc_len`, it is discarded.
                                  [default: 2]
  --p-max-len INTEGER             Remove reads prior to trimming or truncation
                                  which are longer than this value. If 0 is
                                  provided no reads will be removed based on
                                  length.  [default: 0]
  --p-chimera-method [consensus|pooled|none]
                                  The method used to remove chimeras. "none":
                                  No chimera removal is performed. "pooled":
                                  All reads are pooled prior to chimera
                                  detection. "consensus": Chimeras are
                                  detected in samples individually, and
                                  sequences found chimeric in a sufficient
                                  fraction of samples are removed.  [default:
                                  consensus]
  --p-min-fold-parent-over-abundance FLOAT
                                  The minimum abundance of potential parents
                                  of a sequence being tested as chimeric,
                                  expressed as a fold-change versus the
                                  abundance of the sequence being tested.
                                  Values should be greater than or equal to 1
                                  (i.e. parents should be at least as abundant
                                  as the sequence being tested). This parameter
                                  has no effect if chimera_method is "none".
                                  [default: 1.0]
  --p-n-threads INTEGER           The number of threads to use for
                                  multithreaded processing. If 0 is provided,
                                  all available cores will be used.  [default:
                                  1]
  --p-n-reads-learn INTEGER       The number of reads to use when training the
                                  error model. Smaller numbers will result in
                                  a shorter run time but a less reliable error
                                  model.  [default: 250000]
  --p-hashed-feature-ids / --p-no-hashed-feature-ids
                                  If true, the feature ids in the resulting
                                  table will be presented as hashes of the
                                  sequences defining each feature. The hash
                                  will always be the same for the same
                                  sequence so this allows feature tables to be
                                  merged across runs of this method. You
                                  should only merge tables if the exact same
                                  parameters are used for each run.  [default:
                                  True]
  --o-table ARTIFACT PATH FeatureTable[Frequency]
                                  The resulting feature table.  [required if
                                  not passing --output-dir]
  --o-representative-sequences ARTIFACT PATH FeatureData[Sequence]
                                  The resulting feature sequences. Each
                                  feature in the feature table will be
                                  represented by exactly one sequence.
                                  [required if not passing --output-dir]
  --o-denoising-stats ARTIFACT PATH SampleData[DADA2Stats]
                                  [required if not passing --output-dir]
  --output-dir DIRECTORY          Output unspecified results to a directory
  --cmd-config FILE               Use config file for command options
  --verbose                       Display verbose output to stdout and/or
                                  stderr during execution of this action.
                                  [default: False]
  --quiet                         Silence output if execution is successful
                                  (silence is golden).  [default: False]
  --citations                     Show citations and exit.
  --help                          Show this message and exit.
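
The read-filtering options above can be illustrated with a small Python sketch. It is not DADA2's implementation; it only mirrors the documented behaviour of `max_len`, `trunc_q`, `trunc_len` and `max_ee` for a single read. The function names `expected_errors` and `filter_read` are hypothetical, `trim_left` is left out for brevity, and the sketch assumes that expected errors are evaluated on the truncated read using the standard Phred relation (error probability = 10^(-Q/10)).

```python
from typing import List, Optional, Tuple


def expected_errors(quals: List[int]) -> float:
    """Sum of the per-base error probabilities implied by Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


def filter_read(
    seq: str,
    quals: List[int],
    trunc_len: int,
    max_ee: float = 2.0,
    trunc_q: int = 2,
    max_len: int = 0,
) -> Optional[Tuple[str, List[int]]]:
    """Return the processed read, or None if the read would be discarded."""
    # max_len is checked before any truncation; 0 disables the check.
    if max_len > 0 and len(seq) > max_len:
        return None
    # Truncate at the first base whose quality is <= trunc_q.
    for i, q in enumerate(quals):
        if q <= trunc_q:
            seq, quals = seq[:i], quals[:i]
            break
    # trunc_len: discard shorter reads and cut longer ones; 0 disables both.
    if trunc_len > 0:
        if len(seq) < trunc_len:
            return None
        seq, quals = seq[:trunc_len], quals[:trunc_len]
    # Assumption: expected errors are computed on the truncated read.
    if expected_errors(quals) > max_ee:
        return None
    return seq, quals
```

For example, `filter_read("ACGTACGT", [30] * 8, trunc_len=4)` keeps the first four bases, while a read whose third base has quality 2 is cut to two bases and then discarded for being shorter than `trunc_len`.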

Import:

from qiime2.plugins.dada2.methods import denoise_pyro
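
The command-line options listed earlier and the Python parameters documented below follow a visible naming pattern: the `--i-`, `--p-` or `--o-` prefix is dropped and hyphens become underscores. The helper below is a hypothetical illustration of that pattern inferred from this page, not part of QIIME 2.

```python
def cli_flag_to_python_name(flag: str) -> str:
    """Map a CLI option such as '--p-trunc-len' to the name 'trunc_len'."""
    # --i- marks inputs, --p- parameters, --o- outputs.
    for prefix in ("--i-", "--p-", "--o-"):
        if flag.startswith(prefix):
            return flag[len(prefix):].replace("-", "_")
    raise ValueError(f"not an input, parameter, or output option: {flag}")
```

Negated boolean flags such as `--p-no-hashed-feature-ids` are not handled specially here; in the Python API the same choice is expressed as `hashed_feature_ids=False`.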

Docstring:

Denoise and dereplicate single-end pyrosequences

This method denoises single-end pyrosequencing sequences, dereplicates
them, and filters chimeras.

Parameters
----------
demultiplexed_seqs : SampleData[SequencesWithQuality]
    The single-end demultiplexed pyrosequencing sequences (e.g. 454,
    IonTorrent) to be denoised.
trunc_len : Int
    Position at which sequences should be truncated due to decrease in
    quality. This truncates the 3' end of the input sequences, which
    will be the bases that were sequenced in the last cycles. Reads that
    are shorter than this value will be discarded. If 0 is provided, no
    truncation or length filtering will be performed.
trim_left : Int, optional
    Position at which sequences should be trimmed due to low quality. This
    trims the 5' end of the input sequences, which will be the bases
    that were sequenced in the first cycles.
max_ee : Float, optional
    Reads with number of expected errors higher than this value will be
    discarded.
trunc_q : Int, optional
    Reads are truncated at the first instance of a quality score less than
    or equal to this value. If the resulting read is then shorter than
    `trunc_len`, it is discarded.
max_len : Int, optional
    Remove reads prior to trimming or truncation which are longer than this
    value. If 0 is provided no reads will be removed based on length.
chimera_method : Str % Choices({'consensus', 'none', 'pooled'}), optional
    The method used to remove chimeras. "none": No chimera removal is
    performed. "pooled": All reads are pooled prior to chimera detection.
    "consensus": Chimeras are detected in samples individually, and
    sequences found chimeric in a sufficient fraction of samples are
    removed.
min_fold_parent_over_abundance : Float, optional
    The minimum abundance of potential parents of a sequence being tested
    as chimeric, expressed as a fold-change versus the abundance of the
    sequence being tested. Values should be greater than or equal to 1
    (i.e. parents should be at least as abundant as the sequence being tested).
    This parameter has no effect if chimera_method is "none".
n_threads : Int, optional
    The number of threads to use for multithreaded processing. If 0 is
    provided, all available cores will be used.
n_reads_learn : Int, optional
    The number of reads to use when training the error model. Smaller
    numbers will result in a shorter run time but a less reliable error
    model.
hashed_feature_ids : Bool, optional
    If true, the feature ids in the resulting table will be presented as
    hashes of the sequences defining each feature. The hash will always be
    the same for the same sequence so this allows feature tables to be
    merged across runs of this method. You should only merge tables if the
    exact same parameters are used for each run.
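
To see why hashed feature ids make tables mergeable, consider the minimal sketch below. The docstring only says "hashes of the sequences"; the use of MD5 here is an assumption, and `feature_id` is a hypothetical helper rather than a QIIME 2 function.

```python
import hashlib


def feature_id(sequence: str) -> str:
    """Derive a deterministic id from a feature's sequence."""
    # Assumption: MD5 hex digest; the documentation does not name the hash.
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


# The same sequence yields the same id in every run, so rows from
# separately denoised tables line up when the tables are merged.
run_a = feature_id("ACGTACGTAC")
run_b = feature_id("ACGTACGTAC")
same_feature = run_a == run_b
```

Because the id depends only on the sequence, two runs with different truncation settings would produce different sequences, and therefore different ids, for the same organism. This is one reason to merge only tables produced with identical parameters.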

Returns
-------
table : FeatureTable[Frequency]
    The resulting feature table.
representative_sequences : FeatureData[Sequence]
    The resulting feature sequences. Each feature in the feature table will
    be represented by exactly one sequence.
denoising_stats : SampleData[DADA2Stats]
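
A minimal sketch of calling the method from Python is shown below, assuming QIIME 2 with the dada2 plugin is installed. The wrapper name `run_denoise_pyro`, the file paths and the output file names are hypothetical; only `trunc_len` is passed explicitly, so all other parameters keep their defaults.

```python
def run_denoise_pyro(input_path: str, output_prefix: str, trunc_len: int):
    """Denoise one demultiplexed artifact and save the three results."""
    # Imported inside the function so the sketch can be defined
    # even on a machine without QIIME 2 installed.
    from qiime2 import Artifact
    from qiime2.plugins.dada2.methods import denoise_pyro

    # Load a SampleData[SequencesWithQuality] artifact from disk.
    seqs = Artifact.load(input_path)

    # trunc_len is the only required parameter besides the input.
    results = denoise_pyro(demultiplexed_seqs=seqs, trunc_len=trunc_len)

    # The results are addressed by the names listed under "Returns".
    results.table.save(f"{output_prefix}-table.qza")
    results.representative_sequences.save(f"{output_prefix}-rep-seqs.qza")
    results.denoising_stats.save(f"{output_prefix}-stats.qza")
    return results
```

A call such as `run_denoise_pyro("demux.qza", "pyro", trunc_len=0)` would skip truncation and length filtering, as described for `trunc_len` above.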