Warning
This site has been replaced by the new QIIME 2 “amplicon distribution” documentation, as of the 2025.4 release of QIIME 2. You can still access the content from the “old docs” here for the QIIME 2 2024.10 and earlier releases, but we recommend that you transition to the new documentation at https://amplicon-docs.qiime2.org. Content on this site is no longer updated and may be out of date.
Are you looking for:
the QIIME 2 homepage? That’s https://qiime2.org.
learning resources for microbiome marker gene (i.e., amplicon) analysis? See the QIIME 2 amplicon distribution documentation.
learning resources for microbiome metagenome analysis? See the MOSHPIT documentation.
installation instructions, plugins, books, videos, workshops, or resources? See the QIIME 2 Library.
general help? See the QIIME 2 Forum.
Old content beyond this point… 👴👵
denoise-ccs: Denoise and dereplicate single-end PacBio CCS
Docstring:
Usage: qiime dada2 denoise-ccs [OPTIONS]
  This method denoises single-end PacBio CCS sequences, dereplicates them,
  and filters chimeras. Tutorial and workflow:
  https://github.com/benjjneb/LRASManuscript
Inputs:
  --i-demultiplexed-seqs ARTIFACT SampleData[SequencesWithQuality]
      The single-end demultiplexed PacBio CCS sequences to be denoised.
      [required]
Parameters:
  --p-front TEXT
      Sequence of an adapter ligated to the 5' end. The adapter and any
      preceding bases are trimmed. Can contain IUPAC ambiguous nucleotide
      codes. Note: primer direction is 5' to 3'. Primers are removed before
      the trim and filter step. Reads that do not contain the primer are
      discarded. Each read is re-oriented if the reverse complement of the
      read is a better match to the provided primer sequence. This is
      recommended for PacBio CCS reads, which come in a random mix of
      forward and reverse-complement orientations.  [required]
  --p-adapter TEXT
      Sequence of an adapter ligated to the 3' end. The adapter and any
      subsequent bases are trimmed. Can contain IUPAC ambiguous nucleotide
      codes. Note: primer direction is 5' to 3'. Primers are removed before
      the trim and filter step. Reads that do not contain the primer are
      discarded.  [optional]
  --p-max-mismatch INTEGER
      The number of mismatches to tolerate when matching reads to primer
      sequences; see http://benjjneb.github.io/dada2/ for complete details.
      [default: 2]
  --p-indels / --p-no-indels
      Allow insertions or deletions of bases when matching adapters. Note
      that this makes primer matching significantly slower, currently about
      4x slower.  [default: False]
  --p-trunc-len INTEGER
      Position at which sequences should be truncated due to decrease in
      quality. This truncates the 3' end of the input sequences, which will
      be the bases that were sequenced in the last cycles. Reads that are
      shorter than this value will be discarded. If 0 is provided, no
      truncation or length filtering will be performed. Note: since PacBio
      CCS sequences normally have very high quality scores, there is no need
      to truncate them.  [default: 0]
  --p-trim-left INTEGER
      Position at which sequences should be trimmed due to low quality. This
      trims the 5' end of the input sequences, which will be the bases that
      were sequenced in the first cycles.  [default: 0]
  --p-max-ee NUMBER
      Reads with a number of expected errors higher than this value will be
      discarded.  [default: 2.0]
  --p-trunc-q INTEGER
      Reads are truncated at the first instance of a quality score less than
      or equal to this value. If the resulting read is then shorter than
      `trunc-len`, it is discarded.  [default: 2]
  --p-min-len INTEGER
      Remove reads with length less than `min-len`. `min-len` is enforced
      after trimming and truncation. For 16S PacBio CCS, 1000 is suggested.
      [default: 20]
  --p-max-len INTEGER
      Remove reads prior to trimming or truncation which are longer than
      this value. If 0 is provided, no reads will be removed based on length.
      For 16S PacBio CCS, 1600 is suggested.  [default: 0]
  --p-pooling-method TEXT Choices('independent', 'pseudo')
      The method used to pool samples for denoising. "independent": Samples
      are denoised independently. "pseudo": The pseudo-pooling method is used
      to approximate pooling of samples. In short, samples are denoised
      independently once, ASVs detected in at least 2 samples are recorded,
      and samples are denoised independently a second time, but this time
      with prior knowledge of the recorded ASVs and thus higher sensitivity
      to those ASVs.  [default: 'independent']
  --p-chimera-method TEXT Choices('consensus', 'none', 'pooled')
      The method used to remove chimeras. "none": No chimera removal is
      performed. "pooled": All reads are pooled prior to chimera detection.
      "consensus": Chimeras are detected in samples individually, and
      sequences found chimeric in a sufficient fraction of samples are
      removed.  [default: 'consensus']
  --p-min-fold-parent-over-abundance NUMBER
      The minimum abundance of potential parents of a sequence being tested
      as chimeric, expressed as a fold-change versus the abundance of the
      sequence being tested. Values should be greater than or equal to 1
      (i.e. parents should be more abundant than the sequence being tested).
      Suggest 3.5. This parameter has no effect if chimera-method is "none".
      [default: 3.5]
  --p-allow-one-off / --p-no-allow-one-off
      Bimeras that are one-off from exact are also identified if the
      `allow-one-off` argument is True. If True, a sequence will be
      identified as a bimera if it is one mismatch or indel away from an
      exact bimera.  [default: False]
  --p-n-threads NTHREADS
      The number of threads to use for multithreaded processing. If 0 is
      provided, all available cores will be used.  [default: 1]
  --p-n-reads-learn INTEGER
      The number of reads to use when training the error model. Smaller
      numbers will result in a shorter run time but a less reliable error
      model.  [default: 1000000]
  --p-hashed-feature-ids / --p-no-hashed-feature-ids
      If True, the feature IDs in the resulting table will be presented as
      hashes of the sequences defining each feature. The hash will always be
      the same for the same sequence, so this allows feature tables to be
      merged across runs of this method. You should only merge tables if the
      exact same parameters are used for each run.  [default: True]
  --p-retain-all-samples / --p-no-retain-all-samples
      If True, all samples input to DADA2 will be retained in its output; if
      False, samples with zero total frequency are removed from the table.
      [default: True]
Outputs:
  --o-table ARTIFACT FeatureTable[Frequency]
      The resulting feature table.  [required]
  --o-representative-sequences ARTIFACT FeatureData[Sequence]
      The resulting feature sequences. Each feature in the feature table
      will be represented by exactly one sequence.  [required]
  --o-denoising-stats ARTIFACT SampleData[DADA2Stats]
      [required]
Miscellaneous:
  --output-dir PATH
      Output unspecified results to a directory.
  --verbose / --quiet
      Display verbose output to stdout and/or stderr during execution of
      this action. Or silence output if execution is successful (silence is
      golden).
  --example-data PATH
      Write example data and exit.
  --citations
      Show citations and exit.
  --use-cache DIRECTORY
      Specify the cache to be used for the intermediate work of this action.
      If not provided, the default cache under $TMP/qiime2/ will be used.
      IMPORTANT FOR HPC USERS: If you are on an HPC system and are using
      parallel execution, it is important to set this to a location that is
      globally accessible to all nodes in the cluster.
  --help
      Show this message and exit.
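The `--p-front` description combines three ideas: IUPAC-aware primer matching, a mismatch budget (`--p-max-mismatch`), and re-orientation of reads whose reverse complement matches the primer better. The following is a minimal plain-Python sketch of that logic under simplifying assumptions: no indels (as with the default `--p-no-indels`), "better match" taken to mean fewer mismatches, and only the 5' primer handled. It is not DADA2's actual implementation, and the helper names (`best_match`, `orient_and_trim`) are hypothetical.

```python
from typing import Optional, Tuple

# IUPAC nucleotide codes and the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA read (non-ACGT characters are kept as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def best_match(read: str, primer: str, max_mismatch: int) -> Optional[Tuple[int, int]]:
    """Return (mismatches, end_index) of the lowest-mismatch primer hit within
    the mismatch budget, or None. Ties keep the leftmost hit; no indels."""
    best = None
    for start in range(len(read) - len(primer) + 1):
        window = read[start:start + len(primer)]
        mismatches = sum(base not in IUPAC[code] for base, code in zip(window, primer))
        if mismatches <= max_mismatch and (best is None or mismatches < best[0]):
            best = (mismatches, start + len(primer))
    return best


def orient_and_trim(read: str, front: str, max_mismatch: int = 2) -> Optional[str]:
    """Hypothetical helper: flip the read if its reverse complement matches
    `front` better, then trim the primer and every base 5' of it.
    Returns None (read discarded) if neither orientation contains the primer."""
    forward_hit = best_match(read, front, max_mismatch)
    rc_read = reverse_complement(read)
    reverse_hit = best_match(rc_read, front, max_mismatch)
    if forward_hit is None and reverse_hit is None:
        return None
    # Use the reverse complement only when it matches strictly better.
    if reverse_hit is not None and (forward_hit is None or reverse_hit[0] < forward_hit[0]):
        read, hit = rc_read, reverse_hit
    else:
        hit = forward_hit
    return read[hit[1]:]
```

For example, with the hypothetical 6-nt primer `AGRGTT`, `orient_and_trim("CCAGAGTTACGT", "AGRGTT")` returns `"ACGT"`, and so does the reverse-complemented read `"ACGTAACTCTGG"`, because it is flipped before the primer is trimmed.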
Import:
from qiime2.plugins.dada2.methods import denoise_ccs
Docstring:
Denoise and dereplicate single-end PacBio CCS

This method denoises single-end PacBio CCS sequences, dereplicates them, and
filters chimeras. Tutorial and workflow:
https://github.com/benjjneb/LRASManuscript

Parameters
----------
demultiplexed_seqs : SampleData[SequencesWithQuality]
    The single-end demultiplexed PacBio CCS sequences to be denoised.
front : Str
    Sequence of an adapter ligated to the 5' end. The adapter and any
    preceding bases are trimmed. Can contain IUPAC ambiguous nucleotide codes.
    Note: primer direction is 5' to 3'. Primers are removed before the trim
    and filter step. Reads that do not contain the primer are discarded. Each
    read is re-oriented if the reverse complement of the read is a better
    match to the provided primer sequence. This is recommended for PacBio CCS
    reads, which come in a random mix of forward and reverse-complement
    orientations.
adapter : Str, optional
    Sequence of an adapter ligated to the 3' end. The adapter and any
    subsequent bases are trimmed. Can contain IUPAC ambiguous nucleotide
    codes. Note: primer direction is 5' to 3'. Primers are removed before the
    trim and filter step. Reads that do not contain the primer are discarded.
max_mismatch : Int, optional
    The number of mismatches to tolerate when matching reads to primer
    sequences; see http://benjjneb.github.io/dada2/ for complete details.
indels : Bool, optional
    Allow insertions or deletions of bases when matching adapters. Note that
    this makes primer matching significantly slower, currently about 4x
    slower.
trunc_len : Int, optional
    Position at which sequences should be truncated due to decrease in
    quality. This truncates the 3' end of the input sequences, which will be
    the bases that were sequenced in the last cycles. Reads that are shorter
    than this value will be discarded. If 0 is provided, no truncation or
    length filtering will be performed. Note: since PacBio CCS sequences
    normally have very high quality scores, there is no need to truncate them.
trim_left : Int, optional
    Position at which sequences should be trimmed due to low quality. This
    trims the 5' end of the input sequences, which will be the bases that
    were sequenced in the first cycles.
max_ee : Float, optional
    Reads with a number of expected errors higher than this value will be
    discarded.
trunc_q : Int, optional
    Reads are truncated at the first instance of a quality score less than or
    equal to this value. If the resulting read is then shorter than
    `trunc_len`, it is discarded.
min_len : Int, optional
    Remove reads with length less than `min_len`. `min_len` is enforced after
    trimming and truncation. For 16S PacBio CCS, 1000 is suggested.
max_len : Int, optional
    Remove reads prior to trimming or truncation which are longer than this
    value. If 0 is provided, no reads will be removed based on length. For
    16S PacBio CCS, 1600 is suggested.
pooling_method : Str % Choices('independent', 'pseudo'), optional
    The method used to pool samples for denoising. "independent": Samples are
    denoised independently. "pseudo": The pseudo-pooling method is used to
    approximate pooling of samples. In short, samples are denoised
    independently once, ASVs detected in at least 2 samples are recorded, and
    samples are denoised independently a second time, but this time with
    prior knowledge of the recorded ASVs and thus higher sensitivity to those
    ASVs.
chimera_method : Str % Choices('consensus', 'none', 'pooled'), optional
    The method used to remove chimeras. "none": No chimera removal is
    performed. "pooled": All reads are pooled prior to chimera detection.
    "consensus": Chimeras are detected in samples individually, and sequences
    found chimeric in a sufficient fraction of samples are removed.
min_fold_parent_over_abundance : Float, optional
    The minimum abundance of potential parents of a sequence being tested as
    chimeric, expressed as a fold-change versus the abundance of the sequence
    being tested. Values should be greater than or equal to 1 (i.e. parents
    should be more abundant than the sequence being tested). Suggest 3.5.
    This parameter has no effect if chimera_method is "none".
allow_one_off : Bool, optional
    Bimeras that are one-off from exact are also identified if the
    `allow_one_off` argument is True. If True, a sequence will be identified
    as a bimera if it is one mismatch or indel away from an exact bimera.
n_threads : Threads, optional
    The number of threads to use for multithreaded processing. If 0 is
    provided, all available cores will be used.
n_reads_learn : Int, optional
    The number of reads to use when training the error model. Smaller numbers
    will result in a shorter run time but a less reliable error model.
hashed_feature_ids : Bool, optional
    If True, the feature IDs in the resulting table will be presented as
    hashes of the sequences defining each feature. The hash will always be
    the same for the same sequence, so this allows feature tables to be
    merged across runs of this method. You should only merge tables if the
    exact same parameters are used for each run.
retain_all_samples : Bool, optional
    If True, all samples input to DADA2 will be retained in its output; if
    False, samples with zero total frequency are removed from the table.

Returns
-------
table : FeatureTable[Frequency]
    The resulting feature table.
representative_sequences : FeatureData[Sequence]
    The resulting feature sequences. Each feature in the feature table will
    be represented by exactly one sequence.
denoising_stats : SampleData[DADA2Stats]
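Several parameters above (`trunc_q`, `trunc_len`, `trim_left`, `max_ee`, `min_len`, `max_len`) act as per-read filters. The sketch below shows one plausible way they combine, using the standard expected-error formula EE = sum over bases of 10^(-Q/10) for Phred scores Q. It follows the ordering the docstring states (`max_len` before any trimming or truncation, `trunc_q` before the `trunc_len` length check, `min_len` after trimming and truncation). Where `trim_left` and `max_ee` fall relative to the other steps is an assumption, and `filter_read` is a hypothetical helper, not part of q2-dada2 or DADA2.

```python
from typing import List, Optional


def expected_errors(quals: List[int]) -> float:
    """Expected number of errors in a read: sum of 10^(-Q/10) over Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


def filter_read(
    seq: str,
    quals: List[int],
    trim_left: int = 0,
    trunc_len: int = 0,
    trunc_q: int = 2,
    max_ee: float = 2.0,
    min_len: int = 20,
    max_len: int = 0,
) -> Optional[str]:
    """Return the filtered read, or None if it is discarded.

    Defaults mirror the documented defaults of denoise_ccs; the placement of
    trim_left and max_ee in the sequence of steps is an assumption.
    """
    # max_len is applied before any trimming or truncation; 0 disables it.
    if max_len and len(seq) > max_len:
        return None
    # trunc_q: cut the read at the first quality score <= trunc_q.
    for i, q in enumerate(quals):
        if q <= trunc_q:
            seq, quals = seq[:i], quals[:i]
            break
    # trunc_len: discard reads now shorter than trunc_len, else cut to it (0 disables).
    if trunc_len:
        if len(seq) < trunc_len:
            return None
        seq, quals = seq[:trunc_len], quals[:trunc_len]
    # trim_left: drop bases from the 5' end (assumed to come after truncation).
    seq, quals = seq[trim_left:], quals[trim_left:]
    # min_len is enforced after trimming and truncation.
    if len(seq) < min_len:
        return None
    # max_ee: discard reads whose expected errors exceed the threshold.
    if expected_errors(quals) > max_ee:
        return None
    return seq
```

With a 10-base read of all Q40 bases, `filter_read` returns the read unchanged when `min_len=5`; a single Q2 base at the seventh position truncates it to the first six bases, and adding `trunc_len=8` then discards it, mirroring the `trunc_q` description.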