Warning
This site has been replaced by the new QIIME 2 “amplicon distribution” documentation, as of the 2025.4 release of QIIME 2. You can still access the content from the “old docs” here for the QIIME 2 2024.10 and earlier releases, but we recommend that you transition to the new documentation at https://amplicon-docs.qiime2.org. Content on this site is no longer updated and may be out of date.
cull-seqs: Removes sequences that contain at least the specified number of degenerate bases and/or homopolymers of a given length.
Docstring:
Usage: qiime rescript cull-seqs [OPTIONS]

  Filter DNA or RNA sequences that contain ambiguous bases and homopolymers, and output filtered DNA sequences. Removes DNA sequences that have the specified number, or more, of IUPAC compliant degenerate bases. Remaining sequences are removed if they contain homopolymers equal to or longer than the specified length. If the input consists of RNA sequences, they are reverse transcribed to DNA before filtering.

Inputs:
  --i-sequences ARTIFACT FeatureData[Sequence | RNASequence]
      DNA or RNA Sequences to be screened for removal based on degenerate base and homopolymer screening criteria. [required]

Parameters:
  --p-num-degenerates INTEGER Range(1, None)
      Sequences with N, or more, degenerate bases will be removed. [default: 5]
  --p-homopolymer-length INTEGER Range(2, None)
      Sequences containing a homopolymer sequence of length N, or greater, will be removed. [default: 8]
  --p-n-jobs INTEGER Range(1, None)
      Number of concurrent processes to use while processing sequences. More is faster but typically should not be higher than the number of available CPUs. Output sequence order may change when using multiple jobs. [default: 1]

Outputs:
  --o-clean-sequences ARTIFACT FeatureData[Sequence]
      The resulting DNA sequences that pass degenerate base and homopolymer screening criteria. [required]

Miscellaneous:
  --output-dir PATH
      Output unspecified results to a directory
  --verbose / --quiet
      Display verbose output to stdout and/or stderr during execution of this action. Or silence output if execution is successful (silence is golden).
  --example-data PATH
      Write example data and exit.
  --citations
      Show citations and exit.
  --use-cache DIRECTORY
      Specify the cache to be used for the intermediate work of this action. If not provided, the default cache under $TMP/qiime2/will be used. IMPORTANT FOR HPC USERS: If you are on an HPC system and are using parallel execution it is important to set this to a location that is globally accessible to all nodes in the cluster.
  --help
      Show this message and exit.
Import:
from qiime2.plugins.rescript.methods import cull_seqs
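A minimal sketch of calling this method through the Artifact API, assuming QIIME 2 and RESCRIPt are installed. The wrapper name `cull_artifact` and the paths passed to it are hypothetical. The keyword arguments and the `clean_sequences` result follow the docstring on this page, with defaults taken from the command-line help.

```python
def cull_artifact(input_path, output_path, num_degenerates=5,
                  homopolymer_length=8, n_jobs=1):
    """Load a sequence artifact, run cull_seqs on it, and save the result.

    Hypothetical wrapper for illustration only.
    """
    # Imported inside the function so that this sketch can be loaded on a
    # machine without QIIME 2; calling it still requires QIIME 2 and RESCRIPt.
    from qiime2 import Artifact
    from qiime2.plugins.rescript.methods import cull_seqs

    sequences = Artifact.load(input_path)
    results = cull_seqs(
        sequences=sequences,
        num_degenerates=num_degenerates,
        homopolymer_length=homopolymer_length,
        n_jobs=n_jobs,
    )
    # The single output is named clean_sequences (FeatureData[Sequence]).
    results.clean_sequences.save(output_path)
    return results.clean_sequences
```

A call such as `cull_artifact("seqs.qza", "clean-seqs.qza", n_jobs=4)` would then write the filtered sequences to `clean-seqs.qza`.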
Docstring:
Removes sequences that contain at least the specified number of degenerate bases and/or homopolymers of a given length.

Filter DNA or RNA sequences that contain ambiguous bases and homopolymers, and output filtered DNA sequences. Removes DNA sequences that have the specified number, or more, of IUPAC compliant degenerate bases. Remaining sequences are removed if they contain homopolymers equal to or longer than the specified length. If the input consists of RNA sequences, they are reverse transcribed to DNA before filtering.

Parameters
----------
sequences : FeatureData[Sequence | RNASequence]
    DNA or RNA Sequences to be screened for removal based on degenerate base and homopolymer screening criteria.
num_degenerates : Int % Range(1, None), optional
    Sequences with N, or more, degenerate bases will be removed.
homopolymer_length : Int % Range(2, None), optional
    Sequences containing a homopolymer sequence of length N, or greater, will be removed.
n_jobs : Int % Range(1, None), optional
    Number of concurrent processes to use while processing sequences. More is faster but typically should not be higher than the number of available CPUs. Output sequence order may change when using multiple jobs.

Returns
-------
clean_sequences : FeatureData[Sequence]
    The resulting DNA sequences that pass degenerate base and homopolymer screening criteria.
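The two screening criteria can be illustrated in plain Python. This is a sketch of the behavior described in the docstring, not RESCRIPt's actual implementation. It assumes that converting RNA to DNA amounts to replacing U with T, that the degenerate bases are the eleven IUPAC ambiguity codes, and that a run of any repeated character counts as a homopolymer. All function names and example records are hypothetical.

```python
import re

# Assumption: the eleven IUPAC nucleotide ambiguity codes.
DEGENERATE_BASES = frozenset("RYSWKMBDHVN")


def to_dna(seq):
    """Upper-case the sequence and replace U with T (assumed RNA handling)."""
    return seq.upper().replace("U", "T")


def passes_screen(dna, num_degenerates=5, homopolymer_length=8):
    """Return True if the sequence survives both screening criteria."""
    n_degenerate = sum(base in DEGENERATE_BASES for base in dna)
    if n_degenerate >= num_degenerates:
        return False
    # One character followed by at least (homopolymer_length - 1) copies.
    run = re.compile(r"(.)\1{%d,}" % (homopolymer_length - 1))
    return run.search(dna) is None


def cull(records, num_degenerates=5, homopolymer_length=8):
    """Return {id: DNA sequence} for the records that pass the screen."""
    kept = {}
    for seq_id, seq in records.items():
        dna = to_dna(seq)
        if passes_screen(dna, num_degenerates, homopolymer_length):
            kept[seq_id] = dna
    return kept


# Hypothetical example records.
records = {
    "rna_ok": "ACGUACGUACGU",              # RNA input, no issues
    "degenerate_5": "ACGNNRYNACGT",        # five degenerate bases
    "degenerate_4": "ACGNNRYACGT",         # four degenerate bases
    "run_of_8": "ACG" + "A" * 8 + "TCG",   # homopolymer of length 8
    "run_of_7": "ACG" + "T" * 7 + "CG",    # homopolymer of length 7
}
kept = cull(records)
print(sorted(kept))
```

With the default thresholds, the records with exactly five degenerate bases or a run of exactly eight identical bases are removed, because both thresholds are inclusive ("N, or more").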