cull-seqs: Removes sequences that contain at least the specified number of degenerate bases and/or homopolymers of a given length.

Docstring:

Usage: qiime rescript cull-seqs [OPTIONS]

  Filter DNA or RNA sequences that contain ambiguous bases and homopolymers,
  and output filtered DNA sequences. Removes DNA sequences that have the
  specified number, or more, of IUPAC-compliant degenerate bases. Remaining
  sequences are removed if they contain homopolymers equal to or longer than
  the specified length. If the input consists of RNA sequences, they are
  reverse transcribed to DNA before filtering.

Inputs:
  --i-sequences ARTIFACT FeatureData[Sequence | RNASequence]
                          DNA or RNA sequences to be screened for removal
                          based on degenerate base and homopolymer screening
                          criteria.                                 [required]
Parameters:
  --p-num-degenerates INTEGER
    Range(1, None)        Sequences with N, or more, degenerate bases will be
                          removed.                                [default: 5]
  --p-homopolymer-length INTEGER
    Range(2, None)        Sequences containing a homopolymer sequence of
                          length N, or greater, will be removed.  [default: 8]
  --p-n-jobs INTEGER      Number of concurrent processes to use while
    Range(1, None)        processing sequences. More is faster but typically
                          should not be higher than the number of available
                          CPUs. Output sequence order may change when using
                          multiple jobs.                          [default: 1]
Outputs:
  --o-clean-sequences ARTIFACT FeatureData[Sequence]
                          The resulting DNA sequences that pass degenerate
                          base and homopolymer screening criteria.  [required]
Miscellaneous:
  --output-dir PATH       Output unspecified results to a directory
  --verbose / --quiet     Display verbose output to stdout and/or stderr
                          during execution of this action. Or silence output
                          if execution is successful (silence is golden).
  --example-data PATH     Write example data and exit.
  --citations             Show citations and exit.
  --use-cache DIRECTORY   Specify the cache to be used for the intermediate
                          work of this action. If not provided, the default
                          cache under $TMP/qiime2/ will be used.
                          IMPORTANT FOR HPC USERS: If you are on an HPC system
                          and are using parallel execution it is important to
                          set this to a location that is globally accessible
                          to all nodes in the cluster.
  --help                  Show this message and exit.
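The screening rules described at the top of the help text can be sketched in plain Python. This is an illustration under stated assumptions, not RESCRIPt's implementation: the helper names (`to_dna`, `count_degenerates`, `longest_homopolymer`, `cull`) are hypothetical, RNA is converted by uppercasing and substituting T for U, the degenerate set is taken to be the eleven IUPAC ambiguity codes, and a homopolymer is taken to be a run of any single character, N included.

```python
from itertools import groupby

# Hypothetical sketch of the cull-seqs rules; not RESCRIPt's own code.
# Assumed degenerate set: the 11 IUPAC nucleotide ambiguity codes.
DEGENERATE = set("RYSWKMBDHVN")


def to_dna(seq: str) -> str:
    """Uppercase the sequence and replace U with T (RNA -> DNA alphabet)."""
    return seq.upper().replace("U", "T")


def count_degenerates(seq: str) -> int:
    """Count positions holding an IUPAC ambiguity code."""
    return sum(1 for base in seq if base in DEGENERATE)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated character (any character)."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def cull(seqs: dict, num_degenerates: int = 5, homopolymer_length: int = 8) -> dict:
    """Return {id: DNA sequence} for sequences that fail neither screen."""
    kept = {}
    for seq_id, seq in seqs.items():
        dna = to_dna(seq)
        if count_degenerates(dna) >= num_degenerates:
            continue  # at or above the degenerate-base threshold: removed
        if longest_homopolymer(dna) >= homopolymer_length:
            continue  # homopolymer at or above the length threshold: removed
        kept[seq_id] = dna
    return kept


seqs = {
    "s1": "ACGTACGTACGT",          # clean: kept
    "s2": "ACGNNRYKACGT",          # 5 degenerate codes: removed at the default of 5
    "s3": "ACG" + "A" * 8 + "T",   # run of 8 A's: removed at the default of 8
    "s4": "ACGUACGUACGU",          # RNA input: kept and returned as DNA
}
print(cull(seqs))  # {'s1': 'ACGTACGTACGT', 's4': 'ACGTACGTACGT'}
```

Because both comparisons are "N or more", raising `num_degenerates` to 6 would keep `s2`, and raising `homopolymer_length` to 9 would keep `s3`.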

Import:

from qiime2.plugins.rescript.methods import cull_seqs

Docstring:

Removes sequences that contain at least the specified number of degenerate
bases and/or homopolymers of a given length.

Filter DNA or RNA sequences that contain ambiguous bases and homopolymers,
and output filtered DNA sequences. Removes DNA sequences that have the
specified number, or more, of IUPAC-compliant degenerate bases. Remaining
sequences are removed if they contain homopolymers equal to or longer than
the specified length. If the input consists of RNA sequences, they are
reverse transcribed to DNA before filtering.
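The homopolymer rule can also be written as a regular expression with a backreference: one character followed by at least `homopolymer_length - 1` further copies of itself. `has_homopolymer` is a hypothetical helper shown as an alternative to counting runs; RESCRIPt may implement the check differently, and treating a run of N as a homopolymer is an assumption of this sketch.

```python
import re

# Hypothetical alternative to counting runs; RESCRIPt may check this differently.


def has_homopolymer(seq: str, homopolymer_length: int = 8) -> bool:
    """True if any character occurs homopolymer_length or more times in a row."""
    # (.) captures one character (N included); \1{k,} demands k or more repeats,
    # so a match spans at least k + 1 = homopolymer_length characters.
    pattern = r"(.)\1{%d,}" % (homopolymer_length - 1)
    return re.search(pattern, seq) is not None


print(has_homopolymer("ACGT" + "G" * 8))  # True: a run of exactly 8 meets the threshold
print(has_homopolymer("ACGT" + "G" * 7))  # False: 7 is below it
```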

Parameters
----------
sequences : FeatureData[Sequence | RNASequence]
    DNA or RNA sequences to be screened for removal based on degenerate
    base and homopolymer screening criteria.
num_degenerates : Int % Range(1, None), optional
    Sequences with N, or more, degenerate bases will be removed.
homopolymer_length : Int % Range(2, None), optional
    Sequences containing a homopolymer sequence of length N, or greater,
    will be removed.
n_jobs : Int % Range(1, None), optional
    Number of concurrent processes to use while processing sequences. More
    is faster but typically should not be higher than the number of
    available CPUs. Output sequence order may change when using multiple
    jobs.
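Why the output order can change when `n_jobs` is greater than 1 is easiest to see in a chunk-and-collect sketch. Here threads stand in for the separate processes that `n_jobs` requests, the screen is reduced to the degenerate-base rule, and `screen_chunk`, `parallel_screen`, and `chunk_size` are hypothetical names; RESCRIPt's own scheduling may differ.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical sketch; threads stand in for worker processes.
DEGENERATE = set("RYSWKMBDHVN")  # assumed IUPAC ambiguity codes


def screen_chunk(chunk, num_degenerates=5):
    """Return the (id, seq) pairs in one chunk that pass a degenerate-base screen."""
    return [(sid, s) for sid, s in chunk
            if sum(b in DEGENERATE for b in s) < num_degenerates]


def parallel_screen(records, n_jobs=2, chunk_size=2):
    """Screen records in chunks across workers; collect results as they finish."""
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    kept = []
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(screen_chunk, c) for c in chunks]
        # as_completed yields futures in finish order, not submission order,
        # so the concatenated output may not follow the input order.
        for fut in as_completed(futures):
            kept.extend(fut.result())
    return kept


records = [("s1", "ACGT"), ("s2", "NNNNN"), ("s3", "GGCC"), ("s4", "TTAA")]
kept = parallel_screen(records, n_jobs=2)
print(sorted(sid for sid, _ in kept))  # ['s1', 's3', 's4']

# Restore input order if downstream steps depend on it.
position = {sid: i for i, (sid, _) in enumerate(records)}
kept.sort(key=lambda rec: position[rec[0]])
print([sid for sid, _ in kept])  # ['s1', 's3', 's4']
```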

Returns
-------
clean_sequences : FeatureData[Sequence]
    The resulting DNA sequences that pass degenerate base and homopolymer
    screening criteria.
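The `Int % Range(1, None)` and `Int % Range(2, None)` types under Parameters give each integer an inclusive lower bound and no upper bound. The sketch below mimics that constraint with a hypothetical `check_range` helper and `PARAM_BOUNDS` table; it is not QIIME 2's type-checking code. A minimum of 2 for `homopolymer_length` is sensible because every base is already a run of length 1, so a threshold of 1 would remove every non-empty sequence.

```python
# Hypothetical sketch; not QIIME 2's own type checking.
# (start, end) pairs copied from the Range(...) annotations; end=None means unbounded.
PARAM_BOUNDS = {
    "num_degenerates": (1, None),
    "homopolymer_length": (2, None),
    "n_jobs": (1, None),
}


def check_range(name: str, value: int) -> int:
    """Return value if start <= value (and value < end, when end is given)."""
    start, end = PARAM_BOUNDS[name]
    if not isinstance(value, int) or isinstance(value, bool):
        raise TypeError(f"{name} must be an integer, got {value!r}")
    if value < start or (end is not None and value >= end):
        raise ValueError(f"{name}={value} is outside Range({start}, {end})")
    return value


check_range("num_degenerates", 5)     # the default: accepted
check_range("homopolymer_length", 2)  # lowest allowed value: accepted
try:
    check_range("homopolymer_length", 1)
except ValueError as err:
    print(err)  # homopolymer_length=1 is outside Range(2, None)
```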