orient-seqs: Orient input sequences by comparison against reference.

Citations
  • Torbjørn Rognes, Tomáš Flouri, Ben Nichols, Christopher Quince, and Frédéric Mahé. Vsearch: a versatile open source tool for metagenomics. PeerJ, 4:e2584, 2016. doi:10.7717/peerj.2584.

Docstring:

Usage: qiime rescript orient-seqs [OPTIONS]

  Orient input sequences by comparison against a set of reference sequences
  using VSEARCH. This action can also be used to quickly separate sequences
  that match a set of reference sequences, in either orientation, from those
  that do not. Alternatively, if no reference sequences are provided as
  input, all input sequences will be reverse-complemented. In this case, no
  alignment is performed, and all alignment parameters (`dbmask`, `relabel`,
  `relabel_keep`, `relabel_md5`, `relabel_self`, `relabel_sha1`, `sizein`,
  `sizeout` and `threads`) are ignored.

Inputs:
  --i-sequences ARTIFACT FeatureData[Sequence]
                          Sequences to be oriented.                 [required]
  --i-reference-sequences ARTIFACT FeatureData[Sequence]
                          Reference sequences to orient against. If no
                          reference is provided, all the sequences will be
                          reverse complemented and all parameters will be
                          ignored.                                  [optional]
Parameters:
  --p-threads INTEGER     Number of computation threads to use (1 to 256).
    Range(1, 256)         The number of threads should be less than or equal
                          to the number of available CPU cores.   [default: 1]
  --p-dbmask TEXT Choices('none', 'dust', 'soft')
                          Mask regions in the target database sequences using
                          the dust method, or do not mask (none). When using
                          soft masking, search commands become case sensitive.
                                                                    [optional]
  --p-relabel TEXT        Relabel sequences using the prefix string and a
                          ticker (1, 2, 3, etc.) to construct the new headers.
                          Use --p-sizeout to conserve the abundance
                          annotations.                              [optional]
  --p-relabel-keep / --p-no-relabel-keep
                          When relabeling, keep the original identifier in
                          the header after a space.                 [optional]
  --p-relabel-md5 / --p-no-relabel-md5
                          When relabeling, use the MD5 digest of the sequence
                          as the new identifier. Use --p-sizeout to conserve
                          the abundance annotations.                [optional]
  --p-relabel-self / --p-no-relabel-self
                          Relabel sequences using the sequence itself as a
                          label.                                    [optional]
  --p-relabel-sha1 / --p-no-relabel-sha1
                          When relabeling, use the SHA1 digest of the
                          sequence as the new identifier. The probability of a
                          collision is lower than with the MD5 digest.
                                                                    [optional]
  --p-sizein / --p-no-sizein
                          In de novo mode, abundance annotations (pattern
                          `[>;]size=integer[;]`) present in sequence headers
                          are taken into account.                   [optional]
  --p-sizeout / --p-no-sizeout
                          Add abundance annotations to the output FASTA
                          files.                                    [optional]
Outputs:
  --o-oriented-seqs ARTIFACT FeatureData[Sequence]
                          Query sequences in same orientation as top matching
                          reference sequence.                       [required]
  --o-unmatched-seqs ARTIFACT FeatureData[Sequence]
                          Query sequences that fail to match at least one
                          reference sequence in either + or - orientation.
                          This will be empty if no reference is provided.
                                                                    [required]
Miscellaneous:
  --output-dir PATH       Output unspecified results to a directory
  --verbose / --quiet     Display verbose output to stdout and/or stderr
                          during execution of this action. Or silence output
                          if execution is successful (silence is golden).
  --example-data PATH     Write example data and exit.
  --citations             Show citations and exit.
  --use-cache DIRECTORY   Specify the cache to be used for the intermediate
                          work of this action. If not provided, the default
                          cache under $TMP/qiime2/ will be used.
                          IMPORTANT FOR HPC USERS: If you are on an HPC system
                          and are using parallel execution it is important to
                          set this to a location that is globally accessible
                          to all nodes in the cluster.
  --help                  Show this message and exit.
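
The relabeling and abundance options above can be illustrated with a small, standard-library-only sketch. This is not VSEARCH's implementation: the function name `relabel`, its `mode` argument, the uppercasing of sequences before hashing, the default abundance of 1, and the exact ordering of the size annotation and the kept identifier are all assumptions made for illustration.

```python
import hashlib
import re

# Abundance pattern quoted in the help text: [>;]size=integer[;]
# Headers are passed without the leading '>', so '^' stands in for it.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)")


def relabel(header, sequence, ticker, mode="prefix", prefix="seq",
            keep=False, sizeout=False):
    """Build a new header for one record (illustrative, not VSEARCH code)."""
    match = SIZE_RE.search(header)
    size = int(match.group(1)) if match else 1  # assumption: no annotation means 1

    seq = sequence.upper()  # assumption: labels are derived from uppercase sequence
    if mode == "prefix":
        label = f"{prefix}{ticker}"          # like relabel: prefix plus ticker
    elif mode == "md5":
        label = hashlib.md5(seq.encode("ascii")).hexdigest()
    elif mode == "sha1":
        label = hashlib.sha1(seq.encode("ascii")).hexdigest()
    elif mode == "self":
        label = seq                          # the sequence is its own label
    else:
        raise ValueError(f"unknown mode: {mode}")

    if sizeout:
        label = f"{label};size={size}"       # carry the abundance forward
    if keep:
        label = f"{label} {header}"          # original identifier after a space
    return label
```

For instance, `relabel("a;size=3", "acgt", 1, prefix="otu", sizeout=True)` yields `otu1;size=3`, which shows why the size annotation has to be re-added explicitly once the original header is replaced.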

Import:

from qiime2.plugins.rescript.methods import orient_seqs
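
A minimal sketch of calling the method through the Python API, assuming QIIME 2 and the RESCRIPt plugin are installed. The wrapper name `orient_against_reference` and its argument names are hypothetical; the keyword arguments passed to `orient_seqs` and the output names come from the docstring on this page.

```python
def orient_against_reference(seqs_qza, ref_qza, threads=1):
    """Orient the sequences in seqs_qza against ref_qza; return both outputs."""
    # Imported inside the function so this sketch can be loaded
    # on a machine where QIIME 2 is not installed.
    from qiime2 import Artifact
    from qiime2.plugins.rescript.methods import orient_seqs

    sequences = Artifact.load(seqs_qza)    # FeatureData[Sequence]
    reference = Artifact.load(ref_qza)     # FeatureData[Sequence]

    results = orient_seqs(
        sequences=sequences,
        reference_sequences=reference,
        threads=threads,
    )
    # Outputs are available by name on the returned results object.
    return results.oriented_seqs, results.unmatched_seqs
```

Calling `orient_against_reference("seqs.qza", "ref-seqs.qza", threads=4)` (file names are placeholders) returns the two artifacts, each of which can be written to disk with its `save` method.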

Docstring:

Orient input sequences by comparison against reference.

Orient input sequences by comparison against a set of reference sequences
using VSEARCH. This action can also be used to quickly separate sequences
that match a set of reference sequences, in either orientation, from those
that do not. Alternatively, if no reference sequences are provided as
input, all input sequences will be reverse-complemented. In this case, no
alignment is performed, and all alignment parameters (`dbmask`, `relabel`,
`relabel_keep`, `relabel_md5`, `relabel_self`, `relabel_sha1`, `sizein`,
`sizeout` and `threads`) are ignored.

Parameters
----------
sequences : FeatureData[Sequence]
    Sequences to be oriented.
reference_sequences : FeatureData[Sequence], optional
    Reference sequences to orient against. If no reference is provided, all
    the sequences will be reverse complemented and all parameters will be
    ignored.
threads : Int % Range(1, 256), optional
    Number of computation threads to use (1 to 256). The number of threads
    should be less than or equal to the number of available CPU cores.
dbmask : Str % Choices('none', 'dust', 'soft'), optional
    Mask regions in the target database sequences using the dust method, or
    do not mask (none). When using soft masking, search commands become
    case sensitive.
relabel : Str, optional
    Relabel sequences using the prefix string and a ticker (1, 2, 3, etc.)
    to construct the new headers. Set `sizeout` to conserve the abundance
    annotations.
relabel_keep : Bool, optional
    When relabeling, keep the original identifier in the header after a
    space.
relabel_md5 : Bool, optional
    When relabeling, use the MD5 digest of the sequence as the new
    identifier. Set `sizeout` to conserve the abundance annotations.
relabel_self : Bool, optional
    Relabel sequences using the sequence itself as a label.
relabel_sha1 : Bool, optional
    When relabeling, use the SHA1 digest of the sequence as the new
    identifier. The probability of a collision is lower than with the MD5
    digest.
sizein : Bool, optional
    In de novo mode, abundance annotations (pattern `[>;]size=integer[;]`)
    present in sequence headers are taken into account.
sizeout : Bool, optional
    Add abundance annotations to the output FASTA files.

Returns
-------
oriented_seqs : FeatureData[Sequence]
    Query sequences in same orientation as top matching reference sequence.
unmatched_seqs : FeatureData[Sequence]
    Query sequences that fail to match at least one reference sequence in
    either + or - orientation. This will be empty if no reference is
    provided.
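
When no reference is supplied, the docstring states that every input sequence is simply reverse-complemented. The sketch below shows what that operation means on plain strings. It is an illustration only, not the plugin's code: the function names and the dict-of-strings input are assumptions, and the translation table covers the IUPAC DNA alphabet.

```python
# IUPAC DNA codes mapped to their complements, in both cases.
_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_complement_all(records):
    """Apply reverse_complement to each sequence in an {id: sequence} dict."""
    return {seq_id: reverse_complement(seq) for seq_id, seq in records.items()}
```

Under these assumptions, `reverse_complement("AACG")` gives `"CGTT"`, and applying the function twice returns the original sequence.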