Docstring:
Usage: qiime rescript orient-seqs [OPTIONS]
Orient input sequences by comparison against a set of reference sequences
using VSEARCH. This action can also be used to quickly filter out sequences
that (do not) match a set of reference sequences in either orientation.
Alternatively, if no reference sequences are provided as input, all input
sequences will be reverse-complemented. In this case, no alignment is
performed, and all alignment parameters (`dbmask`, `relabel`,
`relabel_keep`, `relabel_md5`, `relabel_self`, `relabel_sha1`, `sizein`,
`sizeout` and `threads`) are ignored.
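The no-reference mode described above amounts to reverse-complementing every input sequence. The following is a minimal sketch of that operation in plain Python, not the plugin's own implementation; the helper name `reverse_complement` and the choice to leave unknown characters unchanged are assumptions made for illustration.

```python
# Translation table for DNA bases, including the ambiguity code N.
# Characters missing from the table (an assumption of this sketch)
# are passed through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # Complement each base, then reverse the whole string.
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGCCG"))  # prints CGGCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check when orientation results look suspicious.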
Inputs:
--i-sequences ARTIFACT FeatureData[Sequence]
Sequences to be oriented. [required]
--i-reference-sequences ARTIFACT FeatureData[Sequence]
Reference sequences to orient against. If no
reference is provided, all the sequences will be
reverse complemented and all parameters will be
ignored. [optional]
Parameters:
--p-threads INTEGER Number of computation threads to use (1 to 256).
Range(1, 256) The number of threads should be less than or equal to
the number of available CPU cores. [default: 1]
--p-dbmask TEXT Choices('none', 'dust', 'soft')
Mask regions in the target database sequences using
the dust method, or do not mask (none). When using
soft masking, search commands become case sensitive.
[optional]
--p-relabel TEXT Relabel sequences using the prefix string and a
ticker (1, 2, 3, etc.) to construct the new headers.
Use --p-sizeout to conserve the abundance annotations.
[optional]
--p-relabel-keep / --p-no-relabel-keep
When relabeling, keep the original identifier in
the header after a space. [optional]
--p-relabel-md5 / --p-no-relabel-md5
When relabeling, use the MD5 digest of the sequence
as the new identifier. Use --p-sizeout to conserve the
abundance annotations. [optional]
--p-relabel-self / --p-no-relabel-self
Relabel sequences using the sequence itself as a
label. [optional]
--p-relabel-sha1 / --p-no-relabel-sha1
When relabeling, use the SHA1 digest of the
sequence as the new identifier. The probability of a
collision is smaller than with the MD5 algorithm.
[optional]
--p-sizein / --p-no-sizein
In de novo mode, abundance annotations (pattern
`[>;]size=integer[;]`) present in sequence headers
are taken into account. [optional]
--p-sizeout / --p-no-sizeout
Add abundance annotations to the output FASTA
files. [optional]
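The relabeling options above can be pictured with a short sketch. It is an illustration only, not VSEARCH's code: the function `relabel`, its arguments, and the exact header layout are hypothetical, and the digest is computed over the sequence string exactly as given. MD5 digests are 128 bits (32 hex characters) and SHA1 digests are 160 bits (40 hex characters), which is why SHA1 identifiers are less likely to collide.

```python
import hashlib


def relabel(records, prefix=None, digest=None, keep=False):
    """Yield (new_id, sequence) pairs for (old_id, sequence) records.

    prefix: string joined with a 1-based ticker (prefix1, prefix2, ...)
    digest: "md5" or "sha1" to use the sequence digest as the identifier
    keep:   append the original identifier after a space
    All three arguments are hypothetical stand-ins for the CLI options.
    """
    for ticker, (old_id, seq) in enumerate(records, start=1):
        if digest is not None:
            # Hash the sequence text as given (no case normalization).
            new_id = hashlib.new(digest, seq.encode("ascii")).hexdigest()
        elif prefix is not None:
            new_id = f"{prefix}{ticker}"
        else:
            new_id = old_id
        if keep and new_id != old_id:
            new_id = f"{new_id} {old_id}"
        yield new_id, seq


records = [("a", "ACGT"), ("b", "TTGA")]
print(list(relabel(records, prefix="seq", keep=True)))
```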
Outputs:
--o-oriented-seqs ARTIFACT FeatureData[Sequence]
Query sequences in same orientation as top matching
reference sequence. [required]
--o-unmatched-seqs ARTIFACT FeatureData[Sequence]
Query sequences that do not match any reference
sequence in either + or - orientation.
This will be empty if no reference is provided.
[required]
Miscellaneous:
--output-dir PATH Output unspecified results to a directory
--verbose / --quiet Display verbose output to stdout and/or stderr
during execution of this action. Or silence output
if execution is successful (silence is golden).
--example-data PATH Write example data and exit.
--citations Show citations and exit.
--use-cache DIRECTORY Specify the cache to be used for the intermediate
work of this action. If not provided, the default
cache under $TMP/qiime2/ will be used.
IMPORTANT FOR HPC USERS: If you are on an HPC system
and are using parallel execution it is important to
set this to a location that is globally accessible
to all nodes in the cluster.
--help Show this message and exit.
Import:
from qiime2.plugins.rescript.methods import orient_seqs
Docstring:
Orient input sequences by comparison against reference.
Orient input sequences by comparison against a set of reference sequences
using VSEARCH. This action can also be used to quickly filter out sequences
that (do not) match a set of reference sequences in either orientation.
Alternatively, if no reference sequences are provided as input, all input
sequences will be reverse-complemented. In this case, no alignment is
performed, and all alignment parameters (`dbmask`, `relabel`,
`relabel_keep`, `relabel_md5`, `relabel_self`, `relabel_sha1`, `sizein`,
`sizeout` and `threads`) are ignored.
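To make the orient-or-reject logic concrete, here is a toy sketch. It is explicitly not how VSEARCH works: exact shared k-mer counts stand in for alignment, and the names `orient`, `k`, and `min_shared` are hypothetical. It only shows the decision structure: keep the query as is, reverse-complement it, or report it as unmatched.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(queries, references, k=8, min_shared=1):
    """Split queries into (oriented, unmatched) dicts keyed by sequence ID."""
    ref_kmers = set()
    for ref in references.values():
        ref_kmers |= kmers(ref.upper(), k)
    oriented, unmatched = {}, {}
    for seq_id, seq in queries.items():
        seq = seq.upper()
        # Count k-mers shared with the references in each orientation.
        plus = len(kmers(seq, k) & ref_kmers)
        minus = len(kmers(revcomp(seq), k) & ref_kmers)
        if max(plus, minus) < min_shared:
            unmatched[seq_id] = seq           # no hit in either orientation
        elif plus >= minus:
            oriented[seq_id] = seq            # already in reference orientation
        else:
            oriented[seq_id] = revcomp(seq)   # flip to match the reference
    return oriented, unmatched
```

The two returned dictionaries correspond loosely to the `oriented_seqs` and `unmatched_seqs` outputs, which is also why the action can double as a quick filter.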
Parameters
----------
sequences : FeatureData[Sequence]
Sequences to be oriented.
reference_sequences : FeatureData[Sequence], optional
Reference sequences to orient against. If no reference is provided, all
the sequences will be reverse complemented and all parameters will be
ignored.
threads : Int % Range(1, 256), optional
Number of computation threads to use (1 to 256). The number of threads
should be less than or equal to the number of available CPU cores.
dbmask : Str % Choices('none', 'dust', 'soft'), optional
Mask regions in the target database sequences using the dust method, or
do not mask (none). When using soft masking, search commands become
case sensitive.
relabel : Str, optional
Relabel sequences using the prefix string and a ticker (1, 2, 3, etc.)
to construct the new headers. Use `sizeout` to conserve the abundance
annotations.
relabel_keep : Bool, optional
When relabeling, keep the original identifier in the header after a
space.
relabel_md5 : Bool, optional
When relabeling, use the MD5 digest of the sequence as the new
identifier. Use `sizeout` to conserve the abundance annotations.
relabel_self : Bool, optional
Relabel sequences using the sequence itself as a label.
relabel_sha1 : Bool, optional
When relabeling, use the SHA1 digest of the sequence as the new
identifier. The probability of a collision is smaller than with the MD5
algorithm.
sizein : Bool, optional
In de novo mode, abundance annotations (pattern `[>;]size=integer[;]`)
present in sequence headers are taken into account.
sizeout : Bool, optional
Add abundance annotations to the output FASTA files.
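The abundance annotation pattern `[>;]size=integer[;]` used by `sizein` and `sizeout` can be read and written with a small regular expression. This is a sketch under assumptions: the helper names `get_size` and `set_size` are hypothetical, a header without an annotation is treated as abundance 1, and the annotation is always rewritten at the end of the header.

```python
import re

# "size=" must start the header or follow ">" or ";", and the integer
# must be followed by ";" or the end of the header.
_SIZE = re.compile(r"(?:^|[>;])size=(\d+)(?:;|$)")


def get_size(header: str, default: int = 1) -> int:
    """Return the abundance stored in a header, or default if absent."""
    match = _SIZE.search(header)
    return int(match.group(1)) if match else default


def set_size(header: str, size: int) -> str:
    """Return header with any existing size annotation replaced."""
    parts = [p for p in header.split(";") if p and not p.startswith("size=")]
    return ";".join(parts) + f";size={size}"


print(get_size("seq1;size=42;"))  # prints 42
print(set_size("seq1;size=42;", 5))  # prints seq1;size=5
```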
Returns
-------
oriented_seqs : FeatureData[Sequence]
Query sequences in same orientation as top matching reference sequence.
unmatched_seqs : FeatureData[Sequence]
Query sequences that do not match any reference sequence in either + or -
orientation. This will be empty if no reference is provided.
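A possible way to call the action from the Python API is sketched below. The wrapper `run_orient_seqs` and its argument names are hypothetical; the import path, parameter names, and output names come from the docstring above. The imports are deferred into the function so the snippet can be loaded on a machine without QIIME 2 and RESCRIPt installed, but calling it requires both.

```python
def run_orient_seqs(query_path, reference_path=None, threads=1):
    """Orient sequences in a .qza file and return both output artifacts."""
    # Deferred imports: QIIME 2 and RESCRIPt are only needed when called.
    from qiime2 import Artifact
    from qiime2.plugins.rescript.methods import orient_seqs

    kwargs = {"sequences": Artifact.load(query_path)}
    if reference_path is not None:
        # Alignment parameters only matter when a reference is supplied.
        kwargs["reference_sequences"] = Artifact.load(reference_path)
        kwargs["threads"] = threads
    results = orient_seqs(**kwargs)
    return results.oriented_seqs, results.unmatched_seqs
```

Omitting `reference_path` corresponds to the no-reference mode, in which every input sequence is reverse-complemented and `unmatched_seqs` is empty.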