Warning
This site has been replaced by the new QIIME 2 “amplicon distribution” documentation, as of the 2025.4 release of QIIME 2. You can still access the content from the “old docs” here for the QIIME 2 2024.10 and earlier releases, but we recommend that you transition to the new documentation at https://amplicon-docs.qiime2.org. Content on this site is no longer updated and may be out of date.
Are you looking for:
the QIIME 2 homepage? That’s https://qiime2.org.
learning resources for microbiome marker gene (i.e., amplicon) analysis? See the QIIME 2 amplicon distribution documentation.
learning resources for microbiome metagenome analysis? See the MOSHPIT documentation.
installation instructions, plugins, books, videos, workshops, or resources? See the QIIME 2 Library.
general help? See the QIIME 2 Forum.
Old content beyond this point… 👴👵
orient-seqs: Orient input sequences by comparison against reference.
Docstring:
Usage: qiime rescript orient-seqs [OPTIONS]

  Orient input sequences by comparison against a set of reference sequences
  using VSEARCH. This action can also be used to quickly filter out sequences
  that do (or do not) match a set of reference sequences in either
  orientation. Alternatively, if no reference sequences are provided as input,
  all input sequences will be reverse-complemented. In this case, no
  alignment is performed, and all alignment parameters (`dbmask`, `relabel`,
  `relabel_keep`, `relabel_md5`, `relabel_self`, `relabel_sha1`, `sizein`,
  `sizeout` and `threads`) are ignored.

Inputs:
  --i-sequences ARTIFACT FeatureData[Sequence]
                         Sequences to be oriented.                 [required]
  --i-reference-sequences ARTIFACT FeatureData[Sequence]
                         Reference sequences to orient against. If no
                         reference is provided, all the sequences will be
                         reverse complemented and all parameters will be
                         ignored.                                  [optional]
Parameters:
  --p-threads INTEGER    Number of computation threads to use (1 to 256).
    Range(1, 256)        The number of threads should be less than or equal
                         to the number of available CPU cores. [default: 1]
  --p-dbmask TEXT Choices('none', 'dust', 'soft')
                         Mask regions in the target database sequences using
                         the dust method, or do not mask (none). When using
                         soft masking, search commands become case
                         sensitive.                                [optional]
  --p-relabel TEXT       Relabel sequences using the prefix string and a
                         ticker (1, 2, 3, etc.) to construct the new headers.
                         Use --p-sizeout to conserve the abundance
                         annotations.                              [optional]
  --p-relabel-keep / --p-no-relabel-keep
                         When relabeling, keep the original identifier in the
                         header after a space.                     [optional]
  --p-relabel-md5 / --p-no-relabel-md5
                         When relabeling, use the MD5 digest of the sequence
                         as the new identifier. Use --p-sizeout to conserve
                         the abundance annotations.                [optional]
  --p-relabel-self / --p-no-relabel-self
                         Relabel sequences using the sequence itself as a
                         label.                                    [optional]
  --p-relabel-sha1 / --p-no-relabel-sha1
                         When relabeling, use the SHA1 digest of the sequence
                         as the new identifier. The probability of a
                         collision is smaller than with the MD5 algorithm.
                                                                   [optional]
  --p-sizein / --p-no-sizein
                         In de novo mode, abundance annotations (pattern
                         `[>;]size=integer[;]`) present in sequence headers
                         are taken into account.                   [optional]
  --p-sizeout / --p-no-sizeout
                         Add abundance annotations to the output FASTA
                         files.                                    [optional]
Outputs:
  --o-oriented-seqs ARTIFACT FeatureData[Sequence]
                         Query sequences in the same orientation as the top
                         matching reference sequence.              [required]
  --o-unmatched-seqs ARTIFACT FeatureData[Sequence]
                         Query sequences that do not match any reference
                         sequence in either the + or - orientation. This will
                         be empty if no reference is provided.     [required]
Miscellaneous:
  --output-dir PATH      Output unspecified results to a directory
  --verbose / --quiet    Display verbose output to stdout and/or stderr
                         during execution of this action, or silence output
                         if execution is successful (silence is golden).
  --example-data PATH    Write example data and exit.
  --citations            Show citations and exit.
  --use-cache DIRECTORY  Specify the cache to be used for the intermediate
                         work of this action. If not provided, the default
                         cache under $TMP/qiime2/ will be used.
                         IMPORTANT FOR HPC USERS: If you are on an HPC system
                         and are using parallel execution, it is important to
                         set this to a location that is globally accessible
                         to all nodes in the cluster.
  --help                 Show this message and exit.
Import:
from qiime2.plugins.rescript.methods import orient_seqs
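The command-line help and the Python docstring describe the same action, and comparing them shows a consistent naming pattern: `--i-sequences` becomes the input `sequences`, `--p-relabel-md5` becomes the parameter `relabel_md5`, and `--o-oriented-seqs` becomes the output `oriented_seqs`. The helper below, `cli_option_to_python`, is hypothetical (it is not part of QIIME 2) and simply spells out that mapping as it appears on this page, including reading a `--p-no-X` flag as `X=False`.

```python
from typing import Tuple

# q2cli option prefixes seen on this page and the role each one marks.
_PREFIX_ROLES = {"--i-": "input", "--p-": "parameter", "--o-": "output"}


def cli_option_to_python(option: str) -> Tuple[str, str, bool]:
    """Map e.g. '--p-no-relabel-md5' to ('parameter', 'relabel_md5', True).

    The last element is True when the option is the negated form of a boolean
    parameter ('--p-no-X' means X=False). Hypothetical helper, not part of QIIME 2;
    it would misread a parameter whose real name starts with 'no-'.
    """
    for prefix, role in _PREFIX_ROLES.items():
        if option.startswith(prefix):
            name = option[len(prefix):]
            negated = role == "parameter" and name.startswith("no-")
            if negated:
                name = name[len("no-"):]
            return role, name.replace("-", "_"), negated
    raise ValueError(f"not an input, parameter, or output option: {option!r}")
```

With that mapping, `qiime rescript orient-seqs --i-sequences ... --p-threads 4` corresponds to calling `orient_seqs(sequences=..., threads=4)`, and the two outputs are available as `oriented_seqs` and `unmatched_seqs` on the returned results.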
Docstring:
Orient input sequences by comparison against reference.

Orient input sequences by comparison against a set of reference sequences
using VSEARCH. This action can also be used to quickly filter out sequences
that do (or do not) match a set of reference sequences in either orientation.
Alternatively, if no reference sequences are provided as input, all input
sequences will be reverse-complemented. In this case, no alignment is
performed, and all alignment parameters (`dbmask`, `relabel`, `relabel_keep`,
`relabel_md5`, `relabel_self`, `relabel_sha1`, `sizein`, `sizeout` and
`threads`) are ignored.

Parameters
----------
sequences : FeatureData[Sequence]
    Sequences to be oriented.
reference_sequences : FeatureData[Sequence], optional
    Reference sequences to orient against. If no reference is provided, all
    the sequences will be reverse complemented and all parameters will be
    ignored.
threads : Int % Range(1, 256), optional
    Number of computation threads to use (1 to 256). The number of threads
    should be less than or equal to the number of available CPU cores.
dbmask : Str % Choices('none', 'dust', 'soft'), optional
    Mask regions in the target database sequences using the dust method, or
    do not mask (none). When using soft masking, search commands become case
    sensitive.
relabel : Str, optional
    Relabel sequences using the prefix string and a ticker (1, 2, 3, etc.)
    to construct the new headers. Use `sizeout` to conserve the abundance
    annotations.
relabel_keep : Bool, optional
    When relabeling, keep the original identifier in the header after a
    space.
relabel_md5 : Bool, optional
    When relabeling, use the MD5 digest of the sequence as the new
    identifier. Use `sizeout` to conserve the abundance annotations.
relabel_self : Bool, optional
    Relabel sequences using the sequence itself as a label.
relabel_sha1 : Bool, optional
    When relabeling, use the SHA1 digest of the sequence as the new
    identifier. The probability of a collision is smaller than with the MD5
    algorithm.
sizein : Bool, optional
    In de novo mode, abundance annotations (pattern `[>;]size=integer[;]`)
    present in sequence headers are taken into account.
sizeout : Bool, optional
    Add abundance annotations to the output FASTA files.

Returns
-------
oriented_seqs : FeatureData[Sequence]
    Query sequences in the same orientation as the top matching reference
    sequence.
unmatched_seqs : FeatureData[Sequence]
    Query sequences that do not match any reference sequence in either the +
    or - orientation. This will be empty if no reference is provided.
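The relabeling and abundance options are easiest to see on a couple of records. The sketch below is a hypothetical, pure-Python model of what the docstring describes (prefix plus ticker, MD5 or SHA1 digest, the sequence itself, keeping the original identifier after a space, and the `size=` annotation); it does not call VSEARCH. The names `parse_size` and `relabel`, the exact header layout it produces (where `;size=` and the kept identifier go), and writing a missing abundance as `size=1` are all assumptions.

```python
import hashlib
import re
from typing import Iterable, List, Optional, Tuple

# Abundance annotation per the docstring pattern [>;]size=integer[;]; headers
# here are passed without the leading '>', so '^' stands in for '>'.
_SIZE_RE = re.compile(r"(?:^|;)size=(\d+);?")


def parse_size(header: str) -> Optional[int]:
    """Return the abundance in a header such as 'seq1;size=12;', or None."""
    match = _SIZE_RE.search(header)
    return int(match.group(1)) if match else None


def relabel(
    records: Iterable[Tuple[str, str]],
    prefix: Optional[str] = None,
    md5: bool = False,
    sha1: bool = False,
    self_label: bool = False,
    keep: bool = False,
    sizeout: bool = False,
) -> List[Tuple[str, str]]:
    """Return (new_header, sequence) pairs; a hypothetical model of the relabel options."""
    if sum([prefix is not None, md5, sha1, self_label]) > 1:
        raise ValueError("choose at most one relabeling mode")
    relabeled = []
    for ticker, (header, seq) in enumerate(records, start=1):
        # Original identifier: first whitespace-delimited token, size annotation removed.
        original_id = _SIZE_RE.sub("", header.split()[0])
        if md5:
            label = hashlib.md5(seq.encode("ascii")).hexdigest()
        elif sha1:
            label = hashlib.sha1(seq.encode("ascii")).hexdigest()
        elif self_label:
            label = seq
        elif prefix is not None:
            label = f"{prefix}{ticker}"  # prefix + ticker (1, 2, 3, ...)
        else:
            label = original_id
        if sizeout:
            size = parse_size(header)
            # Assumption: a missing abundance is written as size=1.
            label += f";size={size if size is not None else 1}"
        if keep:
            label += f" {original_id}"  # original identifier after a space
        relabeled.append((label, seq))
    return relabeled
```

Because the sketch rejects combined modes, it avoids guessing at a precedence order among `relabel`, `relabel_md5`, `relabel_sha1` and `relabel_self`; the VSEARCH manual is the place to check how the real tool resolves such combinations.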