extract-reads: Extract reads from reference sequences.
Docstring:
Usage: qiime feature-classifier extract-reads [OPTIONS]

  Extract simulated amplicon reads from a reference database. Performs
  in-silico PCR to extract simulated amplicons from reference sequences that
  match the input primer sequences (within the mismatch threshold specified
  by `identity`). Both primer sequences must be in the 5' -> 3' orientation.
  Sequences that fail to match both primers will be excluded.

  Reads are extracted, trimmed, and filtered in the following order:

    1. reads are extracted in specified orientation;
    2. primers are removed;
    3. reads longer than `max_length` are removed;
    4. reads are trimmed with `trim_right`;
    5. reads are truncated to `trunc_len`;
    6. reads are trimmed with `trim_left`;
    7. reads shorter than `min_length` are removed.

Inputs:
  --i-sequences ARTIFACT FeatureData[Sequence]
      [required]

Parameters:
  --p-f-primer TEXT
      forward primer sequence (5' -> 3'). [required]
  --p-r-primer TEXT
      reverse primer sequence (5' -> 3'). Do not use reverse-complemented
      primer sequence. [required]
  --p-trim-right INTEGER
      trim-right nucleotides are removed from the 3' end if trim-right is
      positive. Applied before trunc-len and trim-left. [default: 0]
  --p-trunc-len INTEGER
      read is cut to trunc-len if trunc-len is positive. Applied after
      trim-right but before trim-left. [default: 0]
  --p-trim-left INTEGER
      trim-left nucleotides are removed from the 5' end if trim-left is
      positive. Applied after trim-right and trunc-len. [default: 0]
  --p-identity NUMBER
      minimum combined primer match identity threshold. [default: 0.8]
  --p-min-length INTEGER Range(0, None)
      Minimum amplicon length. Shorter amplicons are discarded. Applied
      after trimming and truncation, so be aware that trimming may impact
      sequence retention. Set to zero to disable min length filtering.
      [default: 50]
  --p-max-length INTEGER Range(0, None)
      Maximum amplicon length. Longer amplicons are discarded. Applied
      before trimming and truncation, so plan accordingly. Set to zero
      (default) to disable max length filtering. [default: 0]
  --p-n-jobs INTEGER Range(1, None)
      Number of separate processes to run. [default: 1]
  --p-batch-size VALUE Int % Range(1, None) | Str % Choices('auto')
      Number of sequences to process in a batch. The `auto` option is
      calculated from the number of sequences and number of jobs specified.
      [default: 'auto']
  --p-read-orientation TEXT Choices('both', 'forward', 'reverse')
      Orientation of primers relative to the sequences: "forward" searches
      for primer hits in the forward direction, "reverse" searches
      reverse-complement, and "both" searches both directions.
      [default: 'both']

Outputs:
  --o-reads ARTIFACT FeatureData[Sequence]
      [required]

Miscellaneous:
  --output-dir PATH
      Output unspecified results to a directory
  --verbose / --quiet
      Display verbose output to stdout and/or stderr during execution of
      this action. Or silence output if execution is successful (silence is
      golden).
  --example-data PATH
      Write example data and exit.
  --citations
      Show citations and exit.
  --use-cache DIRECTORY
      Specify the cache to be used for the intermediate work of this action.
      If not provided, the default cache under $TMP/qiime2/ will be used.
      IMPORTANT FOR HPC USERS: If you are on an HPC system and are using
      parallel execution it is important to set this to a location that is
      globally accessible to all nodes in the cluster.
  --help
      Show this message and exit.
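The order of the trimming and filtering steps matters, because `max_length` is checked before any trimming while `min_length` is checked after all of it. The sketch below is a plain-Python illustration of steps 3 to 7 applied to a single amplicon whose primers have already been removed. It is not the plugin's implementation: the function name `trim_and_filter` is hypothetical, and the behavior is inferred only from the parameter descriptions above.

```python
from typing import Optional


def trim_and_filter(amplicon: str,
                    trim_right: int = 0,
                    trunc_len: int = 0,
                    trim_left: int = 0,
                    min_length: int = 50,
                    max_length: int = 0) -> Optional[str]:
    """Hypothetical helper: apply steps 3-7 to one primer-free amplicon.

    Returns the processed read, or None if a length filter discards it.
    """
    # Step 3: max_length is checked before any trimming (0 disables it).
    if max_length > 0 and len(amplicon) > max_length:
        return None
    # Step 4: remove trim_right nucleotides from the 3' end.
    if trim_right > 0:
        amplicon = amplicon[:-trim_right]
    # Step 5: cut the read to trunc_len nucleotides.
    if trunc_len > 0:
        amplicon = amplicon[:trunc_len]
    # Step 6: remove trim_left nucleotides from the 5' end.
    if trim_left > 0:
        amplicon = amplicon[trim_left:]
    # Step 7: min_length is checked after all trimming (0 disables it).
    if min_length > 0 and len(amplicon) < min_length:
        return None
    return amplicon


# A made-up 100 nt amplicon used only for illustration.
example = "A" * 10 + "C" * 80 + "G" * 10
kept = trim_and_filter(example, trim_right=10, trunc_len=60, trim_left=10)
dropped = trim_and_filter(example, trim_right=10, trunc_len=60, trim_left=11)
```

In this sketch the first call keeps a 50 nt read, while the second call trims one more nucleotide from the 5' end and the read falls below the default `min_length` of 50, so it is discarded. This is the interaction the `min_length` description warns about.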
Import:
from qiime2.plugins.feature_classifier.methods import extract_reads
Docstring:
Extract reads from reference sequences.

Extract simulated amplicon reads from a reference database. Performs
in-silico PCR to extract simulated amplicons from reference sequences that
match the input primer sequences (within the mismatch threshold specified by
`identity`). Both primer sequences must be in the 5' -> 3' orientation.
Sequences that fail to match both primers will be excluded.

Reads are extracted, trimmed, and filtered in the following order:

1. reads are extracted in specified orientation;
2. primers are removed;
3. reads longer than `max_length` are removed;
4. reads are trimmed with `trim_right`;
5. reads are truncated to `trunc_len`;
6. reads are trimmed with `trim_left`;
7. reads shorter than `min_length` are removed.

Parameters
----------
sequences : FeatureData[Sequence]
f_primer : Str
    forward primer sequence (5' -> 3').
r_primer : Str
    reverse primer sequence (5' -> 3'). Do not use reverse-complemented
    primer sequence.
trim_right : Int, optional
    trim_right nucleotides are removed from the 3' end if trim_right is
    positive. Applied before trunc_len and trim_left.
trunc_len : Int, optional
    read is cut to trunc_len if trunc_len is positive. Applied after
    trim_right but before trim_left.
trim_left : Int, optional
    trim_left nucleotides are removed from the 5' end if trim_left is
    positive. Applied after trim_right and trunc_len.
identity : Float, optional
    minimum combined primer match identity threshold.
min_length : Int % Range(0, None), optional
    Minimum amplicon length. Shorter amplicons are discarded. Applied after
    trimming and truncation, so be aware that trimming may impact sequence
    retention. Set to zero to disable min length filtering.
max_length : Int % Range(0, None), optional
    Maximum amplicon length. Longer amplicons are discarded. Applied before
    trimming and truncation, so plan accordingly. Set to zero (default) to
    disable max length filtering.
n_jobs : Int % Range(1, None), optional
    Number of separate processes to run.
batch_size : Int % Range(1, None) | Str % Choices('auto'), optional
    Number of sequences to process in a batch. The `auto` option is
    calculated from the number of sequences and number of jobs specified.
read_orientation : Str % Choices('both', 'forward', 'reverse'), optional
    Orientation of primers relative to the sequences: "forward" searches for
    primer hits in the forward direction, "reverse" searches
    reverse-complement, and "both" searches both directions.

Returns
-------
reads : FeatureData[Sequence]
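The Python parameter names and the command-line flags describe the same action: each parameter `name` appears on the command line as `--p-name` with underscores replaced by hyphens, the input as `--i-sequences`, and the output as `--o-reads`. The sketch below shows that correspondence by building an argument list from Python-style keyword arguments. The helper `to_cli_args`, the file names, and the primer strings are illustrative assumptions and not part of the plugin.

```python
from typing import List


def to_cli_args(sequences_path: str, reads_path: str, **params) -> List[str]:
    """Hypothetical helper: map Python-style parameters to CLI flags."""
    args = ["qiime", "feature-classifier", "extract-reads",
            "--i-sequences", sequences_path]
    for name, value in params.items():
        # e.g. trunc_len -> --p-trunc-len
        args += ["--p-" + name.replace("_", "-"), str(value)]
    args += ["--o-reads", reads_path]
    return args


# File names and primer sequences below are placeholders for illustration.
cli = to_cli_args(
    "ref-seqs.qza",
    "ref-reads.qza",
    f_primer="GTGCCAGCMGCCGCGGTAA",
    r_primer="GGACTACHVGGGTWTCTAAT",
    trunc_len=120,
)
print(" ".join(cli))
```

Both primers are passed in the 5' -> 3' orientation, as the parameter descriptions require. The resulting list could be handed to `subprocess.run` in an environment where QIIME 2 is installed.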