extract-reads: Extract reads from reference sequences.

Docstring:

Usage: qiime feature-classifier extract-reads [OPTIONS]

  Extract simulated amplicon reads from a reference database. Performs
  in-silico PCR to extract simulated amplicons from reference sequences that
  match the input primer sequences (within the mismatch threshold specified by
  `identity`). Both primer sequences must be in the 5' -> 3' orientation.
  Sequences that fail to match both primers will be excluded. Reads are
  extracted, trimmed, and filtered in the following order:

    1. reads are extracted in the specified orientation;
    2. primers are removed;
    3. reads longer than `max_length` are removed;
    4. reads are trimmed with `trim_right`;
    5. reads are truncated to `trunc_len`;
    6. reads are trimmed with `trim_left`;
    7. reads shorter than `min_length` are removed.

Inputs:
  --i-sequences ARTIFACT FeatureData[Sequence]
                                                                    [required]
Parameters:
  --p-f-primer TEXT       forward primer sequence (5' -> 3').       [required]
  --p-r-primer TEXT       reverse primer sequence (5' -> 3'). Do not supply
                          the reverse-complemented sequence.        [required]
  --p-trim-right INTEGER  trim-right nucleotides are removed from the 3' end
                          if trim-right is positive. Applied before trunc-len
                          and trim-left.                          [default: 0]
  --p-trunc-len INTEGER   read is cut to trunc-len if trunc-len is positive.
                          Applied after trim-right but before trim-left.
                                                                  [default: 0]
  --p-trim-left INTEGER   trim-left nucleotides are removed from the 5' end
                          if trim-left is positive. Applied after trim-right
                          and trunc-len.                          [default: 0]
  --p-identity NUMBER     minimum combined primer match identity threshold.
                                                                [default: 0.8]
  --p-min-length INTEGER  Minimum amplicon length. Shorter amplicons are
    Range(0, None)        discarded. Applied after trimming and truncation, so
                          be aware that trimming may impact sequence
                          retention. Set to zero to disable min length
                          filtering.                             [default: 50]
  --p-max-length INTEGER  Maximum amplicon length. Longer amplicons are
    Range(0, None)        discarded. Applied before trimming and truncation,
                          so plan accordingly. Set to zero (default) to
                          disable max length filtering.           [default: 0]
  --p-n-jobs INTEGER      Number of separate processes to run.
    Range(1, None)                                                [default: 1]
  --p-batch-size VALUE Int % Range(1, None) | Str % Choices('auto')
                          Number of sequences to process in a batch. The
                          `auto` option is calculated from the number of
                          sequences and number of jobs specified.
                                                             [default: 'auto']
  --p-read-orientation TEXT Choices('both', 'forward', 'reverse')
                          Orientation of primers relative to the sequences:
                          "forward" searches for primer hits in the forward
                          direction, "reverse" searches reverse-complement,
                          and "both" searches both directions.
                                                             [default: 'both']
Outputs:
  --o-reads ARTIFACT FeatureData[Sequence]
                                                                    [required]
Miscellaneous:
  --output-dir PATH       Output unspecified results to a directory
  --verbose / --quiet     Display verbose output to stdout and/or stderr
                          during execution of this action. Or silence output
                          if execution is successful (silence is golden).
  --example-data PATH     Write example data and exit.
  --citations             Show citations and exit.
  --use-cache DIRECTORY   Specify the cache to be used for the intermediate
                          work of this action. If not provided, the default
                          cache under $TMP/qiime2/ will be used.
                          IMPORTANT FOR HPC USERS: If you are on an HPC system
                          and are using parallel execution it is important to
                          set this to a location that is globally accessible
                          to all nodes in the cluster.
  --help                  Show this message and exit.
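
The order of the trimming and filtering steps matters: `max-length` is checked before any trimming, while `min-length` is checked after all of it. The following is a minimal sketch of steps 3 to 7 as described in the usage text, written in plain Python. The function name `trim_and_filter` is hypothetical and this is not the plugin's own code; it only illustrates the documented order on a read whose primers have already been removed.

```python
from typing import Optional


def trim_and_filter(
    read: str,
    trim_right: int = 0,
    trunc_len: int = 0,
    trim_left: int = 0,
    min_length: int = 50,
    max_length: int = 0,
) -> Optional[str]:
    """Hypothetical helper mirroring documented steps 3-7 (illustration only)."""
    # Step 3: max-length filter is applied to the untrimmed read (0 disables it).
    if max_length > 0 and len(read) > max_length:
        return None
    # Step 4: remove trim_right nucleotides from the 3' end.
    if trim_right > 0:
        read = read[:-trim_right]
    # Step 5: cut the read down to trunc_len nucleotides.
    if trunc_len > 0:
        read = read[:trunc_len]
    # Step 6: remove trim_left nucleotides from the 5' end.
    if trim_left > 0:
        read = read[trim_left:]
    # Step 7: min-length filter is applied to the trimmed read (0 disables it).
    if min_length > 0 and len(read) < min_length:
        return None
    return read


# A 10-nt read: 3' trim of 2, truncate to 6, then 5' trim of 1.
print(trim_and_filter("ACGTACGTAC", trim_right=2, trunc_len=6,
                      trim_left=1, min_length=0))  # CGTAC
```

With the default `min_length` of 50, the same 10-nt read would be discarded, which is why trimming settings and `min-length` should be chosen together.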

Import:

from qiime2.plugins.feature_classifier.methods import extract_reads

Docstring:

Extract reads from reference sequences.

Extract simulated amplicon reads from a reference database. Performs
in-silico PCR to extract simulated amplicons from reference sequences that
match the input primer sequences (within the mismatch threshold specified
by `identity`). Both primer sequences must be in the 5' -> 3' orientation.
Sequences that fail to match both primers will be excluded. Reads are
extracted, trimmed, and filtered in the following order:

  1. reads are extracted in the specified orientation;
  2. primers are removed;
  3. reads longer than `max_length` are removed;
  4. reads are trimmed with `trim_right`;
  5. reads are truncated to `trunc_len`;
  6. reads are trimmed with `trim_left`;
  7. reads shorter than `min_length` are removed.

Parameters
----------
sequences : FeatureData[Sequence]
f_primer : Str
    forward primer sequence (5' -> 3').
r_primer : Str
    reverse primer sequence (5' -> 3'). Do not supply the
    reverse-complemented sequence.
trim_right : Int, optional
    trim_right nucleotides are removed from the 3' end if trim_right is
    positive. Applied before trunc_len and trim_left.
trunc_len : Int, optional
    read is cut to trunc_len if trunc_len is positive. Applied after
    trim_right but before trim_left.
trim_left : Int, optional
    trim_left nucleotides are removed from the 5' end if trim_left is
    positive. Applied after trim_right and trunc_len.
identity : Float, optional
    minimum combined primer match identity threshold.
min_length : Int % Range(0, None), optional
    Minimum amplicon length. Shorter amplicons are discarded. Applied after
    trimming and truncation, so be aware that trimming may impact sequence
    retention. Set to zero to disable min length filtering.
max_length : Int % Range(0, None), optional
    Maximum amplicon length. Longer amplicons are discarded. Applied before
    trimming and truncation, so plan accordingly. Set to zero (default) to
    disable max length filtering.
n_jobs : Int % Range(1, None), optional
    Number of separate processes to run.
batch_size : Int % Range(1, None) | Str % Choices('auto'), optional
    Number of sequences to process in a batch. The `auto` option is
    calculated from the number of sequences and number of jobs specified.
read_orientation : Str % Choices('both', 'forward', 'reverse'), optional
    Orientation of primers relative to the sequences: "forward" searches
    for primer hits in the forward direction, "reverse" searches reverse-
    complement, and "both" searches both directions.

Returns
-------
reads : FeatureData[Sequence]
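
The requirement that both primers be given 5' -> 3' can be illustrated with a small in-silico PCR sketch. Note that this is a simplification, not the plugin's method: it uses exact matching of IUPAC degenerate codes through regular expressions rather than an identity threshold, and the helper names (`reverse_complement`, `primer_regex`, `extract_amplicon`) are hypothetical. The primers are the widely used 515F/806R pair, and the template is made up for the example. The reverse primer is reverse-complemented internally, which is why a primer that has already been reverse-complemented finds no match.

```python
import re
from typing import Optional

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement table, including degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an IUPAC nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer: str) -> str:
    """Turn a (possibly degenerate) primer into a regex of character classes."""
    return "".join("[" + IUPAC[base] + "]" for base in primer)


def extract_amplicon(template: str, f_primer: str, r_primer: str) -> Optional[str]:
    """Return the sequence between the primer sites, primers removed.

    Both primers are expected 5' -> 3'; the reverse primer site on the
    forward strand is the reverse complement of r_primer.
    """
    pattern = re.compile(
        primer_regex(f_primer)
        + "(.*?)"
        + primer_regex(reverse_complement(r_primer))
    )
    match = pattern.search(template)
    return match.group(1) if match else None


f_primer = "GTGCCAGCMGCCGCGGTAA"   # 515F, 5' -> 3'
r_primer = "GGACTACHVGGGTWTCTAAT"  # 806R, 5' -> 3'
# Made-up template: flank + forward site + insert + reverse site + flank.
template = "TT" + "GTGCCAGCAGCCGCGGTAA" + "ACGTACGT" + "ATTAGATACCCTAGTAGTCC" + "GG"

print(extract_amplicon(template, f_primer, r_primer))  # ACGTACGT
# Passing an already reverse-complemented reverse primer finds nothing.
print(extract_amplicon(template, f_primer, reverse_complement(r_primer)))  # None
```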