exclude-seqs: Exclude sequences by alignment

Citations
  • Christiam Camacho, George Coulouris, Vahram Avagyan, Ning Ma, Jason Papadopoulos, Kevin Bealer, and Thomas L. Madden. BLAST+: architecture and applications. BMC Bioinformatics, 10(1):421, 2009. doi:10.1186/1471-2105-10-421.

Docstring:

Usage: qiime quality-control exclude-seqs [OPTIONS]

  This method aligns feature sequences to a set of reference sequences to
  identify sequences that hit/miss the reference within a specified
  perc_identity, evalue, and perc_query_aligned. This method could be used to
  define a positive filter, e.g., extract only feature sequences that align to
  a certain clade of bacteria; or to define a negative filter, e.g., identify
  sequences that align to contaminant or human DNA sequences that should be
  excluded from subsequent analyses. Note that filtering is performed based on
  the perc_identity, perc_query_aligned, and evalue thresholds (the latter
  only if method==BLAST and an evalue is set). Set perc_identity==0 and/or
  perc_query_aligned==0 to disable these filtering thresholds as necessary.

Inputs:
  --i-query-sequences ARTIFACT FeatureData[Sequence]
                          Sequences to test for exclusion           [required]
  --i-reference-sequences ARTIFACT FeatureData[Sequence]
                          Reference sequences to align against feature
                          sequences                                 [required]
Parameters:
  --p-method VALUE Str % Choices('blast', 'blastn-short')¹ | Str %
    Choices('vsearch')²   Alignment method to use for matching feature
                          sequences against reference sequences
                                                            [default: 'blast']
  --p-perc-identity PROPORTION Range(0.0, 1.0, inclusive_end=True)
                          Reject match if percent identity to reference is
                          lower. Must be in range [0.0, 1.0]   [default: 0.97]
  --p-evalue NUMBER       BLAST expectation (E) value threshold for saving
                          hits. Reject if E value is higher than threshold.
                          This threshold is disabled by default.    [optional]
  --p-perc-query-aligned NUMBER
                          Proportion of the query sequence that must align to
                          the reference for the match to be accepted as a
                          hit, expressed as a fraction (e.g., 0.97 for 97%).
                                                               [default: 0.97]
  --p-threads NTHREADS    Number of threads to use. Only applies to vsearch
                          method.                                 [default: 1]
  --p-left-justify VALUE Bool % Choices(False)¹ | Bool²
                          Reject match if the pairwise alignment begins with
                          gaps                                [default: False]
Outputs:
  --o-sequence-hits ARTIFACT FeatureData[Sequence]
                          Subset of feature sequences that align to reference
                          sequences                                 [required]
  --o-sequence-misses ARTIFACT FeatureData[Sequence]
                          Subset of feature sequences that do not align to
                          reference sequences                       [required]
Miscellaneous:
  --output-dir PATH       Output unspecified results to a directory
  --verbose / --quiet     Display verbose output to stdout and/or stderr
                          during execution of this action. Or silence output
                          if execution is successful (silence is golden).
  --example-data PATH     Write example data and exit.
  --citations             Show citations and exit.
  --help                  Show this message and exit.
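
Example (sketch): the options listed above can be assembled into a command line programmatically. The helper below is hypothetical, not part of QIIME 2; it only uses flags shown in the help text, and the .qza file names are placeholders. It does not cover --p-left-justify.

```python
import shlex


def build_exclude_seqs_command(query, reference, hits, misses,
                               method="blast", perc_identity=0.97,
                               perc_query_aligned=0.97, evalue=None,
                               threads=1):
    """Return an argv list for `qiime quality-control exclude-seqs`.

    Hypothetical helper: it only strings together flags from the help text.
    """
    cmd = [
        "qiime", "quality-control", "exclude-seqs",
        "--i-query-sequences", query,
        "--i-reference-sequences", reference,
        "--p-method", method,
        "--p-perc-identity", str(perc_identity),
        "--p-perc-query-aligned", str(perc_query_aligned),
        "--o-sequence-hits", hits,
        "--o-sequence-misses", misses,
    ]
    if evalue is not None:
        # The E-value threshold is optional and disabled by default.
        cmd += ["--p-evalue", str(evalue)]
    if method == "vsearch":
        # The help text says --p-threads only applies to the vsearch method.
        cmd += ["--p-threads", str(threads)]
    return cmd


if __name__ == "__main__":
    # Placeholder artifact names, chosen for illustration only.
    argv = build_exclude_seqs_command(
        "rep-seqs.qza", "contaminants.qza",
        "hits.qza", "misses.qza",
        evalue=1e-5,
    )
    print(shlex.join(argv))
```

The returned list can be passed to subprocess.run(argv, check=True) on a system where QIIME 2 is installed.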

Import:

from qiime2.plugins.quality_control.methods import exclude_seqs

Docstring:

Exclude sequences by alignment

This method aligns feature sequences to a set of reference sequences to
identify sequences that hit/miss the reference within a specified
perc_identity, evalue, and perc_query_aligned. This method could be used to
define a positive filter, e.g., extract only feature sequences that align
to a certain clade of bacteria; or to define a negative filter, e.g.,
identify sequences that align to contaminant or human DNA sequences that
should be excluded from subsequent analyses. Note that filtering is
performed based on the perc_identity, perc_query_aligned, and evalue
thresholds (the latter only if method==BLAST and an evalue is set). Set
perc_identity==0 and/or perc_query_aligned==0 to disable these filtering
thresholds as necessary.
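
Example (sketch): the hit/miss decision described above can be illustrated with plain Python. This is not the plugin's implementation, which delegates alignment to BLAST or vsearch. The Alignment record, is_hit, and partition are hypothetical names, and treating a value exactly equal to a threshold as passing is an assumption.

```python
from typing import NamedTuple, Optional


class Alignment(NamedTuple):
    """Hypothetical summary of the best alignment found for one query."""
    query_id: str
    identity: float                 # fraction of identical positions, 0.0-1.0
    query_aligned: float            # fraction of the query covered, 0.0-1.0
    evalue: Optional[float] = None  # only reported by BLAST-style methods


def is_hit(aln, perc_identity=0.97, perc_query_aligned=0.97, evalue=None):
    """Apply the three thresholds; a threshold of 0 never rejects anything."""
    if aln.identity < perc_identity:
        return False  # identity lower than the threshold: reject
    if aln.query_aligned < perc_query_aligned:
        return False  # too little of the query aligned: reject
    if evalue is not None and aln.evalue is not None and aln.evalue > evalue:
        return False  # E value higher than the threshold: reject
    return True


def partition(query_ids, alignments, **thresholds):
    """Split query IDs into (hits, misses); unaligned queries are misses."""
    hit_ids = {a.query_id for a in alignments if is_hit(a, **thresholds)}
    hits = [q for q in query_ids if q in hit_ids]
    misses = [q for q in query_ids if q not in hit_ids]
    return hits, misses


if __name__ == "__main__":
    queries = ["seq1", "seq2", "seq3"]
    alignments = [
        Alignment("seq1", identity=0.99, query_aligned=1.0, evalue=1e-50),
        Alignment("seq2", identity=0.90, query_aligned=1.0, evalue=1e-20),
    ]
    print(partition(queries, alignments))
    # Setting perc_identity to 0 disables the identity threshold.
    print(partition(queries, alignments, perc_identity=0.0))
```

In a negative filter the misses are the sequences to keep, whereas in a positive filter the hits are.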

Parameters
----------
query_sequences : FeatureData[Sequence]
    Sequences to test for exclusion
reference_sequences : FeatureData[Sequence]
    Reference sequences to align against feature sequences
method : Str % Choices('blast', 'blastn-short')¹ | Str % Choices('vsearch')², optional
    Alignment method to use for matching feature sequences against
    reference sequences
perc_identity : Float % Range(0.0, 1.0, inclusive_end=True), optional
    Reject match if percent identity to reference is lower. Must be in
    range [0.0, 1.0]
evalue : Float, optional
    BLAST expectation (E) value threshold for saving hits. Reject if E
    value is higher than threshold. This threshold is disabled by default.
perc_query_aligned : Float, optional
    Proportion of the query sequence that must align to the reference for
    the match to be accepted as a hit, expressed as a fraction (e.g., 0.97
    for 97%).
threads : Threads, optional
    Number of threads to use. Only applies to vsearch method.
left_justify : Bool % Choices(False)¹ | Bool², optional
    Reject match if the pairwise alignment begins with gaps

Returns
-------
sequence_hits : FeatureData[Sequence]
    Subset of feature sequences that align to reference sequences
sequence_misses : FeatureData[Sequence]
    Subset of feature sequences that do not align to reference sequences
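
Example (sketch): a possible way to call this method through the Artifact API. The wrapper name split_by_reference and all file paths are hypothetical; the keyword arguments and result fields follow the Parameters and Returns sections above. The QIIME 2 imports sit inside the function so that the definition itself loads without QIIME 2 installed; calling it requires a working QIIME 2 environment.

```python
def split_by_reference(query_path, reference_path, hits_path, misses_path,
                       perc_identity=0.97, perc_query_aligned=0.97):
    """Run exclude_seqs on two .qza files and save the hits and misses.

    Hypothetical wrapper for illustration; the paths are caller-supplied.
    """
    # Imported lazily: these need a QIIME 2 environment at call time.
    from qiime2 import Artifact
    from qiime2.plugins.quality_control.methods import exclude_seqs

    query = Artifact.load(query_path)          # FeatureData[Sequence]
    reference = Artifact.load(reference_path)  # FeatureData[Sequence]

    results = exclude_seqs(
        query_sequences=query,
        reference_sequences=reference,
        method="blast",
        perc_identity=perc_identity,
        perc_query_aligned=perc_query_aligned,
    )

    # The result fields are named as in the Returns section.
    results.sequence_hits.save(hits_path)
    results.sequence_misses.save(misses_path)
    return results.sequence_hits, results.sequence_misses
```

For a negative filter against contaminant references, the saved misses would be the sequences carried forward to subsequent analyses.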