exclude-seqs: Exclude sequences by alignment

Citations
  • Christiam Camacho, George Coulouris, Vahram Avagyan, Ning Ma, Jason Papadopoulos, Kevin Bealer, and Thomas L. Madden. BLAST+: architecture and applications. BMC Bioinformatics, 10(1):421, 2009. doi:10.1186/1471-2105-10-421.

Docstring:

Usage: qiime quality-control exclude-seqs [OPTIONS]

  This method aligns feature sequences to a set of reference sequences to
  identify sequences that hit/miss the reference within a specified
  perc_identity, evalue, and perc_query_aligned. This method could be used
  to define a positive filter, e.g., extract only feature sequences that
  align to a certain clade of bacteria; or to define a negative filter,
  e.g., identify sequences that align to contaminant or human DNA sequences
  that should be excluded from subsequent analyses. Note that filtering is
  performed based on the perc_identity, perc_query_aligned, and evalue
  thresholds (the latter only if method==BLAST and an evalue is set). Set
  perc_identity==0 and/or perc_query_aligned==0 to disable these filtering
  thresholds as necessary.

Inputs:
  --i-query-sequences ARTIFACT FeatureData[Sequence]
                       Sequences to test for exclusion              [required]
  --i-reference-sequences ARTIFACT FeatureData[Sequence]
                       Reference sequences to align against feature sequences
                                                                    [required]
Parameters:
  --p-method TEXT Choices('blast', 'vsearch', 'blastn-short')
                       Alignment method to use for matching feature sequences
                       against reference sequences          [default: 'blast']
  --p-perc-identity PROPORTION Range(0.0, 1.0, inclusive_end=True)
                       Reject match if percent identity to reference is
                       lower. Must be in range [0.0, 1.0]      [default: 0.97]
  --p-evalue NUMBER    BLAST expectation (E) value threshold for saving hits.
                       Reject if E value is higher than threshold. This
                       threshold is disabled by default.            [optional]
  --p-perc-query-aligned NUMBER
                       Percent of query sequence that must align to reference
                       in order to be accepted as a hit.       [default: 0.97]
  --p-threads INTEGER  Number of jobs to execute. Only applies to vsearch
    Range(1, None)     method.                                    [default: 1]
Outputs:
  --o-sequence-hits ARTIFACT FeatureData[Sequence]
                       Subset of feature sequences that align to reference
                       sequences                                    [required]
  --o-sequence-misses ARTIFACT FeatureData[Sequence]
                       Subset of feature sequences that do not align to
                       reference sequences                          [required]
Miscellaneous:
  --output-dir PATH    Output unspecified results to a directory
  --verbose / --quiet  Display verbose output to stdout and/or stderr during
                       execution of this action. Or silence output if
                       execution is successful (silence is golden).
  --citations          Show citations and exit.
  --help               Show this message and exit.
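
Example (sketch):

The options listed above can be assembled into a command line programmatically. The following is a minimal sketch, not part of the plugin: the helper name build_exclude_seqs_argv and the .qza file names are hypothetical, and only flags shown in the usage text above are used. It builds the argument list without running it, so it works even where QIIME 2 is not installed.

```python
import shlex


def build_exclude_seqs_argv(query, reference, hits, misses, *,
                            method="blast", perc_identity=0.97,
                            perc_query_aligned=0.97, evalue=None, threads=1):
    """Assemble an argv list for `qiime quality-control exclude-seqs`.

    Hypothetical helper: it only mirrors the flags and ranges documented
    in the usage text and does not invoke QIIME 2 itself.
    """
    if method not in ("blast", "vsearch", "blastn-short"):
        raise ValueError(f"unsupported method: {method!r}")
    if not 0.0 <= perc_identity <= 1.0:
        raise ValueError("perc_identity must be in [0.0, 1.0]")
    if threads < 1:
        raise ValueError("threads must be >= 1")

    argv = [
        "qiime", "quality-control", "exclude-seqs",
        "--i-query-sequences", query,
        "--i-reference-sequences", reference,
        "--p-method", method,
        "--p-perc-identity", str(perc_identity),
        "--p-perc-query-aligned", str(perc_query_aligned),
        "--p-threads", str(threads),
        "--o-sequence-hits", hits,
        "--o-sequence-misses", misses,
    ]
    # The E-value threshold is disabled by default, so the flag is only
    # emitted when a value is supplied.
    if evalue is not None:
        argv += ["--p-evalue", str(evalue)]
    return argv


# Placeholder artifact names; substitute real FeatureData[Sequence] files.
argv = build_exclude_seqs_argv(
    "query-seqs.qza", "reference-seqs.qza", "hits.qza", "misses.qza",
    method="vsearch", threads=4,
)
print(shlex.join(argv))
```

In an environment where QIIME 2 is installed, the returned list could be passed to subprocess.run(argv, check=True). Keeping the command as a list rather than a single string avoids shell-quoting problems with unusual file names.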

Import:

from qiime2.plugins.quality_control.methods import exclude_seqs

Docstring:

Exclude sequences by alignment

This method aligns feature sequences to a set of reference sequences to
identify sequences that hit/miss the reference within a specified
perc_identity, evalue, and perc_query_aligned. This method could be used to
define a positive filter, e.g., extract only feature sequences that align
to a certain clade of bacteria; or to define a negative filter, e.g.,
identify sequences that align to contaminant or human DNA sequences that
should be excluded from subsequent analyses. Note that filtering is
performed based on the perc_identity, perc_query_aligned, and evalue
thresholds (the latter only if method==BLAST and an evalue is set). Set
perc_identity==0 and/or perc_query_aligned==0 to disable these filtering
thresholds as necessary.

Parameters
----------
query_sequences : FeatureData[Sequence]
    Sequences to test for exclusion
reference_sequences : FeatureData[Sequence]
    Reference sequences to align against feature sequences
method : Str % Choices('blast', 'vsearch', 'blastn-short'), optional
    Alignment method to use for matching feature sequences against
    reference sequences
perc_identity : Float % Range(0.0, 1.0, inclusive_end=True), optional
    Reject match if percent identity to reference is lower. Must be in
    range [0.0, 1.0]
evalue : Float, optional
    BLAST expectation (E) value threshold for saving hits. Reject if E
    value is higher than threshold. This threshold is disabled by default.
perc_query_aligned : Float, optional
    Percent of query sequence that must align to reference in order to be
    accepted as a hit.
threads : Int % Range(1, None), optional
    Number of jobs to execute. Only applies to vsearch method.

Returns
-------
sequence_hits : FeatureData[Sequence]
    Subset of feature sequences that align to reference sequences
sequence_misses : FeatureData[Sequence]
    Subset of feature sequences that do not align to reference sequences
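
Example (sketch):

The hit/miss split described in the docstring can be illustrated with a small pure-Python model. This is a sketch under stated assumptions, not the plugin's implementation: the Alignment record, is_hit, and partition are hypothetical names, each query is assumed to have at most one summarized alignment, and identity and query coverage are treated as proportions in [0.0, 1.0], as the 0.97 defaults suggest. Following the wording of the docstring, the E-value check is applied only when method is 'blast' and an evalue is set; whether 'blastn-short' also honors it is not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alignment:
    """Hypothetical summary of one query's alignment to the reference."""
    query_id: str
    identity: float                 # proportion of identical positions
    query_aligned: float            # proportion of the query that aligned
    evalue: Optional[float] = None  # reported by BLAST only


def is_hit(aln, perc_identity=0.97, perc_query_aligned=0.97,
           evalue=None, method="blast"):
    """Return True if the alignment passes every enabled threshold."""
    if aln.identity < perc_identity:
        return False  # reject: identity lower than threshold
    if aln.query_aligned < perc_query_aligned:
        return False  # reject: too little of the query aligned
    if (method == "blast" and evalue is not None
            and aln.evalue is not None and aln.evalue > evalue):
        return False  # reject: E value higher than threshold
    return True


def partition(query_ids, alignments, **thresholds):
    """Split query IDs into (hits, misses), preserving input order."""
    by_query = {a.query_id: a for a in alignments}
    hits, misses = [], []
    for qid in query_ids:
        aln = by_query.get(qid)
        # A query with no alignment at all is always a miss.
        if aln is not None and is_hit(aln, **thresholds):
            hits.append(qid)
        else:
            misses.append(qid)
    return hits, misses


# Made-up alignment values, for illustration only.
alignments = [
    Alignment("seq1", identity=0.99, query_aligned=1.00, evalue=1e-50),
    Alignment("seq2", identity=0.90, query_aligned=1.00, evalue=1e-10),
    Alignment("seq3", identity=0.99, query_aligned=0.50, evalue=1e-5),
]
hits, misses = partition(["seq1", "seq2", "seq3", "seq4"], alignments)
print(hits, misses)
```

Setting perc_identity=0.0 and perc_query_aligned=0.0 in this model makes both comparisons always pass, which mirrors how the docstring describes disabling those thresholds. For a negative filter (for example, removing contaminants), the misses are the sequences to keep; for a positive filter, the hits are.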