exclude-seqs: Exclude sequences by alignment

Citations

[quality-control:exclude-seqs:CCA+09] Christiam Camacho, George Coulouris, Vahram Avagyan, Ning Ma, Jason Papadopoulos, Kevin Bealer, and Thomas L Madden. BLAST+: architecture and applications. BMC Bioinformatics, 10(1):421, 2009. doi:10.1186/1471-2105-10-421.

Docstring:

Usage: qiime quality-control exclude-seqs [OPTIONS]

  This method aligns feature sequences to a set of reference sequences to
  identify sequences that hit/miss the reference within a specified
  perc_identity, evalue, and perc_query_aligned. This method could be used
  to define a positive filter, e.g., extract only feature sequences that
  align to a certain clade of bacteria; or to define a negative filter,
  e.g., identify sequences that align to contaminant or human DNA sequences
  that should be excluded from subsequent analyses. Note that filtering is
  performed based on the perc_identity, perc_query_aligned, and evalue
  thresholds (the latter only if method==BLAST and an evalue is set). Set
  perc_identity==0 and/or perc_query_aligned==0 to disable these filtering
  thresholds as necessary.

Options:
  --i-query-sequences ARTIFACT PATH FeatureData[Sequence]
                                  Sequences to test for exclusion  [required]
  --i-reference-sequences ARTIFACT PATH FeatureData[Sequence]
                                  Reference sequences to align against feature
                                  sequences  [required]
  --p-method [vsearch|blastn-short|blast]
                                  Alignment method to use for matching feature
                                  sequences against reference sequences
                                  [default: blast]
  --p-perc-identity FLOAT         Reject match if percent identity to
                                  reference is lower. Must be in range [0.0,
                                  1.0]  [default: 0.97]
  --p-evalue FLOAT                BLAST expectation (E) value threshold for
                                  saving hits. Reject if E value is higher
                                  than threshold. This threshold is disabled
                                  by default.  [optional]
  --p-perc-query-aligned FLOAT    Percent of query sequence that must align to
                                  reference in order to be accepted as a hit,
                                  expressed as a fraction (0.97 means 97%).
                                  [default: 0.97]
  --p-threads INTEGER RANGE       Number of jobs to execute. Only applies to
                                  vsearch method.  [default: 1]
  --o-sequence-hits ARTIFACT PATH FeatureData[Sequence]
                                  Subset of feature sequences that align to
                                  reference sequences  [required if not
                                  passing --output-dir]
  --o-sequence-misses ARTIFACT PATH FeatureData[Sequence]
                                  Subset of feature sequences that do not
                                  align to reference sequences  [required if
                                  not passing --output-dir]
  --output-dir DIRECTORY          Output unspecified results to a directory
  --cmd-config FILE               Use config file for command options
  --verbose                       Display verbose output to stdout and/or
                                  stderr during execution of this action.
                                  [default: False]
  --quiet                         Silence output if execution is successful
                                  (silence is golden).  [default: False]
  --citations                     Show citations and exit.
  --help                          Show this message and exit.
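
The options above can be assembled into a command line programmatically. The following is a minimal Python sketch, not part of QIIME 2 itself: the helper name `build_exclude_seqs_command` and the `.qza` file names are hypothetical, and only flags listed in the help text above are used. It returns an argument list that could be handed to `subprocess.run` in an environment where the `qiime` executable is installed.

```python
import shlex
from typing import List, Optional


def build_exclude_seqs_command(
    query: str,
    reference: str,
    hits: str,
    misses: str,
    method: str = "blast",
    perc_identity: float = 0.97,
    perc_query_aligned: float = 0.97,
    evalue: Optional[float] = None,
    threads: int = 1,
) -> List[str]:
    """Assemble an argument list for `qiime quality-control exclude-seqs`.

    Hypothetical helper: it only mirrors the documented flags and defaults.
    """
    if method not in {"blast", "blastn-short", "vsearch"}:
        raise ValueError(f"unsupported method: {method!r}")
    if not 0.0 <= perc_identity <= 1.0:
        raise ValueError("perc_identity must be in [0.0, 1.0]")

    cmd = [
        "qiime", "quality-control", "exclude-seqs",
        "--i-query-sequences", query,
        "--i-reference-sequences", reference,
        "--p-method", method,
        "--p-perc-identity", str(perc_identity),
        "--p-perc-query-aligned", str(perc_query_aligned),
        "--o-sequence-hits", hits,
        "--o-sequence-misses", misses,
    ]
    # The E-value threshold is disabled unless explicitly set.
    if evalue is not None:
        cmd += ["--p-evalue", str(evalue)]
    # --p-threads only applies to the vsearch method.
    if method == "vsearch":
        cmd += ["--p-threads", str(threads)]
    return cmd


if __name__ == "__main__":
    # Placeholder artifact names; substitute real .qza paths.
    example = build_exclude_seqs_command(
        "query-seqs.qza", "reference-seqs.qza", "hits.qza", "misses.qza"
    )
    print(shlex.join(example))
```

Building a list rather than a single string avoids shell quoting problems when artifact paths contain spaces.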

Import:

from qiime2.plugins.quality_control.methods import exclude_seqs

Docstring:

Exclude sequences by alignment

This method aligns feature sequences to a set of reference sequences to
identify sequences that hit/miss the reference within a specified
perc_identity, evalue, and perc_query_aligned. This method could be used to
define a positive filter, e.g., extract only feature sequences that align
to a certain clade of bacteria; or to define a negative filter, e.g.,
identify sequences that align to contaminant or human DNA sequences that
should be excluded from subsequent analyses. Note that filtering is
performed based on the perc_identity, perc_query_aligned, and evalue
thresholds (the latter only if method==BLAST and an evalue is set). Set
perc_identity==0 and/or perc_query_aligned==0 to disable these filtering
thresholds as necessary.
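
The threshold logic described above can be illustrated with a small, self-contained Python sketch. This is not the plugin's implementation: the `Alignment` record and the `partition_hits` function are hypothetical names, and the sketch assumes each alignment has already been summarized as an identity fraction, a query-coverage fraction, and (for BLAST-style methods) an E value. It only shows how the three thresholds split query sequences into hits and misses, and why setting a threshold to 0 disables it.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass
class Alignment:
    """Summary of one query-to-reference alignment (hypothetical record)."""
    query_id: str
    identity: float                 # fraction of identical positions, 0.0-1.0
    query_aligned: float            # fraction of the query covered, 0.0-1.0
    evalue: Optional[float] = None  # only reported by BLAST-style methods


def partition_hits(
    query_ids: Iterable[str],
    alignments: Iterable[Alignment],
    perc_identity: float = 0.97,
    perc_query_aligned: float = 0.97,
    evalue: Optional[float] = None,
) -> Tuple[Set[str], Set[str]]:
    """Split query IDs into (hits, misses) using the documented thresholds."""
    all_ids = set(query_ids)
    hits: Set[str] = set()
    for aln in alignments:
        # Reject the match if identity is lower than the threshold.
        if aln.identity < perc_identity:
            continue
        # Reject the match if too little of the query aligned.
        if aln.query_aligned < perc_query_aligned:
            continue
        # Reject if the E value is higher than the threshold (when one is set).
        if evalue is not None and aln.evalue is not None and aln.evalue > evalue:
            continue
        hits.add(aln.query_id)
    hits &= all_ids
    # Every query without an accepted alignment is a miss.
    return hits, all_ids - hits


if __name__ == "__main__":
    alns = [
        Alignment("seq1", 0.99, 1.00, 1e-30),
        Alignment("seq2", 0.90, 1.00, 1e-10),
    ]
    print(partition_hits(["seq1", "seq2", "seq3"], alns))
```

Because the comparisons are strict "lower than" checks, a threshold of 0 can never reject a match, which matches the advice to set `perc_identity==0` or `perc_query_aligned==0` to disable that filter.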

Parameters
----------
query_sequences : FeatureData[Sequence]
    Sequences to test for exclusion
reference_sequences : FeatureData[Sequence]
    Reference sequences to align against feature sequences
method : Str % Choices({'blast', 'blastn-short', 'vsearch'}), optional
    Alignment method to use for matching feature sequences against
    reference sequences
perc_identity : Float % Range(0.0, 1.0, inclusive_end=True), optional
    Reject match if percent identity to reference is lower. Must be in
    range [0.0, 1.0]
evalue : Float, optional
    BLAST expectation (E) value threshold for saving hits. Reject if E
    value is higher than threshold. This threshold is disabled by default.
perc_query_aligned : Float, optional
    Percent of query sequence that must align to reference in order to be
    accepted as a hit, expressed as a fraction (0.97 means 97%).
threads : Int % Range(1, None), optional
    Number of jobs to execute. Only applies to vsearch method.

Returns
-------
sequence_hits : FeatureData[Sequence]
    Subset of feature sequences that align to reference sequences
sequence_misses : FeatureData[Sequence]
    Subset of feature sequences that do not align to reference sequences
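
A call through the Python API might look like the sketch below. The wrapper name `run_exclude_seqs` and all file paths are hypothetical; the keyword arguments and the `sequence_hits` / `sequence_misses` result fields follow the docstring above, and `Artifact.load` and `Artifact.save` come from the QIIME 2 Artifact API. The QIIME 2 imports are placed inside the function so that the module can still be loaded where QIIME 2 is not installed; calling the function requires a working QIIME 2 environment.

```python
from typing import Optional


def run_exclude_seqs(
    query_path: str,
    reference_path: str,
    hits_path: str,
    misses_path: str,
    method: str = "blast",
    perc_identity: float = 0.97,
    perc_query_aligned: float = 0.97,
    evalue: Optional[float] = None,
    threads: int = 1,
):
    """Run exclude_seqs on two FeatureData[Sequence] artifacts and save both outputs.

    Hypothetical convenience wrapper around the documented method.
    """
    # Lazy imports: these require an installed QIIME 2 environment.
    from qiime2 import Artifact
    from qiime2.plugins.quality_control.methods import exclude_seqs

    query = Artifact.load(query_path)
    reference = Artifact.load(reference_path)

    kwargs = dict(
        method=method,
        perc_identity=perc_identity,
        perc_query_aligned=perc_query_aligned,
        threads=threads,  # only applies to the vsearch method
    )
    # Leave the E-value threshold disabled unless the caller sets one.
    if evalue is not None:
        kwargs["evalue"] = evalue

    results = exclude_seqs(
        query_sequences=query,
        reference_sequences=reference,
        **kwargs,
    )

    # The method returns two artifacts: sequences that hit and that miss.
    results.sequence_hits.save(hits_path)
    results.sequence_misses.save(misses_path)
    return results
```

For a negative filter (for example, removing contaminant sequences), the artifact saved to `misses_path` is the one to carry forward; for a positive filter, use the artifact saved to `hits_path`.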