classify-consensus-blast: BLAST+ consensus taxonomy classifier

Citations
  • Christiam Camacho, George Coulouris, Vahram Avagyan, Ning Ma, Jason Papadopoulos, Kevin Bealer, and Thomas L Madden. BLAST+: architecture and applications. BMC Bioinformatics, 10(1):421, 2009. doi:10.1186/1471-2105-10-421.

Docstring:

Usage: qiime feature-classifier classify-consensus-blast [OPTIONS]

  Assign taxonomy to query sequences using BLAST+. Performs BLAST+ local
  alignment between query and reference_reads, then assigns a consensus
  taxonomy to each query sequence from among up to maxaccepts hits, at least
  a min_consensus fraction of which must share that taxonomic assignment.
  Note that maxaccepts selects the first N hits with > perc_identity
  similarity to query, not the top N matches. For top N hits, use
  classify-consensus-vsearch.

Inputs:
  --i-query ARTIFACT FeatureData[Sequence]
                          Query sequences.                          [required]
  --i-reference-taxonomy ARTIFACT FeatureData[Taxonomy]
                          Reference taxonomy labels.                [required]
  --i-blastdb ARTIFACT    BLAST indexed database. Incompatible with
    BLASTDB               reference-reads.                          [optional]
  --i-reference-reads ARTIFACT FeatureData[Sequence]
                          Reference sequences. Incompatible with blastdb.
                                                                    [optional]
Parameters:
  --p-maxaccepts INTEGER  Maximum number of hits to keep for each query.
    Range(1, None)        BLAST will choose the first N hits in the reference
                          database that exceed perc-identity similarity to
                          query. NOTE: the database is not sorted by
                          similarity to query, so these are the first N hits
                          that pass the threshold, not necessarily the top N
                          hits.                                  [default: 10]
  --p-perc-identity PROPORTION Range(0.0, 1.0, inclusive_end=True)
                          Reject match if percent identity to query is lower.
                                                                [default: 0.8]
  --p-query-cov PROPORTION Range(0.0, 1.0, inclusive_end=True)
                          Reject match if query alignment coverage per
                          high-scoring pair is lower. Note: this uses blastn's
                          qcov_hsp_perc parameter, and may not behave
                          identically to the query-cov parameter used by
                          classify-consensus-vsearch.           [default: 0.8]
  --p-strand TEXT Choices('both', 'plus', 'minus')
                          Align against reference sequences in forward
                          ("plus"), reverse ("minus"), or both directions
                          ("both").                          [default: 'both']
  --p-evalue NUMBER       BLAST expectation value (E) threshold for saving
                          hits.                               [default: 0.001]
  --p-output-no-hits / --p-no-output-no-hits
                          Report both matching and non-matching queries.
                          WARNING: always use the default setting for this
                          option unless you know what you are doing! If you
                          set this option to False, your sequences and feature
                          table will need to be filtered to exclude
                          unclassified sequences, otherwise you may run into
                          errors downstream from missing feature IDs. Set to
                          False to mirror default BLAST search.
                                                               [default: True]
  --p-min-consensus NUMBER Range(0.5, 1.0, inclusive_start=False,
    inclusive_end=True)   Minimum fraction of assignments that must match the
                          top hit to be accepted as consensus assignment.
                                                               [default: 0.51]
  --p-unassignable-label TEXT
                          Annotation given to sequences without any hits.
                                                       [default: 'Unassigned']
  --p-num-threads NTHREADS
                          Number of threads (CPUs) to use in the BLAST
                          search. Pass 0 to use all available CPUs.
                                                                  [default: 1]
Outputs:
  --o-classification ARTIFACT FeatureData[Taxonomy]
                          Taxonomy classifications of query sequences.
                                                                    [required]
  --o-search-results ARTIFACT
    FeatureData[BLAST6]   Top hits for each query.                  [required]
Miscellaneous:
  --output-dir PATH       Output unspecified results to a directory
  --verbose / --quiet     Display verbose output to stdout and/or stderr
                          during execution of this action. Or silence output
                          if execution is successful (silence is golden).
  --recycle-pool TEXT     Use a cache pool for pipeline resumption. QIIME 2
                          will cache your results in this pool for reuse by
                          future invocations. These pools are retained until
                          deleted by the user. If not provided, QIIME 2 will
                          create a pool which is automatically reused by
                          invocations of the same action and removed if the
                          action is successful. Note: these pools are local to
                          the cache you are using.
  --no-recycle            Do not recycle results from a previous failed
                          pipeline run or save the results from this run for
                          future recycling.
  --parallel              Execute your action in parallel. This flag will use
                          your default parallel config.
  --parallel-config FILE  Execute your action in parallel using a config at
                          the indicated path.
  --use-cache DIRECTORY   Specify the cache to be used for the intermediate
                          work of this pipeline. If not provided, the default
                          cache under $TMP/qiime2/ will be used.
                          IMPORTANT FOR HPC USERS: If you are on an HPC system
                          and are using parallel execution it is important to
                          set this to a location that is globally accessible
                          to all nodes in the cluster.
  --example-data PATH     Write example data and exit.
  --citations             Show citations and exit.
  --help                  Show this message and exit.

Import:

from qiime2.plugins.feature_classifier.pipelines import classify_consensus_blast

Docstring:

BLAST+ consensus taxonomy classifier

Assign taxonomy to query sequences using BLAST+. Performs BLAST+ local
alignment between query and reference_reads, then assigns a consensus
taxonomy to each query sequence from among up to maxaccepts hits, at least
a min_consensus fraction of which must share that taxonomic assignment.
Note that maxaccepts selects the first N hits with > perc_identity
similarity to query, not the top N matches. For top N hits, use
classify-consensus-vsearch.
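
The consensus step described above can be illustrated with a minimal sketch. This is not the plugin's implementation: the function name consensus_taxonomy, the semicolon-delimited lineage format, and the rank-by-rank truncation rule are all assumptions made for illustration. Only the parameter names min_consensus and unassignable_label and their defaults come from this page.

```python
from collections import Counter


def consensus_taxonomy(hit_taxonomies, min_consensus=0.51,
                       unassignable_label="Unassigned"):
    """Return the deepest lineage shared by at least min_consensus of the hits.

    Hypothetical helper: lineages are assumed to be semicolon-delimited
    strings, one per accepted hit.
    """
    # Mirror the documented Range(0.5, 1.0, inclusive_start=False,
    # inclusive_end=True).
    if not 0.5 < min_consensus <= 1.0:
        raise ValueError("min_consensus must be in the interval (0.5, 1.0]")
    if not hit_taxonomies:
        return unassignable_label

    # Split each lineage into a list of rank labels.
    lineages = [[rank.strip() for rank in taxonomy.split(";")]
                for taxonomy in hit_taxonomies]

    consensus = []
    depth = 0
    while True:
        # Lineage prefixes down to the current depth, for hits that reach it.
        prefixes = [tuple(lineage[:depth + 1])
                    for lineage in lineages if len(lineage) > depth]
        if not prefixes:
            break
        prefix, count = Counter(prefixes).most_common(1)[0]
        # The fraction is taken over all hits (an assumption of this sketch),
        # so hits with shorter lineages count against deeper ranks.
        if count / len(lineages) < min_consensus:
            break
        consensus = list(prefix)
        depth += 1

    return "; ".join(consensus) if consensus else unassignable_label
```

With three hits of which two agree down to class level, the default threshold keeps the class-level label, while a stricter min_consensus of 0.8 truncates the assignment at the deepest rank that all three hits share.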

Parameters
----------
query : FeatureData[Sequence]
    Query sequences.
reference_taxonomy : FeatureData[Taxonomy]
    Reference taxonomy labels.
blastdb : BLASTDB, optional
    BLAST indexed database. Incompatible with reference_reads.
reference_reads : FeatureData[Sequence], optional
    Reference sequences. Incompatible with blastdb.
maxaccepts : Int % Range(1, None), optional
    Maximum number of hits to keep for each query. BLAST will choose the
    first N hits in the reference database that exceed perc_identity
    similarity to query. NOTE: the database is not sorted by similarity to
    query, so these are the first N hits that pass the threshold, not
    necessarily the top N hits.
perc_identity : Float % Range(0.0, 1.0, inclusive_end=True), optional
    Reject match if percent identity to query is lower.
query_cov : Float % Range(0.0, 1.0, inclusive_end=True), optional
    Reject match if query alignment coverage per high-scoring pair is
    lower. Note: this uses blastn's qcov_hsp_perc parameter, and may not
    behave identically to the query_cov parameter used by
    classify-consensus-vsearch.
strand : Str % Choices('both', 'plus', 'minus'), optional
    Align against reference sequences in forward ("plus"), reverse
    ("minus"), or both directions ("both").
evalue : Float, optional
    BLAST expectation value (E) threshold for saving hits.
output_no_hits : Bool, optional
    Report both matching and non-matching queries. WARNING: always use the
    default setting for this option unless you know what you are doing!
    If you set this option to False, your sequences and feature table will
    need to be filtered to exclude unclassified sequences, otherwise you
    may run into errors downstream from missing feature IDs. Set to False
    to mirror default BLAST search.
min_consensus : Float % Range(0.5, 1.0, inclusive_start=False, inclusive_end=True), optional
    Minimum fraction of assignments that must match the top hit to be
    accepted as consensus assignment.
unassignable_label : Str, optional
    Annotation given to sequences without any hits.
num_threads : Threads, optional
    Number of threads (CPUs) to use in the BLAST search. Pass 0 to use all
    available CPUs.
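
The difference between "first N hits that pass the threshold" and "top N hits", as noted for maxaccepts, can be sketched in plain Python. The functions first_n_hits and top_n_hits, the dictionary keys, and the example values are hypothetical and only illustrate the selection logic; they do not call BLAST+ and are not the plugin's code. Hits are accepted when they are not lower than the thresholds, following the perc_identity and query_cov descriptions.

```python
def _passes(hit, perc_identity, query_cov):
    # A hit is rejected if its identity or coverage is lower than the
    # threshold.
    return hit["identity"] >= perc_identity and hit["coverage"] >= query_cov


def first_n_hits(hits, maxaccepts=10, perc_identity=0.8, query_cov=0.8):
    """Keep the first maxaccepts passing hits, in database order."""
    accepted = []
    for hit in hits:  # hits are assumed to arrive in database order
        if _passes(hit, perc_identity, query_cov):
            accepted.append(hit)
            if len(accepted) == maxaccepts:
                break  # stop early: later, better hits are never seen
    return accepted


def top_n_hits(hits, maxaccepts=10, perc_identity=0.8, query_cov=0.8):
    """Keep the maxaccepts passing hits with the highest identity."""
    passing = [h for h in hits if _passes(h, perc_identity, query_cov)]
    return sorted(passing, key=lambda h: h["identity"],
                  reverse=True)[:maxaccepts]


# Made-up hits, listed in database order.
example_hits = [
    {"id": "ref1", "identity": 0.82, "coverage": 0.95},
    {"id": "ref2", "identity": 0.79, "coverage": 0.99},
    {"id": "ref3", "identity": 0.91, "coverage": 0.90},
    {"id": "ref4", "identity": 0.99, "coverage": 0.97},
]

print([h["id"] for h in first_n_hits(example_hits, maxaccepts=2)])
# prints ['ref1', 'ref3']
print([h["id"] for h in top_n_hits(example_hits, maxaccepts=2)])
# prints ['ref4', 'ref3']
```

In this made-up example the best match (ref4) is missed by the first-N strategy because two earlier references already pass the thresholds, which is why a larger maxaccepts or classify-consensus-vsearch may be preferable when the top matches matter.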

Returns
-------
classification : FeatureData[Taxonomy]
    Taxonomy classifications of query sequences.
search_results : FeatureData[BLAST6]
    Top hits for each query.
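
A call through the Python API might look like the sketch below. The wrapper function classify_with_blast and its path arguments are hypothetical; the keyword arguments and output names follow the Parameters and Returns sections above, and the numeric values are the defaults listed in the command-line help. It assumes a working QIIME 2 environment with existing .qza artifacts of the documented types.

```python
def classify_with_blast(query_path, reference_reads_path,
                        reference_taxonomy_path):
    """Hypothetical wrapper: load artifacts and run the BLAST+ classifier."""
    # Imports are deferred so this function can be defined on a machine
    # without QIIME 2; calling it requires a QIIME 2 environment.
    from qiime2 import Artifact
    from qiime2.plugins.feature_classifier.pipelines import (
        classify_consensus_blast,
    )

    # Paths are placeholders supplied by the caller.
    query = Artifact.load(query_path)
    reference_reads = Artifact.load(reference_reads_path)
    reference_taxonomy = Artifact.load(reference_taxonomy_path)

    # reference_reads and blastdb are documented as incompatible, so only
    # one of them is passed here.
    results = classify_consensus_blast(
        query=query,
        reference_reads=reference_reads,
        reference_taxonomy=reference_taxonomy,
        maxaccepts=10,
        perc_identity=0.8,
        query_cov=0.8,
        min_consensus=0.51,
    )
    # Outputs are accessed by the names listed under Returns.
    return results.classification, results.search_results
```

The returned artifacts can then be written out with their save method, for example classification.save("classification.qza"), where the file name is again only a placeholder.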