classify-hybrid-vsearch-sklearn: ALPHA Hybrid classifier: VSEARCH exact match + sklearn classifier

Docstring:

Usage: qiime feature-classifier classify-hybrid-vsearch-sklearn
           [OPTIONS]

  NOTE: THIS PIPELINE IS AN ALPHA RELEASE. Please report bugs to
  https://forum.qiime2.org! Assign taxonomy to query sequences using a
  hybrid classifier. First, performs a rough positive filter to remove
  artifact and low-coverage sequences (use the "prefilter" parameter to
  toggle this step on or off). Second, performs a VSEARCH exact-match
  search between query and reference_reads, followed by least common
  ancestor consensus taxonomy assignment from among the maxaccepts top
  hits, at least min_consensus of which must share that taxonomic
  assignment. Query sequences without an exact match are then classified
  with a pre-trained sklearn taxonomy classifier to predict the most
  likely taxonomic lineage.

Inputs:
  --i-query ARTIFACT FeatureData[Sequence]
                        Sequences to classify taxonomically.        [required]
  --i-reference-reads ARTIFACT FeatureData[Sequence]
                        Reference sequences.                        [required]
  --i-reference-taxonomy ARTIFACT FeatureData[Taxonomy]
                        Reference taxonomy labels.                  [required]
  --i-classifier ARTIFACT TaxonomicClassifier
                        Pre-trained sklearn taxonomic classifier for
                        classifying the reads.                      [required]
Parameters:
  --p-maxaccepts VALUE Int % Range(1, None) | Str % Choices('all')
                        Maximum number of hits to keep for each query. Set to
                        "all" to keep all hits > perc-identity similarity.
                                                                 [default: 10]
  --p-perc-identity PROPORTION Range(0.0, 1.0, inclusive_end=True)
                        Percent sequence similarity to use for PREFILTER.
                        Reject match if percent identity to query is lower.
                        Set to a lower value to perform a rough pre-filter.
                        This parameter is ignored if `prefilter` is disabled.
                                                                [default: 0.5]
  --p-query-cov PROPORTION Range(0.0, 1.0, inclusive_end=True)
                        Query coverage threshold to use for PREFILTER. Reject
                        match if query alignment coverage per high-scoring
                        pair is lower. Set to a lower value to perform a rough
                        pre-filter. This parameter is ignored if `prefilter`
                        is disabled.                            [default: 0.8]
  --p-strand TEXT Choices('both', 'plus')
                        Align against reference sequences in forward ("plus")
                        or both directions ("both").         [default: 'both']
  --p-min-consensus NUMBER Range(0.5, 1.0, inclusive_start=False,
    inclusive_end=True) Minimum fraction of assignments that must match the
                        top hit for it to be accepted as the consensus
                        assignment.                            [default: 0.51]
  --p-reads-per-batch INTEGER
    Range(0, None)      Number of reads to process in each batch for sklearn
                        classification. If 0 (the default, i.e. "auto"), this
                        parameter is autoscaled to min(number of query
                        sequences / threads, 20000).              [default: 0]
  --p-confidence VALUE Float % Range(0, 1, inclusive_end=True) | Str %
    Choices('disable')  Confidence threshold for limiting taxonomic depth.
                        Set to "disable" to disable confidence calculation, or
                        0 to calculate confidence but not apply it to limit
                        the taxonomic depth of the assignments. [default: 0.7]
  --p-read-orientation TEXT Choices('same', 'reverse-complement', 'auto')
                        Direction of reads with respect to reference
                        sequences in pre-trained sklearn classifier. "same"
                        will cause reads to be classified unchanged;
                        "reverse-complement" will cause reads to be reversed
                        and complemented prior to classification. "auto" will
                        autodetect orientation based on the confidence
                        estimates for the first 100 reads.   [default: 'auto']
  --p-threads INTEGER   Number of threads to use for job parallelization.
    Range(1, None)                                                [default: 1]
  --p-prefilter / --p-no-prefilter
                        Toggle positive filter of query sequences on or off.
                                                               [default: True]
  --p-sample-size INTEGER
    Range(1, None)      Randomly extract the given number of sequences from
                        the reference database to use for prefiltering. This
                        parameter is ignored if `prefilter` is disabled.
                                                               [default: 1000]
  --p-randseed INTEGER  Use integer as a seed for the pseudo-random generator
    Range(0, None)      used during prefiltering. A given seed always produces
                        the same output, which is useful for replicability.
                        Set to 0 to use a pseudo-random seed. This parameter
                        is ignored if `prefilter` is disabled.    [default: 0]
Outputs:
  --o-classification ARTIFACT FeatureData[Taxonomy]
                        The resulting taxonomy classifications.     [required]
Miscellaneous:
  --output-dir PATH     Output unspecified results to a directory
  --verbose / --quiet   Display verbose output to stdout and/or stderr during
                        execution of this action. Or silence output if
                        execution is successful (silence is golden).
  --citations           Show citations and exit.
  --help                Show this message and exit.
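
The options above can be assembled into a command line programmatically. The following is a minimal sketch, not part of QIIME 2: the helper name build_classify_command and all file names are hypothetical, and only flags listed in the usage text are emitted. It builds the argument list without running it.

```python
import shlex


def build_classify_command(query, reference_reads, reference_taxonomy,
                           classifier, output, **params):
    """Assemble an argument list for classify-hybrid-vsearch-sklearn.

    Hypothetical helper: keyword names are converted to --p-* flags by
    replacing underscores with hyphens, as in the usage text above.
    """
    cmd = [
        "qiime", "feature-classifier", "classify-hybrid-vsearch-sklearn",
        "--i-query", query,
        "--i-reference-reads", reference_reads,
        "--i-reference-taxonomy", reference_taxonomy,
        "--i-classifier", classifier,
        "--o-classification", output,
    ]
    for name, value in params.items():
        flag = name.replace("_", "-")
        if isinstance(value, bool):
            # Boolean parameters are paired flags,
            # e.g. --p-prefilter / --p-no-prefilter.
            cmd.append(f"--p-{flag}" if value else f"--p-no-{flag}")
        else:
            cmd.extend([f"--p-{flag}", str(value)])
    return cmd


# Hypothetical artifact file names; substitute your own.
cmd = build_classify_command(
    "rep-seqs.qza", "ref-seqs.qza", "ref-taxonomy.qza", "classifier.qza",
    "taxonomy.qza", threads=4, confidence=0.7, prefilter=False,
)
print(shlex.join(cmd))
```

To execute the command, the list can be passed to subprocess.run(cmd, check=True) in an environment where QIIME 2 is installed.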

Import:

from qiime2.plugins.feature_classifier.pipelines import classify_hybrid_vsearch_sklearn
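
A minimal usage sketch for the Python API follows. It assumes QIIME 2 is installed and that the four inputs already exist as .qza files; the wrapper name classify_from_files and the file names are hypothetical. The QIIME 2 imports are placed inside the function so that the definition itself can be loaded without QIIME 2.

```python
def classify_from_files(query_fp, reference_reads_fp, reference_taxonomy_fp,
                        classifier_fp, **params):
    """Load the input artifacts, run the pipeline, and return the taxonomy.

    Hypothetical convenience wrapper; extra keyword arguments (for example
    threads=4 or confidence=0.7) are forwarded to the pipeline unchanged.
    """
    # Imported lazily so this function can be defined without QIIME 2.
    from qiime2 import Artifact
    from qiime2.plugins.feature_classifier.pipelines import (
        classify_hybrid_vsearch_sklearn,
    )

    results = classify_hybrid_vsearch_sklearn(
        query=Artifact.load(query_fp),
        reference_reads=Artifact.load(reference_reads_fp),
        reference_taxonomy=Artifact.load(reference_taxonomy_fp),
        classifier=Artifact.load(classifier_fp),
        **params,
    )
    # The pipeline's single output is named "classification".
    return results.classification


# Example call (hypothetical file names):
# taxonomy = classify_from_files("rep-seqs.qza", "ref-seqs.qza",
#                                "ref-taxonomy.qza", "classifier.qza",
#                                threads=4)
# taxonomy.save("taxonomy.qza")
```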

Docstring:

ALPHA Hybrid classifier: VSEARCH exact match + sklearn classifier

NOTE: THIS PIPELINE IS AN ALPHA RELEASE. Please report bugs to
https://forum.qiime2.org! Assign taxonomy to query sequences using a hybrid
classifier. First, performs a rough positive filter to remove artifact and
low-coverage sequences (use the "prefilter" parameter to toggle this step
on or off). Second, performs a VSEARCH exact-match search between query and
reference_reads, followed by least common ancestor consensus taxonomy
assignment from among the maxaccepts top hits, at least min_consensus of
which must share that taxonomic assignment. Query sequences without an
exact match are then classified with a pre-trained sklearn taxonomy
classifier to predict the most likely taxonomic lineage.
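
The consensus step described above can be illustrated with a small, self-contained sketch. This is not the plugin's implementation: the function name consensus_lineage, the semicolon-separated lineage format, and the "Unassigned" fallback are assumptions made for illustration. It keeps extending the lineage one level at a time for as long as at least min_consensus of the hits agree.

```python
from collections import Counter


def consensus_lineage(lineages, min_consensus=0.51):
    """Return the deepest lineage prefix shared by >= min_consensus of hits.

    lineages: taxonomy strings of the accepted hits for one query,
    with levels separated by ';' (an assumed format).
    """
    if not lineages:
        return "Unassigned"
    split = [[level.strip() for level in lin.split(";")] for lin in lineages]
    consensus = []
    for depth in range(max(len(levels) for levels in split)):
        # Count full prefixes so a vote only counts if all higher levels agree.
        prefixes = Counter(
            tuple(levels[:depth + 1]) for levels in split if len(levels) > depth
        )
        prefix, count = prefixes.most_common(1)[0]
        # The fraction is taken over all hits, including shorter lineages.
        if count / len(split) < min_consensus:
            break
        consensus = list(prefix)
    return "; ".join(consensus) if consensus else "Unassigned"


hits = [
    "k__Bacteria; p__Firmicutes; c__Bacilli",
    "k__Bacteria; p__Firmicutes; c__Clostridia",
    "k__Bacteria; p__Firmicutes; c__Bacilli",
]
print(consensus_lineage(hits, min_consensus=0.51))
print(consensus_lineage(hits, min_consensus=0.9))
```

With the default threshold, two of three hits agreeing at class level is enough to keep that level; with a threshold of 0.9 the assignment stops at phylum.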

Parameters
----------
query : FeatureData[Sequence]
    Sequences to classify taxonomically.
reference_reads : FeatureData[Sequence]
    Reference sequences.
reference_taxonomy : FeatureData[Taxonomy]
    Reference taxonomy labels.
classifier : TaxonomicClassifier
    Pre-trained sklearn taxonomic classifier for classifying the reads.
maxaccepts : Int % Range(1, None) | Str % Choices('all'), optional
    Maximum number of hits to keep for each query. Set to "all" to keep all
    hits > perc_identity similarity.
perc_identity : Float % Range(0.0, 1.0, inclusive_end=True), optional
    Percent sequence similarity to use for PREFILTER. Reject match if
    percent identity to query is lower. Set to a lower value to perform a
    rough pre-filter. This parameter is ignored if `prefilter` is disabled.
query_cov : Float % Range(0.0, 1.0, inclusive_end=True), optional
    Query coverage threshold to use for PREFILTER. Reject match if query
    alignment coverage per high-scoring pair is lower. Set to a lower value
    to perform a rough pre-filter. This parameter is ignored if `prefilter`
    is disabled.
strand : Str % Choices('both', 'plus'), optional
    Align against reference sequences in forward ("plus") or both
    directions ("both").
min_consensus : Float % Range(0.5, 1.0, inclusive_start=False, inclusive_end=True), optional
    Minimum fraction of assignments that must match the top hit for it to
    be accepted as the consensus assignment.
reads_per_batch : Int % Range(0, None), optional
    Number of reads to process in each batch for sklearn classification. If
    0 (the default, i.e. "auto"), this parameter is autoscaled to
    min(number of query sequences / threads, 20000).
confidence : Float % Range(0, 1, inclusive_end=True) | Str % Choices('disable'), optional
    Confidence threshold for limiting taxonomic depth. Set to "disable" to
    disable confidence calculation, or 0 to calculate confidence but not
    apply it to limit the taxonomic depth of the assignments.
read_orientation : Str % Choices('same', 'reverse-complement', 'auto'), optional
    Direction of reads with respect to reference sequences in pre-trained
    sklearn classifier. "same" will cause reads to be classified unchanged;
    "reverse-complement" will cause reads to be reversed and complemented
    prior to classification. "auto" will autodetect orientation based on
    the confidence estimates for the first 100 reads.
threads : Int % Range(1, None), optional
    Number of threads to use for job parallelization.
prefilter : Bool, optional
    Toggle positive filter of query sequences on or off.
sample_size : Int % Range(1, None), optional
    Randomly extract the given number of sequences from the reference
    database to use for prefiltering. This parameter is ignored if
    `prefilter` is disabled.
randseed : Int % Range(0, None), optional
    Use integer as a seed for the pseudo-random generator used during
    prefiltering. A given seed always produces the same output, which is
    useful for replicability. Set to 0 to use a pseudo-random seed. This
    parameter is ignored if `prefilter` is disabled.
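
The autoscaling rule quoted for reads_per_batch, min(number of query sequences / threads, 20000), can be sketched as below. The function name is hypothetical, and three details are assumptions rather than documented behavior: that the default value 0 is what triggers autoscaling, that the division is rounded up, and that the result is never less than 1.

```python
import math


def autoscale_reads_per_batch(n_queries, threads=1, reads_per_batch=0,
                              cap=20000):
    """Return the batch size to use for sklearn classification.

    Assumption: reads_per_batch == 0 means "autoscale"; any other value
    is used as given.
    """
    if reads_per_batch != 0:
        return reads_per_batch
    # min(number of query sequences / threads, 20000), rounded up (assumed)
    # and kept at 1 or more (assumed).
    return max(1, min(math.ceil(n_queries / threads), cap))


print(autoscale_reads_per_batch(100000, threads=4))
print(autoscale_reads_per_batch(1000, threads=4))
```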

Returns
-------
classification : FeatureData[Taxonomy]
    The resulting taxonomy classifications.
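
How deep each returned classification goes depends on the confidence parameter. The sketch below illustrates one way a confidence threshold can limit taxonomic depth: probabilities of all classes sharing a lineage prefix are summed, and the lineage is extended only while that sum stays at or above the threshold. It is an illustration under stated assumptions, not the plugin's code; the function name, the input format (a dict of lineage strings to probabilities), and the "Unassigned" fallback are hypothetical, and the "disable" setting is not modeled.

```python
from collections import defaultdict


def truncate_by_confidence(class_probs, confidence=0.7):
    """Return (lineage, support) for the deepest sufficiently supported prefix.

    class_probs: dict mapping full lineage strings (levels separated by ';')
    to classifier probabilities (assumed input format).
    """
    lineage, support = [], None
    depth = 0
    while True:
        depth += 1
        # Sum the probability of every class that extends the current lineage.
        totals = defaultdict(float)
        for lin, prob in class_probs.items():
            levels = tuple(level.strip() for level in lin.split(";"))
            if len(levels) >= depth and levels[:depth - 1] == tuple(lineage):
                totals[levels[:depth]] += prob
        if not totals:
            break  # no class has a level at this depth
        prefix, total = max(totals.items(), key=lambda item: item[1])
        if total < confidence:
            break  # support too low to go deeper
        lineage, support = list(prefix), total
    if not lineage:
        return "Unassigned", support
    return "; ".join(lineage), support


probs = {
    "k__Bacteria; p__Firmicutes; c__Bacilli": 0.5,
    "k__Bacteria; p__Firmicutes; c__Clostridia": 0.3,
    "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria": 0.2,
}
print(truncate_by_confidence(probs, confidence=0.7))
print(truncate_by_confidence(probs, confidence=0))
```

In this example a threshold of 0.7 stops at phylum, because no single class reaches 0.7, while a threshold of 0 still reports a confidence value but keeps the full lineage of the most probable class.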