classify-sklearn: Pre-fitted sklearn-based taxonomy classifier

Citations
  • Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830, 2011.

Docstring:

Usage: qiime feature-classifier classify-sklearn [OPTIONS]

  Classify reads by taxon using a fitted classifier.

Inputs:
  --i-reads ARTIFACT FeatureData[Sequence]
                         The feature data to be classified.         [required]
  --i-classifier ARTIFACT
    TaxonomicClassifier  The taxonomic classifier for classifying the reads.
                                                                    [required]
Parameters:
  --p-reads-per-batch VALUE Int % Range(1, None) | Str % Choices('auto')
                         Number of reads to process in each batch. If "auto",
                         this parameter is autoscaled to min( number of query
                         sequences / n-jobs, 20000).         [default: 'auto']
  --p-n-jobs NTHREADS    The maximum number of concurrent worker processes.
                         If -1 all CPUs are used. If 1 is given, no parallel
                         computing code is used at all, which is useful for
                         debugging. For n-jobs below -1, (n_cpus + 1 + n-jobs)
                         are used. Thus for n-jobs = -2, all CPUs but one are
                         used.                                    [default: 1]
  --p-pre-dispatch TEXT  "all" or expression, as in "3*n_jobs". The number of
                         batches (of tasks) to be pre-dispatched.
                                                         [default: '2*n_jobs']
  --p-confidence VALUE Float % Range(0, 1, inclusive_end=True) | Str %
    Choices('disable')   Confidence threshold for limiting taxonomic depth.
                         Set to "disable" to disable confidence calculation,
                         or 0 to calculate confidence but not apply it to
                         limit the taxonomic depth of the assignments.
                                                                [default: 0.7]
  --p-read-orientation TEXT Choices('same', 'reverse-complement', 'auto')
                         Direction of reads with respect to reference
                         sequences. "same" will cause reads to be classified
                         unchanged; "reverse-complement" will cause reads to
                         be reversed and complemented prior to classification.
                         "auto" will autodetect orientation based on the
                         confidence estimates for the first 100 reads.
                                                             [default: 'auto']
Outputs:
  --o-classification ARTIFACT FeatureData[Taxonomy]
                                                                    [required]
Miscellaneous:
  --output-dir PATH      Output unspecified results to a directory
  --verbose / --quiet    Display verbose output to stdout and/or stderr
                         during execution of this action. Or silence output if
                         execution is successful (silence is golden).
  --example-data PATH    Write example data and exit.
  --citations            Show citations and exit.
  --help                 Show this message and exit.
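
The n-jobs and reads-per-batch rules described above can be illustrated with
a small Python sketch. This is not the plugin's code: the function names
effective_workers and auto_reads_per_batch are hypothetical, and rounding the
batch size up, the floor of one read, and rejecting n-jobs = 0 are
assumptions made only so the example is well defined.

```python
import os


def effective_workers(n_jobs, n_cpus=None):
    """Worker count implied by an n-jobs value, following the help text."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        # Assumption of this sketch: 0 is treated as invalid.
        raise ValueError("n_jobs must not be 0")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all CPUs but one, and so on.
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs


def auto_reads_per_batch(n_queries, n_workers, cap=20000):
    """Batch size for 'auto': min(n_queries / n_workers, cap)."""
    # Ceiling division and the floor of 1 are assumptions of this sketch.
    per_worker = -(-n_queries // n_workers)
    return max(1, min(per_worker, cap))


if __name__ == "__main__":
    workers = effective_workers(-2, n_cpus=8)
    print(workers, auto_reads_per_batch(100000, workers))
```

With 8 CPUs, n-jobs = -2 resolves to 7 workers under this reading, and the
batch size only reaches the 20000 cap once each worker would otherwise
receive more than 20000 reads.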

Import:

from qiime2.plugins.feature_classifier.methods import classify_sklearn
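
A minimal sketch of calling the imported method from Python, assuming a
working QIIME 2 environment with the feature-classifier plugin installed.
The wrapper name classify_reads and the three path arguments are
hypothetical; the keyword arguments mirror the parameters documented on this
page.

```python
def classify_reads(reads_path, classifier_path, output_path):
    """Load two artifacts, classify the reads, and save the taxonomy."""
    # Imported here so the sketch can be defined without QIIME 2 present.
    from qiime2 import Artifact
    from qiime2.plugins.feature_classifier.methods import classify_sklearn

    reads = Artifact.load(reads_path)            # FeatureData[Sequence]
    classifier = Artifact.load(classifier_path)  # TaxonomicClassifier

    results = classify_sklearn(
        reads=reads,
        classifier=classifier,
        confidence=0.7,           # documented default
        read_orientation="auto",  # documented default
        n_jobs=1,                 # documented default
    )
    # The single output is named "classification" (FeatureData[Taxonomy]).
    return results.classification.save(output_path)
```

Keeping n_jobs at 1 avoids parallel code entirely, which the parameter
description recommends for debugging.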

Docstring:

Pre-fitted sklearn-based taxonomy classifier

Classify reads by taxon using a fitted classifier.

Parameters
----------
reads : FeatureData[Sequence]
    The feature data to be classified.
classifier : TaxonomicClassifier
    The taxonomic classifier for classifying the reads.
reads_per_batch : Int % Range(1, None) | Str % Choices('auto'), optional
    Number of reads to process in each batch. If "auto", this parameter is
    autoscaled to min( number of query sequences / n_jobs, 20000).
n_jobs : Threads, optional
    The maximum number of concurrent worker processes. If -1 all CPUs are
    used. If 1 is given, no parallel computing code is used at all, which
    is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are
    used. Thus for n_jobs = -2, all CPUs but one are used.
pre_dispatch : Str, optional
    "all" or expression, as in "3*n_jobs". The number of batches (of tasks)
    to be pre-dispatched.
confidence : Float % Range(0, 1, inclusive_end=True) | Str % Choices('disable'), optional
    Confidence threshold for limiting taxonomic depth. Set to "disable" to
    disable confidence calculation, or 0 to calculate confidence but not
    apply it to limit the taxonomic depth of the assignments.
read_orientation : Str % Choices('same', 'reverse-complement', 'auto'), optional
    Direction of reads with respect to reference sequences. "same" will
    cause reads to be classified unchanged; "reverse-complement" will cause
    reads to be reversed and complemented prior to classification. "auto"
    will autodetect orientation based on the confidence estimates for the
    first 100 reads.

Returns
-------
classification : FeatureData[Taxonomy]
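
To illustrate what "limiting taxonomic depth" by a confidence threshold
means for the returned taxonomy, the following Python sketch truncates a
lineage at the first level whose confidence falls below the threshold. It is
an illustration only, not the plugin's implementation: the function name
truncate_by_confidence, the per-level confidence list, the "; " separator,
and the "Unassigned" label with no confidence for reads that fail at the
first level are all assumptions of this example.

```python
def truncate_by_confidence(levels, confidences, threshold=0.7):
    """Keep the longest lineage prefix whose confidence meets the threshold.

    levels      -- taxonomy levels from most general to most specific
    confidences -- one confidence per level (hypothetical input format)
    """
    kept = []
    kept_confidence = None
    for level, confidence in zip(levels, confidences):
        if confidence < threshold:
            break  # stop at the first level that is not confident enough
        kept.append(level)
        kept_confidence = confidence
    if not kept:
        # Assumption: nothing met the threshold, so report no assignment.
        return "Unassigned", None
    return "; ".join(kept), kept_confidence


if __name__ == "__main__":
    lineage = ["k__Bacteria", "p__Firmicutes", "c__Bacilli"]
    scores = [0.99, 0.85, 0.60]
    print(truncate_by_confidence(lineage, scores, threshold=0.7))
    print(truncate_by_confidence(lineage, scores, threshold=0))
```

With a threshold of 0, every level passes, which matches the documented
behavior of calculating confidence without using it to limit depth.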