
classify-sklearn: Pre-fitted sklearn-based taxonomy classifier

Citations
  • Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830, 2011.

Docstring:

Usage: qiime feature-classifier classify-sklearn [OPTIONS]

  Classify reads by taxon using a fitted classifier.

Inputs:
  --i-reads ARTIFACT FeatureData[Sequence]
                         The feature data to be classified.         [required]
  --i-classifier ARTIFACT
    TaxonomicClassifier  The taxonomic classifier for classifying the reads.
                                                                    [required]
Parameters:
  --p-reads-per-batch VALUE Int % Range(1, None) | Str % Choices('auto')
                         Number of reads to process in each batch. If "auto",
                         this parameter is autoscaled to min(number of query
                         sequences / n-jobs, 20000).         [default: 'auto']
  --p-n-jobs NTHREADS    The maximum number of concurrent worker processes.
                         If 0, all CPUs are used. If 1 is given, no parallel
                         computing code is used at all, which is useful for
                         debugging.                               [default: 1]
  --p-pre-dispatch TEXT  The number of batches (of tasks) to pre-dispatch:
                         either "all" or an expression such as "3*n_jobs".
                                                         [default: '2*n_jobs']
  --p-confidence VALUE Float % Range(0, 1, inclusive_end=True) | Str %
    Choices('disable')   Confidence threshold for limiting taxonomic depth.
                         Set to "disable" to disable confidence calculation,
                         or 0 to calculate confidence but not apply it to
                         limit the taxonomic depth of the assignments.
                                                                [default: 0.7]
  --p-read-orientation TEXT Choices('same', 'reverse-complement', 'auto')
                         Direction of reads with respect to reference
                         sequences. "same" will cause reads to be classified
                         unchanged; "reverse-complement" will cause them to be
                         reversed and complemented prior to classification.
                         "auto" will autodetect orientation based on the
                         confidence estimates for the first 100 reads.
                                                             [default: 'auto']
Outputs:
  --o-classification ARTIFACT FeatureData[Taxonomy]
                                                                    [required]
Miscellaneous:
  --output-dir PATH      Output unspecified results to a directory
  --verbose / --quiet    Display verbose output to stdout and/or stderr
                         during execution of this action. Or silence output if
                         execution is successful (silence is golden).
  --example-data PATH    Write example data and exit.
  --citations            Show citations and exit.
  --use-cache DIRECTORY  Specify the cache to be used for the intermediate
                         work of this action. If not provided, the default
                         cache under $TMP/qiime2/ will be used.
                         IMPORTANT FOR HPC USERS: If you are on an HPC system
                         and are using parallel execution it is important to
                         set this to a location that is globally accessible to
                         all nodes in the cluster.
  --help                 Show this message and exit.
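The batching and parallelism options interact: with --p-reads-per-batch set to "auto", the batch size depends on the number of query sequences and on --p-n-jobs, and --p-pre-dispatch controls how many of those batches are queued ahead of the workers. The following is a minimal, standard-library-only Python sketch of that arithmetic. It assumes that 0 jobs resolves to os.cpu_count(), that the "auto" batch size is rounded down and clamped to at least one read, and that pre-dispatch values take the forms "all", "n_jobs", "k*n_jobs", or a plain integer. The helper names (resolve_n_jobs, resolve_reads_per_batch, resolve_pre_dispatch, make_batches) are hypothetical and are not part of the q2-feature-classifier API.

```python
import os
import re
from typing import List, Sequence, TypeVar, Union

T = TypeVar("T")

# Hypothetical helpers illustrating the documented arithmetic; not plugin API.


def resolve_n_jobs(n_jobs: int) -> int:
    """Return the worker count; 0 means "use all CPUs" (per the help text)."""
    if n_jobs < 0:
        raise ValueError("n_jobs must be >= 0")
    if n_jobs == 0:
        return os.cpu_count() or 1
    return n_jobs


def resolve_reads_per_batch(reads_per_batch: Union[int, str],
                            n_reads: int, n_jobs: int) -> int:
    """Autoscale "auto" to min(n_reads / n_jobs, 20000)."""
    if reads_per_batch != "auto":
        if int(reads_per_batch) < 1:
            raise ValueError("reads_per_batch must be >= 1")
        return int(reads_per_batch)
    workers = resolve_n_jobs(n_jobs)
    # Assumption: round down, but never return a batch size below 1.
    return max(1, min(n_reads // workers, 20000))


def resolve_pre_dispatch(pre_dispatch: str, n_jobs: int,
                         n_batches: int) -> int:
    """Number of batches queued ahead of the workers."""
    if pre_dispatch == "all":
        return n_batches
    workers = resolve_n_jobs(n_jobs)
    # Accept "n_jobs", "k*n_jobs", or a plain integer (assumed forms).
    match = re.fullmatch(r"\s*(?:(\d+)\s*\*\s*)?n_jobs\s*", pre_dispatch)
    if match:
        count = int(match.group(1) or 1) * workers
    else:
        count = int(pre_dispatch)
    # Never queue more batches than exist.
    return min(count, n_batches)


def make_batches(reads: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split reads into consecutive batches of at most `size` items."""
    return [reads[i:i + size] for i in range(0, len(reads), size)]


if __name__ == "__main__":
    reads = [f"read{i}" for i in range(50_000)]
    size = resolve_reads_per_batch("auto", len(reads), n_jobs=4)
    batches = make_batches(reads, size)
    # prints: 12500 4 4
    print(size, len(batches), resolve_pre_dispatch("2*n_jobs", 4, len(batches)))
```

Under these assumptions, 50,000 query sequences with 4 jobs give batches of 12,500 reads, while 1,000,000 sequences hit the 20,000-read cap, so larger inputs simply produce more batches rather than larger ones.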

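With read orientation set to "auto", the orientation is inferred from the classifier's confidence estimates for the first 100 reads. The Python sketch below shows one way such a check can work: each sampled read is scored both as given and reverse-complemented, and the majority decides. The decision rule (a per-read majority vote, with ties keeping "same") and the confidence_of callback, which stands in for the fitted classifier, are assumptions for illustration; the plugin's exact rule may differ.

```python
from typing import Callable, Sequence

# IUPAC nucleotide codes and their complements.
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (IUPAC ambiguity codes supported)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def autodetect_orientation(reads: Sequence[str],
                           confidence_of: Callable[[str], float],
                           n: int = 100) -> str:
    """Pick "same" or "reverse-complement" from the first `n` reads.

    `confidence_of` is a hypothetical stand-in for the classifier's
    confidence estimate for one read. Assumed rule: per-read majority vote,
    with ties resolved as "same".
    """
    sample = list(reads[:n])
    if not sample:
        raise ValueError("cannot detect orientation of an empty read set")
    # Count reads that score higher unchanged than reverse-complemented.
    same_wins = sum(
        confidence_of(read) > confidence_of(reverse_complement(read))
        for read in sample
    )
    return "same" if 2 * same_wins >= len(sample) else "reverse-complement"


if __name__ == "__main__":
    # Toy "classifier": full confidence only for exact reference matches.
    reference = {"ACCGGTTTAA", "GGCATTACGA"}

    def toy_confidence(read: str) -> float:
        return 1.0 if read in reference else 0.0

    flipped = [reverse_complement(s) for s in sorted(reference)]
    print(autodetect_orientation(flipped, toy_confidence))  # reverse-complement
```

Sampling only the first reads keeps the check cheap: it costs two classifications of a small subset rather than of the whole input, on the assumption that all reads in a run share one orientation.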
Import:

from qiime2.plugins.feature_classifier.methods import classify_sklearn

Docstring:

Pre-fitted sklearn-based taxonomy classifier

Classify reads by taxon using a fitted classifier.

Parameters
----------
reads : FeatureData[Sequence]
    The feature data to be classified.
classifier : TaxonomicClassifier
    The taxonomic classifier for classifying the reads.
reads_per_batch : Int % Range(1, None) | Str % Choices('auto'), optional
    Number of reads to process in each batch. If "auto", this parameter is
    autoscaled to min(number of query sequences / n_jobs, 20000).
n_jobs : Threads, optional
    The maximum number of concurrent worker processes. If 0, all CPUs are
    used. If 1 is given, no parallel computing code is used at all, which
    is useful for debugging.
pre_dispatch : Str, optional
    The number of batches (of tasks) to pre-dispatch: either "all" or an
    expression such as "3*n_jobs".
confidence : Float % Range(0, 1, inclusive_end=True) | Str % Choices('disable'), optional
    Confidence threshold for limiting taxonomic depth. Set to "disable" to
    disable confidence calculation, or 0 to calculate confidence but not
    apply it to limit the taxonomic depth of the assignments.
read_orientation : Str % Choices('same', 'reverse-complement', 'auto'), optional
    Direction of reads with respect to reference sequences. "same" will cause
    reads to be classified unchanged; "reverse-complement" will cause reads
    to be reversed and complemented prior to classification. "auto" will
    autodetect orientation based on the confidence estimates for the first
    100 reads.

Returns
-------
classification : FeatureData[Taxonomy]
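The confidence parameter limits each assignment to the deepest taxonomic rank whose support meets the threshold. A simplified, standard-library-only Python sketch of that idea follows. It assumes the classifier has already produced a probability for each full taxonomy string (ranks separated by ";"), and that a rank's confidence is the summed probability of every class sharing the top class's lineage down to that rank. The function name assign_with_confidence and the "Unassigned" fallback are illustrative assumptions, not the q2-feature-classifier implementation.

```python
from typing import Dict, Optional, Tuple, Union


def assign_with_confidence(
    class_probs: Dict[str, float],
    confidence: Union[float, str] = 0.7,
) -> Tuple[str, Optional[float]]:
    """Truncate the top-scoring taxonomy to the deepest rank meeting `confidence`.

    `class_probs` maps full taxonomy strings ("k__...; p__...; ...") to the
    classifier's probability for one read. Returns (taxon, confidence).
    """
    if not class_probs:
        raise ValueError("class_probs must not be empty")
    best = max(class_probs, key=class_probs.get)
    if confidence == "disable":
        # No confidence is computed; the top class is reported as-is.
        return best, None
    if isinstance(confidence, str) or not 0 <= confidence <= 1:
        raise ValueError('confidence must be in [0, 1] or "disable"')

    ranks = {label: [r.strip() for r in label.split(";")] for label in class_probs}
    best_ranks = ranks[best]
    mass = 0.0
    # Walk from the deepest rank upward; a shorter prefix collects at least as
    # much probability, so the first depth that passes is the deepest one.
    for depth in range(len(best_ranks), 0, -1):
        prefix = best_ranks[:depth]
        mass = sum(p for label, p in class_probs.items()
                   if ranks[label][:depth] == prefix)
        if mass >= confidence:
            return "; ".join(prefix), mass
    # Assumption: even the top-level rank fell short of the threshold.
    return "Unassigned", mass


if __name__ == "__main__":
    probs = {
        "k__Bacteria; p__Firmicutes; c__Bacilli": 0.5,
        "k__Bacteria; p__Firmicutes; c__Clostridia": 0.3,
        "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria": 0.2,
    }
    taxon, conf = assign_with_confidence(probs, 0.7)
    print(taxon, round(conf, 3))  # k__Bacteria; p__Firmicutes 0.8
```

In this sketch, a confidence of 0 always keeps the full lineage of the top class while still reporting its support, matching the description of 0 above, whereas "disable" skips the calculation entirely and returns no confidence value.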