Docstring:
Usage: qiime rescript evaluate-fit-classifier [OPTIONS]
Train a naive Bayes classifier on a set of reference sequences, then test its
classification accuracy on this same set of sequences. This results in a
"perfect" classifier that "knows" the correct identity of each input
sequence. Because the test sequences are also the training sequences, this
leaky classifier indicates the upper limit of classification accuracy based
on sequence information alone: any misclassifications point to k-mer
profiles that cannot be resolved. This test simulates the case where all
query sequences are present in a fully comprehensive reference database. To
simulate more realistic conditions, see `evaluate_cross_validate`. THE
CLASSIFIER OUTPUT BY THIS PIPELINE IS PRODUCTION-READY and can be reused to
classify other sequences (provided the reference data are viable), so THIS
PIPELINE IS USEFUL FOR TRAINING FEATURE CLASSIFIERS AND EVALUATING THEM ON
THE FLY.
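The upper-bound argument can be illustrated with a toy calculation. This is a minimal sketch, not RESCRIPt's code: it assumes a classifier whose only input is the count of overlapping k-mers in each sequence (k=4 is an arbitrary choice here), and the helper names `kmer_profile` and `resubstitution_upper_bound` are hypothetical. Sequences that share a k-mer profile are indistinguishable to such a classifier, so the best it can do on its own training set is predict the most common label within each profile group.

```python
from collections import Counter, defaultdict


def kmer_profile(seq, k=4):
    """Return the multiset of overlapping k-mers in seq as a hashable key."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return frozenset(counts.items())


def resubstitution_upper_bound(references, k=4):
    """Best accuracy reachable by any classifier that sees only k-mer counts,
    when it is tested on its own training sequences.

    references: list of (sequence, taxonomy_label) pairs (hypothetical format).
    """
    # Group labels by k-mer profile; identical profiles cannot be told apart.
    groups = defaultdict(Counter)
    for seq, label in references:
        groups[kmer_profile(seq, k)][label] += 1
    # At best, each group is assigned its most common label.
    best_correct = sum(c.most_common(1)[0][1] for c in groups.values())
    return best_correct / len(references)


refs = [("ACGTACGT", "g__A"), ("ACGTACGT", "g__B"), ("TTTTGGGG", "g__C")]
print(resubstitution_upper_bound(refs))
```

In this toy set the first two sequences are identical but carry different labels, so at most two of the three can be classified correctly, whatever the classifier.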
Inputs:
--i-sequences ARTIFACT FeatureData[Sequence]
Reference sequences to use for classifier
training/testing. [required]
--i-taxonomy ARTIFACT FeatureData[Taxonomy]
Reference taxonomy to use for classifier
training/testing. [required]
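Both inputs describe the same reference set, so it is reasonable to confirm before training that every sequence ID has a taxonomy label and vice versa. The sketch below assumes the artifacts have already been exported and read into plain dicts (ID to sequence, ID to taxonomy string); `check_reference_ids` is a hypothetical helper, not part of RESCRIPt.

```python
def check_reference_ids(sequences, taxonomy):
    """Report IDs that appear in only one of the two reference inputs.

    sequences: dict mapping feature ID -> DNA sequence string.
    taxonomy: dict mapping feature ID -> semicolon-delimited taxonomy string.
    """
    seq_ids = set(sequences)
    tax_ids = set(taxonomy)
    return {
        "missing_taxonomy": sorted(seq_ids - tax_ids),  # sequences with no label
        "missing_sequence": sorted(tax_ids - seq_ids),  # labels with no sequence
    }


report = check_reference_ids(
    {"a": "ACGT", "b": "GGCC"},
    {"a": "k__X", "c": "k__Y"},
)
print(report)
```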
Parameters:
--p-reads-per-batch VALUE Int % Range(1, None) | Str % Choices('auto')
Number of reads to process in each batch. If
"auto", this parameter is autoscaled to
min(number of query sequences / n-jobs, 20000).
[default: 'auto']
--p-n-jobs NTHREADS The maximum number of concurrent worker processes.
If 0, all CPUs are used. If 1 is given, no parallel
computing code is used at all, which is useful for
debugging. [default: 1]
--p-confidence VALUE Float % Range(0, 1, inclusive_end=True) | Str % Choices('disable')
Confidence threshold for limiting taxonomic depth.
Set to "disable" to disable confidence calculation,
or 0 to calculate confidence but not apply it to
limit the taxonomic depth of the assignments.
[default: 0.7]
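How a confidence threshold limits taxonomic depth can be sketched as follows. This is a minimal illustration under stated assumptions, not the classifier's source: it assumes the classifier yields a posterior probability for every full reference lineage, and that the confidence of a rank is the summed probability of all lineages that agree with the prediction down to that rank. `truncate_by_confidence` is a hypothetical name, and returning `None` for "disable" is a stand-in choice of this sketch.

```python
def truncate_by_confidence(class_probs, confidence=0.7, sep=";"):
    """Limit an assignment to the ranks that meet a confidence threshold.

    class_probs: dict mapping each full lineage (e.g. "k__A;p__B;c__C")
        to its posterior probability.
    confidence: float in [0, 1], or "disable" to skip confidence entirely.
    Returns (assigned_lineage, confidence_of_deepest_kept_rank).
    """
    best = max(class_probs, key=class_probs.get)
    if confidence == "disable":
        # No confidence is calculated; the full best lineage is reported.
        return best, None
    remaining = [(lineage.split(sep), p) for lineage, p in class_probs.items()]
    kept, kept_conf = [], None
    for level in best.split(sep):
        # Keep only lineages that agree with the prediction at this rank;
        # their summed probability is the confidence for this rank.
        remaining = [(ranks, p) for ranks, p in remaining if ranks and ranks[0] == level]
        cum_prob = sum(p for _, p in remaining)
        if cum_prob < confidence:
            break
        kept.append(level)
        kept_conf = cum_prob
        # Drop the matched rank so the next iteration compares the next one.
        remaining = [(ranks[1:], p) for ranks, p in remaining]
    return sep.join(kept), kept_conf


probs = {"k__A;p__B;c__C": 0.6, "k__A;p__B;c__D": 0.3, "k__A;p__E;c__F": 0.1}
print(truncate_by_confidence(probs, 0.7))
print(truncate_by_confidence(probs, 0))
print(truncate_by_confidence(probs, "disable"))
```

With the default 0.7, the best lineage has only 0.6 support at the deepest rank but about 0.9 at the rank above, so the assignment is cut back one level; a threshold of 0 keeps every rank while still reporting a confidence.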
Outputs:
--o-classifier ARTIFACT TaxonomicClassifier
Trained naive Bayes taxonomic classifier. [required]
--o-evaluation VISUALIZATION
Visualization of classification accuracy results.
[required]
--o-observed-taxonomy ARTIFACT FeatureData[Taxonomy]
Observed taxonomic label for each input sequence,
predicted by the trained classifier. [required]
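Because the observed taxonomy covers the same sequences as the input taxonomy, the two can be compared rank by rank. The sketch below computes a simple per-rank accuracy; it is a hypothetical illustration (`per_rank_accuracy` is not a RESCRIPt function), and the evaluation visualization may report other metrics such as precision, recall, or F-measure per level.

```python
def per_rank_accuracy(expected, observed, sep=";"):
    """Fraction of sequences whose observed lineage matches the expected
    lineage down to each rank (index 0 = first rank, e.g. kingdom).

    expected, observed: dicts mapping sequence ID -> taxonomy string.
    A truncated observed label counts as incorrect at the ranks it omits.
    """
    ids = sorted(expected)
    depth = max(len(expected[i].split(sep)) for i in ids)
    scores = []
    for rank in range(1, depth + 1):
        correct = 0
        for seq_id in ids:
            exp = expected[seq_id].split(sep)[:rank]
            obs = observed.get(seq_id, "").split(sep)[:rank]
            # Only count a match if the expected lineage reaches this rank.
            if len(exp) == rank and obs == exp:
                correct += 1
        scores.append(correct / len(ids))
    return scores


expected = {"s1": "k__A;p__B;g__C", "s2": "k__A;p__B;g__D"}
observed = {"s1": "k__A;p__B;g__C", "s2": "k__A;p__B"}
print(per_rank_accuracy(expected, observed))
```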
Miscellaneous:
--output-dir PATH Output unspecified results to a directory
--verbose / --quiet Display verbose output to stdout and/or stderr
during execution of this action. Or silence output
if execution is successful (silence is golden).
--recycle-pool TEXT Use a cache pool for pipeline resumption. QIIME 2
will cache your results in this pool for reuse by
future invocations. These pools are retained until
deleted by the user. If not provided, QIIME 2 will
create a pool which is automatically reused by
invocations of the same action and removed if the
action is successful. Note: these pools are local to
the cache you are using.
--no-recycle Do not recycle results from a previous failed
pipeline run or save the results from this run for
future recycling.
--parallel Execute your action in parallel. This flag will use
your default parallel config.
--parallel-config FILE Execute your action in parallel using a config at
the indicated path.
--example-data PATH Write example data and exit.
--citations Show citations and exit.
--use-cache DIRECTORY Specify the cache to be used for the intermediate
work of this action. If not provided, the default
cache under $TMP/qiime2/ will be used.
IMPORTANT FOR HPC USERS: If you are on an HPC system
and are using parallel execution it is important to
set this to a location that is globally accessible
to all nodes in the cluster.
--help Show this message and exit.
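Conceptually, recycling lets a rerun of a failed pipeline skip steps whose inputs have not changed. The toy class below illustrates only that idea; QIIME 2's real pools are stored in the cache you are using and work differently in detail. `RecyclePool` and `run_pipeline` are hypothetical names, and the two "steps" are placeholders rather than real training or evaluation.

```python
import hashlib
import json


class RecyclePool:
    """Toy in-memory result pool keyed by step name and inputs."""

    def __init__(self):
        self._results = {}
        self.computed = []  # names of steps that actually ran

    def _key(self, step_name, inputs):
        payload = json.dumps([step_name, inputs], sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def run(self, step_name, func, **inputs):
        key = self._key(step_name, inputs)
        if key not in self._results:
            # Not seen before: compute and store for future reuse.
            self.computed.append(step_name)
            self._results[key] = func(**inputs)
        return self._results[key]


def run_pipeline(pool, refs, fail=False):
    """Two placeholder steps standing in for 'train' and 'evaluate'."""
    model = pool.run("fit", lambda refs: sorted(set(refs)), refs=refs)
    if fail:
        raise RuntimeError("simulated failure after the fit step")
    return pool.run("evaluate", lambda model: len(model), model=model)


pool = RecyclePool()
try:
    run_pipeline(pool, ["a", "b", "a"], fail=True)
except RuntimeError:
    pass
result = run_pipeline(pool, ["a", "b", "a"])  # "fit" is reused, not rerun
```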
Import:
from qiime2.plugins.rescript.pipelines import evaluate_fit_classifier
Docstring:
Evaluate and train a naive Bayes classifier on reference sequences.
Train a naive Bayes classifier on a set of reference sequences, then test its
classification accuracy on this same set of sequences. This results in a
"perfect" classifier that "knows" the correct identity of each input
sequence. Because the test sequences are also the training sequences, this
leaky classifier indicates the upper limit of classification accuracy based
on sequence information alone: any misclassifications point to k-mer
profiles that cannot be resolved. This test simulates the case where all
query sequences are present in a fully comprehensive reference database. To
simulate more realistic conditions, see `evaluate_cross_validate`. THE
CLASSIFIER OUTPUT BY THIS PIPELINE IS PRODUCTION-READY and can be reused to
classify other sequences (provided the reference data are viable), so THIS
PIPELINE IS USEFUL FOR TRAINING FEATURE CLASSIFIERS AND EVALUATING THEM ON
THE FLY.
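To make the train-then-test-on-the-same-data design concrete, the sketch below fits a toy multinomial naive Bayes model on k-mer counts and scores it on the very sequences it was trained on. It is a pure-Python illustration under simplifying assumptions (k=3, Laplace smoothing with alpha=1, ties broken by the first label seen), not the classifier this pipeline actually trains; all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def kmer_counts(seq, k=3):
    """Counts of overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def fit_naive_bayes(references, k=3, alpha=1.0):
    """Fit a multinomial naive Bayes model on (sequence, label) pairs."""
    priors = Counter(label for _, label in references)
    per_label = defaultdict(Counter)
    vocab = set()
    for seq, label in references:
        counts = kmer_counts(seq, k)
        per_label[label].update(counts)
        vocab.update(counts)
    return {"k": k, "alpha": alpha, "n": len(references),
            "priors": priors, "per_label": per_label, "vocab": vocab}


def predict(model, seq):
    """Return the label with the highest log posterior (first label wins ties)."""
    counts = kmer_counts(seq, model["k"])
    alpha, v = model["alpha"], len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_label in model["priors"].items():
        label_counts = model["per_label"][label]
        total = sum(label_counts.values())
        score = math.log(n_label / model["n"])
        for kmer, c in counts.items():
            # Laplace-smoothed probability of this k-mer under the label.
            p = (label_counts[kmer] + alpha) / (total + alpha * v)
            score += c * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def resubstitution_accuracy(references, k=3):
    """Train and test on the same sequences, mirroring the leaky design."""
    model = fit_naive_bayes(references, k)
    hits = sum(predict(model, seq) == label for seq, label in references)
    return hits / len(references)


distinct = [("AAAAAAAA", "g__A"), ("CCCCCCCC", "g__C"), ("GGGGGGGG", "g__G")]
clashing = [("AAAAAAAA", "g__A"), ("AAAAAAAA", "g__B")] + distinct[1:]
print(resubstitution_accuracy(distinct), resubstitution_accuracy(clashing))
```

Adding a second copy of a sequence under a different label drops the score below 1.0: both copies share one k-mer profile and can receive only one label, which is exactly the kind of unresolvable case this pipeline exposes.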
Parameters
----------
sequences : FeatureData[Sequence]
Reference sequences to use for classifier training/testing.
taxonomy : FeatureData[Taxonomy]
Reference taxonomy to use for classifier training/testing.
reads_per_batch : Int % Range(1, None) | Str % Choices('auto'), optional
Number of reads to process in each batch. If "auto", this parameter is
autoscaled to min(number of query sequences / n_jobs, 20000).
n_jobs : Threads, optional
The maximum number of concurrent worker processes. If 0, all CPUs are
used. If 1 is given, no parallel computing code is used at all, which
is useful for debugging.
confidence : Float % Range(0, 1, inclusive_end=True) | Str % Choices('disable'), optional
Confidence threshold for limiting taxonomic depth. Set to "disable" to
disable confidence calculation, or 0 to calculate confidence but not
apply it to limit the taxonomic depth of the assignments.
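The interplay of `reads_per_batch` and `n_jobs` can be sketched as follows; in this pipeline the query sequences are the reference sequences themselves. The sketch mirrors the stated formula, min(number of query sequences / n_jobs, 20000), but rounding up, the `max(1, ...)` guard, and the helper names are assumptions of this illustration rather than the plugin's code.

```python
import math
import os


def resolve_n_jobs(n_jobs):
    """0 means use every CPU; any positive value is taken as-is."""
    if n_jobs == 0:
        return os.cpu_count() or 1
    return n_jobs


def resolve_reads_per_batch(reads_per_batch, n_queries, n_jobs):
    """Return an explicit batch size, autoscaling when given "auto"."""
    if reads_per_batch != "auto":
        return reads_per_batch
    workers = resolve_n_jobs(n_jobs)
    # Assumption: round up so no query is left out, and never return 0.
    return max(1, min(math.ceil(n_queries / workers), 20000))


def make_batches(query_ids, batch_size):
    """Split query IDs into consecutive batches of at most batch_size."""
    return [query_ids[i:i + batch_size] for i in range(0, len(query_ids), batch_size)]


size = resolve_reads_per_batch("auto", 1000, 4)
batches = make_batches(list(range(10)), 4)
```

For example, 1,000 reference sequences with n_jobs=4 give batches of 250, one per worker; beyond 80,000 sequences on four workers the 20,000 cap takes over.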
Returns
-------
classifier : TaxonomicClassifier
Trained naive Bayes taxonomic classifier.
evaluation : Visualization
Visualization of classification accuracy results.
observed_taxonomy : FeatureData[Taxonomy]
Observed taxonomic label for each input sequence, predicted by the
trained classifier.