Warning
This site has been replaced by the new QIIME 2 “amplicon distribution” documentation, as of the 2025.4 release of QIIME 2. You can still access content from the “old docs” on this site for QIIME 2 2024.10 and earlier releases, but we recommend that you transition to the new documentation at https://amplicon-docs.qiime2.org. Content on this site is no longer updated and may be out of date.
Are you looking for:
the QIIME 2 homepage? That’s https://qiime2.org.
learning resources for microbiome marker gene (i.e., amplicon) analysis? See the QIIME 2 amplicon distribution documentation.
learning resources for microbiome metagenome analysis? See the MOSHPIT documentation.
installation instructions, plugins, books, videos, workshops, or resources? See the QIIME 2 Library.
general help? See the QIIME 2 Forum.
Old content beyond this point… 👴👵
evaluate-fit-classifier: Evaluate and train naive Bayes classifier on reference sequences.
Docstring:

Usage: qiime rescript evaluate-fit-classifier [OPTIONS]

  Train a naive Bayes classifier on a set of reference sequences, then test its classification accuracy on this same set of sequences. This results in a "perfect" classifier that "knows" the correct identity of each input sequence. Such a leaky classifier indicates the upper limit of classification accuracy based on sequence information alone, as misclassifications are an indication of unresolvable k-mer profiles. This test simulates the case where all query sequences are present in a fully comprehensive reference database. To simulate more realistic conditions, see `evaluate-cross-validate`. THE CLASSIFIER OUTPUT BY THIS PIPELINE IS PRODUCTION-READY and can be re-used for classification of other sequences (provided the reference data are viable), hence THIS PIPELINE IS USEFUL FOR TRAINING FEATURE CLASSIFIERS AND THEN EVALUATING THEM ON-THE-FLY.

Inputs:
  --i-sequences ARTIFACT FeatureData[Sequence]
      Reference sequences to use for classifier training/testing.  [required]
  --i-taxonomy ARTIFACT FeatureData[Taxonomy]
      Reference taxonomy to use for classifier training/testing.  [required]
Parameters:
  --p-reads-per-batch VALUE Int % Range(1, None) | Str % Choices('auto')
      Number of reads to process in each batch. If "auto", this parameter is autoscaled to min(number of query sequences / n-jobs, 20000).  [default: 'auto']
  --p-n-jobs NTHREADS
      The maximum number of concurrent worker processes. If 0, all CPUs are used. If 1, no parallel computing code is used at all, which is useful for debugging.  [default: 1]
  --p-confidence VALUE Float % Range(0, 1, inclusive_end=True) | Str % Choices('disable')
      Confidence threshold for limiting taxonomic depth. Set to "disable" to disable confidence calculation, or 0 to calculate confidence but not apply it to limit the taxonomic depth of the assignments.  [default: 0.7]
Outputs:
  --o-classifier ARTIFACT TaxonomicClassifier
      Trained naive Bayes taxonomic classifier.  [required]
  --o-evaluation VISUALIZATION
      Visualization of classification accuracy results.  [required]
  --o-observed-taxonomy ARTIFACT FeatureData[Taxonomy]
      Observed taxonomic label for each input sequence, predicted by the trained classifier.  [required]
Miscellaneous:
  --output-dir PATH
      Output unspecified results to a directory.
  --verbose / --quiet
      Display verbose output to stdout and/or stderr during execution of this action, or silence output if execution is successful (silence is golden).
  --recycle-pool TEXT
      Use a cache pool for pipeline resumption. QIIME 2 will cache your results in this pool for reuse by future invocations. These pools are retained until deleted by the user. If not provided, QIIME 2 will create a pool which is automatically reused by invocations of the same action and removed if the action is successful. Note: these pools are local to the cache you are using.
  --no-recycle
      Do not recycle results from a previous failed pipeline run or save the results from this run for future recycling.
  --parallel
      Execute your action in parallel. This flag will use your default parallel config.
  --parallel-config FILE
      Execute your action in parallel using a config at the indicated path.
  --example-data PATH
      Write example data and exit.
  --citations
      Show citations and exit.
  --use-cache DIRECTORY
      Specify the cache to be used for the intermediate work of this action. If not provided, the default cache under $TMP/qiime2/ will be used. IMPORTANT FOR HPC USERS: If you are on an HPC system and are using parallel execution, it is important to set this to a location that is globally accessible to all nodes in the cluster.
  --help
      Show this message and exit.
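When `--p-reads-per-batch` is left at `auto`, the batch size is derived from the formula given in the help text. The following minimal Python sketch works through that arithmetic. It makes several assumptions that do not come from the plugin source: floor division is used, the result is clamped to at least 1 to respect `Range(1, None)`, and `n_jobs=0` resolves to `os.cpu_count()`. The function name `autoscale_reads_per_batch` is hypothetical and is not part of RESCRIPt or q2-feature-classifier.

```python
import os


def autoscale_reads_per_batch(n_query_seqs: int, n_jobs: int) -> int:
    """Hypothetical helper: min(n_query_seqs / n_jobs, 20000), as documented.

    Assumptions (not taken from the plugin source): floor division is used,
    the result is clamped to at least 1, and n_jobs == 0 means "all CPUs".
    """
    if n_query_seqs < 1:
        raise ValueError("need at least one query sequence")
    if n_jobs < 0:
        raise ValueError("n_jobs must be >= 0")
    # Assumption: 0 means use every available CPU (cpu_count may return None).
    workers = n_jobs if n_jobs > 0 else (os.cpu_count() or 1)
    # Split the query set evenly across workers, capped at 20000 reads.
    return max(1, min(n_query_seqs // workers, 20000))


# Example: 10,000 reference sequences spread over 4 worker processes.
print(autoscale_reads_per_batch(10_000, 4))   # 2500
print(autoscale_reads_per_batch(500_000, 1))  # 20000 (the cap applies)
```

Because this pipeline tests the classifier on its own training set, the number of query sequences equals the number of reference sequences. With a large reference database and a single worker, the 20,000-read cap is therefore likely to apply, which presumably keeps per-batch memory use bounded.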
Import:
from qiime2.plugins.rescript.pipelines import evaluate_fit_classifier
Docstring:
Evaluate and train naive Bayes classifier on reference sequences.

Train a naive Bayes classifier on a set of reference sequences, then test its classification accuracy on this same set of sequences. This results in a "perfect" classifier that "knows" the correct identity of each input sequence. Such a leaky classifier indicates the upper limit of classification accuracy based on sequence information alone, as misclassifications are an indication of unresolvable k-mer profiles. This test simulates the case where all query sequences are present in a fully comprehensive reference database. To simulate more realistic conditions, see `evaluate_cross_validate`. THE CLASSIFIER OUTPUT BY THIS PIPELINE IS PRODUCTION-READY and can be re-used for classification of other sequences (provided the reference data are viable), hence THIS PIPELINE IS USEFUL FOR TRAINING FEATURE CLASSIFIERS AND THEN EVALUATING THEM ON-THE-FLY.

Parameters
----------
sequences : FeatureData[Sequence]
    Reference sequences to use for classifier training/testing.
taxonomy : FeatureData[Taxonomy]
    Reference taxonomy to use for classifier training/testing.
reads_per_batch : Int % Range(1, None) | Str % Choices('auto'), optional
    Number of reads to process in each batch. If "auto", this parameter is autoscaled to min(number of query sequences / n_jobs, 20000).
n_jobs : Threads, optional
    The maximum number of concurrent worker processes. If 0, all CPUs are used. If 1, no parallel computing code is used at all, which is useful for debugging.
confidence : Float % Range(0, 1, inclusive_end=True) | Str % Choices('disable'), optional
    Confidence threshold for limiting taxonomic depth. Set to "disable" to disable confidence calculation, or 0 to calculate confidence but not apply it to limit the taxonomic depth of the assignments.

Returns
-------
classifier : TaxonomicClassifier
    Trained naive Bayes taxonomic classifier.
evaluation : Visualization
    Visualization of classification accuracy results.
observed_taxonomy : FeatureData[Taxonomy]
    Observed taxonomic label for each input sequence, predicted by the trained classifier.
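The classifier is tested on the same sequences it was trained on, so comparing `observed_taxonomy` with the input `taxonomy` gives a best-case accuracy at each taxonomic rank. The following sketch computes a simple per-rank exact-match accuracy from two plain dictionaries that map feature IDs to semicolon-delimited taxonomy strings. It is an illustrative stand-in, not RESCRIPt's implementation. The dictionary inputs, the function names `split_taxon` and `per_rank_accuracy`, the toy labels, and the metric itself are all assumptions. The actual `evaluation` visualization may report different statistics.

```python
from typing import Dict, List


def split_taxon(taxon: str) -> List[str]:
    """Split a 'k__A; p__B; ...' string into stripped, non-empty rank labels."""
    return [rank.strip() for rank in taxon.split(";") if rank.strip()]


def per_rank_accuracy(expected: Dict[str, str],
                      observed: Dict[str, str]) -> List[float]:
    """Hypothetical metric: at each rank, the fraction of features whose
    observed lineage matches the expected lineage down to that rank.

    A prediction truncated above a rank (for example, by the confidence
    threshold) counts as a miss at that rank and every deeper rank.
    """
    depth = max(len(split_taxon(t)) for t in expected.values())
    hits = [0] * depth
    totals = [0] * depth
    for feature_id, exp_taxon in expected.items():
        exp = split_taxon(exp_taxon)
        obs = split_taxon(observed.get(feature_id, ""))
        for level in range(len(exp)):
            totals[level] += 1
            # The whole lineage prefix must agree down to this rank.
            if obs[: level + 1] == exp[: level + 1]:
                hits[level] += 1
    return [h / t for h, t in zip(hits, totals)]


# Toy reference labels (stand-in for the input `taxonomy`).
expected = {
    "seq1": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "seq2": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "seq3": "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
}
# Toy predictions (stand-in for `observed_taxonomy`).
observed = {
    "seq1": "k__Bacteria; p__Firmicutes; c__Bacilli",  # correct
    "seq2": "k__Bacteria; p__Firmicutes",              # truncated at class
    "seq3": "k__Bacteria; p__Firmicutes; c__Bacilli",  # wrong from phylum down
}
print([round(a, 2) for a in per_rank_accuracy(expected, observed)])
# [1.0, 0.67, 0.33]
```

In this leaky setup, an accuracy below 1.0 at a given rank points to reference sequences whose k-mer profiles the classifier cannot tell apart at that depth.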