Warning
This site has been replaced by the new QIIME 2 “amplicon distribution” documentation, as of the 2025.4 release of QIIME 2. You can still access the content from the “old docs” here for the QIIME 2 2024.10 and earlier releases, but we recommend that you transition to the new documentation at https://amplicon-docs.qiime2.org. Content on this site is no longer updated and may be out of date.
evaluate-cross-validate: Evaluate DNA sequence reference database via cross-validated taxonomic classification.
Docstring:
Usage: qiime rescript evaluate-cross-validate [OPTIONS]

  Evaluate DNA sequence reference database via cross-validated taxonomic
  classification. Unique taxonomic labels are truncated to enable
  appropriate label stratification. See the cited reference (Bokulich et al.
  2018) for more details.

Inputs:
  --i-sequences ARTIFACT FeatureData[Sequence]
      Reference sequences to use for classifier training/testing.
      [required]
  --i-taxonomy ARTIFACT FeatureData[Taxonomy]
      Reference taxonomy to use for classifier training/testing.
      [required]

Parameters:
  --p-k INTEGER Range(2, None)
      Number of stratified folds.
      [default: 3]
  --p-random-state INTEGER Range(0, None)
      Seed used by the random number generator.
      [default: 0]
  --p-reads-per-batch VALUE Int % Range(1, None) | Str % Choices('auto')
      Number of reads to process in each batch. If "auto", this parameter
      is autoscaled to min(number of query sequences / n-jobs, 20000).
      [default: 'auto']
  --p-n-jobs NTHREADS
      The maximum number of concurrent worker processes. If 0 all CPUs are
      used. If 1 is given, no parallel computing code is used at all, which
      is useful for debugging.
      [default: 1]
  --p-confidence VALUE Float % Range(0, 1, inclusive_end=True) | Str % Choices('disable')
      Confidence threshold for limiting taxonomic depth. Set to "disable"
      to disable confidence calculation, or 0 to calculate confidence but
      not apply it to limit the taxonomic depth of the assignments.
      [default: 0.7]

Outputs:
  --o-expected-taxonomy ARTIFACT FeatureData[Taxonomy]
      Expected taxonomic label for each input sequence. Taxonomic labels
      may be truncated due to k-fold CV and stratification.
      [required]
  --o-observed-taxonomy ARTIFACT FeatureData[Taxonomy]
      Observed taxonomic label for each input sequence, predicted by
      cross-validation.
      [required]
  --o-evaluation VISUALIZATION
      Visualization of cross-validated accuracy results.
      [required]

Miscellaneous:
  --output-dir PATH
      Output unspecified results to a directory
  --verbose / --quiet
      Display verbose output to stdout and/or stderr during execution of
      this action. Or silence output if execution is successful (silence is
      golden).
  --recycle-pool TEXT
      Use a cache pool for pipeline resumption. QIIME 2 will cache your
      results in this pool for reuse by future invocations. These pools are
      retained until deleted by the user. If not provided, QIIME 2 will
      create a pool which is automatically reused by invocations of the
      same action and removed if the action is successful. Note: these
      pools are local to the cache you are using.
  --no-recycle
      Do not recycle results from a previous failed pipeline run or save
      the results from this run for future recycling.
  --parallel
      Execute your action in parallel. This flag will use your default
      parallel config.
  --parallel-config FILE
      Execute your action in parallel using a config at the indicated path.
  --example-data PATH
      Write example data and exit.
  --citations
      Show citations and exit.
  --use-cache DIRECTORY
      Specify the cache to be used for the intermediate work of this
      action. If not provided, the default cache under $TMP/qiime2/ will be
      used. IMPORTANT FOR HPC USERS: If you are on an HPC system and are
      using parallel execution it is important to set this to a location
      that is globally accessible to all nodes in the cluster.
  --help
      Show this message and exit.
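The options listed in the help text can be assembled into a full command line. The following is a minimal sketch in Python that only builds and prints the argument list. The file names (`ref-seqs.qza`, `ref-taxonomy.qza`) and the `cv` output prefix are hypothetical placeholders, and `build_command` is a helper invented for this illustration, not part of QIIME 2 or RESCRIPt.

```python
import shlex
import subprocess


def build_command(sequences, taxonomy, out_prefix, k=3, random_state=0,
                  n_jobs=1, confidence=0.7, reads_per_batch="auto"):
    """Assemble the argument list for `qiime rescript evaluate-cross-validate`.

    The defaults mirror the ones shown in the help text. All file names are
    hypothetical placeholders chosen for this sketch.
    """
    return [
        "qiime", "rescript", "evaluate-cross-validate",
        "--i-sequences", sequences,
        "--i-taxonomy", taxonomy,
        "--p-k", str(k),
        "--p-random-state", str(random_state),
        "--p-reads-per-batch", str(reads_per_batch),
        "--p-n-jobs", str(n_jobs),
        "--p-confidence", str(confidence),
        "--o-expected-taxonomy", f"{out_prefix}-expected-taxonomy.qza",
        "--o-observed-taxonomy", f"{out_prefix}-observed-taxonomy.qza",
        "--o-evaluation", f"{out_prefix}-evaluation.qzv",
    ]


# Five stratified folds, four worker processes; everything else at defaults.
cmd = build_command("ref-seqs.qza", "ref-taxonomy.qza", "cv", k=5, n_jobs=4)
print(shlex.join(cmd))

# Uncomment to execute inside an activated QIIME 2 environment that has
# RESCRIPt installed:
# subprocess.run(cmd, check=True)
```

Printing the command before running it makes it easy to check the input and output paths first, since the pipeline trains and tests a classifier once per fold.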
Import:
from qiime2.plugins.rescript.pipelines import evaluate_cross_validate
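With that import available, a call through the Python API might look like the sketch below. It assumes a QIIME 2 environment with RESCRIPt installed; the function name `run_cross_validation`, the file paths, and the output file names are hypothetical choices for this example. The QIIME 2 imports are deferred into the function body so that the module can be loaded on a machine without QIIME 2.

```python
import os


def run_cross_validation(seqs_path, tax_path, out_dir, k=3, n_jobs=1):
    """Run evaluate_cross_validate via the Python API and save its outputs.

    seqs_path and tax_path point to FeatureData[Sequence] and
    FeatureData[Taxonomy] artifacts (hypothetical paths in this sketch).
    """
    # Deferred imports: these require an installed QIIME 2 with RESCRIPt.
    from qiime2 import Artifact
    from qiime2.plugins.rescript.pipelines import evaluate_cross_validate

    sequences = Artifact.load(seqs_path)
    taxonomy = Artifact.load(tax_path)

    # The pipeline returns its three results, accessible by output name.
    results = evaluate_cross_validate(
        sequences=sequences, taxonomy=taxonomy, k=k, n_jobs=n_jobs)

    results.expected_taxonomy.save(
        os.path.join(out_dir, "expected-taxonomy.qza"))
    results.observed_taxonomy.save(
        os.path.join(out_dir, "observed-taxonomy.qza"))
    results.evaluation.save(os.path.join(out_dir, "evaluation.qzv"))
    return results


# Example call (requires QIIME 2; paths are placeholders):
# run_cross_validation("ref-seqs.qza", "ref-taxonomy.qza", "cv-results", k=5)
```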
Docstring:
Evaluate DNA sequence reference database via cross-validated taxonomic classification.

Unique taxonomic labels are truncated to enable appropriate label stratification. See the cited reference (Bokulich et al. 2018) for more details.

Parameters
----------
sequences : FeatureData[Sequence]
    Reference sequences to use for classifier training/testing.
taxonomy : FeatureData[Taxonomy]
    Reference taxonomy to use for classifier training/testing.
k : Int % Range(2, None), optional
    Number of stratified folds.
random_state : Int % Range(0, None), optional
    Seed used by the random number generator.
reads_per_batch : Int % Range(1, None) | Str % Choices('auto'), optional
    Number of reads to process in each batch. If "auto", this parameter is autoscaled to min(number of query sequences / n_jobs, 20000).
n_jobs : Threads, optional
    The maximum number of concurrent worker processes. If 0 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging.
confidence : Float % Range(0, 1, inclusive_end=True) | Str % Choices('disable'), optional
    Confidence threshold for limiting taxonomic depth. Set to "disable" to disable confidence calculation, or 0 to calculate confidence but not apply it to limit the taxonomic depth of the assignments.

Returns
-------
expected_taxonomy : FeatureData[Taxonomy]
    Expected taxonomic label for each input sequence. Taxonomic labels may be truncated due to k-fold CV and stratification.
observed_taxonomy : FeatureData[Taxonomy]
    Observed taxonomic label for each input sequence, predicted by cross-validation.
evaluation : Visualization
    Visualization of cross-validated accuracy results.
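The parameter ranges and the "auto" batch-size rule in the docstring can be illustrated with a small standalone sketch. Both helper functions are hypothetical and are not part of RESCRIPt. The rounding in `auto_reads_per_batch` (rounding up to a whole number of reads) and the requirement that `n_jobs` is at least 1 are assumptions made here, since the docstring only gives the formula min(number of query sequences / n_jobs, 20000).

```python
import math


def check_parameters(k=3, random_state=0, reads_per_batch="auto",
                     confidence=0.7):
    """Raise ValueError if a value lies outside the documented range."""
    if k < 2:
        raise ValueError("k must be at least 2 (Range(2, None))")
    if random_state < 0:
        raise ValueError("random_state must be at least 0 (Range(0, None))")
    if reads_per_batch != "auto":
        if isinstance(reads_per_batch, str) or reads_per_batch < 1:
            raise ValueError("reads_per_batch must be 'auto' or at least 1")
    if confidence != "disable":
        if isinstance(confidence, str) or not 0 <= confidence <= 1:
            raise ValueError("confidence must be 'disable' or within [0, 1]")


def auto_reads_per_batch(n_queries, n_jobs):
    """Apply min(number of query sequences / n_jobs, 20000).

    Assumptions of this sketch: n_jobs >= 1 and the quotient is rounded up.
    """
    return min(math.ceil(n_queries / n_jobs), 20000)


check_parameters(k=5, confidence="disable")   # passes silently
print(auto_reads_per_batch(50_000, 4))        # small job: 50000 / 4
print(auto_reads_per_batch(1_000_000, 4))     # large job: capped at 20000
```

Under these assumptions, 50,000 query sequences split over 4 jobs give batches of 12,500 reads, while 1,000,000 sequences hit the 20,000-read cap.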