Docstring:
Usage: qiime rescript evaluate-cross-validate [OPTIONS]
Evaluate DNA sequence reference database via cross-validated taxonomic
classification. Unique taxonomic labels are truncated to enable appropriate
label stratification. See the cited reference (Bokulich et al. 2018) for
more details.
Inputs:
--i-sequences ARTIFACT FeatureData[Sequence]
Reference sequences to use for classifier
training/testing. [required]
--i-taxonomy ARTIFACT FeatureData[Taxonomy]
Reference taxonomy to use for classifier
training/testing. [required]
Parameters:
--p-k INTEGER Number of stratified folds.
Range(2, None) [default: 3]
--p-random-state INTEGER
Range(0, None) Seed used by the random number generator.
[default: 0]
--p-reads-per-batch VALUE Int % Range(1, None) | Str % Choices('auto')
Number of reads to process in each batch. If
"auto", this parameter is autoscaled to min(number
of query sequences / n-jobs, 20000).
[default: 'auto']
--p-n-jobs NTHREADS The maximum number of concurrent worker processes.
If 0, all CPUs are used. If 1 is given, no parallel
computing code is used at all, which is useful for
debugging. [default: 1]
--p-confidence VALUE Float % Range(0, 1, inclusive_end=True) | Str %
Choices('disable') Confidence threshold for limiting taxonomic depth.
Set to "disable" to disable confidence calculation,
or 0 to calculate confidence but not apply it to
limit the taxonomic depth of the assignments.
[default: 0.7]
Outputs:
--o-expected-taxonomy ARTIFACT FeatureData[Taxonomy]
Expected taxonomic label for each input sequence.
Taxonomic labels may be truncated due to k-fold CV
and stratification. [required]
--o-observed-taxonomy ARTIFACT FeatureData[Taxonomy]
Observed taxonomic label for each input sequence,
predicted by cross-validation. [required]
--o-evaluation VISUALIZATION
Visualization of cross-validated accuracy results.
[required]
Miscellaneous:
--output-dir PATH Output unspecified results to a directory
--verbose / --quiet Display verbose output to stdout and/or stderr
during execution of this action, or silence output
if execution is successful (silence is golden).
--recycle-pool TEXT Use a cache pool for pipeline resumption. QIIME 2
will cache your results in this pool for reuse by
future invocations. These pools are retained until
deleted by the user. If not provided, QIIME 2 will
create a pool which is automatically reused by
invocations of the same action and removed if the
action is successful. Note: these pools are local to
the cache you are using.
--no-recycle Do not recycle results from a previous failed
pipeline run or save the results from this run for
future recycling.
--parallel Execute your action in parallel. This flag will use
your default parallel config.
--parallel-config FILE Execute your action in parallel using a config at
the indicated path.
--example-data PATH Write example data and exit.
--citations Show citations and exit.
--use-cache DIRECTORY Specify the cache to be used for the intermediate
work of this action. If not provided, the default
cache under $TMP/qiime2/ will be used.
IMPORTANT FOR HPC USERS: If you are on an HPC system
and are using parallel execution it is important to
set this to a location that is globally accessible
to all nodes in the cluster.
--help Show this message and exit.
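The "auto" setting for --p-reads-per-batch is described above as min(number of query sequences / n-jobs, 20000). The arithmetic can be sketched as follows. The function name, the integer division, and the floor of one read per batch are assumptions made for illustration, not the plugin's actual code. The n_jobs argument is assumed to be an already-resolved worker count, not the special value 0.

```python
def auto_reads_per_batch(n_query_sequences, n_jobs, cap=20000):
    """Hypothetical helper mirroring the documented 'auto' rule.

    Returns min(n_query_sequences / n_jobs, cap), using integer
    division and never less than 1 (both are assumptions).
    """
    if n_query_sequences < 0:
        raise ValueError("n_query_sequences must be non-negative")
    if n_jobs < 1:
        raise ValueError("n_jobs must be a resolved worker count >= 1")
    return max(1, min(n_query_sequences // n_jobs, cap))


# Large inputs hit the 20000 cap; small inputs are split evenly across workers.
print(auto_reads_per_batch(100000, 4))
print(auto_reads_per_batch(1000, 4))
```

Under these assumptions the batch size only drops below the cap when each worker would receive fewer than 20000 query sequences.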
Import:
from qiime2.plugins.rescript.pipelines import evaluate_cross_validate
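A minimal sketch of calling the pipeline through the Python API, assuming QIIME 2 and the rescript plugin are installed. The helper names (check_parameters, run_cross_validation), the file paths, and the output file names are hypothetical. The keyword names, defaults, and value ranges are taken from the help text in this document. QIIME 2 is imported lazily inside the function so that the parameter check can be used on its own.

```python
import os


def check_parameters(k=3, random_state=0, reads_per_batch="auto",
                     n_jobs=1, confidence=0.7):
    """Hypothetical pre-check against the documented ranges and defaults."""
    if not isinstance(k, int) or k < 2:
        raise ValueError("k must be an integer >= 2")
    if not isinstance(random_state, int) or random_state < 0:
        raise ValueError("random_state must be an integer >= 0")
    if reads_per_batch != "auto" and (
            not isinstance(reads_per_batch, int) or reads_per_batch < 1):
        raise ValueError("reads_per_batch must be 'auto' or an integer >= 1")
    if confidence != "disable" and (
            not isinstance(confidence, (int, float))
            or not 0 <= confidence <= 1):
        raise ValueError("confidence must be 'disable' or within [0, 1]")
    return {"k": k, "random_state": random_state,
            "reads_per_batch": reads_per_batch,
            "n_jobs": n_jobs, "confidence": confidence}


def run_cross_validation(sequences_path, taxonomy_path, output_dir, **params):
    """Load the inputs, run the pipeline, and save the three outputs."""
    # Lazy imports: these require a working QIIME 2 environment.
    from qiime2 import Artifact
    from qiime2.plugins.rescript.pipelines import evaluate_cross_validate

    checked = check_parameters(**params)
    results = evaluate_cross_validate(
        sequences=Artifact.load(sequences_path),   # FeatureData[Sequence]
        taxonomy=Artifact.load(taxonomy_path),     # FeatureData[Taxonomy]
        **checked)

    # Output names follow the Returns section of the docstring.
    results.expected_taxonomy.save(
        os.path.join(output_dir, "expected_taxonomy.qza"))
    results.observed_taxonomy.save(
        os.path.join(output_dir, "observed_taxonomy.qza"))
    results.evaluation.save(os.path.join(output_dir, "evaluation.qzv"))
    return results
```

A call such as run_cross_validation("seqs.qza", "taxonomy.qza", "cv-out", k=5) would then request five stratified folds, with every other parameter left at its documented default.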
Docstring:
Evaluate DNA sequence reference database via cross-validated taxonomic
classification.
Evaluate DNA sequence reference database via cross-validated taxonomic
classification. Unique taxonomic labels are truncated to enable appropriate
label stratification. See the cited reference (Bokulich et al. 2018) for
more details.
Parameters
----------
sequences : FeatureData[Sequence]
Reference sequences to use for classifier training/testing.
taxonomy : FeatureData[Taxonomy]
Reference taxonomy to use for classifier training/testing.
k : Int % Range(2, None), optional
Number of stratified folds.
random_state : Int % Range(0, None), optional
Seed used by the random number generator.
reads_per_batch : Int % Range(1, None) | Str % Choices('auto'), optional
Number of reads to process in each batch. If "auto", this parameter is
autoscaled to min(number of query sequences / n_jobs, 20000).
n_jobs : Threads, optional
The maximum number of concurrent worker processes. If 0, all CPUs are
used. If 1 is given, no parallel computing code is used at all, which
is useful for debugging.
confidence : Float % Range(0, 1, inclusive_end=True) | Str % Choices('disable'), optional
Confidence threshold for limiting taxonomic depth. Set to "disable" to
disable confidence calculation, or 0 to calculate confidence but not
apply it to limit the taxonomic depth of the assignments.
Returns
-------
expected_taxonomy : FeatureData[Taxonomy]
Expected taxonomic label for each input sequence. Taxonomic labels may
be truncated due to k-fold CV and stratification.
observed_taxonomy : FeatureData[Taxonomy]
Observed taxonomic label for each input sequence, predicted by
cross-validation.
evaluation : Visualization
Visualization of cross-validated accuracy results.
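The description states that unique taxonomic labels are truncated to enable label stratification, but it does not spell out the procedure. The sketch below shows one plausible reading, offered as an assumption and not as the plugin's implementation: any label shared by fewer than k sequences is trimmed one rank at a time until every remaining label has at least k members, or only a single rank is left. The function name and the example taxonomy are hypothetical.

```python
from collections import Counter
from typing import Dict


def truncate_rare_labels(taxonomy: Dict[str, str], k: int) -> Dict[str, str]:
    """Trim labels rank by rank until each label has at least k members."""
    # Split "a; b; c" into ["a", "b", "c"], dropping empty ranks.
    ranks = {fid: [r.strip() for r in label.split(";") if r.strip()]
             for fid, label in taxonomy.items()}
    while True:
        counts = Counter(tuple(r) for r in ranks.values())
        # Labels with fewer than k members cannot be spread over k folds.
        rare = [fid for fid, r in ranks.items()
                if counts[tuple(r)] < k and len(r) > 1]
        if not rare:
            break
        for fid in rare:
            ranks[fid] = ranks[fid][:-1]  # drop the deepest rank
    return {fid: "; ".join(r) for fid, r in ranks.items()}


example = {
    "s1": "k__Bacteria; p__Firmicutes; g__Bacillus",
    "s2": "k__Bacteria; p__Firmicutes; g__Bacillus",
    "s3": "k__Bacteria; p__Firmicutes; g__Listeria",
    "s4": "k__Bacteria; p__Proteobacteria; g__Escherichia",
}
truncated = truncate_rare_labels(example, k=2)
# s1 and s2 keep their full label; s3 and s4 both end up as "k__Bacteria".
print(truncated)
```

Under this reading, the expected taxonomy can be shallower than the input taxonomy for sparsely represented taxa, which is consistent with the note that expected labels "may be truncated due to k-fold CV and stratification".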