evaluate-cross-validate: Evaluate DNA sequence reference database via cross-validated taxonomic classification.

Citations: Nicholas A. Bokulich, Benjamin D. Kaehler, Jai Ram Rideout, Matthew Dillon, Evan Bolyen, Rob Knight, Gavin A. Huttley, and J. Gregory Caporaso. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin. Microbiome, 6(1):90, 2018. https://doi.org/10.1186/s40168-018-0470-z

Command line interface
Artifact API

Docstring:

Usage: qiime rescript evaluate-cross-validate [OPTIONS]

  Evaluate DNA sequence reference database via cross-validated taxonomic
  classification. Unique taxonomic labels are truncated to enable appropriate
  label stratification. See the cited reference (Bokulich et al. 2018) for
  more details.

Inputs:
  --i-sequences ARTIFACT FeatureData[Sequence]
                          Reference sequences to use for classifier
                          training/testing.                         [required]
  --i-taxonomy ARTIFACT FeatureData[Taxonomy]
                          Reference taxonomy to use for classifier
                          training/testing.                         [required]
Parameters:
  --p-k INTEGER           Number of stratified folds.
    Range(2, None)                                                [default: 3]
  --p-random-state INTEGER
    Range(0, None)        Seed used by the random number generator.
                                                                  [default: 0]
  --p-reads-per-batch VALUE Int % Range(1, None) | Str % Choices('auto')
                          Number of reads to process in each batch. If
                          "auto", this parameter is autoscaled to min( number
                          of query sequences / n-jobs, 20000).
                                                             [default: 'auto']
  --p-n-jobs NTHREADS     The maximum number of concurrent worker processes.
                          If 0 all CPUs are used. If 1 is given, no parallel
                          computing code is used at all, which is useful for
                          debugging.                              [default: 1]
  --p-confidence VALUE Float % Range(0, 1, inclusive_end=True) | Str %
    Choices('disable')    Confidence threshold for limiting taxonomic depth.
                          Set to "disable" to disable confidence calculation,
                          or 0 to calculate confidence but not apply it to
                          limit the taxonomic depth of the assignments.
                                                                [default: 0.7]
Outputs:
  --o-expected-taxonomy ARTIFACT FeatureData[Taxonomy]
                          Expected taxonomic label for each input sequence.
                          Taxonomic labels may be truncated due to k-fold CV
                          and stratification.                       [required]
  --o-observed-taxonomy ARTIFACT FeatureData[Taxonomy]
                          Observed taxonomic label for each input sequence,
                          predicted by cross-validation.            [required]
  --o-evaluation VISUALIZATION
                          Visualization of cross-validated accuracy results.
                                                                    [required]
Miscellaneous:
  --output-dir PATH       Output unspecified results to a directory
  --verbose / --quiet     Display verbose output to stdout and/or stderr
                          during execution of this action. Or silence output
                          if execution is successful (silence is golden).
  --recycle-pool TEXT     Use a cache pool for pipeline resumption. QIIME 2
                          will cache your results in this pool for reuse by
                          future invocations. These pools are retained until
                          deleted by the user. If not provided, QIIME 2 will
                          create a pool which is automatically reused by
                          invocations of the same action and removed if the
                          action is successful. Note: these pools are local to
                          the cache you are using.
  --no-recycle            Do not recycle results from a previous failed
                          pipeline run or save the results from this run for
                          future recycling.
  --parallel              Execute your action in parallel. This flag will use
                          your default parallel config.
  --parallel-config FILE  Execute your action in parallel using a config at
                          the indicated path.
  --use-cache DIRECTORY   Specify the cache to be used for the intermediate
                          work of this pipeline. If not provided, the default
                          cache under $TMP/qiime2/ will be used.
                          IMPORTANT FOR HPC USERS: If you are on an HPC system
                          and are using parallel execution it is important to
                          set this to a location that is globally accessible
                          to all nodes in the cluster.
  --example-data PATH     Write example data and exit.
  --citations             Show citations and exit.
  --help                  Show this message and exit.
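
To show how the options above combine, the following Python sketch assembles an invocation as an argument list. The flags and defaults are those listed in the help text; the file names (ref-seqs.qza, ref-tax.qza, and the "cv" output prefix) and the build_command helper are hypothetical placeholders. The command is only built and printed, so the sketch runs without QIIME 2 installed.

```python
import shlex


def build_command(sequences, taxonomy, out_prefix, k=3, random_state=0,
                  reads_per_batch="auto", n_jobs=1, confidence=0.7):
    """Return the argument list for `qiime rescript evaluate-cross-validate`.

    File names are supplied by the caller; defaults mirror the help text.
    """
    if k < 2:
        raise ValueError("--p-k must be at least 2")
    return [
        "qiime", "rescript", "evaluate-cross-validate",
        "--i-sequences", sequences,
        "--i-taxonomy", taxonomy,
        "--p-k", str(k),
        "--p-random-state", str(random_state),
        "--p-reads-per-batch", str(reads_per_batch),
        "--p-n-jobs", str(n_jobs),
        "--p-confidence", str(confidence),
        "--o-expected-taxonomy", f"{out_prefix}-expected-taxonomy.qza",
        "--o-observed-taxonomy", f"{out_prefix}-observed-taxonomy.qza",
        "--o-evaluation", f"{out_prefix}-evaluation.qzv",
    ]


# Hypothetical file names, for illustration only.
cmd = build_command("ref-seqs.qza", "ref-tax.qza", "cv", k=5, n_jobs=4)
print(shlex.join(cmd))

# To actually run it (requires a QIIME 2 environment with RESCRIPt):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell-quoting problems when file names contain spaces.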

Import:

from qiime2.plugins.rescript.pipelines import evaluate_cross_validate
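
A minimal sketch of calling the pipeline through the Artifact API follows. The run_cross_validation wrapper, the input paths, and the output prefix are hypothetical; the keyword arguments and result names are those given in the docstring for this action. The imports are deferred into the function body so that the block can be defined without QIIME 2 installed; calling it does require a QIIME 2 environment with RESCRIPt.

```python
def run_cross_validation(sequences_path, taxonomy_path, out_prefix,
                         k=3, random_state=0, n_jobs=1, confidence=0.7):
    """Run the pipeline via the Artifact API and save its three results."""
    # Deferred imports: only needed when the function is actually called.
    from qiime2 import Artifact
    from qiime2.plugins.rescript.pipelines import evaluate_cross_validate

    sequences = Artifact.load(sequences_path)  # FeatureData[Sequence]
    taxonomy = Artifact.load(taxonomy_path)    # FeatureData[Taxonomy]

    results = evaluate_cross_validate(
        sequences=sequences,
        taxonomy=taxonomy,
        k=k,
        random_state=random_state,
        n_jobs=n_jobs,
        confidence=confidence,
    )

    # Results are exposed under the names used in the action's docstring.
    results.expected_taxonomy.save(f"{out_prefix}-expected-taxonomy.qza")
    results.observed_taxonomy.save(f"{out_prefix}-observed-taxonomy.qza")
    results.evaluation.save(f"{out_prefix}-evaluation.qzv")
    return results
```

A call might look like run_cross_validation("ref-seqs.qza", "ref-tax.qza", "cv", k=5), where the file names are again placeholders.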

Docstring:

Evaluate DNA sequence reference database via cross-validated taxonomic
classification.

Evaluate DNA sequence reference database via cross-validated taxonomic
classification. Unique taxonomic labels are truncated to enable appropriate
label stratification. See the cited reference (Bokulich et al. 2018) for
more details.
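
The truncation step can be pictured with a small sketch. Stratified k-fold splitting needs each label to occur often enough to be spread across the folds, so one plausible reading is that labels seen fewer than k times are shortened rank by rank until they are shared by at least k sequences. The code below illustrates that reading on toy data; it is an assumption made for illustration, not a copy of the plugin's implementation, and truncate_rare_labels is a hypothetical name.

```python
from collections import Counter


def truncate_rare_labels(taxonomy, k):
    """Trim the deepest rank from labels seen fewer than k times.

    `taxonomy` maps a sequence ID to a semicolon-delimited label.
    Trimming repeats until every label has at least k members or
    cannot be shortened further.
    """
    labels = {sid: [rank.strip() for rank in label.split(";")]
              for sid, label in taxonomy.items()}
    while True:
        counts = Counter(tuple(ranks) for ranks in labels.values())
        rare = [sid for sid, ranks in labels.items()
                if counts[tuple(ranks)] < k and len(ranks) > 1]
        if not rare:
            break
        for sid in rare:
            labels[sid] = labels[sid][:-1]  # drop the deepest rank
    return {sid: "; ".join(ranks) for sid, ranks in labels.items()}


# Toy taxonomy, for illustration only.
toy = {
    "s1": "k__Bacteria; g__Bacillus; s__subtilis",
    "s2": "k__Bacteria; g__Bacillus; s__subtilis",
    "s3": "k__Bacteria; g__Bacillus; s__cereus",
    "s4": "k__Bacteria; g__Listeria; s__monocytogenes",
}
truncated = truncate_rare_labels(toy, k=2)
for sid, label in sorted(truncated.items()):
    print(sid, label)
```

With k=2, the two s__subtilis labels are kept in full, while the two singleton labels are shortened until they coincide at k__Bacteria. This is why the expected taxonomy can be shallower than the input taxonomy.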

Parameters
----------
sequences : FeatureData[Sequence]
    Reference sequences to use for classifier training/testing.
taxonomy : FeatureData[Taxonomy]
    Reference taxonomy to use for classifier training/testing.
k : Int % Range(2, None), optional
    Number of stratified folds.
random_state : Int % Range(0, None), optional
    Seed used by the random number generator.
reads_per_batch : Int % Range(1, None) | Str % Choices('auto'), optional
    Number of reads to process in each batch. If "auto", this parameter is
    autoscaled to min( number of query sequences / n_jobs, 20000).
n_jobs : Threads, optional
    The maximum number of concurrent worker processes. If 0 all CPUs are
    used. If 1 is given, no parallel computing code is used at all, which
    is useful for debugging.
confidence : Float % Range(0, 1, inclusive_end=True) | Str % Choices('disable'), optional
    Confidence threshold for limiting taxonomic depth. Set to "disable" to
    disable confidence calculation, or 0 to calculate confidence but not
    apply it to limit the taxonomic depth of the assignments.
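
Two of the parameters above describe small rules that are easier to follow in code. The sketch below is illustrative only: resolve_reads_per_batch applies the documented "auto" formula (rounding up is an assumption, since the docstring gives only the ratio), and limit_depth shows one way a confidence threshold could limit taxonomic depth, assuming a hypothetical list of per-rank confidence values. Neither function is part of the plugin.

```python
import math
import os


def resolve_reads_per_batch(reads_per_batch, n_queries, n_jobs):
    """Return the batch size, autoscaling when reads_per_batch is "auto"."""
    if reads_per_batch != "auto":
        return int(reads_per_batch)
    # n_jobs == 0 means "all CPUs"; os.cpu_count() may return None.
    workers = n_jobs if n_jobs > 0 else (os.cpu_count() or 1)
    # Rounding up is an assumption; the docstring only gives the ratio.
    return min(math.ceil(n_queries / workers), 20000)


def limit_depth(ranks, rank_confidences, confidence):
    """Keep the leading ranks whose confidence meets the threshold.

    rank_confidences is a hypothetical per-rank list, one value per rank.
    """
    if confidence == "disable":
        return list(ranks)  # no confidence calculated, nothing trimmed
    kept = []
    for rank, rank_conf in zip(ranks, rank_confidences):
        if rank_conf < confidence:
            break
        kept.append(rank)
    return kept


print(resolve_reads_per_batch("auto", n_queries=100000, n_jobs=4))  # 20000
print(resolve_reads_per_batch("auto", n_queries=1000, n_jobs=4))    # 250

# Toy values, for illustration only.
ranks = ["k__Bacteria", "g__Bacillus", "s__subtilis"]
confs = [0.99, 0.85, 0.40]
print(limit_depth(ranks, confs, 0.7))  # species rank is dropped
print(limit_depth(ranks, confs, 0))    # all ranks are kept
```

A threshold of 0 keeps every rank because no confidence value falls below it, which matches the documented behaviour of calculating confidence without applying it.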

Returns
-------
expected_taxonomy : FeatureData[Taxonomy]
    Expected taxonomic label for each input sequence. Taxonomic labels may
    be truncated due to k-fold CV and stratification.
observed_taxonomy : FeatureData[Taxonomy]
    Observed taxonomic label for each input sequence, predicted by
    cross-validation.
evaluation : Visualization
    Visualization of cross-validated accuracy results.
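
For a quick check outside the visualization, the expected and observed labels can be compared directly. The sketch below computes, for each taxonomic depth, the fraction of sequences whose labels agree down to that depth. It is a simple agreement measure on toy data, not the accuracy metrics reported by the evaluation visualization (see Bokulich et al. 2018 for those), and agreement_by_depth is a hypothetical helper.

```python
def agreement_by_depth(expected, observed, max_depth):
    """Fraction of sequences whose labels match down to each depth.

    Both arguments map a sequence ID to a semicolon-delimited label.
    """
    def prefix(label, depth):
        return tuple(rank.strip() for rank in label.split(";"))[:depth]

    rates = {}
    for depth in range(1, max_depth + 1):
        hits = sum(
            prefix(label, depth) == prefix(observed.get(sid, ""), depth)
            for sid, label in expected.items()
        )
        rates[depth] = hits / len(expected)
    return rates


# Toy labels, for illustration only.
expected = {
    "s1": "k__Bacteria; g__Bacillus; s__subtilis",
    "s2": "k__Bacteria; g__Bacillus; s__cereus",
    "s3": "k__Bacteria; g__Listeria",
}
observed = {
    "s1": "k__Bacteria; g__Bacillus; s__subtilis",
    "s2": "k__Bacteria; g__Bacillus",
    "s3": "k__Bacteria; g__Bacillus",
}
rates = agreement_by_depth(expected, observed, max_depth=3)
for depth, rate in rates.items():
    print(depth, round(rate, 3))
```

In a QIIME 2 session, the two dictionaries could probably be obtained from the returned artifacts by viewing them as pandas DataFrames and reading the Taxon column, for example results.expected_taxonomy.view(pd.DataFrame)["Taxon"].to_dict(); treat that as an assumption to verify against your installed version.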