
evaluate-taxonomy: Evaluate expected vs. observed taxonomic assignments

Citations

[quality-control:evaluate-taxonomy:BKR+18] Nicholas A Bokulich, Benjamin D Kaehler, Jai Ram Rideout, Matthew Dillon, Evan Bolyen, Rob Knight, Gavin A Huttley, and J Gregory Caporaso. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin. Microbiome, 2018.

Docstring:

Usage: qiime quality-control evaluate-taxonomy [OPTIONS]

  This visualizer compares a pair of observed and expected taxonomic
  assignments to calculate precision, recall, and F-measure at each
  taxonomic level, up to the maximum level specified by the depth parameter.
  These metrics are calculated at each semicolon-delimited rank. This action
  is useful for comparing the accuracy of taxonomic assignment, e.g.,
  between different taxonomy classifiers or other bioinformatics methods.
  Expected taxonomies should be derived from simulated or mock community
  sequences that have known taxonomic affiliations.

Options:
  --i-expected-taxa ARTIFACT PATH FeatureData[Taxonomy]
                                  Expected taxonomic assignments  [required]
  --i-observed-taxa ARTIFACT PATH FeatureData[Taxonomy]
                                  Observed taxonomic assignments  [required]
  --p-depth INTEGER               Maximum depth of semicolon-delimited
                                  taxonomic ranks to test (e.g., 1 = root, 7 =
                                  species for the Greengenes reference
                                  sequence database).  [required]
  --p-palette [Pastel1|Set3|Set1|tab20|Pastel2|magma|rainbow|Paired|inferno|Accent|tab10|plasma|Dark2|Set2|tab20b|tab20c|terrain|viridis]
                                  Color palette to utilize for plotting.
                                  [default: Set1]
  --p-require-exp-ids / --p-no-require-exp-ids
                                  Require that all features found in observed
                                  taxa also be found in expected taxa;
                                  otherwise, raise an error.  [default: True]
  --p-require-obs-ids / --p-no-require-obs-ids
                                  Require that all features found in expected
                                  taxa also be found in observed taxa;
                                  otherwise, raise an error.  [default: True]
  --i-feature-table ARTIFACT PATH FeatureTable[RelativeFrequency]
                                  Optional feature table containing relative
                                  frequency of each feature, used to weight
                                  accuracy scores by frequency. Must contain
                                  all features found in expected and/or
                                  observed taxa. Features found in the table
                                  but not the expected/observed taxa will be
                                  dropped prior to analysis.  [optional]
  --p-sample-id TEXT              Optional sample ID to use for extracting
                                  frequency data from feature table, and for
                                  labeling accuracy results. If no sample_id
                                  is provided, feature frequencies are derived
                                  from the sum of all samples present in the
                                  feature table.  [optional]
  --o-visualization VISUALIZATION PATH
                                  [required if not passing --output-dir]
  --output-dir DIRECTORY          Output unspecified results to a directory
  --cmd-config FILE               Use config file for command options
  --verbose                       Display verbose output to stdout and/or
                                  stderr during execution of this action.
                                  [default: False]
  --quiet                         Silence output if execution is successful
                                  (silence is golden).  [default: False]
  --citations                     Show citations and exit.
  --help                          Show this message and exit.
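The description above says precision, recall, and F-measure are computed at each semicolon-delimited rank, up to `--p-depth`. The following is a minimal plain-Python sketch of that idea, not the plugin's implementation. The counting rules are assumptions chosen for illustration: an observed label that stops short of a rank counts only as a false negative at that rank, and a wrong or too-deep label counts as a false positive. The names `truncate`, `prf_at_depth`, and `evaluate` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def truncate(taxon: str, depth: int) -> List[str]:
    """Split a semicolon-delimited taxonomy string; keep at most `depth` ranks."""
    ranks = [rank.strip() for rank in taxon.split(";") if rank.strip()]
    return ranks[:depth]


def prf_at_depth(
    expected: Dict[str, str],
    observed: Dict[str, str],
    depth: int,
    weights: Optional[Dict[str, float]] = None,
) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure at one rank (assumed counting rules)."""
    tp = fp = fn = 0.0
    for feature_id in set(expected) | set(observed):
        # Each feature counts once unless per-feature weights are supplied.
        w = 1.0 if weights is None else weights.get(feature_id, 0.0)
        exp_ranks = truncate(expected.get(feature_id, ""), depth)
        obs_ranks = truncate(observed.get(feature_id, ""), depth)
        predicted = len(obs_ranks) == depth  # observed label reaches this rank
        has_truth = len(exp_ranks) == depth  # expected label reaches this rank
        if predicted and obs_ranks == exp_ranks:
            tp += w
            continue
        if predicted:
            fp += w  # misclassified, or classified deeper than expected
        if has_truth:
            fn += w  # the true label at this rank was not recovered
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return precision, recall, f_measure


def evaluate(
    expected: Dict[str, str],
    observed: Dict[str, str],
    depth: int,
    weights: Optional[Dict[str, float]] = None,
) -> List[Tuple[int, float, float, float]]:
    """Return (level, precision, recall, F-measure) for levels 1..depth."""
    return [
        (level, *prf_at_depth(expected, observed, level, weights))
        for level in range(1, depth + 1)
    ]


# Toy mock-community example (made-up feature IDs and labels).
expected = {
    "f1": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "f2": "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
    "f3": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia",
}
observed = {
    "f1": "k__Bacteria; p__Firmicutes; c__Bacilli",  # correct at every rank
    "f2": "k__Bacteria; p__Proteobacteria",  # stops short of rank 3
    "f3": "k__Bacteria; p__Firmicutes; c__Bacilli",  # wrong from rank 2
}

for level, p, r, f in evaluate(expected, observed, depth=3):
    print(f"level {level}: precision={p:.3f} recall={r:.3f} F={f:.3f}")
# level 1: precision=1.000 recall=1.000 F=1.000
# level 2: precision=0.667 recall=0.667 F=0.667
# level 3: precision=0.500 recall=0.333 F=0.400
```

Because an underclassified label only lowers recall, precision can stay higher than recall at deep ranks in this sketch. Placeholder ranks such as a bare `s__` are treated here as assigned labels; how the plugin handles them is not covered by this page.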

Import:

from qiime2.plugins.quality_control.visualizers import evaluate_taxonomy

Docstring:

Evaluate expected vs. observed taxonomic assignments

This visualizer compares a pair of observed and expected taxonomic
assignments to calculate precision, recall, and F-measure at each taxonomic
level, up to the maximum level specified by the depth parameter. These metrics
are calculated at each semicolon-delimited rank. This action is useful for
comparing the accuracy of taxonomic assignment, e.g., between different
taxonomy classifiers or other bioinformatics methods. Expected taxonomies
should be derived from simulated or mock community sequences that have
known taxonomic affiliations.

Parameters
----------
expected_taxa : FeatureData[Taxonomy]
    Expected taxonomic assignments
observed_taxa : FeatureData[Taxonomy]
    Observed taxonomic assignments
depth : Int
    Maximum depth of semicolon-delimited taxonomic ranks to test (e.g., 1 =
    root, 7 = species for the Greengenes reference sequence database).
palette : Str % Choices({'Accent', 'Dark2', 'Paired', 'Pastel1', 'Pastel2', 'Set1', 'Set2', 'Set3', 'inferno', 'magma', 'plasma', 'rainbow', 'tab10', 'tab20', 'tab20b', 'tab20c', 'terrain', 'viridis'}), optional
    Color palette to utilize for plotting.
require_exp_ids : Bool, optional
    Require that all features found in observed taxa also be found in
    expected taxa; otherwise, raise an error.
require_obs_ids : Bool, optional
    Require that all features found in expected taxa also be found in
    observed taxa; otherwise, raise an error.
feature_table : FeatureTable[RelativeFrequency], optional
    Optional feature table containing relative frequency of each feature,
    used to weight accuracy scores by frequency. Must contain all features
    found in expected and/or observed taxa. Features found in the table but
    not the expected/observed taxa will be dropped prior to analysis.
sample_id : Str, optional
    Optional sample ID to use for extracting frequency data from feature
    table, and for labeling accuracy results. If no sample_id is provided,
    feature frequencies are derived from the sum of all samples present in
    the feature table.

Returns
-------
visualization : Visualization
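The `require_exp_ids`, `require_obs_ids`, `feature_table`, and `sample_id` parameters describe how feature IDs are reconciled before scoring. The following plain-Python sketch mirrors those descriptions using dicts in place of QIIME 2 artifacts. It makes several assumptions that are not the plugin's code: the table layout (sample ID mapped to per-feature relative frequencies), the helper names `check_ids` and `prepare_weights`, and the reading of "expected and/or observed taxa" as the union of both ID sets.

```python
from typing import Dict, Iterable, Optional

# Hypothetical layout: sample ID -> {feature ID: relative frequency}.
Table = Dict[str, Dict[str, float]]


def check_ids(
    expected_ids: Iterable[str],
    observed_ids: Iterable[str],
    require_exp_ids: bool = True,
    require_obs_ids: bool = True,
) -> None:
    """Raise ValueError when the enabled containment checks fail."""
    exp, obs = set(expected_ids), set(observed_ids)
    if require_exp_ids and obs - exp:
        raise ValueError(f"observed features missing from expected: {sorted(obs - exp)}")
    if require_obs_ids and exp - obs:
        raise ValueError(f"expected features missing from observed: {sorted(exp - obs)}")


def prepare_weights(
    table: Table, taxa_ids: Iterable[str], sample_id: Optional[str] = None
) -> Dict[str, float]:
    """Per-feature weights for the taxonomy features only."""
    taxa_ids = set(taxa_ids)
    table_ids = set()
    for freqs in table.values():
        table_ids.update(freqs)
    # The table must contain every feature named in the taxonomies.
    missing = taxa_ids - table_ids
    if missing:
        raise ValueError(f"feature table lacks features: {sorted(missing)}")
    if sample_id is not None:
        if sample_id not in table:
            raise ValueError(f"sample {sample_id!r} not in feature table")
        column = table[sample_id]
        return {fid: column.get(fid, 0.0) for fid in taxa_ids}
    # No sample ID: sum each feature's frequency across all samples.
    # Table features absent from the taxonomies are dropped in both branches.
    return {fid: sum(freqs.get(fid, 0.0) for freqs in table.values()) for fid in taxa_ids}


# Toy inputs (made-up IDs); "f9" appears only in the table and is dropped.
table = {
    "S1": {"f1": 0.5, "f2": 0.3, "f3": 0.2, "f9": 0.0},
    "S2": {"f1": 0.1, "f2": 0.6, "f3": 0.1, "f9": 0.2},
}
ids = ["f1", "f2", "f3"]
check_ids(ids, ids)
weights_s1 = prepare_weights(table, ids, sample_id="S1")
weights_all = prepare_weights(table, ids)  # summed over S1 and S2
```

The resulting dict can be passed as per-feature weights to a scoring routine. Precision and recall are ratios, so summed frequencies do not need renormalizing for the scores to be comparable.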