evaluate-composition: Evaluate expected vs. observed taxonomic composition of samples

Citations
  • Nicholas A Bokulich, Benjamin D Kaehler, Jai Ram Rideout, Matthew Dillon, Evan Bolyen, Rob Knight, Gavin A Huttley, and J Gregory Caporaso. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin. Microbiome, 2018.

Docstring:

Usage: qiime quality-control evaluate-composition [OPTIONS]

  This visualizer compares the feature composition of pairs of observed and
  expected samples that share the same sample ID in two separate feature
  tables. Typically, feature composition will consist of taxonomy
  classifications or other semicolon-delimited feature annotations. Taxon
  accuracy rate, taxon detection rate, and linear regression scores between
  expected and observed compositions are calculated at each
  semicolon-delimited rank, and per-level accuracy and observation
  correlations are plotted. A histogram of the distance between false
  positive observations and the nearest expected feature is also generated,
  where distance equals the number of rank differences between the observed
  feature and the nearest common lineage in the expected feature. This
  visualizer is most suitable for testing per-run data quality on sequencing
  runs that contain mock communities or other samples with known
  composition. It is also suitable for sanity checks of bioinformatics
  pipeline performance.

Inputs:
  --i-expected-features ARTIFACT FeatureTable[RelativeFrequency]
                       Expected feature compositions                [required]
  --i-observed-features ARTIFACT FeatureTable[RelativeFrequency]
                       Observed feature compositions                [required]
Parameters:
  --p-depth INTEGER    Maximum depth of semicolon-delimited taxonomic ranks
                       to test (e.g., 1 = root, 7 = species for the Greengenes
                       reference sequence database).              [default: 7]
  --p-palette TEXT Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2',
    'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c',
    'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow')
                       Color palette to utilize for plotting.
                                                             [default: 'Set1']
  --p-plot-tar / --p-no-plot-tar
                       Plot taxon accuracy rate (TAR) on score plot. TAR is
                       the number of true positive features divided by the
                       total number of observed features (TAR = true positives
                       / (true positives + false positives)).  [default: True]
  --p-plot-tdr / --p-no-plot-tdr
                       Plot taxon detection rate (TDR) on score plot. TDR is
                       the number of true positive features divided by the
                       total number of expected features (TDR = true positives
                       / (true positives + false negatives)).  [default: True]
  --p-plot-r-value / --p-no-plot-r-value
                       Plot expected vs. observed linear regression r value
                       on score plot.                         [default: False]
  --p-plot-r-squared / --p-no-plot-r-squared
                       Plot expected vs. observed linear regression r-squared
                       value on score plot.                    [default: True]
  --p-plot-bray-curtis / --p-no-plot-bray-curtis
                       Plot expected vs. observed Bray-Curtis dissimilarity
                       scores on score plot.                  [default: False]
  --p-plot-jaccard / --p-no-plot-jaccard
                       Plot expected vs. observed Jaccard distance scores on
                       score plot.                            [default: False]
  --p-plot-observed-features / --p-no-plot-observed-features
                       Plot observed features count on score plot.
                                                              [default: False]
  --p-plot-observed-features-ratio / --p-no-plot-observed-features-ratio
                       Plot ratio of observed:expected features on score
                       plot.                                   [default: True]
  --m-metadata-file METADATA
  --m-metadata-column COLUMN  MetadataColumn[Categorical]
                       Optional sample metadata that maps observed-features
                       sample IDs to expected-features sample IDs.  [optional]
Outputs:
  --o-visualization VISUALIZATION
                                                                    [required]
Miscellaneous:
  --output-dir PATH    Output unspecified results to a directory
  --verbose / --quiet  Display verbose output to stdout and/or stderr during
                       execution of this action. Or silence output if
                       execution is successful (silence is golden).
  --citations          Show citations and exit.
  --help               Show this message and exit.
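
The TAR and TDR definitions given for --p-plot-tar and --p-plot-tdr can be illustrated with a minimal, standard-library sketch. It is not the plugin's implementation: the helper names (truncate, tar_tdr) and the mock taxonomy strings are hypothetical, and it assumes that a feature counts as present whenever it is listed, with each annotation truncated to the first `depth` semicolon-delimited ranks before comparison.

```python
def truncate(taxon, depth):
    """Keep the first `depth` semicolon-delimited ranks of an annotation."""
    ranks = [rank.strip() for rank in taxon.split(";")]
    return ";".join(ranks[:depth])


def tar_tdr(expected, observed, depth):
    """Return (TAR, TDR) for two collections of annotations at `depth`."""
    exp = {truncate(t, depth) for t in expected}
    obs = {truncate(t, depth) for t in observed}
    tp = len(exp & obs)  # true positives: expected and observed
    fp = len(obs - exp)  # false positives: observed but not expected
    fn = len(exp - obs)  # false negatives: expected but not observed
    tar = tp / (tp + fp) if obs else 0.0
    tdr = tp / (tp + fn) if exp else 0.0
    return tar, tdr


# Hypothetical mock community, for illustration only.
expected = [
    "k__Bacteria;p__Firmicutes;c__Bacilli",
    "k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria",
]
observed = [
    "k__Bacteria;p__Firmicutes;c__Bacilli",
    "k__Bacteria;p__Firmicutes;c__Clostridia",
    "k__Bacteria;p__Bacteroidetes;c__Bacteroidia",
]

for depth in (1, 2, 3):
    tar, tdr = tar_tdr(expected, observed, depth)
    print(depth, round(tar, 3), round(tdr, 3))
# 1 1.0 1.0
# 2 0.5 0.5
# 3 0.333 0.5
```

In this toy data both scores fall as depth increases, because lineages that agree at shallow ranks diverge at deeper ones. This is why the scores are reported per rank.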

Import:

from qiime2.plugins.quality_control.visualizers import evaluate_composition

Docstring:

Evaluate expected vs. observed taxonomic composition of samples

This visualizer compares the feature composition of pairs of observed and
expected samples that share the same sample ID in two separate feature
tables. Typically, feature composition will consist of taxonomy
classifications or other semicolon-delimited feature annotations. Taxon
accuracy rate, taxon detection rate, and linear regression scores between
expected and observed compositions are calculated at each
semicolon-delimited rank, and per-level accuracy and observation
correlations are plotted. A histogram of the distance between false positive
observations and the nearest expected feature is also generated, where
distance equals the number of rank differences between the observed feature
and the nearest common lineage in the expected feature. This visualizer is
most suitable for testing per-run data quality on sequencing runs that
contain mock communities or other samples with known composition. It is also
suitable for sanity checks of bioinformatics pipeline performance.
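
The false positive distance described above can be read in more than one way. The sketch below shows one plausible interpretation and is not taken from the plugin source: for each observed annotation that is absent from the expected set, it finds the expected lineage sharing the longest run of leading ranks and counts how many ranks of the observed annotation fall outside that shared run. All function names and taxonomy strings are hypothetical.

```python
from collections import Counter


def split_ranks(taxon):
    """Split a semicolon-delimited annotation into a list of stripped ranks."""
    return [rank.strip() for rank in taxon.split(";")]


def shared_prefix_len(a, b):
    """Count the leading ranks that two lineages have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def fp_distance(observed_taxon, expected_taxa):
    """Ranks separating an observed taxon from its nearest expected lineage."""
    obs = split_ranks(observed_taxon)
    best = max(
        (shared_prefix_len(obs, split_ranks(e)) for e in expected_taxa),
        default=0,
    )
    return len(obs) - best


# Hypothetical mock community, for illustration only.
expected = [
    "k__Bacteria;p__Firmicutes;c__Bacilli",
    "k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria",
]
observed = [
    "k__Bacteria;p__Firmicutes;c__Bacilli",
    "k__Bacteria;p__Firmicutes;c__Clostridia",
    "k__Bacteria;p__Bacteroidetes;c__Bacteroidia",
]

# Only observed annotations missing from the expected set are false positives.
expected_set = set(expected)
false_positives = [t for t in observed if t not in expected_set]
histogram = Counter(fp_distance(t, expected) for t in false_positives)
print(sorted(histogram.items()))  # [(1, 1), (2, 1)]
```

Under this reading, a distance of 1 means the misclassification happened only at the deepest rank, while larger distances point to errors higher in the lineage.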

Parameters
----------
expected_features : FeatureTable[RelativeFrequency]
    Expected feature compositions
observed_features : FeatureTable[RelativeFrequency]
    Observed feature compositions
depth : Int, optional
    Maximum depth of semicolon-delimited taxonomic ranks to test (e.g., 1 =
    root, 7 = species for the Greengenes reference sequence database).
palette : Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow'), optional
    Color palette to utilize for plotting.
plot_tar : Bool, optional
    Plot taxon accuracy rate (TAR) on score plot. TAR is the number of true
    positive features divided by the total number of observed features (TAR
    = true positives / (true positives + false positives)).
plot_tdr : Bool, optional
    Plot taxon detection rate (TDR) on score plot. TDR is the number of
    true positive features divided by the total number of expected features
    (TDR = true positives / (true positives + false negatives)).
plot_r_value : Bool, optional
    Plot expected vs. observed linear regression r value on score plot.
plot_r_squared : Bool, optional
    Plot expected vs. observed linear regression r-squared value on score
    plot.
plot_bray_curtis : Bool, optional
    Plot expected vs. observed Bray-Curtis dissimilarity scores on score
    plot.
plot_jaccard : Bool, optional
    Plot expected vs. observed Jaccard distance scores on score plot.
plot_observed_features : Bool, optional
    Plot observed features count on score plot.
plot_observed_features_ratio : Bool, optional
    Plot ratio of observed:expected features on score plot.
metadata : MetadataColumn[Categorical], optional
    Optional sample metadata that maps observed_features sample IDs to
    expected_features sample IDs.

Returns
-------
visualization : Visualization
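
A minimal sketch of calling this visualizer through the Python API follows. The wrapper function name and its path arguments are hypothetical. It assumes both inputs are existing FeatureTable[RelativeFrequency] artifacts on disk and that QIIME 2 with the quality-control plugin is installed. The QIIME 2 imports are deferred into the function body so the definition itself loads without QIIME 2 present.

```python
def run_evaluate_composition(expected_path, observed_path, output_path, depth=7):
    """Load two feature tables, run evaluate_composition, save the result.

    All paths are caller-supplied placeholders (for example .qza inputs and
    a .qzv output); nothing here is specific to a particular dataset.
    """
    # Deferred imports: these require a working QIIME 2 environment.
    from qiime2 import Artifact
    from qiime2.plugins.quality_control.visualizers import evaluate_composition

    expected = Artifact.load(expected_path)
    observed = Artifact.load(observed_path)

    # Keyword names mirror the parameters documented above.
    results = evaluate_composition(
        expected_features=expected,
        observed_features=observed,
        depth=depth,
        plot_tar=True,
        plot_tdr=True,
        plot_r_squared=True,
    )

    # The action returns a results object whose `visualization` field holds
    # the Visualization listed under Returns.
    results.visualization.save(output_path)
    return results.visualization
```

When observed and expected sample IDs differ, the optional metadata argument can supply the mapping between them. It is omitted here to keep the sketch short.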