merge-taxa: Compare taxonomies and select the longest, highest scoring, or find the least common ancestor.

Docstring:

Usage: qiime rescript merge-taxa [OPTIONS]

  Compare taxonomy annotations and choose the best one. Can select the longest
  taxonomy annotation, the highest scoring, or the least common ancestor.
  Note: when a tie occurs, the last taxonomy added takes precedence.

Inputs:
  --i-data ARTIFACTS... List[FeatureData[Taxonomy]]
                         Two or more feature taxonomies to be merged.
                                                                    [required]
Parameters:
  --p-mode TEXT Choices('len', 'lca', 'score', 'super', 'majority')
                         How to merge feature taxonomies: "len" will select
                         the taxonomy with the most elements (e.g., species
                         level will beat genus level); "lca" will find the
                         least common ancestor and report this consensus
                         taxonomy; "score" will select the taxonomy with the
                         highest score (e.g., confidence or consensus score).
                         Note that "score" assumes that this score is always
                         contained as the second column in a feature taxonomy
                         dataframe. "majority" finds the LCA consensus while
                         giving preference to majority labels. "super" finds
                         the LCA consensus while giving preference to majority
                         labels and collapsing substrings into superstrings.
                         For example, when a more specific taxonomy does not
                         contradict a less specific taxonomy, the more
                         specific is chosen. That is, "g__Faecalibacterium;
                         s__prausnitzii", will be preferred over
                         "g__Faecalibacterium; s__"           [default: 'len']
  --p-rank-handle-regex TEXT
                         Regular expression indicating which taxonomic rank a
                         label belongs to; this handle is stripped from the
                         label prior to operating on the taxonomy. The net
                         effect is that ambiguous or empty levels can be
                         removed prior to comparison, enabling selection of
                         taxonomies with more complete taxonomic information.
                         For example, "^[dkpcofgs]__" will recognize
                         Greengenes or SILVA rank handles. Note that rank
                         handles are removed but not replaced; use the
                         new-rank-handles parameter to replace the rank
                         handles.                   [default: '^[dkpcofgs]__']
  --p-new-rank-handles VALUES... List[Str % Choices('disable')] | List[Str
    % Choices('domain', 'superkingdom', 'kingdom', 'subkingdom',
    'superphylum', 'phylum', 'subphylum', 'infraphylum', 'superclass',
    'class', 'subclass', 'infraclass', 'cohort', 'superorder', 'order',
    'suborder', 'infraorder', 'parvorder', 'superfamily', 'family',
    'subfamily', 'tribe', 'subtribe', 'genus', 'subgenus', 'species group',
    'species subgroup', 'species', 'subspecies', 'forma')]
                         Specifies the set of rank handles to prepend to
                         taxonomic labels at each rank. Note that merged
                         taxonomies will only contain as many levels as there
                         are handles if this parameter is used. This will trim
                         all taxonomies to the given levels, even if longer
                         annotations exist. Note that this parameter will
                         prepend rank handles whether or not they already
                         exist in the taxonomy, so should ALWAYS be used in
                         conjunction with `rank-handle-regex` if rank handles
                         exist in any of the inputs. Use 'disable' to prevent
                         prepending 'new-rank-handles'.
[default: ['domain', 'phylum', 'class', 'order', 'family', 'genus', 'species']]
  --p-unclassified-label TEXT
                         Specifies what label should be used for taxonomies
                         that could not be resolved (when LCA modes are used).
                                                       [default: 'Unassigned']
Outputs:
  --o-merged-data ARTIFACT FeatureData[Taxonomy]
                                                                    [required]
Miscellaneous:
  --output-dir PATH      Output unspecified results to a directory
  --verbose / --quiet    Display verbose output to stdout and/or stderr
                         during execution of this action. Or silence output if
                         execution is successful (silence is golden).
  --example-data PATH    Write example data and exit.
  --citations            Show citations and exit.
  --use-cache DIRECTORY  Specify the cache to be used for the intermediate
                         work of this action. If not provided, the default
                         cache under $TMP/qiime2/ will be used.
                         IMPORTANT FOR HPC USERS: If you are on an HPC system
                         and are using parallel execution it is important to
                         set this to a location that is globally accessible to
                         all nodes in the cluster.
  --help                 Show this message and exit.
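
Example (illustrative sketch, not the plugin's implementation):

The interaction between `--p-rank-handle-regex` and the default "len" mode can be sketched in plain Python. The helper names `strip_handles` and `merge_len` are hypothetical, and the choice to drop only trailing empty levels is an assumption made for illustration; the plugin's real handling may differ.

```python
import re
from typing import List

# Default value of --p-rank-handle-regex, taken from the help text above.
RANK_HANDLE = re.compile(r"^[dkpcofgs]__")


def strip_handles(taxon: str) -> List[str]:
    """Split a taxonomy string on ';' and strip the rank handle from each label.

    Assumption: trailing levels left empty after stripping (e.g. "s__")
    are dropped so they do not count toward the annotation length.
    """
    labels = [RANK_HANDLE.sub("", part.strip()) for part in taxon.split(";")]
    while labels and labels[-1] == "":
        labels.pop()
    return labels


def merge_len(candidates: List[str]) -> List[str]:
    """Pick the annotation with the most non-empty levels ("len" mode sketch).

    Using >= means that on a tie the later input wins, matching the note
    that the last taxonomy added takes precedence.
    """
    best: List[str] = []
    for taxon in candidates:
        labels = strip_handles(taxon)
        if len(labels) >= len(best):
            best = labels
    return best


if __name__ == "__main__":
    print(merge_len(["g__Faecalibacterium; s__",
                     "g__Faecalibacterium; s__prausnitzii"]))
```

Without the handle stripping, "g__Faecalibacterium; s__" and "g__Faecalibacterium; s__prausnitzii" would both count as two levels, which is why the regex matters for length-based selection.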

Import:

from qiime2.plugins.rescript.methods import merge_taxa

Docstring:

Compare taxonomies and select the longest, highest scoring, or find the
least common ancestor.

Compare taxonomy annotations and choose the best one. Can select the
longest taxonomy annotation, the highest scoring, or the least common
ancestor. Note: when a tie occurs, the last taxonomy added takes precedence.

Parameters
----------
data : List[FeatureData[Taxonomy]]
    Two or more feature taxonomies to be merged.
mode : Str % Choices('len', 'lca', 'score', 'super', 'majority'), optional
    How to merge feature taxonomies: "len" will select the taxonomy with
    the most elements (e.g., species level will beat genus level); "lca"
    will find the least common ancestor and report this consensus taxonomy;
    "score" will select the taxonomy with the highest score (e.g.,
    confidence or consensus score). Note that "score" assumes that this
    score is always contained as the second column in a feature taxonomy
    dataframe. "majority" finds the LCA consensus while giving preference
    to majority labels. "super" finds the LCA consensus while giving
    preference to majority labels and collapsing substrings into
    superstrings. For example, when a more specific taxonomy does not
    contradict a less specific taxonomy, the more specific is chosen. That
    is, "g__Faecalibacterium; s__prausnitzii", will be preferred over
    "g__Faecalibacterium; s__"
rank_handle_regex : Str, optional
    Regular expression indicating which taxonomic rank a label belongs to;
    this handle is stripped from the label prior to operating on the
    taxonomy. The net effect is that ambiguous or empty levels can be
    removed prior to comparison, enabling selection of taxonomies with more
    complete taxonomic information. For example, "^[dkpcofgs]__" will
    recognize Greengenes or SILVA rank handles. Note that rank handles are
    removed but not replaced; use the new_rank_handles parameter to replace
    the rank handles.
new_rank_handles : List[Str % Choices('disable')] | List[Str % Choices('domain', 'superkingdom', 'kingdom', 'subkingdom', 'superphylum', 'phylum', 'subphylum', 'infraphylum', 'superclass', 'class', 'subclass', 'infraclass', 'cohort', 'superorder', 'order', 'suborder', 'infraorder', 'parvorder', 'superfamily', 'family', 'subfamily', 'tribe', 'subtribe', 'genus', 'subgenus', 'species group', 'species subgroup', 'species', 'subspecies', 'forma')], optional
    Specifies the set of rank handles to prepend to taxonomic labels at
    each rank. Note that merged taxonomies will only contain as many levels
    as there are handles if this parameter is used. This will trim all
    taxonomies to the given levels, even if longer annotations exist. Note
    that this parameter will prepend rank handles whether or not they
    already exist in the taxonomy, so should ALWAYS be used in conjunction
    with `rank_handle_regex` if rank handles exist in any of the inputs.
    Use 'disable' to prevent prepending 'new_rank_handles'.
unclassified_label : Str, optional
    Specifies what label should be used for taxonomies that could not be
    resolved (when LCA modes are used).

Returns
-------
merged_data : FeatureData[Taxonomy]
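
Example (illustrative sketch, not the plugin's implementation):

The "lca" mode, `unclassified_label`, and `new_rank_handles` described above can be pictured with a small pure-Python sketch. The function names `lca_consensus` and `prepend_handles` are hypothetical, and the `RANK_PREFIX` mapping from rank names to prefixes is an assumption for illustration. The inputs are assumed to be label lists with rank handles already stripped.

```python
from typing import Dict, List, Sequence

# Assumed mapping from rank names to handle prefixes (illustration only).
RANK_PREFIX: Dict[str, str] = {
    "domain": "d__", "phylum": "p__", "class": "c__", "order": "o__",
    "family": "f__", "genus": "g__", "species": "s__",
}


def lca_consensus(taxonomies: Sequence[Sequence[str]],
                  unclassified_label: str = "Unassigned") -> List[str]:
    """Return the shared leading labels of all annotations.

    Walks rank by rank and stops at the first disagreement. If nothing is
    shared, the unclassified label is returned instead.
    """
    consensus: List[str] = []
    for labels_at_rank in zip(*taxonomies):
        if len(set(labels_at_rank)) > 1:
            break
        consensus.append(labels_at_rank[0])
    return consensus if consensus else [unclassified_label]


def prepend_handles(labels: Sequence[str], ranks: Sequence[str]) -> str:
    """Prepend one handle per rank; zip() trims labels beyond the given ranks."""
    return "; ".join(RANK_PREFIX[rank] + label
                     for rank, label in zip(ranks, labels))


if __name__ == "__main__":
    a = ["Bacteria", "Firmicutes", "Clostridia"]
    b = ["Bacteria", "Firmicutes", "Bacilli"]
    shared = lca_consensus([a, b])
    print(prepend_handles(shared, ["domain", "phylum", "class"]))
```

Because `prepend_handles` adds a prefix unconditionally, feeding it labels that still carry handles would produce doubled prefixes such as "d__d__Bacteria", which illustrates why the handles should be stripped with `rank_handle_regex` first.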