Docstring:
Usage: qiime rescript merge-taxa [OPTIONS]
Compare taxonomy annotations and choose the best one. Can select the longest
taxonomy annotation, the highest scoring, or the least common ancestor.
Note: when a tie occurs, the last taxonomy added takes precedence.
Inputs:
--i-data ARTIFACTS... List[FeatureData[Taxonomy]]
Two or more feature taxonomies to be merged.
[required]
Parameters:
--p-mode TEXT Choices('len', 'lca', 'score', 'super', 'majority')
How to merge feature taxonomies: "len" will select
the taxonomy with the most elements (e.g., species
level will beat genus level); "lca" will find the
least common ancestor and report this consensus
taxonomy; "score" will select the taxonomy with the
highest score (e.g., confidence or consensus score).
Note that "score" assumes that this score is always
contained as the second column in a feature taxonomy
dataframe. "majority" finds the LCA consensus while
giving preference to majority labels. "super" finds
the LCA consensus while giving preference to majority
labels and collapsing substrings into superstrings.
For example, when a more specific taxonomy does not
contradict a less specific taxonomy, the more
specific is chosen. That is, "g__Faecalibacterium;
s__prausnitzii" will be preferred over
"g__Faecalibacterium; s__". [default: 'len']
--p-rank-handle-regex TEXT
Regular expression indicating which taxonomic rank a
label belongs to; this handle is stripped from the
label prior to operating on the taxonomy. The net
effect is that ambiguous or empty levels can be
removed prior to comparison, enabling selection of
taxonomies with more complete taxonomic information.
For example, "^[dkpcofgs]__" will recognize
Greengenes or SILVA rank handles. Note that rank
handles are removed but not replaced; use the
new-rank-handles parameter to replace the rank
handles. [default: '^[dkpcofgs]__']
--p-new-rank-handles VALUES... List[Str % Choices('disable')] | List[Str
% Choices('domain', 'superkingdom', 'kingdom', 'subkingdom',
'superphylum', 'phylum', 'subphylum', 'infraphylum', 'superclass',
'class', 'subclass', 'infraclass', 'cohort', 'superorder', 'order',
'suborder', 'infraorder', 'parvorder', 'superfamily', 'family',
'subfamily', 'tribe', 'subtribe', 'genus', 'subgenus', 'species group',
'species subgroup', 'species', 'subspecies', 'forma')]
Specifies the set of rank handles to prepend to
taxonomic labels at each rank. Note that merged
taxonomies will only contain as many levels as there
are handles if this parameter is used. This will trim
all taxonomies to the given levels, even if longer
annotations exist. Note that this parameter will
prepend rank handles whether or not they already
exist in the taxonomy, so it should ALWAYS be used in
conjunction with `rank-handle-regex` if rank handles
exist in any of the inputs. Use 'disable' to prevent
prepending 'new-rank-handles'.
[default: ['domain', 'phylum', 'class', 'order', 'family', 'genus', 'species']]
--p-unclassified-label TEXT
Specifies what label should be used for taxonomies
that could not be resolved (when LCA modes are used).
[default: 'Unassigned']
Outputs:
--o-merged-data ARTIFACT FeatureData[Taxonomy]
[required]
Miscellaneous:
--output-dir PATH Output unspecified results to a directory
--verbose / --quiet Display verbose output to stdout and/or stderr
during execution of this action. Or silence output if
execution is successful (silence is golden).
--example-data PATH Write example data and exit.
--citations Show citations and exit.
--use-cache DIRECTORY Specify the cache to be used for the intermediate
work of this action. If not provided, the default
cache under $TMP/qiime2/ will be used.
IMPORTANT FOR HPC USERS: If you are on an HPC system
and are using parallel execution it is important to
set this to a location that is globally accessible to
all nodes in the cluster.
--help Show this message and exit.
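Example (illustrative sketch only, not RESCRIPt's implementation): the
following shows how stripping rank handles with the default regex lets a
"len"-style comparison prefer the more complete annotation, and how a tie
falls to the last input. The function names `strip_handles` and
`pick_longest` are hypothetical, and dropping only trailing empty levels is
an assumption made for this sketch.

```python
import re

# Default pattern from the help text above.
RANK_HANDLE = re.compile(r"^[dkpcofgs]__")


def strip_handles(taxonomy):
    """Split on ';', strip rank handles, and drop trailing empty levels."""
    labels = [RANK_HANDLE.sub("", part.strip()) for part in taxonomy.split(";")]
    # Assumption for this sketch: only trailing empty levels are removed,
    # so the remaining labels keep their rank positions.
    while labels and not labels[-1]:
        labels.pop()
    return labels


def pick_longest(taxonomies):
    """Return the annotation with the most levels; on a tie the later input wins."""
    best = []
    for taxonomy in taxonomies:
        levels = strip_handles(taxonomy)
        if len(levels) >= len(best):  # ">=" lets the last input win ties
            best = levels
    return best


genus_only = "g__Faecalibacterium; s__"
with_species = "g__Faecalibacterium; s__prausnitzii"
print(pick_longest([genus_only, with_species]))
# ['Faecalibacterium', 'prausnitzii']
```

Without the stripping step both annotations would have two elements, and
the choice would depend only on input order.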
Import:
from qiime2.plugins.rescript.methods import merge_taxa
Docstring:
Compare taxonomies and select the longest, highest scoring, or find the
least common ancestor.
Compare taxonomy annotations and choose the best one. Can select the
longest taxonomy annotation, the highest scoring, or the least common
ancestor. Note: when a tie occurs, the last taxonomy added takes precedence.
Parameters
----------
data : List[FeatureData[Taxonomy]]
Two or more feature taxonomies to be merged.
mode : Str % Choices('len', 'lca', 'score', 'super', 'majority'), optional
How to merge feature taxonomies: "len" will select the taxonomy with
the most elements (e.g., species level will beat genus level); "lca"
will find the least common ancestor and report this consensus taxonomy;
"score" will select the taxonomy with the highest score (e.g.,
confidence or consensus score). Note that "score" assumes that this
score is always contained as the second column in a feature taxonomy
dataframe. "majority" finds the LCA consensus while giving preference
to majority labels. "super" finds the LCA consensus while giving
preference to majority labels and collapsing substrings into
superstrings. For example, when a more specific taxonomy does not
contradict a less specific taxonomy, the more specific is chosen. That
is, "g__Faecalibacterium; s__prausnitzii" will be preferred over
"g__Faecalibacterium; s__".
rank_handle_regex : Str, optional
Regular expression indicating which taxonomic rank a label belongs to;
this handle is stripped from the label prior to operating on the
taxonomy. The net effect is that ambiguous or empty levels can be
removed prior to comparison, enabling selection of taxonomies with more
complete taxonomic information. For example, "^[dkpcofgs]__" will
recognize Greengenes or SILVA rank handles. Note that rank handles are
removed but not replaced; use the new_rank_handles parameter to replace
the rank handles.
new_rank_handles : List[Str % Choices('disable')] | List[Str % Choices('domain', 'superkingdom', 'kingdom', 'subkingdom', 'superphylum', 'phylum', 'subphylum', 'infraphylum', 'superclass', 'class', 'subclass', 'infraclass', 'cohort', 'superorder', 'order', 'suborder', 'infraorder', 'parvorder', 'superfamily', 'family', 'subfamily', 'tribe', 'subtribe', 'genus', 'subgenus', 'species group', 'species subgroup', 'species', 'subspecies', 'forma')], optional
Specifies the set of rank handles to prepend to taxonomic labels at
each rank. Note that merged taxonomies will only contain as many levels
as there are handles if this parameter is used. This will trim all
taxonomies to the given levels, even if longer annotations exist. Note
that this parameter will prepend rank handles whether or not they
already exist in the taxonomy, so it should ALWAYS be used in conjunction
with `rank_handle_regex` if rank handles exist in any of the inputs.
Use 'disable' to prevent prepending 'new_rank_handles'.
unclassified_label : Str, optional
Specifies what label should be used for taxonomies that could not be
resolved (when LCA modes are used).
Returns
-------
merged_data : FeatureData[Taxonomy]
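Example (illustrative sketch only, not RESCRIPt's implementation): the
"lca" mode described above can be pictured as keeping the shared leading
ranks of all annotations for a feature and falling back to the unclassified
label when nothing is shared. The function name `lca_consensus` is
hypothetical, and the sketch assumes "; "-delimited taxonomy strings and
compares labels with their rank handles left in place.

```python
def lca_consensus(taxonomies, unclassified_label="Unassigned"):
    """Keep the leading ranks on which every annotation agrees."""
    split = [[part.strip() for part in t.split(";")] for t in taxonomies]
    consensus = []
    # zip() stops at the shortest annotation, so deeper ranks that only
    # some inputs provide never enter the consensus.
    for labels in zip(*split):
        if len(set(labels)) != 1:
            break  # first disagreement ends the shared lineage
        consensus.append(labels[0])
    return "; ".join(consensus) if consensus else unclassified_label


first = "d__Bacteria; p__Firmicutes; g__Faecalibacterium"
second = "d__Bacteria; p__Firmicutes; g__Roseburia"
print(lca_consensus([first, second]))
# d__Bacteria; p__Firmicutes
print(lca_consensus(["d__Bacteria", "d__Archaea"]))
# Unassigned
```

The "majority" and "super" modes relax the strict agreement test used here;
they are not covered by this sketch.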