Docstring:
Usage: qiime phylogeny align-to-tree-mafft-raxml [OPTIONS]
This pipeline will start by creating a sequence alignment using MAFFT, after
which any alignment columns that are phylogenetically uninformative or
ambiguously aligned will be removed (masked). The resulting masked alignment
will be used to infer a phylogenetic tree using RAxML under the specified
substitution model, and the tree will then be rooted at its midpoint. Output
files from each step of the pipeline will be saved. This includes both the
unmasked and masked MAFFT alignments from q2-alignment methods, and both the
rooted and unrooted phylogenies from q2-phylogeny methods.
Inputs:
--i-sequences ARTIFACT FeatureData[Sequence]
The sequences to be used for creating a RAxML-based
rooted phylogenetic tree. [required]
Parameters:
--p-n-threads NTHREADS The number of threads. (Use `all` to automatically
use all available cores.) This value is used when
aligning the sequences and creating the tree with
RAxML. [default: 1]
--p-mask-max-gap-frequency PROPORTION Range(0, 1, inclusive_end=True)
The maximum relative frequency of gap characters in
a column for the column to be retained. This
relative frequency must be a number between 0.0 and
1.0 (inclusive), where 0.0 retains only those
columns without gap characters, and 1.0 retains all
columns regardless of gap character frequency. This
value is used when masking the aligned sequences.
[default: 1.0]
--p-mask-min-conservation PROPORTION Range(0, 1, inclusive_end=True)
The minimum relative frequency of at least one
non-gap character in a column for that column to be
retained. This relative frequency must be a number
between 0.0 and 1.0 (inclusive). For example, if a
value of 0.4 is provided, a column will only be
retained if it contains at least one character that
is present in at least 40% of the sequences. This
value is used when masking the aligned sequences.
[default: 0.4]
--p-parttree / --p-no-parttree
This flag is required if the number of sequences
being aligned is larger than 1000000. Disabled by
default. NOTE: if using this option, it is
recommended that only the CAT-based substitution
models of RAxML be considered for this pipeline.
[default: False]
--p-substitution-model TEXT Choices('GTRGAMMA', 'GTRGAMMAI', 'GTRCAT',
'GTRCATI') Model of Nucleotide Substitution.
[default: 'GTRGAMMA']
--p-seed INTEGER Random number seed for the parsimony starting tree.
This allows you to reproduce tree results. If not
supplied then one will be randomly chosen.
[optional]
--p-raxml-version TEXT Choices('Standard', 'SSE3', 'AVX2')
Select a specific CPU optimization of RAxML to use.
The SSE3 version will run approximately 40% faster
than the standard version. The AVX2 version will run
10-30% faster than the SSE3 version.
[default: 'Standard']
Outputs:
--o-alignment ARTIFACT FeatureData[AlignedSequence]
The aligned sequences. [required]
--o-masked-alignment ARTIFACT FeatureData[AlignedSequence]
The masked alignment. [required]
--o-tree ARTIFACT Phylogeny[Unrooted]
The unrooted phylogenetic tree. [required]
--o-rooted-tree ARTIFACT Phylogeny[Rooted]
The rooted phylogenetic tree. [required]
Miscellaneous:
--output-dir PATH Output unspecified results to a directory
--verbose / --quiet Display verbose output to stdout and/or stderr
during execution of this action. Or silence output
if execution is successful (silence is golden).
--recycle-pool TEXT Use a cache pool for pipeline resumption. QIIME 2
will cache your results in this pool for reuse by
future invocations. These pools are retained until
deleted by the user. If not provided, QIIME 2 will
create a pool which is automatically reused by
invocations of the same action and removed if the
action is successful. Note: these pools are local to
the cache you are using.
--no-recycle Do not recycle results from a previous failed
pipeline run or save the results from this run for
future recycling.
--parallel Execute your action in parallel. This flag will use
your default parallel config.
--parallel-config FILE Execute your action in parallel using a config at
the indicated path.
--use-cache DIRECTORY Specify the cache to be used for the intermediate
work of this pipeline. If not provided, the default
cache under $TMP/qiime2/ will be used.
IMPORTANT FOR HPC USERS: If you are on an HPC system
and are using parallel execution it is important to
set this to a location that is globally accessible
to all nodes in the cluster.
--example-data PATH Write example data and exit.
--citations Show citations and exit.
--help Show this message and exit.
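As a rough illustration of how the options above combine, the following sketch assembles an argument list for this command in Python. Only flags listed in this help text are used; the helper name `build_command` and the file names `rep-seqs.qza` and `raxml-tree-output` are hypothetical, and the sketch only builds the command without running it.

```python
# Minimal sketch: assemble an argv list for the pipeline described above.
# `build_command` is a hypothetical helper, not part of QIIME 2.

ALLOWED_MODELS = ("GTRGAMMA", "GTRGAMMAI", "GTRCAT", "GTRCATI")


def build_command(sequences, output_dir, n_threads=1,
                  substitution_model="GTRGAMMA", seed=None):
    """Return the command as a list of strings suitable for subprocess.run."""
    if substitution_model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported substitution model: {substitution_model}")
    cmd = [
        "qiime", "phylogeny", "align-to-tree-mafft-raxml",
        "--i-sequences", str(sequences),
        "--p-n-threads", str(n_threads),
        "--p-substitution-model", substitution_model,
        # All four outputs are written here because no --o-* path is given.
        "--output-dir", str(output_dir),
    ]
    if seed is not None:
        # A fixed seed makes the parsimony starting tree reproducible.
        cmd += ["--p-seed", str(seed)]
    return cmd


# Hypothetical file names, for illustration only.
example = build_command("rep-seqs.qza", "raxml-tree-output",
                        n_threads=4, seed=1723)
print(" ".join(example))
```

In an environment where QIIME 2 is installed, the returned list could be passed to `subprocess.run(example, check=True)`.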
Import:
from qiime2.plugins.phylogeny.pipelines import align_to_tree_mafft_raxml
Docstring:
Build a phylogenetic tree using RAxML and a MAFFT alignment.
This pipeline will start by creating a sequence alignment using MAFFT,
after which any alignment columns that are phylogenetically uninformative
or ambiguously aligned will be removed (masked). The resulting masked
alignment will be used to infer a phylogenetic tree using RAxML under the
specified substitution model, and the tree will then be rooted at its midpoint.
Output files from each step of the pipeline will be saved. This includes
both the unmasked and masked MAFFT alignments from q2-alignment methods, and
both the rooted and unrooted phylogenies from q2-phylogeny methods.
Parameters
----------
sequences : FeatureData[Sequence]
The sequences to be used for creating a RAxML-based rooted
phylogenetic tree.
n_threads : Threads, optional
The number of threads. (Use `all` to automatically use all available
cores.) This value is used when aligning the sequences and creating the
tree with RAxML.
mask_max_gap_frequency : Float % Range(0, 1, inclusive_end=True), optional
The maximum relative frequency of gap characters in a column for the
column to be retained. This relative frequency must be a number between
0.0 and 1.0 (inclusive), where 0.0 retains only those columns without
gap characters, and 1.0 retains all columns regardless of gap
character frequency. This value is used when masking the aligned
sequences.
mask_min_conservation : Float % Range(0, 1, inclusive_end=True), optional
The minimum relative frequency of at least one non-gap character in a
column for that column to be retained. This relative frequency must be
a number between 0.0 and 1.0 (inclusive). For example, if a value of
0.4 is provided, a column will only be retained if it contains at
least one character that is present in at least 40% of the sequences.
This value is used when masking the aligned sequences.
parttree : Bool, optional
This flag is required if the number of sequences being aligned is
larger than 1000000. Disabled by default. NOTE: if using this option,
it is recommended that only the CAT-based substitution models of RAxML
be considered for this pipeline.
substitution_model : Str % Choices('GTRGAMMA', 'GTRGAMMAI', 'GTRCAT', 'GTRCATI'), optional
Model of Nucleotide Substitution.
seed : Int, optional
Random number seed for the parsimony starting tree. This allows you to
reproduce tree results. If not supplied then one will be randomly
chosen.
raxml_version : Str % Choices('Standard', 'SSE3', 'AVX2'), optional
Select a specific CPU optimization of RAxML to use. The SSE3 version
will run approximately 40% faster than the standard version. The AVX2
version will run 10-30% faster than the SSE3 version.
Returns
-------
alignment : FeatureData[AlignedSequence]
The aligned sequences.
masked_alignment : FeatureData[AlignedSequence]
The masked alignment.
tree : Phylogeny[Unrooted]
The unrooted phylogenetic tree.
rooted_tree : Phylogeny[Rooted]
The rooted phylogenetic tree.
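The two masking thresholds described under Parameters can be easier to follow with a small worked example. The sketch below is a simplified pure-Python reading of the `mask_max_gap_frequency` and `mask_min_conservation` descriptions; it is not the q2-alignment implementation. The function names, the choice of `-` and `.` as gap characters, and the toy alignment are assumptions made for illustration.

```python
from collections import Counter

# Assumption: '-' and '.' are treated as gap characters in this sketch.
GAP_CHARS = {"-", "."}


def retained_columns(alignment, max_gap_frequency=1.0, min_conservation=0.4):
    """Return indices of columns that pass both masking thresholds."""
    if not alignment:
        return []
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    keep = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        # Relative frequency of gap characters in this column.
        gap_freq = sum(c in GAP_CHARS for c in column) / n_seqs
        # Relative frequency of the most common non-gap character.
        counts = Counter(c for c in column if c not in GAP_CHARS)
        conservation = max(counts.values(), default=0) / n_seqs
        if gap_freq <= max_gap_frequency and conservation >= min_conservation:
            keep.append(i)
    return keep


def mask(alignment, **thresholds):
    """Drop every column that fails the thresholds."""
    keep = retained_columns(alignment, **thresholds)
    return ["".join(seq[i] for i in keep) for seq in alignment]


# Toy alignment (hypothetical data): four sequences, five columns.
toy = ["ACGT-", "ACTT-", "AGAT-", "ATCTA"]
print(retained_columns(toy))   # default thresholds
print(mask(toy))
print(retained_columns(toy, max_gap_frequency=0.0, min_conservation=0.0))
```

With the default thresholds, the third column is dropped because no character reaches 40% of the sequences, and the last column is dropped for the same reason even though the gap threshold of 1.0 would allow it. Setting `max_gap_frequency=0.0` removes only the column that contains gaps.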