align-to-tree-mafft-raxml: Build a phylogenetic tree using raxml and mafft alignment.

Docstring:

Usage: qiime phylogeny align-to-tree-mafft-raxml [OPTIONS]

  This pipeline will start by creating a sequence alignment using MAFFT, after
  which any alignment columns that are phylogenetically uninformative or
  ambiguously aligned will be removed (masked). The resulting masked alignment
  will be used to infer a phylogenetic tree using RAxML under the specified
  substitution model, and that tree will then be rooted at its midpoint. Output
  files from each step of the pipeline will be saved. This includes both the
  unmasked and masked MAFFT alignments from q2-alignment methods, and both the
  rooted and unrooted phylogenies from q2-phylogeny methods.

Inputs:
  --i-sequences ARTIFACT FeatureData[Sequence]
                          The sequences to be used for creating a RAxML-based
                          rooted phylogenetic tree.                 [required]
Parameters:
  --p-n-threads NTHREADS  The number of threads. (Use `all` to automatically
                          use all available cores.) This value is used when
                          aligning the sequences and creating the tree with
                          RAxML.                                  [default: 1]
  --p-mask-max-gap-frequency PROPORTION Range(0, 1, inclusive_end=True)
                          The maximum relative frequency of gap characters in
                          a column for the column to be retained. This
                          relative frequency must be a number between 0.0 and
                          1.0 (inclusive), where 0.0 retains only those
                          columns without gap characters, and 1.0 retains all
                          columns regardless of gap character frequency. This
                          value is used when masking the aligned sequences.
                                                                [default: 1.0]
  --p-mask-min-conservation PROPORTION Range(0, 1, inclusive_end=True)
                          The minimum relative frequency of at least one
                          non-gap character in a column for that column to be
                          retained. This relative frequency must be a number
                          between 0.0 and 1.0 (inclusive). For example, if a
                          value of 0.4 is provided, a column will only be
                          retained if it contains at least one character that
                          is present in at least 40% of the sequences. This
                          value is used when masking the aligned sequences.
                                                                [default: 0.4]
  --p-parttree / --p-no-parttree
                          This flag is required if the number of sequences
                          being aligned is larger than 1000000. Disabled by
                          default. NOTE: if using this option, it is
                          recommended that only the CAT-based substitution
                          models of RAxML be considered for this pipeline.
                                                              [default: False]
  --p-substitution-model TEXT Choices('GTRGAMMA', 'GTRGAMMAI', 'GTRCAT',
    'GTRCATI')            Model of Nucleotide Substitution.
                                                         [default: 'GTRGAMMA']
  --p-seed INTEGER        Random number seed for the parsimony starting tree.
                          This allows you to reproduce tree results. If not
                          supplied then one will be randomly chosen.
                                                                    [optional]
  --p-raxml-version TEXT Choices('Standard', 'SSE3', 'AVX2')
                          Select a specific CPU optimization of RAxML to use.
                          The SSE3 version will run approximately 40% faster
                          than the standard version. The AVX2 version will run
                          10-30% faster than the SSE3 version.
                                                         [default: 'Standard']
Outputs:
  --o-alignment ARTIFACT FeatureData[AlignedSequence]
                          The aligned sequences.                    [required]
  --o-masked-alignment ARTIFACT FeatureData[AlignedSequence]
                          The masked alignment.                     [required]
  --o-tree ARTIFACT       The unrooted phylogenetic tree.
    Phylogeny[Unrooted]                                             [required]
  --o-rooted-tree ARTIFACT
    Phylogeny[Rooted]     The rooted phylogenetic tree.             [required]
Miscellaneous:
  --output-dir PATH       Output unspecified results to a directory
  --verbose / --quiet     Display verbose output to stdout and/or stderr
                          during execution of this action. Or silence output
                          if execution is successful (silence is golden).
  --recycle-pool TEXT     Use a cache pool for pipeline resumption. QIIME 2
                          will cache your results in this pool for reuse by
                          future invocations. These pools are retained until
                          deleted by the user. If not provided, QIIME 2 will
                          create a pool which is automatically reused by
                          invocations of the same action and removed if the
                          action is successful. Note: these pools are local to
                          the cache you are using.
  --no-recycle            Do not recycle results from a previous failed
                          pipeline run or save the results from this run for
                          future recycling.
  --parallel              Execute your action in parallel. This flag will use
                          your default parallel config.
  --parallel-config FILE  Execute your action in parallel using a config at
                          the indicated path.
  --use-cache DIRECTORY   Specify the cache to be used for the intermediate
                          work of this pipeline. If not provided, the default
                          cache under $TMP/qiime2/ will be used.
                          IMPORTANT FOR HPC USERS: If you are on an HPC system
                          and are using parallel execution it is important to
                          set this to a location that is globally accessible
                          to all nodes in the cluster.
  --example-data PATH     Write example data and exit.
  --citations             Show citations and exit.
  --help                  Show this message and exit.
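
The options above can be combined into a single invocation. The following is a minimal sketch, not an official recipe: it assembles the argument list in Python using only flags listed in the help text. The helper name build_command and the file names rep-seqs.qza and raxml-tree-results are hypothetical placeholders.

```python
import shlex


def build_command(sequences, output_dir, n_threads=1,
                  substitution_model="GTRGAMMA", seed=None, parttree=False):
    """Assemble the pipeline's argument list (nothing is executed here)."""
    # Choices copied from the --p-substitution-model help text.
    allowed_models = ("GTRGAMMA", "GTRGAMMAI", "GTRCAT", "GTRCATI")
    if substitution_model not in allowed_models:
        raise ValueError(f"unsupported substitution model: {substitution_model}")
    cmd = [
        "qiime", "phylogeny", "align-to-tree-mafft-raxml",
        "--i-sequences", sequences,
        "--p-n-threads", str(n_threads),
        "--p-substitution-model", substitution_model,
        "--output-dir", output_dir,
    ]
    # --p-seed is optional; pass it only when a reproducible tree is wanted.
    if seed is not None:
        cmd += ["--p-seed", str(seed)]
    # --p-parttree is a boolean flag and is disabled by default.
    if parttree:
        cmd.append("--p-parttree")
    return cmd


# Hypothetical input artifact and output directory.
cmd = build_command("rep-seqs.qza", "raxml-tree-results", n_threads=4, seed=1723)
print(shlex.join(cmd))
# To run it inside an activated QIIME 2 environment:
#   import subprocess; subprocess.run(cmd, check=True)
```

Using --output-dir instead of the four --o-* options lets QIIME 2 write every output into one directory. Fixing --p-seed makes the parsimony starting tree reproducible between runs.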

Import:

from qiime2.plugins.phylogeny.pipelines import align_to_tree_mafft_raxml
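
A minimal sketch of calling the pipeline through the Python API follows, assuming a working QIIME 2 installation. The wrapper build_tree, the OUTPUT_FIELDS tuple, and the file names in the usage comment are hypothetical. The four result fields are the ones documented under Returns in the docstring.

```python
# Result fields as documented for this pipeline.
OUTPUT_FIELDS = ("alignment", "masked_alignment", "tree", "rooted_tree")


def build_tree(sequences_path, output_prefix, **params):
    """Run the pipeline on a FeatureData[Sequence] artifact and save each output."""
    # Imported lazily so this sketch can be loaded without QIIME 2 installed.
    from qiime2 import Artifact
    from qiime2.plugins.phylogeny.pipelines import align_to_tree_mafft_raxml

    sequences = Artifact.load(sequences_path)
    # params may carry any documented keyword, e.g. n_threads, seed,
    # substitution_model, mask_max_gap_frequency, mask_min_conservation.
    results = align_to_tree_mafft_raxml(sequences=sequences, **params)

    saved = {}
    for name in OUTPUT_FIELDS:
        artifact = getattr(results, name)
        saved[name] = artifact.save(f"{output_prefix}-{name}.qza")
    return saved


# Hypothetical usage:
#   build_tree("rep-seqs.qza", "raxml", n_threads=4, seed=1723,
#              substitution_model="GTRCAT")
```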

Docstring:

Build a phylogenetic tree using raxml and mafft alignment.

This pipeline will start by creating a sequence alignment using MAFFT,
after which any alignment columns that are phylogenetically uninformative
or ambiguously aligned will be removed (masked). The resulting masked
alignment will be used to infer a phylogenetic tree using RAxML under the
specified substitution model, and that tree will then be rooted at its
midpoint. Output files from each step of the pipeline will be saved. This
includes both the unmasked and masked MAFFT alignments from q2-alignment
methods, and both the rooted and unrooted phylogenies from q2-phylogeny
methods.

Parameters
----------
sequences : FeatureData[Sequence]
    The sequences to be used for creating a RAxML-based rooted
    phylogenetic tree.
n_threads : Threads, optional
    The number of threads. (Use `all` to automatically use all available
    cores.) This value is used when aligning the sequences and creating the
    tree with RAxML.
mask_max_gap_frequency : Float % Range(0, 1, inclusive_end=True), optional
    The maximum relative frequency of gap characters in a column for the
    column to be retained. This relative frequency must be a number between
    0.0 and 1.0 (inclusive), where 0.0 retains only those columns without
    gap characters, and 1.0 retains all columns regardless of gap
    character frequency. This value is used when masking the aligned
    sequences.
mask_min_conservation : Float % Range(0, 1, inclusive_end=True), optional
    The minimum relative frequency of at least one non-gap character in a
    column for that column to be retained. This relative frequency must be
    a number between 0.0 and 1.0 (inclusive). For example, if a value of
    0.4 is provided, a column will only be retained if it contains at
    least one character that is present in at least 40% of the sequences.
    This value is used when masking the aligned sequences.
parttree : Bool, optional
    This flag is required if the number of sequences being aligned is
    larger than 1000000. Disabled by default. NOTE: if using this option,
    it is recommended that only the CAT-based substitution models of RAxML
    be considered for this pipeline.
substitution_model : Str % Choices('GTRGAMMA', 'GTRGAMMAI', 'GTRCAT', 'GTRCATI'), optional
    Model of Nucleotide Substitution.
seed : Int, optional
    Random number seed for the parsimony starting tree. This allows you to
    reproduce tree results. If not supplied then one will be randomly
    chosen.
raxml_version : Str % Choices('Standard', 'SSE3', 'AVX2'), optional
    Select a specific CPU optimization of RAxML to use. The SSE3 version
    will run approximately 40% faster than the standard version. The AVX2
    version will run 10-30% faster than the SSE3 version.

Returns
-------
alignment : FeatureData[AlignedSequence]
    The aligned sequences.
masked_alignment : FeatureData[AlignedSequence]
    The masked alignment.
tree : Phylogeny[Unrooted]
    The unrooted phylogenetic tree.
rooted_tree : Phylogeny[Rooted]
    The rooted phylogenetic tree.
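
To illustrate how mask_max_gap_frequency and mask_min_conservation interact, here is a small pure-Python sketch of column masking. It follows the wording of the parameter descriptions above (frequencies are taken relative to the number of sequences) and is not the q2-alignment implementation, which may handle gaps and degenerate characters differently. The function name mask_columns and the toy alignment are hypothetical.

```python
from collections import Counter

GAP_CHARS = {"-", "."}


def mask_columns(aligned, max_gap_frequency=1.0, min_conservation=0.4):
    """Return the alignment with columns failing either threshold removed."""
    if not aligned:
        return []
    n_seqs = len(aligned)
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    keep = []
    for i in range(length):
        column = [seq[i] for seq in aligned]
        # Fraction of sequences with a gap at this position.
        gap_frequency = sum(ch in GAP_CHARS for ch in column) / n_seqs
        # Fraction of sequences sharing the most common non-gap character.
        counts = Counter(ch for ch in column if ch not in GAP_CHARS)
        conservation = max(counts.values(), default=0) / n_seqs
        if gap_frequency <= max_gap_frequency and conservation >= min_conservation:
            keep.append(i)
    return ["".join(seq[i] for i in keep) for seq in aligned]


toy = ["ACGT-", "ACCT-", "AGAT-", "ATTTA"]
print(mask_columns(toy))
print(mask_columns(toy, max_gap_frequency=0.5, min_conservation=0.0))
```

With the default thresholds, the third column (no character reaches 40%) and the fifth column (its only non-gap character appears in 25% of sequences) are dropped. In the second call only the gap threshold applies, so just the fifth column, which is 75% gaps, is removed.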