cluster-features-de-novo: De novo clustering of features.

Docstring:

Usage: qiime vsearch cluster-features-de-novo [OPTIONS]

  Given a feature table and the associated feature sequences, cluster the
  features based on a user-specified percent identity threshold of their
  sequences. This is not a general-purpose de novo clustering method, but
  rather is intended to be used for clustering the results of quality-
  filtering/dereplication methods, such as DADA2, or for re-clustering a
  FeatureTable at a lower percent identity than it was originally clustered
  at. When a group of features in the input table are clustered into a
  single feature, the frequency of that single feature in a given sample is
  the sum of the frequencies of the features that were clustered in that
  sample. Feature identifiers and sequences will be inherited from the
  centroid feature of each cluster. See the vsearch documentation for
  details on how sequence clustering is performed.

Options:
  --i-sequences ARTIFACT PATH FeatureData[Sequence]
                                  The sequences corresponding to the features
                                  in table.  [required]
  --i-table ARTIFACT PATH FeatureTable[Frequency]
                                  The feature table to be clustered.
                                  [required]
  --p-perc-identity FLOAT         The percent identity at which clustering
                                  should be performed. This parameter maps to
                                  vsearch's --id parameter.  [required]
  --p-threads INTEGER RANGE       The number of threads to use for
                                  computation. Passing 0 will launch one
                                  thread per CPU core.  [default: 1]
  --o-clustered-table ARTIFACT PATH FeatureTable[Frequency]
                                  The table following clustering of features.
                                  [required if not passing --output-dir]
  --o-clustered-sequences ARTIFACT PATH FeatureData[Sequence]
                                  Sequences representing clustered features.
                                  [required if not passing --output-dir]
  --output-dir DIRECTORY          Output unspecified results to a directory
  --cmd-config FILE               Use config file for command options
  --verbose                       Display verbose output to stdout and/or
                                  stderr during execution of this action.
                                  [default: False]
  --quiet                         Silence output if execution is successful
                                  (silence is golden).  [default: False]
  --citations                     Show citations and exit.
  --help                          Show this message and exit.
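
For scripted runs, the options above can be assembled into a command line programmatically. The following is a minimal sketch, not part of the plugin: `build_cluster_command` is a hypothetical helper and the `.qza` file names are placeholders. The range checks mirror the ranges documented for `perc_identity` and `threads` in the Artifact API parameters.

```python
import shlex


def build_cluster_command(sequences, table, perc_identity,
                          clustered_table, clustered_sequences, threads=1):
    """Build the argv list for `qiime vsearch cluster-features-de-novo`.

    Hypothetical helper for illustration; it uses only flags that appear
    in the help text above.
    """
    # perc_identity is a fraction in (0, 1], e.g. 0.97 for 97% identity.
    if not 0 < perc_identity <= 1:
        raise ValueError("perc_identity must be in the range (0, 1]")
    # threads is an integer in [0, 256]; 0 means one thread per CPU core.
    if not 0 <= threads <= 256:
        raise ValueError("threads must be in the range [0, 256]")
    return [
        "qiime", "vsearch", "cluster-features-de-novo",
        "--i-sequences", sequences,
        "--i-table", table,
        "--p-perc-identity", str(perc_identity),
        "--p-threads", str(threads),
        "--o-clustered-table", clustered_table,
        "--o-clustered-sequences", clustered_sequences,
    ]


# Placeholder file names; substitute your own artifacts.
cmd = build_cluster_command(
    "rep-seqs.qza", "table.qza", 0.97,
    "table-dn-97.qza", "rep-seqs-dn-97.qza",
)
print(shlex.join(cmd))
```

In an environment where QIIME 2 is installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`.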

Import:

from qiime2.plugins.vsearch.methods import cluster_features_de_novo

Docstring:

De novo clustering of features.

Given a feature table and the associated feature sequences, cluster the
features based on a user-specified percent identity threshold of their
sequences. This is not a general-purpose de novo clustering method, but
rather is intended to be used for clustering the results of quality-
filtering/dereplication methods, such as DADA2, or for re-clustering a
FeatureTable at a lower percent identity than it was originally clustered
at. When a group of features in the input table are clustered into a single
feature, the frequency of that single feature in a given sample is the sum
of the frequencies of the features that were clustered in that sample.
Feature identifiers and sequences will be inherited from the centroid
feature of each cluster. See the vsearch documentation for details on how
sequence clustering is performed.
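
The frequency-summing rule described above can be illustrated with plain Python. This is a sketch of the bookkeeping only, not of how vsearch forms clusters: the mapping from each feature to its cluster centroid is assumed to be given, `collapse_table` is a hypothetical helper, and all identifiers and counts are made up.

```python
from collections import defaultdict


def collapse_table(table, centroid_of):
    """Sum per-sample frequencies of features that share a centroid.

    table: {sample_id: {feature_id: frequency}}
    centroid_of: {feature_id: centroid_feature_id} (assumed to be given)
    """
    clustered = {}
    for sample_id, freqs in table.items():
        summed = defaultdict(int)
        for feature_id, frequency in freqs.items():
            # The clustered feature inherits the centroid's identifier.
            summed[centroid_of[feature_id]] += frequency
        clustered[sample_id] = dict(summed)
    return clustered


# Made-up input: f2 clusters with centroid f1; f3 is its own centroid.
table = {
    "sample-1": {"f1": 10, "f2": 5, "f3": 2},
    "sample-2": {"f1": 0, "f2": 7, "f3": 1},
}
centroid_of = {"f1": "f1", "f2": "f1", "f3": "f3"}

clustered = collapse_table(table, centroid_of)
print(clustered)
```

Here `f1` and `f2` collapse into the centroid `f1`, so `sample-1` ends up with a frequency of 15 for `f1` and `sample-2` with 7, while `f3` is unchanged.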

Parameters
----------
sequences : FeatureData[Sequence]
    The sequences corresponding to the features in table.
table : FeatureTable[Frequency]
    The feature table to be clustered.
perc_identity : Float % Range(0, 1, inclusive_start=False, inclusive_end=True)
    The percent identity at which clustering should be performed. This
    parameter maps to vsearch's --id parameter.
threads : Int % Range(0, 256, inclusive_end=True), optional
    The number of threads to use for computation. Passing 0 will launch one
    thread per CPU core.

Returns
-------
clustered_table : FeatureTable[Frequency]
    The table following clustering of features.
clustered_sequences : FeatureData[Sequence]
    Sequences representing clustered features.
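
A minimal sketch of calling this method through the Artifact API follows. It assumes QIIME 2 and the vsearch plugin are installed. `cluster_at_identity` is a hypothetical wrapper, the `.qza` paths are placeholders, and the value 0.97 is only an illustrative threshold.

```python
def cluster_at_identity(sequences_path, table_path, perc_identity, threads=1):
    """Load two artifacts, cluster the features, return the two results.

    Hypothetical wrapper for illustration. perc_identity is a fraction
    in (0, 1]; threads=0 launches one thread per CPU core.
    """
    # Imported lazily so this module can be loaded without QIIME 2 installed.
    from qiime2 import Artifact
    from qiime2.plugins.vsearch.methods import cluster_features_de_novo

    sequences = Artifact.load(sequences_path)   # FeatureData[Sequence]
    table = Artifact.load(table_path)           # FeatureTable[Frequency]
    results = cluster_features_de_novo(
        sequences=sequences,
        table=table,
        perc_identity=perc_identity,
        threads=threads,
    )
    # Outputs are available by the names listed under "Returns".
    return results.clustered_table, results.clustered_sequences


# Example usage (placeholder file names, requires QIIME 2):
# clustered_table, clustered_sequences = cluster_at_identity(
#     "rep-seqs.qza", "table.qza", perc_identity=0.97)
# clustered_table.save("table-dn-97.qza")
# clustered_sequences.save("rep-seqs-dn-97.qza")
```

Returning the artifacts rather than saving them inside the wrapper leaves the choice of output paths to the caller.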