cluster-features-de-novo: De novo clustering of features.

Docstring:

Usage: qiime vsearch cluster-features-de-novo [OPTIONS]

  Given a feature table and the associated feature sequences, cluster the
  features based on a user-specified percent identity threshold of their
  sequences. This is not a general-purpose de novo clustering method, but
  rather is intended to be used for clustering the results of quality-
  filtering/dereplication methods, such as DADA2, or for re-clustering a
  FeatureTable at a lower percent identity than it was originally clustered
  at. When a group of features in the input table are clustered into a single
  feature, the frequency of that single feature in a given sample is the sum
  of the frequencies of the features that were clustered in that sample.
  Feature identifiers and sequences will be inherited from the centroid
  feature of each cluster. See the vsearch documentation for details on how
  sequence clustering is performed.

Inputs:
  --i-sequences ARTIFACT FeatureData[Sequence]
                          The sequences corresponding to the features in
                          table.                                    [required]
  --i-table ARTIFACT FeatureTable[Frequency]
                          The feature table to be clustered.        [required]
Parameters:
  --p-perc-identity PROPORTION Range(0, 1, inclusive_start=False,
    inclusive_end=True)   The percent identity at which clustering should be
                          performed. This parameter maps to vsearch's --id
                          parameter.                                [required]
  --p-strand TEXT Choices('plus', 'both')
                          Search plus (i.e., forward) or both (i.e., forward
                          and reverse complement) strands.   [default: 'plus']
  --p-threads NTHREADS    The number of threads to use for computation.
                          Passing 0 will launch one thread per CPU core.
                                                                  [default: 1]
Outputs:
  --o-clustered-table ARTIFACT FeatureTable[Frequency]
                          The table following clustering of features.
                                                                    [required]
  --o-clustered-sequences ARTIFACT FeatureData[Sequence]
                          Sequences representing clustered features.
                                                                    [required]
Miscellaneous:
  --output-dir PATH       Output unspecified results to a directory
  --verbose / --quiet     Display verbose output to stdout and/or stderr
                          during execution of this action. Or silence output
                          if execution is successful (silence is golden).
  --example-data PATH     Write example data and exit.
  --citations             Show citations and exit.
  --use-cache DIRECTORY   Specify the cache to be used for the intermediate
                          work of this action. If not provided, the default
                          cache under $TMP/qiime2/ will be used.
                          IMPORTANT FOR HPC USERS: If you are on an HPC system
                          and are using parallel execution it is important to
                          set this to a location that is globally accessible
                          to all nodes in the cluster.
  --help                  Show this message and exit.

Examples:
  # ### example: cluster features de novo
  qiime vsearch cluster-features-de-novo \
    --i-sequences seqs1.qza \
    --i-table table1.qza \
    --p-perc-identity 0.97 \
    --p-strand plus \
    --p-threads 1 \
    --o-clustered-table clustered-table.qza \
    --o-clustered-sequences clustered-sequences.qza

Import:

from qiime2.plugins.vsearch.methods import cluster_features_de_novo

Docstring:

De novo clustering of features.

Given a feature table and the associated feature sequences, cluster the
features based on a user-specified percent identity threshold of their
sequences. This is not a general-purpose de novo clustering method, but
rather is intended to be used for clustering the results of quality-
filtering/dereplication methods, such as DADA2, or for re-clustering a
FeatureTable at a lower percent identity than it was originally clustered
at. When a group of features in the input table are clustered into a single
feature, the frequency of that single feature in a given sample is the sum
of the frequencies of the features that were clustered in that sample.
Feature identifiers and sequences will be inherited from the centroid
feature of each cluster. See the vsearch documentation for details on how
sequence clustering is performed.
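
The frequency rule described above (a clustered feature's count in a sample is the sum of its members' counts in that sample, reported under the centroid's identifier) can be illustrated with a small, plain-Python sketch. This is not the plugin's implementation: the nested-dict table layout, the `collapse_table` helper, and the feature-to-centroid mapping are all hypothetical and exist only to show the arithmetic. In practice vsearch determines the cluster membership.

```python
from collections import defaultdict


def collapse_table(table, feature_to_centroid):
    """Sum per-sample frequencies of features that share a centroid.

    table: {sample_id: {feature_id: frequency}}  (hypothetical layout)
    feature_to_centroid: {feature_id: centroid_feature_id}  (hypothetical)
    """
    clustered = {}
    for sample_id, counts in table.items():
        summed = defaultdict(int)
        for feature_id, count in counts.items():
            # The clustered feature inherits the centroid's identifier.
            summed[feature_to_centroid[feature_id]] += count
        clustered[sample_id] = dict(summed)
    return clustered


# Toy input: featB falls into the cluster whose centroid is featA.
table = {
    "sample1": {"featA": 10, "featB": 5, "featC": 2},
    "sample2": {"featA": 0, "featB": 7, "featC": 1},
}
feature_to_centroid = {"featA": "featA", "featB": "featA", "featC": "featC"}

clustered = collapse_table(table, feature_to_centroid)
print(clustered)
# {'sample1': {'featA': 15, 'featC': 2}, 'sample2': {'featA': 7, 'featC': 1}}
```

Note that the sum is taken within each sample separately, so featA ends up with 15 in sample1 and 7 in sample2 rather than a single pooled total.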

Parameters
----------
sequences : FeatureData[Sequence]
    The sequences corresponding to the features in table.
table : FeatureTable[Frequency]
    The feature table to be clustered.
perc_identity : Float % Range(0, 1, inclusive_start=False, inclusive_end=True)
    The percent identity at which clustering should be performed. This
    parameter maps to vsearch's --id parameter.
strand : Str % Choices('plus', 'both'), optional
    Search plus (i.e., forward) or both (i.e., forward and reverse
    complement) strands.
threads : Threads, optional
    The number of threads to use for computation. Passing 0 will launch one
    thread per CPU core.

Returns
-------
clustered_table : FeatureTable[Frequency]
    The table following clustering of features.
clustered_sequences : FeatureData[Sequence]
    Sequences representing clustered features.
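
As a rough sketch of how the Python API above might be called, the wrapper below mirrors the command-line example. The wrapper name `cluster_de_novo` and its up-front argument checks are assumptions added for illustration (QIIME 2 validates parameters itself); the file names are taken from the command-line example. The QIIME 2 imports sit inside the function so the checks can be exercised without a QIIME 2 installation.

```python
def cluster_de_novo(sequences_path, table_path, perc_identity,
                    strand="plus", threads=1):
    """Hypothetical convenience wrapper around cluster_features_de_novo."""
    # perc_identity must lie in (0, 1]: 0 is excluded, 1 is allowed.
    if not (0 < perc_identity <= 1):
        raise ValueError("perc_identity must be > 0 and <= 1")
    if strand not in ("plus", "both"):
        raise ValueError("strand must be 'plus' or 'both'")

    # Imported lazily; requires an environment with QIIME 2 and q2-vsearch.
    from qiime2 import Artifact
    from qiime2.plugins.vsearch.methods import cluster_features_de_novo

    sequences = Artifact.load(sequences_path)  # FeatureData[Sequence]
    table = Artifact.load(table_path)          # FeatureTable[Frequency]

    results = cluster_features_de_novo(
        sequences=sequences,
        table=table,
        perc_identity=perc_identity,
        strand=strand,
        threads=threads,
    )
    return results.clustered_table, results.clustered_sequences


# Example call (commented out because it needs QIIME 2 and real artifacts):
# clustered_table, clustered_sequences = cluster_de_novo(
#     "seqs1.qza", "table1.qza", perc_identity=0.97)
# clustered_table.save("clustered-table.qza")
# clustered_sequences.save("clustered-sequences.qza")
```

Returning the two artifacts as a tuple keeps the sketch close to the Returns section above; callers can save them or pass them directly to other actions.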