get-silva-data: Download, parse, and import SILVA database.

Citations
  • Elmar Pruesse, Christian Quast, Katrin Knittel, Bernhard M Fuchs, Wolfgang Ludwig, Jörg Peplies, and Frank Oliver Glöckner. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res, 35(21):7188–7196, 2007.

  • Christian Quast, Elmar Pruesse, Pelin Yilmaz, Jan Gerken, Timmy Schweer, Pablo Yarza, Jörg Peplies, and Frank Oliver Glöckner. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res, 41(Database issue):D590–6, 2013.

Docstring:

Usage: qiime rescript get-silva-data [OPTIONS]

  Download, parse, and import SILVA database files, given a version number and
  reference target. Downloads data directly from SILVA, parses the taxonomy
  files, and outputs ready-to-use sequence and taxonomy artifacts. REQUIRES
  STABLE INTERNET CONNECTION. NOTE: THIS ACTION ACQUIRES DATA FROM THE SILVA
  DATABASE. SEE https://www.arb-silva.de/silva-license-information/ FOR MORE
  INFORMATION and be aware that earlier versions may be released under a
  different license.

Parameters:
  --p-version VALUE Str % Choices('128', '132')¹ | Str % Choices('138')² |
    Str % Choices('138.1', '138.2')³
                          SILVA database version to download.
                                                            [default: '138.2']
  --p-target VALUE Str % Choices('SSURef_NR99', 'SSURef', 'LSURef')¹ | Str
    % Choices('SSURef_NR99', 'SSURef')² | Str % Choices('SSURef_NR99',
    'SSURef', 'LSURef_NR99', 'LSURef')³
                          Reference sequence target to download. SSURef =
                          redundant small subunit reference. LSURef =
                          redundant large subunit reference. SSURef_NR99 =
                          non-redundant (clustered at 99% similarity) small
                          subunit reference.          [default: 'SSURef_NR99']
  --p-include-species-labels / --p-no-include-species-labels
                          Include species rank labels in taxonomy output.
                          Note: species-labels may not be reliable in all
                          cases.                              [default: False]
  --p-rank-propagation / --p-no-rank-propagation
                          If a rank has no taxonomy associated with it, the
                          taxonomy from the upper-level rank of that lineage
                          will be propagated downward. For example, if we are
                          missing the genus label for 'f__Pasteurellaceae;
                          g__', then the 'f__' rank will be propagated to
                          become: 'f__Pasteurellaceae; g__Pasteurellaceae'.
                                                               [default: True]
  --p-ranks TEXT... Choices('domain', 'superkingdom', 'kingdom',
    'subkingdom', 'superphylum', 'phylum', 'subphylum', 'infraphylum',
    'superclass', 'class', 'subclass', 'infraclass', 'superorder', 'order',
    'suborder', 'superfamily', 'family', 'subfamily', 'genus')
                          List of taxonomic ranks for building a taxonomy
                          from the SILVA Taxonomy database. Use
                          'include-species-labels' to append the organism name
                          as the species label. [default: 'domain', 'phylum',
                          'class', 'order', 'family', 'genus']      [optional]
  --p-download-sequences / --p-no-download-sequences
                          Toggle whether or not to download and import the
                          SILVA reference sequences associated with the
                          release. Skipping the sequences is useful if you
                          only want to download and parse the taxonomy, e.g.,
                          a local copy of the sequences already exists or for
                          testing purposes. NOTE: if sequences are skipped, a
                          `silva-sequences` output is still created, but
                          contains no data.                    [default: True]
Outputs:
  --o-silva-sequences ARTIFACT FeatureData[RNASequence]
                          SILVA reference sequences.                [required]
  --o-silva-taxonomy ARTIFACT FeatureData[Taxonomy]
                          SILVA reference taxonomy.                 [required]
Miscellaneous:
  --output-dir PATH       Output unspecified results to a directory
  --verbose / --quiet     Display verbose output to stdout and/or stderr
                          during execution of this action. Or silence output
                          if execution is successful (silence is golden).
  --recycle-pool TEXT     Use a cache pool for pipeline resumption. QIIME 2
                          will cache your results in this pool for reuse by
                          future invocations. These pools are retained until
                          deleted by the user. If not provided, QIIME 2 will
                          create a pool which is automatically reused by
                          invocations of the same action and removed if the
                          action is successful. Note: these pools are local to
                          the cache you are using.
  --no-recycle            Do not recycle results from a previous failed
                          pipeline run or save the results from this run for
                          future recycling.
  --parallel              Execute your action in parallel. This flag will use
                          your default parallel config.
  --parallel-config FILE  Execute your action in parallel using a config at
                          the indicated path.
  --example-data PATH     Write example data and exit.
  --citations             Show citations and exit.
  --use-cache DIRECTORY   Specify the cache to be used for the intermediate
                          work of this action. If not provided, the default
                          cache under $TMP/qiime2/ will be used.
                          IMPORTANT FOR HPC USERS: If you are on an HPC system
                          and are using parallel execution it is important to
                          set this to a location that is globally accessible
                          to all nodes in the cluster.
  --help                  Show this message and exit.
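
The superscripts in the `--p-version` and `--p-target` signatures pair each group of versions with the targets offered for it. The following is a minimal Python sketch of that pairing, transcribed from the help text above. The names `ALLOWED_TARGETS` and `build_command` are hypothetical helpers for illustration, not part of RESCRIPt, and the output filenames are placeholders.

```python
# Version -> allowed targets, transcribed from the Choices(...) groups
# (superscripts 1, 2, 3) in the help text. Illustrative only.
ALLOWED_TARGETS = {
    "128": {"SSURef_NR99", "SSURef", "LSURef"},
    "132": {"SSURef_NR99", "SSURef", "LSURef"},
    "138": {"SSURef_NR99", "SSURef"},
    "138.1": {"SSURef_NR99", "SSURef", "LSURef_NR99", "LSURef"},
    "138.2": {"SSURef_NR99", "SSURef", "LSURef_NR99", "LSURef"},
}


def build_command(version="138.2", target="SSURef_NR99",
                  seqs_out="silva-seqs.qza", tax_out="silva-tax.qza"):
    """Return the CLI invocation as an argument list, or raise ValueError.

    Hypothetical helper: it only checks the version/target pairing and
    assembles arguments; it does not run QIIME 2.
    """
    if version not in ALLOWED_TARGETS:
        raise ValueError(f"unknown SILVA version: {version!r}")
    if target not in ALLOWED_TARGETS[version]:
        raise ValueError(
            f"target {target!r} is not offered for version {version}")
    return [
        "qiime", "rescript", "get-silva-data",
        "--p-version", version,
        "--p-target", target,
        "--o-silva-sequences", seqs_out,
        "--o-silva-taxonomy", tax_out,
    ]
```

The returned list could be passed to `subprocess.run` inside an activated QIIME 2 environment. Checking the pairing first avoids starting a download that the type system would reject anyway.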

Import:

from qiime2.plugins.rescript.pipelines import get_silva_data

Docstring:

Download, parse, and import SILVA database.

Download, parse, and import SILVA database files, given a version number
and reference target. Downloads data directly from SILVA, parses the
taxonomy files, and outputs ready-to-use sequence and taxonomy artifacts.
REQUIRES STABLE INTERNET CONNECTION. NOTE: THIS ACTION ACQUIRES DATA FROM
THE SILVA DATABASE. SEE https://www.arb-silva.de/silva-license-information/
FOR MORE INFORMATION and be aware that earlier versions may be released
under a different license.

Parameters
----------
version : Str % Choices('128', '132')¹ | Str % Choices('138')² | Str % Choices('138.1', '138.2')³, optional
    SILVA database version to download.
target : Str % Choices('SSURef_NR99', 'SSURef', 'LSURef')¹ | Str % Choices('SSURef_NR99', 'SSURef')² | Str % Choices('SSURef_NR99', 'SSURef', 'LSURef_NR99', 'LSURef')³, optional
    Reference sequence target to download. SSURef = redundant small subunit
    reference. LSURef = redundant large subunit reference. SSURef_NR99 =
    non-redundant (clustered at 99% similarity) small subunit reference.
include_species_labels : Bool, optional
    Include species rank labels in taxonomy output. Note: species-labels
    may not be reliable in all cases.
rank_propagation : Bool, optional
    If a rank has no taxonomy associated with it, the taxonomy from the
    upper-level rank of that lineage will be propagated downward. For
    example, if we are missing the genus label for 'f__Pasteurellaceae;
    g__', then the 'f__' rank will be propagated to become:
    'f__Pasteurellaceae; g__Pasteurellaceae'.
ranks : List[Str % Choices('domain', 'superkingdom', 'kingdom', 'subkingdom', 'superphylum', 'phylum', 'subphylum', 'infraphylum', 'superclass', 'class', 'subclass', 'infraclass', 'superorder', 'order', 'suborder', 'superfamily', 'family', 'subfamily', 'genus')], optional
    List of taxonomic ranks for building a taxonomy from the SILVA Taxonomy
    database. Use 'include_species_labels' to append the organism name as
    the species label. [default: 'domain', 'phylum', 'class', 'order',
    'family', 'genus']
download_sequences : Bool, optional
    Toggle whether or not to download and import the SILVA reference
    sequences associated with the release. Skipping the sequences is useful
    if you only want to download and parse the taxonomy, e.g., a local copy
    of the sequences already exists or for testing purposes. NOTE: if
    sequences are skipped, a `silva_sequences` output is still created, but
    contains no data.
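
The behaviour described for `rank_propagation` can be illustrated with a small standalone sketch. This is not RESCRIPt's implementation. The function name `propagate_ranks` is hypothetical, and the code assumes lineages are semicolon-separated fields of the form `x__Name`, as in the example above.

```python
def propagate_ranks(lineage: str) -> str:
    """Fill empty rank labels with the nearest non-empty label above them.

    Illustrative sketch only; assumes 'x__Name' fields separated by ';'.
    """
    filled = []
    last_name = ""  # most recent non-empty label seen so far
    for field in lineage.split(";"):
        prefix, sep, name = field.strip().partition("__")
        if name:
            last_name = name
        # An empty label inherits the label of the closest higher rank.
        filled.append(f"{prefix}{sep}{last_name}")
    return "; ".join(filled)
```

Applied to `'f__Pasteurellaceae; g__'`, the sketch returns `'f__Pasteurellaceae; g__Pasteurellaceae'`, matching the example in the parameter description.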

Returns
-------
silva_sequences : FeatureData[RNASequence]
    SILVA reference sequences.
silva_taxonomy : FeatureData[Taxonomy]
    SILVA reference taxonomy.
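
A minimal usage sketch for the Python API follows, assuming a QIIME 2 environment with RESCRIPt installed and a working internet connection. The helpers `silva_options` and `fetch_silva` are hypothetical wrappers written for this example. Only `get_silva_data`, its parameters, and its two outputs come from the docstring above, and the `version` and `target` defaults are taken from the command-line help.

```python
def silva_options(version="138.2", target="SSURef_NR99",
                  include_species_labels=False, rank_propagation=True,
                  download_sequences=True):
    """Collect keyword arguments for get_silva_data.

    Defaults mirror the documented defaults; `ranks` is left unset so the
    pipeline's own default rank list applies.
    """
    return {
        "version": version,
        "target": target,
        "include_species_labels": include_species_labels,
        "rank_propagation": rank_propagation,
        "download_sequences": download_sequences,
    }


def fetch_silva(seqs_path, tax_path, **overrides):
    """Run the pipeline and save both outputs (hypothetical wrapper)."""
    # Imported lazily: requires QIIME 2 with the RESCRIPt plugin installed.
    from qiime2.plugins.rescript.pipelines import get_silva_data

    results = get_silva_data(**silva_options(**overrides))
    # Outputs are exposed under the names listed in "Returns".
    results.silva_sequences.save(seqs_path)
    results.silva_taxonomy.save(tax_path)
    return results
```

Calling `fetch_silva("silva-seqs.qza", "silva-tax.qza", download_sequences=False)` would download and parse only the taxonomy. Per the parameter note, the sequences artifact is still written but contains no data.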