There are two glossaries for QIIME 2: one geared toward users (this document), and another geared toward developers. You can find the Developer Glossary here.
- artifact
Artifacts are QIIME 2 results that are generally considered to represent intermediate data in an analysis, meaning that an artifact is generated by QIIME 2 and intended to be consumed by QIIME 2 (rather than by a human). Artifacts can be generated either by importing data into QIIME 2 or as output from a QIIME 2 action. When written to file, artifacts typically have the extension .qza, which stands for QIIME Zipped Artifact. Artifacts can be provided as input to QIIME 2 actions, loaded with tools such as the QIIME 2 Artifact API for use with Python 3 or qiime2R for use with R, or exported from QIIME 2 for use with other software.
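A .qza file is a zip archive, so Python's standard zipfile module can be used to look inside one. The sketch below assumes the layout we understand QIIME 2 archives to use: a single top-level directory named after the artifact's UUID, holding metadata.yaml, data/ and provenance/. It builds a stand-in archive in memory instead of reading a real one, and the function name read_artifact_metadata is hypothetical. For real analyses, loading the file with the Artifact API (qiime2.Artifact.load) is the supported route.

```python
import io
import zipfile


def read_artifact_metadata(qza_bytes):
    """Return key/value pairs from the top-level metadata.yaml of a .qza archive.

    Assumes the layout <uuid>/metadata.yaml with simple "key: value" lines,
    so no YAML library is needed for this sketch.
    """
    with zipfile.ZipFile(io.BytesIO(qza_bytes)) as archive:
        # Only <uuid>/metadata.yaml (exactly one slash) is the artifact's own file;
        # deeper copies would belong to provenance records.
        name = next(n for n in archive.namelist()
                    if n.count("/") == 1 and n.endswith("/metadata.yaml"))
        text = archive.read(name).decode("utf-8")
    metadata = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata


# Build a small stand-in archive in memory (hypothetical contents) so the
# example runs without a real .qza file on disk.
uid = "00000000-0000-4000-8000-000000000000"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(f"{uid}/metadata.yaml",
                     f"uuid: {uid}\ntype: FeatureTable[Frequency]\nformat: BIOMV210DirFmt\n")
    archive.writestr(f"{uid}/data/feature-table.biom", b"placeholder, not real BIOM data")

print(read_artifact_metadata(buffer.getvalue()))
```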
- data format
A view of an artifact as a file or multiple files stored on disk. QIIME 2 supports many data (or file) formats, and multiple data formats are sometimes available for importing or exporting QIIME 2 artifacts of a given semantic type.
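To show the difference between a data format and the data it holds, here is a minimal sketch that writes one toy feature table in two on-disk layouts: a tab-separated text file and a JSON document. The layouts, the helper names, and the sample and feature IDs are invented for illustration. The JSON layout is not the real BIOM format. The point is that the files differ, yet the table read back from each is identical.

```python
import csv
import io
import json

# A toy feature table, {sample ID: {feature ID: count}}; all IDs are made up.
table = {"sample-1": {"ASV-a": 10, "ASV-b": 0},
         "sample-2": {"ASV-a": 3, "ASV-b": 7}}


def to_tsv(table):
    """Format 1: tab-separated text, one row per feature, one column per sample."""
    samples = sorted(table)
    features = sorted({f for counts in table.values() for f in counts})
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["feature-id"] + samples)
    for feature in features:
        writer.writerow([feature] + [table[s].get(feature, 0) for s in samples])
    return out.getvalue()


def from_tsv(text):
    """Parse the tab-separated layout back into {sample: {feature: count}}."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    samples = next(reader)[1:]
    parsed = {s: {} for s in samples}
    for row in reader:
        for sample, value in zip(samples, row[1:]):
            parsed[sample][row[0]] = int(value)
    return parsed


# Format 2: a JSON layout (a simplified stand-in, not the real BIOM format).
tsv_text = to_tsv(table)
json_text = json.dumps(table, sort_keys=True)

# The bytes on disk differ, but both formats read back to the same table.
assert tsv_text != json_text
assert from_tsv(tsv_text) == json.loads(json_text) == table
print(tsv_text)
```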
- data type
A view of an artifact as an in-memory data representation. Data types are generally only encountered by Artifact API users or plugin developers. QIIME 2 supports many data types, and multiple data types are sometimes available for viewing QIIME 2 artifacts of a given semantic type.
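The sketch below loads one on-disk table into two different in-memory data types, a nested dictionary and a list-of-rows matrix, and checks that both report the same count. It uses only the standard library. With the Artifact API, the analogous step would be to ask an artifact for a particular view, for example artifact.view(pandas.DataFrame). The helper functions here are hypothetical and do not show how QIIME 2 converts between views.

```python
import csv
import io

# One on-disk representation (tab-separated text with made-up IDs)...
TSV = "feature-id\tsample-1\tsample-2\nASV-a\t10\t3\nASV-b\t0\t7\n"


def view_as_nested_dict(text):
    """Data type 1: {sample: {feature: count}}, convenient for per-sample lookups."""
    header, *rows = csv.reader(io.StringIO(text), delimiter="\t")
    return {sample: {row[0]: int(row[i]) for row in rows}
            for i, sample in enumerate(header[1:], start=1)}


def view_as_matrix(text):
    """Data type 2: feature IDs, sample IDs, and a dense list-of-rows count matrix."""
    header, *rows = csv.reader(io.StringIO(text), delimiter="\t")
    features = [row[0] for row in rows]
    counts = [[int(value) for value in row[1:]] for row in rows]
    return features, header[1:], counts


# ...viewed as two different in-memory data types with the same meaning.
nested = view_as_nested_dict(TSV)
features, samples, counts = view_as_matrix(TSV)
assert nested["sample-2"]["ASV-a"] == counts[features.index("ASV-a")][samples.index("sample-2")]
```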
- decentralized data provenance
Data provenance is information describing how a QIIME 2 result was generated. It includes details on all of the QIIME 2 actions that led to the creation of a result, including the values of all parameters, and references to all inputs and results as UUIDs. Data provenance additionally contains the literature citations that are relevant to the generation of a QIIME 2 result. Those citations should be included in all published work that derives from that result.
All QIIME 2 results contain embedded data provenance, which can be visualized with QIIME 2 View. Because the data provenance is embedded in the results themselves, rather than (for example) being stored in a centralized database that maintains records on all results, QIIME 2’s data provenance is described as decentralized.
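Here is a rough model of decentralized provenance. The sketch gives each toy result a UUID and embeds copies of its ancestors' records (action, parameters, input UUIDs, and citations) in the result itself. The full history and the citation list can then be recovered from the final result alone. The ToyResult class and the run_action and all_citations functions are hypothetical. The first step only stands in for importing, and QIIME 2's real provenance records contain considerably more detail.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class ToyResult:
    """Hypothetical stand-in for a QIIME 2 result that carries its own provenance."""
    uuid: str
    action: str
    parameters: dict
    inputs: list      # UUIDs of the results this one was computed from
    citations: list
    # Decentralized: copies of every ancestor's record travel with the result.
    provenance: dict = field(default_factory=dict)


def run_action(action, inputs, parameters, citations):
    """Create a result whose embedded provenance covers all of its ancestors."""
    provenance = {}
    for parent in inputs:
        provenance.update(parent.provenance)  # the parent's own ancestors
        provenance[parent.uuid] = {"action": parent.action,
                                   "parameters": parent.parameters,
                                   "inputs": parent.inputs,
                                   "citations": parent.citations}
    return ToyResult(str(uuid4()), action, parameters,
                     [p.uuid for p in inputs], citations, provenance)


def all_citations(result):
    """Citations for a result and for every ancestor recorded inside it."""
    cites = set(result.citations)
    for record in result.provenance.values():
        cites.update(record["citations"])
    return cites


imported = run_action("import", [], {"type": "FeatureTable[Frequency]"}, ["framework-citation"])
filtered = run_action("filter-features", [imported], {"min_frequency": 10}, ["plugin-citation"])
report = run_action("summarize", [filtered], {}, ["plugin-citation"])

# The report alone is enough to trace back to the import; no central database needed.
assert imported.uuid in report.provenance
print(sorted(all_citations(report)))  # ['framework-citation', 'plugin-citation']
```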
- feature
A unit of observation, such as an operational taxonomic unit, a sequence variant, a gene, a metabolite, etc. This generic term is used because QIIME 2 can support many different types of features.
- method
A type of QIIME 2 action that takes one or more artifacts or parameters as input and produces one or more artifacts as output. For example, the filter-features action in the q2-feature-table plugin is a method.
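A method can be pictured as a function that maps artifacts and parameters to new artifacts. The sketch below is a hypothetical filter_features, loosely modeled on the idea of filtering a feature table by total frequency. It is not the q2-feature-table implementation, whose options and behavior may differ. Tables are plain dictionaries of the form {sample: {feature: count}}.

```python
def filter_features(table, min_frequency=0):
    """Hypothetical method: keep features whose total count across all samples
    is at least min_frequency, returning a new table (the input is not modified)."""
    totals = {}
    for counts in table.values():
        for feature, n in counts.items():
            totals[feature] = totals.get(feature, 0) + n
    keep = {feature for feature, total in totals.items() if total >= min_frequency}
    return {sample: {f: n for f, n in counts.items() if f in keep}
            for sample, counts in table.items()}


table = {"s1": {"ASV-a": 10, "ASV-b": 1}, "s2": {"ASV-a": 3, "ASV-b": 2}}
print(filter_features(table, min_frequency=5))  # {'s1': {'ASV-a': 10}, 's2': {'ASV-a': 3}}
```

Returning a new table rather than editing the input mirrors how a method's outputs are new artifacts, separate from its inputs.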
- pipeline
A type of QIIME 2 action that typically combines two or more other actions. A pipeline takes one or more artifacts or parameters as input, and produces one or more results (artifacts and/or visualizations) as output. For example, the core-metrics action in the q2-diversity plugin is a pipeline.
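A pipeline can be sketched as a function that calls other actions and returns their outputs. In the toy example below, drop_empty_samples plays the role of a method, summarize plays the role of a visualizer, and clean_and_summarize plays the role of a pipeline. All three names are hypothetical, and each is much simpler than a real pipeline such as core-metrics.

```python
def drop_empty_samples(table):
    """Hypothetical method: remove samples whose total count is zero."""
    return {s: counts for s, counts in table.items() if sum(counts.values()) > 0}


def summarize(table):
    """Hypothetical visualizer: returns one human-readable text report."""
    return "\n".join(f"{s}: {sum(c.values())} reads" for s, c in sorted(table.items()))


def clean_and_summarize(table):
    """Hypothetical pipeline: chains the two actions above and returns
    both an artifact-like result and a visualization-like result."""
    cleaned = drop_empty_samples(table)
    return cleaned, summarize(cleaned)


cleaned, report = clean_and_summarize({"s1": {"ASV-a": 5, "ASV-b": 2}, "s2": {"ASV-a": 0}})
print(report)  # s1: 7 reads
```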
- plugin
A plugin provides analysis functionality in the form of actions. All plugins can be accessed through all interfaces. Plugins can be developed and distributed by anyone. As of this writing, a collection of plugins referred to as the “core distribution” is provided on installation of QIIME 2. Additional plugins can be installed, and the primary resource enabling discovery of additional plugins is the QIIME 2 Library. Anyone with a QIIME 2 Forum account can share their plugins on the QIIME 2 Library. We plan to phase out the core distribution as we move toward distributing all QIIME 2 plugins through the QIIME 2 Library.
- primitive type
A type used to define a parameter to a QIIME 2 action. For example, strings (i.e., text), integers, and booleans (i.e., true or false values) are primitives. Primitives are only used as input to actions, and never generated as output by QIIME 2.
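QIIME 2 defines its own primitive types for parameters (for instance Int, Str, and Bool). The minimal sketch below approximates them with plain Python types to show how an action's parameters can be checked against declared primitives before anything runs. The parameter names and the check_parameters helper are illustrative assumptions, not QIIME 2's validation code.

```python
# Hypothetical parameter spec for an action: name -> primitive Python type.
PARAMETERS = {"min_frequency": int, "where": str, "exclude_ids": bool}


def check_parameters(spec, given):
    """Return a list of problems with the supplied primitive parameter values."""
    problems = []
    for name, value in given.items():
        expected = spec.get(name)
        if expected is None:
            problems.append(f"unknown parameter {name!r}")
        # In Python, bool is a subclass of int, so reject True/False for int parameters.
        elif not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            problems.append(f"{name!r} should be {expected.__name__}, got {type(value).__name__}")
    return problems


print(check_parameters(PARAMETERS, {"min_frequency": "10", "exclude_ids": True}))
# ["'min_frequency' should be int, got str"]
```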
- sample
An individual unit of study in an analysis.
- semantic type
A semantic type describes the meaning of data in QIIME 2. All results in QIIME 2 have a single semantic type associated with them, and when importing data into QIIME 2, the user must provide the semantic type of that data.
The use of semantic types by QIIME 2 provides an unambiguous way to communicate with others about data, and it allows QIIME 2 to reason about data and help users prevent errors. An example helps illustrate what semantic types are and how QIIME 2 uses them. QIIME 2 contains two related semantic types, Phylogeny[Rooted] and Phylogeny[Unrooted], which represent rooted and unrooted phylogenetic trees, respectively. Both rooted and unrooted phylogenetic trees can be stored in newick files, and it isn’t possible to easily tell whether a phylogenetic tree is rooted without parsing the file. Some actions, such as the beta-phylogenetic method in the q2-diversity plugin, should be applied only to a rooted phylogenetic tree. By associating a semantic type with a phylogenetic tree artifact, QIIME 2 can determine whether the correct type of data is being provided to an action. It does not have to parse the file first, which might be slow and would delay reporting an error to the user, and then possibly make assumptions based on what it observes. If a user accidentally provides data of a semantic type that a QIIME 2 action does not accept, QIIME 2 can quickly detect the mismatch and give the user detailed information on the error and how to correct it.
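The central idea, that a semantic type can be checked without opening the data, can be demonstrated with a toy check. In the sketch below, semantic types are plain strings and the artifact's file is never read. Real QIIME 2 semantic types are richer than plain strings (they have a notion of subtyping, for example), and ToyArtifact and check_input are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyArtifact:
    """Hypothetical artifact: a semantic type label plus a path to unparsed data."""
    semantic_type: str
    path: str  # never opened by check_input below


def check_input(artifact, accepted_types, action_name):
    """Validate only the semantic type label; the newick file is never read."""
    if artifact.semantic_type not in accepted_types:
        raise TypeError(f"{action_name} accepts {sorted(accepted_types)}, "
                        f"but received {artifact.semantic_type}")
    return artifact


unrooted = ToyArtifact("Phylogeny[Unrooted]", "tree-that-does-not-exist.nwk")
try:
    check_input(unrooted, {"Phylogeny[Rooted]"}, "beta-phylogenetic")
except TypeError as err:
    print(err)  # beta-phylogenetic accepts ['Phylogeny[Rooted]'], but received Phylogeny[Unrooted]
```

Because the path points at a file that does not exist, the example also shows that the check never touches the data.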
Semantic types shouldn’t be confused with data formats, which define how data is represented on disk. For example, another QIIME 2 semantic type, FeatureTable[Frequency], can be written to a BIOM-formatted file or to a tab-separated text file. By differentiating data formats from semantic types, QIIME 2 can support import and export of different file formats based on a user’s needs.
Semantic types should also not be confused with data types. For example, the FeatureTable[Frequency] semantic type could be represented in memory as a biom.Table object or a pandas.DataFrame object, and for different applications one of these representations might be more useful than the other. Regardless of which in-memory representation is used, the meaning of the data is the same. By differentiating data types and semantic types, QIIME 2 allows developers and users to choose the data structure that is most convenient for a given task.
- UUID
QIIME 2 uses UUIDs, or Universally Unique Identifiers, to reference all results and all executions of actions. These can be used, for example, to determine through data provenance that a given artifact was generated as output from a specific execution of an action. UUIDs are an unambiguous way to refer to QIIME 2 results, because they can never change without invalidating a QIIME 2 artifact (unlike file names, for example, which are easy to change and are thus unreliable for tracking results).
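This small sketch shows why UUIDs are more reliable references than file names. A hypothetical in-memory registry keyed by UUID still resolves after the user renames a file, while a lookup by the old file name comes back empty. It uses Python's standard uuid module. The registry and the find_by_filename helper are illustrative only.

```python
from uuid import UUID, uuid4

# Hypothetical registry of results keyed by UUID; in QIIME 2 the UUID is
# stored inside the artifact itself rather than in a registry like this.
table_id = str(uuid4())
results = {table_id: {"file": "table.qza", "made_by": "filter-features"}}
downstream_input = table_id  # what a later action's provenance would record


def find_by_filename(results, filename):
    """Look up result UUIDs by their current file name."""
    return [uid for uid, info in results.items() if info["file"] == filename]


results[table_id]["file"] = "final-table.qza"  # a user renames the file on disk

assert find_by_filename(results, "table.qza") == []                # the filename reference went stale
assert results[downstream_input]["made_by"] == "filter-features"   # the UUID reference still resolves
assert UUID(table_id).version == 4  # uuid4() produces random, version-4 UUIDs
```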
- visualizer
A type of QIIME 2 action that takes one or more artifacts or parameters as input, and produces exactly one visualization as output. For example, the summarize action in the q2-feature-table plugin is a visualizer.
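Our understanding of the plugin developer documentation is that a visualizer is written as a function that receives an output directory and writes an index.html into it, which QIIME 2 then packages as a .qzv. The sketch below imitates that convention with the standard library only. The function and its output are hypothetical and are not the summarize action from q2-feature-table.

```python
import html
import os
import tempfile


def summarize(output_dir, table):
    """Hypothetical visualizer: writes a single index.html into output_dir and
    returns nothing; that directory is the one visualization it produces."""
    rows = "".join(
        f"<tr><td>{html.escape(sample)}</td><td>{sum(counts.values())}</td></tr>"
        for sample, counts in sorted(table.items())
    )
    page = ("<html><body><table><tr><th>Sample</th><th>Total count</th></tr>"
            f"{rows}</table></body></html>")
    with open(os.path.join(output_dir, "index.html"), "w", encoding="utf-8") as fh:
        fh.write(page)


with tempfile.TemporaryDirectory() as out:
    summarize(out, {"s1": {"ASV-a": 10, "ASV-b": 2}, "s2": {"ASV-a": 3}})
    print(os.listdir(out))  # ['index.html']
```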
- visualization
Visualizations are QIIME 2 results that represent terminal output in an analysis, meaning that they are generated by QIIME 2 and intended to be consumed by a human (as opposed to being consumed by QIIME 2 or other software). Visualizations can only be generated by QIIME 2 visualizers or pipelines. When written to file, visualizations typically have the extension .qzv, which stands for QIIME Zipped Visualization. Visualizations can be viewed with QIIME 2 View on systems that don’t have QIIME 2 installed, and QIIME 2 interfaces typically provide their own support for viewing (such as the qiime tools view command available through the QIIME 2 command line interface).