Utilities in QIIME 2

QIIME 2 includes many utilities that are not provided by plugins, and this document demonstrates a number of them. The document is organized by interface and cross-references similar functionality available in other interfaces.

q2cli

Most of the interesting utilities can be found in the tools subcommand of q2cli:

qiime tools --help

stdout:

Usage: qiime tools [OPTIONS] COMMAND [ARGS]...

  Tools for working with QIIME 2 files.

Options:
  --help      Show this message and exit.

Commands:
  citations         Print citations for a QIIME 2 result.
  export            Export data from a QIIME 2 Artifact or a Visualization
  extract           Extract a QIIME 2 Artifact or Visualization archive.
  import            Import data into a new QIIME 2 Artifact.
  inspect-metadata  Inspect columns available in metadata.
  peek              Take a peek at a QIIME 2 Artifact or Visualization.
  validate          Validate data in a QIIME 2 Artifact.
  view              View a QIIME 2 Visualization.

Let’s get our hands on some data so that we can learn more about this functionality! First, we will take a look at the taxonomic bar charts from the PD Mice Tutorial:

Please select a download option that is most appropriate for your environment:
wget \
  -O "taxa-barplot.qzv" \
  "https://data.qiime2.org/2020.8/tutorials/utilities/taxa-barplot.qzv"
curl -sL \
  "https://data.qiime2.org/2020.8/tutorials/utilities/taxa-barplot.qzv" > \
  "taxa-barplot.qzv"

Retrieving Citations

Now that we have some results, let’s learn more about the citations relevant to the creation of this visualization. First, we can check the help text for the qiime tools citations command:

qiime tools citations --help

stdout:

Usage: qiime tools citations [OPTIONS] ARTIFACT/VISUALIZATION

  Print citations as a BibTex file (.bib) for a QIIME 2 result.

Options:
  --help      Show this message and exit.

Now that we know how to use the command, we will run the following:

qiime tools citations taxa-barplot.qzv

stdout:

@article{framework|qiime2:2019.10.0|0,
 author = {Bolyen, Evan and Rideout, Jai Ram and Dillon, Matthew R. and Bokulich, Nicholas A. and Abnet, Christian C. and Al-Ghalith, Gabriel A. and Alexander, Harriet and Alm, Eric J. and Arumugam, Manimozhiyan and Asnicar, Francesco and Bai, Yang and Bisanz, Jordan E. and Bittinger, Kyle and Brejnrod, Asker and Brislawn, Colin J. and Brown, C. Titus and Callahan, Benjamin J. and Caraballo-Rodríguez, Andrés Mauricio and Chase, John and Cope, Emily K. and Da Silva, Ricardo and Diener, Christian and Dorrestein, Pieter C. and Douglas, Gavin M. and Durall, Daniel M. and Duvallet, Claire and Edwardson, Christian F. and Ernst, Madeleine and Estaki, Mehrbod and Fouquier, Jennifer and Gauglitz, Julia M. and Gibbons, Sean M. and Gibson, Deanna L. and Gonzalez, Antonio and Gorlick, Kestrel and Guo, Jiarong and Hillmann, Benjamin and Holmes, Susan and Holste, Hannes and Huttenhower, Curtis and Huttley, Gavin A. and Janssen, Stefan and Jarmusch, Alan K. and Jiang, Lingjing and Kaehler, Benjamin D. and Kang, Kyo Bin and Keefe, Christopher R. and Keim, Paul and Kelley, Scott T. and Knights, Dan and Koester, Irina and Kosciolek, Tomasz and Kreps, Jorden and Langille, Morgan G. I. and Lee, Joslynn and Ley, Ruth and Liu, Yong-Xin and Loftfield, Erikka and Lozupone, Catherine and Maher, Massoud and Marotz, Clarisse and Martin, Bryan D. and McDonald, Daniel and McIver, Lauren J. and Melnik, Alexey V. and Metcalf, Jessica L. and Morgan, Sydney C. and Morton, Jamie T. and Naimey, Ahmad Turan and Navas-Molina, Jose A. and Nothias, Louis Felix and Orchanian, Stephanie B. and Pearson, Talima and Peoples, Samuel L. and Petras, Daniel and Preuss, Mary Lai and Pruesse, Elmar and Rasmussen, Lasse Buur and Rivers, Adam and Robeson, Michael S. and Rosenthal, Patrick and Segata, Nicola and Shaffer, Michael and Shiffer, Arron and Sinha, Rashmi and Song, Se Jin and Spear, John R. and Swafford, Austin D. and Thompson, Luke R. and Torres, Pedro J. 
and Trinh, Pauline and Tripathi, Anupriya and Turnbaugh, Peter J. and Ul-Hasan, Sabah and van der Hooft, Justin J. J. and Vargas, Fernando and Vázquez-Baeza, Yoshiki and Vogtmann, Emily and von Hippel, Max and Walters, William and Wan, Yunhu and Wang, Mingxun and Warren, Jonathan and Weber, Kyle C. and Williamson, Charles H. D. and Willis, Amy D. and Xu, Zhenjiang Zech and Zaneveld, Jesse R. and Zhang, Yilong and Zhu, Qiyun and Knight, Rob and Caporaso, J. Gregory},
 doi = {10.1038/s41587-019-0209-9},
 issn = {1546-1696},
 journal = {Nature Biotechnology},
 number = {8},
 pages = {852-857},
 title = {Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2},
 url = {https://doi.org/10.1038/s41587-019-0209-9},
 volume = {37},
 year = {2019}
}

@article{view|types:2019.10.0|BIOMV210DirFmt|0,
 author = {McDonald, Daniel and Clemente, Jose C and Kuczynski, Justin and Rideout, Jai Ram and Stombaugh, Jesse and Wendel, Doug and Wilke, Andreas and Huse, Susan and Hufnagle, John and Meyer, Folker and Knight, Rob and Caporaso, J Gregory},
 doi = {10.1186/2047-217X-1-7},
 journal = {GigaScience},
 number = {1},
 pages = {7},
 publisher = {BioMed Central},
 title = {The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome},
 volume = {1},
 year = {2012}
}

@inproceedings{view|types:2019.10.0|pandas.core.frame:DataFrame|0,
 author = { Wes McKinney },
 booktitle = { Proceedings of the 9th Python in Science Conference },
 editor = { Stéfan van der Walt and Jarrod Millman },
 pages = { 51 -- 56 },
 title = { Data Structures for Statistical Computing in Python },
 year = { 2010 }
}

@inproceedings{view|types:2019.10.0|pandas.core.series:Series|0,
 author = { Wes McKinney },
 booktitle = { Proceedings of the 9th Python in Science Conference },
 editor = { Stéfan van der Walt and Jarrod Millman },
 pages = { 51 -- 56 },
 title = { Data Structures for Statistical Computing in Python },
 year = { 2010 }
}

@article{plugin|dada2:2019.10.0|0,
 author = {Callahan, Benjamin J and McMurdie, Paul J and Rosen, Michael J and Han, Andrew W and Johnson, Amy Jo A and Holmes, Susan P},
 doi = {10.1038/nmeth.3869},
 journal = {Nature methods},
 number = {7},
 pages = {581},
 publisher = {Nature Publishing Group},
 title = {DADA2: high-resolution sample inference from Illumina amplicon data},
 volume = {13},
 year = {2016}
}

@article{framework|qiime2:2019.4.0|0,
 author = {Bolyen, Evan and Rideout, Jai Ram and Dillon, Matthew R and Bokulich, Nicholas A and Abnet, Christian and Al-Ghalith, Gabriel A and Alexander, Harriet and Alm, Eric J and Arumugam, Manimozhiyan and Asnicar, Francesco and Bai, Yang and Bisanz, Jordan E and Bittinger, Kyle and Brejnrod, Asker and Brislawn, Colin J and Brown, C Titus and Callahan, Benjamin J and Caraballo-Rodríguez, Andrés Mauricio and Chase, John and Cope, Emily and Da Silva, Ricardo and Dorrestein, Pieter C and Douglas, Gavin M and Durall, Daniel M and Duvallet, Claire and Edwardson, Christian F and Ernst, Madeleine and Estaki, Mehrbod and Fouquier, Jennifer and Gauglitz, Julia M and Gibson, Deanna L and Gonzalez, Antonio and Gorlick, Kestrel and Guo, Jiarong and Hillmann, Benjamin and Holmes, Susan and Holste, Hannes and Huttenhower, Curtis and Huttley, Gavin and Janssen, Stefan and Jarmusch, Alan K and Jiang, Lingjing and Kaehler, Benjamin and Kang, Kyo Bin and Keefe, Christopher R and Keim, Paul and Kelley, Scott T and Knights, Dan and Koester, Irina and Kosciolek, Tomasz and Kreps, Jorden and Langille, Morgan GI and Lee, Joslynn and Ley, Ruth and Liu, Yong-Xin and Loftfield, Erikka and Lozupone, Catherine and Maher, Massoud and Marotz, Clarisse and Martin, Bryan and McDonald, Daniel and McIver, Lauren J and Melnik, Alexey V and Metcalf, Jessica L and Morgan, Sydney C and Morton, Jamie and Naimey, Ahmad Turan and Navas-Molina, Jose A and Nothias, Louis Felix and Orchanian, Stephanie B and Pearson, Talima and Peoples, Samuel L and Petras, Daniel and Preuss, Mary Lai and Pruesse, Elmar and Rasmussen, Lasse Buur and Rivers, Adam and Robeson, II, Michael S and Rosenthal, Patrick and Segata, Nicola and Shaffer, Michael and Shiffer, Arron and Sinha, Rashmi and Song, Se Jin and Spear, John R and Swafford, Austin D and Thompson, Luke R and Torres, Pedro J and Trinh, Pauline and Tripathi, Anupriya and Turnbaugh, Peter J and Ul-Hasan, Sabah and van der Hooft, Justin JJ and Vargas, Fernando and 
Vázquez-Baeza, Yoshiki and Vogtmann, Emily and von Hippel, Max and Walters, William and Wan, Yunhu and Wang, Mingxun and Warren, Jonathan and Weber, Kyle C and Williamson, Chase HD and Willis, Amy D and Xu, Zhenjiang Zech and Zaneveld, Jesse R and Zhang, Yilong and Knight, Rob and Caporaso, J Gregory},
 doi = {10.7287/peerj.preprints.27295v1},
 issn = {2167-9843},
 journal = {PeerJ Preprints},
 month = {oct},
 pages = {e27295v1},
 title = {QIIME 2: Reproducible, interactive, scalable, and extensible microbiome data science},
 url = {https://doi.org/10.7287/peerj.preprints.27295v1},
 volume = {6},
 year = {2018}
}

@article{action|feature-classifier:2019.10.0|method:classify_sklearn|0,
 author = {Pedregosa, Fabian and Varoquaux, Gaël and Gramfort, Alexandre and Michel, Vincent and Thirion, Bertrand and Grisel, Olivier and Blondel, Mathieu and Prettenhofer, Peter and Weiss, Ron and Dubourg, Vincent and Vanderplas, Jake and Passos, Alexandre and Cournapeau, David and Brucher, Matthieu and Perrot, Matthieu and Duchesnay, Édouard},
 journal = {Journal of machine learning research},
 number = {Oct},
 pages = {2825--2830},
 title = {Scikit-learn: Machine learning in Python},
 volume = {12},
 year = {2011}
}

@article{plugin|feature-classifier:2019.10.0|0,
 author = {Bokulich, Nicholas A. and Kaehler, Benjamin D. and Rideout, Jai Ram and Dillon, Matthew and Bolyen, Evan and Knight, Rob and Huttley, Gavin A. and Caporaso, J. Gregory},
 doi = {10.1186/s40168-018-0470-z},
 journal = {Microbiome},
 number = {1},
 pages = {90},
 title = {Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin},
 url = {https://doi.org/10.1186/s40168-018-0470-z},
 volume = {6},
 year = {2018}
}

@article{view|types:2019.10.0|biom.table:Table|0,
 author = {McDonald, Daniel and Clemente, Jose C and Kuczynski, Justin and Rideout, Jai Ram and Stombaugh, Jesse and Wendel, Doug and Wilke, Andreas and Huse, Susan and Hufnagle, John and Meyer, Folker and Knight, Rob and Caporaso, J Gregory},
 doi = {10.1186/2047-217X-1-7},
 journal = {GigaScience},
 number = {1},
 pages = {7},
 publisher = {BioMed Central},
 title = {The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome},
 volume = {1},
 year = {2012}
}

@article{plugin|feature-classifier:2019.4.0|0,
 author = {Bokulich, Nicholas A. and Kaehler, Benjamin D. and Rideout, Jai Ram and Dillon, Matthew and Bolyen, Evan and Knight, Rob and Huttley, Gavin A. and Caporaso, J. Gregory},
 doi = {10.1186/s40168-018-0470-z},
 journal = {Microbiome},
 number = {1},
 pages = {90},
 title = {Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin},
 url = {https://doi.org/10.1186/s40168-018-0470-z},
 volume = {6},
 year = {2018}
}

@article{action|feature-classifier:2019.4.0|method:fit_classifier_naive_bayes|0,
 author = {Pedregosa, Fabian and Varoquaux, Gaël and Gramfort, Alexandre and Michel, Vincent and Thirion, Bertrand and Grisel, Olivier and Blondel, Mathieu and Prettenhofer, Peter and Weiss, Ron and Dubourg, Vincent and Vanderplas, Jake and Passos, Alexandre and Cournapeau, David and Brucher, Matthieu and Perrot, Matthieu and Duchesnay, Édouard},
 journal = {Journal of machine learning research},
 number = {Oct},
 pages = {2825--2830},
 title = {Scikit-learn: Machine learning in Python},
 volume = {12},
 year = {2011}
}

@inproceedings{view|types:2019.4.1|pandas.core.series:Series|0,
 author = { Wes McKinney },
 booktitle = { Proceedings of the 9th Python in Science Conference },
 editor = { Stéfan van der Walt and Jarrod Millman },
 pages = { 51 -- 56 },
 title = { Data Structures for Statistical Computing in Python },
 year = { 2010 }
}

As you can see, the citations for this particular visualization are presented above in BibTeX format.
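
Because the command prints BibTeX to stdout, the output can be redirected to a file and summarized with standard shell tools. The following is a minimal sketch and not part of QIIME 2: the helper names count_bib_entries and list_bib_dois and the file name taxa-barplot.bib are assumptions made for illustration, and the patterns assume the one-field-per-line layout shown above.

```shell
# Count BibTeX entries in a file: every entry begins a line with "@".
count_bib_entries() {
  grep -c '^@' "$1"
}

# List the distinct DOIs found in a file, one per line.
# Entries without a doi field are simply skipped.
list_bib_dois() {
  sed -n 's/^[[:space:]]*doi[[:space:]]*=[[:space:]]*{\(.*\)},\{0,1\}[[:space:]]*$/\1/p' "$1" | sort -u
}

# Example usage (requires QIIME 2 and the file downloaded earlier):
#   qiime tools citations taxa-barplot.qzv > taxa-barplot.bib
#   count_bib_entries taxa-barplot.bib
#   list_bib_dois taxa-barplot.bib
```

Listing distinct DOIs is useful because the same paper can appear under several keys, as the BIOM format paper does above.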

We can also see the citations for a specific plugin:

qiime vsearch --citations

stdout:

% use `qiime tools citations` on a QIIME 2 result for complete list

@article{key0,
 author = {Rognes, Torbjørn and Flouri, Tomáš and Nichols, Ben and Quince, Christopher and Mahé, Frédéric},
 doi = {10.7717/peerj.2584},
 journal = {PeerJ},
 pages = {e2584},
 publisher = {PeerJ Inc.},
 title = {VSEARCH: a versatile open source tool for metagenomics},
 volume = {4},
 year = {2016}
}

And also for a specific action of a plugin:

qiime vsearch cluster-features-open-reference --citations

stdout:

% use `qiime tools citations` on a QIIME 2 result for complete list

@article{key0,
 author = {Rideout, Jai Ram and He, Yan and Navas-Molina, Jose A. and Walters, William A. and Ursell, Luke K. and Gibbons, Sean M. and Chase, John and McDonald, Daniel and Gonzalez, Antonio and Robbins-Pianka, Adam and Clemente, Jose C. and Gilbert, Jack A. and Huse, Susan M. and Zhou, Hong-Wei and Knight, Rob and Caporaso, J. Gregory},
 doi = {10.7717/peerj.545},
 journal = {PeerJ},
 pages = {e545},
 publisher = {PeerJ Inc.},
 title = {Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences},
 volume = {2},
 year = {2014}
}

Viewing Visualizations

What if we want to view our taxa bar plots? One option is to load the visualization at https://view.qiime2.org. All QIIME 2 Results may be opened this way. This will present the visualization (assuming the file is a .qzv), Result details (e.g. filename, uuid, type, format, citations), and a provenance graph showing how the Visualization or Artifact was created.

Note

Provenance viewing is only available at https://view.qiime2.org.

Another option is to use qiime tools view to accomplish the job. This command may only be used with Visualizations, and will not display Visualization details (see Peeking at Results) or provenance, but provides a quick and easy way to view your results from the command line.

qiime tools view taxa-barplot.qzv

This will open a browser window with your visualization loaded in it. When you are done, you can close the browser window and press ctrl-c on the keyboard to terminate the command.

Peeking at Results

Oftentimes we need to verify the type and uuid of an Artifact. We can use the qiime tools peek command to view a brief summary report of those facts. First, let’s get some data to look at:

Please select a download option that is most appropriate for your environment:
wget \
  -O "faith-pd-vector.qza" \
  "https://data.qiime2.org/2020.8/tutorials/utilities/faith-pd-vector.qza"
curl -sL \
  "https://data.qiime2.org/2020.8/tutorials/utilities/faith-pd-vector.qza" > \
  "faith-pd-vector.qza"

Now that we have data, we can learn more about the file:

qiime tools peek faith-pd-vector.qza

stdout:

UUID:        d5186dce-438d-44bb-903c-cb51a7ad4abe
Type:        SampleData[AlphaDiversity] % Properties('phylogenetic')
Data format: AlphaDiversityDirectoryFormat

Here we can see that the type of the Artifact is SampleData[AlphaDiversity] % Properties('phylogenetic'), as well as the Artifact’s UUID and format.
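
When a script needs just one of these values, the report can be filtered by its label. This is a minimal sketch that assumes the "Label: value" layout shown above stays the same between releases; the helper name peek_field is hypothetical and not part of q2cli.

```shell
# Print the value of one field from `qiime tools peek` output read on stdin.
# $1 is the label before the colon, e.g. "UUID", "Type" or "Data format".
# The label is used inside a sed pattern, so it should not contain
# regular-expression metacharacters or slashes.
peek_field() {
  sed -n "s/^$1:[[:space:]]*//p"
}

# Example usage (requires QIIME 2):
#   qiime tools peek faith-pd-vector.qza | peek_field Type
#   qiime tools peek faith-pd-vector.qza | peek_field "Data format"
```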

Validating Results

We can also validate the integrity of the file by running qiime tools validate:

qiime tools validate faith-pd-vector.qza

stdout:

Result faith-pd-vector.qza appears to be valid at level=max.

If there is an issue with the file, this command will usually report what the problem is (within reason).
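
To check several results at once, the command can be wrapped in a loop. The sketch below is illustrative only: validate_all is a hypothetical helper, and it assumes that qiime tools validate exits with a non-zero status when a file is not valid.

```shell
# Validate every .qza and .qzv file in a directory (default: current one).
# Prints each file that fails and returns non-zero if any file failed.
# Assumption: `qiime tools validate` exits non-zero for an invalid result.
validate_all() {
  dir=${1:-.}
  failed=0
  for f in "$dir"/*.qza "$dir"/*.qzv; do
    [ -e "$f" ] || continue   # skip glob patterns that matched nothing
    if ! qiime tools validate "$f" >/dev/null 2>&1; then
      echo "FAILED: $f"
      failed=1
    fi
  done
  return "$failed"
}

# Example usage (requires QIIME 2):
#   validate_all .
```

The command's own output is discarded here to keep the summary short; rerunning qiime tools validate on a failing file shows the full report.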

Inspecting Metadata

In the Metadata tutorial we learned about the metadata tabulate command, and the resulting visualization it creates. Oftentimes we don’t care so much about the values of the Metadata, but rather, just the shape of it: how many columns? What are their names? What are their types? How many rows (or IDs) are in the file?

We can demonstrate this by first downloading some sample metadata:

Please select a download option that is most appropriate for your environment:
wget \
  -O "sample-metadata.tsv" \
  "https://data.qiime2.org/2020.8/tutorials/pd-mice/sample_metadata.tsv"
curl -sL \
  "https://data.qiime2.org/2020.8/tutorials/pd-mice/sample_metadata.tsv" > \
  "sample-metadata.tsv"

Then, we can run the qiime tools inspect-metadata command:

qiime tools inspect-metadata sample-metadata.tsv

stdout:

              COLUMN NAME  TYPE       
=========================  ===========
                  barcode  categorical
                 mouse_id  categorical
                 genotype  categorical
                  cage_id  categorical
                    donor  categorical
             donor_status  categorical
     days_post_transplant  numeric    
genotype_and_donor_status  categorical
=========================  ===========
                     IDS:  48
                 COLUMNS:  8

Question

How many metadata columns are there in sample-metadata.tsv? How many IDs? Identify how many categorical columns are present. Now do the same for numeric columns.

This tool can be very helpful for learning about Metadata column names for files that are viewable as Metadata.

Please select a download option that is most appropriate for your environment:
wget \
  -O "jaccard-pcoa.qza" \
  "https://data.qiime2.org/2020.8/tutorials/utilities/jaccard-pcoa.qza"
curl -sL \
  "https://data.qiime2.org/2020.8/tutorials/utilities/jaccard-pcoa.qza" > \
  "jaccard-pcoa.qza"

The file we just downloaded is a Jaccard PCoA (from the PD Mice Tutorial), which can be used in place of the “typical” TSV-formatted Metadata file. We might need to know its column names for commands we wish to run; inspect-metadata tells us what they are:

qiime tools inspect-metadata jaccard-pcoa.qza

stdout:

COLUMN NAME  TYPE   
===========  =======
     Axis 1  numeric
     Axis 2  numeric
     Axis 3  numeric
     Axis 4  numeric
     Axis 5  numeric
     Axis 6  numeric
     Axis 7  numeric
     Axis 8  numeric
     Axis 9  numeric
    Axis 10  numeric
    Axis 11  numeric
    Axis 12  numeric
    Axis 13  numeric
    Axis 14  numeric
    Axis 15  numeric
    Axis 16  numeric
    Axis 17  numeric
    Axis 18  numeric
    Axis 19  numeric
    Axis 20  numeric
    Axis 21  numeric
    Axis 22  numeric
    Axis 23  numeric
    Axis 24  numeric
    Axis 25  numeric
    Axis 26  numeric
    Axis 27  numeric
    Axis 28  numeric
    Axis 29  numeric
    Axis 30  numeric
    Axis 31  numeric
    Axis 32  numeric
    Axis 33  numeric
    Axis 34  numeric
    Axis 35  numeric
    Axis 36  numeric
    Axis 37  numeric
    Axis 38  numeric
    Axis 39  numeric
    Axis 40  numeric
    Axis 41  numeric
    Axis 42  numeric
    Axis 43  numeric
    Axis 44  numeric
    Axis 45  numeric
    Axis 46  numeric
    Axis 47  numeric
===========  =======
       IDS:  47
   COLUMNS:  47

Question

How many IDs are there? How many columns? Are there any categorical columns? Why?
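
For wide files, the column types reported by inspect-metadata can be tallied with a short filter instead of counted by eye. This is a minimal sketch that assumes the table layout shown above; count_column_types is a hypothetical helper, not a q2cli command.

```shell
# Summarize `qiime tools inspect-metadata` output read on stdin.
# The type is taken from the last field of each line, because column
# names may contain spaces (e.g. "Axis 1"). Header, separator and total
# lines are ignored since their last field is not a type name.
count_column_types() {
  awk '
    $NF == "categorical" { categorical++ }
    $NF == "numeric"     { numeric++ }
    END { printf "categorical=%d numeric=%d\n", categorical, numeric }
  '
}

# Example usage (requires QIIME 2):
#   qiime tools inspect-metadata sample-metadata.tsv | count_column_types
#   qiime tools inspect-metadata jaccard-pcoa.qza | count_column_types
```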

Artifact API

Coming soon, please stay tuned!