Importing data

Note

This tutorial assumes you have installed QIIME 2 using one of the procedures in the install documents.

In order to use QIIME 2, your input data must be stored in QIIME 2 artifacts (i.e. .qza files). This is what enables distributed and automatic provenance tracking, as well as semantic type validation and transformations between data formats (see the core concepts page for more details about QIIME 2 artifacts). This tutorial demonstrates how to import various data formats into QIIME 2 artifacts for use with QIIME 2.

Note

This tutorial does not describe all data formats that are currently supported in QIIME 2. It is a work-in-progress that describes some of the most commonly used data formats that are available. We are also actively working on supporting additional data formats. If you need to import data in a format that is not covered here, please post to the QIIME 2 Forum for help.

Importing will typically happen with your initial data (e.g. sequences obtained from a sequencing facility), but importing can be performed at any step in your analysis pipeline. For example, if a collaborator provides you with a .biom file, you can import it into a QIIME 2 artifact to perform “downstream” statistical analyses that operate on a feature table.

Importing can be accomplished using any of the QIIME 2 interfaces. This tutorial will focus on using the QIIME 2 command-line interface (q2cli) to import data. Each section below briefly describes a data format, provides commands to download example data, and illustrates how to import the data into a QIIME 2 artifact.

You may want to begin by creating a directory to work in.

mkdir qiime2-importing-tutorial
cd qiime2-importing-tutorial

Sequence data

“EMP protocol” multiplexed single-end fastq

Format description

In the “Earth Microbiome Project (EMP) protocol” format for single-end reads, there are two fastq.gz files, one containing sequence reads and one containing the associated barcode reads, with the sequence data still multiplexed. The order of the records in the two fastq.gz files defines the association between a sequence read and its barcode read.

Obtaining example data

mkdir emp-single-end-sequences
Download the barcode reads with either wget or curl (the two commands are equivalent; run only one):
wget -O "emp-single-end-sequences/barcodes.fastq.gz" "https://data.qiime2.org/2017.2/tutorials/moving-pictures/emp-single-end-sequences/barcodes.fastq.gz"
curl -sL "https://data.qiime2.org/2017.2/tutorials/moving-pictures/emp-single-end-sequences/barcodes.fastq.gz" > "emp-single-end-sequences/barcodes.fastq.gz"
Download the sequence reads with either wget or curl (the two commands are equivalent; run only one):
wget -O "emp-single-end-sequences/sequences.fastq.gz" "https://data.qiime2.org/2017.2/tutorials/moving-pictures/emp-single-end-sequences/sequences.fastq.gz"
curl -sL "https://data.qiime2.org/2017.2/tutorials/moving-pictures/emp-single-end-sequences/sequences.fastq.gz" > "emp-single-end-sequences/sequences.fastq.gz"
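
Because the association between a sequence read and its barcode read depends only on record order, the two files should contain the same number of records. The following is a minimal sketch of that sanity check, not part of QIIME 2 itself. The function names `count_fastq_records` and `check_emp_single_end` are hypothetical, and the count assumes the common four-lines-per-record fastq layout.

```shell
# Count fastq records in a gzip-compressed file.
# Assumption: every record occupies exactly four lines (no wrapped sequences).
count_fastq_records() {
  local lines
  lines=$(gzip -dc "$1" | wc -l)
  echo $(( lines / 4 ))
}

# Hypothetical helper: succeed only if sequences.fastq.gz and
# barcodes.fastq.gz in the given directory hold the same number of records.
check_emp_single_end() {
  local dir="$1" n_seq n_bar
  n_seq=$(count_fastq_records "$dir/sequences.fastq.gz")
  n_bar=$(count_fastq_records "$dir/barcodes.fastq.gz")
  echo "sequences: $n_seq, barcodes: $n_bar"
  [ "$n_seq" -eq "$n_bar" ]
}

# Usage, after downloading the example data:
#   check_emp_single_end emp-single-end-sequences
```

Equal counts do not prove the files are correctly paired, but unequal counts are a strong hint that one of the downloads is incomplete.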

Importing data

qiime tools import \
  --type EMPSingleEndSequences \
  --input-path emp-single-end-sequences \
  --output-path emp-single-end-sequences.qza

Output artifacts: emp-single-end-sequences.qza
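
One way to confirm what an import produced is to read the metadata stored inside the artifact. The sketch below assumes that a .qza file is a zip archive whose top-level directory contains a `metadata.yaml` file recording the artifact's UUID, type, and format. That layout is not described in this tutorial and is stated here as an assumption. The function name `peek_qza` is hypothetical, and the sketch relies on the Info-ZIP `unzip` tool.

```shell
# Hypothetical helper: print the top-level metadata.yaml of a .qza file.
# Assumption: the artifact is a zip archive with a single top-level
# directory that holds metadata.yaml.
peek_qza() {
  local qza="$1" root
  # Every member path begins with the archive's top-level directory name.
  root=$(unzip -Z1 "$qza" | head -n 1 | cut -d/ -f1)
  # Write that one file to standard output without extracting the archive.
  unzip -p "$qza" "$root/metadata.yaml"
}

# Usage, after running the import command:
#   peek_qza emp-single-end-sequences.qza
```

If the assumption holds, the `type` line in the output should match the value passed to `--type`.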

“EMP protocol” multiplexed paired-end fastq

Format description

In the “Earth Microbiome Project (EMP) protocol” format for paired-end reads, there are three fastq.gz files, one containing forward sequence reads, one containing reverse sequence reads, and one containing the associated barcode reads, with the sequence data still multiplexed. The order of the records in the three fastq.gz files defines the association between the sequence reads and barcode reads.

Obtaining example data

mkdir emp-paired-end-sequences
Download the forward reads with either wget or curl (the two commands are equivalent; run only one):
wget -O "emp-paired-end-sequences/forward.fastq.gz" "https://data.qiime2.org/2017.2/tutorials/atacama-soils/1p/forward.fastq.gz"
curl -sL "https://data.qiime2.org/2017.2/tutorials/atacama-soils/1p/forward.fastq.gz" > "emp-paired-end-sequences/forward.fastq.gz"
Download the reverse reads with either wget or curl (the two commands are equivalent; run only one):
wget -O "emp-paired-end-sequences/reverse.fastq.gz" "https://data.qiime2.org/2017.2/tutorials/atacama-soils/1p/reverse.fastq.gz"
curl -sL "https://data.qiime2.org/2017.2/tutorials/atacama-soils/1p/reverse.fastq.gz" > "emp-paired-end-sequences/reverse.fastq.gz"
Download the barcode reads with either wget or curl (the two commands are equivalent; run only one):
wget -O "emp-paired-end-sequences/barcodes.fastq.gz" "https://data.qiime2.org/2017.2/tutorials/atacama-soils/1p/barcodes.fastq.gz"
curl -sL "https://data.qiime2.org/2017.2/tutorials/atacama-soils/1p/barcodes.fastq.gz" > "emp-paired-end-sequences/barcodes.fastq.gz"
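
Since record order is what ties the forward, reverse, and barcode reads together, it can be worth checking that the three files list the same read identifiers in the same order. The sketch below is one hedged way to do that. The function names `fastq_ids` and `check_emp_paired_end` are hypothetical, and the comparison assumes that corresponding records share the first whitespace-delimited token of their header line. Files with older-style headers ending in /1 and /2 would not satisfy that assumption.

```shell
# Print the first whitespace-delimited token of every record header.
# Assumption: four lines per record, header on the first line of each.
fastq_ids() {
  gzip -dc "$1" | awk 'NR % 4 == 1 { print $1 }'
}

# Hypothetical check: reverse and barcode reads list the same IDs,
# in the same order, as the forward reads.
check_emp_paired_end() {
  local dir="$1" name ref
  ref=$(fastq_ids "$dir/forward.fastq.gz" | cksum)
  for name in reverse barcodes; do
    if [ "$(fastq_ids "$dir/$name.fastq.gz" | cksum)" != "$ref" ]; then
      echo "record IDs in $name.fastq.gz differ from forward.fastq.gz" >&2
      return 1
    fi
  done
  echo "record IDs agree across forward, reverse, and barcode reads"
}

# Usage, after downloading the example data:
#   check_emp_paired_end emp-paired-end-sequences
```

A reported mismatch is a prompt to inspect the headers by hand, not proof that the data are unusable.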

Importing data

qiime tools import \
  --type EMPPairedEndSequences \
  --input-path emp-paired-end-sequences \
  --output-path emp-paired-end-sequences.qza

Output artifacts: emp-paired-end-sequences.qza

Casava 1.8 single-end demultiplexed fastq

Format description

In this format, there is one fastq.gz file for each sample in the study, and the file name includes the sample identifier. The file name for a single sample might look like L2S357_15_L001_R1_001.fastq.gz. The underscore-separated fields in this file name are the sample identifier, the barcode sequence or a barcode identifier, the lane number, the read number, and the set number.
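
To make the field layout concrete, the sketch below splits a file name on underscores and labels each field. The function name `parse_casava_name` is hypothetical, and the split assumes the sample identifier itself contains no underscores.

```shell
# Hypothetical helper: label the underscore-separated fields of a
# Casava 1.8 file name.
# Assumption: the sample identifier contains no underscores.
parse_casava_name() {
  local base sample barcode lane read set_number
  base=$(basename "$1" .fastq.gz)
  IFS=_ read -r sample barcode lane read set_number <<< "$base"
  printf 'sample=%s barcode=%s lane=%s read=%s set=%s\n' \
    "$sample" "$barcode" "$lane" "$read" "$set_number"
}

parse_casava_name L2S357_15_L001_R1_001.fastq.gz
# prints: sample=L2S357 barcode=15 lane=L001 read=R1 set=001
```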

Obtaining example data

Download the example archive with either wget or curl (the two commands are equivalent; run only one):
wget -O "casava-18-single-end-demultiplexed.zip" "https://data.qiime2.org/2017.2/tutorials/importing-sequence-data/casava-18-single-end-demultiplexed.zip"
curl -sL "https://data.qiime2.org/2017.2/tutorials/importing-sequence-data/casava-18-single-end-demultiplexed.zip" > "casava-18-single-end-demultiplexed.zip"
unzip -q casava-18-single-end-demultiplexed.zip

Importing data

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path casava-18-single-end-demultiplexed \
  --source-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path demux-single-end.qza

Output artifacts: demux-single-end.qza

Casava 1.8 paired-end demultiplexed fastq

Format description

In this format, there are two fastq.gz files for each sample in the study, and each file name includes the sample identifier. The forward and reverse read file names for a single sample might look like L2S357_15_L001_R1_001.fastq.gz and L2S357_15_L001_R2_001.fastq.gz, respectively. The underscore-separated fields in these file names are the sample identifier, the barcode sequence or a barcode identifier, the lane number, the read number, and the set number.

Obtaining example data

Download the example archive with either wget or curl (the two commands are equivalent; run only one):
wget -O "casava-18-paired-end-demultiplexed.zip" "https://data.qiime2.org/2017.2/tutorials/importing-sequence-data/casava-18-paired-end-demultiplexed.zip"
curl -sL "https://data.qiime2.org/2017.2/tutorials/importing-sequence-data/casava-18-paired-end-demultiplexed.zip" > "casava-18-paired-end-demultiplexed.zip"
unzip -q casava-18-paired-end-demultiplexed.zip
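
Because this format expects a forward and a reverse file for every sample, a quick check that each R1 file has an R2 partner can catch an incomplete directory before importing. The following is a minimal sketch. The function name `check_casava_pairs` is hypothetical, and it assumes file names end in `_R1_001.fastq.gz` and `_R2_001.fastq.gz` as in the example names above.

```shell
# Hypothetical check: every *_R1_001.fastq.gz file in a directory has a
# matching *_R2_001.fastq.gz file. Returns non-zero if any are missing.
check_casava_pairs() {
  local dir="$1" r1 r2 missing=0
  for r1 in "$dir"/*_R1_001.fastq.gz; do
    [ -e "$r1" ] || continue   # glob matched nothing
    # Swap the read-number field to derive the expected reverse file name.
    r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
    if [ ! -e "$r2" ]; then
      echo "missing reverse read file for $(basename "$r1")" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, after unzipping the example data:
#   check_casava_pairs casava-18-paired-end-demultiplexed
```

The check only looks at file names; it does not open the files or compare their contents.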

Importing data

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path casava-18-paired-end-demultiplexed \
  --source-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path demux-paired-end.qza

Output artifacts: demux-paired-end.qza

Feature table data

BIOM v1.0.0

Format description

See the BIOM v1.0.0 format specification for details.

Obtaining example data

Download the example feature table with either wget or curl (the two commands are equivalent; run only one):
wget -O "feature-table-v100.biom" "https://data.qiime2.org/2017.2/tutorials/importing-sequence-data/feature-table-v100.biom"
curl -sL "https://data.qiime2.org/2017.2/tutorials/importing-sequence-data/feature-table-v100.biom" > "feature-table-v100.biom"

Importing data

qiime tools import \
  --input-path feature-table-v100.biom \
  --type "FeatureTable[Frequency]" \
  --source-format BIOMV100Format \
  --output-path feature-table-1.qza

Output artifacts: feature-table-1.qza

BIOM v2.1.0

Format description

See the BIOM v2.1.0 format specification for details.
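
When a .biom file arrives without a note about its version, its first bytes usually tell the two formats apart: BIOM v1.0.0 files are JSON text, while BIOM v2.1.0 files are HDF5 binaries that begin with the 8-byte HDF5 signature. The sketch below uses that to suggest a value for `--source-format`. The function name `guess_biom_format` is hypothetical, and the JSON branch assumes the file starts directly with `{` with no leading whitespace.

```shell
# Hypothetical helper: suggest a --source-format value for a .biom file
# based on its first bytes.
# Assumption: JSON files begin with "{" as their very first byte.
guess_biom_format() {
  local magic
  # Hex dump of the first 8 bytes, with spaces and newlines removed.
  magic=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    894844460d0a1a0a) echo "BIOMV210Format" ;;  # HDF5 file signature
    7b*)              echo "BIOMV100Format" ;;  # "{" opens a JSON document
    *)                echo "unknown"; return 1 ;;
  esac
}

# Usage, after downloading the example data:
#   guess_biom_format feature-table-v210.biom
```

The result is only a hint; the import command itself validates the file against the format that is named.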

Obtaining example data

Download the example feature table with either wget or curl (the two commands are equivalent; run only one):
wget -O "feature-table-v210.biom" "https://data.qiime2.org/2017.2/tutorials/importing-sequence-data/feature-table-v210.biom"
curl -sL "https://data.qiime2.org/2017.2/tutorials/importing-sequence-data/feature-table-v210.biom" > "feature-table-v210.biom"

Importing data

qiime tools import \
  --input-path feature-table-v210.biom \
  --type "FeatureTable[Frequency]" \
  --source-format BIOMV210Format \
  --output-path feature-table-2.qza

Output artifacts: feature-table-2.qza