Filtering feature tables

We’ll next obtain a much larger feature table representing all of the samples included in the [LTC+21] dataset. These samples would take too long to denoise in this course, so we’ll start with the feature table, sequences, and metadata provided by the authors and filter down to the samples that we’ll use for our analyses. If you’d like to perform other experiments with this data, you can use the full feature table or a subset that you define by filtering.

Access the data

First, download the full feature table.

wget \
  -O 'feature-table.qza' \
  'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/feature-table.qza'

Next, download the ASV sequences.

wget \
  -O 'rep-seqs.qza' \
  'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/rep-seqs.qza'

View the metadata

We’ll take a quick look at the QIIME 2-formatted study metadata to refresh our memories. Either review the summary that you previously generated, or generate another one.

Generate summaries of full table and sequence data

Next, it’s useful to generate summaries of the feature table and sequence data. We did this after running DADA2 previously, but since we’re now working with a new feature table and new sequence data, we should look at a summary of this table as well.

qiime feature-table summarize \
  --i-table feature-table.qza \
  --m-sample-metadata-file sample-metadata.tsv \
  --o-visualization table.qzv
qiime feature-table tabulate-seqs \
  --i-data rep-seqs.qza \
  --o-visualization rep-seqs.qzv

Exercise 1

Which column or columns in the metadata could be used to identify samples that were included in the autoFMT study?

Filter the feature table to the autoFMT study samples

In this tutorial, we’re going to work specifically with samples that were included in the autoFMT randomized trial. We’ll now begin a series of filtering steps applied to both the feature table and the sequences to select only features and samples that are relevant to that study.

First, we’ll remove samples that are not part of the autoFMT study from the feature table. We identify these samples using the metadata. Specifically, this step removes every sample that has no value in the autoFmtGroup column of the metadata, and keeps the rest.

qiime feature-table filter-samples \
  --i-table feature-table.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-where 'autoFmtGroup IS NOT NULL' \
  --o-filtered-table autofmt-table.qza
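
To make the effect of the `--p-where` clause concrete, here is a minimal sketch that applies the same "has a value" test to a toy metadata file with awk. It assumes that empty metadata cells are what the clause treats as NULL. The sample IDs, the `sample-id` header, and the cell values are hypothetical stand-ins, not rows from the real study metadata.

```shell
# Toy stand-in for sample-metadata.tsv (hypothetical IDs and values).
# Empty autoFmtGroup cells represent samples outside the autoFMT study.
metadata="$(mktemp)"
printf 'sample-id\tautoFmtGroup\n' > "$metadata"
printf 's1\ttreatment\n' >> "$metadata"
printf 's2\t\n' >> "$metadata"
printf 's3\tcontrol\n' >> "$metadata"
printf 's4\t\n' >> "$metadata"

# Locate the autoFmtGroup column by name, then print the IDs of rows
# whose cell in that column is non-empty.
kept="$(awk -F'\t' '
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == "autoFmtGroup") col = i; next }
  $col != "" { print $1 }
' "$metadata")"
echo "$kept"
rm -f "$metadata"
```

Only the rows with a value in autoFmtGroup survive, which mirrors the set of samples the QIIME 2 command retains in autofmt-table.qza.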

We can now summarize the feature table again to observe how it changed as a result of this first filtering step.

qiime feature-table summarize \
  --i-table autofmt-table.qza \
  --m-sample-metadata-file sample-metadata.tsv \
  --o-visualization autofmt-table-summ.qzv

Exercise 2

How many samples and features are in this feature table after filtering? How does that compare to the feature table prior to filtering?

Perform additional filtering steps on feature table

Before we proceed with the analysis, we’ll apply a few more filtering steps.

First, we’re going to focus on a specific window of time: the ten days prior to the patient’s cell transplant through seventy days following the transplant. Some of the subjects in this study have very long-term microbiota data, but since many don’t, it helps to restrict our analysis to the temporal range that is most relevant.

qiime feature-table filter-samples \
  --i-table autofmt-table.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-where 'DayRelativeToNearestHCT BETWEEN -10 AND 70' \
  --o-filtered-table filtered-table-1.qza
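
In SQL, `BETWEEN` is inclusive at both ends, so samples collected exactly on day -10 or day 70 are kept. The sketch below checks that boundary behavior on toy data with awk. The sample IDs and day values are hypothetical, and the row with an empty day is included on the assumption that samples with a missing value do not match the clause.

```shell
# Toy metadata (hypothetical sample IDs and day values).
metadata="$(mktemp)"
printf 'sample-id\tDayRelativeToNearestHCT\n' > "$metadata"
printf 'a\t-11\n' >> "$metadata"
printf 'b\t-10\n' >> "$metadata"
printf 'c\t0\n' >> "$metadata"
printf 'd\t70\n' >> "$metadata"
printf 'e\t71\n' >> "$metadata"
printf 'f\t\n' >> "$metadata"

# Keep rows whose day is present and falls in the closed interval [-10, 70].
# Adding 0 forces a numeric comparison instead of a string comparison.
in_window="$(awk -F'\t' '
  NR == 1 { next }
  $2 != "" && $2 + 0 >= -10 && $2 + 0 <= 70 { print $1 }
' "$metadata")"
echo "$in_window"
rm -f "$metadata"
```

Samples b, c, and d fall inside the window; a and e sit just outside it, and f has no day recorded.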

Finally, we’ll remove features from the feature table if they occur in fewer than two samples. This filter is used here primarily to reduce the runtime of some of the downstream steps for the purpose of this tutorial. You don’t need to apply it in your own analyses.

qiime feature-table filter-features \
  --i-table filtered-table-1.qza \
  --p-min-samples 2 \
  --o-filtered-table filtered-table-2.qza
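
As an illustration of what `--p-min-samples 2` does, the following sketch counts, for each feature in a toy table, the number of samples in which it was observed (count greater than zero) and keeps features observed in at least two. The tab-separated layout, feature IDs, and counts are hypothetical; the real table is stored inside the .qza artifact rather than as a plain TSV.

```shell
# Toy feature table: one row per feature, one column per sample
# (hypothetical IDs and counts).
table="$(mktemp)"
printf 'feature-id\ts1\ts2\ts3\n' > "$table"
printf 'f1\t5\t0\t0\n' >> "$table"
printf 'f2\t3\t1\t0\n' >> "$table"
printf 'f3\t0\t0\t0\n' >> "$table"
printf 'f4\t2\t2\t9\n' >> "$table"

min_samples=2

# For each feature, count the samples with a non-zero count and print the
# feature ID when that number reaches the threshold.
kept_features="$(awk -F'\t' -v min="$min_samples" '
  NR == 1 { next }
  {
    n = 0
    for (i = 2; i <= NF; i++) if ($i + 0 > 0) n++
    if (n >= min) print $1
  }
' "$table")"
echo "$kept_features"
rm -f "$table"
```

Feature f1 is dropped even though its total count is 5, because the filter looks at how many samples contain the feature and not at how abundant it is.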

Exercise 3

Generate a summary of this latest filtered feature table on your own, reusing the qiime feature-table summarize command from earlier in this section with filtered-table-2.qza as the input table. How many samples and features are in this feature table?

Filter features from sequence data to reduce runtime of feature annotation

At this point, we have filtered features from our feature table, but those features are still present in our sequence data. In the next section we’ll perform some computationally expensive operations on these sequences, so to speed those up we’ll remove from our collection of feature sequences every feature that is no longer in our feature table.

qiime feature-table filter-seqs \
  --i-data rep-seqs.qza \
  --i-table filtered-table-2.qza \
  --o-filtered-data filtered-sequences-1.qza
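
Conceptually, this step keeps only the sequences whose feature ID still appears in the table. The sketch below shows that idea on a toy FASTA file with awk. The feature IDs and sequences are hypothetical, and the plain-text ID list stands in for the feature IDs that remain in filtered-table-2.qza.

```shell
# Toy inputs (hypothetical): IDs remaining in the table, and a FASTA file
# that still contains every original feature sequence.
ids="$(mktemp)"
seqs="$(mktemp)"
printf 'f2\nf4\n' > "$ids"
printf '>f1\nACGT\n>f2\nGGCA\n>f3\nTTAA\n>f4\nCCGT\n' > "$seqs"

# First file: remember the IDs to keep. Second file: at each header line,
# decide whether the record is kept, then print lines while that holds.
filtered="$(awk '
  NR == FNR { keep[$1] = 1; next }
  /^>/ { id = substr($1, 2); keeping = (id in keep) }
  keeping
' "$ids" "$seqs")"
echo "$filtered"
rm -f "$ids" "$seqs"
```

Only the records for f2 and f4 remain, so the sequence collection and the feature table describe the same set of features again.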