Filtering feature tables

We’ll next obtain a much larger feature table representing all of the samples included in the ([LTC+21]) dataset. Denoising all of these samples would take too long for this course, so we’ll start with the feature table, sequences, and metadata provided by the authors and filter them down to the samples that we’ll use in our analyses. If you’d like to run other experiments with this data, you can use the full feature table or a subset that you define by filtering.

Access the data

First, download the full feature table.

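# Imports shown for completeness; earlier sections may already provide them.
from urllib import request
from qiime2 import Artifact

url = 'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/feature-table.qza'
fn = 'feature-table.qza'
request.urlretrieve(url, fn)
feature_table = Artifact.load(fn)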

url <- 'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/feature-table.qza'
fn <- 'feature-table.qza'
request$urlretrieve(url, fn)
feature_table <- Artifact$load(fn)

wget \
  -O 'feature-table.qza' \
  'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/feature-table.qza'

feature_table_url = 'https://data.qiime2.org/2024.5/tutorials/liao/full-feature-table.qza'

def artifact_from_url(url):
    def factory():
        import tempfile
        import requests
        import qiime2

        data = requests.get(url)

        with tempfile.NamedTemporaryFile() as f:
            f.write(data.content)
            f.flush()
            result = qiime2.Artifact.load(f.name)

        return result
    return factory

feature_table = use.init_artifact(
        'feature-table',
        artifact_from_url(feature_table_url))

Using the Upload Data tool:

On the first tab (Regular), press the Paste/Fetch data button at the bottom.
1. Set “Name” (first text-field) to: feature-table.qza
2. In the larger text-area, copy-and-paste: https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/feature-table.qza
3. (“Type”, “Genome”, and “Settings” can be ignored)
Press the Start button at the bottom.


Next, download the ASV sequences.

url = 'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/rep-seqs.qza'
fn = 'rep-seqs.qza'
request.urlretrieve(url, fn)
rep_seqs = Artifact.load(fn)

url <- 'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/rep-seqs.qza'
fn <- 'rep-seqs.qza'
request$urlretrieve(url, fn)
rep_seqs <- Artifact$load(fn)

wget \
  -O 'rep-seqs.qza' \
  'https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/rep-seqs.qza'

seqs_url = 'https://data.qiime2.org/2024.5/tutorials/liao/rep-seqs.qza'

feature_sequences = use.init_artifact(
    'rep-seqs',
    artifact_from_url(seqs_url))

Using the Upload Data tool:

On the first tab (Regular), press the Paste/Fetch data button at the bottom.
1. Set “Name” (first text-field) to: rep-seqs.qza
2. In the larger text-area, copy-and-paste: https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/data/030-tutorial-downstream/010-filtering/rep-seqs.qza
3. (“Type”, “Genome”, and “Settings” can be ignored)
Press the Start button at the bottom.
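Both downloads above follow the same pattern: fetch a URL, write it to a local file name, then load the result. If you rerun the notebook often, a small wrapper can skip files that are already on disk. The following is a minimal sketch using only the Python standard library; `fetch_once` and its `retrieve` parameter are hypothetical names introduced for illustration and are not part of QIIME 2.

```python
import os
from urllib import request


def fetch_once(url, fn, retrieve=request.urlretrieve):
    """Download `url` to `fn` unless `fn` already exists; return `fn`.

    `retrieve` defaults to urllib's urlretrieve and can be swapped out,
    for example with a stub when no network is available.
    """
    if not os.path.exists(fn):
        retrieve(url, fn)
    return fn
```

Under these assumptions, `fetch_once(url, fn)` could stand in for `request.urlretrieve(url, fn)` in the Python snippets above. Note that it only checks whether the file exists, not whether the download completed successfully.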


View the metadata

We’ll take a quick look at the QIIME 2-formatted study metadata to refresh our memories. Either review the summary that you previously generated, or generate another one.

Expand this box for help generating a metadata summary.

metadata_summ_viz, = metadata_actions.tabulate(
    input=sample_metadata_md,
)

action_results <- metadata_actions$tabulate(
    input=sample_metadata_md,
)
metadata_summ_viz <- action_results$visualization

qiime metadata tabulate \
  --m-input-file sample-metadata.tsv \
  --o-visualization metadata-summ.qzv

use.action(
    use.UsageAction(plugin_id='metadata', action_id='tabulate'),
    use.UsageInputs(input=sample_metadata),
    use.UsageOutputNames(visualization='metadata_summ')
)

Using the qiime2 metadata tabulate tool:

For “input”:
- Perform the following steps.
  1. Leave as Metadata from TSV
  2. Set “Metadata Source” to sample-metadata.tsv
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 metadata tabulate [...] : visualization.qzv`	`metadata-summ.qzv`


Generate summaries of full table and sequence data

Next, it’s useful to generate summaries of the feature table and sequence data. We did this after running DADA2 previously, but since we’re now working with a new feature table and new sequence data, we should summarize these as well.

table_viz, = feature_table_actions.summarize(
    table=feature_table,
    sample_metadata=sample_metadata_md,
)
rep_seqs_viz, = feature_table_actions.tabulate_seqs(
    data=rep_seqs,
)

action_results <- feature_table_actions$summarize(
    table=feature_table,
    sample_metadata=sample_metadata_md,
)
table_viz <- action_results$visualization
action_results <- feature_table_actions$tabulate_seqs(
    data=rep_seqs,
)
rep_seqs_viz <- action_results$visualization

qiime feature-table summarize \
  --i-table feature-table.qza \
  --m-sample-metadata-file sample-metadata.tsv \
  --o-visualization table.qzv
qiime feature-table tabulate-seqs \
  --i-data rep-seqs.qza \
  --o-visualization rep-seqs.qzv

use.action(
    use.UsageAction(plugin_id='feature_table', action_id='summarize'),
    use.UsageInputs(table=feature_table, sample_metadata=sample_metadata),
    use.UsageOutputNames(visualization='table'),
)

use.action(
    use.UsageAction(plugin_id='feature_table', action_id='tabulate_seqs'),
    use.UsageInputs(data=feature_sequences),
    use.UsageOutputNames(visualization='rep_seqs'),
)

Using the qiime2 feature-table summarize tool:

Set “table” to #: feature-table.qza
Expand the additional options section
- For “sample_metadata”:
  - Press the + Insert sample_metadata button to set up the next steps.
    1. Leave as Metadata from TSV
    2. Set “Metadata Source” to sample-metadata.tsv
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 feature-table summarize [...] : visualization.qzv`	`table.qzv`

Using the qiime2 feature-table tabulate-seqs tool:

Set “data” to #: rep-seqs.qza
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 feature-table tabulate-seqs [...] : visualization.qzv`	`rep-seqs.qzv`


Exercise 1

Which column or columns in the metadata could be used to identify samples that were included in the autoFMT study?

Solution to Exercise 1

Several columns contain this information. One is autoFmtGroup, which contains the value “treatment” if the subject was in the treatment group, “control” if the subject was in the control group, and no value if the subject was not enrolled in this particular study.
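One way to check a candidate column outside of the metadata visualization is to tally its values. The sketch below does this with the Python standard library on a toy, in-memory stand-in for `sample-metadata.tsv`. The sample IDs and the `tally_column` helper are hypothetical, and the toy file omits any extra header or comment rows that the real metadata file may contain.

```python
import csv
import io
from collections import Counter

# Toy stand-in for sample-metadata.tsv (hypothetical sample IDs).
# An empty autoFmtGroup cell means the sample was not part of the autoFMT study.
toy_tsv = (
    "sample-id\tautoFmtGroup\n"
    "s1\ttreatment\n"
    "s2\tcontrol\n"
    "s3\t\n"
    "s4\ttreatment\n"
)


def tally_column(tsv_text, column):
    """Count how often each value occurs in `column`, labelling empty cells."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[column] or "(no value)" for row in reader)


counts = tally_column(toy_tsv, "autoFmtGroup")
for value, n in sorted(counts.items()):
    print(f"{value}: {n}")
```

A column that identifies study membership should show only the expected group labels plus a count of samples with no value.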

Filter the feature table to the autoFMT study samples

In this tutorial, we’re going to work specifically with samples that were included in the autoFMT randomized trial. We’ll now begin a series of filtering steps applied to both the feature table and the sequences to select only features and samples that are relevant to that study.

First, we’ll remove samples that are not part of the autoFMT study from the feature table. We identify these samples using the metadata. Specifically, this step filters out samples that have no value in the autoFmtGroup column of the metadata.

autofmt_table, = feature_table_actions.filter_samples(
    table=feature_table,
    metadata=sample_metadata_md,
    where='autoFmtGroup IS NOT NULL',
)

action_results <- feature_table_actions$filter_samples(
    table=feature_table,
    metadata=sample_metadata_md,
    where='autoFmtGroup IS NOT NULL',
)
autofmt_table <- action_results$filtered_table

qiime feature-table filter-samples \
  --i-table feature-table.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-where 'autoFmtGroup IS NOT NULL' \
  --o-filtered-table autofmt-table.qza

autofmt_table, = use.action(
    use.UsageAction(plugin_id='feature_table', action_id='filter_samples'),
    use.UsageInputs(table=feature_table, metadata=sample_metadata,
                    where="autoFmtGroup IS NOT NULL"),
    use.UsageOutputNames(filtered_table='autofmt_table')
)

Using the qiime2 feature-table filter-samples tool:

Set “table” to #: feature-table.qza
Expand the additional options section
1. For “metadata”:
  - Press the + Insert metadata button to set up the next steps.
    1. Leave as Metadata from TSV
    2. Set “Metadata Source” to sample-metadata.tsv
2. Set “where” to autoFmtGroup IS NOT NULL
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 feature-table filter-samples [...] : filtered_table.qza`	`autofmt-table.qza`
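The `where` expression used above is written in SQLite WHERE-clause syntax, so its effect can be previewed on a toy table with Python’s built-in `sqlite3` module. This is only a sketch of the selection logic, not how QIIME 2 is implemented; the table layout, sample IDs, and the `ids_matching` helper are assumptions made for illustration.

```python
import sqlite3

# Toy metadata (hypothetical sample IDs); None stands for an empty cell.
rows = [
    ("sample-a", "treatment"),
    ("sample-b", "control"),
    ("sample-c", None),
]


def ids_matching(where, rows):
    """Return the sample IDs whose metadata satisfy a SQLite WHERE clause."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE metadata ("sample-id" TEXT, autoFmtGroup TEXT)')
    con.executemany("INSERT INTO metadata VALUES (?, ?)", rows)
    query = (
        'SELECT "sample-id" FROM metadata '
        f'WHERE {where} ORDER BY "sample-id"'
    )
    ids = [r[0] for r in con.execute(query)]
    con.close()
    return ids


kept = ids_matching("autoFmtGroup IS NOT NULL", rows)
print(kept)
```

In this toy example only the samples with a group label are retained, which mirrors the intent of the filtering step: samples without an autoFmtGroup value are dropped.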


We can now summarize the feature table again to observe how it changed as a result of this first filtering step.

autofmt_table_summ_viz, = feature_table_actions.summarize(
    table=autofmt_table,
    sample_metadata=sample_metadata_md,
)

action_results <- feature_table_actions$summarize(
    table=autofmt_table,
    sample_metadata=sample_metadata_md,
)
autofmt_table_summ_viz <- action_results$visualization

qiime feature-table summarize \
  --i-table autofmt-table.qza \
  --m-sample-metadata-file sample-metadata.tsv \
  --o-visualization autofmt-table-summ.qzv

use.action(
    use.UsageAction(plugin_id='feature_table', action_id='summarize'),
    use.UsageInputs(table=autofmt_table, sample_metadata=sample_metadata),
    use.UsageOutputNames(visualization='autofmt_table_summ'),
)

Using the qiime2 feature-table summarize tool:

Set “table” to #: autofmt-table.qza
Expand the additional options section
- For “sample_metadata”:
  - Press the + Insert sample_metadata button to set up the next steps.
    1. Leave as Metadata from TSV
    2. Set “Metadata Source” to sample-metadata.tsv
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 feature-table summarize [...] : visualization.qzv`	`autofmt-table-summ.qzv`


Exercise 2

How many samples and features are in this feature table after filtering? How does that compare to the feature table prior to filtering?

Solution to Exercise 2

After filtering, there are 556 samples and 4,256 features represented in the feature table.

Prior to filtering to just the autoFMT study, there were 12,546 samples and 17,865 features represented in the feature table.
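To put these counts in proportion, the retained fraction can be computed directly from the numbers reported in the two summaries. The snippet below is plain arithmetic on those four values; the dictionary names are arbitrary.

```python
# Counts taken from the two feature table summaries discussed above.
before = {"samples": 12_546, "features": 17_865}
after = {"samples": 556, "features": 4_256}

# Fraction of samples and features that survive the autoFMT filter.
retained = {key: after[key] / before[key] for key in before}

for key, fraction in retained.items():
    print(f"{key}: {after[key]:,} of {before[key]:,} retained ({fraction:.1%})")
```

This works out to roughly 4.4% of the samples and 23.8% of the features.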

Perform additional filtering steps on feature table

Before we proceed with the analysis, we’ll apply a few more filtering steps.

First, we’re going to focus on a specific window of time: from ten days before the patient’s cell transplant through seventy days after it. Some of the subjects in this study have very long-term microbiota data, but many don’t, so it helps to restrict our analysis to the temporal range that is most relevant here.

filtered_table_1, = feature_table_actions.filter_samples(
    table=autofmt_table,
    metadata=sample_metadata_md,
    where='DayRelativeToNearestHCT BETWEEN -10 AND 70',
)

action_results <- feature_table_actions$filter_samples(
    table=autofmt_table,
    metadata=sample_metadata_md,
    where='DayRelativeToNearestHCT BETWEEN -10 AND 70',
)
filtered_table_1 <- action_results$filtered_table

qiime feature-table filter-samples \
  --i-table autofmt-table.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-where 'DayRelativeToNearestHCT BETWEEN -10 AND 70' \
  --o-filtered-table filtered-table-1.qza

filtered_table_1, = use.action(
    use.UsageAction(plugin_id='feature_table', action_id='filter_samples'),
    use.UsageInputs(table=autofmt_table, metadata=sample_metadata,
                    where="DayRelativeToNearestHCT BETWEEN -10 AND 70"),
    use.UsageOutputNames(filtered_table='filtered_table_1')
)

Using the qiime2 feature-table filter-samples tool:

Set “table” to #: autofmt-table.qza
Expand the additional options section
1. For “metadata”:
  - Press the + Insert metadata button to set up the next steps.
    1. Leave as Metadata from TSV
    2. Set “Metadata Source” to sample-metadata.tsv
2. Set “where” to DayRelativeToNearestHCT BETWEEN -10 AND 70
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 feature-table filter-samples [...] : filtered_table.qza`	`filtered-table-1.qza`
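In SQLite syntax, `BETWEEN` includes both endpoints, so samples collected exactly on day -10 or day 70 are kept, and rows with no value for the column do not match. The sketch below checks this on toy data with Python’s built-in `sqlite3` module; the sample IDs and table layout are hypothetical, and this illustrates the expression only, not QIIME 2’s internals.

```python
import sqlite3

# Toy metadata: (hypothetical sample ID, DayRelativeToNearestHCT).
rows = [
    ("s1", -11),
    ("s2", -10),
    ("s3", 0),
    ("s4", 70),
    ("s5", 71),
    ("s6", None),  # no day recorded
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE metadata (sample_id TEXT, DayRelativeToNearestHCT REAL)"
)
con.executemany("INSERT INTO metadata VALUES (?, ?)", rows)

# Same expression as the `where` parameter used in the filtering step.
kept = [
    r[0]
    for r in con.execute(
        "SELECT sample_id FROM metadata "
        "WHERE DayRelativeToNearestHCT BETWEEN -10 AND 70 "
        "ORDER BY sample_id"
    )
]
con.close()
print(kept)
```

Here the samples at days -10, 0, and 70 are retained, while those at days -11 and 71 and the sample with no recorded day are dropped.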


Finally, we’ll remove features from the feature table if they don’t occur in at least two samples. We apply this filter primarily to reduce the runtime of some of the downstream steps in this tutorial; it isn’t required in your own analyses.

filtered_table_2, = feature_table_actions.filter_features(
    table=filtered_table_1,
    min_samples=2,
)

action_results <- feature_table_actions$filter_features(
    table=filtered_table_1,
    min_samples=2L,
)
filtered_table_2 <- action_results$filtered_table

qiime feature-table filter-features \
  --i-table filtered-table-1.qza \
  --p-min-samples 2 \
  --o-filtered-table filtered-table-2.qza

filtered_table_2, = use.action(
    use.UsageAction(plugin_id='feature_table', action_id='filter_features'),
    use.UsageInputs(table=filtered_table_1, min_samples=2),
    use.UsageOutputNames(filtered_table='filtered_table_2')
)

Using the qiime2 feature-table filter-features tool:

Set “table” to #: filtered-table-1.qza
Expand the additional options section
- Set “min_samples” to 2
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 feature-table filter-features [...] : filtered_table.qza`	`filtered-table-2.qza`
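The “at least two samples” rule can be illustrated on a toy table in plain Python. This is a sketch of the rule as described above, assuming that a feature “occurs” in a sample when its count is greater than zero; it is not QIIME 2’s implementation, and the feature IDs, sample IDs, and helper name are hypothetical.

```python
# Toy feature table: feature ID -> {sample ID: count}. All IDs are hypothetical.
table = {
    "asv1": {"s1": 10, "s2": 3, "s3": 0},  # present in 2 samples
    "asv2": {"s1": 0, "s2": 7, "s3": 0},   # present in 1 sample
    "asv3": {"s1": 1, "s2": 1, "s3": 1},   # present in 3 samples
}


def filter_features_min_samples(table, min_samples):
    """Keep features with a nonzero count in at least `min_samples` samples."""
    kept = {}
    for feature_id, counts in table.items():
        n_present = sum(1 for count in counts.values() if count > 0)
        if n_present >= min_samples:
            kept[feature_id] = counts
    return kept


filtered = filter_features_min_samples(table, min_samples=2)
print(sorted(filtered))
```

Only the feature observed in a single sample is removed. Note that the rule counts samples in which a feature is present, not the feature’s total abundance.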


Exercise 3

Generate a summary of this latest filtered feature table on your own (expand this box for help if necessary). How many samples and features are in this feature table?

Solution to Exercise 3

filtered_table_2_summ_viz, = feature_table_actions.summarize(
    table=filtered_table_2,
    sample_metadata=sample_metadata_md,
)

action_results <- feature_table_actions$summarize(
    table=filtered_table_2,
    sample_metadata=sample_metadata_md,
)
filtered_table_2_summ_viz <- action_results$visualization

qiime feature-table summarize \
  --i-table filtered-table-2.qza \
  --m-sample-metadata-file sample-metadata.tsv \
  --o-visualization filtered-table-2-summ.qzv

use.action(
    use.UsageAction(plugin_id='feature_table', action_id='summarize'),
    use.UsageInputs(table=filtered_table_2, sample_metadata=sample_metadata),
    use.UsageOutputNames(visualization='filtered_table_2_summ'),
)

Using the qiime2 feature-table summarize tool:

Set “table” to #: filtered-table-2.qza
Expand the additional options section
- For “sample_metadata”:
  - Press the + Insert sample_metadata button to set up the next steps.
    1. Leave as Metadata from TSV
    2. Set “Metadata Source” to sample-metadata.tsv
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 feature-table summarize [...] : visualization.qzv`	`filtered-table-2-summ.qzv`


The final feature table resulting from this series of steps contains 406 samples and 2,458 features. Approximately 23,000,000 sequences are represented in this feature table.

Filter features from sequence data to reduce runtime of feature annotation

At this point, we have filtered features from our feature table, but those features are still present in our sequence data. In the next section we’ll perform some computationally expensive operations on these sequences, so to speed those up we’ll now remove from our collection of feature sequences every feature that is no longer in our feature table.

filtered_sequences_1, = feature_table_actions.filter_seqs(
    data=rep_seqs,
    table=filtered_table_2,
)

action_results <- feature_table_actions$filter_seqs(
    data=rep_seqs,
    table=filtered_table_2,
)
filtered_sequences_1 <- action_results$filtered_data

qiime feature-table filter-seqs \
  --i-data rep-seqs.qza \
  --i-table filtered-table-2.qza \
  --o-filtered-data filtered-sequences-1.qza

filtered_sequences_1, = use.action(
    use.UsageAction(plugin_id='feature_table', action_id='filter_seqs'),
    use.UsageInputs(data=feature_sequences, table=filtered_table_2),
    use.UsageOutputNames(filtered_data='filtered_sequences_1')
)

Using the qiime2 feature-table filter-seqs tool:

Set “data” to #: rep-seqs.qza
Expand the additional options section
- Set “table” to #: filtered-table-2.qza
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	“Name” to set (be sure to press `Save`)
`#: qiime2 feature-table filter-seqs [...] : filtered_data.qza`	`filtered-sequences-1.qza`
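Conceptually, this step keeps only the sequences whose feature IDs still appear in the filtered table. The following plain-Python sketch shows that idea on toy data; the IDs, sequences, and helper name are hypothetical, and real artifacts would be handled by QIIME 2 rather than by dictionaries like these.

```python
# Toy feature sequences: feature ID -> sequence (hypothetical values).
sequences = {
    "asv1": "ACGTACGT",
    "asv2": "GGCCGGCC",
    "asv3": "TTAATTAA",
}

# Feature IDs that remain in the (toy) filtered feature table.
table_feature_ids = {"asv1", "asv3"}


def filter_seqs_by_table(sequences, feature_ids):
    """Keep only sequences whose feature ID is present in `feature_ids`."""
    return {
        feature_id: seq
        for feature_id, seq in sequences.items()
        if feature_id in feature_ids
    }


filtered_sequences = filter_seqs_by_table(sequences, table_feature_ids)
print(sorted(filtered_sequences))
```

After filtering, the set of sequence IDs matches the set of feature IDs in the table, so later steps do no work on features that have already been discarded.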

