Create Sample Metadata
import tempfile

import pandas as pd
import qiime2
import requests

# Download the sample table and parse it while the temporary file still exists.
data = requests.get("https://www.dropbox.com/s/aojvmbuxp5jst1q/tblASVsamples.csv?dl=1")
with tempfile.NamedTemporaryFile() as f:
    f.write(data.content)
    f.flush()
    pd_metadata_samples = pd.read_csv(f.name, index_col='SampleID')
pd_metadata_samples
SampleID | PatientID | Timepoint | Consistency | Accession | BioProject | DayRelativeToNearestHCT | AccessionShotgun |
---|---|---|---|---|---|---|---|
1000A | 1000 | 0 | formed | SRR11414397 | PRJNA545312 | -9.0 | NaN |
1000B | 1000 | 5 | liquid | SRR11414992 | PRJNA545312 | -4.0 | NaN |
1000C | 1000 | 15 | liquid | SRR11414991 | PRJNA545312 | 6.0 | NaN |
1000D | 1000 | 18 | semi-formed | SRR11414990 | PRJNA545312 | 9.0 | NaN |
1000E | 1000 | 22 | formed | SRR11414989 | PRJNA545312 | 13.0 | NaN |
... | ... | ... | ... | ... | ... | ... | ... |
FMT.0251G | FMT.0251 | 8 | semi-formed | SRR9270380 | PRJNA548153 | 7.0 | NaN |
FMT.0251H | FMT.0251 | 7 | semi-formed | SRR11396690 | PRJNA607574 | 6.0 | NaN |
FMT.0251I | FMT.0251 | 12 | semi-formed | SRR9270379 | PRJNA548153 | 11.0 | NaN |
FMT.0251J | FMT.0251 | 13 | semi-formed | SRR9270382 | PRJNA548153 | 12.0 | NaN |
FMT.0251L | FMT.0251 | 15 | semi-formed | SRR9270381 | PRJNA548153 | 14.0 | NaN |
12546 rows × 7 columns
q2_metadata = qiime2.Metadata(pd_metadata_samples)
q2_metadata
Metadata
--------
12546 IDs x 7 columns
PatientID: ColumnProperties(type='categorical')
Timepoint: ColumnProperties(type='numeric')
Consistency: ColumnProperties(type='categorical')
Accession: ColumnProperties(type='categorical')
BioProject: ColumnProperties(type='categorical')
DayRelativeToNearestHCT: ColumnProperties(type='numeric')
AccessionShotgun: ColumnProperties(type='categorical')
Call to_dataframe() for a tabular representation.
q2_metadata.save('sample_metadata_simple.tsv')
'tutorial_out/sample_metadata_simple.tsv'
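The download-and-parse pattern above recurs for every table on this page. As a minimal sketch, the parsing step could be wrapped in a small helper that reads the downloaded bytes from memory instead of a temporary file. The helper name `csv_bytes_to_frame` and the toy payload are hypothetical and not part of the tutorial.

```python
import io

import pandas as pd


def csv_bytes_to_frame(content: bytes, index_col: str, **read_csv_kwargs) -> pd.DataFrame:
    """Parse raw CSV bytes into a DataFrame indexed by `index_col`.

    Hypothetical helper: extra keyword arguments (for example low_memory=False)
    are passed straight through to pandas.read_csv.
    """
    return pd.read_csv(io.BytesIO(content), index_col=index_col, **read_csv_kwargs)


# Toy payload standing in for `data.content` from a requests response.
toy_content = b"SampleID,PatientID,Timepoint\nP1A,P1,0\nP1B,P1,5\n"
toy_frame = csv_bytes_to_frame(toy_content, index_col="SampleID")
```

With a real download, the call would look like `csv_bytes_to_frame(data.content, index_col='SampleID')`.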
Patient Metadata
These metadata describe patient-level events that are not tied to individual samples. Take care to identify how each event relates to a sample and to encode it consistently in a sample-wise fashion.
Additional descriptions of the columns are provided below each table.
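One simple form of sample-wise encoding is to copy a patient-level attribute onto every sample from that patient. The sketch below does this with a left join on `PatientID`. The toy IDs and values are invented, and it assumes one row per patient in the patient-level table; tables with several rows per patient need an explicit rule for choosing or aggregating rows first.

```python
import pandas as pd

# Hypothetical toy tables; IDs and values are invented for illustration.
samples = pd.DataFrame(
    {"PatientID": ["P1", "P1", "P2"], "Timepoint": [0, 5, 3]},
    index=pd.Index(["P1A", "P1B", "P2A"], name="SampleID"),
)
patients = pd.DataFrame(
    {"HCTSource": ["cord", "PBSC_unmodified"]},
    index=pd.Index(["P1", "P2"], name="PatientID"),
)

# Left join: every sample is kept, and each one inherits its patient's attribute.
sample_wise = samples.join(patients, on="PatientID", how="left")
```

The result keeps `SampleID` as its index, so it has the same shape as the sample metadata built earlier.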
data = requests.get("https://www.dropbox.com/s/yxv2x00z9fi0w2l/tblInfectionsCidPapers.csv?dl=1")
with tempfile.NamedTemporaryFile() as f:
    f.write(data.content)
    f.flush()
    pd_metadata_infections = pd.read_csv(f.name, index_col='PatientID')
pd_metadata_infections
PatientID | Timepoint | InfectiousAgent | DayRelativeToNearestHCT |
---|---|---|---|
1000 | 213 | Enterococcus_Faecium | 204.0 |
1003 | 1046 | Enterococcus_Faecium_Vancomycin_Resistant | 1049.0 |
1010 | -58 | Escherichia | -61.0 |
1015 | 19 | Enterococcus_Faecium_Vancomycin_Resistant | 16.0 |
1015 | 20 | Enterococcus_Faecium_Vancomycin_Resistant | 17.0 |
... | ... | ... | ... |
pt_with_samples_2019_760 | -1 | Klebsiella_Pneumoniae | -58.0 |
pt_with_samples_2019_760 | 5 | Enterococcus_Faecium | -52.0 |
pt_with_samples_2021_897 | 0 | Escherichia | -134.0 |
pt_with_samples_2021_897 | 0 | Klebsiella_Pneumoniae | -134.0 |
pt_with_samples_642_643 | 43 | Klebsiella_Pneumoniae | 83.0 |
1231 rows × 3 columns
Days of positive blood cultures for 426 patients. Only microbes analyzed in the following studies are included: (1) Taur, Y., et al. 2012. Intestinal domination and the risk of bacteremia in patients undergoing allogeneic hematopoietic stem cell transplantation. Clinical Infectious Diseases, 55(7), pp.905-914; (2) Stoma, I., et al. 2020. Compositional flux within the intestinal microbiota and risk for bloodstream infection with gram-negative bacteria. Clinical Infectious Diseases.
PatientID: deidentified identifier of patients
Timepoint: deidentified day of infection
InfectiousAgent: the bacteria causing infections
DayRelativeToNearestHCT: day of infection relative to the nearest day of bone marrow transplant
data = requests.get("https://www.dropbox.com/s/nfb1h7kkkx8sqp1/tbltemperature.csv?dl=1")
with tempfile.NamedTemporaryFile() as f:
    f.write(data.content)
    f.flush()
    pd_metadata_temps = pd.read_csv(f.name, index_col='PatientID', low_memory=False)
pd_metadata_temps
PatientID | Timepoint | MaxTemperature | DayRelativeToNearestHCT |
---|---|---|---|
1000 | -462 | 98.4 | -471.0 |
1000 | -427 | 98.4 | -436.0 |
1000 | -399 | 98.0 | -408.0 |
1000 | -371 | 98.0 | -380.0 |
1000 | -343 | 98.0 | -352.0 |
... | ... | ... | ... |
pt_with_samples_833_883 | 1534 | 98.2 | 1331.0 |
pt_with_samples_833_883 | 1548 | 99.0 | 1345.0 |
pt_with_samples_833_883 | 1576 | 97.9 | 1373.0 |
pt_with_samples_833_883 | 1604 | 99.0 | 1401.0 |
pt_with_samples_833_883 | 1632 | 97.9 | 1429.0 |
202579 rows × 3 columns
Daily maximum temperatures for 1,249 patients
PatientID: deidentified identifier of patients
Timepoint: deidentified day when patient temperature was measured
MaxTemperature: Maximum temperature (unit: Fahrenheit) recorded on that day for that patient
DayRelativeToNearestHCT: day of temperature measurement relative to the nearest day of bone marrow transplant
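As a sketch of how these readings might be attached to samples, the example below converts `MaxTemperature` to Celsius and matches each sample to the reading taken on the same deidentified day. The toy rows, the new column name `MaxTemperatureC`, and the same-day matching rule are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy rows; IDs and values are invented for illustration.
temps = pd.DataFrame(
    {
        "PatientID": ["P1", "P1"],
        "Timepoint": [1, 2],
        "MaxTemperature": [98.6, 100.4],  # Fahrenheit, as in the table above
    }
)
samples = pd.DataFrame(
    {
        "SampleID": ["P1A", "P1B"],
        "PatientID": ["P1", "P1"],
        "Timepoint": [2, 3],
    }
)

# Standard Fahrenheit-to-Celsius conversion.
temps["MaxTemperatureC"] = (temps["MaxTemperature"] - 32) * 5 / 9

# Same-day match on (PatientID, Timepoint); samples without a reading get NaN.
sample_temps = samples.merge(
    temps, on=["PatientID", "Timepoint"], how="left"
).set_index("SampleID")
```

The real table is indexed by `PatientID`, so it would need `reset_index()` before a merge like this one.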
data = requests.get("https://www.dropbox.com/s/j277dv6lrqz7hfv/tblVanA.csv?dl=1")
with tempfile.NamedTemporaryFile() as f:
    f.write(data.content)
    f.flush()
    pd_metadata_van_a = pd.read_csv(f.name, index_col='SampleID')
pd_metadata_van_a
SampleID | VanA |
---|---|
1015P | 0 |
1015Q | 0 |
1015T | 0 |
1015U | 1 |
1015V | 1 |
... | ... |
FMT.0251G | 0 |
FMT.0251H | 0 |
FMT.0251I | 0 |
FMT.0251J | 0 |
FMT.0251L | 0 |
7547 rows × 1 columns
Results of PCR detection for vanA gene for 7,547 samples
SampleID: stool sample identifier
VanA: whether vanA gene is detected in the sample
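This table is already keyed by `SampleID`, so it can be joined directly onto sample metadata. It covers fewer samples than the sample table (7,547 versus 12,546 rows), so a left join leaves untested samples as missing rather than as 0. The sketch below uses invented toy IDs, and the label strings and the `VanAStatus` column name are hypothetical.

```python
import pandas as pd

# Hypothetical toy tables; IDs and values are invented for illustration.
samples = pd.DataFrame(
    {"PatientID": ["P1", "P1", "P2"]},
    index=pd.Index(["P1A", "P1B", "P2A"], name="SampleID"),
)
van_a = pd.DataFrame(
    {"VanA": [0, 1]},
    index=pd.Index(["P1A", "P1B"], name="SampleID"),
)

# Left join keeps all samples; untested samples get NaN (VanA becomes float).
with_van_a = samples.join(van_a, how="left")

# Optional readable labels (hypothetical names); missing values stay missing.
with_van_a["VanAStatus"] = with_van_a["VanA"].map(
    {0.0: "not_detected", 1.0: "detected"}
)
```

Keeping "not tested" separate from "not detected" avoids treating missing PCR results as negatives.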
data = requests.get("https://www.dropbox.com/s/066lxgvx16wsmqf/tbldrug.csv?dl=1")
with tempfile.NamedTemporaryFile() as f:
    f.write(data.content)
    f.flush()
    pd_metadata_drug = pd.read_csv(f.name, index_col='PatientID', low_memory=False)
pd_metadata_drug
PatientID | StartTimepoint | StopTimepoint | Factor | Category | Route | StartDayRelativeToNearestHCT | StopDayRelativeToNearestHCT |
---|---|---|---|---|---|---|---|
1000 | -160 | -160 | ciprofloxacin | quinolones | intravenous | -169 | -169 |
1000 | -160 | -160 | fluconazole | antifungals | intravenous | -169 | -169 |
1000 | -151 | -151 | aztreonam | miscellaneous antibiotics | intravenous | -160 | -160 |
1000 | -151 | -151 | vancomycin | glycopeptide antibiotics | intravenous | -160 | -160 |
1000 | -150 | -150 | aztreonam | miscellaneous antibiotics | intravenous | -159 | -159 |
... | ... | ... | ... | ... | ... | ... | ... |
pt_with_samples_833_883 | 1336 | 1339 | azithromycin | macrolide derivatives | intravenous | 1133 | 1136 |
pt_with_samples_833_883 | 1336 | 1342 | posaconazole | antifungals | oral | 1133 | 1139 |
pt_with_samples_833_883 | 1337 | 1337 | vancomycin | glycopeptide antibiotics | intravenous | 1134 | 1134 |
pt_with_samples_833_883 | 1338 | 1338 | vancomycin | glycopeptide antibiotics | intravenous | 1135 | 1135 |
pt_with_samples_833_883 | 1339 | 1340 | vancomycin | glycopeptide antibiotics | intravenous | 1136 | 1137 |
80731 rows × 7 columns
Timing and route of drug administration for 1,279 patients
PatientID: deidentified identifier of patients
StartTimepoint: deidentified day when drug administration started
StopTimepoint: deidentified day when drug administration stopped (including the day)
Factor: name of the drug
Category: category of the drug
Route: route of drug administration
StartDayRelativeToNearestHCT/StopDayRelativeToNearestHCT: start/stop day of the drug administration relative to the nearest day of bone marrow transplant
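Because `StopTimepoint` is inclusive, a sample counts as taken during an administration when its `Timepoint` lies in the closed interval from `StartTimepoint` to `StopTimepoint`. The sketch below applies that rule to invented toy rows and collapses the matches into one value per sample. The output column name `DrugsOnSampleDay` and the comma-joined encoding are assumptions, not the tutorial's method.

```python
import pandas as pd

# Hypothetical toy tables; IDs and values are invented for illustration.
samples = pd.DataFrame(
    {
        "SampleID": ["P1A", "P1B", "P2A"],
        "PatientID": ["P1", "P1", "P2"],
        "Timepoint": [2, 9, 4],
    }
)
drugs = pd.DataFrame(
    {
        "PatientID": ["P1", "P1"],
        "StartTimepoint": [1, 9],
        "StopTimepoint": [3, 9],
        "Factor": ["vancomycin", "ciprofloxacin"],
    }
)

# Pair every sample with every administration for the same patient.
pairs = samples.merge(drugs, on="PatientID", how="left")

# Closed interval: the stop day itself counts as a day on the drug.
on_drug = pairs["Timepoint"].between(pairs["StartTimepoint"], pairs["StopTimepoint"])

# Collapse to one comma-separated string of drug names per sample.
exposed = (
    pairs[on_drug]
    .groupby("SampleID")["Factor"]
    .agg(lambda names: ",".join(sorted(set(names))))
    .rename("DrugsOnSampleDay")
)
sample_drugs = samples.set_index("SampleID").join(exposed)
```

The pairwise merge grows with the number of administrations per patient, so on the full 80,731-row table it may be worth filtering to the drugs of interest first.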
data = requests.get("https://www.dropbox.com/s/ksee4q7x1c1oq99/tblhctmeta.csv?dl=1")
with tempfile.NamedTemporaryFile() as f:
    f.write(data.content)
    f.flush()
    pd_metadata_transplant = pd.read_csv(f.name, index_col='PatientID')
pd_metadata_transplant
PatientID | TimepointOfTransplant | HCTSource | Disease | wbcPatientId | autoFmtPatientId | nejmPatientId |
---|---|---|---|---|---|---|
FMT.0161 | 30 | TCD | Multiple Myeloma | 000f9b9617d476abf1f143 | NaN | 1 |
667 | 8 | PBSC_unmodified | Leukemia | 001f938eeec58c18a4604a | 491 | 1 |
1277 | 5 | PBSC_unmodified | Leukemia | 0079d6c0a49b8b6c3daf83 | 748 | 1 |
464 | 8 | BM_unmodified | Leukemia | 00a7221374f597b954d09f | 342 | 1 |
420 | 6 | cord | Non-Hodgkin's Lymphoma | 00d7a5d77e1a5f7a9d3f3c | 218 | 1 |
... | ... | ... | ... | ... | ... | ... |
pt_with_samples_1105_1106_1107_1108 | 6 | cord | Leukemia | ff493eac17bd83d1a2c57c | NaN | 1 |
1759 | 8 | cord | Leukemia | ff650b2e1faab2a3fbcfc9 | NaN | 0 |
559 | -5 | TCD | Leukemia | ffcf59d52566ac7729521f | 458 | 1 |
140 | -236 | PBSC_unmodified | Hodgkin's Disease | ffe9adf3d8b0ab843ff29e | NaN | 1 |
1965 | 3 | cord | Leukemia | ffe9f50b4d9ae12e3ae671 | NaN | 0 |
1346 rows × 6 columns
Day and source of hematopoietic cell transplant (HCT) for 1,278 patients
PatientID: deidentified identifier of patients
TimepointOfTransplant: deidentified day of HCT
HCTSource: hematopoietic cell sources for HCT patients (BM_unmodified: bone marrow; PBSC_unmodified: peripheral blood stem cells; TCD: T-cell depleted; cord: cord blood)
Disease: disease of patients
wbcPatientId, autoFmtPatientId, nejmPatientId: identifiers for the same patients if they were also included in another previous study (wbcPatientId: Schluter, J. et al. 2019. The gut microbiota influences circulatory immune cell dynamics in humans. BioRxiv; autoFmtPatientId: Taur, Y. et al. 2018. Reconstitution of the gut microbiota of antibiotic-treated patients by autologous fecal microbiota transplant. Science Translational Medicine 10(460); nejmPatientId: Peled, J.U. et al. 2020. Microbiota as Predictor of Mortality in Allogeneic Hematopoietic-Cell Transplantation. New England Journal of Medicine, 382(9), pp.822-834.)
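The `DayRelativeToNearestHCT` columns in the other tables appear to be the event's `Timepoint` minus the `TimepointOfTransplant` of the closest transplant. Under that assumption, the sketch below recomputes the offset for invented toy rows in which one patient has two transplants. The helper columns `Offset` and `AbsOffset` are hypothetical, and ties are broken arbitrarily by taking the first minimum.

```python
import pandas as pd

# Hypothetical toy tables; IDs and values are invented for illustration.
samples = pd.DataFrame(
    {
        "SampleID": ["P1A", "P1B"],
        "PatientID": ["P1", "P1"],
        "Timepoint": [0, 120],
    }
)
transplants = pd.DataFrame(
    {"PatientID": ["P1", "P1"], "TimepointOfTransplant": [9, 100]}
)

# Pair each sample with every transplant of the same patient.
pairs = samples.merge(transplants, on="PatientID")
pairs["Offset"] = pairs["Timepoint"] - pairs["TimepointOfTransplant"]
pairs["AbsOffset"] = pairs["Offset"].abs()

# Keep, per sample, the row with the smallest absolute offset.
nearest_rows = pairs.loc[pairs.groupby("SampleID")["AbsOffset"].idxmin()]
relative_day = nearest_rows.set_index("SampleID")["Offset"]
```

Here sample `P1A` gets -9 (nine days before the first transplant) and `P1B` gets 20 (twenty days after the second).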
data = requests.get("https://www.dropbox.com/s/wo5c6i4kp79nob8/tblwbc.csv?dl=1")
with tempfile.NamedTemporaryFile() as f:
    f.write(data.content)
    f.flush()
    pd_metadata_wbc = pd.read_csv(f.name, index_col='PatientID', low_memory=False)
pd_metadata_wbc
PatientID | Timepoint | BloodCellType | Value | DayRelativeToNearestHCT |
---|---|---|---|---|
1001 | 7 | WBCtotal | <0.1 | 4 |
1001 | 8 | WBCtotal | <0.1 | 5 |
1001 | 9 | WBCtotal | <0.1 | 6 |
1001 | 11 | WBCtotal | 0 | 8 |
1001 | 12 | WBCtotal | <0.1 | 9 |
... | ... | ... | ... | ... |
pt_with_samples_1933_1993 | 15 | Lymphocytes | NaN | 7 |
pt_with_samples_1933_1993 | 16 | Lymphocytes | NaN | 8 |
pt_with_samples_1933_1993 | 17 | Lymphocytes | NaN | 9 |
2070 | 9 | Lymphocytes | NaN | 10 |
2070 | 10 | Lymphocytes | NaN | 11 |
220835 rows × 4 columns
PatientID: deidentified ID of patients
Timepoint: deidentified day of blood cell measurement
BloodCellType: lymphocyte cells (Lymphocytes), neutrophil cells (Neutrophils), and total white blood cells (WBCtotal)
Value: blood cell counts (unit: 1,000 cells/uL)
DayRelativeToNearestHCT: day of blood cell measurement relative to the nearest day of bone marrow transplant
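The `Value` column mixes plain numbers with censored entries such as `<0.1`, so pandas reads it as text. One possible approach is to record the censoring in a separate flag and keep the stated bound as the number. The column names `BelowLimit` and `ValueNumeric` are hypothetical, and treating `<0.1` as 0.1 is an assumption that may not suit every analysis.

```python
import pandas as pd

# Hypothetical toy rows mirroring the kinds of entries seen in the Value column.
wbc = pd.DataFrame({"Value": ["<0.1", "0", "2.5", None]})

raw = wbc["Value"]

# Flag entries reported as below a limit; missing values are not flagged.
wbc["BelowLimit"] = raw.str.startswith("<", na=False)

# Strip the "<" and parse; anything unparseable or missing becomes NaN.
wbc["ValueNumeric"] = pd.to_numeric(raw.str.lstrip("<"), errors="coerce")
```

Keeping the flag makes it possible to later substitute a different value for censored counts, such as half the limit, without re-reading the raw table.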