Sequence Manifest Data Dictionary

Attachment-03_Component-1_ref-CDC_seq_manifest_data_dict.xlsx

[NCEZID] National Wastewater Surveillance System for COVID-19 and other Infectious Disease Targets of Public Health Concern


OMB: 0920-1422


Overview

seq_metadata
seq_value_sets


Sheet 1: seq_metadata

Field Name | Data Type | Description | Value Set | Units | Dependent Fields
wwtp_name | string | The name of the wastewater treatment plant (WWTP), or the name of the septic or other wastewater treatment system, where the sample was collected. The WWTP name must match the name used in the testing data. | [string, length less than or equal to 40 characters] | [none] | None
sample_id | unique sample id (a string 20 characters or less, containing only numbers, English alphabetic characters, underscores, and hyphens; white space is not allowed; case insensitive) | A unique ID assigned to a wastewater sample selected for sequencing. It must be unique for this NWSS reporting jurisdiction. Wastewater samples that are split between multiple labs should have the same sample ID but different lab IDs. Wastewater samples for which multiple SARS-CoV-2 PCR targets are measured should also have the same sample ID. Note: for pooled samples, a new sample_id will be created and the original sample_ids that constitute the pool will be listed in samples_in_pool. | [sample id] | [none] | None
pooled | category | Was this sample pooled before sequencing? | vs_yn | [none] | None
samples_in_pool | string (comma-separated list) | If sample is pooled, comma-separated list of the sample_ids that were pooled; if not pooled write NA. Pooled samples must be from the same WWTP and sampled within 7 days of each other. | [string]; [empty] | [none] | If 'pooled' is "yes", then this must have a non-empty value
sample_collect_date | date ([yyyy]-[mm]-[dd]) or comma-separated list of dates | The date of sample collection; for composite samples, specify the date on which sample collection began. For pooled samples, a comma-separated list of sample collection dates for all constituent samples (corresponding to the order of samples_in_pool) | [date not after tomorrow's date] (or list thereof) | [none] | None
received_by_lab_date | date ([yyyy]-[mm]-[dd]) or comma-separated list of dates | The date the sample arrived in Biobot's laboratory. For pooled samples, a comma-separated list of arrival dates for all constituent samples (corresponding to the order of samples_in_pool) | [date not after tomorrow's date] (or list thereof) | [none] | None
selected_for_sequencing | category | Was this sample selected for sequencing (Yes or No)? | vs_ynp | [none] | None
reason_not_sequenced | category | If sample was not sequenced, succinctly indicate why. | vs_reason_not_sequenced | [none] | If 'selected_for_sequencing' is "no", then this must have a non-empty value
date_sent_seq | date ([yyyy]-[mm]-[dd]) | Date sample was sent to Biobot's sequencing vendor | [date not after tomorrow's date]; [empty] | [none] | None
seq_run_type | category | Choose one of the following: standard sequencing, re-run due to low coverage, method validation | vs_seq_run_type | [none] | If 'date_sent_seq' has a non-empty value, then this must have a non-empty value
major_seq_method | integer | A number used to distinguish major sequencing methods | [greater than or equal to 0]; [empty] | [none] | If 'date_sent_seq' has a non-empty value, then this must have a non-empty value
major_seq_method_desc | string | Description of sequencing method | [string]; [empty] | [none] | If 'date_sent_seq' has a non-empty value, then this must have a non-empty value
genome_coverage | float | % of SARS-CoV-2 genome covered at 10x or more | [0 to 100]; [empty] | [none] | If 'selected_for_deposition' has a non-empty value, then this must also have a non-empty value
total_raw_reads | integer | Number of total sequencing reads | [0 or greater]; [empty] | [none] | If 'selected_for_deposition' has a non-empty value, then this must also have a non-empty value
coverage_above_thresh | category | Did the sequencing meet the minimum QC criteria (Yes or No)? (currently 20% of SARS-CoV-2 genome covered at >= 10x, minimum of 20,000 total raw reads) | vs_yne | [none] | If 'selected_for_deposition' has a non-empty value, then this must also have a non-empty value
selected_for_deposition | category | Was the sequencing run selected for deposition in NCBI? (this will generally be "yes" if 'coverage_above_thresh' is "yes"; if 'coverage_above_thresh' is "no", a run may still be deposited) | vs_yne | [none] | None
date_deposited | date ([yyyy]-[mm]-[dd]) | Date the sample was submitted for deposition to the NCBI repository (or other CDC-specified repository) | [date not after tomorrow's date]; [empty] | [none] | Empty if 'selected_for_deposition' is "no"
date_deposition_accepted | date ([yyyy]-[mm]-[dd]) | Date the deposition was accepted to the NCBI repository and went live | [date not after tomorrow's date]; [empty] | [none] | Empty if 'selected_for_deposition' is "no"
sra_accession | string | Accession number for SRA experiment | [string, length less than or equal to 40 characters]; [empty] | [none] | If 'date_deposition_accepted' is non-empty, then this must also have a non-empty value
biosample_accession | string | Biosample ID from NCBI | [string, length less than or equal to 40 characters]; [empty] | [none] | If 'date_deposition_accepted' is non-empty, then this must also have a non-empty value
seq_vendor | string | Vendor that performed the sequencing | [string]; [empty] | [none] | If 'date_sent_seq' has a non-empty value, then this must have a non-empty value
pcr_target_avg_conc | float | Concentration of the PCR target back-calculated to unconcentrated sample basis (from NWSS). This will be the N1 concentration. | [any positive float other than 0]; 0 (if no amplification observed) | [units specified in 'pcr_target_units' in NWSS data dictionary] | None
sequencing_run_id | integer | Numeric ID to distinguish multiple sequencing runs of the same biological sample | [0 or greater] | [none] | If 'date_sent_seq' has a non-empty value, then this must have a non-empty value
addl_seq_method_notes | string | Additional details on sequencing methodology, as needed (e.g., minor changes too small to be included in 'major_lab_method' or 'major_seq_method') | [string of any length--free text description] | [none] | None
major_lab_method | integer | A number used to distinguish major lab methods at the reporting jurisdiction level. | [greater than or equal to 0] | [none] | None
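
The Dependent Fields rules and the sample_id format in Sheet 1 lend themselves to an automated check before a manifest is submitted. The following is a minimal sketch in Python under stated assumptions, not part of any official NWSS tooling. The names `check_row`, `norm`, `meets_qc`, and `REQUIRED_IF_NONEMPTY` are hypothetical. The sketch assumes each row is a dict of raw string values, that "yes"/"no" comparisons are case insensitive, and that the QC criteria mean at least 20% coverage and at least 20,000 reads. It does not interpret "NA" in samples_in_pool.

```python
import re

# Assumption: sample_id is 1 to 20 characters drawn from letters, digits,
# underscores, and hyphens, as described in the Data Type column.
SAMPLE_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,20}")

# "If <trigger> has a non-empty value, then these must be non-empty."
REQUIRED_IF_NONEMPTY = {
    "date_sent_seq": ["seq_run_type", "major_seq_method",
                      "major_seq_method_desc", "seq_vendor",
                      "sequencing_run_id"],
    "selected_for_deposition": ["genome_coverage", "total_raw_reads",
                                "coverage_above_thresh"],
    "date_deposition_accepted": ["sra_accession", "biosample_accession"],
}


def norm(value):
    """Return the value as a stripped, lower-cased string ('' for None)."""
    return "" if value is None else str(value).strip().lower()


def check_row(row):
    """Return a list of rule violations for one manifest row (a dict)."""
    errors = []
    if not SAMPLE_ID_RE.fullmatch(str(row.get("sample_id") or "")):
        errors.append("sample_id is missing or malformed")
    # Rules triggered by a field being non-empty.
    for trigger, dependents in REQUIRED_IF_NONEMPTY.items():
        if norm(row.get(trigger)):
            for field in dependents:
                if not norm(row.get(field)):
                    errors.append(
                        f"{field} is required when {trigger} is non-empty")
    # Rules triggered by a specific value.
    if norm(row.get("pooled")) == "yes" and not norm(row.get("samples_in_pool")):
        errors.append("samples_in_pool is required when pooled is yes")
    if (norm(row.get("selected_for_sequencing")) == "no"
            and not norm(row.get("reason_not_sequenced"))):
        errors.append(
            "reason_not_sequenced is required when selected_for_sequencing is no")
    if norm(row.get("selected_for_deposition")) == "no":
        for field in ("date_deposited", "date_deposition_accepted"):
            if norm(row.get(field)):
                errors.append(
                    f"{field} must be empty when selected_for_deposition is no")
    return errors


def meets_qc(genome_coverage, total_raw_reads):
    """Assumed reading of the current QC criteria for coverage_above_thresh."""
    return genome_coverage >= 20.0 and total_raw_reads >= 20000
```

A caller would run `check_row` on every row and report the returned messages; an empty list means none of the rules sketched here were violated. Date-range and list-length checks for pooled samples are left out to keep the example short.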

Sheet 2: seq_value_sets

Note that we may add to these value sets as we test additional sequencing protocols, etc.

This includes only new value sets (for fields from NWSS, we use the NWSS value sets).

vs_reason_not_sequenced | Description
ANOTHER_SAMPLE_SELECTED | Another sample from this same location/week was selected for sequencing instead (generally due to higher SARS-CoV-2 concentration)
NOT_DETECTED | SARS-CoV-2 was not detected in this sample
CT_ABOVE_THRESHOLD | SARS-CoV-2 was detected in this sample, but at too high a Ct for successful sequencing
TRIBAL_TERRITORY_OPTOUT | This location is from a tribal nation/territory that has opted out of sequencing
SAMPLE_SELECTION_ERROR | Due to lab error, we were unable to send this sample for sequencing

vs_seq_run_type | Description
STANDARD | standard sequencing of samples, run each week, using ~20 µL of RNA from the same RNA extraction that goes into our qPCR
RERUN | re-run of a sample that had low Ct but insufficient coverage. Generally re-extracted from larger volume of wastewater
METHOD_VALIDATION | method validation run--experimental
[empty] | No sequencing run

vs_ynp | Description
YES | selected for sequencing
NO | not selected for sequencing
POOLED | pooled with another sample before sequencing
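
The three value sets above can be encoded as lookup tables so that category fields are checked against them automatically. This is a hedged sketch only: `VALUE_SETS`, `FIELD_VALUE_SET`, and `in_value_set` are hypothetical names, matching is assumed to be case insensitive, and [empty] is accepted only for vs_seq_run_type because that is the only set that lists it. The vs_yn and vs_yne sets come from the NWSS data dictionary and are not reproduced here.

```python
# Value sets defined in Sheet 2 (seq_value_sets). The empty string stands
# in for the [empty] entry of vs_seq_run_type.
VALUE_SETS = {
    "vs_reason_not_sequenced": {
        "ANOTHER_SAMPLE_SELECTED",
        "NOT_DETECTED",
        "CT_ABOVE_THRESHOLD",
        "TRIBAL_TERRITORY_OPTOUT",
        "SAMPLE_SELECTION_ERROR",
    },
    "vs_seq_run_type": {"STANDARD", "RERUN", "METHOD_VALIDATION", ""},
    "vs_ynp": {"YES", "NO", "POOLED"},
}

# Which value set each Sheet 1 field uses.
FIELD_VALUE_SET = {
    "reason_not_sequenced": "vs_reason_not_sequenced",
    "seq_run_type": "vs_seq_run_type",
    "selected_for_sequencing": "vs_ynp",
}


def in_value_set(field, value):
    """Return True if value is listed in the value set used by field.

    Assumption: comparison ignores case and surrounding white space.
    """
    allowed = VALUE_SETS[FIELD_VALUE_SET[field]]
    normalized = "" if value is None else str(value).strip().upper()
    return normalized in allowed
```

For example, `in_value_set("selected_for_sequencing", "pooled")` evaluates to True under the case-insensitive assumption, while an unlisted run type such as "RESEQUENCE" is rejected. Because the value sets may grow over time, keeping them in one table makes later additions a one-line change.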



