Following the recent update from INSDC on the Phases of Implementation for the new INSDC standards being introduced for spatiotemporal metadata, INSDC now releases the formatting details for reporting missing metadata.
Mandatory spatiotemporal data will be captured in pre-existing fields. For sequence flat files, the data will be captured in the source qualifiers: ‘country’ and ‘collection_date’; for BioSamples the data will be captured in country and collection date attributes. BioSample fields, implementation and tooling may differ between partners and the INSDC partners may follow this announcement with individual statements about implementation.
Minimum reporting requirements for these fields are as follows, though further granularity is encouraged:
Location of collection: the locality of isolation of the sequenced sample should be indicated to country level at least and should be provided in terms of political names for nations, oceans or seas using values from the controlled vocabulary at http://www.insdc.org/documents/country-qualifier-vocabulary
Date/time of collection: the date and time at which the specimen was collected should be provided, at least to the nearest year.
Where these cannot be provided, please see the updated guidelines for reporting INSDC missing values, which are now usable for sample registration.
Below is a reminder of the timeline of phases in which the new standards will be put into effect:
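As an illustration only, the minimum reporting requirements above could be pre-checked before submission with something like the following Python sketch. It is not an official INSDC validator. The function names, the example vocabulary excerpt, and the `missing_terms` parameter are hypothetical; the authoritative country list is the controlled vocabulary linked above, and the accepted missing value terms are those in the INSDC missing values guidelines. The sketch assumes that a country value may carry a region after a colon (for example "Canada: Vancouver") and treats any standalone four-digit number in the collection date as the year.

```python
import re

# Hypothetical excerpt for illustration; the authoritative list is the
# INSDC country qualifier controlled vocabulary.
EXAMPLE_COUNTRIES = {"Canada", "Japan", "Atlantic Ocean"}

# Assumption: a standalone four-digit number in the date is the year.
YEAR_PATTERN = re.compile(r"(?<!\d)\d{4}(?!\d)")


def country_level(value):
    """Return the part of a country value before any ':' region suffix."""
    return value.split(":", 1)[0].strip()


def meets_minimum(country, collection_date, vocabulary, missing_terms=frozenset()):
    """Check both fields against the minimum reporting requirements.

    A field passes if it is one of the caller-supplied missing value terms,
    or if it is resolved to country level / to at least the nearest year.
    """
    country_ok = country in missing_terms or country_level(country) in vocabulary
    date_ok = collection_date in missing_terms or bool(
        YEAR_PATTERN.search(collection_date)
    )
    return country_ok and date_ok


if __name__ == "__main__":
    print(meets_minimum("Canada: Vancouver", "2021", EXAMPLE_COUNTRIES))
    print(meets_minimum("Vancouver", "2021", EXAMPLE_COUNTRIES))
```

Passing the vocabulary and the missing value terms in as arguments keeps the sketch independent of any particular partner's implementation, which may differ as noted above.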
Phase I – new standard in place for BioSamples by the end of May 2023
It will become mandatory to provide country and collection date metadata for all newly registered BioSamples associated with INSDC data following this date unless a valid exemption is declared. As a result, all new raw (SRA/ENA/DRA) data and genomes will have associated spatiotemporal metadata in the BioSample.
Phase II – new standard in place for sequences by the end of Dec 2024
It will become mandatory to provide country and collection date metadata for all newly submitted sequence records through any remaining submission routes within 2 years. This includes sequences submitted without BioSample references.
We continue to thank users for their feedback and encourage further feedback. Please provide your feedback to the INSDC member database to which you normally submit:
DDBJ: please email firstname.lastname@example.org
ENA (EMBL-EBI): please email email@example.com
GenBank and SRA (NCBI): please email firstname.lastname@example.org