TPA Submission Guidelines

Third Party Annotation (TPA) entries are submitted to the International Nucleotide Sequence Databases as part of the process of publishing biological studies that include the annotation of existing nucleotide sequences in the primary sequence database. Publicly accessible TPA records are therefore linked to a publication or publications that document that the data are supported by peer-reviewed biological evidence.

To distinguish annotation supported by wet-lab experimental evidence from inferred annotation, the TPA dataset is divided into two tiers: TPA:experimental and TPA:inferential.

  • TPA:experimental contains only entries where the annotations presented are supported by peer-reviewed wet-lab experimental evidence.
  • Sequences annotated by inference (where the source molecule or its product(s) have not been the subject of direct experimentation) are accommodated in TPA:inferential.
  • Constructed genomes for which no experimental evidence is presented are placed in TPA:inferential and are permitted to include only annotation relating to genes of known function (as opposed to hypothetical proteins, for example).
  • Entries containing annotation that has not resulted from peer-reviewed in vivo, in vitro or in silico experimentation are not accepted in TPA. The outputs of computational tools, feature identification algorithms and homology search tools alone are not sufficient evidence for TPA.
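The rules above can be read as a small decision procedure. The following Python sketch is illustrative only: the names Tier, Evidence and assign_tier are hypothetical and are not part of any INSDC tool, and the three boolean fields are a simplifying assumption about how a submission's evidence might be summarised. The additional restriction on constructed genomes is not modelled.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    EXPERIMENTAL = "TPA:experimental"
    INFERENTIAL = "TPA:inferential"
    NOT_ACCEPTED = "not accepted"


@dataclass(frozen=True)
class Evidence:
    """Hypothetical summary of the evidence behind an annotation."""
    peer_reviewed: bool         # annotation results from peer-reviewed experimentation
    direct_wet_lab: bool        # wet-lab evidence for this molecule or its product(s)
    automated_tools_only: bool  # nothing beyond the output of computational tools


def assign_tier(ev: Evidence) -> Tier:
    # Annotation that is not peer reviewed, or that rests on tool output
    # alone, is not accepted in TPA.
    if not ev.peer_reviewed or ev.automated_tools_only:
        return Tier.NOT_ACCEPTED
    # Direct, peer-reviewed wet-lab evidence goes to TPA:experimental.
    if ev.direct_wet_lab:
        return Tier.EXPERIMENTAL
    # Otherwise the annotation is inferred and goes to TPA:inferential.
    return Tier.INFERENTIAL


print(assign_tier(Evidence(peer_reviewed=True, direct_wet_lab=False,
                           automated_tools_only=False)).value)
```

In practice the decision depends on the detailed record types listed below, so a sketch like this is at most a first-pass check before submission.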

Below is a list of typical TPA entry types and the tier to which they will be directed. Please note that this list is not exhaustive.

Record Type | TPA Tier | Description
1 | Experimental | CDS and related annotation applied to a sequence derived from existing genomic, EST and/or mRNA primary records with wet laboratory experimental evidence for existence of at least part of the transcript (e.g. RT-PCR, Northern).
2 | Experimental | CDS and related annotation applied to a sequence derived from existing genomic, EST and/or mRNA primary records, in addition to novel sequencing, with wet laboratory experimental evidence for existence of at least part of the transcript (e.g. RT-PCR, Northern).
3 | Experimental | CDS and related annotation applied to a sequence derived from existing genomic, EST and/or mRNA primary records with experimental evidence for the presence of the product (e.g. antibody staining, biochemical assay).
4 | Experimental | CDS and related annotation applied to a sequence derived from existing genomic, EST and/or mRNA primary records, in addition to novel sequencing, with experimental evidence for the presence of the product (e.g. antibody staining, biochemical assay).
5 | Experimental | Re-assignment of the product name and/or function of a coding gene where there is no change to existing annotated exon, mRNA and CDS locations and wet laboratory experimental evidence is presented.
6 | Experimental | Annotation of non-coding transcripts, such as antisense regulators, with wet laboratory experimental evidence for their existence and/or function.
7 | Experimental | Annotation of repeat features in association with transposon, retrotransposon, integron, iteron and insertion sequences with wet laboratory experimental evidence.
8 | Experimental | Annotation of functional RNA genes, such as tRNAs, scRNAs, etc., with wet laboratory experimental evidence.
9 | Experimental | A record submitted as part of a collection of annotated members of a gene family, where wet laboratory experimental evidence exists for the annotation.
10 | Inferential | CDS and related annotation applied to a sequence derived from existing genomic, EST and/or mRNA primary records with reported wet laboratory experimental evidence for a homologous molecule, but no direct wet laboratory experimental evidence. The reported experimental evidence must have been generated by the submission group and must comply with TPA requirements for peer review.
11 | Inferential | CDS and related annotation applied to a sequence derived from existing genomic, EST and/or mRNA primary records, in addition to novel sequencing, with no wet laboratory experimental evidence. If novel sequence is used to bridge two pieces of sequence, experimental evidence for a homologous molecule should exist.
12 | Inferential | Record of sequence and annotation concepts covered in a review paper or discussion section, where wet laboratory experimental evidence is reported, but not generated by the TPA submitter.
13 | Inferential | Annotation of non-coding genes and transcripts with no wet laboratory experimental evidence for their existence and/or function, when submitted as part of a collection of sequences with experimental evidence for at least one member of the collection.
14 | Inferential | Annotation of pseudogenes with no wet laboratory experimental evidence, when submitted as part of a study that includes TPA records of functional homologues of the pseudogene.
15 | Inferential | Annotation of pseudogenes that are not part of a study for which experimental evidence exists.
16 | Inferential | A record submitted as part of a collection of annotated members of a gene family, where wet laboratory experimental evidence does not exist for the annotation. One or more other members of the set should have experimental evidence and should have been submitted to TPA:experimental or to the INSDC primary database.
17 | Inferential | A record representing a completely sequenced genome, or completely sequenced naturally occurring extrachromosomal element, comprising features, most of which have assigned gene symbols or product identifiers, where the annotated features may be a mix of experimentally and inferentially determined data.

Below is a list of entry types that are not suitable for inclusion in the TPA dataset. Please note that this list is not exhaustive.

Record Type | TPA Tier | Description
A | Not accepted | Annotation of repeat (and no other) features.
B | Not accepted | Annotation that has arisen from an automated tool, such as GeneMark, tRNA scan or ORF finder, where no further evidence, experimental or otherwise, is presented for the annotation. The annotation in these cases has not been the subject of the peer review of the publication.
C | Not accepted | A record representing a completely sequenced genome including only features that have not been assigned gene symbols or product identifiers, none of which has wet laboratory experimental evidence.
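
For quick reference, the record types listed in the two tables can be mapped to their tiers with a simple lookup. This is a minimal sketch in Python: the function name tier_for_record_type is hypothetical, and because neither list is exhaustive, any record type not shown in the tables is treated as unknown rather than guessed.

```python
# Record types as listed in the tables: 1-9, 10-17 and A-C.
EXPERIMENTAL_TYPES = range(1, 10)
INFERENTIAL_TYPES = range(10, 18)
NOT_ACCEPTED_TYPES = {"A", "B", "C"}


def tier_for_record_type(record_type: str) -> str:
    """Return the tier shown in the tables for a listed record type."""
    code = record_type.strip().upper()
    if code in NOT_ACCEPTED_TYPES:
        return "Not accepted"
    if code.isdecimal():
        number = int(code)
        if number in EXPERIMENTAL_TYPES:
            return "Experimental"
        if number in INFERENTIAL_TYPES:
            return "Inferential"
    # The lists are not exhaustive, so unlisted types are reported, not guessed.
    raise ValueError(f"record type {record_type!r} is not listed")


for example in ("3", "12", "b"):
    print(example, "->", tier_for_record_type(example))
```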