Currently GENCODE contains the largest number of transcripts per locus of any publicly available genebuild. We believe it is critically important that researchers are made aware of all transcripts generated from a gene, as this is the logical starting point for deciphering the functionality of the locus. Furthermore, knowledge of transcriptional complexity can completely change our interpretation of experimental findings generated from a locus. As an example, consider our annotation of the ZNF193 locus:
The locus currently contains five coding transcripts, the first three of which contain the same CDS (truncated in the case of transcript 1) linked to different 5’ UTRs. Transcript 4 contains an additional ‘cassette’ exon. This exon is in fact rather interesting, since it contains a variant site identified by the 1000 Genomes Project (indicated by *). This is termed a ‘loss of function’ (LoF) variant since it will break the CDS in which it is found. In this instance, the polymorphism creates a premature termination codon that is predicted to make the transcript a target for the nonsense-mediated decay (NMD) pathway. This means that genomes harbouring this variant are predicted not to make a protein corresponding to the CDS of transcript 4.
Note, however, that the CDSs of the other transcripts are unaffected. In other words, we should consider LoF as an attribute not of a gene, but of a particular transcript within that gene. Additional 1000 Genomes LoF SNPs can be found in our Science paper. If you wish to know how we can be sure that an alternative transcript containing a LoF variant is truly functional and not simply transcriptional ‘noise’, stay tuned for a future post.
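For readers who like to see the idea in code, here is a minimal Python sketch of treating LoF as a per-transcript attribute. It is only an illustration under stated assumptions: the transcript names, the coordinates and the helper `transcripts_hit` are all hypothetical, and the toy locus is not the real ZNF193 annotation. It simply asks which transcripts have a CDS exon overlapping a variant position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str
    # CDS exons as (start, end) pairs, 1-based inclusive.
    # All coordinates here are invented for illustration.
    cds_exons: tuple


def transcripts_hit(transcripts, variant_pos):
    """Return the names of transcripts whose CDS contains variant_pos."""
    return [
        t.name
        for t in transcripts
        if any(start <= variant_pos <= end for start, end in t.cds_exons)
    ]


# Toy locus: two transcripts share a CDS, a third adds a cassette exon.
locus = [
    Transcript("tx_a", ((100, 200), (300, 400))),
    Transcript("tx_b", ((100, 200), (300, 400))),
    Transcript("tx_cassette", ((100, 200), (230, 260), (300, 400))),
]

# Hypothetical variant position inside the cassette exon.
variant_pos = 245
affected = transcripts_hit(locus, variant_pos)
unaffected = [t.name for t in locus if t.name not in affected]

print("CDS hit:", affected)
print("CDS untouched:", unaffected)
```

In this toy case only the cassette-containing transcript is reported as hit, while a variant in a shared exon would be reported for all three. A real analysis would also need strand, reading frame and the actual consequence of the allele, which this sketch deliberately ignores.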