Splicing is well known to be an error-prone process, which allows new splice sites, and therefore new transcript models, to arise. Even so, the canonical ‘GT’ donor and ‘AG’ acceptor sites are only rarely deviated from, and even more rarely are such deviations shown to be the driving force behind opposing functions of isoforms from a single locus. CD99 provides an interesting example of just that…
In the late 1990s, Hahn and co-workers reported that two separate isoforms of the signalling protein CD99 have opposing effects on the process of adhesion: the longer isoform (CD99wt) promotes adhesion, whilst the shorter isoform (CD99sh) is strongly inhibitory. CD99sh has a partially deleted intracellular domain when compared with the longer isoform, and Hahn et al. hypothesised that two amino acids in the middle of this domain, serine168 and threonine181, are the specific residues that modulate the adhesion properties of the proteins.
This hypothesis was further investigated by Scotlandi and co-workers, who demonstrated that serine168 did indeed act as a phosphorylation target. They examined the effects of CD99sh and CD99wt on tumour malignancy and metastasis in prostate cancer and osteosarcoma cells and, again, the two isoforms of CD99 were found to have opposing effects.
The annotation of the CD99 gene is extremely interesting: at this locus the GENCODE datasets represent CD99sh and CD99wt, as well as three other novel transcripts with protein-coding potential and four transcripts that could not be assigned a CDS (Figure 1). CD99wt and CD99sh utilise different terminating codons, and the smaller of the two includes a short, 51 bp exon (circled in Figure 1). CD99sh has been experimentally validated via RT-PCR, and the donor splice site of this small exon is a non-canonical ‘GG’ that, although rarely used, is additionally supported by two independent EST sequences.
Figure 1: CD99 locus. CD99wt and CD99sh are highlighted, as is the 51 bp non-canonical exon unique to the smaller isoform.
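To make the canonical versus non-canonical distinction concrete, here is a minimal Python sketch that reads the donor and acceptor dinucleotides off an intron sequence and flags anything other than GT...AG. The function name `classify_intron` and both toy sequences are hypothetical and invented for illustration; they are not real CD99 sequence, and the sketch is not part of any GENCODE tooling. The AG acceptor in the second toy intron is likewise an assumption made only for the example.

```python
# Canonical splice-site dinucleotides, as described in the text above.
CANONICAL_DONOR = "GT"
CANONICAL_ACCEPTOR = "AG"


def classify_intron(intron_seq: str) -> dict:
    """Report the donor/acceptor dinucleotides of an intron (5'->3', sense strand)."""
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to hold both a donor and an acceptor")
    donor = seq[:2]       # first two bases of the intron
    acceptor = seq[-2:]   # last two bases of the intron
    return {
        "donor": donor,
        "acceptor": acceptor,
        "canonical": donor == CANONICAL_DONOR and acceptor == CANONICAL_ACCEPTOR,
    }


# Toy sequences (made up, NOT real CD99 sequence).
toy_canonical = "GTAAGTCCTTTCAG"
toy_gg_donor = "GGAAGTCCTTTCAG"

if __name__ == "__main__":
    print(classify_intron(toy_canonical))
    print(classify_intron(toy_gg_donor))
```

A donor flagged in this way is only a candidate for investigation; the flag itself says nothing about whether the junction is genuine.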
We at GencodeGenes subject all examples of non-canonical splicing to extensive manual investigation, and include such splice sites only when they are supported by at least one of the following:
- published support for the splice donor and acceptor
- evidence of a U12 intron (which may be slightly more permissive of splice junctions that would normally be defined as non-canonical)
- conservation of the splice site in other species
- evidence of mRNA editing
- presence of a genomic sequencing error or an SNP
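The "at least one of the following" rule above can be sketched as a simple record of boolean flags. This is only an illustration of the logic: the class `NonCanonicalEvidence`, its field names and the helper functions are hypothetical, and in practice each flag is the outcome of manual investigation rather than an automated check.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class NonCanonicalEvidence:
    """One flag per criterion in the list above (field names are hypothetical)."""
    published_support: bool = False
    u12_intron: bool = False
    conserved_in_other_species: bool = False
    mrna_editing: bool = False
    sequencing_error_or_snp: bool = False


def supporting_criteria(evidence: NonCanonicalEvidence) -> List[str]:
    """Return the names of the criteria that are satisfied."""
    return [f.name for f in fields(evidence) if getattr(evidence, f.name)]


def may_include(evidence: NonCanonicalEvidence) -> bool:
    """A non-canonical splice site is included if at least one criterion holds."""
    return len(supporting_criteria(evidence)) > 0


# CD99sh as described in this post: published support plus conservation.
cd99sh = NonCanonicalEvidence(published_support=True, conserved_in_other_species=True)

if __name__ == "__main__":
    print(may_include(cd99sh), supporting_criteria(cd99sh))
```

Returning the list of satisfied criteria, rather than a bare yes or no, keeps the reason for inclusion available for an annotation remark.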
As published experimental evidence exists for CD99sh, an annotation remark is made which links the transcript to that information and specifically references the supporting publication of Hahn and co-workers.
However, had no published evidence existed to support this non-canonical splice site, we would still have been able to investigate the isoform on the basis of the conservation shown by the GG donor in CD99 orthologs. The UCSC Genome Browser is a very efficient way to view the extensive data generated by the ENCODE project (including the GENCODE datasets). It shows that the 51 bp exon found in CD99sh, and its non-canonical GG donor, are conserved in human, chimp, orangutan, rhesus and baboon, but not in non-primates, indicating the recent creation of this novel exon and splice junction (Figure 2).
Figure 2: UCSC screenshot showing conservation of the non-canonical splice donor site utilised by CD99sh and its orthologs.
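The kind of comparison shown in Figure 2 can be sketched in Python as follows, assuming the donor dinucleotide at the aligned position has already been extracted for each species (for example, by hand from a multiple alignment). The function `donor_conservation`, the species labels and the values in `toy_donors` are all hypothetical placeholders, not real alignment data; `None` stands for a species in which the exon is not aligned at all.

```python
from typing import Dict, Optional


def donor_conservation(
    donors_by_species: Dict[str, Optional[str]], reference: str
) -> Dict[str, bool]:
    """For each non-reference species, report whether its donor matches the reference.

    A value of None means the exon (and so the donor) is absent from the alignment.
    """
    ref_donor = donors_by_species[reference]
    if ref_donor is None:
        raise ValueError("reference species has no donor dinucleotide")
    ref_donor = ref_donor.upper()
    return {
        species: donor is not None and donor.upper() == ref_donor
        for species, donor in donors_by_species.items()
        if species != reference
    }


# Placeholder values for illustration only (NOT real alignment data).
toy_donors = {
    "species_ref": "GG",
    "species_b": "gg",   # same donor, different case
    "species_c": "GT",   # different donor
    "species_d": None,   # exon not present in the alignment
}

if __name__ == "__main__":
    print(donor_conservation(toy_donors, reference="species_ref"))
```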
CD99 is therefore an interesting example in which alternatively spliced isoforms have opposing effects on a single cellular process. Only the GENCODE datasets, utilising manual sequence annotation in conjunction with experimental evidence, conservation and literature review, are able to capture this complexity in such detail and relay it directly to researchers.