Insulin is a vertebrate hormone that is involved in the regulation of carbohydrate and fat metabolism. Disruptions to insulin production and processing can have severe pathological consequences in mammals, the most common being the onset of diabetes mellitus. This metabolic process has been intensely studied over the last century, and our understanding of human diabetes has depended greatly on the use of model organisms.
However, there are also interesting cross-species differences between the genes involved in this process. Here we focus on one involving human and mouse read-through transcripts.
The insulin and insulin-like growth factor 2 genes play a central role in regulating blood glucose concentration, and they are located in neighboring positions on human chromosome 11.
Insulin encodes a protein that is cleaved to generate two active peptides (insulin 2A and 2B), which form heterodimers that regulate blood glucose concentration, while insulin-like growth factor 2 encodes a protein that is processed to give rise to an active peptide, preptin. Preptin is co-secreted with insulin in response to glucose and, in turn, stimulates further insulin secretion, amplifying the effect of increased glucose levels. Furthermore, the insulin and insulin-like growth factor 2 genes have been shown to be co-regulated.
Our GENCODE dataset contains four human read-through transcripts (defined as transcripts that incorporate exons from multiple genes) that are supported by species-specific transcriptional evidence (Figure 1). One has been annotated as protein-coding and one as a target for the nonsense-mediated decay (NMD) pathway, because its internal stop codon lies more than 50 bp upstream of a downstream splice donor site. The other two cannot be confidently assigned a CDS.
Figure 1: Human insulin, insulin-like growth factor 2 and associated read-through transcripts.
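The NMD criterion mentioned above (a stop codon more than 50 bp upstream of a downstream splice donor site) can be illustrated with a minimal sketch. The function names and the exon lengths below are hypothetical and chosen only for illustration; they do not describe the real exon structure of the read-through transcripts, and actual annotation pipelines may apply additional rules.

```python
from typing import List

# Distance threshold quoted in the text (in nucleotides).
NMD_THRESHOLD_NT = 50


def splice_donor_positions(exon_lengths: List[int]) -> List[int]:
    """Return the 1-based transcript coordinate of the last base of every
    exon except the final one (the final exon has no splice donor site)."""
    positions = []
    total = 0
    for length in exon_lengths[:-1]:
        total += length
        positions.append(total)
    return positions


def is_nmd_candidate(exon_lengths: List[int],
                     stop_codon_end: int,
                     threshold: int = NMD_THRESHOLD_NT) -> bool:
    """Return True if at least one splice donor site lies more than
    `threshold` nucleotides downstream of the last base of the stop codon."""
    return any(
        donor - stop_codon_end > threshold
        for donor in splice_donor_positions(exon_lengths)
    )


# Hypothetical three-exon transcript: donor sites at positions 120 and 320.
example_exons = [120, 200, 150]

early_stop = is_nmd_candidate(example_exons, stop_codon_end=150)  # True
late_stop = is_nmd_candidate(example_exons, stop_codon_end=400)   # False
```

In this toy example a stop codon ending at position 150 sits 170 nt upstream of the second donor site and is flagged, whereas a stop codon in the final exon is not.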
The mouse orthologs are located on chromosome 7, and their relative position and order are conserved with respect to human. Again, transcriptional evidence supports the annotation of read-through transcripts that span both loci (Figure 2): one protein-coding and the other a target for the NMD pathway. However, there is a major difference between the CDS of the human and mouse protein-coding read-through transcripts.
Figure 2: Mouse orthologs of human insulin and insulin-like growth factor 2 shown in Figure 1.
The CDS of the mouse protein-coding read-through transcript has been confirmed by high-confidence proteomic data and, at 371 amino acids, it is substantially longer than the 201 amino acid read-through CDS of the human locus.
The validated mouse read-through protein contains both insulin 2B and preptin but not the insulin 2A peptide. Whether the mouse read-through protein is functional and can be processed to generate these peptides remains to be determined, but it raises the possibility of an ultimate form of co-regulation in which the two products are expressed from a single protein precursor. The human read-through protein, on the other hand, does not contain these domains. This level of co-regulation would therefore be possible only in mouse, which suggests a potentially significant difference between the two species in the regulation of insulin synthesis.
These genes are currently described in this level of detail only within the manually annotated GENCODE datasets, which give investigators the opportunity to better understand the complex transcriptional profile at these loci.