Processed Pseudogenes – Junk or Gene?

Processed pseudogenes arise when a cDNA copy of a parent gene's transcript integrates at a new locus in the genome (Figure 1). There are 8,426 processed pseudogenes in the current release of the GENCODE dataset, but are any of these functional?







Figure 1: Integration of a processed pseudogene. The processed pseudogene (at the bottom) has arisen by transcription of the parent gene (at the top), post-transcriptional processing of this transcript, reverse transcription and integration back into the genome. Over time, these pseudogenes accrue mutations (i.e. SNPs/DIPs), represented by the red stars in the pseudogene model. As a result, although the pseudogene shows a high level of homology to the parent gene across its entire length, it is no longer able to encode protein.


Well, the safe assumption would be no. These pseudogenes lack the regulatory elements found at the parent locus that direct and control the specificity of expression. However, more and more examples of functionality associated with this gene class are cropping up in the literature (e.g. PTENP1; Poliseno, et al. 2009). The majority of these functional processed pseudogenes have been shown to regulate the expression of other genes in the genome, and one proposed model is that they compete with other genes for the binding of small RNAs (Figure 2). It is therefore easy to hypothesise that over-expression of one transcript soaks up the shared small RNAs and so relieves repression of the other, and vice versa. But therein lies the quandary: this model relies on the processed pseudogene being transcribed.

Figure 2: Model for functional processed pseudogenes. Both ‘RNA X’ and ‘RNA Y’ share the same miRNA binding site and are therefore competing for this miRNA within the cell. If ‘RNA X’ is a processed pseudogene and ‘RNA Y’ is its parent, then we can hypothesise a model in which the expression of one modulates the expression of the other and vice versa.
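
To make the competition in Figure 2 a little more concrete, here is a toy titration sketch in Python. It is not taken from the literature cited above: it assumes a single shared miRNA pool, one binding site per transcript, equal affinity for ‘RNA X’ and ‘RNA Y’, and simple one-to-one equilibrium binding. The function names and all parameter values (`mirna_total`, `kd`, the copy numbers) are hypothetical and in arbitrary units.

```python
import math


def bound_complex(mirna_total, target_total, kd):
    """Equilibrium amount of miRNA:target complex for one-to-one binding.

    Solves C^2 - (M + T + Kd) * C + M * T = 0 for the physically
    meaningful (smaller) root, where M is total miRNA and T is total target.
    """
    if mirna_total <= 0 or target_total <= 0:
        return 0.0
    s = mirna_total + target_total + kd
    return (s - math.sqrt(s * s - 4.0 * mirna_total * target_total)) / 2.0


def unbound_fraction_of_y(rna_x, rna_y, mirna_total=100.0, kd=10.0):
    """Fraction of RNA Y molecules NOT occupied by the shared miRNA.

    Assumption: X and Y bind the miRNA with equal affinity, so the
    complex is shared between them in proportion to their abundance
    and both transcripts have the same occupancy.
    """
    target_total = rna_x + rna_y
    if target_total <= 0:
        return 1.0
    return 1.0 - bound_complex(mirna_total, target_total, kd) / target_total


if __name__ == "__main__":
    # Hold RNA Y (the parent) fixed and raise RNA X (the pseudogene).
    for rna_x in (0.0, 50.0, 200.0, 800.0):
        free_y = unbound_fraction_of_y(rna_x, rna_y=100.0)
        print(f"RNA X = {rna_x:6.1f}  unbound fraction of RNA Y = {free_y:.3f}")
```

In this sketch, the more ‘RNA X’ there is, the larger the share of ‘RNA Y’ that escapes the miRNA, which is the sense in which one transcript can modulate the other. Real systems add degradation kinetics, multiple miRNAs and unequal affinities, none of which are modelled here.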


Our GENCODE dataset has a detailed pseudogene classification hierarchy, and ours is the only annotation set that describes transcribed processed pseudogenes specifically. There are 159 transcribed processed pseudogenes within our dataset, each of which has been manually annotated and verified by an experienced HAVANA annotator, and they make for an interesting list:

Figure 3: Example of a transcribed processed pseudogene. In this example on human chromosome 7, locus-specific cDNAs allow us to annotate two novel non-coding transcripts that link a processed and an unprocessed pseudogene. Furthermore, the unprocessed pseudogene is host to two snoRNA families. These snoRNAs are found at only three loci across the human genome: here, at another CCT6A transcribed unprocessed pseudogene, and at the CCT6A parent gene. snoRNAs facilitate the modification of RNAs, which modulates the expression levels of the loci where they bind, so we can begin to hypothesise a model in which the expression level of each of these three genes is able to regulate the expression level of the others; importantly, only one of these three loci is protein coding.
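
For readers who want to pull these classes out of a GENCODE release themselves, the sketch below tallies gene records by their `gene_type` attribute in a GTF file. It assumes the standard nine-column, tab-separated GTF layout with `gene_type "..."` in the attribute column. The function name and the three example records are made up for illustration, and the counts you get from a real file will depend on the release, so they need not match the numbers quoted in this post.

```python
import re
from collections import Counter

# Matches the gene_type attribute in column 9 of a GENCODE GTF line.
GENE_TYPE = re.compile(r'gene_type "([^"]+)"')


def count_gene_types(gtf_lines):
    """Count 'gene' features per gene_type from an iterable of GTF lines."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        # Only count gene-level records, not transcripts or exons.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = GENE_TYPE.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up records, purely to show the expected line format.
EXAMPLE_LINES = [
    "##description: toy example, not real annotation\n",
    'chr7\tHAVANA\tgene\t100\t900\t.\t+\t.\t'
    'gene_id "TOY1"; gene_type "transcribed_processed_pseudogene"; gene_name "TOY1P";\n',
    'chr7\tHAVANA\ttranscript\t100\t900\t.\t+\t.\t'
    'gene_id "TOY1"; gene_type "transcribed_processed_pseudogene"; gene_name "TOY1P";\n',
    'chr7\tHAVANA\tgene\t2000\t2600\t.\t-\t.\t'
    'gene_id "TOY2"; gene_type "processed_pseudogene"; gene_name "TOY2P";\n',
    'chr7\tHAVANA\tgene\t5000\t9000\t.\t+\t.\t'
    'gene_id "TOY3"; gene_type "protein_coding"; gene_name "TOY3";\n',
]

if __name__ == "__main__":
    for gene_type, n in sorted(count_gene_types(EXAMPLE_LINES).items()):
        print(gene_type, n)
```

With a real annotation file, the same function can be fed an open file handle instead of the example list (for a gzipped download, `gzip.open(path, "rt")` from the standard library, where `path` is wherever you saved the file).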


Therefore, processed pseudogenes are, at least in some cases, not junk, but gene.

