The Human Genome Project launched in 1990 with the aim of sequencing 95% of the DNA in human cells in just 15 years. In 1992, John Sulston submitted a grant application for £40-50 million to fund a new centre, the Sanger Centre, which was to form the British arm of the Human Genome Project’s sequencing efforts. In 1993, with funding from the Wellcome Trust and the MRC, the Sanger Centre (now the Sanger Institute) was officially opened. By 1994, the Sanger Centre had produced its first 100,000 bases of human DNA sequence, and the benefit of the project quickly became clear. For example, in 1995 researchers from the Sanger Centre, working with international collaborators, located the BRCA2 gene, which is associated with an increased risk of breast cancer.
In September 2003 the ENCODE (Encyclopedia Of DNA Elements) consortium was formed to carry out an ambitious project: identifying all functional elements in the human genome sequence. Following a successful pilot phase on 1% of the genome, the scale-up to the entire genome is now underway. GENCODE is a sub-project of ENCODE that aims to annotate all evidence-based gene features in the entire human genome with high accuracy. The result is a set of annotations that includes all protein-coding loci with their alternatively transcribed variants, non-coding loci with transcript evidence, and pseudogenes. This is achieved through a combination of initial manual annotation by the Human and Vertebrate Analysis and Annotation (HAVANA) team, experimental validation by the GENCODE consortium, and refinement of the annotation based on those experimental results.
To create a gold-standard reference annotation, we in HAVANA use tools developed in-house to manually annotate not only the human genome, as outlined above as part of GENCODE, but also the mouse and zebrafish genomes (see Figure 1 for an example). The HAVANA team constantly updates its approach and guidelines, incorporating the new data sources that emerge as new technologies are developed. Our annotation is freely available and is displayed on the VEGA, Ensembl and UCSC genome browsers.
The ultimate aim of this blog is to show just how useful this resource, which is freely available to any interested person, can be.
Figure 1: Example of HAVANA manual annotation of the human genome