European Learning Laboratory for the Life Sciences

Our inspiring educational experiences share the scientific discoveries of EMBL with young learners aged 10-19 years and teachers in Europe and beyond.


GFP treasure hunt


This treasure hunt introduces participants to bioinformatics approaches that can be used to identify and analyse DNA and protein sequences. In the activity, we will analyse the DNA and amino acid sequences of the Green Fluorescent Protein (GFP) and closely related molecules using several web-based databases and analysis tools.

Biological data, such as DNA and protein sequences or structural information, are stored and curated in databases that are made available to scientists. Because most of these data are produced with public money, they are also publicly available. Similarly, bioinformatics tools are designed by scientists funded by public money and are therefore available to other scientists and the public. We will use such databases and tools for our analysis.

Technical requirements

Astex Viewer requires Java to be installed on your computer and enabled in your web browser. Java can be installed for free via java.com. Information on how to enable Java in your web browser can be found here.

Bioinformatics tools used in this activity

ENA (European Nucleotide Archive)
Database providing a comprehensive record of the world’s nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. ENA is made up of a number of distinct databases, including EMBL-Bank, the Sequence Read Archive (SRA) and the Trace Archive.

MUSCLE (Multiple Sequence Comparison by Log-Expectation)
Tool to align and compare multiple sequences, particularly suitable for amino acid sequences.

EMBOSS Transeq (European Molecular Biology Open Software Suite Transeq)
Tool that translates nucleic acid sequences into the corresponding peptide sequences.

Protein Data Bank in Europe (PDBe)
Database containing information on the structure of biological macromolecules.
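
To make the idea of translating a nucleic acid sequence into a peptide sequence more concrete, here is a minimal Python sketch of the principle. It is not how EMBOSS Transeq is implemented. The function name `translate`, the `frame` parameter and the short example sequence are assumptions made purely for illustration. The sketch uses the standard genetic code, reads the DNA three bases (one codon) at a time, writes `*` for a stop codon and `X` for any codon it cannot interpret.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=1):
    """Translate a DNA (or RNA) string into one-letter amino acid codes.

    frame is 1, 2 or 3 and selects the forward reading frame
    (a hypothetical parameter chosen for this sketch).
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    sequence = dna.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    # Step through the sequence one complete codon at a time.
    for i in range(frame - 1, len(sequence) - 2, 3):
        codon = sequence[i:i + 3]
        protein.append(CODON_TABLE.get(codon, "X"))  # X = unknown codon
    return "".join(protein)


if __name__ == "__main__":
    example = "ATGAGTAAAGGAGAAGAACTTTTC"  # short illustrative sequence
    print(translate(example))
```

Shifting the reading frame by one base usually gives a completely different peptide, which is why translation tools let you choose the frame.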

Glossary: check out the ELLS Glossary for a growing number of bioinformatics-related terms.

Activity navigation

The navigation menu below shows the individual steps of the activity. Start the treasure hunt by clicking on “GFP treasure hunt introduction” in the menu below.


Topic area:  Bioinformatics, Genome biology

Age group:  16-19

Author: Philipp Gebhardt