Science Education

Formerly known as European Learning Laboratory for the Life Sciences

Our inspiring educational experiences share the scientific discoveries of EMBL with young learners aged 10-19 years and teachers in Europe and beyond. We belong to EMBL’s Science Education and Public Engagement office.


Part 3: Phylogenetic analysis of aligned protein sequences


In this part of the activity, we will construct a phylogenetic tree of the 24 opsin proteins. To do this, we first have to remove the non-conserved amino acid residues that we identified in the multiple sequence alignment during Part 2 of the activity.

Your task

Proceed as described below:

1.   Edit the multiple sequence alignment from Part 2 to remove the non-conserved residues.

i.   Go back to the JalView window which contains the multiple sequence alignment from Part 2.

ii.   In JalView, delete any gaps or residues that appear non-conserved: with your cursor, select the uppermost area above the relevant stretch of residues (a red box with a solid red top will appear) and press the backspace key on your keyboard.

iii. Save the edited alignment as a FASTA file: File > Output to Textbox > FASTA. Create a new text document by right-clicking on the desktop of your computer: New > Text Document. Copy and paste the sequence data from the open window into the text document, and leave the text document open for later use.

Note: if, for any reason, you do not manage to edit or save the sequences, it does not matter too much for the purpose of this exercise. We selected the sequences so that they produce a clean phylogenetic tree without the need for extensive editing. (Please note, however, that in real analyses the non-aligned regions must always be edited out before a phylogenetic tree is created!)

2.   To create a phylogenetic tree file of the edited sequences, enter the edited alignment into MUSCLE (output format “ClustalW”). To do this, go to the “MUSCLE” tab and follow the instructions there.

3.   Try to answer some of the task questions.
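The manual editing in step 1 can also be expressed in code. The following is a minimal Python sketch, not part of the original activity, that drops every alignment column that contains a gap or whose most common residue falls below a chosen frequency. The function names, the min_fraction threshold, and the toy sequences are assumptions made for illustration; in the activity itself, the columns are picked by eye in JalView.

```python
from collections import Counter


def parse_fasta(text):
    """Parse FASTA text into a dict of name -> aligned sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def conserved_columns(records, min_fraction=0.5):
    """Return indices of gap-free columns whose most common residue
    makes up at least min_fraction of the column (assumed criterion)."""
    seqs = list(records.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for i in range(length):
        column = [s[i] for s in seqs]
        if "-" in column:
            continue  # drop any column that contains a gap
        top_count = Counter(column).most_common(1)[0][1]
        if top_count / len(column) >= min_fraction:
            keep.append(i)
    return keep


def trim_alignment(records, min_fraction=0.5):
    """Return a new alignment that keeps only the conserved columns."""
    keep = conserved_columns(records, min_fraction)
    return {name: "".join(seq[i] for i in keep) for name, seq in records.items()}


# Toy alignment with made-up sequences (not the opsin data).
toy_fasta = """>seqA
MK-LVA
>seqB
MKTLIA
>seqC
MR-LVG
"""

trimmed = trim_alignment(parse_fasta(toy_fasta))
for name, seq in trimmed.items():
    print(f">{name}\n{seq}")
```

Raising min_fraction to 1.0 keeps only columns in which every sequence carries the same residue, which is usually too strict for building a tree; the value to use is a judgment call.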


1.   To create a phylogenetic tree file of the edited sequences, copy the FASTA data of your edited alignment into the MUSCLE input box (before clicking “Submit”, make sure that “ClustalW” is selected as the output format).

2.   On the results page, click “Send to ClustalW_Phylogeny”. In “Step 2” on the ClustalW phylogeny page, select the following:
“Tree format”: Default
“Distance correction”: ON
“Exclude gaps”: ON (This is the most important setting. By editing your sequences, you will have removed non-aligned gaps. However, in case you left any gaps in the alignment, selecting this option ensures that the algorithm works only with sequence positions that do not contain any gaps.)
“Clustering method”: UPGMA
“P.I.M.” (percent identity matrix): ON

3.   Click “Submit”.

4.   At the bottom of the results window, beneath “Phylogram”, you will find an image of your phylogenetic tree. You can select the branch length as “Real” in order to see how fast your sequences have evolved. However, for the next part of the activity, the “Cladogram” view is handier. Looking at the tree structure, try to answer some of the task questions.
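To make the settings above more concrete, here is a minimal Python sketch of what such a distance-based analysis does in principle: columns containing gaps are excluded, pairwise distances are computed and corrected for multiple substitutions, and the sequences are clustered with UPGMA. It is an illustration under stated assumptions, not the ClustalW implementation: the correction shown is Kimura's protein formula, which is commonly described as the one ClustalW applies, and all function names and the toy sequences are hypothetical.

```python
import math
from itertools import combinations


def drop_gap_columns(alignment):
    """Remove every column that has a gap in any sequence ("Exclude gaps")."""
    names = list(alignment)
    columns = zip(*(alignment[n] for n in names))
    kept = [col for col in columns if "-" not in col]
    return {n: "".join(col[i] for col in kept) for i, n in enumerate(names)}


def p_distance(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def kimura(p):
    """Kimura's correction for multiple substitutions (assumed formula)."""
    inner = 1 - p - 0.2 * p * p
    if inner <= 0:
        raise ValueError("sequences too divergent for this correction")
    return -math.log(inner)


def distance_matrix(alignment, correct=True):
    """Pairwise distances keyed by frozenset({name_a, name_b})."""
    clean = drop_gap_columns(alignment)
    dist = {}
    for a, b in combinations(clean, 2):
        p = p_distance(clean[a], clean[b])
        dist[frozenset((a, b))] = kimura(p) if correct else p
    return dist


def upgma(names, dist):
    """Repeatedly merge the two closest clusters; return a nested tuple."""
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        for other in clusters:
            # Size-weighted average of the distances to the merged clusters.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Toy alignment with made-up sequences (not the opsin data).
toy = {
    "A": "MKLVAGTWQE-",
    "B": "MKLVAGTWQDN",
    "C": "MKIVSGTWHEN",
    "D": "MRIVSGTFHEN",
}

tree = upgma(list(toy), distance_matrix(toy))
print(tree)
```

The nested tuple mirrors the cladogram: sequences that are paired in the innermost tuples were joined first because they are the most similar.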


1.   Have a look at the structure of the tree. You should see one sequence which is an outgroup to all the others. Which one is it? Why do you think it is that one?
2.   The rest of the tree splits the sequences into two major groups. Does this split generally also reflect the evolutionary relationships between the species? Do you see any exceptions?

You can now either further analyse the opsin evolution in a short optional exercise (cf. “Optional Task” tab) or proceed to Part 4 of the activity by clicking on the link below.

Optional tasks

Further analysis of the multiple sequence alignment

We have now successfully constructed a phylogenetic tree of the opsin proteins. In this part of the activity we will use the multiple sequence alignment to further analyse the evolution of opsins.

Your task

Go back to the multiple sequence alignment in JalView and group the sequences according to the groupings you noticed in the tree in Part 3 (“Cladogram” view). Sequences can be moved around by clicking on the sequence name and using the up and down arrow keys on your keyboard. Note: in your analysis, ignore the Danio_mel_rec1A sequence at the top/bottom of the alignment, or remove it from JalView completely (click on the sequence name and press the backspace key on your keyboard).


1. In G protein-coupled receptors, a tripeptide motif just after transmembrane domain VII (the final one) is important for G-alpha binding. Can you find a tripeptide which also reflects the grouping of the sequences?
2. What do you think is the reason for this tripeptide to be so conserved?
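The search for a group-specific tripeptide can also be sketched programmatically. The minimal Python sketch below, which is not part of the original activity, scans every window of three columns and reports those that are identical within each group but differ between groups. The function name, the group labels, and the toy sequences are hypothetical and deliberately do not use the opsin data.

```python
def group_specific_tripeptides(alignment, groups):
    """Return (start, {group: tripeptide}) for each 3-column window that is
    identical within every group and different between the groups."""
    length = len(next(iter(alignment.values())))
    hits = []
    for start in range(length - 2):
        motifs = {}
        for group, members in groups.items():
            windows = {alignment[m][start:start + 3] for m in members}
            # Skip windows that vary within a group or contain a gap.
            if len(windows) != 1 or "-" in next(iter(windows)):
                break
            motifs[group] = windows.pop()
        else:
            # Reached only if no group was rejected above.
            if len(set(motifs.values())) == len(groups):
                hits.append((start, motifs))
    return hits


# Toy alignment with made-up sequences and hypothetical group labels.
toy_alignment = {
    "s1": "AVLGNAAKTE",
    "s2": "AVLSNAAKTE",
    "s3": "AVLGDPPKTE",
    "s4": "AVLGDPPRTE",
}
toy_groups = {"group1": ["s1", "s2"], "group2": ["s3", "s4"]}

for start, motifs in group_specific_tripeptides(toy_alignment, toy_groups):
    print(start, motifs)
```

The reported start position is zero-based. In a real alignment, a candidate found this way still has to be checked against its position relative to transmembrane domain VII.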
