Science Education

Formerly known as European Learning Laboratory for the Life Sciences

Our inspiring educational experiences share the scientific discoveries of EMBL with young learners aged 10-19 years and teachers in Europe and beyond. We belong to EMBL’s Science Education and Public Engagement office.

Part 1: From DNA to protein sequence

Overview

We just received an email from the TREC researchers containing the DNA sequence. To understand which protein it encodes, the first task is to convert this DNA sequence into an amino acid sequence, also known as the protein’s primary structure.

We will use the bioinformatics tool EMBOSS Transeq to assist us in this process. EMBOSS Transeq combines the biological processes of transcription (DNA to RNA) and translation (RNA to protein) into a single step and creates an amino acid sequence directly from the DNA sequence.
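To make the idea of going straight from DNA to amino acids more concrete, here is a minimal Python sketch of a codon-by-codon translation. It is only an illustration and not the EMBOSS Transeq implementation: the names `CODON_TABLE` and `translate` are made up for this example, the table is the standard genetic code, and writing `X` for an unrecognised codon is an assumption of this sketch.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for all 64 codons of the standard genetic code, listed in
# TCAG order (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=1):
    """Translate a DNA string into one-letter amino acid codes.

    frame=1 starts reading at the first base, frame=2 at the second,
    frame=3 at the third. Codons that are not in the table (for example
    those containing N) are written as 'X' (an assumption of this sketch).
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    seq = "".join(dna.split()).upper()  # drop line breaks and spaces
    protein = []
    for i in range(frame - 1, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# The first six codons of the unknown sequence: ATG AAC GGC ACC GAG GGC
print(translate("ATGAACGGCACCGAGGGC"))  # MNGTEG
```

Reading the same bases in a different frame gives a completely different peptide, which is why the choice of frame matters when you run the tool.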

Begin the activity by following the instructions provided in the “Your task” tab and attempt to answer the accompanying questions.

Your task

Please follow the steps outlined below:

1. In the “Sequence” tab, you will find the unidentified DNA sequence labelled as “Unknown_DNA”.
2. Copy the sequence and proceed to the “EMBOSS Transeq” tab, where you will find instructions on how to identify the amino acid sequence of the unknown protein using the EMBOSS Transeq tool.
3. Try to answer the question in the “Question” tab.

Sequence

Input sequence:


>New species (Unknown) 
ATGAACGGCACCGAGGGCCCCTTCGGCTACATCCCCATGAGCAACGCCACCGGCCTGGTG
AGGAGCCCCTACGACTACCCCCAGTACTACCTGGTGCCCCCCTGGGGCTACGCCTGCCTG
GCCGCCTACATGTTCCTGCTGATCCTGACCGGCTTCCCCGTGAACTTCCTGACCCTGTAC
GTGACCATCGAGCACAAGAAGCTGAGGAGCCCCCTGAACTACATCCTGCTGAACCTGGCC
GTGGCCGACCTGTTCATGGTGATCGGCGGCTTCACCACCACCATGTGGACCAGCCTGGAC
GGCTACTTCGTGTTCGGCAGGATGGGCTGCAACATCGAGGGCTTCTTCGCCACCCTGGGC
GGCGAGATCGCCCTGTGGAGCCTGGTGGTGCTGAGCATGGAGAGGTGGATCGTGGTGTGC
AAGCCCATCAGCAACTTCAGGTTCGGCGAGAACCACGCCGTGATGGGCGTGGCCTTCAGC
TGGTTCATGGCCGCCGCCTGCGCCGTGCCCCCCCTGGTGGGCTGGAGCAGGTACATCCCC
GAGGGCATGCAGTGCAGCTGCGGCATCGACTACTACACCAGGGCCGAGGGCTTCAACAAC
GAGAGCTTCGTGATCTACATGTTCGTGGTGTTCTTCACCTGCCCCCTGACCATCATCACC
TTCTGCTACGGCAGGCTGGTGTGCACCGTGAAGGAGGCCGCCGCCCAGCAGCAGGAGAGC
GAGACCACCCAGAGGGCCGAGAGGGAGGTGACCAGGATGGTGATCATCACCTTCGTGGCC
TTCCTGGCCTGCTGGGTGCCCTACGCCAGCGTGGCCTGGTACATCTTCACCCACCAGGGC
AGCGAGTTCGGCCCCGTGTTCATGACCATCCCCGCCTTCTTCGCCAAGAGCAGCGCCGTG
TACAACCCCGTGATCTACATCTGCCTGAACAAGCAGTTCAGGCACTGCATGATCACCACC
CTGTGCTGCGGCAAGAACCCCTTCGAGGAGGAGGAGGGCAGCACCACCGCCAGCAAGACC
GAGGCCAGCAGCGTGTGCAGCGTGAGCCCCCACGCC

EMBOSS Transeq

1. Access the EMBOSS Transeq tool in the window below.
2. Paste your DNA sequence into the query box (STEP 1). Include the header line, which consists of the greater-than symbol (>) followed by the sequence name. In the “Parameters” field (STEP 2), make sure to select “frame=1” and “Codon table=Standard codon”. Then submit your job by clicking on “Submit”.
3. Examine the data table that appears and try to answer the questions related to the task. For better visualisation, click on “Show Colors”.
4. Download or copy the amino acid sequence for the next task. 
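The input you paste in step 2 is in FASTA format: one header line starting with “>” followed by one or more lines of sequence. As a rough sketch of how such a record can be split into its name and its sequence, the hypothetical helper `parse_fasta` below handles a single record; it is an illustration only and not part of EMBOSS Transeq. The example uses the header and the first two lines of the unknown sequence.

```python
def parse_fasta(text):
    """Split a single-record FASTA string into (name, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    name = lines[0][1:].strip()             # header text without the '>'
    sequence = "".join(lines[1:]).upper()   # join the wrapped sequence lines
    return name, sequence


record = """>New species (Unknown)
ATGAACGGCACCGAGGGCCCCTTCGGCTACATCCCCATGAGCAACGCCACCGGCCTGGTG
AGGAGCCCCTACGACTACCCCCAGTACTACCTGGTGCCCCCCTGGGGCTACGCCTGCCTG
"""

name, sequence = parse_fasta(record)
print(name)           # New species (Unknown)
print(sequence[:12])  # ATGAACGGCACC
```

If the header line is missing, the sketch raises an error instead of guessing a name, which is one reason to copy the “>” line together with the sequence.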

Note: The colours used in the output correspond to specific physicochemical properties of the amino acids. The amino acids in each group are listed in brackets using their one-letter codes:
– Small + hydrophobic amino acids (AVFPMILW): Red 
– Acidic amino acids (DE): Blue
– Basic amino acids (RK): Magenta
– Amino acids with hydroxyl, sulfhydryl or amine groups + Glycine (STYHCNGQ): Green
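The colour groups above can also be tallied programmatically. The following Python sketch counts how many residues of a protein string fall into each group; the names `GROUPS` and `count_groups` are made up for this illustration, and the example input is only the short peptide MNGTEG (what the first six codons of the unknown sequence encode under the standard code), not the full protein.

```python
from collections import Counter

# Colour groups exactly as listed in the note above
GROUPS = {
    "small + hydrophobic (red)": "AVFPMILW",
    "acidic (blue)": "DE",
    "basic (magenta)": "RK",
    "hydroxyl/sulfhydryl/amine + glycine (green)": "STYHCNGQ",
}
# Reverse lookup: one-letter code -> group name
AA_TO_GROUP = {aa: group for group, letters in GROUPS.items() for aa in letters}


def count_groups(protein):
    """Count residues per colour group; unknown letters are counted as 'other'."""
    return Counter(AA_TO_GROUP.get(aa, "other") for aa in protein.upper())


counts = count_groups("MNGTEG")
for group, n in counts.most_common():
    print(f"{group}: {n}")
```

Running the same function on the full amino acid sequence from the tool gives one way to check your answer to the question in the next tab.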

Question

What are the most prevalent types of amino acids present in the protein sequence?

You can click on “Show Colors” to get information about the physicochemical properties of the individual amino acids.
