
Science Education

Formerly known as European Learning Laboratory for the Life Sciences

Our inspiring educational experiences share the scientific discoveries of EMBL with young learners aged 10-19 years and teachers in Europe and beyond. We belong to EMBL’s Science Education and Public Engagement office.

April 28, 2021

Bioinformatics tools and sample data

Overview
Example data
EMBOSS Seqret
EMBOSS Needle
European Nucleotide Archive
Chromatogram viewer

Overview

On this page, we provide a summary of the bioinformatics tools to use for the analysis of your DNA sequencing data. We also provide example sequence data which you can use if you did not collect your own plant samples or carry out the wet-lab experiments.

Example data

The example sequence data give you the opportunity to work with barcode sequences without carrying out any experiments yourself. The sequence folder contains the .seq (text information) and .abi (chromatogram information) files for each example. You may use them to assemble contigs and then search the ENA database for matches. If you would like to skip contig assembly, we have provided text files containing the contig sequences of the respective examples.

After preparing the sequence and running the search with the example data, you can view the identity and an image of the plant in question by clicking on the corresponding entry (“Example 1”, “Example 2”, etc.) below. Please enter the genus name of the identified plant (e.g. Olea) to access the image.

A folder containing all example data files (.seq and .abi files) can be accessed here.

Example 1

Forward and reverse sequencing reads for rbcL barcode

This sequence is identical to “TutorialExample”

Example 2

Forward and reverse sequencing reads for rbcL barcode

Example 3

Forward and reverse sequencing reads for rbcL barcode

Example 4

Forward and reverse sequencing reads for rbcL barcode
Contig for rbcL read
Reverse sequencing reads for matK barcode (please read the note document in the folder)

Example 5

Forward and reverse sequencing reads for rbcL barcode
Contig for rbcL read
Forward and reverse sequencing reads for matK barcode
Contig for matK read

EMBOSS Seqret

A tool that can be used to reverse-complement a nucleotide sequence.

https://www.ebi.ac.uk/Tools/sfc/emboss_seqret/
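To illustrate what reverse-complementing does, here is a minimal Python sketch. It is not how EMBOSS Seqret is implemented; the function name `reverse_complement` and the example sequence are made up for illustration, and the sketch assumes the sequence contains only A, C, G, T and N (any other character is left unchanged).

```python
# Mapping of each nucleotide to its complement.
# N (unknown nucleotide) stays N. Assumption: only A, C, G, T and N occur.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # First swap every base for its complement, then reverse the string.
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical example read, not taken from the example data.
print(reverse_complement("ATGCCN"))  # NGGCAT
```

Applying the function twice returns the original sequence, which is a quick way to check the result.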

EMBOSS Needle

A tool for aligning two sequences. To align nucleotide sequences, select “nucleotide”.

https://www.ebi.ac.uk/Tools/psa/emboss_needle/

Guide to EMBOSS Needle nucleotide sequence alignment result

Vertical lines indicate identical nucleotides (matches).

Dots or empty spaces indicate mismatches (i.e. conflicting nucleotide reads in forward and reverse sequence).

Empty spaces indicate gaps (i.e. a nucleotide read versus no nucleotide read in the two sequences).

Horizontal dashes within a sequence indicate gaps.

Horizontal dashes at either side of the alignment indicate that the two sequences did not produce an overlapping read in these areas (no alignment).

The letter “N” within a sequence is used to denote an unknown nucleotide.
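The symbols described above can be reproduced with a short Python sketch. It takes two sequences that are already aligned (with dashes for gaps) and builds the line of symbols shown between them. This is a simplified illustration under stated assumptions, not the EMBOSS Needle algorithm: the function name `markup_line` and the two sequences are hypothetical, mismatches are always drawn as dots, gaps as empty spaces, and letters are compared literally (so “N” is treated like any other letter).

```python
def markup_line(seq_a: str, seq_b: str) -> str:
    """Build a Needle-style symbol line for two already-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    symbols = []
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            symbols.append(" ")  # gap: a nucleotide versus no nucleotide
        elif a == b:
            symbols.append("|")  # match: identical nucleotides
        else:
            symbols.append(".")  # mismatch: conflicting nucleotides
    return "".join(symbols)


# Hypothetical aligned forward and reverse-complemented reverse reads.
forward = "ACGTTACG-T"
reverse = "ACGAAAC-GT"
line = markup_line(forward, reverse)

print(forward)
print(line)
print(reverse)
print("matches:", line.count("|"))
print("mismatches:", line.count("."))
print("gaps:", line.count(" "))
```

Counting the symbols gives a quick summary of how well the forward and reverse reads agree.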

European Nucleotide Archive

The ENA database provides a comprehensive record of the world’s nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. ENA is made up of a number of distinct databases, including EMBL-Bank, the Sequence Read Archive (SRA) and the Trace Archive.

https://www.ebi.ac.uk/ena/browser/sequence-search

ENA result columns in NCBI BLAST+ (definitions)

“Align.” reports the number which has been assigned to the search result according to the best alignment (i.e. the result with the best hit is assigned number 1).

“DB:ID” gives the database identification code of the database entry/search result.

“Source” provides information on the source of the DNA, i.e. organism name and sequence type (e.g. rbcL gene).

“Length” reports the length of the database sequence.

“Score” describes the quality of the alignment. The higher the score, the higher the similarity between query and database sequence.

“Identities” reports the number of identical residues that are found aligned between the query and database sequence.

“Positives” reports the number of aligned residues that score positively in the substitution matrix (i.e. similar types of residues).

“E()” gives the expect value (E-value) for the alignment: a measure of how many alignments with a similar or better score would be expected to occur by chance when searching a database of a particular size. An E-value of 0.0 is the lowest possible value and indicates that you would not expect this alignment to occur by chance alone. By contrast, an E-value of 10 means that, in a database of the current size, one might expect to see 10 matches with a similar or better score simply by chance. Very small values above 0 are displayed in exponential notation.
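To show why the E-value depends on both the score and the database size, the following Python sketch uses the standard BLAST statistics relation E = m × n × 2^(−S′), where S′ is the bit score, m the query length and n the total length of the database. It is a rough illustration only: the function name `expect_value` and all numbers are hypothetical, and real searches use adjusted (effective) lengths, so the values will not match ENA output exactly.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score and the size of the search space."""
    # The search space is query length times database length;
    # every additional bit of score halves the expected number of chance hits.
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical query of 600 nucleotides against a database of 10^9 nucleotides.
for bits in (20, 50, 200):
    e_value = expect_value(bits, query_len=600, db_len=10**9)
    print(f"bit score {bits}: E = {e_value:.2g}")
```

With these assumed numbers, a low bit score gives an E-value far above 1 (many chance matches expected), while a high bit score gives a value so small that it is written in exponential notation.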

Chromatogram viewer

To study your chromatograms, we recommend you use one of the chromatogram viewers below and open the .ab1 files of the forward and reverse sequence.

Suggested freeware chromatogram viewers

Windows users: Chromas Lite

Mac users: 4Peaks

Suggested web browser-based alternative

For sequence analysis and contig assembly, we recommend using downloadable chromatogram viewer software. If downloading software is not an option, a web browser-based viewer (see link below) can be used. Please note that the web browser-based alternative is less convenient for sequence analysis than the software versions because it exports the chromatogram as images. Software functions such as nucleotide search are therefore not available, and searching has to be done visually rather than with the help of the software’s search function.

Web browser-based chromatogram viewer: abiview chromatogram viewer