{"id":25573,"date":"2021-04-28T18:46:35","date_gmt":"2021-04-28T16:46:35","guid":{"rendered":"http:\/\/emblog.embl.de\/ells\/?post_type=teachingbase&#038;p=25573"},"modified":"2023-02-07T09:04:20","modified_gmt":"2023-02-07T09:04:20","slug":"species-identification","status":"publish","type":"teachingbase","link":"https:\/\/www.embl.org\/ells\/teachingbase\/dna-barcoding-resource\/species-identification\/","title":{"rendered":"Species identification using bioinformatics"},"content":{"rendered":"\n<div class=\"vf-tabs\"><ul class=\"vf-tabs__list\" data-vf-js-tabs=\"true\"><li class=\"vf-tabs__item\"><a class=\"vf-tabs__link\" href=\"#vf-tabs__section-b7f3828e-07cc-49f8-a8e9-a2f34d2b98bd\" data-vf-js-location-nearest-activation-target=\"\">Overview<\/a><\/li><li class=\"vf-tabs__item\"><a class=\"vf-tabs__link\" href=\"#vf-tabs__section-d9c70a85-7ac1-4184-99f9-097d6338feb1\" data-vf-js-location-nearest-activation-target=\"\">Sequence preparation 1<\/a><\/li><li class=\"vf-tabs__item\"><a class=\"vf-tabs__link\" href=\"#vf-tabs__section-3963c1d9-d4f4-418a-a769-d205af48c1e0\" data-vf-js-location-nearest-activation-target=\"\">Sequence preparation 2<\/a><\/li><li class=\"vf-tabs__item\"><a class=\"vf-tabs__link\" href=\"#vf-tabs__section-4e25e7ac-b259-4994-baa8-8522a7cd4b0f\" data-vf-js-location-nearest-activation-target=\"\">Sequence preparation 3<\/a><\/li><li class=\"vf-tabs__item\"><a class=\"vf-tabs__link\" href=\"#vf-tabs__section-d2769e33-2bf1-4476-bcc1-3538b1b708d1\" data-vf-js-location-nearest-activation-target=\"\">Database search<\/a><\/li><li class=\"vf-tabs__item\"><a class=\"vf-tabs__link\" href=\"#vf-tabs__section-6446f30d-7a6a-415e-a2ec-06615967b299\" data-vf-js-location-nearest-activation-target=\"\">Activity navigation<\/a><\/li><\/ul><div class=\"vf-tabs-content\" data-vf-js-tabs-content=\"true\">\n<section class=\"vf-tabs__section\" id=\"vf-tabs__section-b7f3828e-07cc-49f8-a8e9-a2f34d2b98bd\"><h2>Overview<\/h2>\n<p>The following instructions guide 
you through the analysis of the sequencing data for the DNA barcoding marker gene(s), which were obtained by DNA extraction and amplification in the wet-lab part of the DNA barcoding workflow. The analysis includes preparing the sequence data for the database search and the database search itself, and will lead to identification of the plant species in question on a molecular basis.<\/p>\n<\/section>\n\n\n\n<section class=\"vf-tabs__section\" id=\"vf-tabs__section-d9c70a85-7ac1-4184-99f9-097d6338feb1\"><h2>Sequence preparation 1<\/h2>\n<p>You have now received a forward and reverse sequence of your plant DNA barcode from the sequencing service. Before you can search the barcode against the entries in the <a title=\"Tools and databases page &gt; ENA\" rel=\"noopener noreferrer\" href=\"\/ells\/dnabarcoding\/?page_id=5192#ena\" target=\"_blank\">European Nucleotide Archive (ENA)<\/a>, the sequences of the forward and reverse reads have to be assembled into a single consensus sequence called a contig.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Assembling the contig of the DNA barcode will involve the following steps:<\/h3>\n\n\n\n<p>1. converting the reverse sequence into its reverse complement<\/p>\n\n\n\n<p>2. aligning the forward and reverse sequence reads<\/p>\n\n\n\n<p>3. editing and assembling the consensus sequence<\/p>\n\n\n\n<p>To obtain your contig, follow the instructions in the tabs below. If you would like to be guided through the instructions in more detail, please visit the <a href=\"https:\/\/www.embl.org\/ells\/teachingbase\/dna-barcoding-resource\/bioinformatics-tutorials\/\" data-type=\"teachingbase\" data-id=\"25570\">Bioinformatics Tutorial<\/a> page.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Reverse complement<\/h3>\n\n\n\n<p>To be able to align the&nbsp;forward and reverse sequence reads, the two sequences must have the same orientation. This can be achieved by converting the reverse sequence into its&nbsp;reverse complement&nbsp;(i.e. 
rewriting it so that it reads 5\u2032-3\u2032 along the same strand as the forward sequence).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Proceed as described below:<\/strong><\/h4>\n\n\n\n<p>1. Open the&nbsp;.seq&nbsp;file of your reverse sequence using a text editor such as Notepad on Windows or TextEdit on Mac. The .seq file contains the sequence information in&nbsp;FASTA format.<\/p>\n\n\n\n<p>2. Copy the whole sequence including the \u201c&gt;\u201d sign and descriptive header (keyboard shortcut Ctrl + C).<\/p>\n\n\n\n<p>3. Paste all the information into the EMBOSS Seqret input box below (Ctrl + V). Alternatively, use the upload function to upload the reverse .seq file.<\/p>\n\n\n\n<p>4. In \u201cStep 1\u201d ensure that \u201cDNA\u201d is selected as input data.<\/p>\n\n\n\n<p>In \u201cStep 2\u201d select \u201cFASTA format\u201d as the input and output format. To obtain the reverse complement of the sequence, click on \u201cMore options\u201d and select \u201cYes\u201d for the \u201cReverse\u201d option.<\/p>\n\n\n\n<p>5. Click on \u201cSubmit\u201d.<\/p>\n\n\n\n<p>6. Open an empty text editor document on your computer.<\/p>\n\n\n\n<p>7. Once your reverse complement sequence is available in the \u201cTool Output\u201d window of EMBOSS Seqret, copy the whole sequence into the new text editor document (Ctrl + C and Ctrl + V). Again, include the \u201c<strong>&gt;<\/strong>\u201d sign and descriptive header when copying the sequence.<\/p>\n\n\n\n<p>8. Keep the \u201c<strong>&gt;<\/strong>\u201d sign at the beginning of the sequence information but replace the descriptive header with \u201cSampleID_RP_RevComp\u201d. 
Save the new text editor document as \u201cSampleID_RP_RevComp\u201d on your desktop.<\/p>\n\n\n\n<p>Proceed to the \u201cSequence preparation 2\u201d tab to align your sequences.<\/p>\n\n\n<div\n  class=\"vf-embed vf-embed--custom-ratio\"\n\n  style=\"--vf-embed-max-width: 100%;\n    --vf-embed-custom-ratio-x: 640;\n    --vf-embed-custom-ratio-y: 360;\"\n><iframe loading=\"lazy\" width=\"640\" height=\"360\" src=\"https:\/\/www.ebi.ac.uk\/Tools\/sfc\/emboss_seqret\/\" frameborder=\"0\" allow=\"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe><\/div>\n<\/section>\n\n\n\n<section class=\"vf-tabs__section\" id=\"vf-tabs__section-3963c1d9-d4f4-418a-a769-d205af48c1e0\"><h2>Sequence preparation 2<\/h2>\n<h3 class=\"wp-block-heading\">Alignment<\/h3>\n\n\n\n<p>The reverse sequence, which was converted into its reverse complement in the last step, can now be aligned with the forward sequence. The alignment will reveal the consensus sequence as well as any nucleotide mismatches or gaps between the forward and reverse reads. Nucleotide positions with mismatches or gaps can then be cross-checked with the&nbsp;chromatogram view&nbsp;(&gt; \u201cSequence preparation 3\u201d tab) of the forward and reverse sequences and a single consensus barcode sequence can be assembled.<\/p>\n\n\n\n<p><strong>Proceed as described below:<\/strong><\/p>\n\n\n\n<p>1. Open the .seq file of your forward sequence. Copy and paste the whole sequence including the \u201c&gt;\u201d sign and descriptive header (keyboard shortcut Ctrl + C and Ctrl + V) into the first EMBOSS Needle input box below.<\/p>\n\n\n\n<p>2. Open the text file \u201cSampleID_RP_RevComp\u201d. Copy and paste the whole sequence including the \u201c&gt;\u201d sign and descriptive header (keyboard shortcut Ctrl + C and Ctrl + V) into the second EMBOSS Needle input box below.<\/p>\n\n\n\n<p>Alternatively, use the upload function to upload the forward sequence and the edited reverse sequence.<\/p>\n\n\n\n<p>3. 
Keep the default settings in \u201cStep 2\u201d and click on \u201cSubmit\u201d.<\/p>\n\n\n\n<p>4. Once the alignment is available, click on \u201cView alignment file\u201d and scroll down to study the sequence alignment.<\/p>\n\n\n\n<p>For\u00a0<strong>a guide to EMBOSS Needle nucleotide sequence alignment results<\/strong>, click\u00a0<a rel=\"noreferrer noopener\" href=\"http:\/\/emblog.embl.de\/ells\/dnabarcoding\/?page_id=5192#needle\" target=\"_blank\">here<\/a>. An example of how to interpret the results of the EMBOSS Needle nucleotide sequence alignment can be found in the\u00a0<a rel=\"noreferrer noopener\" href=\"https:\/\/www.embl.org\/ells\/teachingbase\/dna-barcoding-resource\/bioinformatics-tutorials\/\" data-type=\"teachingbase\" data-id=\"25570\" target=\"_blank\">Bioinformatics tutorial<\/a>\u00a0(&gt; \u201cAlignment\u201d tab &gt; 4.).<\/p>\n\n\n\n<p>5. Open the&nbsp;.ab1&nbsp;files of your forward and reverse sequences in a&nbsp;chromatogram viewer. For easier analysis, reverse complement the reverse chromatogram (Chromas Lite: \u201cEdit\u201d &gt; \u201cReverse+Complement\u201d; 4Peaks: \u201cEdit\u201d &gt; \u201cFlip sequence\u201d).<\/p>\n\n\n\n<p>6. Now prepare a document which will hold your&nbsp;contig&nbsp;sequence in&nbsp;FASTA format. Open an empty text editor document on your computer. Copy the whole forward sequence from your .seq file into the new text document (Ctrl + C and Ctrl + V). Keep the \u201c&gt;\u201d sign at the beginning of the sequence information but replace the descriptive header with \u201cSampleID_Contig\u201d. Save the document as \u201cSampleID_Contig\u201d on your desktop.<\/p>\n\n\n\n<p>7. Go through the alignment and identify gaps or mismatches. For every mismatch or gap, go to the corresponding nucleotide position in the forward and reverse chromatograms&nbsp;(you can use the search function of the software to find the position within the sequence). 
Looking at the two chromatograms, compare the peaks at that nucleotide position and decide whether the forward or reverse read looks more reliable (sharp, well-separated single peaks are generally more trustworthy than low or overlapping ones). You might also be able to determine the identity of any \u201cunknown\u201d nucleotides (\u201cN\u201d). In your \u201cSampleID_Contig\u201d text document, edit the sequence according to your analysis (remember that this document contains a copy of the forward sequence).<\/p>\n\n\n\n<p>8. Once you have completed all the necessary edits, you have assembled the contig of your sample\u2019s barcode. Make sure you save the document!<\/p>\n\n\n\n<p>You are now ready to search the barcode against the entries in the&nbsp;<a rel=\"noreferrer noopener\" href=\"http:\/\/emblog.embl.de\/ells\/dnabarcoding\/?page_id=5192#ena\" target=\"_blank\">European Nucleotide Archive (ENA)<\/a>. To do this, proceed to the&nbsp;\u201cDatabase search\u201d&nbsp;tab.<\/p>\n\n\n<div\n  class=\"vf-embed vf-embed--custom-ratio\"\n\n  style=\"--vf-embed-max-width: 100%;\n    --vf-embed-custom-ratio-x: 640;\n    --vf-embed-custom-ratio-y: 360;\"\n><iframe loading=\"lazy\" width=\"640\" height=\"360\" src=\"https:\/\/www.ebi.ac.uk\/Tools\/psa\/emboss_needle\/\" frameborder=\"0\" allow=\"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe><\/div>\n<\/section>\n\n\n\n<section class=\"vf-tabs__section\" id=\"vf-tabs__section-4e25e7ac-b259-4994-baa8-8522a7cd4b0f\"><h2>Sequence preparation 3<\/h2>\n<p>A\u00a0sequencing chromatogram\u00a0displays the data produced by the sequencing machine as a so-called trace. Analysing the chromatograms of your forward and reverse sequences will help you to check the quality of the sequences and to cross-check mismatches or gaps identified via the forward-reverse sequence alignment. 
<\/p>\n\n\n\n<p>To study your chromatograms, we recommend you use one of the chromatogram viewer solutions\u00a0<a href=\"https:\/\/www.embl.org\/ells\/teachingbase\/dna-barcoding-resource\/bioinformatics-tools-and-sample-data\/\" data-type=\"teachingbase\" data-id=\"25568\">here<\/a>. To find out more about how to analyse chromatogram information, click\u00a0<a href=\"https:\/\/www.embl.org\/ells\/teachingbase\/dna-barcoding-resource\/bioinformatics-tutorials\/\" data-type=\"teachingbase\" data-id=\"25570\">here<\/a>\u00a0(> \u201cChromatogram\u201d tab).<\/p>\n<\/section>\n\n\n\n<section class=\"vf-tabs__section\" id=\"vf-tabs__section-d2769e33-2bf1-4476-bcc1-3538b1b708d1\"><h2>Database search<\/h2>\n<p>Please follow the instructions below to identify nucleotide sequences which match your barcode in the European Nucleotide Archive (ENA).<\/p>\n\n\n\n<p>1. Copy your whole contig sequence from the text file <span style=\"color: #000000;\">&#8220;SampleID_Contig&#8221;,<\/span> including the \u201c&gt;\u201d sign and descriptive header, into the ENA search box below <span style=\"color: #000000;\">(keyboard shortcut Ctrl + C and Ctrl + V). Alternatively, use the upload function to upload your text file. <\/span><\/p>\n\n\n\n<p>2. In the field &#8220;Search against&#8221; select &#8221; <span class=\"gwt-RadioButton\">Assembled and annotated sequences<\/span>&#8221; and &#8220;Limit sequence by&#8221; &gt; &#8220;Data class&#8221; &gt; &#8220;Standard sequences (STD)&#8221;.<\/p>\n\n\n\n<p>3. Initiate the search by clicking on &#8220;Submit&#8221; (you might need to scroll back to the left to see the &#8220;Submit&#8221; button). The inserted sequence will now be compared to all the known sequences contained in the database and the best alignment hits will be displayed.<\/p>\n\n\n\n<p>4. 
In the &#8220;Summary table&#8221; you will see the top 50 sequence search results.<\/p>\n\n\n\n<p>By default, the search results are sorted according to their &#8220;Score&#8221;, with the highest at the top. Results may also be sorted according to any other value of the results columns by clicking on the up\/down arrows. However, for the purpose of identifying the closest match, keep the results sorted according to &#8220;Score&#8221;.<\/p>\n\n\n\n<p>5. To identify the best match, proceed as follows: sort the search results according to their score (highest at the top), if not done already. The result with the combined highest score and lowest E-value is your best match. In case there are multiple results which have the combined highest score and lowest E-value, choose the one with the highest identity percentage.<\/p>\n\n\n\n<p>In case there are two or more results with identical score\/E-value\/% identity, the database is unable to discriminate between the entries (e.g. due to inaccuracies in your input sequence) or might not contain an entry of your species. If this is the case, record all of your top results. You might still be able to identify your sample to genus level.<\/p>\n\n\n\n<p>For examples of search results and how to analyse them, visit the <a href=\"https:\/\/www.embl.org\/ells\/teachingbase\/dna-barcoding-resource\/bioinformatics-tutorials\/\" data-type=\"teachingbase\" data-id=\"25570\">Bioinformatics Tutorial <\/a>page.<\/p>\n\n\n\n<p>6. Can you identify the best database match for your sequence? Which organism does it belong to? Can you identify your sample to genus, or even species, level?<\/p>\n\n\n\n<p>You have now identified the genus and, possibly, species name of your plant sample. 
<\/p>\n\n\n<div\n  class=\"vf-embed vf-embed--custom-ratio\"\n\n  style=\"--vf-embed-max-width: 100%;\n    --vf-embed-custom-ratio-x: 640;\n    --vf-embed-custom-ratio-y: 360;\"\n><iframe loading=\"lazy\" width=\"640\" height=\"360\" src=\"https:\/\/www.ebi.ac.uk\/ena\/browser\/sequence-search\" frameborder=\"0\" allow=\"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe><\/div>\n<\/section>\n\n\n\n<section class=\"vf-tabs__section\" id=\"vf-tabs__section-6446f30d-7a6a-415e-a2ec-06615967b299\"><h2>Activity navigation<\/h2>\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/www.embl.org\/ells\/teachingbase\/dna-barcoding-resource\/\" data-type=\"teachingbase\" data-id=\"22912\">Introductory page <\/a><\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Sample collection and wet-lab<\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/www.embl.org\/ells\/teachingbase\/dna-barcoding-resource\/wet-lab-protocols\/\" data-type=\"teachingbase\" data-id=\"25575\">Protocols for sample collection and wet-lab activity<\/a><\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Bioinformatics<\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/www.embl.org\/ells\/teachingbase\/dna-barcoding-resource\/bioinformatics-tutorials\/\" data-type=\"teachingbase\" data-id=\"25570\">Bioinformatics tutorial<\/a><\/li><li><a href=\"https:\/\/www.embl.org\/ells\/teachingbase\/dna-barcoding-resource\/species-identification\/\" data-type=\"teachingbase\" data-id=\"25573\">Species identification using bioinformatics<\/a><\/li><li><a href=\"https:\/\/www.embl.org\/ells\/teachingbase\/dna-barcoding-resource\/bioinformatics-tools-and-sample-data\/\" data-type=\"teachingbase\" data-id=\"25568\">Bioinformatics tools and sample data<\/a><\/li><li><a href=\"https:\/\/www.embl.org\/ells\/teachingbase\/dna-barcoding-resource\/troubleshooting-glossary\/\" data-type=\"teachingbase\" data-id=\"25566\">DNA barcoding troubleshooting &amp; 
glossary<\/a><\/li><\/ul>\n<\/section>\n<\/div><\/div>\n","protected":false},"excerpt":{"rendered":"","protected":false},"featured_media":430,"parent":22912,"menu_order":0,"template":"","class_list":["post-25573","teachingbase","type-teachingbase","status-publish","has-post-thumbnail","hentry"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.embl.org\/ells\/wp-json\/wp\/v2\/teachingbase\/25573","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.embl.org\/ells\/wp-json\/wp\/v2\/teachingbase"}],"about":[{"href":"https:\/\/www.embl.org\/ells\/wp-json\/wp\/v2\/types\/teachingbase"}],"up":[{"embeddable":true,"href":"https:\/\/www.embl.org\/ells\/wp-json\/wp\/v2\/teachingbase\/22912"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.embl.org\/ells\/wp-json\/wp\/v2\/media\/430"}],"wp:attachment":[{"href":"https:\/\/www.embl.org\/ells\/wp-json\/wp\/v2\/media?parent=25573"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}