DNA sequence analysis

 

In the first lecture, just to refresh, we discussed the basics of bioinformatics with a few examples, and the different aspects of bioinformatics, for example the development of databases, algorithms and hypotheses, structure-based design, and sequencing. We also discussed the applications of bioinformatics at different levels of complexity of biological systems. If we talk about DNA, it is the genetic material. The information stored in DNA allows organisms to regulate their internal chemical composition, growth and reproduction. It is also what allows a child to inherit, for example, the mother's curly hair and the father's eyes.

We know that such traits are passed on through genetic information. The units that govern these characteristics at the genetic level are called genes. Several genes collectively form a genome sequence. If you look at the genes, they contain a specific sequence of nucleotides, written A, T, C and G for adenine, thymine, cytosine and guanine. Here is an example of a nucleotide: it contains a phosphate group, the central deoxyribose sugar, and a base. There are four different bases for DNA and four for RNA; three are common and one is different, because thymine is replaced by uracil.

The name DNA stands for deoxyribonucleic acid, and it comes from the sugar molecule contained in the backbone. In the figure you can see the base, the phosphate and the sugar. The carbons of the sugar are numbered one to five. At the 2' position DNA has H, which is the deoxy form, hence the name deoxyribonucleic acid. In RNA this position carries OH, so the sugar is ribose and the molecule is ribonucleic acid.

The four bases of DNA are adenine, cytosine, guanine and thymine, and they are classified into two groups. The purines (adenine and guanine) have a double-ring shape: a pentagon attached to one side of a hexagon. The pyrimidines (cytosine, thymine and uracil) have a single hexagon, generally made of four carbons and two nitrogens. Cytosine occurs in both DNA and RNA. Thymine occurs in DNA and uracil in RNA, and the only difference between these two bases is one CH3 group. So DNA and RNA differ in two aspects: at the sugar level (deoxyribose versus ribose) and at the base level (thymine versus uracil).

How do the nucleotides form a chain, that is, a DNA sequence? Each of the four bases is attached to a sugar and a phosphate. Take nucleotide one and nucleotide two. At the 3' position of one there is an OH group, and the phosphate of the other also carries an OH. In a condensation reaction the H from one and the OH from the other are eliminated as a water molecule, and a phosphodiester linkage is formed. It is called a diester because the phosphate now makes two ester bonds, one with each sugar. In the continuous chain shown here, one nucleotide runs from this line to this line: the phosphate, the sugar and the base.

DNA molecules within cells are typically double-stranded: this is one strand and this is the other. The information content of one strand is essentially redundant with the information on the other. The two strands are not identical, though, because of the direction: one strand runs 5' to 3' and the other runs 3' to 5', and the bases are complementary. For every G on one strand, a C is found on the complementary strand, and vice versa. In the example shown here, green is adenine, red is cytosine, blue is guanine and magenta is thymine. Adenine is paired with thymine and cytosine is paired with guanine, and likewise thymine with adenine and guanine with cytosine.

When they built the model of DNA, they first tried to put the two backbone strands in the middle with the bases on the outside, but they could not get a compatible structure. Then they thought of a ladder-type structure, like children linking Lego pieces to each other. They took the two backbones as the sides of the ladder and put the bases in between, like rungs. Then they found that everything matched: the bases fit with respect to both the chemical groups and the space, and the hydrogen bonding pattern fit with a favorable energy. This is the model of DNA proposed by Watson and Crick, and they got the Nobel Prize in 1962 for this structure.

How does the base pairing happen? Adenine and thymine pair with two hydrogen bonds, and guanine and cytosine pair with three hydrogen bonds. Why is the pairing specific? Why does A not pair with G, or T with C? In each allowed pair one base is a purine and the other is a pyrimidine, so the spacing between the backbones is uniform. Two pyrimidines would leave excess space, which costs energy, and two purines would be too crowded, so steric hindrance prevents them from pairing.

So there are two reasons: steric hindrance, and the chemical groups available for hydrogen bonding. Because of these two reasons, adenine always pairs with thymine and guanine always pairs with cytosine. If you know one strand, you know the complementary strand, because A always pairs with T and G always pairs with C. As discussed, the two strands of a DNA molecule are complementary, but they do not run in the same direction: one is 5' to 3' and the other is 3' to 5'. Most cellular processes involving DNA occur in the 5' to 3' direction, which is why we write sequences in the 5' to 3' direction.
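As a small illustration of the pairing rules just described, the sketch below encodes the two allowed pairs with their hydrogen bond counts (two for A-T, three for G-C) and checks that each pair joins one purine with one pyrimidine. The names used here are hypothetical, chosen only for this illustration; they do not come from the lecture.

```python
# Bases grouped by ring structure, as described in the lecture.
PURINES = {"A", "G"}       # double-ring bases
PYRIMIDINES = {"C", "T"}   # single-ring bases

# Watson-Crick pairs and the number of hydrogen bonds in each.
HYDROGEN_BONDS = {("A", "T"): 2, ("T", "A"): 2, ("G", "C"): 3, ("C", "G"): 3}
PARTNER = {first: second for first, second in HYDROGEN_BONDS}


def pair_bonds(base1, base2):
    """Return the hydrogen bond count of a Watson-Crick pair, or 0 otherwise."""
    return HYDROGEN_BONDS.get((base1.upper(), base2.upper()), 0)


def duplex_hydrogen_bonds(strand):
    """Total hydrogen bonds when `strand` is paired with its complementary strand."""
    return sum(pair_bonds(base, PARTNER[base]) for base in strand.upper())


# Every allowed pair joins one purine with one pyrimidine.
for first, second in HYDROGEN_BONDS:
    assert (first in PURINES and second in PYRIMIDINES) or (
        first in PYRIMIDINES and second in PURINES
    )

print(pair_bonds("A", "T"))           # 2
print(pair_bonds("A", "G"))           # 0, two purines are not an allowed pair
print(duplex_hydrogen_bonds("ACGT"))  # 2 + 3 + 3 + 2 = 10
```

The count is only bookkeeping for the rule "two bonds for A-T, three for G-C"; it says nothing about the actual energetics of a duplex.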

Suppose we have a sequence, for example ACGTTACG, written in the 5' to 3' direction. What is the complementary sequence? If you simply pair each base you can write the complementary sequence, but it will be in the 3' to 5' direction. The complement of A is T and the complement of C is G, so we write A to T, C to G, G to C, T to A and so on, which gives TGCAATGC read 3' to 5'. To write the complementary strand in the 5' to 3' direction we have to reverse the direction, so we start from the other end: CGTAACGT. This is how to write a complementary sequence. In the figure, this is A, this is C, this is T and this is G; if this strand is written 5' to 3', the other one is 3' to 5', and you can see the complementary bases.

I also give this example: for this sequence, what is the complementary strand in the 5' to 3' direction? (As transcribed: ggccttaaggccaaggaaaat tttaaggccttttaaaaccgggcct.) We have to get the pairs and then reverse the direction to write the complementary strand.

In bioinformatics you can write an algorithm that gives the complementary strand for any given sequence. How do we write the algorithm? Take the sequence, reverse it and then complement it, or complement it and then reverse it. There are also various programs available in the literature. Many programs for analyzing DNA sequences have been collected and put together as a package. EMBOSS is one such package; EMBOSS stands for European Molecular Biology Open Software Suite. It is a compilation of several programs, which you can see listed in the menu at the side.
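The two steps just described (complement each base, then reverse) can be sketched in a few lines of Python. This is a minimal illustration, not the code behind any EMBOSS program; the function name is my own, and the input is assumed to contain only the letters A, C, G and T.

```python
# Mapping of each DNA base to its pairing partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the complementary strand of a 5'->3' DNA sequence, also read 5'->3'."""
    # Step 1: pair each base; the result reads 3'->5'.
    complemented = sequence.upper().translate(COMPLEMENT)
    # Step 2: reverse it so that it reads 5'->3'.
    return complemented[::-1]


print(reverse_complement("ACGTTACG"))  # CGTAACGT
```

Doing the reversal first and the complement second gives the same result, as noted in the lecture. Characters other than A, C, G and T are passed through unchanged in this sketch.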
For sequence analysis, say we have a DNA sequence; we can use this sequence to carry out various analyses with EMBOSS. How do we get the complementary strand from EMBOSS? Go to the EMBOSS website, then go to Edit, and there you can see the module revseq, which reverses and complements a nucleotide sequence. You can give a sequence from a database, choose a file from your computer, or type a sequence manually. Here I give the sequence ACTGACC. If you click on run, you get the sequence GGTCAGT. Is it correct? Yes, this is the reverse complement of the input, so now you have the complementary strand.

The next step: when you have the DNA sequence, you can translate it into proteins. Two steps are involved in protein synthesis, transcription and translation. In transcription, DNA is transcribed into messenger RNA (mRNA); in translation, the mRNA is translated into protein. So first we have the DNA and it is changed to RNA. DNA has four different bases, A, T, C and G; RNA has A, U, C and G. So here the DNA sequence contains A, T, C and G, but in the RNA sequence there is no T; instead of T you have U. The complement of A would be T, but instead of T you put U; C gives G, G gives C, and so on. Transcription is carried out mainly by RNA polymerase.
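The DNA to RNA step can be sketched in two ways, depending on which DNA strand you start from. The function names below are hypothetical, and the sketch assumes input made only of A, C, G and T. If you start from the strand that has the same sequence as the mRNA (the coding strand), you only replace T with U. If you start from the template strand, you pair each base as in the lecture (A gives U, T gives A, C gives G, G gives C).

```python
def transcribe_coding_strand(dna):
    """mRNA from the coding strand: same sequence, with U in place of T."""
    return dna.upper().replace("T", "U")


# Pairing used when reading the template strand: A->U, T->A, C->G, G->C.
TEMPLATE_TO_RNA = str.maketrans("ATCG", "UAGC")


def transcribe_template_strand(template_3_to_5):
    """mRNA (5'->3') from a template strand written in the 3'->5' direction."""
    return template_3_to_5.upper().translate(TEMPLATE_TO_RNA)


print(transcribe_coding_strand("ATGC"))    # AUGC
print(transcribe_template_strand("TACG"))  # AUGC
```

Both calls give the same mRNA because TACG (3' to 5') is the strand complementary to ATGC (5' to 3').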
The RNA is then translated into proteins by ribosomes. They read the nucleotides as codons: three nucleotides together form one codon, and each codon codes for a specific amino acid. There are four different nucleotides, but how many amino acid residues are there? Twenty. So a one-to-one correspondence is not possible: four nucleotides could code for only four amino acids, but we have twenty. With combinations of two, there are 4 x 4 = 16 combinations, which is also not enough because we have twenty. So we need combinations of three, which give 4 x 4 x 4 = 64 possibilities. There are 64 combinations but only twenty different amino acids, which means several codons code for the same amino acid. This is called degeneracy: it is not one codon to one amino acid; different codons code for the same amino acid. For phenylalanine there are two codons, UUU and UUC. For leucine there are six: UUA, UUG, CUU, CUC, CUA and CUG. There are also stop codons (UAA, UAG and UGA); here you can see UGA is a stop codon. Tryptophan is coded by only one codon, UGG, and this is also one of the reasons why tryptophan occurs rarely in protein sequences. Eighteen of the twenty amino acids are coded by more than one codon, and this is called the degeneracy of the genetic code.
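The counting argument above (4, 16 and 64 combinations) and the degeneracy figures can be checked with a short script. The compact 64-letter string below is an assumption of this sketch: it is the standard genetic code written with the bases in the order U, C, A, G, and `*` marks a stop codon. The lecture itself only shows the table as a figure.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of possible "words" of length 1, 2 and 3 over four bases.
for length in (1, 2, 3):
    print(length, len(list(product(BASES, repeat=length))))  # 4, 16, 64

# Codon -> one-letter amino acid code.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many codons code for each amino acid (stop codons excluded).
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
degenerate = [aa for aa, count in codons_per_amino_acid.items() if count > 1]
stop_codons = [codon for codon, aa in CODON_TABLE.items() if aa == "*"]

print(len(codons_per_amino_acid))  # 20 amino acids
print(len(degenerate))             # 18 are coded by more than one codon
print(codons_per_amino_acid["L"])  # 6 codons for leucine
print(codons_per_amino_acid["W"])  # 1 codon for tryptophan
print(stop_codons)                 # ['UAA', 'UAG', 'UGA']
```

The two amino acids with a single codon are methionine (AUG) and tryptophan (UGG).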

So if we have a DNA or RNA sequence (here I give an RNA sequence), can we get the protein sequence from it? Yes. What do we do? Take the nucleotides three at a time. The first codon is ACG: look up A, then C, then G in the table, and ACG codes for threonine (T). Then UGC is cysteine (C), GCA is alanine (A), UGC is again cysteine (C), AAC is asparagine (N), CGA is arginine (R) and AUG is methionine (M). The final A is left over and cannot form a codon, so it is shown as X.

Now, in this sequence, what happens if you delete the first nucleotide, A? Will you get a similar sequence or a different one? A different one, because now the codons are CGU, GCG, CAU, GCA, ACC, GAA and UGA. CGU codes for arginine (R), GCG for alanine (A), CAU for histidine (H), GCA for alanine (A), ACC for threonine (T) and GAA for glutamate (E), and UGA is a stop codon, so a stop is put here.
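The effect of the deletion on the codon boundaries can be seen without any table, simply by cutting the sequence into non-overlapping triplets before and after removing the first nucleotide. The sequence used here, ACGUGCGCAUGCAACCGAAUGA, is reconstructed from the codons read out in the lecture, and the helper name is my own.

```python
def split_codons(rna):
    """Cut an RNA sequence into non-overlapping codons, dropping an incomplete tail."""
    return [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]


original = "ACGUGCGCAUGCAACCGAAUGA"
shifted = original[1:]  # the same sequence with the first nucleotide deleted

print(split_codons(original))
# ['ACG', 'UGC', 'GCA', 'UGC', 'AAC', 'CGA', 'AUG']
print(split_codons(shifted))
# ['CGU', 'GCG', 'CAU', 'GCA', 'ACC', 'GAA', 'UGA']
```

Not a single codon is shared between the two lists, which is why one deleted nucleotide changes every amino acid that follows.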

So if you remove one single nucleotide, the result is totally different. When we have a DNA sequence and look for the protein sequence, we should know exactly where the first codon starts; otherwise we will get a different protein sequence. There are various resources for this, and you can also use programming: you can write your own code to translate an RNA sequence into a protein sequence. How do we write such a program? First, what information is required as input? We start from the RNA sequence and we want to get the protein sequence. So the first input we need is the RNA sequence.

For example, take the RNA sequence ACUACUGCGCAUAUGG. The other information required is the code, that is, the mapping: the codon to amino acid table. From it we get the information on which amino acid each codon codes for. Then what do we have to do? First, split the RNA into codons of three nucleotides each.

Cut the sequence into pieces of three, not overlapping: where one codon stops, the next one starts from the next nucleotide. Non-overlapping is very important. Then match each codon with the table to obtain the amino acid, and if you continue till the end of the sequence, you finally get the protein sequence. To summarize: read the RNA sequence, get the codon table, cut the RNA sequence into non-overlapping pieces of three nucleotides, one per codon, map each codon to its amino acid in the table, continue till the end, and write out the protein sequence.

Nevertheless, several resources are available in the literature. Again you can use EMBOSS to translate an RNA sequence into a protein sequence, using transeq, a module available in EMBOSS. Here you put in the sequence ACGUGCGCAUGCAACCGAAUGA. It will ask for the reading frame; there are six different frames, three forward frames and three reverse frames, which I will explain on the next slide. When you submit this sequence and ask for the protein sequence, you get TCACNRMX. This is the same sequence I showed in the earlier example, so translating from the first nucleotide gives the same result as before. For each sequence we have six reading frames: three in the forward direction and three in the reverse direction.
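The translation steps summarized above (read the RNA, cut it into non-overlapping codons, look each one up in the table) can be sketched as follows. This is only an illustration under stated assumptions, not the transeq implementation: the table is the standard genetic code written as a compact string, `*` marks a stop codon, and an incomplete final codon is written as X to mirror the output shown in the lecture. All names are my own.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna, unknown="X"):
    """Translate an RNA sequence codon by codon, starting at the first nucleotide."""
    rna = rna.upper()
    protein = []
    for start in range(0, len(rna), 3):       # non-overlapping steps of three
        codon = rna[start:start + 3]
        # Incomplete or unrecognized codons are written as `unknown`.
        protein.append(CODON_TABLE.get(codon, unknown))
    return "".join(protein)


print(translate("ACGUGCGCAUGCAACCGAAUGA"))  # TCACNRMX
print(translate("CGUGCGCAUGCAACCGAAUGA"))   # RAHATE*
print(translate("ACUACUGCGCAUAUGG"))        # TTAHMX
```

This sketch keeps translating after a stop codon and simply prints `*` for it; stopping at the first `*` would be a small change if that behavior is wanted.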
How do we get the three forward frames? Take a DNA sequence written 5' to 3' and read three nucleotides at a time: CAA, TGG, CTA and so on. Checking the table, CAA codes for Q, TGG codes for W and CTA codes for L, so we get an amino acid sequence. This is frame one. For the second frame, leave out the first nucleotide, C, and then group the rest in threes: AAT, GGC, TAG and so on, and check the table for the amino acids. For the third frame, leave out the first two, C and A, and start from ATG, GCT, AGG, TAC and so on; looking up each codon in the table you get M, A, R and so on. What happens if you shift once more? The same frame repeats: you would start from TGG, so you lose the first amino acid, but the rest is the same sequence as frame one. So in this direction you get three different frames.

Likewise for the reverse frames. How do we get them? Take the complement. We have the 5' to 3' strand; if you take the complement you get the strand in the 3' to 5' direction, and you reverse it so that it also reads 5' to 3'. From this sequence you make the codons, here CTC, GGA, TTT and so on: CTC codes for L, GGA codes for G and TTT codes for F. Likewise, leaving out one nucleotide gives the second frame, and leaving out one more gives the third. So in total we have six different frames: three forward and three reverse.

You can use transeq to get all six. Give the input sequence, and there are options for the forward reading frames, the reverse reading frames, or everything. If you click on all six frames, you get the output: forward frames one, two and three, and reverse frames one, two and three.

If you look further into EMBOSS, there are various other tools available. Each gives different values and uses different algorithms to get them. We can use any of these modules available in EMBOSS to get the characteristic features of DNA or RNA sequences.
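The six reading frames can be sketched by combining the two earlier ideas: translate the sequence starting at offsets 0, 1 and 2, then do the same on its reverse complement. This is a hedged illustration, not the transeq algorithm, and real tools may number or align the reverse frames differently. The input CAATGGCTAGGTAC is only the first few nucleotides of the lecture's example as transcribed; the table is the standard genetic code written for DNA (T instead of U), and all names are my own.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complementary strand, written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate_frame(dna, offset):
    """Translate complete codons only, skipping `offset` leading nucleotides."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Return the three forward (F1-F3) and three reverse (R1-R3) translations."""
    dna = dna.upper()
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"F{offset + 1}"] = translate_frame(dna, offset)
        frames[f"R{offset + 1}"] = translate_frame(reverse, offset)
    return frames


for name, protein in sorted(six_frames("CAATGGCTAGGTAC").items()):
    print(name, protein)
# F1 QWLG
# F2 NG*V
# F3 MARY
# R1 VPSH
# R2 YLAI
# R3 T*PL
```

Frames one to three start with Q W L, N G and M A R, matching the codons read out in the lecture. The reverse frames here differ from the lecture's CTC, GGA, TTT because only the beginning of the lecture's sequence is available.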

 
