DNA sequence analysis
In the first lecture, just to refresh, we discussed the basics of
bioinformatics with a few examples, and the different features of
bioinformatics: for example, the development of databases, algorithms and
hypotheses, structure-based design and next-generation sequencing. We also
discussed the applications of bioinformatics at different complexities of
biological systems. If you talk about DNA, it is the genetic material, and the
information stored in DNA allows organisms to regulate their internal chemical
composition, growth, as well as reproduction. For example, it passes on the
mother's curly hair and the father's eyes
and so on; we know that these come through the genetic information. The
various units that govern these characteristics at the genetic level are
called genes. There are several genes, and collectively they form a genome; a
genome sequence contains various genes. If you look at the genes, they contain
specific sequences of nucleotides, for example A, T, C, G: adenine, thymine,
cytosine and guanine. This is an example of a nucleotide. It contains a
phosphate group here, this is the central deoxyribose sugar, and here is a
base. There are four different bases for DNA and four different bases for RNA;
three are common and one is different: thymine is replaced by uracil. If you
look at DNA as well as RNA, the name DNA, which stands for deoxyribonucleic
acid, comes from the sugar molecule contained in the backbone. If you see,
this is the base, this is the phosphate and this is the sugar. The carbons of
the sugar are numbered starting from one here, then two, three, four and five.
At the 2' position here it is H, so this is deoxy, hence the name
deoxyribonucleic acid, DNA. In the case of RNA this is OH, so it is ribose,
hence ribonucleic acid.
There are four different bases in the case of DNA: adenine, cytosine, guanine
and thymine. These four bases are classified into two groups. One group is
called purines, and they have a double-ring shape: here is one ring and this
is another ring, a pentagon attached to one side of a hexagon. The pyrimidines
have a single hexagon shape, made generally with four carbons and two
nitrogens: if you see, one, two, three, four carbons and two nitrogens.
Cytosine is in both DNA and RNA. The difference is between thymine and uracil:
thymine you can see in DNA and uracil you can see in RNA. Here, this one CH3
group is the main difference between them. So the difference between DNA and
RNA comes in two aspects. What are the two aspects? The first is at the sugar
level, deoxyribose versus ribose, and the second is at the base level: thymine
for DNA and uracil in the case of RNA. So how do they form a chain? The four
different bases are each attached to a sugar and a phosphate; how do these
form a chain, the DNA sequence?
For example, take this as molecule one and this as molecule two. At the 3'
position here there is an OH, and here is the phosphate with its OH. This H
and this OH take part in a condensation reaction that eliminates a water
molecule, and then you can see there is a linkage, the phosphodiester linkage.
It is called phosphodiester because there are two esters, one with this one
and one with this one, so together they form the diester linkage. Again, with
these molecules: this is the first one, this is the second one, and here is
the linkage formed with the condensation of this water molecule. With these
phosphodiester linkages, here I show a continuous chain. One nucleotide runs
from this line to this line: this is the phosphate attached here, this is the
sugar, and this is the base. From here to here you can see one nucleotide.
Now, if you look at the DNA molecules within cells, they are typically double
stranded: this is one strand and this is the other strand. The information
content in one strand is essentially redundant with the information in the
other. But the two strands are not the same, because the direction differs:
one strand runs 5' to 3' and the other runs 3' to 5', and the bases are
complementary. For every G on one strand, a C is found on the complementary
strand, and vice versa. Here I show an example: the green one is adenine, the
red one is cytosine, the blue one is guanine and the magenta one is thymine.
Adenine is paired with thymine, here cytosine is paired with guanine, here
thymine is paired with adenine, and guanine is paired with cytosine. So when
they set out to make
this model of DNA, they first tried to put the two strands with backbone next
to backbone and the bases on the two outer sides, but they could not get a
compatible structure. Then they thought about a ladder-type structure. When
children play with Legos, they like to link the pieces to each other;
likewise, they took a ladder structure, put the two backbones on the sides and
placed the bases in between, like the rungs of the ladder. Then they found
that it matched exactly: they could see the bases matching with respect to the
chemical groups as well as the space, and they found that the hydrogen bonding
pattern also fit exactly, with favorable energy. This is the model that Watson
and Crick proposed for DNA, and they got the Nobel Prize in 1962 for this
structure.
So now, how does this base pairing happen? Adenine and thymine form two
hydrogen bonds in their pairing, and guanine and cytosine are paired with
three hydrogen bonds. Why is this pairing specific? Why not with the other
bases: why is A not pairing with G, and T with C? Because one has to be a
purine and the other a pyrimidine, so that the pair is uniform in space. Yes,
you are right. If you try to pair two purines or two pyrimidines, either there
is excess space, which also means a loss of energy, or they are very crowded,
so because of steric interactions they are not able to pair with each other.
So there are two reasons: one is steric hindrance and the second is the
chemical groups. Because of these two reasons, adenine always pairs with
thymine and guanine always pairs with cytosine. So if you know one strand, you
know the complementary strand, because A always pairs with T and G always
pairs with C. As discussed, the two strands of a DNA molecule are
complementary, but they are not in the same 5' to 3' direction: one is in the
5' to 3' direction and the other is in the 3' to 5' direction. Most cellular
processes involving DNA occur in the 5' to 3' direction, and this is why we
write sequences in the 5' to 3' direction.
OK, so now say you have a sequence, for example
ACGTTACG.
What is the complementary sequence? If this is a 5' to 3' sequence and you
just take the complement, you get the pairing and you can write the
complementary sequence, but it will be in the 3' to 5' direction. What is the
complement of A? T. And for C it is G. So we write it like this: A to T, C to
G, G to C, T to A and so on, which gives TGCAATGC. The original is in the 5'
to 3' direction and this one is in the 3' to 5' direction. We need the
complementary strand written 5' to 3', so we have to reverse the direction.
Reversing, we start from the other end: CGTAACGT. This is how to write a
complementary sequence. Here is the example on the slide: this is A, this is
C, this is T and this is G; if this strand is written 5' to 3', this one is 3'
to 5', and you can see the complement here. And I give this example: for this
sequence, what is the complementary strand in the 5' to 3' direction?
ggccttaaggccaaggaaaat tttaaggccttttaaaaccgggcct. We have to get the pairs and
then reverse the direction to write the complementary strand. Now, in
bioinformatics you can write an algorithm so that for any given sequence you
get the complementary strand. How do we write the algorithm? Take the
sequence, reverse it and then take the complement; or take the complement and
then reverse it.
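The complement-then-reverse recipe above can be sketched in a few lines of
Python. This is only a minimal sketch and not how any existing program does
it; the names COMPLEMENT, complement and reverse_complement are made up for
illustration, and the sketch assumes the sequence contains only A, C, G and T.

```python
# Pairing rule: A pairs with T, and G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq):
    """Complement each base; the result reads 3' to 5' if seq was 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


def reverse_complement(seq):
    """Complement and then reverse, so the result is again written 5' to 3'."""
    return complement(seq)[::-1]


print(complement("ACGTTACG"))          # TGCAATGC (3' to 5')
print(reverse_complement("ACGTTACG"))  # CGTAACGT (5' to 3')
```

Reversing first and complementing afterwards gives the same result, because
the two operations do not depend on each other.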
You can do that yourself. Likewise, there are various programs available in
the literature: people have collected various programs for analyzing DNA
sequences and put them together as a kind of package. EMBOSS is such a
package. EMBOSS stands for European Molecular Biology Open Software Suite; it
is a compilation of several programs, and you can see the programs in the menu
at the side here. So for sequence analysis, say we have a DNA sequence, we can
take this sequence and carry out various analyses using this software called
EMBOSS. How do we get the complementary strand from EMBOSS? We go to the
EMBOSS website, then go to Edit, and there you can see the module revseq,
which reverses and complements a nucleotide sequence. You can either give a
sequence from a database, or choose a file if you have the sequence in a file
on your computer, or type the sequence manually. Here I give the sequence
ACTGACC. Now if you click on Run, you get the sequence GGTCAGT. Is it correct?
Yes, it is the reverse complement of this sequence. So this is fine; now
you have the complementary strand. The next step: when you have a DNA
sequence, you can translate it into a protein. Two steps are involved in
protein synthesis: one is transcription and the other is translation. What
happens in transcription? DNA is converted to messenger RNA. In translation,
the messenger RNA is converted to protein. So there are two steps; first we
take the DNA and change it to RNA. DNA has four different bases, A, T, C, G;
in the case of RNA they are A, U, C, G. If you see here, this is the DNA
sequence, containing A, T, C, G, but in the RNA sequence there is no T;
instead of T you have U. The complement of A is T, but instead of T you put U
here; for C it is G, for G it is C, and so on. Now you have the RNA sequence.
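The base-by-base rule just described, complement each DNA base and write U
wherever T would appear, can be sketched as below. This is a minimal sketch
under stated assumptions: the function names are hypothetical, the template
strand is assumed to be written 3' to 5' so that the RNA comes out 5' to 3',
and only the letters A, C, G and T are expected.

```python
# DNA template base -> RNA base (the complement, with U written instead of T).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_template(template):
    """Complement the template strand base by base into RNA letters."""
    return "".join(DNA_TO_RNA[base] for base in template.upper())


def coding_to_rna(coding):
    """Shortcut: the RNA equals the coding strand with every T replaced by U."""
    return coding.upper().replace("T", "U")


print(transcribe_template("TGCACG"))  # ACGUGC
print(coding_to_rna("ACGTGC"))        # ACGUGC
```

Both calls give the same RNA because TGCACG (3' to 5') is the complement of
the coding strand ACGTGC (5' to 3').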
This RNA is translated into proteins by the ribosomes. Transcription is mainly
done by RNA polymerase, and the ribosomes are responsible for translating the
mRNA into proteins. They read the nucleotides as codons: three nucleotides
together form one codon, and each codon codes for a specific amino acid. There
are four different nucleotides, but how many amino acid residues? Twenty. So a
one-to-one correspondence is not possible: taken one at a time, four
nucleotides can code for only four amino acids, but we have twenty. If it is a
combination of two, how many combinations? Four into four equals sixteen, so
that is also not possible, because we have twenty. So we have combinations of
three. With three, how many possibilities in total? Four into four into four
gives sixty-four combinations, but there are only twenty different amino
acids. That means there are several
codons which code for the same amino acid; this is called degeneracy. It is
not a single codon for one amino acid; different codons code for the same
amino acid. See phenylalanine: there are two codons, UUU and UUC, that code
for phenylalanine. For leucine you have six: UUA, UUG, CUU, CUC, CUA and CUG
all code for leucine. There are also stop codons. What is a stop codon? You
can see here, UGA is a stop codon. Also, tryptophan is coded by only one
codon, UGG; this is also one of the reasons why tryptophan occurs much less
often in protein sequences. Eighteen of the twenty amino acids are coded by
more than one codon, and this is called the degeneracy of the code.
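The counting argument (4, 16, 64) and the degeneracy can be checked with a
short Python sketch. The compact 64-letter string below is the standard
genetic code listed in U, C, A, G order, with * marking stop codons; the
variable names are chosen only for this illustration.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' is a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of possible "words" of length 1, 2 and 3 over four nucleotides.
for length in (1, 2, 3):
    print(length, len(BASES) ** length)  # 4, then 16, then 64

# How many codons code for each amino acid (or for stop).
codons_per_aa = Counter(CODON_TABLE.values())
print(codons_per_aa["F"], codons_per_aa["L"], codons_per_aa["W"])  # 2 6 1

degenerate = [aa for aa, n in codons_per_aa.items() if aa != "*" and n > 1]
print(len(degenerate))  # 18
```

Only methionine and tryptophan end up with a single codon, which matches the
eighteen out of twenty mentioned above.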
If you have a DNA sequence or an RNA sequence, here I gave an RNA sequence,
can we get the protein sequence from this RNA sequence? Yes. What do we do?
First take the nucleotides three at a time: three, three, three. Take ACG.
What is ACG? A here, C here and G here in the table, so ACG is for threonine;
ACG codes for T. Then UGC, that is cysteine: UGC is here in the table, so we
put cysteine. Then GCA is for alanine, again UGC is for cysteine, AAC is for
asparagine, CGA is for arginine, and AUG is for methionine. And at the end
they put the X. So,
in this sequence, what will happen if you delete this nucleotide A? If it is
not there, will you get a similar sequence or a different sequence? A
different sequence, because now the codons are CGU, GCG, CAU, GCA, ACC, GAA
and UGA. CGU codes for arginine, GCG for alanine, CAU for histidine, GCA for
alanine, ACC for threonine, GAA for E, glutamic acid, and UGA is a stop codon,
so a stop is put here. So if you remove one single nucleotide, the sequence is
totally different. In this way, if you have a DNA sequence and you are looking
for the protein sequence, you should know exactly where the codons start;
otherwise you will get a different protein sequence. There are various
resources for this, and you can also use programming: you can write your own
code to translate the RNA sequence into a protein sequence.
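The effect of deleting one nucleotide can be seen just by cutting the sequence
into codons before and after the deletion. In the sketch below the function
name codons is hypothetical, and the RNA string is an assumption: it is
reconstructed from the codons read out above (ACG, UGC, GCA, UGC, AAC, CGA,
AUG) plus one trailing A.

```python
def codons(seq):
    """Cut seq into non-overlapping triplets, ignoring leftover nucleotides."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


rna = "ACGUGCGCAUGCAACCGAAUGA"  # reconstructed example sequence (assumption)

print(codons(rna))
# ['ACG', 'UGC', 'GCA', 'UGC', 'AAC', 'CGA', 'AUG']

print(codons(rna[1:]))  # the same sequence with the first A deleted
# ['CGU', 'GCG', 'CAU', 'GCA', 'ACC', 'GAA', 'UGA']
```

Every codon changes after the deletion, which is why the starting position of
the reading frame matters so much.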
How do we write the program? First, what information is required? What inputs
are necessary to get the protein sequence from an RNA sequence? First, you
need the RNA sequence, for example
ACUACUGCGCAUAUGG.
OK, so you have the RNA sequence. The other information required is the code:
you need the mapping, the codon to amino acid table, which gives the
information on which amino acid each codon codes for. Then what do you have to
do? First split the RNA into codons, that is, cut it into pieces of three
nucleotides. These are not overlapping: stop after three, then start from the
next one. Non-overlapping, that is very important. Then you match each codon
with the table and obtain the amino acid. Finally, if you do this till the end
of the sequence, you get the protein sequence. So: read the RNA sequence, get
the codon table, cut the RNA sequence into non-overlapping pieces of three
nucleotides, one for each codon, look up each codon in the table to get the
amino acid, continue till the end, and you have the protein sequence.
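The steps listed above (read the RNA, get the codon table, cut into
non-overlapping triplets, look each one up) can be put together roughly as
follows. This is a sketch under stated assumptions, not the code of any
existing tool: the name translate is made up, the table is the standard
genetic code written compactly in U, C, A, G order, stop codons are written as
*, and leftover nucleotides at the end are simply dropped.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' is a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna):
    """Translate an RNA string codon by codon, starting at its first base."""
    rna = rna.upper().replace("T", "U")  # also accept DNA letters
    protein = []
    for start in range(0, len(rna) - 2, 3):  # non-overlapping steps of three
        codon = rna[start:start + 3]
        protein.append(CODON_TABLE[codon])
    return "".join(protein)


print(translate("ACGUGCGCAUGCAACCGAAUGA"))  # TCACNRM
print(translate("CGUGCGCAUGCAACCGAAUGA"))   # RAHATE*
```

The second call is the same sequence with its first A removed, and the protein
is completely different, ending in a stop.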
Several resources are also available in the literature, so again you can use
the EMBOSS software to translate the RNA sequence into a protein sequence.
Here you can use transeq, a module available in EMBOSS. Here you put the
sequence ACGUGCGCAUGCAACCGAAUGA. Now when you go to run it, it will ask for
the frame. There are six different frames. How many frames? Six: three forward
frames and three reverse frames; I will explain this on the next slide. So you
take this sequence and ask to convert it into the protein sequence. I show the
example here with the same sequence I gave before, starting ACG, and I get the
same protein sequence, T C A C N R M X. So if you take the first frame, you
get the same sequence as before. So for each sequence we have six
reading frames. How do you get the six reading frames? Three are in the
forward direction and three are in the reverse direction. How do you get the
three in the forward direction? For example, this is a DNA sequence written 5'
to 3'. First you take three nucleotides at a time from the start, so you get
CAA, TGG, CTA and so on. CAA codes for Q, TGG codes for W and CTA codes for L,
so you get the amino acid sequence. This is frame one. For the second one, you
take the first nucleotide, C, out and then make the triplets again: AAT, GGC,
TAG and so on, and check the table for the amino acids. For the third one, you
take out both C and A, so it starts ATG, GCT, AGG, TAC and so on; looking each
codon up in the table you get M, A, R and so on. What will happen if you move
once more? The same will repeat: if you start from TGG, you do not have the
first amino acid, but otherwise you get the same sequence as frame one. So in
this way you get three different forward frames.
Likewise for the reverse frames. How do I get the reverse frames? Take the
complement. Say we have this 5' to 3' sequence. If you take the complement,
you get it in the 3' to 5' direction; then you reverse it, and you get the
other strand written 5' to 3'. From this 5' to 3' sequence you make the
codons: CTC, GGA, TTT and so on. CTC codes for L, GGA codes for G and TTT
codes for F, so you get the amino acids. Likewise, you take one nucleotide out
to get the second frame, and one more to get the third frame. So in total we
have six different frames: three forward frames and three reverse frames. You
can use transeq to get these six frames. Here you give the input sequence, and
you have options for the forward reading frames, the reverse reading frames,
or everything. Say you go with all six frames: you click on all six frames and
you get the sequences: this is forward frame one, this is two, this is three,
and then reverse one, reverse two and reverse three. In the same EMBOSS
software there are various other tools available. Each one computes different
values, and they use different algorithms to get those values. So we can use
any of the different modules available in EMBOSS to get the characteristic
features of DNA or RNA sequences.
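The six reading frames described above can be sketched in Python as three
offsets on the given strand and three offsets on its reverse complement. This
is a sketch under stated assumptions: the function names and the frame labels
F1 to F3 and R1 to R3 are hypothetical and need not match the numbering any
particular tool uses, the table is the standard genetic code in T, C, A, G
order, and the DNA fragment is the beginning of the forward-frame example as
read out above.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' is a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse, so the result reads 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate non-overlapping triplets from the first base onward."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return the three forward and three reverse translations of dna."""
    dna = dna.upper()
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):  # drop 0, 1 or 2 leading nucleotides
        frames[f"F{offset + 1}"] = translate(dna[offset:])
        frames[f"R{offset + 1}"] = translate(reverse[offset:])
    return frames


for name, protein in six_frames("CAATGGCTAGGTAC").items():
    print(name, protein)
```

For this fragment the first forward frame starts Q, W, L and the third starts
M, A, R, as in the worked example.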