Overview of bioinformatics

Bioinformatics deals with the application of different sciences to biological systems. So let us look at the different fields of science and how each contributed to the birth and development of bioinformatics. Take computer science: can you give one example of how computer science contributes to bioinformatics?

Student: Machine learning can be used, and many computer algorithms have been used to solve biological problems.

Correct. You can develop several programs and extract the hidden information in biological data, and you can use different machine learning techniques and algorithms to understand and capture that information, as well as for prediction. Now, can you give one example of how mathematics or statistics contributes to the development of bioinformatics?

Student: One example is the use of statistics in the Ramachandran plot, where the phi and psi angles of proteins are plotted.

Correct. You can use mathematics to derive principles and relate them to the data, for example protein sequences or the distribution of residues in the Ramachandran plot. Using mathematics you can check whether the data fit a model and whether the results are statistically significant, for instance with correlation analysis and regression techniques. Now, how does information technology contribute to the development of bioinformatics?

Student: Computational resources have increased over the decades, and they are used for large-scale data analysis.

Correct. You can use them for large-scale analysis, develop online resources, store data, and so on, and all this extends the applications of bioinformatics to various fields. Likewise physics: physics gives us the concepts of various types of interactions, such as electrostatic and hydrophobic interactions, and these interactions are important for understanding the folding mechanism of proteins. For example, in the unfolded state a protein wobbles, and it looks very much like a random coil conformation.
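Electrostatic interactions such as those mentioned above are commonly described with Coulomb's law. The sketch below is a minimal illustration, not a force field: it assumes point charges (in units of the elementary charge e), a uniform dielectric constant (4.0 here, an illustrative choice), and the approximate conversion constant 332.06 kcal·Å/(mol·e²) used in molecular mechanics. The salt-bridge charges and distances are hypothetical.

```python
# Approximate conversion constant for Coulomb's law in molecular-mechanics units:
# energy in kcal/mol, charges in units of e, distance in angstroms.
COULOMB_CONSTANT = 332.06


def coulomb_energy(q1: float, q2: float, distance: float, dielectric: float = 4.0) -> float:
    """Electrostatic interaction energy (kcal/mol) between two point charges.

    Negative values mean attraction (opposite charges), positive values repulsion.
    The uniform dielectric constant is an illustrative assumption.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    if dielectric <= 0:
        raise ValueError("dielectric constant must be positive")
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * distance)


if __name__ == "__main__":
    # Hypothetical salt bridge between a +1 side chain (e.g. Lys) and a -1 side chain (e.g. Glu)
    for r in (3.0, 4.0, 8.0):
        print(f"r = {r:.1f} angstrom -> E = {coulomb_energy(+1, -1, r):.2f} kcal/mol")
```

Because the energy scales as 1/r, the attraction at 8 Å is half of that at 4 Å; folding studies combine such terms with van der Waals, hydrogen-bonding, and solvation contributions.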

When this protein folds, it attains a specific three-dimensional structure. How does a protein reach a specific 3D structure from its sequence? This can be explained by various types of interactions, such as disulfide bonds, electrostatic interactions, and van der Waals interactions. Understanding the principles that govern the folded state of a protein requires physical concepts, so physics helps us understand the folding mechanism of proteins.

Now, considering all these fields (life sciences, computing, mathematics, statistics, information technology, physics, and chemistry), which one is the major field behind the birth of bioinformatics?

Student: Life sciences.

Life sciences, right, because we need data. Even with all the other fields, without data there is nothing to apply them to. Where do we get the data? From biological experiments. The life sciences have produced a lot of data on various aspects: macromolecule sequences such as DNA and protein sequences; structures of proteins, DNA, and complexes; expression profiles; pathways; and so on. How is bioinformatics used to understand these data? Bioinformatics helps to acquire, manage, analyze, and understand the data, so it is the major field for uncovering the hidden information in the data produced by experiments in the life sciences.

So how has bioinformatics grown, what are its applications, and how does it contribute to society? There are various aspects, which I have put briefly into five bullet points.

The first is well-organized databases. Biologists do experiments, produce data, and publish the data in the literature. When data are scattered here and there, it is very important to collect all the information, put it in a proper form, and provide options to extract data from the database. This is what bioinformatics does: it develops many well-organized databases in computable form. A sufficient amount of data is essential for any analysis; otherwise you do not get statistically significant results.

The second is computationally derived hypotheses. Once you have the database, you can derive hypotheses about the relationship between specific features and a function: if you know the function or a specific characteristic of a biological system, what gives the system that characteristic? Here we try to develop features and use them to link to the function of the biological system. For example, take protein sequences or structures. There are different types of proteins, such as globular proteins and membrane proteins, and different types of globular proteins, such as alpha proteins and beta proteins. Can we identify these types from a pool of sequences? On the left side you can see the sequences. If you have a sequence, you know its amino acid residues, so you can count how many residues of each type it contains; there are twenty different types of residues in proteins. You can then relate these features to the protein type: if specific residues dominate, for example hydrophobic residues, you can say this could be a membrane protein. The same applies to proteins with different functions or proteins associated with different diseases. In this way you derive a hypothesis about the basic principle behind a biological system.

The third is algorithms and online applications. Once you have derived a hypothesis and understood the major factors for a specific system, the next step is to derive an algorithm. You have the features on one side and the function on the other: can we relate them, and what mathematical equation characterizes the function in terms of the features? Doing this gives us an algorithm, and once we have the algorithm we can make it available to the public as a web server or online application. In the early days, when the internet was not fast enough, everyone created their own servers and algorithms and kept them to themselves, and it was difficult to transfer them. Now, owing to advances in computers, biology, computational techniques, and fast internet facilities, several web servers have been developed to make these applications available to others. In our laboratory we have also developed various tools that are widely used in the literature, and many people use these databases and tools to understand biological systems.

The fourth is virtual screening, which I will discuss a little, for example in drug design. It is very popular currently because a lot of small molecules are available in the literature and people are affected by several types of diseases, such as cardiovascular diseases and cancer, and more recently chikungunya, dengue, and so on. In all these cases, to identify drugs, we try to find a target, then study its functions, its actions, and its important residues, and then we try to inhibit
that activity so that we can reduce the disease.

Here is one example of structure-based drug design. If you have a protein, this is the target. Here I show the target c-Yes kinase. Kinases are very important for several cellular activities, and among them c-Yes kinase is very important in colorectal cancer, so it is an attractive target for designing inhibitors against colorectal cancers. There are various options for deriving a particular compound as an inhibitor, but the pool is very large. How do we do it? Let me give one example: finding a fish in a pond. Here you see different ponds with different fish. If you want to catch a fish, where will you put your net? If you put it in pond number one or number two, you will get a fish; if you put it in pond number four, you will spend a lot of time and get nothing. If you have been trying to catch a fish for a long time and someone tells you to put your net in a particular spot, and you do it and catch a fish, you will be very happy, because you did not have to waste your time.

The same holds for any disease. There are different compounds: compound one, compound two, compound three, and so on. You try compound one and it fails, so it is not a drug; compound two is also not a good drug; compound three is probably a drug. There are millions of compounds in the literature. For example, the ZINC database has thirty-five million compounds, the Enamine database has 2.2 million compounds, and traditional Chinese medicine has about thirty-five thousand natural compounds. If you try them one by one, the patient will have died by the time you finish. Testing each compound experimentally takes a long time, a lot of manpower, and a lot of money. So how can this be done? Among these thirty-five million compounds, if someone can
reduce them to thirty-five thousand, the number of experiments is reduced a thousandfold. And if, instead of 2.2 million compounds, we have a hundred or a thousand, the search is reduced enormously. Bioinformatics offers a solution: we now have very fast computers and very good techniques, so it can assist in searching for drug targets and designing drugs from many millions of compounds.

Now, how do we use a hypothesis? Here I show an example with five experimentally known cases of food pattern and weight control. In case one, the diet is fifty percent rice, twenty-five percent wheat, ten percent meat, ten percent fruits, and five percent vegetables, and the weight is not controlled. In case two, it is thirty percent rice, five percent wheat, ten percent meat, thirty percent fruits, and twenty-five percent vegetables, and again the weight is not controlled. In case three, it is thirty-five percent rice, ten percent wheat, ten percent meat, fifteen percent fruits, and thirty percent vegetables, and here the weight is controlled. Likewise there are five examples. Now I have a test case: a person consumes twenty percent rice, ten percent wheat, twenty percent fruits, ten percent meat, and forty percent vegetables. Is the weight controlled or not?

Student: It is controlled.

Correct. But why is it controlled? How do you know?

Student: Because of the vegetables.

But in case two the vegetables are also twenty-five percent, and the weight is not controlled. So we need precise principles: we can derive a series of equations using statistics. For example, for the controlled cases we can see on the right-hand side that wheat is less than thirty-five percent, meat is less than fifteen percent, and vegetables are more than thirty percent. Likewise you can derive several equations. You study the initial experimental data sets, derive equations or conditions that fit them, apply these conditions to any new data, and check whether the case is controlled or not. In this way bioinformatics can handle large amounts of data and provide possible solutions.

Now I will explain a little more about virtual screening of compounds, with an example of how virtual screening is used in drug design. Here is the protein, c-Yes kinase. It has different domains: on the left are the SH3 and SH2 domains, then the catalytic domain with its C-lobe, and a phosphorylation site, tyrosine 416. Because this kinase is very important in colorectal cancer, it is a target, and it is important to identify probable hit compounds for the c-Yes kinase enzyme. On one side we have the protein, the target c-Yes kinase; on the other side, how do we design an inhibitor? We have the Enamine library of 2.2 million compounds. How do we choose, among these 2.2 million compounds, the probable compounds that can become lead compounds for a drug? Testing all 2.2 million compounds would take a long time and a lot of money, because each compound costs thirty to forty thousand rupees. So we derive a methodology. First, check whether the structure is known. If it is, use that structure; if not, model it. Then look for the active sites and identify the pockets, that is, the binding sites. Here is the site. Now check the features of all 2.2 million compounds and set conditions that fit this particular pocket, for example molecular weight and the numbers of hydrogen-bond donors and acceptors. Using such criteria you eliminate compounds, and finally you apply virtual screening methods such as docking to the remaining compounds to arrive at a final set of compounds.

In 2014, the Tokyo Institute of Technology organized a competition to identify inhibitors for this particular target, and we also contributed. We identified about 120 compounds, of which they tested 50; four compounds showed inhibition, and one is a probable hit compound. The competition continued the next year, 2015: about two thousand compounds were involved, five showed inhibition, and two are hits. In the figure at the bottom you can see how these compounds interact with the protein: the compounds are shown in green, and the surrounding residues belong to the protein. They make specific interactions, such as hydrogen bonds and hydrophobic interactions, and because of these interactions they bind tightly to the protein and act as inhibitors of this kinase. In later classes I will explain structure-based drug design in more detail.

So far we have discussed several aspects of bioinformatics. First, well-organized databases: bioinformatics contributes organized databases. Second, computationally derived hypotheses: once you have the data, you can develop functions that relate features to functions. Third, once we have derived hypotheses, we develop algorithms for prediction and make them available as online applications in the form of web servers; there are several web servers, for example for protein structure prediction, protein function prediction, DNA bending, DNA-protein interactions, and so on. Fourth, which we just discussed, virtual screening of compounds for drug development. Fifth, bioinformatics is now widely applied in next-generation sequence analysis. Everyone is interested in personalized medicine. A few years ago sequencing a genome was very expensive; now it is very cheap, so everyone wants to see their own genome: which proteins they have, what functions those proteins perform, and how likely they are to carry a specific mutation in a protein. We now have a lot of data from next-generation sequencing, for example Illumina sequencing. Thanks to advances in sequencing techniques a lot of data are available, but the question is how to analyze them and extract information from these sequences. Sequencing produces short reads, which are used to reconstruct the final sequence. For example, how do patients affected by cancer, Parkinson's disease, Alzheimer's disease, and so on differ from healthy individuals? Researchers obtain full sequences from patients and from healthy individuals and compare them: what are the variations or mutations, and do they fall in protein-coding or non-coding regions? They then relate these mutations to the residues involved, why those residues are important, which pathways they take part in, and how they influence different diseases. This information then guides treatment. When patients with cancer or another specific disease are treated in hospital, their data can be collected along with information on the drugs and the drug responses, and put into a database. From that information you can see that for a specific variation a specific drug will work, which helps personalized medicine for different diseases. In this way bioinformatics plays a major role in many aspects of human health and medicine.
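The patient-versus-healthy comparison and the drug-response lookup described above can be sketched in a few lines, assuming variants have already been called from the short reads (read alignment and variant calling are separate steps done with dedicated tools). All variant positions, coding-region coordinates, and the drug table below are hypothetical placeholders; a real pipeline would read VCF files and curated annotation and pharmacogenomic resources.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


# Hypothetical protein-coding intervals per chromosome (1-based, inclusive)
CODING_REGIONS: Dict[str, List[Tuple[int, int]]] = {
    "chr1": [(1_000, 2_000)],
    "chr2": [(5_000, 5_600)],
}

# Hypothetical table built from hospital records: variant -> drug that responded well
DRUG_RESPONSE: Dict[Variant, str] = {
    Variant("chr1", 1_500, "G", "A"): "drug_X",
}


def patient_specific(patient: Iterable[Variant], healthy: Iterable[Variant]) -> Set[Variant]:
    """Variants seen in the patient but not in the healthy reference group."""
    return set(patient) - set(healthy)


def in_coding_region(variant: Variant) -> bool:
    """True if the variant falls inside one of the (hypothetical) coding intervals."""
    return any(start <= variant.pos <= end for start, end in CODING_REGIONS.get(variant.chrom, []))


def suggest_drugs(variants: Iterable[Variant]) -> Dict[Variant, str]:
    """Look up each variant in the drug-response table."""
    return {v: DRUG_RESPONSE[v] for v in variants if v in DRUG_RESPONSE}


patient = [
    Variant("chr1", 1_500, "G", "A"),
    Variant("chr2", 7_000, "C", "T"),
    Variant("chr1", 1_200, "A", "G"),
]
healthy = [Variant("chr1", 1_200, "A", "G")]

specific = patient_specific(patient, healthy)
for v in sorted(specific):
    region = "coding" if in_coding_region(v) else "non-coding"
    print(f"{v.chrom}:{v.pos} {v.ref}>{v.alt} ({region})")
print(suggest_drugs(specific))
```

Here the shared variant at chr1:1200 is discarded, the remaining two are labelled coding and non-coding, and only the chr1:1500 variant has an entry in the drug table.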

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

In this lecture I will talk about bioinformatics, its various aspects, its applications, and its role in studying biological systems. In the upcoming lectures I will describe these aspects in detail.

In this course I will follow my book Protein Bioinformatics, published in 2010 by Elsevier and Academic Press, and Krane and Raymer's Fundamental Concepts of Bioinformatics, published in 2006. For the general study of bioinformatics I will use Krane and Raymer's book, and to explain the various aspects and applications of proteins, such as protein sequence analysis, protein structure analysis, protein structure prediction, and protein folding, I will use my own book. What is bioinformatics? If you divide the word into two parts, bio + informatics, it means the use of informatics in biological systems.

I will place bioinformatics in the middle. It is the branch of science in which biology plays the main role; it has been combined with other branches of science, such as computer science and information technology, to form a discipline in which biological data are analyzed with statistical techniques (biostatistics) and computer algorithms. So, if you look at the figure, I have placed bioinformatics at the center of all these fields, and they all connect to it. The analysis of biological systems began on a small scale a few decades ago; in 1979, Paulien Hogeweg first coined the term bioinformatics. It refers to the use of other branches of science to analyze biology. So we will see how the different branches of science are important for the development of bioinformatics. Let us take an example of the use of computer science in bioinformatics.

Student: Machine learning can be used, and computer science has many computer algorithms that can be used to solve biological problems.

Correct. So you can develop many programs and extract the hidden data available in biological information. You can also use various machine learning techniques and algorithms to understand and capture the information and to make predictions.

Can you tell me one use of mathematics or statistics?

Student: In the Ramachandran plot, statistics was used to plot the phi and psi angles of proteins.

Correct. You can use mathematics to derive principles and relate them to the data; for example, the distribution of protein sequences or residues in the Ramachandran plot can be understood through mathematics. With mathematics, you can verify whether data fit a model.
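Checking whether a relationship in the data is statistically significant, for example with the correlation analysis mentioned in this lecture, can be sketched as follows. This minimal, standard-library-only sketch computes Pearson's r and estimates a two-sided p-value with a permutation test (shuffling one variable many times), used here in place of the t-test that a statistics package would provide. The paired measurements are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for the null hypothesis of no correlation.

    Shuffling y breaks any real x-y relationship; the p-value is the fraction of
    shuffles whose |r| is at least as large as the observed |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical paired measurements (e.g. a structural feature vs. a measured property)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

r = pearson_r(x, y)
p = permutation_p_value(x, y)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

A small p-value says that a correlation this strong would rarely arise by chance pairing; it does not by itself say the relationship is causal or linear beyond the measured range.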

Similarly, using correlation analysis and regression techniques, you can find out whether any data are statistically significant or not. Now, how does information technology contribute to the development of bioinformatics?

Student: Computational resources have been used for large-scale data analysis for decades.

Correct. So you can use them for large-scale analysis, and you can also develop online resources. With computer storage and more, bioinformatics can be applied in different fields. Likewise, in physics you can see many kinds of interactions, such as van der Waals interactions and hydrophobic interactions. These interactions help us understand protein folding and the unfolded state of a protein, which wobbles and closely resembles a random coil conformation.

When this protein folds into a specific three-dimensional structure, it forms a specific 3D structure. Let us see how a protein attains its 3D structure from its sequence. This can be explained by various types of interactions, such as disulfide bonds, electrostatic interactions, and van der Waals interactions. In this way the principles governing the folded state of a protein can be understood, and physics can explain the folding mechanism of globular proteins.

So, considering all the fields (life science, computing, mathematics, statistics, information technology, physics, and chemistry), which would be considered the main field behind the birth of bioinformatics?

Student: Life sciences.

Life sciences, correct, because we need data. Whatever the other fields contribute, we can do nothing without data, and we obtain these data from biological experiments. The life sciences produce many kinds of data, such as macromolecule sequences (for example, DNA or protein sequences), various structures such as DNA structures, protein structures, and structures of complexes, as well as pathways and expression profiles. So how does bioinformatics help us understand the data? Bioinformatics helps to acquire, manage, analyze, and understand the data. In this way, bioinformatics helps uncover the hidden information in the data produced by experiments in the life sciences.

So let us look at how bioinformatics has developed, how it is used, and how it is useful to society; there are various aspects. Here I summarize them briefly in five bullet points. The first point is well-organized databases. Biologists perform experiments, obtain data, and publish the data in the literature. It is important to gather all this information in the form of a database. For example, if the data are scattered, they should be collected properly and kept in an appropriate form, and then options should be provided to extract the relevant information from the database. Bioinformatics helps to organize databases well, in a computable form.
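As a minimal sketch of collecting scattered records into one computable form, the snippet below loads records from two hypothetical sources into an in-memory SQLite table (Python's built-in `sqlite3` module), drops duplicates through a primary key, and answers a query. The accession codes, protein names, organisms, and lengths are placeholders, not real database entries.

```python
import sqlite3

# Hypothetical records from two scattered sources: (accession, name, organism, length)
SOURCE_A = [
    ("P001", "kinase A", "Homo sapiens", 543),
    ("P002", "porin B", "Escherichia coli", 340),
]
SOURCE_B = [
    ("P002", "porin B", "Escherichia coli", 340),  # same entry reported again
    ("P003", "globin C", "Homo sapiens", 147),
]


def build_database(*sources):
    """Collect records from several sources into one in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE protein (
               accession TEXT PRIMARY KEY,
               name      TEXT NOT NULL,
               organism  TEXT NOT NULL,
               length    INTEGER NOT NULL CHECK (length > 0)
           )"""
    )
    for records in sources:
        # INSERT OR IGNORE skips records whose accession is already stored
        conn.executemany("INSERT OR IGNORE INTO protein VALUES (?, ?, ?, ?)", records)
    conn.commit()
    return conn


def proteins_from(conn, organism):
    """Return (accession, name, length) for every stored protein of one organism."""
    cursor = conn.execute(
        "SELECT accession, name, length FROM protein WHERE organism = ? ORDER BY accession",
        (organism,),
    )
    return cursor.fetchall()


conn = build_database(SOURCE_A, SOURCE_B)
print(conn.execute("SELECT COUNT(*) FROM protein").fetchone()[0])
print(proteins_from(conn, "Homo sapiens"))
```

The duplicated entry is stored once, so the table holds three proteins, and the query returns the two human entries in accession order.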

A sufficient amount of data must be available for any analysis; otherwise you will not get statistically significant results. The second important point is that once you have the database, you have to derive a hypothesis about the role that various features play in carrying out functions. If a particular characteristic or function of a biological system is known, we can develop features and use them to link that characteristic to the biological system. For example, if we take protein sequences or protein structures, proteins are of various types, such as globular proteins and membrane proteins, and globular proteins are further divided into alpha proteins, beta proteins, and so on. We can try to identify these types from a pool of proteins. You can see the sequences on the left side. If you have a sequence, you know its amino acid residues, so you can count the residues of each type. There are 20 types of residues in these proteins, and you can find out which residues are dominant. For example, if hydrophobic residues dominate, you can say that this could be a membrane protein. Similarly, proteins with different functions, or proteins associated with different diseases, can be characterized. In this way you can set up a hypothesis about the basic principle of a biological system.
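Counting residue types and checking which ones dominate can be sketched as below. This is a toy version of the idea, not a validated classifier: the hydrophobic residue set and the 0.5 cut-off are assumptions chosen for illustration (published hydrophobicity scales and thresholds differ), and both sequences are invented.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (one-letter codes)
# Assumed hydrophobic set for illustration; published hydrophobicity scales differ
HYDROPHOBIC = frozenset("ACFILMVW")


def composition(sequence: str) -> dict:
    """Fraction of each standard amino acid in a protein sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


def hydrophobic_fraction(sequence: str) -> float:
    """Summed composition of the residues in the assumed hydrophobic set."""
    comp = composition(sequence)
    return sum(comp[aa] for aa in HYDROPHOBIC)


def hypothesis(sequence: str, cutoff: float = 0.5) -> str:
    """Toy rule: a hydrophobic-dominated sequence is flagged as a possible membrane protein.

    The 0.5 cut-off is a hypothetical value, not a validated threshold.
    """
    if hydrophobic_fraction(sequence) >= cutoff:
        return "possible membrane protein"
    return "possible globular protein"


# Hypothetical sequences, invented for illustration
for seq in ("LLIVALLFAVLIGAWLMV", "MKDESEKRQNTGEKPDSK"):
    print(seq, round(hydrophobic_fraction(seq), 2), hypothesis(seq))
```

The composition vector returned by `composition` is exactly the kind of feature set that later steps can relate to a function or protein class.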

Once you have formed a hypothesis about the major factors of a particular system, the next step is to ask whether we can describe an algorithm. If you look at the features and the functions, you can establish a relationship between them. Here, on this side, are the features, and alongside them the functions. A mathematical equation is used to relate them, so that we can understand the function from the features.
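One simple form for such an equation is a linear model, function ≈ w0 + w1·feature1 + w2·feature2, whose weights are fitted by least squares. The sketch below solves the normal equations with plain Gaussian elimination to stay dependency-free; the two features and target values are hypothetical and were generated from the equation 1 + 2·f1 - 0.5·f2, so the fit should recover those weights. Published prediction methods often use more elaborate machine-learning models.

```python
def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_linear(features, targets):
    """Least-squares weights [w0, w1, ..., wk] for target ~ w0 + w1*f1 + ... + wk*fk."""
    rows = [[1.0, *f] for f in features]  # leading 1.0 gives the intercept w0
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)  # normal equations: (X^T X) w = X^T y


def predict(weights, feature_vector):
    """Evaluate the fitted linear equation for one feature vector."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], feature_vector))


# Hypothetical (feature1, feature2) values and measured function scores
features = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
targets = [1.0, 3.0, 0.5, 2.5, 4.5]

weights = fit_linear(features, targets)
print("weights:", [round(w, 3) for w in weights])
print("prediction for (3, 2):", round(predict(weights, (3, 2)), 3))
```

Once fitted, `predict` is the "algorithm" in miniature: it turns a new feature vector into an estimated function value.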

Once this relationship is turned into an algorithm, the study can be made available to the general public. In the early days, when the internet was not fast enough, everyone built their own server and created their own algorithm, and these were difficult to transfer. Now, with modern computers, advances in biology and computational techniques, and fast internet facilities, many web servers have been developed that are useful to others as well. Our laboratory has also developed several tools that are used in the literature, and many people use these databases and tools to understand biological systems.

The fourth point is virtual screening, for example in drug design, which is very popular nowadays, because many small molecules are available in the literature and people suffer from many diseases, such as cardiovascular diseases and cancer, and nowadays also chikungunya and dengue. In all these cases, drugs are identified by first finding a target: we study how molecules act on that target, which residues are important, and how its activity can be inhibited so that the disease can be reduced.

Here is an example of structure-based drug design. If you have a protein, it is the target. Here I show the target c-Yes kinase, which is required for many cellular activities. c-Yes kinase is very important in colorectal cancer, so it is an attractive target for designing inhibitors against colorectal cancers. To do this, we need to find a particular ligand that can act as an inhibitor. There are many options and the pool is very large, so let me explain with an example.

Example: finding a fish in a pond. Suppose you see several ponds here, with different fish in different ponds. If you want to catch a fish, you have to cast your net. If you just keep wondering whether to cast it in pond 1, pond 2, or pond 4, you will only waste your time and catch nothing. Now, if someone tells you that you can get a fish at a particular spot, and you cast your net there and catch one, you will be very happy, because your time was not wasted. The same applies to any disease.

For example, there are many different compounds, such as compound 1, compound 2, compound 3, and so on. If we take compound 1, try it, and it fails, it is not a drug; if compound 2 is also not a good drug, compound 3 might be a drug. There are millions of compounds in the literature: the ZINC database has 35 million compounds, the Enamine database 2.2 million, and traditional Chinese medicine about 35,000 natural compounds. If these compounds were tried on a patient one after another, it would take so long that the patient would die in the meantime. Alternatively, testing these compounds in the laboratory would require a great deal of money and manpower. So if someone can reduce the 35 million compounds to 35,000, the number of experiments drops a thousandfold, and if instead of 2.2 million compounds you have 100 or 1000, you save a great deal of time.
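The savings quoted above are simple ratios of library size to shortlist size, shown here for the numbers given in the lecture:

```latex
\text{fold reduction} = \frac{N_{\text{library}}}{N_{\text{shortlist}}}, \qquad
\frac{35 \times 10^{6}}{35 \times 10^{3}} = 10^{3}, \qquad
\frac{2.2 \times 10^{6}}{10^{3}} = 2200, \qquad
\frac{2.2 \times 10^{6}}{10^{2}} = 22\,000
```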

Bioinformatics can save this time, because we have very fast computers and very good techniques that can help in searching for drug targets and designing drugs.

अब दूसरी बात परिकल्पना या हाइपोथेसिस का

किस तरह उपयोग किया जाए।.

यहां मैं एक उदाहरण प्रस्तुत कर रहा

हूं

तो, यहां मेरे पास 5 ज्ञात वैल्यू है

नंबर वन 0.1 यहां चावल 50%, गेहूं 25%, मीट 10%फल

10% सब्जियां 5% है यदि हम इसे देखें तो यह

food पैटर्न और वेट कंट्रोल के लिए कंट्रोल उदाहरण

नहीं हो सकता।

यदि चावल 30% गेहूं 5% मीट 10% फल 30% और सब्जियां

25 परसेंट हो तो भी यह कंट्रोल्ड वैल्यू

नहीं है A So, अब तीसरी स्थिति

में चावल 25% गेहूं 10% मीट 10%फल 25 परसेंट

और सब्जियां 35% है तो यह कंट्रोल्ड

वैल्यू है अब मेरे पास टेस्ट केस है

जो 20% चावल 10 परसेंट गेहूं 20% फल मीट 10% और

सब्जियां 20 परसेंट खाता हो तो प्रश्न

यह है कि यह कंट्रोल है कि नहीं?

यह कंट्रोल है किंतु यह कंट्रोल क्यों

है इसे कैसे पता किया जा सकता है क्या तुम

एक उदाहरण दे सकते हो छात्र; सब्जियां

ज्यादा ले रहा है. वेजिटेबल हाई है;

यहां पर भी वेजिटेबल 25% है

किंतु यह कंट्रोल्ड नहीं है, अतः हम एक

सिद्धांत बनाते हैं A

Here we use statistics to build a series of equations; if they give the answer "controlled", you are right. On the right-hand side we see the conditions: wheat less than 35%, meat less than 15% and vegetables more than 35%. In this way you can derive several equations. You can study the initial dataset carefully, study the experimental dataset as well, and derive some equations from that experimental dataset. Some conditions will fit the data; these can then be applied to any dataset, and you can check whether a given case is controlled or not.
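
The conditions above can be written as a tiny rule-based classifier. This is a hedged sketch, not the lecturer's actual equations: the thresholds are taken from the conditions as transcribed (wheat below 35%, meat below 15%, vegetables above 35%), and we assume "at least 35%" for vegetables so that known case 3, with exactly 35%, comes out as controlled. The sketch only checks that the rules reproduce the three labelled cases.

```python
from typing import Dict, List, Tuple

Diet = Dict[str, float]  # food category -> percentage of intake

def is_controlled(diet: Diet) -> bool:
    # Assumed rules: wheat < 35%, meat < 15%, vegetables >= 35% (">=" is our choice).
    return diet["wheat"] < 35 and diet["meat"] < 15 and diet["vegetables"] >= 35

# The three labelled cases from the lecture: (diet, is it controlled?)
known_cases: List[Tuple[Diet, bool]] = [
    ({"rice": 50, "wheat": 25, "meat": 10, "fruit": 10, "vegetables": 5}, False),
    ({"rice": 30, "wheat": 5, "meat": 10, "fruit": 30, "vegetables": 25}, False),
    ({"rice": 25, "wheat": 10, "meat": 10, "fruit": 25, "vegetables": 35}, True),
]

for diet, label in known_cases:
    predicted = is_controlled(diet)
    verdict = "controlled" if predicted else "not controlled"
    check = "matches label" if predicted == label else "MISMATCH"
    print(f"{diet} -> {verdict} ({check})")
```

Rules fitted to only three labelled cases are fragile; the point of the sketch is the workflow (derive conditions from known data, then apply them to new cases), not these particular thresholds.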

In this way bioinformatics can handle very large amounts of data and provide suitable solutions. Now I will say a little more about virtual screening. Here is an example that explains drug design through virtual screening.

This example involves a protein, c-Yes kinase. It has several domains: the domain on the left is the SH3 domain, there is an SH2 domain, and there is the tyrosine kinase domain, which is the catalytic domain. There is also a phosphorylation site, tyrosine 416, and this loop. The point here is that this protein is very important in colorectal cancer; it is a target. So it is important to identify probable hits against the c-Yes kinase enzyme.

That is one aspect. On one side there is the protein; on the other, we need to target c-Yes kinase, so how do we design inhibitors? The Enamine library has 2.2 million compounds. How do we choose from these 2.2 million compounds? How do we find the probable compounds that could be useful as drugs?

If we took all 2.2 million compounds, it would be very expensive, because one compound costs ₹30,000 to ₹40,000, and experiments on each one would take a long time.
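
For a sense of scale, the cost figures above can be multiplied out. This minimal sketch assumes a flat price per compound within the quoted ₹30,000 to ₹40,000 range and ignores assay costs; `cost_range` is a hypothetical helper.

```python
from typing import Tuple

N_COMPOUNDS = 2_200_000
PRICE_LOW, PRICE_HIGH = 30_000, 40_000  # rupees per compound, as quoted in the lecture

def cost_range(n_compounds: int) -> Tuple[int, int]:
    """Hypothetical helper: (minimum, maximum) purchase cost in rupees."""
    return n_compounds * PRICE_LOW, n_compounds * PRICE_HIGH

low, high = cost_range(N_COMPOUNDS)
print(f"All {N_COMPOUNDS:,} compounds: Rs {low:,} to Rs {high:,}")

low, high = cost_range(100)
print(f"A shortlist of 100: Rs {low:,} to Rs {high:,}")
```

That is roughly ₹6,600 to ₹8,800 crore for the full library, against ₹30 to ₹40 lakh for a shortlist of 100 compounds.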

So what can be done? In this case we can develop a methodology. First we check whether the structure is known. If the structure is known, it can be used directly. If it is not known, the structure has to be modelled and the compounds fitted into the active site.

In this way the features of the 2.2 million compounds can be checked, and conditions are set up so that only compounds that can fit are kept. After that, molecular weight is taken into account, and hydrogen-bond donors and acceptors can be used as well. Any of these options can be used to separate out the compounds.
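
A property-based pre-filter of the kind described above might look like the following sketch. The thresholds are our assumption, not the lecture's: we borrow Lipinski's rule of five (molecular weight at most 500 Da, at most 5 hydrogen-bond donors, at most 10 acceptors). The compound names and property values are hypothetical; in practice the properties would be computed with a cheminformatics toolkit such as RDKit.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Compound:
    name: str
    mol_weight: float  # daltons
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors

def passes_filter(c: Compound, max_mw: float = 500.0,
                  max_hbd: int = 5, max_hba: int = 10) -> bool:
    """Keep a compound only if it is within all three (assumed) limits."""
    return c.mol_weight <= max_mw and c.h_donors <= max_hbd and c.h_acceptors <= max_hba

def filter_library(library: List[Compound]) -> List[Compound]:
    return [c for c in library if passes_filter(c)]

library = [  # hypothetical pre-computed properties
    Compound("cmpd_1", 342.4, 2, 5),
    Compound("cmpd_2", 612.7, 3, 8),  # too heavy
    Compound("cmpd_3", 287.3, 6, 4),  # too many donors
    Compound("cmpd_4", 455.5, 1, 7),
]
print([c.name for c in filter_library(library)])  # ['cmpd_1', 'cmpd_4']
```

Keeping the limits as keyword arguments makes it easy to tighten or relax the filter depending on how many compounds the later docking step can afford to handle.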

Finally, virtual screening such as docking is performed, and a few compounds are selected.
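
After docking, each compound has a score, and selecting "a few compounds" amounts to ranking them. This sketch assumes the scores have already been produced by a docking program and follows the convention of tools such as AutoDock Vina, where a more negative score (in kcal/mol) means stronger predicted binding. The compound names and scores are hypothetical.

```python
from typing import Dict, List, Tuple

def top_hits(scores: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Return the n compounds with the most favourable (lowest) docking scores."""
    return sorted(scores.items(), key=lambda item: item[1])[:n]

docking_scores = {  # hypothetical scores in kcal/mol; more negative = better
    "cmpd_1": -7.2,
    "cmpd_4": -9.1,
    "cmpd_7": -6.5,
    "cmpd_9": -8.4,
}
print(top_hits(docking_scores, 2))  # [('cmpd_4', -9.1), ('cmpd_9', -8.4)]
```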

The Tokyo Institute of Technology organized a competition to identify inhibitors for this particular target.

We also took part in it. 120 compounds were identified, of which 50 were tested; we found 4 compounds to be inhibitors, and one of them is possibly a hit compound. In 2015 they held the competition again, and about 2,000 compounds were examined.

In the figure below, you can see how the ligands interact with the protein. The ligand is shown in green and is surrounded by protein residues, and it makes some specific interactions.

These interactions can be hydrogen bonds, hydrophobic interactions or van der Waals interactions. Through them the ligand binds tightly to the protein and acts as an inhibitor of this particular kinase.
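
Interactions like these are usually identified from atomic distances. The sketch below is a simplification under stated assumptions: it treats any nitrogen/oxygen pair within 3.5 Å as a candidate hydrogen bond and any carbon-carbon pair within 4.0 Å as a hydrophobic contact, ignoring the angles and hydrogen positions that real interaction-analysis tools take into account. The atom labels and coordinates are hypothetical.

```python
import math
from typing import List, Set, Tuple

# (atom label, element symbol, (x, y, z) in angstroms)
Atom = Tuple[str, str, Tuple[float, float, float]]

def contacts(ligand: List[Atom], protein: List[Atom],
             elements: Set[str], cutoff: float) -> List[Tuple[str, str, float]]:
    """Ligand-protein atom pairs whose elements are both in `elements`
    and whose distance is at most `cutoff` angstroms."""
    found = []
    for lig_label, lig_elem, lig_xyz in ligand:
        if lig_elem not in elements:
            continue
        for prot_label, prot_elem, prot_xyz in protein:
            if prot_elem in elements:
                d = math.dist(lig_xyz, prot_xyz)  # Euclidean distance (Python 3.8+)
                if d <= cutoff:
                    found.append((lig_label, prot_label, round(d, 2)))
    return found

ligand = [  # hypothetical ligand atoms
    ("LIG O1", "O", (0.0, 0.0, 0.0)),
    ("LIG C2", "C", (1.5, 0.0, 0.0)),
    ("LIG N3", "N", (3.0, 0.0, 0.0)),
]
protein = [  # hypothetical binding-site atoms
    ("GLU O", "O", (0.0, 2.8, 0.0)),
    ("LEU CD1", "C", (1.5, 3.8, 0.0)),
    ("THR OG1", "O", (6.0, 0.0, 0.0)),
]

print("H-bond candidates:", contacts(ligand, protein, {"N", "O"}, 3.5))
print("Hydrophobic contacts:", contacts(ligand, protein, {"C"}, 4.0))
```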

In the next class I will explain structure-based drug design in detail. So far we have discussed some aspects of bioinformatics.

Bioinformatics provides organized databases, and the data in them can be understood through computation. If a hypothesis is formulated, we can develop many functions from it.

If a hypothesis is built by linking features with functions, many algorithms can be developed and used for prediction. These can be made available as online applications on web servers; protein structure prediction and protein function prediction servers are examples. In the same way, DNA binding and protein-DNA interactions can also be studied.

Fourth, we saw the importance of virtual screening in drug development. Nowadays bioinformatics is also used in next-generation sequencing analysis.

Today everyone is interested in personalized medicine. Some time ago it was very difficult to sequence a genome; now it has become very cheap, so everyone wants to know about their own genome: which proteins it encodes, what those proteins do, and which mutations they may carry. As a result there is now a great deal of data from next-generation sequencing, for example Illumina sequencing.

So we can make further advances in sequencing. A lot of data is now available in the literature, but the question is how to analyze it and how to extract information from it.

There are many ways to obtain a sequence, and the final sequence can be obtained from a very small number of reads.
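
One simple way to turn overlapping reads into a final sequence is greedy overlap assembly. This is a toy sketch, not how production short-read assemblers work (those typically use de Bruijn graphs and must cope with sequencing errors and repeats): it repeatedly merges the pair of reads with the longest suffix-prefix overlap. The reads are hypothetical and error-free.

```python
from typing import List

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    or 0 if no such overlap is at least `min_len` long."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of reads with the largest overlap.
    Returns the resulting contigs (a single string if everything joined up)."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: keep the remaining contigs separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

reads = ["ATGGCGT", "GCGTACG", "TACGTTA", "GTTAGC"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

The greedy choice is easy to follow but can mis-assemble repetitive sequence, which is one reason real assemblers use graph-based methods.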

For example, if a patient is affected by cancer, Parkinson's disease or Alzheimer's disease, their genome and proteome can be studied and compared with those of a healthy person, looking specifically at variations and mutations.
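
At its simplest, comparing a patient's sequence with a healthy reference means listing the positions where they differ. This sketch assumes the two sequences are already aligned and of equal length, so it only finds single-base substitutions; insertions and deletions would need a proper alignment step first. Both sequences are hypothetical.

```python
from typing import List, Tuple

def find_substitutions(reference: str, patient: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, patient base) for every mismatch.
    Assumes the two sequences are already aligned and of equal length."""
    if len(reference) != len(patient):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, r, p) for i, (r, p) in enumerate(zip(reference, patient)) if r != p]

healthy = "ATGGCGTACGTTAGC"  # hypothetical reference segment
patient = "ATGGCATACGTTGGC"  # hypothetical patient segment
print(find_substitutions(healthy, patient))  # [(6, 'G', 'A'), (13, 'A', 'G')]
```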

One can also check whether these mutations lie in protein-coding or non-coding regions.
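
Deciding whether a mutation falls in a coding or non-coding region is an interval lookup. In this sketch the coding regions are hypothetical 1-based, inclusive intervals; real annotations would come from a gene annotation file for the reference genome.

```python
from typing import List, Tuple

def annotate(positions: List[int],
             coding_regions: List[Tuple[int, int]]) -> List[Tuple[int, str]]:
    """Label each mutation position as 'coding' or 'non-coding'."""
    def in_coding(pos: int) -> bool:
        return any(start <= pos <= end for start, end in coding_regions)
    return [(pos, "coding" if in_coding(pos) else "non-coding") for pos in positions]

exons = [(100, 250), (400, 520)]  # hypothetical coding intervals (1-based, inclusive)
print(annotate([120, 300, 518], exons))
# [(120, 'coding'), (300, 'non-coding'), (518, 'coding')]
```

A linear scan is fine for a handful of intervals; for whole-genome annotation one would sort the intervals and use binary search or an interval tree.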

The importance of these mutations or residues can then be assessed by checking whether they are involved in various pathways and how they affect different diseases.

This information can further help in diagnosis. For example, if you suffer from cancer or from a particular disease, you can be diagnosed. Scientists can go to hospitals, collect patient data together with information on drugs and drug responses, and build a database. Based on this information, particular variations can be studied and a specific drug can be given.
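
A patient and drug-response database of the kind described above can be sketched as a lookup from variant to observed drug responses. Everything here is hypothetical (variant IDs, drug names, responses), and the selection rule is our simplification, not a clinical guideline: recommend drugs with a good response for some variant the patient carries, unless any of the patient's variants is linked to a poor or adverse response to that drug.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical records collected from hospitals: (variant, drug, observed response)
records: List[Tuple[str, str, str]] = [
    ("VAR_A", "drug_X", "good"),
    ("VAR_A", "drug_Y", "poor"),
    ("VAR_B", "drug_Y", "good"),
    ("VAR_C", "drug_Z", "adverse"),
]

Index = Dict[str, List[Tuple[str, str]]]

def build_index(rows: List[Tuple[str, str, str]]) -> Index:
    """Group (drug, response) pairs by variant."""
    index: Index = defaultdict(list)
    for variant, drug, response in rows:
        index[variant].append((drug, response))
    return index

def suggest_drugs(patient_variants: List[str], index: Index) -> Set[str]:
    """Drugs with a good response for some variant the patient carries,
    minus any drug linked to a poor or adverse response for any of their variants."""
    good: Set[str] = set()
    avoid: Set[str] = set()
    for variant in patient_variants:
        for drug, response in index.get(variant, []):
            if response == "good":
                good.add(drug)
            else:
                avoid.add(drug)
    return good - avoid

index = build_index(records)
print(sorted(suggest_drugs(["VAR_A", "VAR_B"], index)))  # ['drug_X']
```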

In this way, complete information about a patient can be very useful for personalized medicine. Bioinformatics thus plays an important role in human health and is also particularly important in the field of medicine.
