Introduction
Biology is in the middle of a major paradigm shift
driven by computing technology. Although it is already an informational science
in many respects, the field is rapidly becoming much more computational
and analytical. Rapid progress in genetics and biochemistry research, combined
with the tools provided by modern biotechnology, has generated massive volumes of
genetic and protein sequence data.
Bioinformatics has been defined as a means for
analysing, comparing, graphically displaying, modeling, storing, systemising,
searching, and ultimately distributing biological information, which includes
sequences, structures, function, and phylogeny. More broadly, bioinformatics may be
defined as a discipline that generates computational tools, databases, and
methods to support genomic and postgenomic research. It comprises the study of
DNA structure and function, gene and protein expression, protein production,
protein structure and function, genetic regulatory systems, and clinical applications.
Bioinformatics draws on expertise from Computer Science, Mathematics,
Statistics, Medicine, and Biology.
Knowledge Base in Biology
Over the last ten years or so, numerous innovations have emerged, and the
consequence is a new biological research paradigm, one that is information-heavy
and computer-driven. As genetic information is stored in computerized databases
whose sizes are steadily growing, molecular biologists need effective and
efficient computational tools to store and retrieve related bibliographic and
biological information from these databases, to analyze the sequence patterns
they contain, and to extract the biological knowledge the sequences encode. In
addition, there is a strong need for mathematical methods and computational
techniques for challenging tasks such as predicting the three-dimensional
structure of the molecules the sequences represent and constructing
evolutionary trees from sequence data. These tools will also help answer basic
questions in biology, such as which DNA sequences code for proteins and which are
not used for protein synthesis, leading to a greater understanding of genes and
how they influence diseases.
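The question of which DNA sequences code for proteins is often first approached by looking for open reading frames (ORFs). The following is a minimal Python sketch, not a real gene finder. It scans only the three forward reading frames of one strand and treats an ORF as an ATG start codon followed in frame by the first stop codon (TAA, TAG, or TGA). It ignores introns, the reverse strand, and statistical evidence. The function name `find_orfs` and the `min_codons` threshold are illustrative assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of ORFs on the forward strand.

    An ORF here is ATG ... stop codon read in a single frame; `end` is
    exclusive and includes the stop codon. Only ORFs with at least
    `min_codons` codons (start and stop included) are reported.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # the three forward reading frames
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop codon.
                j = i + 3
                while j + 3 <= len(dna) and dna[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 <= len(dna):  # a stop codon was found
                    end = j + 3
                    if (end - i) // 3 >= min_codons:
                        orfs.append((i, end))
                    i = end  # resume scanning after this ORF
                    continue
                break  # no in-frame stop codon before the end of the sequence
            i += 3
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAATTTTAGGG")` returns `[(2, 14)]`, the span of `ATGAAATTTTAG`. ATG codons nested inside an ORF that has already been found are skipped, which keeps the sketch simple.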
Biology employs a digital language for representing its
information, using an alphabet of four bases (A, C, G, T). All the chromosomes in
an organism's cells can be represented and identified using this
alphabet. The demanding challenge here is to determine how this digital
language of the chromosomes is converted into the three-dimensional and
sometimes four-dimensional languages of living and breathing organisms.
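Because the DNA alphabet has exactly four letters, each base carries two bits of information. A sequence can therefore be stored digitally at 2 bits per base. The minimal Python sketch below packs a sequence into a single integer and unpacks it again. The mapping A=0, C=1, G=2, T=3 and the names `pack` and `unpack` are arbitrary assumptions for illustration, not a standard file format.

```python
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit code for each base
DECODE = "ACGT"                              # inverse of ENCODE, indexed by code


def pack(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per base (first base highest)."""
    value = 0
    for base in seq.upper():
        if base not in ENCODE:
            raise ValueError(f"not a DNA base: {base!r}")
        value = (value << 2) | ENCODE[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover a DNA string of the given length from its packed form."""
    bases = []
    for _ in range(length):
        bases.append(DECODE[value & 0b11])  # lowest two bits = last base
        value >>= 2
    return "".join(reversed(bases))
```

For instance, `pack("GATTACA")` gives 9156, and `unpack(9156, 7)` recovers `"GATTACA"`. The length has to be kept alongside the integer, because leading `A`s encode to zero bits and would otherwise be lost.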
Information Technology in Biology
Because performing all of the tasks above manually is nearly impossible, given the
massive volumes of biological data and the precision the work demands, using
computers for these purposes has become essential. Bioinformatics therefore deals
with designing and deploying efficient software tools that accomplish these tasks
quickly and precisely. Bridging the gap between the real world of biology and the
precise, logical nature of computers requires an interdisciplinary perspective.
Software and Hardware Advancements in Biology
The tools of computer science, statistics, and
mathematics are critical for studying biology as an informational science.
Recent advances include improved DNA sequencing methods, new approaches to
identifying protein structure, and revolutionary methods for monitoring the
expression of many genes in parallel. The design of techniques able to deal with
different sources of incomplete and noisy data has become another crucial goal
for the bioinformatics community. In addition, computational solutions need to be
built on theoretical frameworks that allow scientists to perform complex
inferences about the phenomena under study.
In the recent past, genomics has triggered the development of high-throughput
instrumentation for DNA sequencing, DNA arrays, genotyping, proteomics, and more.
These instruments have catalyzed a new type of biological science termed
discovery science.
Human Genome Project - An Introduction
The Human Genome Project has driven a series of
paradigm changes toward the view that biology is an informational science. The draft
of the human genome has given us a genetic parts list of what is necessary for
building a human: approximately 35,000 genes, their regulatory regions, a
lexicon of motifs that are the building-block components of proteins and genes,
and access to the human variability that makes each of us different from one another.
Genomes - Discovery Methodology and Study
Discovery science aims to define all of the elements in a
biological system: for example, the sequence of the genome and the identification
and quantitation of all of the mRNAs or proteins in a particular cell type, which
constitute, respectively, the genome, the transcriptome, and the proteome.
Discovery science creates databases of information, in contrast to the more
classical hypothesis-driven science that formulates hypotheses and attempts to
test them. The high-throughput tools both provide the means for discovery science
and can assay how global information sets, for example transcriptomes or
proteomes, change as systems are perturbed.
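One common way to quantify how a transcriptome changes under a perturbation is the per-gene log2 fold change between a baseline and a perturbed expression profile. The Python sketch below is a minimal illustration under stated assumptions. The expression values are toy numbers for hypothetical genes, not real measurements. The pseudocount of 1 and the 2-fold cutoff are arbitrary choices, and real analyses also model replicate variability and statistical significance.

```python
import math


def log2_fold_changes(baseline: dict[str, float],
                      perturbed: dict[str, float],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """Per-gene log2 ratio of perturbed to baseline expression.

    Only genes measured in both profiles are compared. The pseudocount
    avoids division by zero for genes with no detected expression.
    """
    shared = sorted(baseline.keys() & perturbed.keys())
    return {
        gene: math.log2((perturbed[gene] + pseudocount)
                        / (baseline[gene] + pseudocount))
        for gene in shared
    }


def changed_genes(fold_changes: dict[str, float],
                  threshold: float = 1.0) -> list[str]:
    """Genes whose |log2 fold change| is at least `threshold` (1.0 = 2-fold)."""
    return [g for g, fc in fold_changes.items() if abs(fc) >= threshold]


# Toy expression counts for hypothetical genes (not real measurements).
baseline = {"geneA": 7.0, "geneB": 15.0, "geneC": 63.0}
perturbed = {"geneA": 31.0, "geneB": 15.0, "geneC": 15.0}
fc = log2_fold_changes(baseline, perturbed)
print(fc)                 # {'geneA': 2.0, 'geneB': 0.0, 'geneC': -2.0}
print(changed_genes(fc))  # ['geneA', 'geneC']
```

Working on the log2 scale makes increases and decreases symmetric: a 4-fold rise and a 4-fold fall come out as +2 and -2.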
The genomes of model organisms such as yeast, worm, and fly have demonstrated
that the basic informational pathways are fundamentally conserved across all
living organisms. Hence systems can be perturbed in model organisms to gain insight
into their functioning, and these data will provide fundamental insights into human
biology. From the genome, informational pathways and networks can be extracted to
begin understanding the logic of life. Furthermore, different genomes can be
compared to identify similarities and differences in their strategies for the logic
of life, and these comparisons provide fundamental insights into development,
physiology, and evolution. The first eukaryotic genome to be fully sequenced and
annotated was that of Saccharomyces cerevisiae, and it has been highly valuable for
developing biological and computational tools for genomic and postgenomic research.
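As a toy illustration of comparing genomes, the Python sketch below computes the Jaccard similarity of the sets of k-mers (substrings of length k) found in two sequences. This is a deliberately simple, alignment-free stand-in chosen for illustration. Real comparative genomics relies on alignment, synteny, and ortholog detection. The value k = 3 and the function names are assumptions.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_jaccard(a: str, b: str, k: int = 3) -> float:
    """Fraction of k-mers shared by two sequences: |A & B| / |A | B|."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        raise ValueError("both sequences are shorter than k")
    return len(ka & kb) / len(union)
```

For example, `ACGTA` and `ACGTT` share the 3-mers `ACG` and `CGT` out of four distinct 3-mers in total, so `kmer_jaccard("ACGTA", "ACGTT")` is 0.5. Identical sequences score 1.0, and sequences with no k-mers in common score 0.0.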
In the era of automated DNA sequencing and
revolutionary advances in DNA sequence analysis, the attention of many
researchers is shifting away from the study of single genes or small gene
clusters toward whole-genome analyses. Knowing the complete sequence of a genome is
only the first step in understanding how the wealth of information contained
within the genes is transcribed and ultimately translated into functional
proteins. In the postgenomic era, functional genomic and proteomic studies
help to build a picture of the dynamic cell.
Systems Biology
Biology is a highly informational science. There are
two main types of biological information:
- The information of genes or proteins, which are the
molecular machines of life
- The information of the regulatory networks that
coordinate and specify the expression patterns of the genes and proteins.
All biological information is hierarchical. DNA is transcribed into mRNA, which
in turn is translated into protein. Proteins take part in protein interactions,
which create informational pathways. These pathways form informational networks,
which in turn make up cells. Cells form networks of cells, and an individual is a
collection of cells. Individuals form populations, and a variety of populations
form ecologies. This hierarchy poses a primary challenge for researchers and
scientists: to create tools and mechanisms that capture these different levels of
biological information and integrate them to gain insight into how they function.
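The first levels of this hierarchy, DNA to mRNA to protein, can be sketched in a few lines of Python. This is a minimal illustration assuming the input DNA is the coding (sense) strand, so transcription reduces to replacing T with U. Translation uses the standard genetic code, starting at the first AUG and stopping at the first stop codon. Splicing, alternative genetic codes, and ambiguous bases are not handled.

```python
from itertools import product

# Standard genetic code: codons enumerated in UCAG order ("*" marks stop).
_BASES = "UCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def transcribe(coding_dna: str) -> str:
    """DNA -> mRNA, assuming the input is the coding (sense) strand."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein in one-letter codes, from the first AUG to the first stop."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no protein
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]  # raises KeyError on non-ACGU codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate(transcribe("CCATGGCCTTTTAAGG"))` yields `"MAF"` (methionine, alanine, phenylalanine), with the UAA stop codon ending translation.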
All of these paradigm shifts lead to the view that the
major challenges for biology and medicine in this new century will be the study
of complex systems and the development of the approaches needed to study these
biological complexities. One viable approach is as follows:
- Identify all elements of the system, such as the sequence of its genome,
with currently available discovery tools.
- Use current knowledge of the system to formulate a
model predicting its behavior.
- Perturb the system in a model organism using
biological, genetic, or environmental perturbations; capture information at
all relevant levels, such as DNA, mRNA, protein, and protein interactions;
and integrate the collected information.
- Compare theoretical predictions and experimental
data, carry out additional perturbations to bring theory and experiment into
closer apposition, and integrate the new data into the model.
- Iterate the third and fourth steps until the mathematical
model can predict the structure of the system and its systems or emergent
properties given particular perturbations.
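The perturb, compare, and integrate cycle of this approach can be illustrated with a deliberately tiny Python sketch. All names here are hypothetical. The "system" is a stand-in function with a hidden response rate, and the "model" is a single proportional rate. Each round refines that rate with one gradient-descent step on the squared prediction error, stopping once predictions match the data within a tolerance. Real systems-biology models have many interacting parameters, so this captures only the shape of the loop, not a realistic method.

```python
def true_system(perturbation: float) -> float:
    """Hypothetical stand-in for the real system; its rate (2.5) is unknown to the modeller."""
    return 2.5 * perturbation


def predict(rate: float, perturbation: float) -> float:
    """Current model: the response is proportional to the perturbation."""
    return rate * perturbation


def refine_model(rate: float, perturbations: list[float],
                 learning_rate: float = 0.1, tol: float = 1e-6,
                 max_rounds: int = 1000) -> tuple[float, int]:
    """Iterate perturb -> compare -> integrate until predictions match the data."""
    n = len(perturbations)
    for round_no in range(1, max_rounds + 1):
        # Perturb the system and capture its responses.
        observed = [true_system(p) for p in perturbations]
        # Compare the model's predictions with the experimental data.
        predicted = [predict(rate, p) for p in perturbations]
        mse = sum((o - m) ** 2 for o, m in zip(observed, predicted)) / n
        if mse < tol:
            return rate, round_no
        # Integrate the new data: one gradient-descent step on the mean squared error.
        grad = -2 * sum((o - m) * p for o, m, p in
                        zip(observed, predicted, perturbations)) / n
        rate -= learning_rate * grad
    return rate, max_rounds


rate, rounds = refine_model(0.0, [0.5, 1.0, 1.5])
print(f"estimated rate {rate:.3f} after {rounds} rounds")
```

Starting from a rate of 0, the estimate converges toward the hidden value of 2.5 within a few dozen rounds. The `max_rounds` cap guards against a model that never agrees with the data.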
Systems Biology - Challenges Ahead
- The integration of technology, biology, and
computation.
- The integration of the various levels of biological
information and their modeling.
- The proper annotation of biological information and
its storage and integration in databases.
- The inclusion of other molecules, large and small,
in the systems approach.
- The integration imperatives of systems biology
present many challenges to industry and academia.
Conclusion
With the confluence of biology and computer science,
computer applications in molecular biology are drawing greater attention
from life science researchers and scientists. As it becomes
imperative for biologists to seek the help of information technology
professionals to meet the ever-growing computational requirements of a
host of exciting and demanding biological problems, the synergy between modern
biology and computer science is set to blossom in the days to come. The
research scope for mathematical techniques and algorithms, coupled with
programming languages and software development and deployment tools, is set to
get a real boost. In addition, information technologies such as databases,
middleware, graphical user interface (GUI) design, distributed object computing,
storage area networks (SAN), data compression, networking and communications, and
remote management are all set to play a critical role in advancing the
goals for which the bioinformatics field came into existence.