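I wrote this short piece in the first half of the year as an introduction to bioinformatics; I am including it here as a bit of background on biological computing.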
李淑召
This article is a brief review of the field of bioinformatics. Because of its interdisciplinary nature and vague definition, any description of bioinformatics will inevitably involve many technical terms. I shall try my best to make it accessible to a broad audience, including technical professionals and business people.
At the time of writing, there is a good entry for bioinformatics on Wikipedia (http://en.wikipedia.org/wiki/Bioinformatics). The Wikipedia article focuses on new research activities in the “post-genomic” era by laying out the major research frontiers (which is done with great academic spirit and curiosity), but it fails to give an overall picture of the field. Bioinformatics was an established field before the genomics boom, both as an academic discipline and as an industry. Besides its coverage, that article could also be more accurate and clearer in its biology.
Bioinformatics is a discipline that applies computing tools to biological research.
Biology is about information: how hereditary information is transmitted from one generation to the next, how genomic codes are executed through various molecules, how an error in a gene sequence causes a functional defect, how molecular specificity can be used to isolate a piece of information, and a lot more. The central dogma of molecular biology, as one may recall from high school biology, is essentially a mechanism of information transfer: genetic information flows from DNA to RNA and then to proteins. However, decades passed, with molecular biologists working hard on the major components of cells, before enough data had accumulated to reveal the digital nature of biology. The first problem bioinformatics faced, and still faces, is sequence handling. (To get a feel for it, try to read the 4.6 million letters of the E. coli genome at http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=4917599.... Do it only if you have a fast internet connection and some strong coffee at hand.) Intensive work has gone into developing algorithms, computer programs and databases to handle such sequences. When the time came to assemble the human genome in 2000, Celera Genomics had to build one of the fastest supercomputers of the time, and the public Human Genome Project used a cluster of 100 Linux computers at UCSC. Today, sequence retrieval, analysis and processing have become part of the daily job in biological research.
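To make the idea of information flow concrete, here is a minimal Python sketch of the central dogma applied to a toy sequence. The function names and the example sequence are my own illustrations and do not come from any particular package. The sketch assumes the standard genetic code and a coding-strand DNA input that is read in frame from its first letter.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # "*" marks a stop codon


def transcribe(dna):
    """DNA (coding strand) -> mRNA: same letters, with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """mRNA -> protein, read codon by codon until a stop codon or the end.

    Assumes the sequence contains only A, C, G and U; anything else
    raises a KeyError in this simple sketch.
    """
    seq = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCTAA"            # toy gene: start codon, one codon, stop codon
mrna = transcribe(dna)       # "AUGGCCUAA"
print(translate(mrna))       # prints "MA"
```

Real genomes add many complications (two strands, introns, alternative genetic codes), which is part of why sequence handling became a field of its own.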
Historically, bioinformatics was involved early in structural biology. Solving protein structures from either X-ray crystallography or NMR data requires intensive computing. The deposition of structures (and, earlier, amino acid sequences) led to the birth of protein databases. The visualization and simulation of structural models also drove a good part of both the hardware and software markets (e.g. Insight, a leading molecular visualization program, running on an SGI workstation was a common setup).
Molecular evolution and genetics were also linked to bioinformatics early on. Evolution has always been at the center of biological theory. As bioinformatics itself evolved as a discipline, hallmark software packages like PHYLIP and GCG appeared. (The latter was also developed commercially.) Now SNP analysis, a branch of genetics, has also become a hot topic in bioinformatics.
But sequence handling is by far the most popular task biologists have carried out since the dawn of the PC age. A typical application is the design of PCR primers. Although many free academic programs have been developed, companies like ABI and Biosoft still charge a premium for primer design software, especially alongside the advance of fluorescent quantitative PCR technology. Another routine task for biologists is to “BLAST” a sequence against GenBank, which is now an integrated part of NCBI's giant databases. (“BLAST” is a popular program for searching for similar biological sequences.)
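As a rough illustration of the kind of calculation a primer design program performs, the sketch below computes GC content and an approximate melting temperature using the Wallace rule (2 °C for each A or T plus 4 °C for each G or C). This is a rule of thumb that is only reasonable for short oligonucleotides. The function names and the example primer are hypothetical; real tools use more refined thermodynamic models and many additional checks.

```python
def gc_content(seq):
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Approximate melting temperature in degrees Celsius (Wallace rule)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Sequence of the opposite strand, written 5' to 3'."""
    complement = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(complement)[::-1]


# Hypothetical 20-base primer, made up for illustration only.
primer = "AGCGTTACGGATCCATGCAA"
print(gc_content(primer))          # fraction of G and C
print(wallace_tm(primer))          # rough Tm estimate
print(reverse_complement(primer))  # what the primer pairs with on the template
```

A reverse primer is usually designed as the reverse complement of the far end of the target region, which is why that helper is included here.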
As these bioinformatics services began to mature, the boom in genomics and functional genomics since late last decade brought many new challenges and opportunities to bioinformatics. The huge amount of new genomic sequence holds enough puzzles to entertain a whole generation of scientists. Large-scale data are also pouring in from DNA microarrays, proteomics, antibody arrays, Y2H, ChIP-on-chip and many other high-throughput technologies. Hence the new frontiers of processing, analyzing and modeling these data, as partly described by that Wikipedia article. The informatics around these new technological platforms is a significant part of biotech development. Integrated models (systems biology) are also trying to get a share of the drug development business.
Like any other organization, big biotech companies and research institutions need good IT infrastructure. That belongs to conventional IT rather than to bioinformatics. Computation can also be deeply involved in, and even drive, research questions of a biological nature. In such cases the term “computational biology” is often used, though it is sometimes used interchangeably with “bioinformatics”. So by training, a bioinformatician is expected to run databases, program for research-related tasks and use the major bioinformatics programs. From the biologists' point of view, bioinformatics is mostly the computing services they use to carry out research. Last but not least, bioinformatics is deeply rooted in resource sharing over the internet.
In a broad sense, some areas of bioinformatics also overlap with bio-IT, which I shall elaborate on in a companion article.