Sunday, December 15, 2013

Computational science: the most powerful tool to understand the world

As its name suggests, computational science is a way to study the world with computational methods. In other words, everything is stored as "1"s and "0"s and manipulated by computer algorithms. Basically, this process consists of two parts: constructing a mathematical model and performing quantitative analysis.

People might doubt the accuracy of a computational model, since it is discrete while real-world objects are mostly continuous, without gaps. This concern is valid, because computational models are fundamentally built from only two discrete states, "0" and "1". However, it is also this discrete representation that gives computational methods their high efficiency when solving problems modeled this way. Fortunately, computational scientists have developed various digitization methods that trade off accuracy against efficiency. Consider the difference between a normal screen and the Retina screen of an iPad: you might think the Retina screen represents an image more accurately than the older one, but both use huge numbers of discrete points to model the picture. The only difference is that the Retina screen uses many more points than the normal one. The more accurate the model we construct, the more data we must store and the more complex the problem we must solve. Besides digitization, good mathematical modeling needs to consider many other issues, such as boundary conditions, outliers, and choosing a model structure that matches the real problem. Some parametric models also need model fitting to determine the values of their parameters.
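The resolution trade-off can be sketched in a few lines of Python: sample a continuous curve (here `math.sin`, standing in for any continuous quantity) on a coarse grid and on a fine grid, then measure how far the nearest stored sample strays from the true curve. The function names and grid sizes below are my own invention, purely for illustration.

```python
import math

def digitize(f, a, b, n):
    """Sample a continuous function f on [a, b] at n evenly spaced points."""
    step = (b - a) / (n - 1)
    return [f(a + i * step) for i in range(n)]

def reconstruction_error(f, a, b, n, probes=1000):
    """Worst-case error when approximating f by nearest-sample lookup."""
    samples = digitize(f, a, b, n)
    step = (b - a) / (n - 1)
    worst = 0.0
    for k in range(probes + 1):
        x = a + (b - a) * k / probes
        i = min(int(round((x - a) / step)), n - 1)  # nearest stored sample
        worst = max(worst, abs(f(x) - samples[i]))
    return worst

coarse = reconstruction_error(math.sin, 0.0, 2 * math.pi, 16)
fine = reconstruction_error(math.sin, 0.0, 2 * math.pi, 256)
# The finer grid stores 16x more data but approximates the curve much better.
```

The fine grid behaves like the Retina screen: same discrete idea, just many more points, at the cost of proportionally more data.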

Ref: Biocomputation Research, http://simbios.stanford.edu/BioAlg.htm



With a good computational model, we can perform quantitative analysis to answer a given question. For example, given a network model of an integrated circuit and a timing model of each gate in the circuit, we can calculate the longest delay through the whole circuit at a given environmental temperature. This technique is very useful to chip-design companies, since otherwise they could not determine whether a chip functions well until after manufacture. To realize this simulation, we need a suitable algorithm to traverse the whole network graph, and at each node we perform numerical integration to solve non-linear equations in variables such as current and voltage. A similar process can be used to simulate biological systems, such as a metabolic system. Once we obtain a sufficiently accurate model, we can run simulations at little cost and at any speed: in less than an hour, we can see how a circuit will behave after thirty years, or learn the evolutionary status of a living organism. In this way, computational science provides us an unprecedented tool to study the unknown world, and I can't wait to see more amazing achievements in these fields.
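The graph-traversal part of the longest-delay calculation can be sketched as follows. The gate names, topology, and delay values below are invented toy data, and real static timing analysis also involves the per-gate numerical integration described above; this only shows the traversal idea.

```python
# Toy gate-level netlist: each gate maps to (delay, list of fan-in gates).
# Names and delay values are made up for illustration.
circuit = {
    "in1":  (0.0, []),
    "in2":  (0.0, []),
    "and1": (1.2, ["in1", "in2"]),
    "or1":  (0.8, ["in1"]),
    "xor1": (1.5, ["and1", "or1"]),
    "out":  (0.3, ["xor1"]),
}

def arrival_time(gate, memo):
    """Longest accumulated delay from any circuit input to this gate."""
    if gate in memo:
        return memo[gate]
    delay, fanin = circuit[gate]
    t = delay + (max(arrival_time(g, memo) for g in fanin) if fanin else 0.0)
    memo[gate] = t
    return t

# Critical path here is in1/in2 -> and1 -> xor1 -> out: 1.2 + 1.5 + 0.3
critical_delay = arrival_time("out", {})
```

Memoization makes each gate's arrival time computed once, so even a large netlist is traversed in time linear in its size.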

Computational biology: explore the magic of life by the power of computing

Life science is a field that places great weight on experimental data. With the fast progress of high-throughput genome sequencers, huge amounts of sequencing data are accumulating. This brings big opportunities for treating many tough diseases, but digging valuable knowledge out of these data requires the help of computational biology.

As mentioned above, the first step in exploring a living system is to understand its building blocks, such as the genome. A gene, written as a sequence of the four letters "ATCG", contains the rules that guide the production of proteins, which build the living system and modulate its operation. The genome, as a sequence of genes, is the container of these rules. To obtain an accurate genome sequence, biologists designed machines that extract DNA from tissue, truncate it into small parts, and record the code in each part. Just like a jigsaw puzzle, the recorded parts then need to be connected to each other to present the whole picture of the genome. For a common crop such as wheat, we need to analyze about 5 gigabytes of fragment data to extract a genome sequence of around 200 megabytes. This is far beyond human ability, and the only solution is to rely on software and algorithms to store the data and perform the analysis automatically.
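The jigsaw-puzzle step can be sketched as a greedy overlap assembler: repeatedly glue together the two fragments that overlap the most. The fragments and the tiny "genome" below are invented toy data; real assemblers must also cope with sequencing errors, repeats, and billions of reads.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, 0, 1)  # (overlap length, i, j)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)]
        reads.append(merged)
    return reads[0]

# Overlapping fragments of the toy sequence "ATCGGATTCA":
genome = greedy_assemble(["ATCGGA", "GGATTC", "TTCA"])
```

Each merge consumes the shared overlap once, so the three pieces reassemble into the original ten-letter sequence.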

Sequencing to identify the secrets of life (ref: http://en.wikipedia.org/wiki/RNA_sequencing)

Based on an accurate genome, we can further study the evolution of living systems. For example, by comparing the genome sequences of people living in one area with those of others across the world, we can find specific features that may help the study of hereditary diseases particular to that area. This kind of research benefits from existing computer-science research such as text mining. State-of-the-art database techniques such as Hadoop have also been employed to store huge data sets and support parallel data access efficiently [1].
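As a toy illustration of comparing sequences computationally, here is a k-mer (length-k substring) similarity sketch. The two short "genomes" are invented, and real comparative-genomics pipelines are vastly more sophisticated, but k-mer sets are a genuine workhorse of sequence comparison.

```python
def kmers(seq, k=3):
    """Set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b, k=3):
    """Jaccard similarity of the two sequences' k-mer sets (0..1)."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb)

# Two toy sequences differing at one position; real comparisons work
# on sequences millions of letters long.
s1 = "ATCGGATTCAGG"
s2 = "ATCGGCTTCAGG"
similarity = jaccard(s1, s2)
```

Because set operations are cheap and parallelize well, this style of comparison scales to the huge data volumes mentioned above.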

As a dynamic system, a living organism always maintains a complex metabolic network, which consists of a set of chemical reactions and the enzymes that modulate them. Computational biology can also help simulate such a system with a network model. Put simply, each node in the network represents a reactant or product, and each edge represents a reaction. Each reaction can be modeled with differential equations whose variables are the concentrations of reactants and products. By setting suitable boundary conditions on the network model, we can simulate the whole metabolic system. Even more interesting is simulating the result of manual intervention: without a long wait for wet-lab experiments, we can predict the organism's behavior after modulating the environmental conditions or the enzyme types, which is a great benefit of computational models and simulation algorithms.
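This differential-equation modeling can be sketched with a toy two-step pathway A → B → C, integrated with the simple forward-Euler method. The rate constants stand in for enzyme activity and are invented for illustration; lowering one mimics the "manual intervention" of knocking down an enzyme.

```python
def simulate(k1, k2, a0, b0, c0, dt=0.01, steps=1000):
    """Forward-Euler integration of the toy pathway A -> B -> C.

    dA/dt = -k1*A,  dB/dt = k1*A - k2*B,  dC/dt = k2*B
    The rate constants k1, k2 stand in for enzyme activity.
    """
    a, b, c = a0, b0, c0
    for _ in range(steps):
        da = -k1 * a
        db = k1 * a - k2 * b
        dc = k2 * b
        a, b, c = a + da * dt, b + db * dt, c + dc * dt
    return a, b, c

# Baseline run versus a "knocked-down enzyme" run (much smaller k2):
baseline = simulate(k1=1.0, k2=0.5, a0=1.0, b0=0.0, c0=0.0)
knockdown = simulate(k1=1.0, k2=0.05, a0=1.0, b0=0.0, c0=0.0)
# With the second enzyme slowed, the intermediate B accumulates.
```

Note that the three rates sum to zero at every step, so total mass is conserved, a quick sanity check on the model. Real metabolic simulators use stiffer, adaptive ODE solvers, but the structure is the same.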

What I listed here are only a few examples of how computing can be used in biology research. I believe this is a promising research area that is just at its beginning.

Reference
[1] Charles Schmitt, UNC Big Data Analytics Stories: Genomic Sequencing, http://www.intel.com/content/www/us/en/big-data/renci-peer-story.html

Sunday, December 8, 2013

Image processing: how to find the principal content automatically


Image processing is a very specific research area in computer science, yet it is also a very popular topic due to its magical applications in daily life. You might not be an expert on complex image-processing algorithms, but you are probably familiar with the iPhone's panorama feature, which helps you take an amazing panorama photo as you sweep the iPhone around. The iSight camera takes several photos during the process, and image-processing algorithms seamlessly and automatically stitch them together.

To accomplish a job like the one described above, image processing needs to "understand" the content of a photo. Most of the time, understanding the principal content is a practical way to go. Here we do not depend on artificial intelligence, although we mirror the way human beings interpret an image. Think about what you notice when you see the sun rising over the sea: it is the sea level and the outline of the sun, since these areas contain big changes in color, brightness, or shape. These areas are the "principal content" of the photo, the parts needed to describe the image economically. Automatically identifying the "principal content" of a photo helps us compare two images and find their shared part, which is the critical step when stitching photos together.
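The idea that "big changes" mark the interesting areas can be sketched as a simple gradient test on a toy grayscale image. The pixel values below are made up: a dark region meets a bright region, and pixels where brightness jumps between neighbors are flagged as edges.

```python
# Toy grayscale "image": a dark region (10) next to a bright region (200).
image = [
    [10, 10, 10, 200, 200],
    [10, 10, 10, 200, 200],
    [10, 10, 10, 200, 200],
    [10, 10, 10, 200, 200],
]

def edge_strength(img, r, c):
    """Magnitude of the brightness change at interior pixel (r, c),
    using simple horizontal and vertical central differences."""
    dx = img[r][c + 1] - img[r][c - 1]
    dy = img[r + 1][c] - img[r - 1][c]
    return (dx * dx + dy * dy) ** 0.5

# Interior pixels with a large gradient lie on the dark/bright boundary:
edges = [(r, c)
         for r in range(1, len(image) - 1)
         for c in range(1, len(image[0]) - 1)
         if edge_strength(image, r, c) > 50]
```

Only the pixels straddling the vertical boundary are flagged; the flat regions on either side produce zero gradient, which is exactly why edges describe the image so economically.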

The first eight eigenfaces abstracted from a principal component analysis (PCA) of the Ekman and Friesen (1976) faces [1]

How to automatically identify the "principal content" is the critical problem if we want machines to perform many image-processing jobs for us. The basic algorithm for this task is Principal Component Analysis (PCA) [2], which was invented in 1901 by Karl Pearson and has been applied in statistics for a long time. I don't want to discuss the detailed mathematics of these algorithms, but I would like to talk a little about the basic idea. Consider how we describe a point in a 2-dimensional graph: a pair of x and y coordinates is enough to describe the point uniquely, without any redundant information. For an object in 3-dimensional space, such as a line, we naturally use a combination of x, y, and z coordinates. The reason for choosing these two or three mutually perpendicular axes is that they have the best ability to describe an object in the given space. In other words, these perpendicular axes are the "principal components" that describe the object in the most economical way. PCA is an algorithm to identify such "principal components" inside a picture. To apply it to an image, we first digitize the image so that it is represented by a matrix. PCA then performs a matrix decomposition called SVD (singular value decomposition) [3] and determines which set of "axes" (eigenvectors) are the most important to keep in order to describe the photo most economically.
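A minimal PCA sketch on 2-D points, with invented sample data: compute the covariance matrix and take the eigenvector of its largest eigenvalue, which is the single axis that describes the data most economically. For this tiny 2×2 case the eigenproblem is solved in closed form rather than with a full SVD, but the idea is the same one applied to image matrices.

```python
import math

def principal_axis(points):
    """First principal component of 2-D points: the unit direction along
    which the data varies most, from the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Largest eigenvalue of [[sxx, sxy], [sxy, syy]] in closed form:
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    # A corresponding (unnormalized) eigenvector is (lam - syy, sxy):
    vx, vy = lam - syy, sxy
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

# Points scattered near the line y = x, so the principal axis
# should point close to (1, 1) / sqrt(2):
axis = principal_axis([(0, 0), (1, 1.1), (2, 1.9), (3, 3.05)])
```

Projecting the points onto this one axis keeps almost all of their variation, which is precisely the "most economical description" PCA is after.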

Reference
[3] Singular value decomposition, 
http://en.wikipedia.org/wiki/Singular_value_decomposition