Introduction
"Gene expression" is the term used to describe the transcription of the information contained within the DNA, the repository of genetic information, into messenger RNA (mRNA) molecules that are then translated into the proteins that perform most of the critical functions of cells. Scientists study the kinds and amounts of mRNA produced by a cell to learn which genes are expressed, which in turn provides insights into how the cell responds to its changing needs. Gene expression is a highly complex and tightly regulated process that allows a cell to respond dynamically both to environmental stimuli and to its own changing needs. This mechanism acts as both an "on/off" switch to control which genes are expressed in a cell as well as a "volume control" that increases or decreases the level of expression of particular genes as necessary. The proper and harmonious expression of a large number of genes is a critical component of normal growth and development and the maintenance of proper health. Disruptions or changes in gene expression are responsible for many diseases. A simplistic example is illustrated in Figure 1[1].

Figure 1. Gene expression comparison between a normal cell line and a malignant cell line. X axis: genes expressed. Y axis: amount of RNA present, indicating the "volume" of each gene.
The Human Genome Project has allowed considerable progress in the construction of physical and genetic maps and in the identification of genes involved in human disease. The accelerated accumulation of biological information and knowledge is due in large part to the sequencing projects for other organisms, which paved the way for the Human Genome Project. In parallel, recently developed techniques that take advantage of genomic sequences allow large-scale molecular analyses, resulting in the functional annotation of many of the proteins encoded by these genes. This is the goal of functional genomics [2].
DNA microarrays are tools for assessing the functional dynamics of genes and genomes in a highly parallel fashion. Historically, DNA microarrays have been defined as ordered collections of DNA probes for the specific detection of complementary DNA targets [3]. Among the developing technologies, DNA microarrays play a dominant role because they are relatively easy to make and use and are applicable to numerous scientific inquiries. They allow the simultaneous analysis of several thousand genes in biological samples from diseased or healthy tissues, at the genome or transcriptome level. The data obtained are expected to result in major advances in the health sciences. In addition to an improved understanding of the complex molecular interaction networks of healthy cells and tissues, a more precise genetic characterization of the molecular mechanisms involved in pathology should result in the identification of new therapeutic targets and the development of new medicines. The genetic profiles thus obtained should also permit the definition of new pathologic subclasses not recognizable by traditional clinical factors, as well as new markers for susceptibility to certain illnesses, and new prognostic markers or methods of predicting responses to treatment [4,5].
DNA Microarray: Basic Concept
DNA microarrays are small, solid supports onto which the sequences from thousands of different genes are immobilized, or attached, at fixed locations. The supports themselves are usually glass microscope slides, but can also be silicon chips or nylon membranes. The DNA is printed, spotted, or synthesized directly onto the support. The gene sequences in a microarray are attached to their support in an orderly, fixed way, because a researcher uses the location of each spot in the array to identify a particular gene sequence. The spots themselves can be DNA, cDNA, or oligonucleotides (short fragments of single-stranded DNA, typically 5 to 50 nucleotides long) (Fig 2, 3).


The whole process is based on hybridization probing, a technique that uses fluorescently labeled nucleic acid molecules as "mobile probes" to identify complementary molecules, that is, sequences that are able to base-pair with one another. Each single-stranded DNA fragment is made up of four different nucleotides, adenine (A), thymine (T), guanine (G), and cytosine (C), linked end to end. Adenine is the complement of, or will always pair with, thymine, and guanine is the complement of cytosine. Therefore, the complementary sequence to G-T-C-C-T-A is C-A-G-G-A-T. When two complementary sequences find each other, such as the immobilized target DNA and the mobile probe DNA, cDNA, or mRNA, they lock together, or hybridize (Fig 4).
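The pairing rule described above can be sketched in a few lines of Python. This is only an illustration of the A-T and G-C rule as the text states it: like the G-T-C-C-T-A example, it pairs bases position by position and ignores strand orientation. The function names `complement` and `can_hybridize` are hypothetical and chosen for this sketch.

```python
# Base-pairing rules as stated in the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(PAIRS[base] for base in seq.upper())


def can_hybridize(target: str, probe: str) -> bool:
    """True when every probe base pairs with the target base at the same position.

    Simplification: real hybridization tolerates some mismatches and depends on
    strand orientation, temperature and salt; none of that is modeled here.
    """
    return len(target) == len(probe) and complement(target) == probe.upper()


print(complement("GTCCTA"))               # CAGGAT, the example from the text
print(can_hybridize("GTCCTA", "CAGGAT"))  # True
```

A probe that differs at even one position, such as CAGGAA, is reported as non-complementary by this strict check.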

Now, consider two cells: cell type 1, a healthy cell, and cell type 2, a diseased cell. Both contain an identical set of four genes, A, B, C, and D. Scientists are interested in determining the expression profile of these four genes in the two cell types. To do this, they isolate mRNA from each cell type and use it as a template to generate cDNA with a "fluorescent tag" attached, using a different color for each cell type (red and green in this example). The two labeled samples are then mixed and incubated with a microarray containing the immobilized genes A, B, C, and D. The labeled molecules bind to the sites on the array corresponding to the genes expressed in each cell (Fig 5).
After this hybridization step is complete, a researcher places the microarray in a "reader" or "scanner" that consists of lasers, a special microscope, and a camera. The fluorescent tags are excited by the laser, and the microscope and camera work together to create a digital image of the array. These data are then stored in a computer, and a special program is used either to calculate the red-to-green fluorescence ratio or to subtract out background data for each microarray spot by analyzing the digital image of the array (Fig 6). If calculating ratios, the program then creates a table that contains the ratio of red-to-green fluorescence intensity for every spot on the array. For example, using the scenario outlined above, the computer may conclude that both cell types express gene A at the same level, that cell 1 expresses more of gene B, that cell 2 expresses more of gene C, and that neither cell expresses gene D. By using an array containing many DNA samples, researchers can determine, in a single experiment, the expression levels of hundreds or thousands of genes within a cell by measuring the amount of labeled cDNA (and hence of the original mRNA) bound to each site on the array. With the aid of a computer, the amount bound to each spot on the microarray is precisely measured, generating a profile of gene expression in the cell (Fig 7) [6,7].
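The ratio table for the four-gene scenario can be sketched as follows. All intensities are invented for illustration, and several choices are assumptions not taken from the text: red is assigned to cell 1 and green to cell 2, a spot counts as "not detected" when both channels fall below an arbitrary floor, and a two-fold difference (log2 ratio of at least 1 in either direction) is used as the cutoff for "more".

```python
import math

# Hypothetical background-corrected intensities (arbitrary units): (red, green).
# Assumed labeling: red = cDNA from cell 1, green = cDNA from cell 2.
spots = {
    "A": (1200.0, 1180.0),
    "B": (2400.0, 600.0),
    "C": (300.0, 1500.0),
    "D": (20.0, 25.0),
}

DETECTION_FLOOR = 100.0  # assumed: below this in both channels means no signal
TWO_FOLD = 1.0           # assumed cutoff: |log2 ratio| >= 1 is a two-fold change


def call_spot(red, green):
    """Return (log2 red/green ratio, verbal call) for one spot."""
    if red < DETECTION_FLOOR and green < DETECTION_FLOOR:
        return None, "not detected in either cell"
    # Clamp to 1.0 so that a zero intensity cannot cause a division error.
    log_ratio = math.log2(max(red, 1.0) / max(green, 1.0))
    if log_ratio >= TWO_FOLD:
        return log_ratio, "higher in cell 1"
    if log_ratio <= -TWO_FOLD:
        return log_ratio, "higher in cell 2"
    return log_ratio, "similar in both cells"


for gene, (red, green) in spots.items():
    log_ratio, call = call_spot(red, green)
    shown = "n/a" if log_ratio is None else f"{log_ratio:+.2f}"
    print(f"gene {gene}: log2(red/green) = {shown} -> {call}")
```

Working on a log2 scale is a common convention because a two-fold increase and a two-fold decrease then sit symmetrically at +1 and -1.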



Types of Microarrays
Three basic types of sample can be used to construct DNA microarrays: two are genomic and the third is "transcriptomic", that is, it measures mRNA levels. What distinguishes them is the kind of immobilized DNA used to generate the array and, ultimately, the kind of information derived from the chip. The target DNA used also determines the type of control and sample DNA used in the hybridization solution [8].
In clinical research, microarrays are used to examine three kinds of genomic change:
I. Changes in Gene Expression Levels
Determining the level, or volume, at which a certain gene is expressed is called microarray expression analysis, and the arrays used in this kind of analysis are called "expression chips". The immobilized DNA is cDNA derived from the mRNA of known genes, and, at least in some experiments, the control and sample DNA hybridized to the chip is cDNA derived from the mRNA of normal and diseased tissue, respectively. If a gene is overexpressed in a certain disease state, then more sample cDNA than control cDNA will hybridize to the spot representing that gene. In turn, the spot will fluoresce with greater intensity. Once researchers have characterized the expression patterns of various genes involved in many diseases, cDNA derived from the diseased tissue of any individual can be hybridized to determine whether that individual's expression pattern matches the expression pattern of a known disease. If this is the case, treatment appropriate for that disease can be initiated [9].
Because expression chips detect whether particular genes are expressed more or less under certain circumstances, they may also be used to examine changes in gene expression over a given period of time, such as within the cell cycle. A variety of genes are involved in regulating the stages of the cell cycle. Also built into this network are mechanisms designed to protect the body when the system fails or breaks down because of mutations within one of the "control genes", as is the case with cancerous cell growth. An expression microarray "experiment" could be designed in which cell cycle data are generated on multiple arrays and referenced to time "zero". Analysis of the collected data could further elucidate details of the cell cycle and its "clock", providing much-needed data on the points at which gene mutation leads to cancerous growth as well as on possible points of therapeutic intervention [10].
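Referencing a time course to time "zero" amounts to dividing each measurement by the first one. A minimal sketch, assuming invented intensities for a single gene and hypothetical sampling times, is given below; the log2 scale is a common convention rather than something the text prescribes.

```python
import math

# Hypothetical measurements of one gene on arrays taken across the cell cycle.
time_points_h = [0, 2, 4, 6, 8]                        # hours after time zero
intensity = [500.0, 1000.0, 2000.0, 1000.0, 500.0]     # arbitrary units


def log2_fold_vs_zero(values):
    """Express every measurement as a log2 fold change relative to the first one."""
    baseline = values[0]
    return [math.log2(value / baseline) for value in values]


profile = log2_fold_vs_zero(intensity)
peak_time = time_points_h[profile.index(max(profile))]

for t, fold in zip(time_points_h, profile):
    print(f"{t} h: {fold:+.1f}")
print(f"peak expression at {peak_time} h")
```

Repeating this for every gene on the array gives one profile per gene, which can then be compared across genes to see which ones rise and fall together.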
Until recently, diagnostic and prognostic assessment of diseased tissues in a pathology laboratory relied on histological and immunohistochemical studies. DNA microarray technology now allows the simultaneous analysis of up to thousands of different genes in histological or cytological specimens. Microarray techniques thus offer the pathologist opportunities to obtain "molecular signatures" of the state of activity of diseased cells in patient tissue samples, providing new information such as the biological staging of tumors, risk assessment of pre-malignant lesions, and resistance to, and side effects of, treatment [11].
A case in point is breast cancer, where it is known that pathological and clinical heterogeneity, partly responsible for therapeutic failures, reflects a complex, combinatorial and poorly documented molecular basis. Genome-wide expression microarray studies have now revealed that the biological and clinical heterogeneity of breast cancer can be partly explained by information embedded within a complex but ordered transcriptional architecture. This architecture comprises gene expression networks, or signatures, reflecting biochemical and behavioral properties of tumors that might be harnessed to improve disease subtyping, patient prognosis and prediction of therapeutic response. Emerging "hypothesis-driven" strategies that incorporate knowledge of pathways and other biological phenomena in the signature discovery process are linking prognosis and therapy prediction with transcriptional readouts of tumorigenic mechanisms that better inform therapeutic options [12,13].
In the same way, expression chips can be used to develop new drugs. For instance, if a certain gene is overexpressed in a particular form of cancer, researchers can use expression chips to see whether a new drug reduces the overexpression and forces the cancer into remission [14,15].
II. Genomic Gains and Losses
DNA repair genes are thought to be the body's frontline defense against mutations and, as such, play a major role in cancer. Mutations within these genes often manifest themselves as lost or broken chromosomes. It has been hypothesized that certain chromosomal gains and losses are related to cancer progression and that the patterns of these changes are relevant to clinical prognosis. Using different laboratory methods, researchers can measure gains and losses in the copy number of chromosomal regions in tumor cells. Then, using mathematical models to analyze these data, they can predict which chromosomal regions are most likely to harbor genes important for tumor initiation and disease progression. The results of such an analysis may be depicted as a hierarchical, tree-like branching diagram, referred to as a "tree model of tumor progression" [16,17].
Researchers use a technique called microarray Comparative Genomic Hybridization (CGH) to look for genomic gains and losses or for a change in the number of copies of a particular gene involved in a disease state. In microarray CGH, large pieces of genomic DNA serve as the target DNA, and each spot of target DNA in the array has a known chromosomal location. The hybridization mixture will contain fluorescently labeled genomic DNA harvested from both normal (control) and diseased (sample) tissue. Therefore, if the number of copies of a particular target gene has increased, a large amount of sample DNA will hybridize to those spots on the microarray that represent the gene involved in that disease, whereas comparatively small amounts of control DNA will hybridize to those same spots. As a result, those spots containing the disease gene will fluoresce with greater intensity than otherwise, indicating that the number of copies of the gene involved in the disease has gone up[18].
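The comparison of sample and control signal at each spot can be sketched as a log2 ratio with a gain or loss call. The region names, intensities and the ±0.3 thresholds below are all assumptions made for illustration; real analyses choose thresholds from the noise in the data.

```python
import math

# Hypothetical array CGH spots: (chromosomal region, sample intensity, control intensity).
cgh_spots = [
    ("region_1", 4000.0, 1000.0),
    ("region_2", 500.0, 1000.0),
    ("region_3", 1000.0, 1000.0),
]

GAIN_THRESHOLD = 0.3    # assumed cutoff on the log2 ratio for a copy-number gain
LOSS_THRESHOLD = -0.3   # assumed cutoff on the log2 ratio for a copy-number loss


def classify(sample, control):
    """Call gain, loss or no change from the sample-to-control log2 ratio."""
    log_ratio = math.log2(sample / control)
    if log_ratio >= GAIN_THRESHOLD:
        return "gain"
    if log_ratio <= LOSS_THRESHOLD:
        return "loss"
    return "no change"


for region, sample, control in cgh_spots:
    print(f"{region}: {classify(sample, control)}")
```

Because each spot has a known chromosomal location, the calls can be laid out along the chromosomes to show which regions are amplified or deleted in the tumor.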
III. Mutations in DNA
When researchers use microarrays to detect mutations or polymorphisms in a gene sequence, the target, or immobilized DNA, is usually that of a single gene. In this case, though, the target sequence placed on any given spot within the array differs from that of other spots in the same microarray, sometimes by only one or a few specific nucleotides. One type of sequence commonly used in this type of analysis is the Single Nucleotide Polymorphism, or SNP, a small genetic change or variation that can occur within a person's DNA sequence. Another difference between mutation microarray analysis and expression or CGH microarrays is that this type of experiment requires only genomic DNA derived from a normal sample for use in the hybridization mixture. Once researchers have established that a SNP pattern is associated with a particular disease, they can use SNP microarray technology to test whether an individual carries that SNP pattern and is therefore susceptible to (at risk of developing) the disease. When genomic DNA from an individual is hybridized to an array loaded with various SNPs, the sample DNA hybridizes with greater frequency only to the specific SNPs that the person carries. Those spots on the microarray then fluoresce with greater intensity, demonstrating that the individual being tested may have, or is at risk of developing, that disease [19].
Microarray Data Management
Today, proficiency in generating data using microarray technology is fast outpacing the capacity for storing and analyzing it. As more laboratories acquire this technology, the resulting avalanche of data requires standardized storage, sharing, and publishing techniques. A uniform system is needed to manage microarray data and provide a distribution point for it. Consider the amount of data that can potentially be generated using a single microarray chip. Suppose that chip contains 30,000 spots of target DNA. Researchers interpreting the data generated by that chip would need to know the biological identity of each target (which gene is where); the biological properties of the control and sample DNA; the experimental conditions and procedures used in setting up the experiment; and finally, the results. Although experiments such as these will undoubtedly advance our current understanding of gene expression and regulation, they present many new challenges in terms of data tracking and analysis [20].
New array analysis methods to classify tumors are continuously being developed and are becoming increasingly sophisticated and computationally intensive. Two general approaches are commonly used to classify cancers from gene expression profiling data: unsupervised and supervised analysis. Unsupervised clustering analyses typically use pattern-recognition algorithms to define groups of samples that have similar global patterns of gene expression. Likewise, such analyses identify genes whose expression pattern is similar across a set of samples. Unsupervised analyses minimize a priori assumptions about the data and thus identify structure in array data without regard to known clinical parameters. Such analysis is therefore useful for distinguishing subgroups of the same disease that differ from each other in the expression of large numbers of genes and presumably represent unique biologic entities. Although unsupervised analyses are effective at classifying tumors that have similar expression patterns for a large number of genes, they are far less effective at identifying differences in the expression of small numbers of genes that nonetheless correlate with clinical parameters, including response to therapy. Such genes may be useful as markers for the development of differentiating tests that refine our ability to classify tumors and predict response to therapy beyond what is achievable using current clinical data, array-based unsupervised classification methods, or both. Identifying these relationships often requires supervised analysis, in which statistical algorithms are used to identify genes whose expression is significantly correlated with a specific clinical parameter such as outcome. The predictive power of these genes may subsequently be validated on an independent set of tumors by classifying those samples using only the expression levels of the preselected genes [21].
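The contrast between the two approaches can be illustrated with a toy data set. Everything in the sketch below is an assumption made for illustration: the expression values and the clinical labels are invented, the unsupervised step is a deliberately simple greedy grouping by Pearson correlation (a stand-in for the pattern-recognition algorithms mentioned above, not hierarchical clustering proper), and the supervised step simply ranks genes by the absolute correlation of their expression with the clinical label.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical log-expression values: one row per gene, one column per tumor sample.
expression = {
    "gene_1": [2.0, 2.1, 1.9, -2.0, -2.1, -1.9],
    "gene_2": [1.8, 2.2, 2.0, -1.8, -2.2, -2.0],
    "gene_3": [0.1, 0.9, 0.1, 0.9, 0.1, 0.9],
    "gene_4": [0.5, 0.2, -0.4, -0.1, -0.5, 0.3],
}
samples = ["s1", "s2", "s3", "s4", "s5", "s6"]
# Hypothetical clinical parameter, used only by the supervised step (1 = responded).
responded = [0, 1, 0, 1, 0, 1]


def sample_profile(index):
    """Expression of every gene in one sample."""
    return [values[index] for values in expression.values()]


def cluster_samples(threshold=0.5):
    """Unsupervised: greedily group samples whose global profiles correlate."""
    clusters = []  # each cluster is a list of sample indices
    for i in range(len(samples)):
        for cluster in clusters:
            # Compare against the first member of each existing cluster.
            if pearson(sample_profile(i), sample_profile(cluster[0])) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return [[samples[i] for i in cluster] for cluster in clusters]


def rank_genes(labels):
    """Supervised: rank genes by |correlation| between expression and the label."""
    scores = {gene: pearson(values, labels) for gene, values in expression.items()}
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)


print(cluster_samples())
print(rank_genes(responded))
```

In this toy data the grouping is driven by the large differences in gene_1 and gene_2 and does not separate responders from non-responders, whereas the supervised ranking places gene_3, whose small changes track the response label, first. This mirrors the limitation of unsupervised analysis described above.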
Gene Expression Omnibus (GEO)
GEO is an online repository launched by the National Center for Biotechnology Information (NCBI), USA, to support the public use and dissemination of gene expression data.
Microarray Markup Language (MAML)
Developed by the "MAML" working group of the Microarray Gene Expression Database (MGED), this is a first attempt to provide a standard platform for submitting and analyzing the enormous amounts of microarray expression data generated by different laboratories around the world. The goal of this group is to facilitate the adoption of standards for DNA-array experiment annotation and data representation, as well as the introduction of standard experimental controls and data normalization methods. MAML proposes a framework for describing information about a DNA-array experiment and a data format for communicating this information, including details about:
(a) Experimental design: the set of hybridization experiments as a whole
(b) Array design: each array used and each spot on the array
(c) Samples: samples used, extract preparation, and labeling
(d) Hybridizations: procedures and parameters
(e) Measurements: images, quantitation, and specifications
(f) Controls: types, values, and specifications
MAML is independent of the particular experimental platform and provides a framework for describing experiments done on all types of DNA arrays, including spotted and synthesized arrays, as well as oligo and cDNA arrays. Moreover, MAML provides a format for representing microarray data in a flexible way, which allows analysis of data obtained not only from existing microarray platforms but also from many possible future variants, including protein arrays.
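To make the idea of a structured experiment description concrete, the sketch below stores the six kinds of information listed above in one record and serializes it as XML. The element names, the `to_xml` helper and the example values are hypothetical and are not taken from the MAML specification; the sketch only shows how a fixed set of sections lets software check that an experiment description is complete.

```python
import xml.etree.ElementTree as ET

# The six sections from the list above; these element names are hypothetical,
# not the real MAML schema.
SECTIONS = [
    "experimental_design",
    "array_design",
    "samples",
    "hybridizations",
    "measurements",
    "controls",
]

# Invented example values describing a single two-color experiment.
record = {
    "experimental_design": "healthy versus diseased cells, one hybridization",
    "array_design": "glass slide, 4 spots: genes A, B, C, D",
    "samples": "cDNA from cell type 1 and cell type 2, fluorescently labeled",
    "hybridizations": "procedure and parameters as recorded by the laboratory",
    "measurements": "scanned image, per-spot red and green intensities",
    "controls": "background spots used for subtraction",
}


def to_xml(experiment):
    """Serialize an experiment record, refusing records with missing sections."""
    missing = [name for name in SECTIONS if name not in experiment]
    if missing:
        raise ValueError("missing sections: " + ", ".join(missing))
    root = ET.Element("experiment")
    for name in SECTIONS:
        ET.SubElement(root, name).text = experiment[name]
    return ET.tostring(root, encoding="unicode")


xml_text = to_xml(record)
print(xml_text)
```

A record that omits any section is rejected with a `ValueError`, which is the kind of completeness check a shared annotation standard makes possible.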
Limitations of Microarray Technology
Exciting as the possibilities offered by this technology may appear, certain inherent limitations still prevent the full realization of its potential. These limitations may arise from the procedure itself or from the interpretation of the data. Limitations in procedure can occur at every stage. The issues of sample heterogeneity and the impact of "contaminating" normal cell types are frequently raised in the context of global gene expression profiling of tumors. For example, biopsies frequently vary in their content of genetically normal, non-cancerous cells, including lymphocytes and stromal cells. Because heterogeneity may confound the biologic interpretation of expression profiling experiments, many have recommended the use of techniques capable of enriching tumor cells, including microdissection or flow-sorting, before expression profiling to ensure that only malignant cells are being profiled [22]. The need for such techniques in the clinic would clearly complicate array-based diagnostics and in some cases reintroduce variability attributable to the individual skill of the practitioner.
The inherent genetic heterogeneity of tumors may also potentially confound array-based diagnostics. Such heterogeneity is clearly observed, for instance, in the expression of molecular markers of metastasis in medulloblastoma, which are often not expressed by the entire tumor [23].
mRNA is an inherently fragile molecule and tends to degrade rapidly in the free state. Problems of RNA quality are usually severe when working with blood, urine, or archived samples, because many fixing and embedding protocols may damage RNA integrity [24]. There are also variations in tissue handling, processing and RNA extraction. These factors combine to alter results. Various schemes are used to circumvent this problem. One is to store the extracted mRNA in RNAlater (Ambion, CA, USA), a commercially available stabilization solution. The commercially available kits for DNA or RNA extraction include internal controls at every step of the extraction procedure and incorporate algorithms for noise reduction. Another problem is that, although mRNA is measured and the gene expression profile is inferred from it, this does not truly represent protein expression in vivo, because it does not take into account alternative splicing and post-translational protein modifications. One way to circumvent this is to use "chips" that include alternative splicing patterns. Another phenomenon that merits consideration is "RNA interference". This phenomenon directly modulates gene expression: small complementary RNA segments bind to specific regions of the target mRNA and block its expression. MicroRNAs and short "hairpin RNAs" are examples of such small modulatory RNAs [25,26].
Limitations of interpretation result from the sheer volume of data generated in each experimental run. Customized software and trained technical staff are required to carry out the biostatistical analyses. Real-time PCR must be run to validate the gene expression profiles suggested by the microarray data analysis. In addition, the sample size must be increased to account for the functional taxonomy of genes [27].
Conclusion
Microarrays represent a major methodological advance both because they may contain a very large number of genes and because of their small size. Microarrays are therefore useful when one wants to survey a large number of genes quickly or when the sample to be studied is small. Microarrays may be used to assay gene expression within a single sample or to compare gene expression in two different cell types or tissue samples, such as in healthy and diseased tissue. This technology is still considered to be in its infancy; therefore, many initial studies using microarrays have represented simple surveys of gene expression profiles in a variety of cell types. With new advances, researchers will be able to infer probable functions of new genes based on similarities in expression patterns with those of known genes. Ultimately, these studies promise to expand the size of existing gene families, reveal new patterns of coordinated gene expression across gene families, and uncover entirely new categories of genes. Furthermore, because the product of any one gene usually interacts with those of many others, our understanding of how these genes coordinate will become clearer through such analyses, and precise knowledge of these inter-relationships will emerge. The use of microarrays may also speed the identification of genes involved in the development of various diseases. This technology will also aid the examination of the integration of gene expression and function at the cellular level, revealing how multiple gene products work together to produce physical and chemical responses to both static and changing cellular needs. For the first time, arrays offer hope for obtaining global views of biological processes—simultaneous readouts of all the body's components—by providing a systematic way to survey DNA and RNA variation. 
There is little doubt that microarrays will revolutionize our ability to quantify the complex changes that occur in gene expression during disease development. The greatest challenge that lies ahead is how to translate this knowledge into clinically useful diagnostic and therapeutic tools.
Table 1

References
- Shoemaker DD, Schadt E, Armour CD, Garrett P, McDonagh PD, Loer PM. Experimental annotation of the human genome using microarray technology. Nature 2001;409:922-7.
- Bertucci F, Loriod B, Tagett R, et al. DNA arrays: technological aspects and applications. Bull Cancer 2001;88:243-52.
- Gershon D. DNA microarrays: more than gene expression. Nature 2006;437:1195-8.
- Guo QM. DNA microarray and cancer. Curr Opin Oncol 2007;15:36-43.
- Shih IeM, Wang TL. Apply innovative technologies to explore cancer genome. Curr Opin Oncol 2005;17:33-8.
- Schena M, Davis RW. Genes, Genomes and Chips. In: Schena M, editor. DNA Microarrays: A Practical Approach. Oxford University Press, Oxford, UK, 1999;12-36.
- Cortese JD. Array of Options: Instrumentation to exploit the DNA microarray explosion. The Scientist 2000;14:26-9.
- Lockhart DJ, Winzeler EA. Genomics, gene expression and DNA arrays. Nature 2000;405:827-36.
- Macgregor PF. Gene expression in cancer: the application of microarrays. Expert Rev Mol Diagn 2003;3:185-200.
- Marx J. DNA Arrays Reveal Cancer in Its Many Forms. Science 2002;289:1670-2.
- van de Rijn M, Gilks CB. Applications of microarrays to histopathology. Histopathol 2004;44:97-108.
- Miller LD, Liu ET. Expression genomics in breast cancer research: microarrays at the crossroads of biology and medicine. Breast Cancer Res 2007;9:206-9.
- Bertucci F, Viens P, Birnbaum D. DNA microarrays for gene expression profiling of breast cancer: principles and prognostic applications. Pathol Biol 2006;54:49-54.
- Pommier Y, Botstein D, Brown PO, Weinstein JN. A gene expression database for the molecular pharmacology of cancer. Nat Genet 2003;24:236-44.
- Weeraratna AT, Nagel JE, de Mello-Coelho V, Taub DD. Gene expression profiling: from microarrays to medicine. J Clin Immunol 2008;24:213-24.
- Khan J, Saal LH, Bittner ML, Chen Y, Trent JM, Meltzer PS. Expression profiling in cancer using cDNA microarrays. Electrophoresis 1999;20:223-9.
- Hofman P. DNA microarrays. Nephron Physiol 2008;99:85-9.
- Pollack JR, Perou CM, Alizadeh AA, et al. Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat Genet 2002;23:41-6.
- Wang DG, Fan JB, Lander ES. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 1998;280:1077-82.
- Sinclair B. Everything's Great When It Sits on a Chip - A bright future for DNA arrays. The Scientist 1999;24:18-20.
- Brown KA, Hedenfalk IA, Trent JA. Molecular Methods in Oncology. In: Devita VT, Hellman S, editors. Cancer: Principles and Practice of Oncology, 7th Edition. Lippincott Williams and Wilkins, Philadelphia, 2005.
- Specht K, Richter T, Muller U, Walch A, Werner M, Hofler H. Quantitative gene expression analysis in microdissected archival formalin-fixed and paraffin-embedded tumor tissue. Am J Pathol 2007;158:419-29.
- MacDonald TJ, Brown KM, LaFleur B, et al. Expression profiling of medulloblastoma: PDGFRA and the RAS/MAPK pathway as therapeutic targets for metastatic disease. Nat Genet 2001;29:143.
- Klimecki WT, Futscher BW, Dalton WS. Effects of ethanol and paraformaldehyde on RNA yield and quality. Biotechniques 1994;16:1021-1023.
- Paddison PJ, Hannon GJ. RNA interference: the new somatic cell genetics? Cancer Cell 2002;2:17.
- Dykxhoorn DM, Novina CD, Sharp PA. Killing the messenger: short RNAs that silence gene expression. Nat Rev Mol Cell Biol 2003;4:457.
- Schubert CM. Microarray to be used as routine clinical screen. Nat Med 2003;9:9-15.
- Marshall A, Hodgson J. DNA chips - an array of possibilities. Nature Biotechnology 2001;16:27-3.