modENCODE: Revealing the Inner Workings of the Genome

ARRA funds support data analysis in collaborative effort to map gene regulation networks

By Susan Johnson

March 3, 2011

Ten years ago, the human genome, life’s instruction manual, was sequenced. But many mysteries remained: For example, how does the body know which instructions to read, and when?

A single scientist would need several lifetimes to answer such a complicated question, but teams of investigators could no doubt provide the answer much more quickly. Recently, hundreds of scientists around the country and throughout the world set out to start answering the question, working together under the leadership of the National Human Genome Research Institute (NHGRI). With a key assist from the American Recovery and Reinvestment Act (ARRA), these collaborators have gathered enormous amounts of information to illuminate the genetic control systems that coordinate vast signaling networks that turn DNA “on” and “off.” These data—freely available on the web—will be powerful resources for scientists trying to understand and alleviate human disease for years to come.

ARRA Fills a Gap

The modENCODE Consortium is a prime example of the movement towards team science (modENCODE stands for model organism ENCyclopedia Of DNA Elements). Team science is a cost-efficient paradigm that transforms scientific competitors into partners who share data and divide labor. Researchers who are part of modENCODE study fruit flies and roundworms, animals with a history of providing enormous insights into human biology. NHGRI has funded both modENCODE and its analog in humans, ENCODE. Using advanced sequencing and computing methods, the collaborators generated huge amounts of data that allowed identification of functional elements across both organisms’ entire genomes. Since its start in 2007, modENCODE has created more than 700 data sets for flies alone.

“The sheer volume of data produced at the multiple centers caught us by surprise,” says Peter Good, Ph.D., a modENCODE program director at NHGRI. Dr. Good and his colleagues realized that the project would benefit from a specialized data analysis center; this center would have to analyze and systematically integrate the influx of genomic data from the animals.

NHGRI leapt at the opportunity to use some of its allocated ARRA funds to create this resource. After a competitive review, NHGRI selected Manolis Kellis, Ph.D., of the Massachusetts Institute of Technology to receive $2.8 million to lead the data analysis center. Under this ARRA grant, Dr. Kellis coordinated the many fly research projects and the accumulation of fly data, while Yale University’s Mark Gerstein, Ph.D., who was also funded by this grant, led the coordination of worm research projects.

“The data analysis center fulfilled a major missing component in the modENCODE project,” says Dr. Kellis. “These funds were not just for each group to analyze their own datasets, but for integrative analysis at the interface between participating research groups.”

With the help of newly hired researchers, staff at the data analysis center created and managed common analytic procedures for all participating research groups. In turn, modENCODE researchers used the procedures developed by the center to interpret shared data in a consistent way. This joint approach generated many more discoveries than if the groups had analyzed each data set in isolation. ARRA supported modENCODE in other ways as well. ARRA-funded grant supplements helped participating groups improve their equipment, hire more personnel and otherwise increase productivity.

Solving the Problem

modENCODE is an extension of NHGRI’s longtime support of efforts to understand fly and worm genomes; for example, NHGRI supported the sequencing of their genomes. For decades, such experimental animals, known as model organisms, have opened the door to discoveries. Because these animals grow and reproduce rapidly, they are easy to work with in the laboratory, and yet they share many biological features with humans.

The model organism data obtained by modENCODE provide a shortcut for researchers interested in human disease and could, in time, hasten the development of useful therapies. Many disorders, from cancer to Alzheimer’s disease, are caused by problems in gene control systems. If the gene that is associated with a particular disease in humans is known and there is a comparable gene in the model organism, then a scientist anywhere in the world could look up this gene in the modENCODE database and find important information about it. The information there would suggest a starting point for further studies.

In gathering data on their model organisms’ genomes, modENCODE production centers used high-throughput methods (these techniques quickly conduct many analyses across the entire genome). Some of the analyses determined which segments of DNA were being “turned on,” or transcribed, in different types of cells at different times. Other analyses used fluorescent tags to identify the specific DNA sequences in the genome where proteins attach (a multitude of different proteins can bind to DNA either to help or to inhibit gene transcription).

Computational techniques allowed the scientists to combine the results to create the big picture. The completed view uncovered an astounding number of previously unknown locations in the genome that control DNA transcription—many of them appear to be similar between fly and worm. The project also revealed the patterns of gene regulation that occur in a complex network (see illustration). In brief, the modENCODE project has helped scientists understand how the elements in the network interact with each other, with gene sequences and with the three-dimensional structure of chromosomes to tell cells how and when to follow their genetic instructions.

Looking to the Future

The long-standing, dynamic product of modENCODE is the data being collected into the public database. Anyone can download data from the project web site free of cost, and the site also provides interactive data visualization. modENCODE, which will continue for at least another year, continually produces, checks, and uploads large amounts of data to the site.

Going forward, the modENCODE team will extend its high-throughput analyses to search for additional types of functional elements and the networks in which they are involved. Team members also plan to search for commonalities in gene regulation between flies and worms. Once the human ENCODE project completes its full-genome analyses, the two projects will compare their data. Both groups are very interested in finding the commonalities among flies, worms and humans.

By harnessing the power of collaborative science and gathering huge quantities of information from the fly and worm model organisms, the modENCODE Consortium has opened a new door to our understanding of human biology. Through data available at the modENCODE web site, researchers can approach answers to complex questions in biological and medical research much more rapidly.

Recovery Act Investment: “A Data Analysis Center for Integration of Fly and Worm modENCODE Datasets”; Manolis Kellis; Massachusetts Institute of Technology; 2009: $1,473,507 (1RC2HG005639-01); 2010: $1,316,360 (5RC2HG005639-02). Funded by the National Human Genome Research Institute.

modENCODE Consortium et al. Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science, 2010;330(6012):1787-1797.

Gerstein MB et al. Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science, 2010;330(6012):1775-1787.

An interactive tool on the modENCODE web site illustrates the fruit fly’s networks of gene control. Gene regulation is very similar in fruit flies and humans; modENCODE discovered many new elements of the fruit fly networks and organized them for the first time. In this screen capture, green circles represent different transcription factors, proteins that attach to DNA to affect how genes are controlled. Red circles represent microRNAs (miRNAs), which silence genes’ messages to the cell. This diagram highlights how just one transcription factor, called dorsal (top right), is linked to other elements in the network. Depending on an organism’s life stage and environment, the active elements of this network in each cell in the body will shift to turn specific genes on and off.
