No More One-By-One
Writing the Reference Book for All Plant Genes

On the fifth floor of the Goddard Labs, a computer printout that reads "Arabidopsis thaliana Genome Center at the University of Pennsylvania" is taped on the wall. The row of perforated pages, which substitutes for more formal signage, marks the entry to a laboratory where some of the world's leading genetic research is underway. Biology Professor Joe Ecker, in a rumpled yellow shirt, materializes suddenly. Not one to waste time on pleasantries, he launches immediately into an explanation of the genetic sequencing project being carried out in his lab. Ecker has just come from the Mudd Building, where researchers in his second laboratory are studying bits of the genetic jigsaw puzzle being pieced together in Goddard.

The Center is part of the Multinational Coordinated Arabidopsis thaliana Genome Research Initiative, which involves six consortia in the U.S., Europe, and Japan. All six groups have secured funding for their contribution to the coordinated sequencing of Arabidopsis, or mouse-ear cress, a common backyard weed of the mustard family. In the fall of 1998, the National Science Foundation and the departments of Agriculture and Energy jointly awarded Ecker $4 million to continue working on his share of the large-scale sequencing project. To avoid duplication, the work is divided among the research laboratories, with each consortium having responsibility for a well-defined region of the genome. Penn, along with labs from Stanford and UC Berkeley, makes up the SPP Consortium, which has responsibility for sequencing chromosome 1. Arabidopsis has five chromosomes. Ecker and his collaborators expect the entire plant genome to be sequenced by the end of the year 2000.

A complete set of information to build and sustain the life of an organism is encoded in its genome. The coiled strands of deoxyribonucleic acid (DNA) and associated protein molecules that make up the genome are present in every cell of every living being. From there, they direct the production and assembly of proteins that carry out the complex and profoundly intricate chemistry of life. Sequencing consists of determining the exact order of the genome's base pairs, the rungs of adenine, thymine, cytosine, and guanine that hold together the two sides of the DNA molecule's twisting-ladder structure or double helix. The sequence of base pairs encodes the genetic instructions for creating a particular organism and maintaining vital cellular, developmental, and metabolic processes. Genome size is usually stated in terms of base pairs: Arabidopsis has about 120 million base pairs, which contain an estimated 25,000 genes; the human genome is made up of three billion base pairs and about 100,000 genes. Knowing the sequence of base pairs allows researchers to identify specific genes and their position in the genome, and to create the genetic blueprint.
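To put the genome sizes quoted above side by side, here is a small back-of-the-envelope calculation in Python. It uses only the article's rounded estimates (about 120 million base pairs and 25,000 genes for Arabidopsis, three billion base pairs and about 100,000 genes for humans). The function name `bp_per_gene` is made up for illustration, and the results are rough averages, not measurements.

```python
# Rounded estimates quoted in the article (not measured values).
ARABIDOPSIS_BP = 120_000_000
ARABIDOPSIS_GENES = 25_000
HUMAN_BP = 3_000_000_000
HUMAN_GENES = 100_000


def bp_per_gene(base_pairs: int, genes: int) -> float:
    """Average number of base pairs of genome per gene (hypothetical helper)."""
    return base_pairs / genes


arabidopsis_density = bp_per_gene(ARABIDOPSIS_BP, ARABIDOPSIS_GENES)
human_density = bp_per_gene(HUMAN_BP, HUMAN_GENES)
size_ratio = HUMAN_BP / ARABIDOPSIS_BP

print(f"Arabidopsis: about {arabidopsis_density:,.0f} base pairs per gene")
print(f"Human: about {human_density:,.0f} base pairs per gene")
print(f"The human genome is about {size_ratio:.0f} times larger")
```

On these figures, Arabidopsis packs its genes far more tightly than the human genome does, which is part of what makes it a convenient plant to sequence.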

Ecker's office is a tiny closet with windows in a corner of the lab. There is space enough for a desk with a computer, a few books, and two chairs whose occupants' knees just about touch. The minimalist office accommodations preserve space for the laboratory, a somewhat larger closet crammed with computers, a rotating apparatus that swirls chemical elixirs in beakers, robotics that prepare the DNA clones for sequencing, and technicians and researchers poised before monitors displaying grids of letters and numbers. Some of the researchers are former undergraduates of Ecker's.

Much of the sequencing process is automated. In a computer-beige box, cloned fragments of stained Arabidopsis DNA are falling through an electrically charged gel held between panes of quartz glass. The process is called electrophoresis. A laser scans the molecular blizzard, and the output is a four-color chromatogram displaying peaks that represent each of the four DNA bases. The sequencer now in use is already obsolete. Nearby, the discarded carcass of that technology's forebear rests on a shelf. The next generation of high-tech equipment is on order; it will conduct sequencing operations at a faster rate and less expensively--important considerations given the dense configurations of data that must be scanned, analyzed, and stored. After the DNA fragments are "read," researchers use custom-made software to reassemble the data into continuous stretches that are analyzed for errors, gene-coding regions, and other characteristics. Study of genetic structures is possible in large measure because new computational methods and devices allow investigators to manage galaxies of data. In a sense, the genetic revolution stands upon the shoulders of the information revolution. Each week, the DNA sequence that's been compiled is posted on the Center's website. To date, the SPP Consortium has completed sequencing about one-sixth of chromosome 1.
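The reassembly step described above, stitching short reads back into continuous stretches, can be illustrated with a toy sketch in Python. This is not the Center's custom software, whose details the article does not give. It is a simple greedy overlap merge, and the names `overlap`, `greedy_assemble`, and `min_len` are invented for illustration. It assumes error-free reads all taken from the same strand; real assemblers must also cope with sequencing errors, repeated regions, and reads from either strand.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments that share the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        # Join the pair, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


# Three made-up reads that overlap end to end.
reads = ["ATGCGT", "CGTTAC", "TACGGA"]
print(greedy_assemble(reads))
```

With these three invented reads, the sketch joins them into a single continuous stretch. Fragments that share no sufficiently long overlap are returned as separate pieces.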

If this endeavor seems to be a monumental undertaking, the sequencing project is only part of what's involved in unraveling the long skein of DNA folded up in Arabidopsis. In Ecker's other laboratory, a team of researchers is at work trying to disentangle what role each of the 25,000 genes plays in the life of the plant and how it executes those functions. "Determining the sequence of the genome is called structural genomics," explains Ecker. "This is the easy part. The next phase is functional genomics, and it involves the study of gene expression at the RNA level, at the protein level: how the proteins associate with one another, how they make bigger structures of these proteins, how cells talk to one another, how they respond to the environment--the whole shebang. This is a more daunting task."


This phase of the project involves "knocking out" each gene, eliminating it from the genome, and then looking to see how the plant is affected. Making that determination is not always straightforward and sometimes requires that experimenters stress the mutant organism in various ways to tease out what aspect of its life processes is governed by the dislodged gene. "I've just organized a meeting at Stanford," says Ecker. "We're going to propose a plan that begins at the end of the sequencing phase and runs for about 10 years. The project is to determine the function of all Arabidopsis genes."

Once a year, the worldwide community of Arabidopsis researchers gathers to review and discuss what has been accomplished in their field. The proceedings for the most recent Conference on Arabidopsis Research are published in a 600-page book that summarizes each laboratory's progress in one or two pages. Any plant process you can think of--and many you can't--is studied by one of these groups. Ecker is growing about 17,000 Arabidopsis plants in the basement of Leidy Lab, aiming to produce complete sets of mutant seeds, about 25,000 per set, that can be distributed and studied by these labs. Each research group will look for mutations in the specific process that is the focus of its expertise.

On top of structural and functional genomics, Ecker and his colleagues have been closely studying ethylene signal transduction in Arabidopsis, the genetic pathway whose activation allows the weed to detect the gas. Ethylene is a growth-inhibiting gas, the equivalent of a hormone in animals. Biology chairman Andy Binns notes that Ecker is "one of the international leaders in plant developmental genetics, which is the utilization of genetics to understand plant developmental response pathways. While he's developing this incredible research program, he's deciding, 'I'm going to have to have all the DNA sequence information in that organism.' So he embarks on this sequencing project and becomes an international leader in that. This is an intellectual and organizational tour de force. He still has people working on the plant developmental process, plus he's running this enormous genomics project."

Arabidopsis is recognized by researchers as a model genetic system: its genome is relatively small and the plant's life cycle short, making it easy to study the effects of genetic manipulation across generations. "I like to call Arabidopsis a reference plant," says Ecker. "It's the book of all plant genes, and we're creating the plant reference book." Once Ecker's reference book is done--a comprehensive map of the genome with detailed information on how each gene functions in the organism--investigators will use it to explore and manipulate comparable genes in other plants. "Say you're interested in ethylene response in wheat," Ecker explains. "You can use the reference book to locate the gene in Arabidopsis that stimulates the ethylene response. I guarantee you it's going to be the same gene in wheat. So, by genetically defining the genes in the ethylene pathway of Arabidopsis, we're defining the genes involved in ethylene signaling for all plants." The reference book, which is really a database, will be particularly useful in agricultural research and genetic engineering of crops like wheat, whose genome is eight times the size of the human genome.


Because many of the genes that make up plant genomes are also present in animals, including humans, understanding a genetic system like Arabidopsis will provide insights into the function of related genes in non-plant systems. The data generated by projects that elucidate the genetic machinery of model systems like fruit fly, mouse, yeast, and worm are changing the way science is done and accelerating the pace of discoveries. "Once life figured out how to do something," Ecker says, "it kept the mechanism across all species. Our ultimate goal is to understand how all multicellular eukaryotic organisms work." Laboratories are assembling genomic databases for each organism being sequenced, but Ecker is working with other researchers to make sure there is a single database that incorporates the genetic information for all sequenced genomes. "That way you can have an integrated view of what a particular gene does in all the model organisms."

Ecker has a proselytizer's conviction when discussing how the accumulation of genetic information is changing the nature of the scientific enterprise. "Being able to study all genes by having knockouts of all genes and the sequence of all genes is a conceptually different approach to biology," he says. "It's going to change how evolutionary biologists work; how population biologists work; how molecular biologists, medical researchers, and geneticists work." Traditionally, researchers form hypotheses based on field observations and then conduct experiments to test their hunches. The amount of genetic data becoming available is dramatically changing the quality and accuracy of predictions scientists can make in shaping hypotheses. "The questions are the same, but by having all this information and then forming your hypothesis, you're at a much more advanced stage in answering the question."

The scale and complexity of genomics make it impossible for a single investigator working alone in a laboratory to do this kind of work. Ecker expanded the focus of his research when he understood that studying processes like ethylene signaling using classical techniques--one gene at a time--was not nearly as powerful and fruitful as the new data-driven and collaborative methods. "If we continue to do the one-by-one approach," he declares, "we probably won't figure out the whole ethylene pathway until I retire. There's a new generation of tools required to do biology--the boat is leaving the dock. The boat is genomics: the study of genes and their expression on a global scale. No more one-by-one."

The conversation done, Ecker offers a parting handshake and immediately turns and strides off down the hall--all in a single motion, it seems--on his way to the next task.

