The genome isn't a code and we can't read it
By Tom Bethell
The principal actors had appeared in the White House
last June -- Francis Collins of the National Human
Genome Research Institute, and J. Craig Venter of Celera
Genomics. Now they were back with a supporting cast
and a more detailed analysis, in the Capital Hilton Hotel,
with the TV lights glinting off the ballroom chandeliers,
250 journalists packed into the hot room, and James
Watson of DNA fame on hand to take a bow. There
would be one more blaze of publicity about the project
to decipher the human genome. The new findings were
about to be published in long articles, with a comical
abundance of co-authors, in the journals Nature and
Science.
New Mexico's Sen. Pete Domenici, an early and eager
supporter of the project on Capitol Hill, received a
vigorous round of applause. He was sitting next to
Watson, and in his remarks Domenici said that Watson
had just whispered to him, "You must say that this
project was congressionally driven." The senator added,
"And that's true. This project, in terms of the U.S.
government, was truly started in the Congress." One of
the new buildings going up on the "campus" of the
National Institutes of Health will surely be named after
Domenici.
One news item was prominently reported. The number
of human genes is now believed to be about 30,000,
one-third or even one-fourth the number recently
estimated. At first this was played as the familiar object
lesson in humility for us self-satisfied anthropoids. We
thought we were at the center of the universe. Silly old
us! Now, our supposedly overweening pride receives
another setback. For we have "only twice as many genes
as a fruit fly, or a lowly nematode worm," said the
ever-so-humble Eric Lander, head of genome research at
the NIH-funded Whitehead Institute in Cambridge,
Mass. "What a comedown!" The journalists roared on
cue. That would be the sound-bite for National Public
Radio, you knew, and the Washington Post would
publish it the next day.
There was, however, a more disturbing implication. It
took a few days to sink in. There followed a kind of
appalled silence, and then the alarm bells began to ring,
if only faintly. "The way these genes work must
therefore be far more complicated than the mechanism
long taught," whispered the Washington Post. The
alarms will grow louder. For if what Craig Venter said
is true -- and it was accepted by James Watson when I
spoke to him immediately after the press conference --
the genetics textbooks will have to be rewritten and the
therapeutic breakthroughs promised by the map of the
genome may not come for decades, if ever. No one at the
press conference disputed Venter's claims. That included
the editors of Science and Nature, who made brief
remarks.
Craig Venter's opening statement contained the
bombshell. Since last June, he said:
"[O]ur understanding of the human genome has
changed in the most fundamental ways. The small
number of genes -- some 30,000 -- supports the notion
that we are not hard wired. We now know the notion
that one gene leads to one protein, and perhaps one
disease, is false.
"One gene leads to many different protein products that
can change dramatically once they are produced. We
know that some of the regions that are not genes may be
some of the keys to the complexity that we see in
ourselves. We now know that the environment acting on
our biological steps may be as important in making us
what we are as our genetic code."
The old dogma of genetics, prevailing in the textbooks to
this day, was that one gene made one protein. George
Beadle and Edward Tatum won the Nobel Prize in 1958
for their formulation of this doctrine. Usually it is put,
"one gene, one enzyme," but an enzyme is a special kind
of protein, and today it is most often expressed as "one
gene, one protein." Now, in front of some of the
country's most eminent molecular biologists, Venter was
telling us that there may be ten times as many proteins as
genes. "Perhaps 300,000 proteins," he said at one point.
The genome consists of a string of four nucleotide bases,
symbolized by the letters A, C, G, and T, and the string
is over 3 billion letters long. Over 98 percent of the
genome appears to be inactive, consisting of
"non-coding regions," and some dismiss it as "junk
DNA." (Not Collins or Venter, however, who say we
just don't know what it does.) The intermittent "coding"
segments along the way are called genes, and they give
instructions for the manufacture of the body's proteins.
In people with heritable diseases, some of those proteins
are defective. So what the science of genomics would do
was find the defective genes along the genome string by
comparing the nucleotide sequence of sick people with
the genomes of well people. If the defect could then be
corrected, the protein made by that gene would be
restored to its healthy state. That is the underlying theory
of gene therapy.
Seemingly intractable problems have arisen, however,
and they have been well known to the gene hunters for
several years. There are indeed diseases that are caused
by a simple defect in the genome -- just as in a rare case
a single typographical error will radically alter the
meaning of a text. (But most "typos" are immediately
apparent as such and cause no defect in the reader's
understanding.) The genetic character of these diseases --
sickle cell anemia, cystic fibrosis, and Huntington's
Disease are among the best known -- was initially
established not by searching the genome but by tracing
them in family histories. This showed their predictably
inheritable character. When both parents carry a copy of
the "typo," there is a good chance that their child will
have the disease.
With many of these diseases the defective gene has
indeed been discovered. The problem is that a cure is still
no closer. "We've had our gene since 1989," Dr. Robert
Beall, president of the Cystic Fibrosis Foundation, told
the Wall Street Journal last June. Yet no gene therapy
for the 30,000 cystic fibrosis sufferers has emerged.
In other words, while we have waited for the
"breakthrough" in mapping the human genome, so that
we may cure diseases, this "breakthrough" already
occurred for cystic fibrosis a dozen years ago, to no
effect. In the case of sickle cell anemia, the genetic defect
has been known for over 30 years, yet the disease can
only be treated by non-genetic therapy. It is the same
with all the other heritable diseases. Because the genetic
defect is in the germ-line, the "typo" occurs in every one
of the one hundred trillion cells in the body. The
problem for genetic engineering is how to get the
"corrected" gene into enough cells to make a difference.
It's an unsolved problem.
True genetic diseases are rare, appearing in only one or
two percent of all births. Therefore, it is said, they "do
not fit the business model." Nonetheless, millions of
dollars were spent by medical research institutions to
locate these genetic defects on the genome. Still, nothing
has come of these findings, beyond the patenting of
screening tests, which can be used to warn couples that
any of their children might have a one-in-four chance of
being born with a disease.
But the field of biotechnology can expect little or no
payoff from diseases that affect anywhere from a handful
to a few thousand people worldwide. As a result, some
years back the focus of gene therapy quietly shifted
toward far more common and potentially highly
profitable diseases: in particular cancer, heart disease,
AIDS, and Alzheimer's. Dr. Collins told us that his own
lab at NIH is engaged in a "huge and very complicated"
search for genes for adult-onset diabetes, the latest cause
célèbre for gene hunters.
All along, however, the idea that these very widespread
conditions are caused by the sort of clear, isolated
genetic "misspellings" that seem to explain sickle cell or
cystic fibrosis was entirely speculative. In the case of
cancer, nothing definite has been found after a 20-year
search. Cancer researchers will tell you otherwise and
mean it, but it is not at all reassuring to learn that they
claim to have found over one hundred "oncogenes" that
"predispose" us to, or are "associated with" cancer; and
that about 30 defective "tumor suppressor" genes have
also been located. That is far too many to be useful, and
therapeutic benefits have been elusive. Some of these
genes have already been patented, which means that
companies can, once again, charge a monopoly price to
"screen for" the presence of these oncogenes. But their
causal role has never been established and probably
never will be. There has been a concerted effort to blur
the difference between those rare diseases that are plainly
and predictably heritable, and the common ones that are
not.
Contrary to what the headlines say, the genome has not
yet been decoded. It might never be, as the genome now
does not appear to be a code at all in the conventional
sense. It turns out that genes are not simple "strings,"
each one encoding for one message, but are
combinations of separated segments along the genome.
Between them lie intervening segments which can be cut
out by the cell, as it translates DNA into proteins, and the
relevant or coding parts (called exons, as opposed to the
intervening parts, which are called introns) can be put
together in numerous different ways. Genes thereby send
different messages and make a variety of proteins as the
occasion demands.
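A rough way to see how quickly the possibilities multiply is to count them. The sketch below is only an illustration under loud assumptions: the four exon sequences are made up, the function name splice_variants is hypothetical, and it treats every order-preserving selection of exons as a possible message, which real cells certainly do not do.

```python
from itertools import combinations

def splice_variants(exons):
    """Return every non-empty selection of exons, kept in genome order and joined.

    Assumption for illustration: any subset of exons may be spliced together.
    Real splicing is regulated and uses far fewer combinations.
    """
    variants = []
    for k in range(1, len(exons) + 1):
        # combinations() preserves the original left-to-right order
        for chosen in combinations(exons, k):
            variants.append("".join(chosen))
    return variants

# Made-up exon sequences, not taken from any real gene
exons = ["ATG", "GCC", "TTA", "TGA"]
variants = splice_variants(exons)
print(len(variants))  # 2**4 - 1 = 15 candidate messages from a single "gene"
```

Even with this toy of four segments, one stretch of DNA yields fifteen candidate messages, so a tenfold excess of proteins over genes is not hard to picture.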
Imagine that an intelligence service were to discover
some unintelligible messages being sent by a spy. At first
the intelligence agents naturally assume they are looking
at a code. They assume the task of decoding will be
straightforward. But on closer analysis it turns out that
the message means one thing if the signal has been
received and acted upon, another thing if it has been
received and not acted upon, another thing if the
receiving apparatus is not switched on, and so on. Rather
than just a code, the message is a bit like a set of rules for
a rather complex interactive game. There are feedback
loops, and circuits within circuits, and a lot of things
happening inside the cell but outside the genome, in the
unfashionable realm of cytogenetics. NIH-funded
geneticists don't even want to think about that, because
they thought that by sticking to the four nucleotide
bases, they had the problem neatly "digitized."
Computers would hum away unaided, 24 hours a day,
and unravel the mysteries for them while they slept.
We should have known that it would not be so simple.
Successful biological systems resist simple analysis for
the very same reason that they are successful. Every time
we gain greater knowledge of any such system we
discover that it is far more complicated, redundant,
self-healing, adaptable, and resistant to "single points of
failure" than it first appeared. If the functioning of the
genome were as simple -- and therefore easily
manipulated -- as the advocates of the genome project
have been implying, it would be impossibly fragile.
Significant genetic defects would be far more common,
assuming any organism based on such an easily cracked
and therefore easily corrupted code could survive in the
first place.
In the case of genetics, the illusion of simplicity arises in
large part from the genius of Mendel's insight in
constructing the original genetic metaphor. Studying
peas in a monastery garden, Gregor Mendel sorted them
by their outward traits (size, shape, and color), and,
observing that these traits appeared in the regular ratio of
three to one, he ingeniously posited internal "factors"
which occurred in "dominant" and "recessive" form.
These were the genes. No one actually observed them,
for no one then had the microscopes or the machinery to
do so. But the theory was that when the parental
contributions to reproduction are combined, one set of
genes from each parent, the factors or genes do not
"blend" but live on internally, one "dominating" the
other in the expression of the trait that each gene
controls. One gene, one trait. That was how a gene was
defined. Hence, it was said, we have genes "for" this trait
or for that; genes for skin color, for example. Where
blending is obviously real, as in the case of skin color (a
black mother and a white father have a child of
intermediate color), geneticists could just say that there
were several genes "for" that trait, and one or more from
each parent was expressed in the child. No one has yet
located the "skin color" genes on the DNA, incidentally.
Mendel made no claims about the structure of genes or
how they might accomplish their task. The Mendelian
gene was a hypothetical construct, possibly standing for
an infinitely more complex set of processes.
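The three-to-one ratio, and the one-in-four chance quoted to couples who both carry a recessive "typo," fall out of simple enumeration. This is a minimal sketch of the textbook single-factor cross, not a claim about how any real trait behaves; the letters "A" (dominant) and "a" (recessive) and the function name cross are illustrative assumptions.

```python
from itertools import product
from collections import Counter

def cross(parent1, parent2):
    """Count the equally likely offspring genotypes for one Mendelian factor.

    Each parent passes on one of its two factors; sorting puts the
    dominant (uppercase) letter first so "aA" and "Aa" count as the same.
    """
    return Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))

# Two parents that each carry one dominant and one recessive factor
offspring = cross("Aa", "Aa")
dominant = sum(n for genotype, n in offspring.items() if "A" in genotype)
recessive = offspring["aa"]
print(offspring)             # one AA, two Aa, one aa
print(dominant, recessive)   # 3 1: the three-to-one ratio, a one-in-four chance
```

The arithmetic is tidy precisely because the model assumes one factor, one trait, which is the assumption the rest of this article calls into question.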
Along came Thomas Hunt Morgan at Columbia
University. The chromosomes had been discovered and
in 1902 Walter Sutton noticed that they came in pairs --
a convenient fit with the dominant-recessive
understanding of the gene. Morgan set out to "map" the
chromosomes, which is where the genes would have to
be located. He didn't get very far with that but he won
the Nobel for his valiant effort. The word Drosophila
began to appear in the newspapers. Hermann J. Muller
zapped a lot of fruit flies with x-rays and heritable body
changes were observed -- more Nobels for that. Enter
Watson and Crick. Their discovery of the double helical
structure of DNA worked nicely too, because it showed
how a complex message could be replicated. "Genes,"
henceforth, would be thought of as nucleotide sequences
along the string of DNA, which in turn was packed
inside the chromosomes.
This reformulated gene was not an entirely satisfactory
fit with the old Mendelian gene. But labels are powerful.
In the decades since, genetics has largely consisted of an
awkward attempt to combine under one name the
Mendelian gene of the late 19th century and the DNA of
the 20th century.
What DNA actually did was carry the instructions for
making proteins, as we have seen. And proteins were
not the same thing as the visible and outward bodily
traits, such as chins and noses and skin color, that the
Mendelian genes were said to make. Still, it was close
enough for government work, and university work; it
sustained the bandwagon of forward progress, and the
truth was that no one really had much more than a hazy
knowledge of these things anyway. And there was this
-- chins and noses and skin pigments were made of
protein, so the different versions of the gene could be
cobbled together, rather like a pantomime horse.
Actually we had Mendel and Morgan and Watson and
Crick all galumphing around onstage together under this
gene umbrella, and it held together pretty well
throughout the 20th century.
So the idea of one gene, one protein was a carry-over
from the earlier one gene, one trait formulation. And in
their work with bread molds in the 1940s, Beadle and
Tatum had seemed to confirm the idea that a gene is
something that performs a single task. Now, it begins to
look as though the gene is going to have to be rethought
completely. It is a task that has long been postponed. The
pantomime horse has begun to look ridiculous. Not only
has our understanding of the gene become massively
more complicated, the analysis of protein structure,
which now moves to the fore, is truly a daunting
prospect for biology. Proteins have 20 building blocks,
not four, and are arrayed in three dimensions, not along
one, like genes.
Both Celera Genomics and the government-funded
consortium have an interest in sustaining the old
"breakthrough" refrain; Venter to attract investors,
Collins to keep Congress happy. According to a
spokesman at NIH's National Human Genome Research
Institute in Bethesda, federal appropriations for the
human genome project have totaled $1.5 billion to date.
As government projects always do, it started small, with
$10.7 million appropriated for the Department of
Energy and $17.2 million for NIH, in 1988. "Frankly, it
was a gamble that we would be able to expand the
[research-dollar] pie," said Maynard Olson, head of the
publicly funded genome center at the University of
Washington in Seattle. But expand it they did. The
amount appropriated in fiscal year 2000 was $271
million, and the estimate for this year is $291 million. In
the process, a whole new institute at NIH was created.
"A lot of people don't mention those numbers," said the
NIH spokesman. In contrast, at the time of the White
House announcement last June, Celera said that it had
spent a total of $200 million on the project. The
government is now on a genome-spending path that will
consume more dollars every year than Celera has spent
overall.
And yet it was from Celera that the new understanding
came, not from those with government funds. The
contrived truce between the participants concealed this: It
was Celera that was willing to advance the new insight
even though it may end up undermining the company's
original premise and business plan. (And it was the
Whitehead Institute's Lander who had tried to stop
Science from publishing Celera's article.) With the real
world of investors to consider, Celera can less easily
survive in the cushioned, sometimes make-believe world
that government science has fashioned for itself. If there
was bad news, Celera needed it sooner rather than later.
Thus armed, perhaps Venter can adjust the business
model.
Nonetheless, Celera's message is not likely to comfort
investors. Gene therapy holds out less promise as a result
of this new understanding. At the press conference, a
journalist asked Francis Collins if the smaller number of
genes would make medical advances easier or more
difficult. "I would say easier," he said. Every gene search
is like trying to find a needle in a haystack. "Guess
what? The haystack just got three times smaller." But
when another journalist asked a similar question about
the genes interacting combinatorially, Collins retreated
from the haystack metaphor. The straw interacts with
itself, and the needle has other objects bumping into it,
he allowed. Craig Venter said more simply that when
you consider there is maybe a "tenfold expansion" in the
number of proteins compared to the number of genes, it
"does indicate increased complexity."
Celera will no doubt continue to sell its genome
information to the big research institutions, to the
pharmaceuticals, and to other biotechs -- for a while at
least. And the biotechs with patents will continue to
charge high prices to screen for "predisposing" genes; or
for the rare but real disease-causing defects.
To some biotechs the new announcement came as
something of an embarrassment. As Andrew Pollack
pointed out in the New York Times: "Incyte Genomics
advertises access to 120,000 human genes, including
60,000 not available from any other source. Human
Genome Sciences says it has identified 100,000 human
genes, and Double Twist 65,000 to 105,000. Affymetrix
sells DNA-analysis chips containing 60,000 genes."
Some of these genes have already been patented, but "if
genes are not the whole story," Pollack added, "it also
means those patents could be worth less." Or worthless.
Venter told the London Observer that the head of a
biotech company had phoned him in some distress
because he had already done a deal with SmithKline
Beecham to sell them the details of 100,000 genes.
"Where am I going to get the rest?" the man asked. How
long before someone starts comparing genes to tulips?
Last June, under the headline "50,000 Genes and we
Know Them All (Almost)," David Baltimore, the
president of Caltech and the winner of the 1975 Nobel
Prize for medicine, wrote that "humans have no more
genetic secrets; our genes are a book open to all to read."
But in the issue of Nature announcing the new analysis
of the human genome he wrote more soberly:
"We wait with bated breath to see the chimpanzee
genome. But knowing now how few genes humans
have, I wonder if we will learn much about the origins
of speech, the elaboration of the frontal lobes and the
opposable thumb, the advent of upright posture, or the
sources of abstract reasoning ability, from a simple
genomic comparison of human and chimp. It seems
likely that these features and abilities have mainly come
from subtle changes that are not now easily visible to
our computers, and will require much more
experimental study to tease out. Another half century of
work by armies of biologists may be needed before this
key step of evolution is fully elucidated."
The old dream of reducing biology to physics, or of
believing that something simple -- four nucleotides on a
string -- could explain the vast complexity of the human
(or any other) body, has received a serious blow. A lot
of the companies involved in this search for meaning in
the genome were inordinately impressed with the idea
that, with just four "building blocks," an abundance of
computing power was all it would take. Strings of code
with tens of millions of bases could be searched and
analyzed in hours. Now, we may be looking at the
beginning of the end for genomics. The word
"proteomics" is already beginning to appear in
science-journal headlines.
Writing in the New York Times, Stephen Jay Gould
says that we may be now liberated from the "simplistic
and harmful idea" that each aspect of our being, "either
physical or behavioral, may be ascribed to the action of a
particular gene 'for' the trait in question." The collapse
of the doctrine of one gene, one protein, he added, "and
one direction of causal flow from basic codes to
elaborate totality, marks the failure of reductionism for
the complex system that we call biology. Organisms
must be explained as organisms, not as a summation of
genes." I think he is right.
Francis Crick was invited to attend the event. He wasn't
feeling up to it -- he is 84 -- so he sent along a
videotaped message instead. It was played that afternoon
at the packed, 500-seat Masur Auditorium at NIH. "We
foresaw very little of what happened in molecular
biology," Crick said. But the impact of the latest
development on medicine "will be enormous," and we
can only hope that "it will bring us on balance more
good than evil." Watson then stood up before the NIH
employees and with his customary tactlessness noted that
Crick was "in perfect health, as you could see." He just
doesn't like to travel. "He is one of the 20 percent who
has had a heart bypass and whose brain isn't affected,"
Watson added.
After the press conference, I spoke to Watson briefly. A
crowd was clustered around him, some of the journalists
getting him to sign the issue of Nature in their press
packets. A black woman asked him how long before we
would see medical results from the genome. He said
something about our children, then corrected himself:
"Our grandchildren will have better lives because of
what we are doing," he said. "You're worried about
breast cancer," he said to the woman. "I'm worried
about senility." He is 73. Forty-eight years have passed
since his publication of the double helix with Crick. I
asked him if the two concepts of the gene could continue
to coexist. He said he didn't think they differed all that
much. He analogized the shift in understanding to
Einstein's modification of Newtonian physics. "We have
relativity but it doesn't affect the way artillery works," he
said. Newton was still largely right. When I asked
Venter the same question, he gave me to understand that
this would take too long to discuss, but that the question
was pointing us "in the right direction."
I asked Watson if one gene can give rise to ten proteins.
"Some genes can give rise to 50 different proteins," he
said. No problem! He was unruffled, content, still in
control at mission central. The new knowledge about
genes and proteins would be smoothly integrated into
the received wisdom of molecular biology, apparently.
The higher councils were taking it all in stride. There
was no cause for alarm. But those who would like to put
the news to medical use are furrowing their brows.