Selling DNA Data
by William Wells
Incyte doesn't need to make drugs. It already makes
millions of dollars selling a bunch of As, Cs, Gs and Ts.
Human cells are almost ridiculously tiny and efficient. Every one of them
has an entire genetic instruction set - three billion As, Cs, Gs and Ts - packed
into a nucleus one hundredth of a millimeter across. And a few cents worth
of carbon, nitrogen and phosphorus will keep millions of cells happy in
a dish, reproducing themselves and their DNA data.
Incyte Pharmaceuticals, Inc. (Palo Alto, Calif.) also stores DNA data, but in a way
that is somewhat less compact and far more likely to impress a computer
geek. Ground zero for Incyte is a highly air-conditioned room filled with
scores of black, towering supercomputers worth up to a million dollars each.
Nine of the ten largest pharmaceutical companies in the world each pay Incyte
~$5 million per year to look at the data on those computers. The money has
brought an expanding workforce - from 160 to 675 employees in the last two
and a half years - and a frantic effort to keep generating more and more information.
"We're maybe a year away from going bankrupt if we don't pursue new
technologies at any one point," says Tod Klingler, Director of Research
Bioinformatics at Incyte.
The manpower and computer power are needed to tame an ever-growing mountain
of DNA sequence information. "We now have 24 organisms completely sequenced -
we have their entire genetic message written down," says Temple Smith
of Boston University. "This information allows us to get information
about one organism and use that to understand another."
Evolution, the original Xerox machine
Incyte's DNA databases are valuable because nature is thrifty. As single
cells evolved to make mice and then men, many genes were kept on to do their
old jobs. Incyte can line up the genes and recognize the similarities in
the DNA sequences. (A DNA sequence is the order of the several thousand
A, C, G and T nucleotides that make up a single gene.) If the function of
a yeast gene is known, the function of the related human gene is, by implication,
similar. And yeast genes are easier to study. "We don't put humans
in Waring blenders and do experiments," says Smith, "but we do
put yeast in Waring blenders."
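Lining up two genes and scoring their similarity can be sketched in a few lines of Python. This is only an illustration, not Incyte's method: it uses the classic Smith-Waterman local alignment recurrence, the scoring values (match +2, mismatch -1, gap -2) are assumptions, and the three short sequences are invented for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between sequences a and b.

    The scoring values are illustrative assumptions, not taken from the article.
    """
    prev = [0] * (len(b) + 1)  # scores for the previous row of the table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart from zero at any cell.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical sequences, made up for this sketch.
yeast_gene = "ATGGCGTACGTTAGC"
human_gene = "CCATGGCGTACGATAGCTT"
unrelated = "TTTTTTTTTTTTTTT"

print(local_alignment_score(yeast_gene, human_gene))
print(local_alignment_score(yeast_gene, unrelated))
```

The related pair scores far higher than the unrelated pair, which is the signal a database search looks for. Real searches over millions of sequences rely on faster heuristics rather than filling in a full table for every pair.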
"People have been blown away in recent years by the sequence comparisons,"
says Smith. "For example, in the fly the homeobox genes control what
end is the mouth and what end is the ass, which apparently is very important.
The same genes are used in mice, in exactly the same order. Once biology
found a way to get a front and a back, she never did it again."
Pharmaceutical companies are worried about drugs, not evolution. But drugs
for complex diseases do not spring forth fully formed. Modern drug hunting
means finding a chemical that jams a specific protein, and the first step
is to find that protein target. Only then can you look through thousands
of chemicals to find the one that turns the protein off. If the protein
is needed for the virus to invade the cell, or for the cancer cell to multiply,
then the chemical is your new drug. "In the past it might have taken
a long time to find one drug target, but now you can quickly get multiple
targets," says Klingler. "The priming of the pump is no longer
The most important proteins, like those that tell a cell when and how to
grow, are often remarkably similar from yeast to man. In fact a number of
human genes can replace their yeast counterparts and keep the yeast alive
(some of these genes are listed on the Web). A drug company that searches
the Incyte database and finds the human version of a yeast growth protein
would count itself lucky: Chemicals that can turn off that protein might
have anti-cancer activity.
The immune system has no counterpart in yeast cells, but targets for anti-inflammatory
drugs can also be found in the Incyte databases. These searches rely on
another result of nature's thriftiness: proteins come in families. Immune
cells, for example, must send countless messages to each other. Rather than
invent a completely new protein for each message, evolution has Xeroxed
its first effort and made minor changes. Even if one of these messengers
is useless as a drug target (perhaps turning it off shuts down the whole
immune system), it may allow you to find its cousin, which is specific to
the auto-immune disease lupus.
The power of DNA databases mushrooms with their size. Every new piece of
DNA sequence means not only one more possible drug target, but also one
more sequence to compare all the old sequences against. "The sequence
comparison method is not new," says Smith. "What is new is how
much data we have."
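A little arithmetic shows why size matters. The sketch below assumes a naive all-against-all comparison, which is a simplification made for illustration; the function names are hypothetical, and the three million figure is the human EST count quoted in this article.

```python
def pairwise_comparisons(n):
    """Number of distinct pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def added_comparisons(n_existing):
    """Comparisons added when one new sequence joins n_existing old ones."""
    return pairwise_comparisons(n_existing + 1) - pairwise_comparisons(n_existing)


for n in (1_000, 1_000_000, 3_000_000):
    print(f"{n:>9,} sequences -> {pairwise_comparisons(n):,} possible pairs")
```

Each new sequence adds as many comparisons as there are sequences already in the database, so the number of possible pairs grows roughly with the square of the database size.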
Sequence, sequence, sequence
In simpler days, before people made up words like genomics and bioinformatics,
a single gene was enough to keep a graduate student occupied for all five
years of a doctorate. Sequence data trickled onto GenBank one gene at a time, even as the methods for obtaining
that sequence became simpler and simpler. Then a renegade called J. Craig
Venter had an idea: stop thinking about what DNA you should sequence and
start sequencing anything you can get your hands on. Reasoned thought was
out; brute force was in.
But there was some logic to Venter's approach. Genes are mere islands in
a cell's DNA, stranded among seas of nonsensical filler DNA. The Human Genome
Project promised to (eventually) sequence everything. (A genome is the collection
of all the DNA in a given cell.) Venter wanted to fish out the informative
bits first - just enough of each gene to take a guess at its place in the
running of the cell. It is proteins that do the work of a cell, but proteins
are made only after genes are converted to mRNA, which is then converted
to protein. Venter took the mRNA and transformed it back into DNA that was
ready to sequence and devoid of non-gene junk. After sequencing at most
a few hundred nucleotides of each piece of DNA, he had his expressed sequence tags (ESTs).
Venter founded The Institute for Genomic
Research (TIGR; Rockville, Md.) in July 1992,
with $85 million of funding promised over ten years by Human
Genome Sciences, Inc. (HGS; Rockville, Md.). Within
a year, TIGR claimed it had identified ESTs for over half the estimated
70,000 human genes. TIGR and HGS parted ways in 1997: TIGR is now a not-for-profit
institute with government funding, and HGS has focused on patenting genes
(several hundred applications so far, with over fifty patents allowed) and
developing the corresponding proteins as drugs.
Incyte began as a traditional pharmaceutical company. But when the failure
of its premier drug in clinical trials coincided with Venter's EST splash,
Incyte decided to re-invent itself. "We became basically a factory
for sequencing DNA," says Klingler.
Four years, six months, and three million human ESTs later, the sequencing machines are still running. The sequencing room is a far cry from
the deserted computer room: people scurry everywhere to tend to the rows
of sequencing machines. This room generates all the data that makes the
company run, but the work is repetitive and the workers - many of them college
students - are expendable. "There is a whole new temporary biologist
market," says Klingler. "I don't know how long the average technician
stays, but it's not too long."
Incyte's human database is called LifeSeq. The ESTs come from the
mRNA of 669 different tissue samples, some of them diseased and some of
them not, and represent perhaps 90-95% of all human genes. Genes that are
often made into mRNA have been sequenced thousands of times, but some genes
that are rarely converted into mRNA remain to be sequenced once. Incyte
is also using the short ESTs to find the entire length of every human gene,
and is working out where each gene lies in the 24 human chromosomes.
Newer databases include PathoSeq, which has most of the genes from 32 bacterial
species, and ZooSeq, which includes genes from mice, rats, monkeys, and
soon dogs. The sequencing operation that feeds these databases generates
~200,000 pieces of sequence, or over 40 million DNA nucleotides, every single
Incyte could never have grown to this size by itself. It has aggressively collaborated
with, licensed from and acquired companies that can provide:
(Science Applications International
Corp. (San Diego, Calif.))
- improved sequencing methods (GeneTrace Systems Inc. (Alameda, Calif.) and Molecular Dynamics Inc.
- gene mapping (Genome Systems Inc. (St. Louis, Mo.) and Vysis Inc. (Downers Grove, Ill.))
- better computer programs to recognize important DNA sequences
- software to integrate the records of the patients who supplied the tissue samples (Oceania Inc. (Palo
- software to suggest how a protein may look in three dimensions based on the sequence of its gene (Molecular Simulations Inc. (San Diego, Calif.))
- ink-jet technology to put DNA on chips (Combion Inc., formerly of San Diego, Calif. and now part of Incyte)
- the ability to make chips with tens of thousands of pieces of DNA arranged on their surface (Affymetrix Inc. (Santa Clara, Calif.) and Synteni Inc. (Fremont, Calif.))
Many of these collaborations are aimed at adding more bells and whistles
to the databases. Any researcher or high school student can compare his
or her favorite gene with public databases like GenBank
or dbEST, using a common search method called BLAST, so
the Incyte database must stand out. For starters, says Klingler, "our
customers can get the first look at 40,000 human genes that are in no other
database. And GenBank is like a snapshot - the data may be true when it
is entered but it never gets updated based on new information coming in."
Genome projects are pouring new sequences into public databases, so the
advantage of having more genes will not last for long. "They can't
keep all this data secret but they don't care, because patent protection
is lead time," says Smith. "If they know something six months
ahead that's enough - then you can tell everyone everything."
That game has an inevitable conclusion. As Mark Fishman, a biologist at
Massachusetts General Hospital (Boston, Mass.), observed at a recent genomics
meeting in San Francisco, "The problem with defining a target like
sequencing the genome is that you might succeed and then be out of a job."
Counting the clones
Incyte's escape clause is called expression analysis. Just by sequencing
an insane number of ESTs the company gets a rough sense of how often each
gene is expressed, i.e., made into its corresponding protein. In the pancreas,
for example, the insulin gene will be turned on to make mRNA and then insulin
protein. There will be hundreds of insulin mRNAs, and so hundreds of insulin
ESTs, from pancreatic tissue, but no insulin ESTs from skin tissue. The
Incyte database has this sort of information for almost all genes in almost
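Counting ESTs per tissue is simple bookkeeping, as the following sketch suggests. The records, gene names and function names are all hypothetical, chosen to mirror the insulin example; nothing here reflects the actual structure of Incyte's database.

```python
from collections import Counter, defaultdict

# Hypothetical (tissue, gene) pairs, one per sequenced EST.
ests = [
    ("pancreas", "insulin"), ("pancreas", "insulin"), ("pancreas", "insulin"),
    ("pancreas", "actin"), ("skin", "actin"), ("skin", "keratin"),
    ("skin", "keratin"),
]


def count_ests(est_records):
    """Tally how many ESTs were seen for each gene in each tissue."""
    counts = defaultdict(Counter)
    for tissue, gene in est_records:
        counts[tissue][gene] += 1
    return counts


def genes_only_in(counts, tissue):
    """Genes with at least one EST in `tissue` and none in any other tissue."""
    seen_elsewhere = set()
    for other_tissue, gene_counts in counts.items():
        if other_tissue != tissue:
            seen_elsewhere.update(gene_counts)
    return sorted(g for g in counts[tissue] if g not in seen_elsewhere)


counts = count_ests(ests)
print(counts["pancreas"]["insulin"])      # 3
print(counts["skin"]["insulin"])          # 0
print(genes_only_in(counts, "pancreas"))  # ['insulin']
```

The same tally answers both questions raised here: how often a gene turns up in a tissue, and which genes turn up in one tissue and nowhere else.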
"Now we can do biological research with a picture of the entire human
genome," says Klingler. "The classical approach is to look at
one gene at a time. Having a peek at all the human genes will change the
way you look at a problem. You can take a disease tissue and find all the
genes that you see only in asthmatic lungs. That's never been possible in
Counting up ESTs is what Klingler calls "low resolution" information.
The future lies in chips that can hold tens of thousands of genes arrayed
in a neat grid. A chip with every one of the 6116 genes of brewer's yeast
has just been made by Joe DeRisi and Patrick Brown of Stanford University,
and any number of researchers and companies are busy lining up
collections of human genes. Those who are keen (and have a lot of spare
time and $25,000 for parts) can even make their own chips and chip-readers
using published instructions. The two leading chip companies are Affymetrix,
which in a confusing turn of events is both collaborating with Incyte and
suing it for patent infringement, and Synteni, which was bought by Incyte last January.
Researchers using the chips first collect mRNA from two different sources,
such as diseased and non-diseased tissue, or normal and drug-treated cells.
The mRNA from diseased tissue can be labeled with a green dye, and non-diseased
mRNA with a red dye. The mRNAs are then allowed to stick to their corresponding
genes on the chip. If there is far more mRNA from gene 216 in the diseased
state, position 216 will light up green, but if there is more mRNA 216 in
the normal state it will be red. Equal expression gives a yellow spot. With
one experiment the researcher can tell how every gene has reacted to the
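The color readout described above amounts to comparing two intensities for each spot. Here is a minimal sketch, assuming a twofold cutoff for calling a spot green or red; that threshold, the function name and the sample readings are invented for illustration and are not taken from any real chip-reading software.

```python
import math


def spot_color(diseased_signal, normal_signal, fold_threshold=2.0):
    """Classify one spot from its two dye intensities (both must be positive).

    Follows the labeling described in the text: diseased mRNA carries the
    green dye, non-diseased mRNA the red dye. The twofold cutoff is an
    assumption made for this sketch.
    """
    log_ratio = math.log2(diseased_signal / normal_signal)
    cutoff = math.log2(fold_threshold)
    if log_ratio >= cutoff:
        return "green"   # more mRNA in the diseased sample
    if log_ratio <= -cutoff:
        return "red"     # more mRNA in the normal sample
    return "yellow"      # roughly equal expression


# Hypothetical readings: position -> (diseased intensity, normal intensity).
readings = {216: (900.0, 100.0), 217: (120.0, 800.0), 218: (500.0, 480.0)}
for position, (diseased, normal) in readings.items():
    print(position, spot_color(diseased, normal))
```

Working with the logarithm of the ratio treats a twofold increase and a twofold decrease symmetrically, which is why it is a common choice for this kind of comparison.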
The flood of data from these methods is just beginning. "Probably 99%
of the data collected using this technique haven't been published yet,"
says Brown. "It's a fast-moving and exciting field." Brown is
looking at how yeast coordinate switching hundreds of genes on or off when
they have more or less food, but the pharmaceutical companies will be looking
at their favorite drug target. If the gene they proposed as a breast cancer
target is also turned on in pancreatic cancer they should expand their clinical
trials. And, given a choice, they should opt for the target that is not
made in the stomach or blood, to minimize the chances that their drug will
cause digestive and immune problems.
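That kind of target triage can be written as a simple filter. The sketch below assumes expression has already been reduced to a list of tissues in which each gene is switched on; the gene labels, tissue names and function name are hypothetical.

```python
def candidate_targets(expression, wanted_on, wanted_off):
    """Genes on in every tissue in wanted_on and off in every tissue in wanted_off.

    `expression` maps a gene name to the set of tissues where it is switched on.
    """
    hits = []
    for gene, on_in in expression.items():
        if wanted_on <= on_in and not (wanted_off & on_in):
            hits.append(gene)
    return sorted(hits)


# Hypothetical chip results, invented for this sketch.
expression = {
    "gene_A": {"breast cancer", "pancreatic cancer"},
    "gene_B": {"breast cancer", "stomach"},
    "gene_C": {"breast cancer"},
    "gene_D": {"blood", "pancreatic cancer"},
}

print(candidate_targets(expression, {"breast cancer"}, {"stomach", "blood"}))
# ['gene_A', 'gene_C']
```

In this made-up data, gene_A is also on in pancreatic cancer, the sort of finding that would argue for widening a clinical trial, while gene_B is ruled out because it is also made in the stomach.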
Finding a made-to-order gene that is on in one situation and off in many
others used to be either a fluke or impossible. The chips make it a matter
of a few experiments. That makes researchers like Klingler ambitious. "Our
real goal is to understand the molecular basis of human biology," he
says. "That's not going to happen in a traditional molecular biological
way, one gene at a time."
Originally published in the web magazine Access Excellence.