Sequence 3 Sequencing

THE HUMAN GENOME:

POEMS ON THE BOOK OF LIFE

GILLIAN K FERGUSON

Sequencing

Reading the book of life

‘Sequencing the human genome depended on many technological improvements in the production and analysis of sequence data. Key innovations were developed both within and outside the Human Genome Project. Laboratory innovations included four-colour fluorescence-based sequence detection, improved fluorescent dyes, dye-labelled terminators, polymerases specifically designed for sequencing, cycle sequencing and capillary gel electrophoresis. These studies contributed to substantial improvements in the automation, quality and throughput of collecting raw DNA sequence.’ International Human Genome Sequencing Consortium

‘Rapid sequencing - Sequencing methods are now highly automated and miniaturised. Industrial-scale sequencing laboratories house hundreds of robots serviced by teams of technicians. Most genome sequencing projects are internationally organised and funded because, although they are large, they need be performed only once. The reams of data they generate – in essence, sequences of millions and millions of ‘letters’ of coded information – are not published in scientific journals but on the internet so that scientists around the world can have immediate access to them.’ Demystifying Genomics, Medical Research Council

‘The overall sequencing output rose sharply during production. Following installation of new sequence detectors beginning in June 1999, sequencing capacity and output rose approximately eightfold in eight months to nearly 7 million samples processed per month, with little or no drop in success rate (ratio of useable reads to attempted reads). By June 2000, the centres were producing raw sequence at a rate equivalent to onefold coverage of the entire human genome in less than six weeks. This corresponded to a continuous throughput exceeding 1,000 nucleotides per second, 24 hours per day, seven days per week. This scale-up resulted in a concomitant increase in the sequence available in the public databases.’ International Human Genome Sequencing Consortium, Nature, 2001

‘How the full genetic code of an organism is read - Whether bacterium or human, the genome of any organism is too large to be deciphered in one go. The genome is therefore broken into smaller pieces of DNA, each piece is sequenced and computers fit all the sequences back together. For example, the human genome is about 3 billion base pairs, arrayed in 24 chromosomes. The chromosomes themselves are 50–250 million bases (megabases) long. These tracts of DNA are much too large for even the latest automated machines, which sequence fragments of DNA between 400 and 700 bases long. The genome is first broken into conveniently sized chunks, fragments of about 150 kilobases. Each fragment is inserted into a bacterial artificial chromosome (BAC), a cloning vector used to propagate DNA in bacteria grown in culture. The BACs are then mapped, so that it is known exactly where the inserts have come from. This process makes re-assembling the sequenced fragments to reflect their original position in the genome easier and more accurate, and any one piece of human DNA sequence can automatically be placed to an accuracy of 1 part in 30 000. Each of the large clones is then 'shotgunned' - broken into pieces of perhaps 1500 base pairs, either by enzymes or by physical shearing - and the fragments are sequenced separately. Shotgunning the original large clone randomly several times ensures that some of the fragments will overlap; computers then analyse the sequences of these small fragments, looking for end sequences that overlap - indicating neighbouring fragments - and assembling the original sequence of the clone. An alternative approach, 'whole genome shotgun sequencing', was first used in 1982 by the inventor of shotgun sequencing, Fred Sanger, while working on phages (viruses of bacteria). As its name suggests, in this technique the whole genome is broken into small fragments that can be sequenced and reassembled. 
This method is very useful for organisms with smaller genomes, or when a related genome is already known.’ Wellcome Trust, 2006
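
The reassembly step described in the passage above, in which computers look for end sequences that overlap, can be illustrated with a small sketch. This is not the software used by the Human Genome Project; it is a toy greedy merge, assuming error-free fragments read from the same strand, and the names `overlap`, `greedy_assemble` and `min_len` are invented for this illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len: int = 3):
    """Repeatedly merge the pair of fragments with the longest end overlap.

    Toy assumption: reads are error-free and all come from one strand.
    Returns the list of contigs left when no overlaps remain.
    """
    # Remove duplicates, then drop fragments wholly contained in another.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # nothing overlaps: the remaining pieces stay separate
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    # Three overlapping reads of a made-up 15-base sequence.
    reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGTT"]
    print(greedy_assemble(reads))
```

Real assemblers must also cope with sequencing errors, reads from both strands and repeated sequence, which is where the BAC maps described in the passage help.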

‘A caterpillar spits out a sac of silk/ where it lies entombed while its genes/ switch on and off like lights/ on a pinball machine.’ Alison Hawthorne Deming, Genetic Sequence

‘Celera Sequencing - 60 million overlapping fragments, each 2,000 to 10,000 bases long

Human Genome Project Sequencing - 22,000 fragments, each 100,000 to 300,000 bases long

Time to assemble 12,000 bases: 1980 – more than a year; 1997 - 20 minutes; 2000 - one minute,

12 000 letters of DNA now decoded by the Human Genome Project every second – 20 years ago, it took one year.’ Figs, BBC Science online

‘Meeting Human Genome Project sequencing goals by 2003 has required continual improvements in sequencing speed, reliability, and costs. Previously, standard methods were based on separating DNA fragments by gel electrophoresis, which was extremely labor intensive and expensive. Total sequencing output in the community was about 200 Mb for 1998. In January 2003, the DOE Joint Genome Institute alone sequenced 1.5 billion bases for the month.’ Human Genome Project, US

Accelerated Sequencing

Accelerated learning; hothousing

toddler to child - adult, Professor,

in nano-flickers;

dust and friction

blowing from the turned pages,

looking like pollen, air-sparkle;

fingers whirring in a starry blur,

aped by the computer’s infantile

mind, this circuited sophisticate

that gets it all, but cannot know

or think about it; celebrate

what it means - the beauty.

The following wind reeks

of discovery; crazy jalopy

ridden on three, two wheels -

driving pitted, pocked roads,

highway by way of byway;

still precisely calibrated -

river and tributary sailed

in lunatic barge careering

faster and faster, splashing

through the primaeval goo,

but spewing such jewels, images -

such old prints, gold and treasures,

as if a world of Tutankhamen tombs

were all unearthed together, visible -

and lifetimes needed now to decipher

such wealth - brilliant trophies for all.

‘However, the data being produced by either of the two sequencing efforts is not as comprehensive or reliable as is often suggested. The first drafts will consist of 90% of the human genome to an accuracy of 99.9%. That is an error rate of 0.1% which, theoretically, could obliterate the entire diversity of the human race. The groups believe this level of precision can be achieved by repeating their work four or five times. But the "gold standard" of reliability set by the Human Genome Project, 99.99%, will require tenfold duplication - another two or three years work. And even then, as much as 10% of the genome will remain intractable to current technology. Much of this missing data will be short, repetitive sequences of bases that jam up the current DNA processing technique. These repeats are unlikely to hold important instructions. However, bacteria are used to multiply the chopped up human DNA and there may be sequences that prevent the microbes from copying the fragments. These pieces could contain active regions.’ BBC, 2000

‘The Sanger Institute has over 1500 devices installed on its network: we have more than 250 PCs and some Macintosh systems and a total of more than 700 64-bit Alpha processors. The Sanger Institute main sequence storage has more than 22 Terabytes (22,000 gigabytes) capacity. The system that serves sequence searches (the BLAST farm) has more than 400 'nodes'.’ Wellcome Trust Sanger Institute

‘In reviewing the past decade, it is clear that genomics was, and still is, driven by innovative technologies, perhaps more so than any other scientific area in recent memory. From the outset, computing, mathematics and new automated laboratory techniques have been key components in allowing the field to move forward rapidly. We highlight some key innovations that have come together to nurture the explosive growth that makes a new era of genomics a reality. We also document how these new approaches have fueled further innovations and discoveries.’ Nature, 2003

‘Our ability to explore genome function is increasing in specificity as each subsequent genome is sequenced. Microarray technologies have catapulted many laboratories from studying the expression of one or two genes in a month to studying the expression of tens of thousands of genes in a single afternoon… The study of sequence variation within species will also be important in defining the functional nature of some sequences. Effective identification and analysis of functional genomic elements will require increasingly powerful computational capabilities, including new approaches for tackling ever-growing and increasingly complex data sets and a suitably robust computational infrastructure for housing, accessing and analysing those data sets. In parallel, investigators will need to become increasingly adept in dealing with this treasure trove of new information. As a better understanding of genome function is gained, refined computational tools for de novo prediction of the identity and behaviour of functional elements should emerge.’ A Vision for the Future of Genomics Research, US National Human Genome Research Institute, 2003

‘Over the past twenty five years, a mere sliver of recorded time, the world of biology - and indeed the world in general - has been transformed by the technical tools of a field now known as genomics. These new methods have had at least two kinds of effects. First, they have allowed scientists to generate extraordinarily useful information, including the nucleotide-by-nucleotide description of the genetic blueprint of many of the organisms we care about most - many infectious pathogens; useful experimental organisms such as mice, the round worm, the fruitfly, and two kinds of yeast; and human beings. Second, they have changed the way science is done: the amount of factual knowledge has expanded so precipitously that all modern biologists using genomic methods have become dependent on computer science to store, organize, search, manipulate and retrieve the new information.’ Harold Varmus, Nature, 2003

Compressed, the Human Genome would fit onto an ordinary CD

Computer Sequencing of the Human Genome

Such sudden rushing into light; funnelling, sorting, crunching -

how to put the visible bright white star of a human eye through

the technological mincer to learn the holy recipes for sight -

re-assemble molecules to match blue sky, glue the shattered

Ming vase, shred Arabian carpet; digging up ancient treasures

from our living tomb, dreaming catacombs where we have lain,

wing and tail, mother cells of earth and water, four billion years;

turning out our rooms - unravelling the picture we already knew

was more like Seurat, Pointillism or Monet’s Impressionist vision,

than this creamy seal of vivid realism, seemingly smooth stability;

more like energy, responses to light - but could not see, even

if we believed, ground down beyond our frantic molecules –

our factories, chemicals, to these patterns stitching us in reality,

at this particular time - weaving our decorative skin and hair –

shining these autographed eyes, knitting us together – now whole,

but stripped of the breathing detail, more naked than fish skeletons

laid on a white beach by the North Sea - truncated in dead letters

spelling life, streaming from a machine. Our own holy sequence,

cracked code of humanity evolved from everything that has lived -

that came to be, is, will be; secrets kept in the organic treasure box

through all Earth’s time, her glorious struggle to create more life –

until this binding spiraled string is virtually unwound, and graven

symbols, simple as a cave painting - first artistic impulse to record,

come into light; a reduction of life’s culture, minimalist poem of us.

‘The completion of the first draft of the human genome is an awesome technological feat. How did the researchers of the Human Genome Project consortium go about the task? …The acceleration in human genome sequencing has been remarkable. The original target date for completion was 2005, but in 1995 the timetable was revised to 2003, with a ‘first draft’ sequence to be produced in 2001. Even in October 1998, only about 6 per cent of the human genome had been completed. And then in 1999, with researchers hungry for useful sequence and lots of it, the publicly funded Human Genome Project announced that the draft would be produced in 2000. At first, some researchers were worried that a draft sequence would lower standards, but these concerns were allayed by the rapid flow of useful information into the DNA databases - all sequence is released onto the Internet within 24 hours - and by the ‘tasters’ of the final product provided by the publication of chromosomes 22 and 21 in December 1999 and March 2000, respectively. Worries were put to rest when, on 26 June, the Human Genome Project consortium announced that the milestone of the working draft had been reached. The working draft covers about 90 per cent of the euchromatic part of the genome (which contains most of the genes), has some gaps in the sequence and an error rate of about 1 in 1000. Each part of the working draft has been sequenced four or five times. To reach the ‘gold standard’, the sequencing is repeated about ten times - producing a final sequence with almost no gaps and an error rate of less than 1 in 10 000. Indeed, the largest sequencing factories such as the Sanger Centre achieve error rates of fewer than one mistake in 100 000 bases of finished sequence. The method used for much of the genome is ‘shotgun sequencing’ which, in essence, involves breaking the genome up into conveniently sized chunks. 
The total size of the human genome is estimated to be about 3 billion base pairs, arrayed in 23 chromosomes. The chromosomes themselves are 50-250 million bases (megabases) long, too large to be sequenced directly (automated machines sequence fragments of between 400 and 700 bases), so the Human Genome Project fragments them into chunks of about 150 kilobases. Each of these large clones is then ‘shotgunned’ - broken into pieces of perhaps 1500 base pairs, either by enzymes or by physical shearing - and the fragments are sequenced separately. Shotgunning the original large clone randomly several times ensures that some of the fragments will overlap; computers then analyse the sequences of these small fragments, looking for end sequences that overlap - indicating neighbouring fragments - and assembling the original sequence of the clone.’ Wellcome Trust

‘Darwin studied the originating sequences, not the stable origin (if such existed). That paradox released him from the final task of challenging first causes, and yet usefully destabilized any insistence on design or plan.’ Gillian Beer, Introduction to the Origin of Species, 1859, Oxford University Press, 1988

‘How does it work? - The DNA to be sequenced is provided in single-stranded form. This acts as a template upon which a new DNA strand is synthesized. DNA synthesis requires a supply of the four nucleotides (the building blocks of DNA), the enzyme DNA polymerase and a primer (a short sequence annealed to the template which initiates the new DNA strand). The nucleotides added to the growing DNA strand are complementary to those in the template strand. Sequencing is achieved by including in each reaction a nucleotide analogue that cannot be extended and thus acts as a chain terminator. Four reactions are set up, each containing the same template and primer but a chain terminator specific for A, C, G or T. Because only a small amount of the chain terminator is included, incorporation into the new DNA strand is a random event. Each reaction therefore generates a collection of fragments, but every DNA strand will end at the same type of base (A, C, G or T). The primers or nucleotides included in each of the four reactions contain different fluorescent labels allowing DNA strands terminating at each of the four bases to be identified. The reaction products are then mixed and separated by gel electrophoresis, which separates DNA molecules according to size even if they differ in length by only a single nucleotide. As the DNA strands pass a specific point, the fluorescent signal is detected and the base identified. The whole process can be extensively automated.’ Wellcome Trust
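
The chain-termination logic in the passage above can be sketched in a few lines. This is a simplified illustration rather than a laboratory model: it assumes the template is written in the direction the polymerase reads it, that every possible termination point is represented among the fragments, and that the gel separates fragments perfectly by length. The function names are invented for this sketch.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesize(template: str) -> str:
    """New strand, base by base complementary to the template."""
    return "".join(COMPLEMENT[base] for base in template)


def terminated_fragments(new_strand: str) -> dict:
    """For each of the four reactions, the lengths of its fragments.

    A fragment in the 'A' reaction ends wherever the new strand has an A,
    and likewise for C, G and T.
    """
    lengths = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lengths[base].append(position)
    return lengths


def read_gel(lengths: dict) -> str:
    """Mix the four reactions, sort by size, read each fragment's label."""
    labelled = [(n, base) for base, sizes in lengths.items() for n in sizes]
    return "".join(base for _, base in sorted(labelled))


if __name__ == "__main__":
    template = "TACGGT"  # made-up example template
    fragments = terminated_fragments(synthesize(template))
    print(fragments)
    print(read_gel(fragments))
```

Reading the labels from the shortest fragment to the longest recovers the newly synthesized strand, which is the complement of the template.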

What were the genomic sequences like?

Were particular genetic letters pulsing?

Red, pressurised chemical fire; the hot

organic boiler of the heart, bright blood words

glowing - as Christ’s heart is shown revealed -

with something of the Sun; a furious flower -

hot sunflower in a broiling field, bullet-hole

poppy, burning red rose-in-snow;

daffodil trumpet prooting spring

as actual living symbol;

a whole body hallelujah.

Were eye genes clear - shining, luminous baubles,

bright sign of the human lens, like a sister of ice -

and the milky white word of the cornea - black

muscle of the light-grown pupil; an elastic hole,

shifting black vowel in the blue petal heart

of an iris eye-flower, written in sky colours.

Was the hand sentence shaped like a rooted star -

leaf-on-stalk, glittering with promises of multiple

skills; were the labyrinthine symbols for crucible

brain marked live blue, standing for the electrical

spark, chemical messages wiring organic electricity;

shimmering with intention, wit, action, imagination.

Was there a light; some shine without cipher -

no longer there when all the rest was put back,

reassembled to create a whole man -

his book; something crucial missing.

‘Clone-by-clone sequencing - how is the sequencing done? The first step is to make a library of the DNA. Like your local library, a DNA library is a collection of documents: in this case, they are DNA molecules. In the Human Genome Project, the DNA library is made from pieces of human DNA joined to special DNA sequences to produce Bacterial Artificial Chromosomes or BACs, each of which can contain about 200,000 base-pairs of DNA. The BACs are then positioned on the maps produced earlier, so researchers know which part of the genome is in which BAC: they then begin sequencing them. Each BAC is cut into smaller fragments, each about 2000 base-pairs long. By breaking lots of copies of the BAC at random, a large set of overlapping smaller fragments is produced…. The smaller fragments are then joined to small DNA sequences that allow them to be grown in bacteria in the lab. The isolated pieces of DNA are called pUC clones: it is the ends of each of these clones that are sequenced. After growing the bacteria, the DNA is isolated and the DNA sequencing reactions are carried out. DNA sequencing reactions can only 'read' about 500 bases at a time, so lots of reads are made to cover each 200,000-base-pair BAC.’ Your Genome

If we broke Van Gogh’s sunflowers

into yellow, gold and black pieces -

after mapping their straining outlines;

analysing every brushstroke, pigment,

mineral, sun-coloured molecule -

re-assembled the data, we would

not have them. They would be lost -

how they burst from their paint skins,

barely contained by representation

in this three dimensional world -

because what he has painted is life,

flaming out from cut heads, already

seeing death with huge black eyes;

a whole summer’s sun disgorging.

‘In short, genomics has become a central and cohesive discipline of biomedical research. The practical consequences of the emergence of this new field are widely apparent. Identification of the genes responsible for human mendelian diseases, once a herculean task requiring large research teams, many years of hard work, and an uncertain outcome, can now be routinely accomplished in a few weeks by a single graduate student with access to DNA samples and associated phenotypes, an Internet connection to the public genome databases, a thermal cycler and a DNA-sequencing machine.’ A Vision for the Future of Genomics Research, US National Human Genome Research Institute, 2003

‘When is a chromosome finished? The international consortium has agreed on three criteria: more than 95 per cent of the chromosome must be sequenced, the number, location and size of remaining gaps must be pinned down, and individual gaps must be shorter than about 150 000 bases. The first two chromosomes to be finished surpassed these criteria. The 33.5 megabase sequence of chromosome 21 covers 99.7 per cent of the long arm, has only ten gaps totalling 100 000 bases, and has an estimated accuracy of 99.995 per cent. Making sense of the genome - To be useful, the sequence needs to be ‘annotated’ - new sequences have to be mapped onto their locations in the genome, genes identified, and genes linked to any known information about what they do or might do. But in the last year or so, the flow of human genome sequence has swelled from a stream to a tidal wave, and the DNA databases are being flooded with 10 000 DNA letters every minute. Manual approaches would be overwhelmed, so groups at the Sanger Centre and its next-door neighbour the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI, which hosts some of the world’s largest databases of genomic information), developed a new software system and database called Ensembl. This program automatically annotates new sequence by searching other databases for identical or similar sequences and then makes a prediction or confirms the presence of a gene…Both the data and the program source code for Ensembl are free and distributed via the Internet - a model similar to that used to develop the popular LINUX computer operating system.’ Wellcome Trust

Sequencer

I have stitched a string to harvest stars,

count them - like drawing twisted silk

through groping, invisible fingers –

feeling each large molecule within

my palm - plotted, ready for opening,

like the excitement of oysters; among

chaos, like a broken string of pearls,

cathedral crumbled into stones; vast

exploded mosaic. I am Cubist and

Impressionist, master-embroiderer;

gambler, reader, translator, writer –

pattern glues about the opened stars,

bright sparks; instruction fizzing,

as flowers open from a dull bulb,

as brown birds will blossom wings;

for here it is written, all encrypted -

and I am the breaker of codes,

the eye in organic darkness -

here is the mother of all codes,

ultimate enigmatic sequence.

I am speeding as my skill grows -

my eyes flashing like peacock tail;

tears are like liquid light running

down my skin, into the black hole

of my mouth - where inadequate words

come, iced with poetry; lumpy metaphor

stumbling over my blabbering tongue -

how will I teach what I have seen here

when they do not speak my language;

how will I translate such leaked light,

words of these shimmering chemicals,

recipes for hands and eyes, liver, heart.

More strings keep reeling from the subject -

like handkerchiefs from a magician’s pocket;

unravelled like an old jumper, then re-knitted,

pattern saved - demonstrating one size fits all.

‘Today’s biological research generates a huge quantity of data. This is growing exponentially with a shift in emphasis from individual biomolecules, to analysis of how they interact in complex networks which control the developmental and physiological processes of whole biological systems, and research into how this relates to human health. This transition has increased the importance of bioinformatics and raises key challenges which make it imperative that computer scientists work closely with biologists to refine existing bioinformatics tools and develop new ones…. Bioinformatics is a crucial component of MRC’s post-genome challenge to translate new information about genes and proteins into improved healthcare. Bioinformatics tools allow researchers to collect huge amounts of information about biological structures, mechanisms and systems, and human health, and to look for underlying patterns and relationships which can help them to answer questions about how biological processes normally work and how they go wrong to cause disease. Improved ability to study the biology of integrated systems and to link biological data with clinical information will hasten the medical application of new knowledge, and requires the development of user friendly I.T. tools for: Improved gathering, management, storage, interrogation and sharing of biological data and co-ordination, integration and cross-talk between databases, so that biological data of different kinds, and from multiple sources, can be combined and analysed; Computational analysis and integration of knowledge about genetic information and protein structure and function in humans and other species; Developing statistical and mathematical genetics techniques, and integrating genetic information with data from studies of disease patterns, disease risk and health care in human populations; Computational modelling of biological systems at various levels of complexity.’ Medical Research Council, UK

Developing technology that can keep us up

with the lushness of Nature, her creations

in process - sequencing from the start

of time, forces that moved among stars,

coaxed the cell to stay particular; code

itself for the future, for us. Four billion

years to be deciphered - computers groan,

ask for more brain, system, new memory,

to glimpse this glory - this amplitude -

even stripped to symbol, such richness,

so much information; the book of the eye

would take six lifetimes to read perfectly,

understand everything that has happened there

to make that bright organic glass, transparency;

even what light is, where it came from, why it is.

Or how plants became green, and us black, white,

mixed in genetic families - sharing love and disease,

tolerance and allergy, immunity, cure, susceptibility;

one family under scrutiny, with four billion

members - one billion for each billion years

of evolution - epidemic of people making

Earth sick; just becoming vermin, parasite

instead of life’s art, glorification of our planet,

stupendous creation, observing amazing facts.

‘The NHGRI will continue to support genomic sequencing, focusing on the genomes of mammals, vertebrates, chordates and invertebrates; other funders will support the determination of additional genome sequences from microbes and plants. With current technology, the NHGRI could support the determination of as much as 45–60 gigabases of genomic DNA sequence, or the equivalent of 15–20 human genomes, over the next five years. But as the cost of sequencing continues to decrease, the cost/benefit ratio of sequence generation will improve, so that the actual amount of sequencing done will be greatly affected by the development of improved sequencing technology. The decisions about which genomes to sequence next will be based on the results of comparative analyses that reveal the ability of genomic sequences from unexplored phylogenetic positions to inform the interpretation of the human sequence and to provide other insights. Finally, the degree to which any new genomic sequence is completed - finished, taken to an advanced draft stage or lightly sampled - will be determined by the use for which the sequence is generated. And, of course, the NHGRI's sequencing programme will maintain close contact with, and take account of the plans and output of, other sequencing programmes, as has happened throughout the HGP.’ A Vision for the Future of Genomics Research, US National Human Genome Research Institute, 2003

‘However, the real point of the Tigr/TCAG exercise was a proof of principle one, Dr Kirkness said. By using a "rough sketch" approach, science could take a glimpse at the many mammalian genomes deemed interesting but which would never receive funding for a full-scale decoding effort, he added. "Even for higher-level coverage, complete sequencing can now be done in a year," Dr Kirkness explained. "But for the level of coverage we report with the dog, you could envisage doing half a dozen mammalian genome sequences a year." It is a point echoed by Dr Stephen O'Brien from the US National Cancer Institute: "NHGRI recently estimated that in the next four years, US sequencing centres alone could produce 460 billion bases - the equivalent of 192 dog-sized genomes at [just under the Tigr/TCAG] coverage." Commenting on the latest work, Dr Matthew Binns, a UK dog geneticist, said: "This will be useful in studying common diseases such as cancers and heart disease. But it will be superseded in a few months' time when we're expecting the public Dog Genome Project to be completed."’ BBC, 2003

‘This fall, researchers at Whitehead Institute will test new technology that could aid these and other endeavors. The BioMEMS 768 Sequencer can sequence the entire human genome in only one year, processing up to 7 million DNA letters a day, about seven times faster than its nearest rival. Scientists began working on the project in 1999 with a $7 million National Human Genome Research Institute grant. The technology eventually will help scientists quickly determine the exact genetic sequence of the DNA of many different organisms, and could lead to faster forensic analysis of DNA gathered in criminal cases. The heart of the new BioMEMS machine is a large glass chip etched with tiny microchannels called "lanes." It tests 384 lanes of DNA at a time, four times more than existing capillary sequencers. Each lane can accommodate longer strands of DNA: about 850 bases (the nucleic acids found in DNA, abbreviated by the letters A, C, T or G), compared to the current 550 bases per lane.’ Science Daily, 2003

Digital Ark

As costs fall, Earth’s library will increase;

sequences of many other creatures come -

more reduced than skeleton, bone, molecules

of bone; even the glue that once was bones -

pinhead zoos, disembodied animal instructions,

held in the palm of a hand with room to spare -

flash-peg of an elephant, wildebeest herd;

the last tigers and lions, apes, albatrosses -

a digital ark in one tight hand - clutched,

sweat on our brow at such responsibility;

as the collective Noah who has heard the voice

of his own destruction, now desperately storing

the anatomical seeds of a kingdom, a world,

freak planet squandered – four billion years

of Evolution now condensed - an unbeautiful poem

of imploded life as the seed tells nothing of flowers.

Austere, trimmed of anything organic - spotted,

warm or striped; any hint of the hummingbird’s

shimmer, dazzle of Polar Bear fur, honeybee hoops,

but could – conceivably - be awoken - reconstituted

into life, if anything living were to survive at all,

bearing the principles of life - stem cell magic -

as some believe the compacted egg of universe

exploded into everything, all stars and planets –

to this moment on Earth, writing the lives of creatures

still alive - like living obituaries, all ready and waiting.

‘Many areas of critical importance to the realization of the genomics-based vision for biomedical research require new technological and methodological developments before pilots and then large-scale approaches can be attempted. Recognizing that technology development is an expensive and high-risk undertaking, the NHGRI is nevertheless committed to supporting and fostering technology development in many of these crucial areas, including the following. There is still great opportunity to reduce the cost and increase the throughput of DNA sequencing, and to make rapid, cheap sequencing available more broadly. Radical reduction of sequencing costs would lead to very different approaches to biomedical research. Improved genotyping methods and better mathematical methods are necessary to make effective use of information about the structure of variation in the human genome for identifying the genetic contributions to human diseases and other complex traits. Beyond coding sequences and transcriptional units, new computational and experimental approaches are needed to allow the comprehensive determination of all sequence-encoded functional elements in genomes. In the short term, the NHGRI expects to focus on the development of appropriate, scalable technologies for the comprehensive analysis of proteins and protein machines in human health and in both rare and complex diseases. As a complement to the development of the genome 'parts list' and increasingly effective approaches to proteome analysis, the NHGRI will encourage the development of new technologies that generate a synthetic view of genetic regulatory networks and interacting protein pathways.’ A Vision for the Future of Genomics Research, US National Human Genome Research Institute, 2003

‘With the ongoing rapid increase in both volume and diversity of 'omic' data (genomics, transcriptomics, proteomics, and others), the development and adoption of data standards is of paramount importance to realize the promise of systems biology. A recent trend in data standard development has been to use extensible markup language (XML) as the preferred mechanism to define data representations. But as illustrated here with a few examples from proteomics data, the syntactic and document-centric XML cannot achieve the level of interoperability required by the highly dynamic and integrated bioinformatics applications. In the present article, we discuss why semantic web technologies, as recommended by the World Wide Web consortium (W3C), expand current data standard technology for biological data representation and management.’ Nature, 2005

‘The proliferation of large-scale DNA-sequencing projects in recent years has driven a search for alternative methods to reduce time and cost. Here we describe a scalable, highly parallel sequencing system with raw throughput significantly greater than that of state-of-the-art capillary electrophoresis instruments. The apparatus uses a novel fibre-optic slide of individual wells and is able to sequence 25 million bases, at 99% or better accuracy, in one four-hour run. To achieve an approximately 100-fold increase in throughput over current Sanger sequencing technology, we have developed an emulsion method for DNA amplification and an instrument for sequencing by synthesis using a pyrosequencing protocol optimized for solid support and picolitre-scale volumes. Here we show the utility, throughput, accuracy and robustness of this system by shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome with 96% coverage at 99.96% accuracy in one run of the machine.’ Genome sequencing in microfabricated high-density picolitre reactors, Nature, 2005

‘In a few months a machine that is able to sequence 2.8 million base pairs in a day — that is roughly one person's total base pairs — is expected to reach the market. While that is not the same as creating a complete and accurate genome in a day, the estimates now are that mammalian genomes can be sequenced for about $100,000. That is to say 3,000-fold less than the price tag to sequence humans.’ CBC News, 2007
