Cardiff Castle has many claims to fame. Yet, its role as a measure of Next Generation Sequencing output is not among them. Until now.

To be truly memorable, towns and cities need landmark buildings and one only has to visit Crewe to know how true this is. Building bling is important: it contributes to the character of a city, gives a town an extra swagger, and makes the life of the local tourist board that little bit easier.

Cardiff, capital of Wales, frequent location star in Doctor Who, and home to the Wales Gene Park, is not short of building bling, and you can’t get much more congenially kitsch than the faux-medieval glory that is Cardiff Castle. Especially if you choose to pop inside for a good old mooch (I especially recommend the Arab Room).

Cardiff Castle clock tower – proposed unit of measure for Next Generation Sequencer output.

Commissioned during the latter part of the 19th Century by John Crichton-Stuart, the well-heeled 3rd Marquess of Bute, and designed by the pleasingly flamboyant architect and artist William Burges, the current Cardiff Castle is something that a castle normally isn’t: it’s intact. The original medieval structure has been completely rebuilt such that the castle is now a glorious, eccentric Victorian high gothic confection with a blatant in-your-face fakery that really puts bona-fide medieval castles to shame.

All fine and dandy, but I particularly want to draw your attention to the clock tower. Partly because of William Burges’s signature use of Forest of Dean ashlar stone. Partly because of the rich over-the-top decor within. But mainly because the tower happens to be around 46 metres tall and therefore very handy for this blog.

You see, 46 metres (well, 46.8 metres to be precise) just happens to be the height attained if one were to stack up 180 boxes of A4 paper which, assuming five reams per box and 500 sheets per ream, would add up to 450,000 sheets of paper. That’s just about the number of sheets, give or take the odd ream, you’d need if you were to print out all the 3,095,693,983 A, C, G and T bases that make up the human genome (using 12 point Courier font, suitable margins and printing on both sides to be environmentally friendly).
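For anyone who wants to check the paper-stack sums, here is a minimal sketch in Python. The box, ream and base counts are the ones quoted above; the height per box and the number of characters per side are not stated anywhere in this post and are simply derived from those figures, so treat them as back-of-envelope values.

```python
# Figures quoted in the post.
BOXES = 180
REAMS_PER_BOX = 5
SHEETS_PER_REAM = 500
TOWER_HEIGHT_M = 46.8
GENOME_BASES = 3_095_693_983

# Total sheets in the stack.
sheets = BOXES * REAMS_PER_BOX * SHEETS_PER_REAM

# Derived, not quoted: the height each box must have for 180 to reach 46.8 m.
box_height_cm = TOWER_HEIGHT_M / BOXES * 100

# Derived, not quoted: bases per printed side, with double-sided printing.
bases_per_side = GENOME_BASES / (sheets * 2)

print(sheets)                 # 450000
print(round(box_height_cm))   # 26
print(round(bases_per_side))  # 3440
```

So each side of A4 would need to carry roughly 3,440 letters, and each box would need to stand about 26 cm tall.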

Now I can hazard a guess as to what you might be thinking: do Staples offer bulk discounts? If so, I suspect the phrase “don’t try this at home” is going to fall on deaf ears. However, you may also be thinking… why? Well, to my mind Cardiff Castle clock tower is as good a metaphor as any for the sheer magnitude of the human genome, especially if your architectural taste runs to the neo-Gothic.

Current maximum capacity of the Ion Proton sequencer, using the clock tower as the standard unit of measurement.

And once we have accepted the clock tower as a reasonable unit of measure for genomic data (go with me here), it can be used to illustrate the phenomenal outputs modern Next Generation Sequencing technologies are now capable of. For example, Wales Gene Park now has access to a new high-throughput sequencer, the Ion Proton, which is capable of generating up to 10 billion bases of sequence data within four hours. That’s roughly three and a third clock towers’ worth in the time it takes to watch Cecil B. DeMille’s The Ten Commandments (4 hours).
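The conversion from bases to clock towers is a single division, sketched below. The function name `clock_towers` is made up for illustration, and the run output is the "up to 10 billion bases" maximum quoted above, so real runs may well yield less. Note that "three and a third" comes from rounding the genome to 3 billion bases; with the full 3,095,693,983 figure the answer is a shade lower.

```python
GENOME_BASES = 3_095_693_983            # one clock tower, as defined in this post
ION_PROTON_RUN_BASES = 10_000_000_000   # quoted maximum output per run
RUN_HOURS = 4                           # quoted run time


def clock_towers(bases: int, genome_bases: int = GENOME_BASES) -> float:
    """Convert a number of sequenced bases into clock towers (hypothetical helper)."""
    return bases / genome_bases


towers_per_run = clock_towers(ION_PROTON_RUN_BASES)
towers_round_genome = clock_towers(ION_PROTON_RUN_BASES, 3_000_000_000)
bases_per_hour = ION_PROTON_RUN_BASES / RUN_HOURS

print(f"{towers_per_run:.2f}")       # 3.23
print(f"{towers_round_genome:.2f}")  # 3.33
print(f"{bases_per_hour:,.0f}")      # 2,500,000,000
```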

In theory that means three and a third human genomes can be sequenced by the time Charlton Heston has lugged both stone tablets down the mountain. Unfortunately, it’s not quite as simple as that as errors mean we’ll typically want to sequence a human genome around thirty times over to ensure sufficient accuracy. That’s thirty clock towers using my shamelessly laboured analogy.

Now, building thirty clock towers is quite an undertaking, and one that even the satisfyingly eccentric 3rd Marquess might have baulked at. However, generating thirty clock towers’ worth of data is far less daunting. Indeed, higher capacity sequencers, such as our SOLiD 5500 W sequencer, can generate a whopping ninety clock towers of data, equivalent to three human genomes at 30x average coverage, in the time it takes to watch “Modern Times Forever (Stora Enso Building, Helsinki)”, provided you schedule a few loo breaks and a sensible recovery period (the film apparently runs for 10 days).
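The coverage sums work the same way, and a rough sketch follows. The ninety clock towers, the 30x coverage and the ten-day running time are the figures from this post; the total base count and the per-day rate are derived from them, assuming output is spread evenly over the ten days, which is a simplification.

```python
GENOME_BASES = 3_095_693_983  # one clock tower
COVERAGE = 30                 # times each genome is sequenced over
SOLID_TOWERS = 90             # quoted output in clock towers
FILM_DAYS = 10                # quoted running time of the film

# Genomes at 30x coverage that ninety clock towers of data amount to.
genomes_at_30x = SOLID_TOWERS / COVERAGE

# Derived: the same output expressed in raw bases.
total_bases = SOLID_TOWERS * GENOME_BASES

# Derived: average clock towers per day, assuming an even spread.
towers_per_day = SOLID_TOWERS / FILM_DAYS

print(genomes_at_30x)        # 3.0
print(f"{total_bases:,}")    # 278,612,458,470
print(towers_per_day)        # 9.0
```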

So generating the data is (relatively) easy. Doing something about all those clock towers of information is another matter and is the single greatest bottleneck facing genomics research today. All in all a bit of a headache (though not as great as the headache of explaining to the local council why you’ve built 30 High Gothic clock towers without planning permission).

Consider, for example, a hypothetical medical research project in which we’re interested in finding the genetic mutation responsible for a particular disease. Let’s assume the simplest scenario: a single change to one of the 3 billion characters which make up the patient’s genome. Somewhere in those 180 boxes, leaning precariously against our clock tower, is a sheet that contains a single character change that’s responsible for the disease.

Even with modern computers, finding that single character change can be quite a task, especially when sequencing error demands that the genome is sequenced many times over. Researchers generally try to make things simpler and considerably cheaper for themselves by concentrating on the (protein-coding) genes alone, which, surprisingly, amount to only around 1.5% of the total genome. This is equivalent to concentrating on only 6,750 sheets of A4 paper, or say, the occasional clock tower brick.
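The 6,750 figure is simply 1.5% of the 450,000 sheets, as the sketch below shows. The number of boxes and the height of that smaller stack are not quoted in the post; they are derived here from the 46.8 metre, 180-box tower, so take them as rough.

```python
TOTAL_SHEETS = 450_000       # the whole clock tower
CODING_FRACTION = 0.015      # protein-coding share quoted in the post
SHEETS_PER_BOX = 5 * 500     # five reams of 500 sheets
TOWER_HEIGHT_M = 46.8
BOXES = 180

# Sheets needed for the protein-coding genes alone.
coding_sheets = round(TOTAL_SHEETS * CODING_FRACTION)

# Derived, not quoted: how many boxes that is, and how tall the stack would be.
coding_boxes = coding_sheets / SHEETS_PER_BOX
coding_height_m = coding_boxes * TOWER_HEIGHT_M / BOXES

print(coding_sheets)              # 6750
print(round(coding_boxes, 1))     # 2.7
print(round(coding_height_m, 2))  # 0.7
```

In other words, the protein-coding genes would fit in under three boxes, a stack that barely reaches knee height beside the tower.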

This sounds a sensible strategy especially since in classical genetics the protein-coding genes have been viewed as the basic units of inheritance and it’s within such regions that mutations have been thought to have the greatest impact. But as we continue to explore the human genome in ever more detail we are realising that this classical view is a woefully naive view of our genetics. In short, the remaining 98.5% of the genome is proving to be more and more important as each year passes. This means we are back to considering the whole genome once more – the complete clock tower – and in coming years this will increasingly become the principal focus of sequencing projects.

In 1875, ten years after it was commissioned, Cardiff Castle clock tower was completed. This year, one hundred and thirty-eight years on, we are celebrating ten years on from another sort of completion – that of the Human Genome Project. In those ten years we have moved from a gene-centric view of the genome to something far more subtle. And that subtlety is only set to increase in the next ten years. Cardiff Castle clock tower meanwhile will remain as subtle as a high Victorian Gothic monument can ever be (which is not very subtle at all).