News from Britain V

Kona reminded me last night that I am well overdue, so here is a collection of
events from the last month or so.  I think Kona covered Christmas pretty well.
Her new year's eve summary was almost complete.  At the end of the evening, she
was adamant that she had drunk X and a 1/2 glasses of lager.  I was sceptical
that she could be that accurate (under the circumstances) until she reminded
me that she had practically *landed* on the last 1/2 glass thus thwarting
any further attempts to drink it.  She was fortunately not injured in the
sky-dive off the bench but the poor glass didn't fare so well :)

The year in review (late)
The Guardian had a lovely year in review.  As we had only arrived in September,
a lot of it was past history, but still nice to see what happens in a given
year here.  The Tories had managed 1-3 major scandals per month - you could
almost plot a graph as you went along.  Not minor things either - Ministers
getting other ministers to sign legislation from which they stood to make
very large financial gains, investigation over insurance fraud .. etc.  You 
might remember from an earlier letter the head of the Yorkshire water board
(the organization that managed to create a drought in West Yorkshire by
pumping the abundantly available water elsewhere) Terry Newton.  He was 
quoted as saying you could "wash in half a bowl of water" - he had been for 
months and no-one had noticed.  It turns out he was showering elsewhere at
his in-laws' place - tsk tsk :)

The weather
I could not omit the obligatory update on the weather here.  Overcast today,
as a result of some Siberian weather.  It's warming again though after a few
days where it was quite dangerous walking the Cambridge streets unless you had
shoes with very good traction.  Once again the British were caught with their
pants down in cold weather.  Parts of southern Wales lost electricity
supplies in the middle of a bad patch of weather when the power lines froze.  I
still can't quite work that one out.

In the middle of one frost patch a while back we had a second Autumn.  When it
gets very cold, the trees get white leaves once again in the form of the most
beautiful ice crystals - up to 5 cm long.  It really looks as though someone
has put icing all over them.  Generally it is incredibly dry here - though
cold.  Once you learn to carry enough clothes with you, it can be quite
pleasant.  The snow is nice, since once it falls, the place looks a lot nicer
and the reflected light makes things a lot brighter.  Every now and then I
get up, and seeing the glow around the curtains, suspect it might be a lovely
warm sunny day.  It's usually sub-zero and all snowed up !

A piece of email that went around at work one morning:
Subject: power supply

There was a power cut this morning lasting for about half an hour from
0645.  Unfortunately, the emergency generator failed to operate
because of frozen fuel.  Attempts are being made to thaw out the fuel
today, but until that's done we are vulnerable.

Will keep you informed.

John Sulston

Never a dull moment.

Work continues to be fine.  I wrote a large piece a while back about work for
my family which I shall tack on the end.  Don't feel obliged to read it at
all, but if you're curious I think it gives a slightly better picture of life
at the Sanger Centre.  I shall include just one piece of email that the
organization received a few weeks ago.  It made my day :)

Subject:  Sanger Centre name

To Whom It May Concern,
I have students doing a History project about Margaret Sanger, the pioneer in 
family planning.  We are wondering if your Sanger Centre was named after her, or
someone else.  If so, could you tell us how the decision to use her name was
made?  We anticipate your reply.  Thank you for your time.

David Sherman
Shorewood High School

The wedding anniversary
We have just celebrated our first wedding anniversary.  We planned to
go to London and see a musical and stay in a nice hotel.  Great plan -
but it was not well implemented.  We thought taking a car would be
cheaper and more flexible - Ha !  We left Cambridge at 2.15pm on a
Friday, and arrived at the musical (Miss Saigon) 2 minutes before the
start (8.45ish pm) - having spent 3 hours driving at 12 MPH.  A series
of disastrously wrong turns led us to drive through Piccadilly Circus
(nearly twice in fact).  Kona's Taxi Heuristic was born.  It
essentially states that when the taxi:car ratio gets over 4 to 1, you
are in trouble, as we were :) The musical was great however - the
singing was not superlative but it was quite enjoyable.  Kona remarked
that it was the first musical she had been at where she was not in the
pit with the orchestra.

Haggis and drunk Scotsmen
I recently endured my first Burns Night, complete with haggis,
bagpipes and whisky.  It was the Thursday dinner for Graduates (BA dinner)
and it was black tie and it was a freezing night.  Whoever said red wine should
be served at room temperature didn't have Cambridge in mind.  You need an
esky just to keep the wine from freezing.  The haggis was not, I must admit,
as bad as it sounded (or looked - they carried one in (wobbling) and
ceremonially carved it).  It was not that pleasant either - but mainly because
it was rather salty (probably not a good haggis knowing Pembroke College), and
the Scots traditionally have it with tatties and neeps - potatoes and
swedes - not an exciting combination - roll on Australian national cuisine.
The atmosphere was good - the toast to the lassies was crude (drunken) and 
unimaginative, and then the women's toast to the laddies tore the laddies to
shreds to riotous laughter - you had to be there.

Generally, otherwise things have been quiet.  Term has started so we have been
to dinner a lot.  We have been binging on movies lately.  We saw Sabrina
Friday, Institute Benjamenta Saturday (don't bother), the BBC's version of
Pride and Prejudice Sunday (half thereof at the college on video), and
The Brothers McMullen last night (Monday) - but we haven't succumbed and bought
a TV (and associated licence) yet !  Thanks for all the paper and electronic
mail - it's nice to hear what you have all been up to.  Enjoy summer (or
winter as appropriate).


Read on if you are looking for something to distract you from doing work
for another 10 minutes.

Life at the Sanger Centre
Hi,  as promised I am writing a little about work.  The place is
situated on quite a large piece of land known as "Hinxton Hall".  Ever
since I have been there, the actual hall has been shrouded in
scaffolding as they restore it.  It will eventually become the admin
centre for the site I think.  It would have been easier to rebuild it
from scratch in my opinion, but anyway.  We are in a rather run-down
old building that is also on the site - built 60'sish, it is really
ugly and looks like it was used for some kind of engineering company,
as there is a large factory floor with a big overhead crane.  This has
been converted into a large number of partitioned areas where most of
the human sequencing teams live.  We (informatics) are largely in the
corridor that runs along the top, so we look down on all the activity.
A number of portacabins currently extend accommodation for admin and
some of informatics.  The entire place is a maze and can be quite
disorienting until you get the hang of it.  There are about 280 people
in the Sanger Centre at the moment and it will grow to the low 300s -
it wasn't meant to grow this much before we moved into the new building.

That's the Sanger Centre - there are two other organizations sharing the site.
The EBI (European Bioinformatics Institute) has an "outpost" next to us - they
have the first new building that is almost finished (they are in it) - really
nice building, in the style in which ours is to be built (it's currently under
construction - more later).  The word "outpost" is very political, since the
head institute is still in Germany and there is a Pan-European wrangle as
to what they do and where - eg they have an outpost in Rome b/c the Italians
whinged that they weren't getting enough benefit and threatened to pull out.
They have about 50 people on site, but this is to grow quite considerably.
Their job is to study and maintain the biological data coming out of different
projects (including ours).

The third group is the Human Genome Mapping Project Resource Centre (HGMP-RC)
and they look after resources useful for mapping the Genome - eg DNA extracts
(I think - I have had very little to do with them).

The Sanger Centre has grown very rapidly: most people joined in the last
12 months or so, and very few remember right back to the start
a couple of years ago.  Hence the new building which cost something like
AUD $12 million.  It's actually quite attractive as new architecture goes,
with central courtyards, some very nice and spacious labs for the biologists
and an area for us that takes into account the needs of computers.  They will
knock down our current building - not a great loss, but it does rather feel
like home at the moment.  It is going to be rather easy to get lost in the
new building - we were taken for a tour by the contractors of our bit of the
building, still under construction and they lost quite a few people even on
the tour.  The car park under the building, which extends slightly outside the
bounds of the actual building, has room for 500 cars, to give you a rough idea
of how large it is.  Should be quite nice when we finally move in - the
move starts mid-year although the date keeps on moving.  We are one of the
first groups to move.  Can you imagine moving hundreds of computers, freezers,
sequencing machines, centrifuges, test tubes, stores etc - an incredibly
sensible time to take a holiday in my opinion.  We are planning champagne for
the actual demolition.

Buildings, contractors and people aside, the site is also really nice English
countryside.  Trees (an orchard), wildlife, a lake etc.  A stroll around the
grounds is quite a pleasant experience.

That pretty much explains the surrounds.  My section has a general 
responsibility for the management of information around the place and its
interpretation.  There are about 20 people I think currently in Informatics.
There is a small team that does system support (ie keeping the computers and
networks running).  There are people who do more development-style programming,
developing and writing practical software and maintaining database software.
Some people actually use the software acting as curators of the data - 
answering correspondence about the data we publish.  Once we finish 
assembling part of a DNA sequence, it goes out onto the world wide web the
same evening.  It is interesting to see the logs of the people who access
our site.  I think I told you about the recent fuss with the second
breast cancer gene that a large pharmaceutical company was hoping to find and
patent.  We sequenced the 900,000 letters of DNA in about 2 weeks, found the
gene and cooperated with a centre that identified the specific defects in
families that were causing the cancer.  The company (upon finding out that we
had done this), grabbed the data off our site very quickly.  Some other
pharmaceutical companies seem to get hold of the data without ever visiting
our ftp site - an inside job ? :)

Every few days there is a story in the paper about the finding of the gene
for "X" thought to be involved in "Y".  I find this quite amusing.  The two
guys in the office next to mine look through about 60,000 letters of the
DNA sequence for the nematode worm we are sequencing every day, and find
about 6 new genes a day, 3 of which will have no known relatives.  It's
quite exciting to be part of such a large project.

So what am I doing ?  I have done a few different things since I started
in October.  I tackled one problem when I started called "vector clipping".
When we sequence (find out the sequence of DNA letters for) a piece of DNA,
we insert the DNA into another organism and then read from a known point in
that organism's DNA (little did that virus know what kind of career it had
ahead of it).  This means that we always "read" 43 letters of the viral DNA
before we get to human DNA.  The letters end up something like ..GATCCCC -
now it would seem like a relatively easy job to recognize and remove these
43 letters, except that (as usual), there are lots of errors.  The start of
the read is often quite noisy - so we may not start reading till 20 letters
in, or even 41 letters in.  We may also see a 'C' instead of a 'T' etc, or
may see one too many 'C's.   The existing program was not particularly good,
often missing the viral (or vector) sequence.  I used a package that someone
else had developed for training things called hidden Markov models
(HMMs) to recognize the vector sequence.  They take into account the
chance of making various kinds of mistakes.  The idea then is to take a
very rough model of the kinds of mistakes that can happen (we know this
b/c when we finally overlap lots of reads to build up the final human
DNA sequence, the errors and vector sequence stand out).  Using the rough
model, I took a few thousand DNA reads and fished out the vector sequence.
This is then aligned to get a better idea of the exact chances of different
mistakes being made.  The process is then repeated with the new model etc.
It works a lot better than the existing technique.  I am still refining it,
but I hope to get it published when it has been tested a little better.
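
For the curious, here is a very rough Python sketch of the clipping step
itself.  It is not the HMM package mentioned above: fixed match, mismatch
and gap scores stand in for the probabilities an HMM would learn, and the
vector string, the function name clip_vector and the min_score threshold
are all made up for illustration.  The alignment lets the read start
part-way into the vector (the noisy start) and tolerates wrong or extra
letters.

```python
# Hypothetical vector tail, invented for this example (the real one is 43 letters).
VECTOR = "ACGTTGCAGGATCCCC"


def clip_vector(read, vector=VECTOR, match=1, mismatch=-1, gap=-1, min_score=8):
    """Return the index in `read` where the vector ends, or 0 if none is found.

    Aligns a suffix of `vector` against a prefix of `read`, allowing
    substitutions, missing letters and extra letters.
    """
    n = len(read)
    # Row 0: read letters seen before any vector letter are penalised.
    prev = [gap * j for j in range(n + 1)]
    for i in range(1, len(vector) + 1):
        # cur[0] = 0: skipping the start of the vector is free, because
        # the read may only begin part-way through it.
        cur = [0] * (n + 1)
        for j in range(1, n + 1):
            same = vector[i - 1] == read[j - 1]
            cur[j] = max(
                prev[j - 1] + (match if same else mismatch),  # letter read (maybe wrongly)
                prev[j] + gap,     # vector letter missing from the read
                cur[j - 1] + gap,  # extra letter in the read
            )
        prev = cur
    # The last row holds alignments that reach the end of the vector.
    best_j = max(range(n + 1), key=lambda j: prev[j])
    return best_j if prev[best_j] >= min_score else 0


# A read that starts 5 letters into the vector, followed by made-up insert DNA.
read = VECTOR[5:] + "TTAGGCATTA"
cut = clip_vector(read)
print(cut, read[cut:])  # 11 TTAGGCATTA
```

In the scheme described above, the vector pieces fished out this way would
then be re-aligned to get better estimates of the error chances, and the
whole thing repeated with the improved model.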

There was some other basic work fixing some software for displaying images
and then I got into my latest problem.  Pretty much all the data we gather
comes in the form of some kind of picture captured by a computer as an
image.  For example, we read the sequence of a piece of DNA by copying the
DNA from one end and stopping sort of systematically at each letter.  The
pieces of DNA are then forced to move through a piece of gel about the size
of an A4 sheet using an electrical current.  We can tell from how far they 
move how big they are.  The machines we use to do this handle up to 36 lanes
running side by side quite happily and their software can usually find out
where the lanes have run (they often drift and bend as they move along).  
Unfortunately we like to push things to their absolute limit and we 
routinely run 60 lanes on the one gel, so the ABI (people who make the machines)
software does a very poor job of tracking and the humans end up spending hours
realigning 60 lines against this image.  I have been working on an alternative
lane tracking and editing package.  It turns out that people at St Louis
(our collaborators in the States) have been doing likewise, so I have also
been testing and integrating their software into our lab practices.
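
Again for the curious, a toy Python sketch of what lane tracking involves.
This is not our package or the St Louis one, just an illustration under
simplifying assumptions: the image is a plain list of rows of intensities,
lanes show up as bright columns, and a lane drifts by at most max_drift
columns between one block of rows and the next.  All the function names and
parameters are invented for the example.

```python
def column_profile(rows):
    """Sum the intensities down each column of a block of image rows."""
    return [sum(col) for col in zip(*rows)]


def find_peaks(profile, min_height):
    """Return the indices of local maxima that are at least `min_height` tall."""
    return [
        x
        for x in range(1, len(profile) - 1)
        if profile[x] >= min_height
        and profile[x] > profile[x - 1]
        and profile[x] >= profile[x + 1]
    ]


def track_lanes(image, block=4, min_height=1, max_drift=2):
    """Follow each lane down the image, one block of rows at a time.

    Returns one list of column positions per lane found in the first block.
    """
    blocks = [image[r:r + block] for r in range(0, len(image), block)]
    lanes = [[x] for x in find_peaks(column_profile(blocks[0]), min_height)]
    for rows in blocks[1:]:
        peaks = find_peaks(column_profile(rows), min_height)
        for lane in lanes:
            last = lane[-1]
            near = [p for p in peaks if abs(p - last) <= max_drift]
            # If no peak is close enough, assume the lane has not moved.
            lane.append(min(near, key=lambda p: abs(p - last)) if near else last)
    return lanes


def make_row(width, centres):
    """Build one synthetic image row with bright pixels at `centres`."""
    return [5 if x in centres else 0 for x in range(width)]


# Two synthetic lanes that both drift one column to the right half-way down.
image = [make_row(12, {2, 8})] * 4 + [make_row(12, {3, 9})] * 4
print(track_lanes(image))  # [[2, 3], [8, 9]]
```

A real gel with 60 lanes squeezed together is much nastier than this, since
neighbouring peaks merge and lanes cross the max_drift limit, which is
where the human editing comes in.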

It's been very interesting getting into a new problem after being so
focused on one problem (my Ph.D) for so long.  I am rereading all sorts
of fascinating things that I haven't touched since 3rd year.  It's also
humiliating just how easily humans cope with a lot of this (it just bores
them to tears eventually), so I am gaining more and more respect for the
human visual system day by day.  

That's about it for now.  If you've read this far, you deserve a badge or
medal or something for devotion to duty :)