Colliding galaxies: Alzheimer’s and big data


Big Data: Could It Ever Cure Alzheimer’s Disease? I was struck by this article by Masud Husain (closed access in Brain, so you cannot read the original), which is reprinted on Medscape.

The article nudged me to write a post because it reflects the challenges and opportunities created by two behemoths which, like galaxies, are slowly colliding. It addresses an area of growing interest, because organisations are waking up to the value of taking information from existing datasets, generating targeted data, and looking within it to drive insight, rather than establishing a hypothesis and finding data to support or refute it.

“From business to government, many have been seduced by the possibilities that Big Data seems to offer for a whole range of problems.”

Alzheimer’s disease (AD) has been accreting datasets and large-scale studies. Projects such as the Alzheimer’s Disease Neuroimaging Initiative (ADNI), with $200M invested so far, are sequencing several hundred full human genomes from patients with AD. At the same time, supported by the Cure Alzheimer’s Fund, our group at Harvard, at Massachusetts General Hospital and here in the UK at the Sheffield Institute for Translational Neuroscience (SITraN) is analysing a set of 1500 whole genome sequences from AD sufferers. Our group is amongst the first laboratories worldwide to have undertaken a study of this magnitude, and that we have done so outside the domains of the current academic sequencing centres is difficult for funding agencies and patients alike to comprehend. Given this relatively pioneering approach, it has taken a lot of time to develop the infrastructure and analytical capacity to address the magnitude of the study we have undertaken. We are learning the hard way that datasets of this size are at best unwieldy. The compute resources alone have taken a year to master and apply, yielding the variations in each genome that seem to be more frequent in patients who have the disease.
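At its simplest, that last step, flagging variants that are more frequent in patients, comes down to comparing allele frequencies between the case and control cohorts. A minimal sketch in Python, with invented variant names and counts purely for illustration (a real pipeline involves far more: quality filtering, population stratification, statistical correction):

```python
# Toy case/control allele-frequency comparison. All variant names and
# counts below are hypothetical, invented for this illustration.

def allele_freq(alt_count, total_alleles):
    """Frequency of the alternate allele within one cohort."""
    return alt_count / total_alleles

# Hypothetical counts per variant:
# (alt alleles in cases, total case alleles,
#  alt alleles in controls, total control alleles)
variants = {
    "chr19:example_risk_site": (900, 3000, 300, 3000),
    "chr1:example_neutral_site": (150, 3000, 148, 3000),
}

for name, (case_alt, case_n, ctrl_alt, ctrl_n) in variants.items():
    case_f = allele_freq(case_alt, case_n)
    ctrl_f = allele_freq(ctrl_alt, ctrl_n)
    # A variant is of interest if it is enriched in the patient cohort.
    print(f"{name}: cases={case_f:.3f} controls={ctrl_f:.3f} "
          f"enriched_in_cases={case_f > ctrl_f}")
```

Multiply this by millions of variant sites across thousands of genomes, and the need for serious compute infrastructure becomes obvious.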

In parallel, schizophrenia studies, like the Psychiatric Genomics Consortium (PGC), boast 123,000 samples from people with a diagnosis of schizophrenia, bipolar disorder, ADHD or autism, and 80,000 controls, collected by over 300 scientists from 80 institutions in 20 countries. Given the magnitude and complexity of these projects, it fast becomes clear that collaboration, data sharing and internal communication are powerful drivers of success, in contrast to the traditional drivers of innovation, insight and raw scientific discovery.

Other diseases are by no means on the sidelines. Although relatively rare, the debilitating and lethal neurodegenerative disorder amyotrophic lateral sclerosis is also on the “galactic plane”. Combining resources at a global scale, Project MinE has generated over 5000 full genome sequences, with a goal of completing another 10 000 within a year. Whole countries are sequencing their populations. Here in the UK the plan is to complete 100 000 genomes by 2017. In Qatar they will sequence 300 000; in the USA the plan is to complete 1 million people’s whole genomes. The tiniest nations are also on the plane: even the Faroe Islands plan to complete 50 000 subjects, and Iceland has just published the first 2000 whole genome sequences from its population. Although this sounds like a great deal, the means to adequately process and analyse these and other large-scale datasets are in their infancy. How can we analyse all this data? One way is the obvious route of training and education. We are part of a new national programme to establish graduate training in genomic medicine. Offered here at Sheffield, the MSc makes a solid step towards beginning to understand the use of genome data for health.

“The critical intersection of Genomics Big Data Medicine, delving into ‘bleeding edge’ technology & approaches that will deeply shape the future.”

Also here in the UK, we have been building groups across institutions so that we can collaborate to analyse and handle big health data. Later today I meet with representatives from across Sheffield who will become part of a “Health North” initiative that looks to combine de-identified, consented health and environmental data across cities so that we can ultimately act on new forms of health data analysis.

In my view, Eric Schadt currently leads the new field at the intersection of big data, genomics and medicine, at least in terms of vision. He has driven the development of multi-scale biological research projects that have captured thousands of genomes, clinical records, related datasets and drug profiles to launch a new form of highly networked big data medicine. The first really broadly accessible application of this will be the launch of a new app, built with Apple’s ResearchKit health ecosystem, that will help doctors interpret medical data on an iPhone. What data is that? Simply put, it’s your lifestyle: how many steps you take, how many stairs you climb, your blood pressure, your blood oxygen, when and where. Ultimately, combining that with genomics and other health data means that apps in the future could have the potential to truly and effectively predict when you, and you alone, are most likely to die. Schadt calls his adventure the ‘death app’, a name that is not likely to live long.