Making Sense of All That Data
This is the second part of a two-part series on data science, adapted from a feature by the Northwestern University News Center. To learn how data science is informing research in cancer, cardiovascular and critical pediatric care, read part one.
The enormous volume of data generated in genomic data mining requires a tremendous amount of computing power, far more than the average desktop computer can handle.
Elizabeth McNally, MD, PhD, who recently took the helm of the Center for Genetic Medicine, is well acquainted with the challenges of analyzing mind-boggling amounts of data. She is leading a National Institutes of Health-sponsored project to examine the whole-genome sequences of 300 individuals with cardiomyopathy, a common cause of heart failure. As a clinician, McNally leads the Program in Cardiovascular Genetics at the Northwestern Medicine Bluhm Cardiovascular Institute. McNally, also the Elizabeth J. Ward Professor of Genetic Medicine, works with a team of physicians and genetic counselors who routinely use genetic testing in patients and families with inherited cardiovascular and neuromuscular diseases to identify the gene variants contributing to their diseases. This information helps establish a diagnosis and guides the choice of therapy.
“Each genome is composed of 3 billion base pairs, the building units of the genome, and each of us have four or five million differences between us,” McNally said. “We are trying to figure out which one or two of these variants causes disease in that individual. It’s like looking for a needle in a pile of needles. It would be impossible without powerful computers.”
“If I tried to analyze 250 genomes and the millions of differences in genes on my desktop computer, it would take 50 years to do it,” McNally said.
She relies on the supercomputer at Argonne National Laboratory for the computing power to analyze her patients' genome sequences in just a few days. Now she is looking forward to working with Quest, the high-performance computing cluster at Northwestern University.
Data Science and Electronic Medical Records
Data science also drives NIH's Electronic Medical Records and Genomics (eMERGE) project. At Northwestern, the project is led by Rex Chisholm, PhD, Adam and Richard T. Lind Professor of Medical Genetics and vice dean for scientific affairs and graduate education, and Maureen Smith, MS, assistant professor of Medicine in the Department of Cardiology and clinical director of NUgene.
Chisholm, Smith and their team have analyzed clinical data and genome sequences from individuals in the NUgene biobank to learn which genetic abnormalities cause certain diseases and to determine which drug is most effective for a given individual, depending on that person's gene variants. Now eMERGE is building and testing a computerized decision-support system that integrates genetic variant data into electronic health records to help doctors prescribe the right drug.
What's Next for Data Science?
Scientists are racing to keep up, refining statistical methods and developing ever more sophisticated techniques to extract true meaning from the data.
“The analytical skills and the methodological skills are being invented every day, because there are new problems and challenges,” said Donald Lloyd-Jones, MD, chair of preventive medicine and director of the Northwestern University Clinical and Translational Sciences Institute. “This is a massive amount of data and we need to understand what is random variation and what is important variation that can lead to disease.”