As the pharmaceuticals industry is going digital, digging into big data with artificial intelligence tools has become all the rage. To generate actionable results, however, Novartis bets as much on human inquisitiveness as on computer expertise.
Text by Goran Mijuk, illustration by Philip Buerli
Achim Plueckebaum is a down-to-earth guy. Even as he sits in the midst of what is probably the pharma industry’s hottest development in decades, the experienced computer scientist is keeping a cool head. The leader of the Novartis research and development platform data42 is a strong believer that data and digital technologies will change the path of medical science. But in spite of this view, he expects human inquisitiveness to remain instrumental in the race to leverage the power of data and digital in the hunt for new drugs.
“There is a hype surrounding data and digital that we need to be aware of,” Plueckebaum said when we met him in his office in Basel. He and his team are currently setting up the infrastructure – which also spans other locations in Cambridge, East Hanover and Hyderabad – to accelerate the development of the company’s digital research and development platform data42.
Their goal is to capture the hidden insights from the massive amount of research and clinical trial data of Novartis and tap into a powerful technological trend that has also attracted unconventional players such as Google and Amazon to the pharmaceuticals arena.
“The ability of artificial intelligence tools to perform better than humans in certain specified tasks has led to a misunderstanding of what can be done in the pharmaceuticals and medical industries. Many believe it is enough just to run a smart computer program through a pile of big data and find new drugs just like that.”
The problem, he says, is much harder. Besides cleaning up huge amounts of siloed and unstructured data, uploading them to a single, machine-readable platform and creating powerful algorithms that can analyze the data, scientists need to have precise questions in mind. Otherwise, Plueckebaum says, they will receive a nonsensical answer much like the characters in The Hitchhiker’s Guide to the Galaxy.
In Douglas Adams’ 1979 science fiction comedy, a supercomputer named Deep Thought calculates the answer to “the ultimate question of life, the universe and everything.” When the machine comes up with the number 42, everyone is dumbfounded, only to learn that the answer will make sense only once the question is known.
With data42, Novartis wants to avoid such nonsensical outcomes. “It is really important to keep in mind that, if you don’t understand the question really well, you probably won’t understand the answer,” Plueckebaum says. “That’s why we decided to call our venture data42. The number is a constant reminder for us that we want to understand the questions in order to receive specific answers that are actionable.”
If the right questions are asked, however, data and digital could prove transformational for the pharmaceutical industry, which is benefitting from recent digital breakthroughs in the realm of data storage and processing and, especially, artificial intelligence.
Although artificial intelligence has been around for decades, recent developments in the realm of deep learning have allowed data scientists to make surprising leaps. Unlike conventional machine learning algorithms, which typically depend on features chosen by humans, deep learning stacks algorithms in layers to create an “artificial neural network” that can learn the relevant patterns directly from the data and make intelligent decisions on its own.
One of the leaders in this emerging field, DeepMind, a unit of Alphabet, beat a human player in the highly complex Chinese board game Go more than three years ago and, using a similar artificial intelligence approach, has more recently outperformed biologists in predicting the shape of proteins.
Such breakthroughs have opened new research avenues and could help solve some of the biggest medical and operational challenges of our time. This includes putting a lid on soaring research and development costs as well as finding innovative ways to treat diseases for which there are only limited or no treatment options available today.
The challenges are indeed big. Bringing a new therapy to market requires investments of more than 2 billion US dollars today and takes more than 10 years on average. Only one out of ten molecules tested in the clinic reaches the market.
On the medical front, the needs are mounting too. While around 500 drugs were approved in the United States in the past decade, medical needs are as high as ever. Many chronic and age-related conditions such as Alzheimer’s, for example, remain difficult to treat and for most of the more than 7000 known rare diseases there are no innovative medical options.
“Of course, we don’t know yet what we are going to find when we are using this new data and digital technology,” says Pascal Bouquet, technology lead at data42. “But we firmly believe we will be able to find insights that are not possible today. We are convinced that we can find nuggets that we have not seen so far and that, in the long run, we can even completely design and discover new drugs based purely on data.”
These hopes have led traditional pharmaceutical players to muscle up their digital expertise and are also attracting new companies such as Google, IBM and Apple into the healthcare space in the hope of developing innovative therapies and disrupting conventional drug development models.
Venture capitalists alone poured more than 1 billion dollars into healthcare-oriented artificial intelligence start-ups in 2018, according to data provider PitchBook. And the market is likely to get hotter. Everest Group, a consultancy, expects overall healthcare investments in artificial intelligence technology to grow from 1.5 billion dollars in 2017 to more than 6 billion dollars by 2020.
Novartis believes it has an edge in this emerging field. “We have around 2 million patient-years of data in our system,” Bouquet says. “This is the crucial asset which will be instrumental going forward as we apply artificial intelligence tools to sift through the data and find hitherto unknown correlations between drugs and diseases.”
In order to make this vision come to life, all the clinical and research data – plus potentially real world data, imaging data and sensor data – first need to be structured and moved to a single platform to create a so-called data lake. This is easier said than done, because individual data sets often use different parameters to denote data points such as sex, age, family and disease conditions.
“All of those data need to be cleaned and curated to make them machine-learnable. This is hard and cumbersome work, but it frees up our data scientists to focus on answering questions with data,” says Peter Speyer, who leads data42’s product development.
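To give a flavor of what such curation involves, here is a minimal sketch in Python. It is not the data42 pipeline: the field names (`sex`, `gender`, `age_years`, `age_months`) and the code tables are hypothetical, chosen only to show how two studies that record the same facts differently can be mapped onto one shared format.

```python
from typing import Any

# Hypothetical code table: each study may encode sex differently.
# The numeric codes "1" and "2" are an assumption made for this example.
SEX_CODES = {
    "m": "male", "male": "male", "1": "male",
    "f": "female", "female": "female", "2": "female",
}


def harmonize_record(record: dict[str, Any]) -> dict[str, Any]:
    """Map one raw study record onto a shared, illustrative schema."""
    # Some studies call the field "sex", others "gender".
    raw_sex = str(record.get("sex", record.get("gender", ""))).strip().lower()
    sex = SEX_CODES.get(raw_sex, "unknown")

    # Age may be recorded in years or in months, depending on the study.
    if "age_years" in record:
        age_years = float(record["age_years"])
    elif "age_months" in record:
        age_years = float(record["age_months"]) / 12.0
    else:
        age_years = None

    return {"sex": sex, "age_years": age_years}


# Two made-up records describing patients of the same age in different formats.
study_a = {"sex": "M", "age_years": 54}
study_b = {"gender": 2, "age_months": 648}

harmonized = [harmonize_record(r) for r in (study_a, study_b)]
print(harmonized)
```

Once both records share the same fields and units, they can be analyzed side by side. Anything the code table does not recognize is marked as unknown rather than guessed.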
The data size is substantial. The research and development input alone consists of 20 petabytes of data – the equivalent of around 40000 years of music on an MP3 player.
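The comparison can be checked with a quick back-of-the-envelope calculation. The sketch below assumes decimal petabytes and an MP3 bitrate of 128 kilobits per second; neither assumption is stated in the article, and a higher bitrate would shorten the playing time accordingly.

```python
PETABYTE = 10**15                       # decimal petabyte, in bytes (assumption)
data_bytes = 20 * PETABYTE              # research and development data volume

MP3_BITRATE_BPS = 128_000               # assumed MP3 bitrate, in bits per second
bytes_per_second = MP3_BITRATE_BPS / 8  # bytes of audio consumed per second

seconds = data_bytes / bytes_per_second
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years = seconds / SECONDS_PER_YEAR

print(f"about {years:,.0f} years of continuous playback")
```

Under these assumptions the result lands just under 40000 years, in line with the figure quoted above.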
Achim Plueckebaum, Head of data42; Peter Speyer, Head of Products, data42; Pascal Bouquet, Technology Lead, data42.
The team, which involves more than 100 people across various Novartis units, has made great progress so far. They have ingested more than 2000 clinical studies onto the platform and have tested a dozen machine-learning models that could help find new information buried deep in the data.
In order to gain traction and build proof points, the data42 leadership has set itself short-term business-driven objectives which focus on very specific and precise tasks. One such project, which was started recently, aims to identify disease subtypes based on biological characteristics in the area of rheumatoid arthritis.
“For this project, we are working on cleaning the data of our existing trials in this disease domain, which is a task that can be done in a relatively short period of time,” says Peter Speyer. “Our goal is to identify subgroups of high responders to one of our treatments. If we find those, the franchise will potentially be able to set up a new trial and test the findings in the clinic.”
Among other current projects, the team is also looking at disease progression in certain cancer indications.
And more are yet to come, as the data42 team is working on fine-tuning the data and creating a huge data lake in which to dive for pieces of information that have escaped everyone’s attention so far.
“Once all the data is curated, the potential to generate new insights is likely to be enormous,” Speyer says. “So, whatever question you have, for example, on heart failure, wherever heart failure is captured as a disease of interest – as a comorbidity or as a side effect – we can pull this into analytics. That is the scalability of data42.”
If data42 lives up to its promise, it also has the potential to change how data scientists work together with scientists in the lab and in the clinic. “What you will see is increased collaboration between data scientists, who prepare the data, and medical scientists, who understand the question and what needs to be retrieved from the data,” Pascal Bouquet explains.
The new digital tools are not set to replace biologists, chemists or doctors, and they will be only as good as the input they receive. “Sometimes, when you craft a question really well, it turns out the solution is not as complex as you thought. You don’t need all this artificial intelligence for every question. For some questions you just need to go back to statistics. You find the right data. You apply the right method and you get the answers. Thinking through the question really helps accelerate and improve the insights – with or without artificial intelligence.”