Connecting the medical dots to create a holistic understanding of human biology.
Back in the summer of 2019, I met with Achim Plueckebaum, Peter Speyer and Pascal Bouquet, who were part of a team that just a few months earlier had launched the Novartis digital research and development platform data42.
Their main office, located in an old lab building on the Novartis Campus in Basel, was nearly empty and, to an outsider, looked a bit forlorn. But the long rows of desks, each equipped with large computer screens connected to the data centers of Novartis around the world, spoke of high ambition and big things to come.
In 2019, digital exploration was at its peak. New digital tools and the promise of artificial intelligence saw the pharma sector rush into a digital El Dorado. The hope was to transform conventional drug discovery and, with the help of powerful algorithms and superfast computers, shorten the painfully long and costly process of developing new drugs.
The general idea was that data42 would bring to life the massive yet virtually untapped clinical and scientific data of Novartis, encompassing nearly 2 million patient-years, and help the company find new scientific insights to develop drugs faster.
Despite this lofty vision, the data42 team knew that the task ahead would be fraught with major challenges, and that, over time, they would meet growing resistance if data42 did not live up to the hype raging around them.
Focus on the doable
They tried to keep a cool head, avoided overpromising and focused on what was doable. Even the name data42, a nod to Douglas Adams’ novel The Hitchhiker’s Guide to the Galaxy, in which a supercomputer produces the answer 42 without anyone knowing what the question actually was, signaled humility: it stressed that computing power can only be unlocked by asking the right human question.
The data42 team, which quickly grew to some 150 members, set itself short-term targets to produce tangible results within a manageable timeframe. This strategy was instrumental in moving the project forward, because the massive amount of data, which spanned thousands of clinical trials and millions of data points from years of research, would have been too heavy to lift in one go.
But even with a step-by-step approach, the task was gargantuan. Anyone who has tried to merge Excel spreadsheets that use different formats will have at least a rough idea of the painstaking work the team faced.
In the case of data42, millions of data points had to be cleaned. One task, for instance, was to find a common way of recording a patient’s gender across the thousands of clinical trials Novartis had conducted in the past.
Some trials marked a patient’s gender with an “f” or an “m”; others used “masculine” or “feminine”; still others used the terms “male” and “female.” To make the data machine-readable, these labels first had to be harmonized.
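To give a feel for what harmonizing such labels involves, here is a minimal Python sketch. It is not data42’s actual pipeline: the lookup table `GENDER_MAP`, the target codes “F” and “M” and the function name `harmonize_gender` are all hypothetical, chosen only to illustrate the idea of mapping many spellings onto one code.

```python
from typing import Optional

# Hypothetical lookup table: each raw label observed across trials is mapped
# to one harmonized code ("F" or "M"). The target codes are an assumption.
GENDER_MAP = {
    "f": "F",
    "female": "F",
    "feminine": "F",
    "m": "M",
    "male": "M",
    "masculine": "M",
}


def harmonize_gender(raw: Optional[str]) -> Optional[str]:
    """Map a raw gender label to "F" or "M"; return None if unrecognized."""
    if raw is None:
        return None
    # Normalize case and surrounding whitespace before the lookup,
    # so "Female", " f " and "FEMALE" all resolve to the same key.
    return GENDER_MAP.get(raw.strip().lower())


# Example: labels as they might appear in different (made-up) trial datasets.
raw_labels = ["f", "Masculine", " male ", "FEMALE", "unknown"]
print([harmonize_gender(label) for label in raw_labels])
# → ['F', 'M', 'M', 'F', None]
```

Returning `None` for an unrecognized label, rather than guessing, leaves such records visible so they can be flagged for manual review.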
The same problem arose elsewhere, be it a patient’s race, their medical history or the clinical data itself, which could vary widely depending on where the trial was conducted. In short, the team had to clean up the mess.
“All in all, we knew that the beginning would be very hard as the initial lift was to clean the data and then produce results as fast as we could in order to win the trust of our partners and prove that we can leverage our pool of medical, clinical and research data,” says Plueckebaum, recalling the early days of data42.