Artificial intelligence won’t transform pharma R&D, at least not anytime soon. But it is already impacting health care delivery and clinical trials.


  • Growing numbers of artificial intelligence-based companies reckon the technology can vastly improve the speed and efficacy of drug R&D. Big pharma’s tentative engagement with the nascent sector belies its urgent need for those solutions.

  • AI might help streamline some components of discovery, like rapid screening of virtual molecules, and providing a wider selection of viable hits, but it is not yet the answer to pharma's R&D productivity problem.

  • So what? Artificial intelligence’s impact will be felt as just one part of the wider data analytics revolution that pharma must embrace to remain competitive.

Artificial intelligence (AI) has for decades captured our collective imaginations as we consider both the opportunities – and threats – posed by “thinking” machines. Following huge recent advances in this much-hyped field, AI now features in our daily lives: as virtual assistants on our smartphones, within our Google searches and translations, and, perhaps in the near future, in our self-driving cars and home-help robots.

AI is a broad term encompassing various ways in which computers process and find patterns within complex information, and generate inferences or predictions from that data. AI has multiple, potentially transformative applications across almost all industry sectors and functions, including health care. That potential is attracting growing venture capitalist funding. Over $700 million was raised by health care and wellness-focused AI companies in 2016, according to CB Insights. (See Exhibit 1.)

AI promises to improve the entire health care value chain, from R&D to medication adherence and real-world evidence generation. AI’s most visible early impact has been in health care delivery. It is already helping spot patterns in complex radiological images to inform diagnoses, and enabling faster and more reliable detection of stroke in emergency departments, for instance. In the future, it may enable remote, robotic surgery, fully personalized medicine regimens based on multiple kinds of data, tailored disease management support and prevention strategies, healthy living companions, and more.

Exhibit 1: AI In Health Care And Wellness, Funding 2012-2016 (Source: CB Insights)

AI may also transform the efficiency and cost of drug R&D, according to a growing cohort of AI-based biopharma firms. The need is urgent: drug development costs and timelines haven’t shifted in decades. It still takes over $2.5 billion and over 10 years to bring a new therapy to market, according to the Tufts Center for the Study of Drug Development at Tufts University. Yet average peak sales are falling, mostly due to pricing pressure. The result: an unsustainable decline in the return on investment in R&D, according to consulting firm Deloitte.

AI’s proponents – from start-ups to information technology giants like International Business Machines (IBM) Corp. – say the technology can help find more promising drug candidates, faster. AI machines can apply their massive processing power and sophisticated algorithms to sift through and generate insights and predictions from vast quantities of genomic, molecular, clinical and other kinds of data – far more data than human scientists could read or make sense of in a lifetime. “We need machine learning because there’s no other way to untangle what genetic mutations might be responsible for disease,” notes Brendan Frey, PhD, co-founder of Deep Genomics, in a 2016 perspective.

Companies are applying their AI machinery to drug R&D in various ways: some are using it to design de novo drug candidates, others to re-purpose existing drugs, to develop better diagnostics or biomarkers, and more. The prospect of both accelerating R&D and increasing success rates is enticing some in big pharma to dip their toes into AI, even though the prospect remains unrealized.

R&D is a highly data-centered activity in which information volumes, and sources, are increasing exponentially. It’s therefore likely that AI has a role; the question is whether that role is as significant – and imminent – as the excitement around AI suggests. “I’m skeptical of the idea of AI coming in to transform R&D. It’s an important contributor but not the important one,” opines Bruce Booth, DPhil, LifeSciVC blogger and partner at Atlas Venture, which has, so far, been watching from the sidelines. AI certainly isn’t about to replace the most expensive, time-consuming component of R&D: clinical trials. But it may improve their odds of success.

Mapping Molecular Pathways

In December 2016, Pfizer Inc. tied up with IBM Watson Health to help uncover new immuno-oncology drugs. (Also see "Watson Health And Pfizer Partner To Harness Big Data For Drug Discovery" - Scrip, 1 Dec, 2016.) The multi-year alliance didn’t come out of the blue: it followed a test period wherein Pfizer used supercomputer Watson to retrospectively analyze the multi-year history of immuno-oncology discoveries. The computer correctly ranked the majority of these discoveries, and in some cases could have predicted them, based on existing data, “years prior to the first publication” of that discovery, according to a Pfizer spokeswoman.

Supercomputers such as Watson are programmed to find meaningful associations or draw inferences from across complex, often disparate data sets – gene sequences, chemical libraries, clinical trial databases, and scientific papers. They use rules that vary in their complexity, from basic “if-then” logic, to techniques that allow the system to improve its performance as it’s exposed to higher volumes of input data – so-called machine learning. Deep learning is an even more complex subset of machine learning, involving multi-layered “neural networks” (based loosely on the arrangement of neurons in the neocortex) that learn through exposure to example input data, and the desired outcomes. These machines essentially program themselves, rather than being programmed by computer scientists. They represent the most powerful AI systems, and are being used to tackle dense data like speech, images – or genomes.
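
The difference between fixed “if-then” logic and a system that improves with more data can be illustrated with a deliberately tiny Python sketch. It is not how Watson or any deep learning system works: the assay scores, the labels and the function names `fixed_rule` and `learn_threshold` are all hypothetical, and the “learning” here is just choosing the cut-off that makes the fewest mistakes on labelled examples.

```python
# A fixed "if-then" rule: a person picks the threshold and it never changes.
def fixed_rule(score, threshold=0.5):
    return score >= threshold

# A minimal "learning" step: choose the threshold that makes the fewest
# mistakes on labelled examples, so new examples can change the answer.
def learn_threshold(examples):
    """examples: list of (score, is_active) pairs."""
    best_threshold, fewest_errors = None, len(examples) + 1
    for candidate, _ in sorted(examples):
        errors = sum((score >= candidate) != label for score, label in examples)
        if errors < fewest_errors:
            best_threshold, fewest_errors = candidate, errors
    return best_threshold

# Hypothetical assay scores and outcomes, invented for illustration.
training = [(0.2, False), (0.35, False), (0.6, False), (0.7, True), (0.9, True)]
learned = learn_threshold(training)

print("learned threshold:", learned)
print("fixed rule on 0.6:", fixed_rule(0.6))
print("learned rule on 0.6:", fixed_rule(0.6, learned))
```

With these invented examples the hand-set rule misclassifies the compound scoring 0.6, while the threshold fitted to the data does not. Neural networks apply the same idea to millions of adjustable parameters rather than one.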

Many AI platforms process natural language in written texts, fishing out key words and seeking, for instance, how often the names of particular genes or proteins are mentioned in close proximity, and clues as to the nature and strength of their relationship.
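
The proximity counting described above can be sketched in a few lines of Python. This is an illustrative toy, not a description of any vendor’s platform: the gene list, the choice of a sentence as the “proximity” window, the sample text and the function name `cooccurrence_counts` are all assumptions made for the example.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical vocabulary of gene/protein names. Real systems rely on curated
# dictionaries that also map synonyms to a single identifier.
GENES = {"TP53", "MDM2", "EGFR", "KRAS"}

def cooccurrence_counts(text, vocabulary=GENES):
    """Count how often two names from `vocabulary` share a sentence."""
    counts = Counter()
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
        found = sorted(tokens & vocabulary)
        for pair in combinations(found, 2):
            counts[pair] += 1
    return counts

# Invented sample text standing in for a corpus of abstracts.
abstracts = (
    "MDM2 binds TP53 and promotes its degradation. "
    "Loss of TP53 is common in tumours with KRAS mutations. "
    "MDM2 inhibitors can restore TP53 activity."
)

counts = cooccurrence_counts(abstracts)
for pair, n in counts.most_common():
    print(pair, n)
```

Raw counts like these say nothing about the nature of a relationship (activation, inhibition, mere mention), which is why production systems layer further language analysis and human review on top.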

Applying these processes to huge volumes of material allows scientists to create dynamic maps highlighting how key molecules (genes, proteins or other) interact and their potential role in particular diseases. “It’s like an airline map,” explains Niven Narain, co-founder, President and CEO of BERG LLC, with a series of molecular “hubs” connected, with varying strength, to one another. Those maps may help uncover new drug targets or alternative indications (by highlighting larger hubs or stronger connections), and/or validate and elaborate upon the importance of existing ones.
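
The “airline map” analogy corresponds to a weighted network, and one simple way to find its larger hubs is to add up the strength of every connection touching each molecule. The Python sketch below assumes that approach; the molecule names, the weights and the helper names `hub_strengths` and `rank_hubs` are invented for illustration and do not come from BERG.

```python
from collections import defaultdict

# Hypothetical association strengths between molecules (for example, derived
# from literature co-mentions). Names and weights are invented.
edges = [
    ("TP53", "MDM2", 0.9),
    ("TP53", "KRAS", 0.4),
    ("TP53", "EGFR", 0.3),
    ("EGFR", "KRAS", 0.7),
]

def hub_strengths(edge_list):
    """Sum the weights of all connections touching each molecule."""
    strength = defaultdict(float)
    for a, b, weight in edge_list:
        strength[a] += weight
        strength[b] += weight
    return dict(strength)

def rank_hubs(edge_list):
    """Return molecules ordered from most to least strongly connected."""
    strength = hub_strengths(edge_list)
    return sorted(strength, key=strength.get, reverse=True)

ranking = rank_hubs(edges)
print(ranking)
```

A highly ranked hub is only a candidate for follow-up; whether it is a useful drug target still has to be tested experimentally.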

Scientists must sanity-check the machine’s output. They also select the input data to train the machine. This may be quite specific: for example, telling it that a single gene or receptor may be referred to in texts in more than one way. As such, AI doesn’t replace scientists; rather, it “augments their work,” elaborates Jackie Hunter, PhD, CEO of London-based BenevolentAI, whose Judgement Augmented Cognition System (JACS) is named accordingly.

Hunter claims that JACS has already increased scientists’ productivity. “We have generated 36 new hypotheses and validated 24 of them in vitro in less than a year,” she says. That is about four times what traditional biopharma R&D would produce with the same personnel, according to Hunter, who spent two decades in R&D at GlaxoSmithKline PLC.


Better Uses For Existing Drugs

Whether or not these rapidly validated targets translate into more robust leads and faster clinical trials is far from proven. AI doesn’t remove the need for clinical trials or laboratory testing; instead, it purports to help increase the chances that those studies and tests succeed. “Our aim is to make fewer, better molecules whose properties we’ll be better able to predict” before they even go into animals, says Hunter.


BenevolentAI, which has raised $87 million since inception in 2013, hopes to validate its platform without waiting for an AI-discovered lead to get through trials. It’s using it to uncover promising new indications for existing, mid-stage pipeline assets that have been de-prioritized – and which come with plenty of existing safety and efficacy data. The company in December 2015 licensed full rights to two compounds from Janssen Pharmaceutica NV (part of Johnson & Johnson), around which it’s building an internal pipeline spanning inflammatory, neurodegenerative and orphan diseases. Its lead candidate is due to enter Phase IIb trials in 2017. J&J will receive royalties and certain milestones if the project moves into Phase III.

California-based NuMedii Inc. is also using its AI platform, developed at Stanford University, for re-purposing existing molecules. It has several from Astellas Pharma Inc. under a January 2016 discovery collaboration. NuMedii is also providing Allergan PLC with psoriasis-focused leads that can be developed via the expedited 505(b)(2) regulatory pathway. This is typically used for re-formulations or new dosage forms of existing drugs, and allows sponsors to rely on others’ clinical data.

Back To Biology

Applying AI to existing databases and drug candidates makes sense, both to prove the technology and fully exploit prior R&D investments. But the biases and limitations of this pre-selected, and likely incomplete, input data may restrict the accuracy and predictive power of the AI machinery.


Massachusetts-based BERG is trying to wipe away all prior assumptions through a “back to biology” approach. It’s rejecting the existing hypothesis-driven and increasingly unproductive drug discovery process altogether, and starting from scratch – including where input data are concerned.


Rather than buying into existing databases, BERG has built what it claims is one of the world’s largest tissue biobanks, containing over 100,000 clinically annotated samples from both diseased and healthy individuals. “We start with a patient’s biology,” asserts CEO Niven Narain. The approach is hypothesis-agnostic, and data-driven, he continues. All the data are deconstructed down to the very basics, stripped of any assumptions, including around parameterization, for instance (choosing which variables to look at). (See sidebar, "Berg's Back To Biology Approach.")

Narain claims that this biology-first approach can cut the cost, and time, of drug development in half. “We’re getting from discovery to a drug candidate in at least half the current three- to five-year time frame,” he says. BERG says it can similarly cut the five to seven years it currently takes to reach proof-of-principle trials – notably by including companion diagnostics and a highly personalized approach to medicines development.


Personalized medicine – therapies tailored to address specific genetic mutations among certain patient sub-groups – is already impacting R&D. Pre-selecting patients based on the genetic profile of their tumor in theory should make it possible to prove efficacy or inefficacy in smaller, shorter trials. Trials of drugs developed with a companion diagnostic, helping identify responder patients, are 30% more likely to succeed, according to a 2016 study published by the Biotechnology Industry Organization, Informa's Biomedtracker and Amplion Inc. (Also see "One Size No Longer Fits All: The Personalized Medicine Trial Landscape" - In Vivo, 20 Mar, 2017.) In practice, finding the right patients is hard. But machine learning tools are being used to help match tumor mutation profiles to the most appropriate therapies, by data-focused groups such as GNS Healthcare Inc.

BERG hopes to push personalized medicine even further, to capture not just the identity but also the evolution of disease. “We track individual patients’ changing molecular profiles over time during Phase I trials,” explains Narain. This paints a fuller picture of a drug’s dynamic effects and may uncover iterative biomarkers that could be used in later trials.

Narain says BERG is in discussions with several pharma firms. But, funded almost exclusively by Silicon Valley property billionaire Carl Berg, it can afford to take its time deciding how to partner, and who with. Nor need it limit the scope of applications for its AI platform: the company is working with Becton Dickinson & Co. on a medication adherence algorithm and has ambitions to impact the entire health care system, from drug adherence to postmarketing surveillance.

BERG’s lead candidate is in Phase I/II testing for a type of brain cancer. AI didn’t discover the molecule – it’s a formulation of a coenzyme, ubidecarenone or CoQ10, whose role in cellular energy production is already well known, and which is sold as a nutritional supplement. But AI helped validate the compound’s potential to treat cancer by interfering with cancer cell metabolism.

Can Deep Learning Make Sense Of Messy Biology?

The fundamental challenge for drug R&D is that we don’t yet understand enough biology. We can rapidly sequence the genome, edit DNA and even synthesize it. (Also see "Synthetic Biology And The Computerization Of Drug Development" - In Vivo, 11 Oct, 2016.) But we can’t “read” the genome. We don’t know precisely how it translates into disease (or healthy) phenotypes. The multiple intermediary molecular steps – transcription, splicing, translation, post-translational modifications, etc. – have proven too complex and dynamic for us to grasp.


AI might be able to help. But this brings us to a tricky paradox when applying AI to areas like biology or astrophysics, where our knowledge is incomplete, and we don’t know where the gaps are or how large they are. We need AI to help elucidate some of these secrets, yet do so without being limited by our imperfect input, or indeed our expectations. Katie Bouman, a researcher at the Massachusetts Institute of Technology who is using AI in astrophysics to build images of black holes, puts it thus: “We want to … leave the option open for there being a giant elephant at the center of our galaxy.”


AI skeptics see this fundamental knowledge gap as a limitation of the technology. Its proponents, though, claim AI can help narrow, if not bridge, the genotype-phenotype divide. Deep Genomics’ Brendan Frey says deep learning can help identify those intermediate steps, ultimately helping determine which mutations are pathogenic. Deep Genomics’ platform is trained to predict molecular phenotypes from DNA by spotting patterns in, and drawing inferences from, massive data sets. It’s classifying, prioritizing and trying to interpret genetic variants, with an initial focus on developing better diagnostics.

More accurately mapping genetic mutations to disease will be very useful indeed. But it’s not the same as finding a drug to treat or prevent the disease. The AI start-up Atomwise, backed by Khosla Ventures, is trying to design more effective drugs for multiple sclerosis and other conditions. It uses its deep learning platform to work out how molecules bind to one another. It claims its platform can predict binding affinity more accurately and usefully than other modeling techniques, by including structural information about the target, not just the ligand. This may open the gate to designing more effective molecules that bind in multiple locations – locations that might not have been identified without the additional insight into the target. Atomwise has used its structure-based bioactivity predictions in a confidential research collaboration with Merck & Co. Inc.


InSilico Medicine Inc., focused on diseases of aging, is similarly using deep learning to comb through “omics” and other data to identify drug targets and biomarkers. It uses gene expression analysis algorithms to score drugs based on their ability to activate or inhibit certain pathways. Human Longevity Inc. is using machine learning to help uncover the secret to a longer, healthier life – by interrogating genomic and phenotypic data.
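
To make the idea of scoring a drug by its effect on a pathway concrete, here is a minimal Python sketch. It is not InSilico Medicine’s algorithm, which the article does not describe; it assumes one simple scheme, averaging the change in expression of a pathway’s genes after treatment. The pathway membership, the drug names, the expression values and the function name `pathway_score` are all invented.

```python
from statistics import mean

# Hypothetical pathway definition: which genes belong to the pathway.
PATHWAY_GENES = {"mTOR_signalling": ["MTOR", "RPS6KB1", "EIF4EBP1"]}

# Hypothetical drug-induced expression changes (log2 fold change versus
# untreated cells). All values are invented for illustration.
drug_effects = {
    "drug_A": {"MTOR": -1.2, "RPS6KB1": -0.8, "EIF4EBP1": -1.0, "ACTB": 0.0},
    "drug_B": {"MTOR": 0.5, "RPS6KB1": 0.3, "EIF4EBP1": 0.1, "ACTB": 0.1},
}

def pathway_score(log2_fold_changes, genes):
    """Mean log2 fold change across the pathway genes present in the profile.

    Negative values suggest net inhibition, positive values net activation.
    """
    values = [log2_fold_changes[g] for g in genes if g in log2_fold_changes]
    return mean(values) if values else 0.0

scores = {
    drug: pathway_score(profile, PATHWAY_GENES["mTOR_signalling"])
    for drug, profile in drug_effects.items()
}

# List drugs from strongest inhibitor to strongest activator.
for drug in sorted(scores, key=scores.get):
    print(drug, round(scores[drug], 2))
```

A plain average ignores each gene’s role within the pathway (an activator and a repressor count the same), so real scoring methods are considerably more involved.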


This new generation of health-focused AI players is helping place some pieces of the biochemical puzzle – probably more than humans could do unaided. But they’re not claiming they can complete or fully understand it. “Biology is a very messy science. You cannot make it clean,” acknowledges Narain. “You have to engage in biology’s complexity.”

Using Existing Data, Better

For all the media attention – and money – pouring into AI, it’s not (yet) the answer to pharma’s R&D productivity problem. It might help streamline some components of discovery, like rapid screening of virtual molecules, and providing a wider selection of viable hits. “These tools are great in target applications, and for helping generate interesting lead candidates. But that’s just the very first part of the journey to human testing,” cautions Atlas Venture’s Booth. “There are some huge benefits to computational modeling. But you still need to do real experiments around these things. Empiricism and serendipity will still be a big part of that.” (Also see "In Silico Drug Design: Finally Ready For Prime Time?" - In Vivo, 20 Jun, 2016.)


Fortunately, there are plenty more immediate productivity benefits to be had from better, more organized use of good old-fashioned empirical data. Big pharma is still in the throes of a digital and data analytics revolution that’s changing the kinds of products it sells – but also presenting a dizzying array of tools to quickly and easily aggregate, view and use existing data. These provide opportunities to streamline R&D without having to grapple with deep convolutional neural networks, or trust their predictions. Indeed, without the appropriate underlying data network, it’s hard to see how AI can do its best and fastest thinking anyway.

Many of the opportunities lie around digitizing existing trial- and trial-associated data, leading to faster, less error-prone and thus cheaper clinical studies. Novartis AG, for example, is working with data analytics group Quantum Black to mine vast quantities of internal data on trial site performance to predict enrollment rates, data quality and cost across multiple trial sites. It can then use this information to select which sites are most likely to deliver good data, fast – Quantum Black says the work has already shaved 15% off trial times and 11% off annual costs for the big pharma. The Swiss group is also hoping to launch real-time trial enrollment tracking and electronic adverse reaction data capture across all trials by the middle of 2017. All this is part of a far broader digital initiative across the development organization, which CMO Vas Narasimhan, MD, hopes could eventually shave as much as 30% off costs and time scales.


Otsuka Pharmaceutical Co. Ltd. is working with North Carolina-based Clinical Ink on digital trial data capture, hoping to roll e-data capture out across all Phase II/III trials from 2017. The idea is to reduce error rates due to data transcription, but also to allow trial enrollment and progress to be closely tracked, and issues to be addressed before they cause delays. An e-trial platform could save up to 30% of trial monitoring costs, according to Otsuka’s Margaretta Nyilas, MD, head of clinical and business operations. These costs typically account for 40% of overall trial costs.
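
Taken together, the two percentages above imply an upper bound on the saving as a share of the whole trial budget. The short Python calculation below works it through; the two shares come from the paragraph above, while the $50 million trial budget is a hypothetical figure used only to show the scale.

```python
# Figures quoted in the article.
monitoring_share_of_trial_cost = 0.40  # monitoring as a share of overall trial costs
max_monitoring_saving = 0.30           # maximum saving on monitoring costs

# Saving expressed as a share of overall trial costs.
max_overall_saving = monitoring_share_of_trial_cost * max_monitoring_saving
print(f"Up to {max_overall_saving:.0%} of overall trial costs")

# Illustrative only: a hypothetical $50 million trial budget.
trial_budget = 50_000_000
print(f"Up to ${max_overall_saving * trial_budget:,.0f} on a ${trial_budget:,} trial")
```

In other words, a 30% cut in monitoring costs translates into at most about 12% of a trial’s total cost.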


Trial recruitment can also be expensive and time-consuming. Sanofi is exploring “site-less” clinical trials with Science 37 that could help on both counts, by allowing patients to participate via a local center rather than travelling long distances.

In the discovery sphere, lab automation has helped accelerate many tasks. And computers’ sheer processing power – brawn, rather than brain – can be used to screen molecules, empirically, on a more massive scale than has hitherto been possible. DNA-encoded libraries, for instance, enable billions of compounds to be tested as to whether they bind particular targets. Molecule selection can be based on a much wider set of characteristics than is possible to track without powerful computers.

Pharma’s Journey Toward AI

Effectively exploiting data and data analytics tools, including full-blown AI, requires pharma to break down existing divisional silos, find and embrace new kinds of expertise, and overcome cultural barriers. None of that is easy. Biologists might not appreciate the value of computer scientists’ input, for example, let alone answers provided by machines. “You need local teams’ buy-in” to use new kinds of tools, confirms Novartis’ Narasimhan.

Novartis reports some positive early results from its use of AI, across early discovery, screening, molecular docking, pathology and patient selection. It’s ramping up its efforts, even though it’s unclear whether AI-based tools are leading to reduced attrition and faster lead time. “You have to do it [AI] to really know” how and whether it’s helping accelerate drug discovery, Narasimhan admits. “But given the amount of publications, of knowledge, of data out there, the principle [of using AI in R&D] makes a lot of sense.”

AI isn’t the answer to pharma’s R&D productivity problem, at least not now. For one thing, it’s unproven: no AI-generated, novel drug candidate has reached clinical trials, let alone completed them. Human trials will remain the most expensive component of R&D for the foreseeable future (though research is ongoing into in silico trials, involving computer-simulated, “virtual” patients).

Second, AI tool design and application is constrained by our limited understanding of biology. Granted, it may ultimately help predict the consequences of some aspects of that biology, but even then it probably won’t be able to explain how.

Still, better exploitation of data is “something we just have to do,” Narasimhan continues. “The world is becoming more data-driven. It is about maximizing output for minimum input. That’s happening across all innovation sectors.”

Indeed, pharma is behind other industries in its digital and data transformation. Tech giants such as Google, Apple or IBM are parking their tanks on health care’s lawn. For biopharma to retain its edge in the business of discovering, developing and selling drugs, it has little choice but to explore where and how data-based technologies and tools – including AI – can enhance R&D.
