AlphaFold Accelerates Drug Discovery: Identifying a Potential Liver Cancer Drug

Published on 4/24/2025
Introduction
Drug discovery is a lengthy and complex process, and effective treatments are available for only a fraction of human diseases. Artificial intelligence (AI) is poised to revolutionize this field, offering the potential to accelerate the identification and development of new therapeutics. A significant breakthrough in this area was the development of AlphaFold, which predicts protein structures across the human proteome and so enables structure-based drug design for targets that lack an experimentally determined structure.
AI-Powered Drug Discovery for Liver Cancer
A recent study published in Chemical Science details the application of AlphaFold within Insilico Medicine's end-to-end AI-powered drug discovery platform, Pharma.AI, to identify a novel drug candidate for hepatocellular carcinoma (HCC), the most prevalent type of primary liver cancer. The Pharma.AI platform includes:
- PandaOmics: A biocomputational engine for target identification.
- Chemistry42: A generative chemistry platform for molecule design.
This study marks the first instance of using AlphaFold to identify a confirmed hit for a novel target in early drug discovery, in this case a target without a known crystal structure. The collaborative research was led by Alán Aspuru-Guzik, PhD, Michael Levitt, PhD, and Alex Zhavoronkov, PhD.
"We decided to go after a project where AI would be used to identify a target for a disease without an existing crystal, use AlphaFold to get the crystal, use another form of generative AI to generate the molecules for this crystal, and then synthesize and test the compounds," said Zhavoronkov. "And it worked!"
The Power of AI Integration
Michael Levitt emphasized the transformative potential of AI in drug discovery:
"This paper is further evidence of the capacity for AI to transform the drug discovery process with enhanced speed, efficiency, and accuracy. Bringing together the predictive power of AlphaFold and the target and drug design power of Insilico Medicine’s Pharma.AI platform, it’s possible to imagine that we’re on the cusp of a new era of AI-powered drug discovery."
Alán Aspuru-Guzik highlighted the synergistic effect of combining AI technologies:
"This paper demonstrates that for healthcare, AI developments are more than the sum of their parts. If one uses a generative model targeting an AI-derived protein, one can substantially expand the range of diseases that we can target. If one adds self-driving labs to the mix, we will be in uncharted territory."
Overcoming Computational Challenges
Bud Mishra, PhD, commented on the significance of this achievement in the context of computational biology:
"In 1969, Cyrus Levinthal despaired that with large number of degrees of freedom in an unfolded polypeptide chain, it will be intractable to sift through the molecule’s astronomical number of possible conformations as would be necessary in computational drug design. But imitating nature one can get around this paradox by focusing on ‘only selected easy instances of the hard problem’ as guided by evolution," said Mishra. "By using large molecular datasets and powerful computers, it has now become possible to engineer AIs like AlphaFold, AlphaFold DB, AlphaDesign, and RoseTTAFold, which have enabled Zhavoronkov et al. to recently design CDK20 inhibitors purely in silico. Their work marks a milestone in computational biology, which will inspire others in taming human suffering, diseases and aging!"
Identifying and Validating the Drug Candidate
The researchers utilized PandaOmics to pinpoint the HCC protein target and Chemistry42 to generate molecules based on the AlphaFold-predicted structure. They synthesized and tested seven molecules, leading to the identification of a small molecule hit compound for cyclin-dependent kinase 20 (CDK20) within a month. This process typically takes months or years using traditional methods.
A second AI cycle resulted in the identification of an even more potent compound (ISM042-2-048) with enhanced binding (Kd, 566.7 ± 256.2 nM) and inhibitory (IC50, 33.4 ± 22.6 nM) activity against CDK20.
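To put the reported values on more familiar scales, the sketch below converts the mean Kd into a standard binding free energy using the textbook relation ΔG° = RT ln(Kd / 1 M), and the mean IC50 into a pIC50. The temperature of 298.15 K and the function names are assumptions chosen for illustration; the article does not state the assay conditions, and the reported uncertainties are ignored here.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol from a dissociation constant.

    Uses dG = R * T * ln(Kd / 1 M). The default temperature is an assumption,
    not a value taken from the study.
    """
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)


def to_p_scale(concentration_molar: float) -> float:
    """Negative base-10 logarithm of a molar concentration (e.g. pIC50)."""
    return -math.log10(concentration_molar)


# Mean values reported for ISM042-2-048, converted from nM to M.
kd_molar = 566.7e-9
ic50_molar = 33.4e-9

delta_g = binding_free_energy(kd_molar)
pic50 = to_p_scale(ic50_molar)

print(f"dG    = {delta_g:.2f} kcal/mol")
print(f"pIC50 = {pic50:.2f}")
```

Under these assumptions the Kd corresponds to roughly -8.5 kcal/mol and the IC50 to a pIC50 of about 7.5. Because the reported standard deviations are large relative to the means, these figures are best read as order-of-magnitude summaries.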
The Role of CDK20 in HCC
Previous research has shown that CDK20 is often overexpressed in HCC tumor cell lines, promoting cell cycle progression through a positive feedback loop involving the androgen receptor (AR), CDK20, and β-catenin.
"Therefore, higher binding affinity or enzymatic inhibitory activity in a cell-free system will translate to better anti-proliferation effect in those HCC cell lines with relatively high CDK20 expression," said Zhavoronkov.
Functional assays confirmed that the newly identified molecule selectively inhibited the proliferation of the Huh7 HCC cell line, which expresses high levels of CDK20, compared to the HEK293 non-HCC cell line.
Insilico Medicine's AI-Driven Approach
Insilico Medicine focuses on leveraging AI to streamline every aspect of drug discovery and development, including:
- Target identification
- Novel molecule generation
- Biomarker development
- Treatment personalization
- Clinical trial data analysis
"We are combining these steps into a comprehensive pipeline, which provides a feedback loop that continually strengthens our pipeline," said Zhavoronkov.
Insilico’s Pharma.AI platform employs meta-learning, zero-shot generative reinforcement learning, and genetic algorithms to discover and design inhibitors for targets lacking structural data.
"It has experienced exponential increases in performance and quality over the past few years," said Zhavoronkov. "Our platform is built on years of modeling large biological, chemical, and textual datasets in order to discover new targets and design new compounds with desired properties without the use of large molecular libraries."
Key Components of Pharma.AI
The Pharma.AI platform is powered by two key engines:
- Chemistry42: A multi-agent reinforcement learning system with 42 generative algorithms that explore chemical space to generate potential drugs. It can optimize small molecule hits and lead compounds, even without a co-crystal structure.
- PandaOmics: Applies deep learning models to identify therapeutic targets associated with diseases by analyzing omics data from publications, clinical trials, and grant applications. It optimizes targets based on factors like novelty, confidence, and druggability.
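As a rough illustration of what ranking targets on several criteria can look like, the sketch below combines novelty, confidence, and druggability scores with a weighted sum. This is not PandaOmics' actual scoring method, which the article does not describe; the class, the function, the target names, the scores, and the weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetScore:
    """Hypothetical per-target scores, each assumed to lie in [0, 1]."""
    name: str
    novelty: float
    confidence: float
    druggability: float


def rank_targets(targets, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return targets sorted by a weighted sum of the three scores.

    `weights` is (novelty, confidence, druggability). A weighted sum is an
    illustrative choice only, not the platform's documented algorithm.
    """
    w_novelty, w_confidence, w_druggability = weights

    def combined(target):
        return (
            w_novelty * target.novelty
            + w_confidence * target.confidence
            + w_druggability * target.druggability
        )

    return sorted(targets, key=combined, reverse=True)


# Made-up candidates for demonstration; not data from the study.
candidates = [
    TargetScore("TARGET_A", novelty=0.9, confidence=0.6, druggability=0.7),
    TargetScore("TARGET_B", novelty=0.4, confidence=0.9, druggability=0.8),
    TargetScore("TARGET_C", novelty=0.2, confidence=0.5, druggability=0.3),
]

novelty_first = rank_targets(candidates, weights=(0.5, 0.3, 0.2))
confidence_first = rank_targets(candidates, weights=(0.2, 0.5, 0.3))

print([t.name for t in novelty_first])
print([t.name for t in confidence_first])
```

With these made-up numbers, weighting novelty most heavily puts TARGET_A first, while weighting confidence most heavily puts TARGET_B first. The point is that the ordering depends on how the criteria are traded off, which is why a program pursuing a novel target without a crystal structure might weight them differently from one seeking a well-validated target.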
"We’ve used PandaOmics to identify new targets for cancer, amyotrophic lateral sclerosis (ALS), and COVID-19. The novel target it discovered for idiopathic pulmonary fibrosis has been developed into a lead AI-designed novel drug candidate [INS018_055]," said Zhavoronkov.
INS018_055, similar to ISM042-2-048, is a protein kinase inhibitor and the first AI-designed drug to reach Phase I clinical trials.
"We have invested deeply into AI as a company and have accumulated a lot of data. We followed $2 trillion worth of research data and invested significant time and resources in making the data machine-readable so that it can be used in our AI platform."
Future Directions
While Insilico Medicine does not plan to advance ISM042-2-048 into clinical trials, the molecule is publicly available for further research.
"The purpose of the study was to serve as a proof-of-concept of what is now possible with AI—demonstrating that it is possible to use a predicted structure for a novel target and usable chemical data in just 30 days," said Zhavoronkov.
Insilico Medicine and the Acceleration Consortium are also developing self-driving laboratories, integrating AI, automation, and advanced computing to further accelerate drug and material discovery.