AI drug discovery has a ceiling. It isn't compute. It isn't model architecture. It isn't even data volume. The ceiling is the pathway layer — the structured, machine-readable, cell-type-specific representation of how biology actually works. And we haven't built it.

Today's AI models predict molecule properties beautifully. They predict pathway behavior poorly. The reason isn't a research gap — it's an infrastructure gap. The data those models would need either doesn't exist, doesn't connect, or sits in fifteen incompatible formats.

If we want AI drug discovery to deliver on its promise, the field has to invest in the pathway layer the way the genomics community invested in sequencing infrastructure twenty years ago.

Here's what that looks like.

1. Build pathway databases that are actually machine-readable

KEGG, Reactome, BioPAX, SBML — they exist, they're heroic, and they're not enough. Pathway maps today are:

Cell-type agnostic — the same generic diagram for a hepatocyte and a neuron
Static — no temporal dynamics, no signal propagation timing
Inconsistent — the same pathway is named, segmented, and annotated differently across databases
Sparsely curated — many pathways haven't been updated since the late 2010s

What's needed: a versioned, normalized, ML-native pathway store with cell-type stratification and time-resolved edges. Think GitHub for pathways, not PDFs.

2. Generate causal pathway data at scale, not correlation

Most pathway maps are inferred from correlations across observational data. That's a recipe for spurious edges and missing real ones. Causal data requires perturbation:

Perturb-seq, CRISPR screens, optogenetic interventions — each one isolates a single variable
Tens of millions of perturbations, not the tens of thousands we have today
Across multiple cell types, not just immortalized cancer lines

The Human Cell Atlas, Tabula Sapiens, and Perturb-seq consortia are doing real work here. They need an order of magnitude more funding.

3. Multi-omics, not single-omics

A pathway's behavior at any moment depends on:

Transcriptomics (what mRNA is being made)
Proteomics (what protein is actually present)
Metabolomics (what metabolites are flowing)
Epigenomics (what's accessible to be read)
Spatial context (where in the cell, where in the tissue)

Today's foundation models in biology mostly train on one modality. We need multi-omic foundation models — the AlphaFold of pathway state, not pathway structure.

4. Cell- and organ-level digital twins

Karr Lab's whole-cell model of Mycoplasma genitalium in 2012 was a breakthrough. Thirteen years later, we still don't have a comparable model for any human cell type. We need:

Virtual cells that simulate a perturbation and predict downstream pathway response
Tissue-scale models that integrate cell-cell signaling
Organ-scale models for predicting drug effects before any wet-lab work

NVIDIA Omniverse, OpenSim, and the EU's Human Brain Project show the rendering and compute substrate exists. The biological models on top are what's missing.

5. Federated infrastructure for proprietary data

Pharma companies sit on enormous proprietary pathway data. They will never open-source it. Federated learning gives a path to train shared models without sharing raw data — exactly the architecture the field needs to break the consortium-vs-competitor deadlock.

This requires industry standards, governance frameworks, and trust infrastructure that nobody has the patience to build alone. The same pattern that makes enterprise AI work in regulated industries — auditable, permissioned, mission-aligned — applies directly here.

6. Pathway-aware foundation models

Imagine a model trained on every published perturbation experiment, every multi-omic time course, every wet-lab pathway interrogation — across cell types, species, conditions. You hand it a perturbation: "add this molecule to a hepatocyte at this concentration." It returns a predicted multi-omic state at t+1 hour, t+24 hours, t+1 week.

That model doesn't exist. It's the AlphaFold of pathway dynamics — and it's the unlock. Without it, every drug program is a blind shot at a moving target.

Who builds this

This isn't a single-company opportunity. The pathway layer is infrastructure — like sequencing was, like the internet was. It requires:

NIH and EU funding programs specifically for pathway infrastructure, not "AI in healthcare" buzzword grants
Open consortia (Human Cell Atlas, ENCODE, ChEMBL — but for pathways)
Pharma open-data commitments for non-competitive precompetitive infrastructure
A new generation of biotech companies building the picks and shovels, not the gold

The payoff

When the pathway layer exists, AI drug discovery stops being a chemistry problem dressed up in a deep learning hat. It becomes what it should have been all along: systems pharmacology with AI as the inference engine. Predictions become testable. Off-target effects become predictable. Failures shift from Phase II in humans to Day Zero in silico.

The window

The next decade of drug discovery will be decided by who builds the pathway layer first. The companies still racing to train bigger molecule models are optimizing the wrong variable.

The molecule was never the bottleneck. The biology was. It still is.

The Pathway Layer: What AI Drug Discovery Needs to Build Next

1. Build pathway databases that are actually machine-readable

2. Generate causal pathway data at scale, not correlation

3. Multi-omics, not single-omics

4. Cell- and organ-level digital twins

5. Federated infrastructure for proprietary data

6. Pathway-aware foundation models

Who builds this

The payoff

The window

Related Posts

The Human OS: AI Drug Discovery Needs Systems Biology

The Scientific Visual OS: When NVIDIA Renders Drug Discovery

Uniting Tech and Science Geeks in AI Drug Discovery