AI for Science and Agentic Systems

This section will explore agentic AI systems and AI for science. It will feature notes from experiments with multi-agent platforms like the Virtual Lab and SciAgents, highlight key observations from conferences, and discuss protocols, evaluation metrics, knowledge graphs, and collaboration opportunities.

Artificial intelligence is moving beyond chatbots and image generators into systems that can discover new knowledge. I’ve been tracking and experimenting with multi‑agent systems that harness large language models, knowledge graphs, and domain‑specific tools to accelerate scientific research:

  • SciAgents and Virtual Lab – MIT’s SciAgents project uses a swarm of specialised agents connected to an ontological knowledge graph of roughly 33,000 nodes spanning materials, biology and physics. These agents explore the graph via random walks and reinforcement learning to generate hypotheses, design experiments and even write code (arxiv.org). Earlier projects like Stanford’s Virtual Lab hard‑coded a step‑by‑step pipeline for generating nanobody candidates; SciAgents generalises this by letting agents decide which tools to call, which data to retrieve and when to ask humans for input. The result is a more open‑ended “discovery engine” capable of serendipitous findings.
  • Why Knowledge Graphs Matter – Traditional vector databases excel at semantic search but lack explicit reasoning: they retrieve similar embeddings but can’t explain relationships. Knowledge graphs encode entities and relations in a structured form, enabling symbolic reasoning, path‑finding and provenance tracking. In multi‑agent systems, they provide a “world model” that agents can traverse logically, making them better for scientific discovery and transparent decision‑making.
  • Evaluation and Safety – Agentic research platforms need first‑class evaluation. Novelty scores, uncertainty estimates and provenance tags help differentiate genuine discoveries from paraphrased literature. Human‑in‑the‑loop review remains critical: autonomous hypotheses must be validated experimentally, and unsafe suggestions should be filtered out.
  • From Bench to Holodeck – My own experiments with generative video (e.g., Sora) highlight the challenges of controlling multiple characters and consistent spatial relationships. The same control problems appear in scientific agents: aligning symbolic prompts with downstream tools, managing resource constraints and ensuring reproducibility. Future work aims to build a Holodeck‑like interface where scientists can run AI‑driven experiments in virtual environments, then translate promising results to the real lab.
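The random‑walk exploration that SciAgents‑style systems use can be sketched in a few lines. This is a minimal toy, not SciAgents’ actual implementation: the graph, its entities and its relation names are all invented for illustration, and a real system would bias the walk with learned policies rather than uniform sampling.

```python
import random

# Toy knowledge graph as an adjacency list of (relation, target) pairs.
# Entities and relations are illustrative placeholders, not real data.
GRAPH = {
    "silk": [("has_property", "toughness"), ("studied_in", "biomaterials")],
    "toughness": [("measured_by", "tensile_test"), ("exhibited_by", "graphene")],
    "biomaterials": [("overlaps_with", "tissue_engineering")],
    "graphene": [("has_property", "conductivity")],
    "tensile_test": [],
    "tissue_engineering": [],
    "conductivity": [],
}

def random_walk(graph, start, max_steps, rng):
    """Sample an entity-relation path from `start`; paths that link
    distant concepts become seeds for hypothesis-generation prompts."""
    path = [start]
    node = start
    for _ in range(max_steps):
        edges = graph.get(node, [])
        if not edges:  # dead end: stop the walk early
            break
        relation, node = rng.choice(edges)
        path.extend([relation, node])
    return path

rng = random.Random(0)  # fixed seed so the walk is reproducible
print(" -> ".join(random_walk(GRAPH, "silk", max_steps=3, rng=rng)))
```

In a SciAgents‑like loop, the sampled path would be serialised into a prompt (“silk has_property toughness exhibited_by graphene …”) and handed to a hypothesis‑drafting agent.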
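The path‑finding and provenance advantage of knowledge graphs over vector search is easy to make concrete: a breadth‑first search over triples returns an explicit chain of relations, not just a similarity score. The triples below are made up for illustration.

```python
from collections import deque

# Toy triple store; entities and relations are illustrative placeholders.
TRIPLES = [
    ("spider_silk", "has_property", "toughness"),
    ("toughness", "relevant_to", "impact_protection"),
    ("graphene", "has_property", "toughness"),
    ("impact_protection", "application_of", "body_armor"),
]

def shortest_path(triples, start, goal):
    """Breadth-first search returning an explainable chain of
    entity -> relation -> entity hops: the provenance a bare
    embedding-similarity score cannot provide."""
    adj = {}
    for head, rel, tail in triples:
        adj.setdefault(head, []).append((rel, tail))
        # Also traverse edges backwards, labelled as inverse relations.
        adj.setdefault(tail, []).append((f"inverse({rel})", head))
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [rel, nxt]))
    return None  # no connecting chain in the graph

print(shortest_path(TRIPLES, "spider_silk", "body_armor"))
```

The returned chain (spider_silk → has_property → toughness → relevant_to → impact_protection → application_of → body_armor) is exactly the kind of traceable justification an agent can cite when proposing a hypothesis.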
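The novelty‑scoring idea from the evaluation bullet can also be sketched. This crude version uses word‑overlap (Jaccard similarity) against prior abstracts; a production system would use embeddings plus provenance tags, but the gating logic, flagging low‑novelty outputs as likely paraphrase, is the same. The corpus and hypotheses here are invented examples.

```python
def novelty_score(hypothesis, corpus):
    """Return 1 minus the highest Jaccard word-overlap between the
    hypothesis and any prior document: 0.0 means an exact paraphrase,
    values near 1.0 suggest genuinely new phrasing (not new truth!)."""
    h = set(hypothesis.lower().split())
    best = max(
        (len(h & set(doc.lower().split())) / len(h | set(doc.lower().split()))
         for doc in corpus),
        default=0.0,
    )
    return 1.0 - best

corpus = [
    "spider silk exhibits exceptional toughness",
    "graphene composites improve tensile strength",
]
print(novelty_score("spider silk exhibits exceptional toughness", corpus))  # 0.0: paraphrase
print(novelty_score("silk graphene hybrids for impact protection", corpus))
```

Note the caveat in the docstring: lexical novelty is not scientific validity, which is why human‑in‑the‑loop review and experimental validation stay in the loop.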

How this page will grow: I plan to document my experiments with agentic systems—blog posts, code snippets, and demo videos—and invite researchers to collaborate. The section will include reading lists on knowledge graphs versus vector databases, and showcase breakthroughs like SciAgents, Virtual Lab and multi‑agent wargaming simulations.