Research

Research Areas

My research sits at the intersection of machine learning, uncertainty quantification, and scientific experimentation. I develop methods and systems that enable trustworthy AI-driven discovery in domains where data is expensive, decisions are consequential, and reliability is paramount.

Conformal Prediction & Uncertainty Quantification

I develop rigorous uncertainty quantification methods based on conformal prediction, with emphasis on providing distribution-free, finite-sample guarantees for scientific decision-making.
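To make "distribution-free, finite-sample guarantee" concrete, here is the textbook split conformal procedure for regression. It is a minimal sketch rather than any specific method developed here. The toy data, the fixed predictor `predict`, and the function name `split_conformal_interval` are hypothetical. The only assumption behind the guarantee is that calibration and test points are exchangeable. Under that assumption, the interval covers a new outcome with probability at least 1 - alpha, whatever the data distribution.

```python
import math
import numpy as np

def split_conformal_interval(cal_pred, cal_y, test_pred, alpha=0.1):
    """Split conformal intervals with marginal coverage >= 1 - alpha.

    Assumes calibration and test points are exchangeable and that the model
    was fit on data disjoint from the calibration set (here it is fixed)."""
    scores = np.abs(cal_y - cal_pred)          # nonconformity score: absolute residual
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))       # finite-sample corrected rank
    if k > n:
        raise ValueError("calibration set too small for this alpha")
    q_hat = np.sort(scores)[k - 1]             # k-th smallest calibration score
    return test_pred - q_hat, test_pred + q_hat

def predict(v):
    """Hypothetical pre-trained model: f(x) = 2x."""
    return 2.0 * v

# Toy data: y = 2x + Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 2000)
y = 2.0 * x + rng.normal(0.0, 0.3, size=x.shape)

cal, test = slice(0, 500), slice(500, None)
lo, hi = split_conformal_interval(predict(x[cal]), y[cal], predict(x[test]), alpha=0.1)
coverage = np.mean((y[test] >= lo) & (y[test] <= hi))  # empirical coverage, close to 0.9
```

The `(n + 1)` correction in the rank is what makes the guarantee hold exactly at finite n instead of only asymptotically.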

Key contributions:

Autonomous Experimentation

Developing adaptive control systems that achieve 20-100× efficiency gains in experimental data acquisition through intelligent sample selection and real-time decision-making.

Applications:

Scientific Imaging & Deep Learning

Building foundation models and specialized deep learning systems for scientific image analysis across multiple modalities and scales.

Focus areas:

High-Performance Computing & Data Infrastructure

Leading large-scale infrastructure efforts connecting DOE facilities with advanced computing resources.

Current initiatives:

Major Projects

LAMBDA

Federated Data Infrastructure for DOE Facilities

Leading Activity 3: Data Organization and Cross-Facility Workflows. Building AI-ready data infrastructure connecting the Advanced Light Source, NERSC, Molecular Foundry, and other DOE facilities. Developing standards for data organization, API specifications, and authentication systems for federated access.

SYNAPS-I (reSIFT)

Foundation Models for Scientific Imaging

NIH-funded effort developing representation learning approaches for multi-modal scientific data. Building foundation models that can transfer knowledge across imaging modalities and scientific domains.

SSBC

Small Sample Beta Correction for Conformal Prediction

Novel methodology providing tight probably approximately correct (PAC) coverage guarantees in small-sample settings. Enables accountable automation for toxicity screening, molecular property prediction, and structural biology applications.
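The sketch below is not the SSBC method. It only illustrates a standard fact that a small-sample, beta-based correction can work with. For exchangeable scores with no ties, the coverage of a split conformal threshold set at the k-th smallest of n calibration scores follows Beta(k, n + 1 - k), conditional on the calibration set. A PAC-style choice of k asks that coverage be at least 1 - eps with probability at least 1 - delta. The names `pac_rank` and `binom_upper_tail` are hypothetical. The beta tail is evaluated through the identity P(Beta(k, n + 1 - k) < x) = P(Binomial(n, x) >= k).

```python
from math import comb

def binom_upper_tail(n, p, k):
    """P(Binomial(n, p) >= k); fine in floating point for moderate n (up to ~1000)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def pac_rank(n, eps, delta):
    """Smallest rank k such that the k-th smallest of n calibration scores gives
    coverage >= 1 - eps with probability >= 1 - delta (None if no rank works).

    Relies on coverage ~ Beta(k, n + 1 - k) for continuous exchangeable scores."""
    for k in range(1, n + 1):
        # P(coverage < 1 - eps) = P(Binomial(n, 1 - eps) >= k), decreasing in k
        if binom_upper_tail(n, 1 - eps, k) <= delta:
            return k
    return None

k_pac = pac_rank(100, eps=0.1, delta=0.05)
```

With n = 100, eps = 0.1, and delta = 0.05, the returned rank is larger than 91, the rank for marginal coverage, ceil((n + 1)(1 - eps)). The PAC threshold is therefore more conservative. When n is very small relative to eps, no rank meets the constraint and the function returns None.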

dlsia

Deep Learning for Scientific Image Analysis

ML toolkit for segmentation, anomaly detection, and analysis across tomography, lattice light-sheet microscopy, and electron microscopy. Deployed at multiple DOE facilities and integrated into experimental pipelines.

qlty

Out-of-Core Tensor Processing

Framework enabling efficient ML training and inference on memory-constrained hardware. Allows processing of datasets that exceed available RAM through intelligent chunking and caching strategies.
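The basic out-of-core pattern is to stream fixed-size blocks from a disk-backed array through a function and write the results back to disk. The sketch shows that pattern using NumPy's `np.memmap`. It is not qlty's API. `chunked_apply` is a hypothetical name, and the element-wise transform stands in for a model's forward pass.

```python
import os
import tempfile
import numpy as np

def chunked_apply(src, dst, fn, chunk_rows=1024):
    """Apply a row-independent function `fn` to `src` block by block, writing into `dst`.

    `fn` only ever sees `chunk_rows` rows at a time, so its working memory scales
    with the block size rather than the full array when src/dst are memmaps."""
    for start in range(0, src.shape[0], chunk_rows):
        stop = min(start + chunk_rows, src.shape[0])
        dst[start:stop] = fn(np.asarray(src[start:stop]))
    if isinstance(dst, np.memmap):
        dst.flush()                     # push written pages back to the file
    return dst

# Disk-backed input and output arrays in a temporary directory.
tmp = tempfile.mkdtemp()
shape = (10_000, 64)
src = np.memmap(os.path.join(tmp, "src.dat"), dtype=np.float32, mode="w+", shape=shape)
src[:] = np.random.default_rng(0).standard_normal(shape, dtype=np.float32)
dst = np.memmap(os.path.join(tmp, "dst.dat"), dtype=np.float32, mode="w+", shape=shape)

chunked_apply(src, dst, lambda block: block * 2.0 + 1.0, chunk_rows=1000)
```

This pattern only works for operations that act on each row independently. Convolutional models need overlapping tiles with a halo region that is trimmed or blended when the outputs are stitched back together.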

Autonomous IR Spectroscopy

Real-Time Adaptive Acquisition at Scale

Developed real-time adaptive acquisition and evaluation pipelines for infrared spectroscopy at large-scale DOE facilities. These pipelines achieve significant efficiency gains through intelligent sample selection.
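The acquisition policy is not described here. As one common form of "intelligent sample selection", the sketch below uses greedy Gaussian-process uncertainty sampling on a 1-D grid: each step measures the point the model is least certain about. The squared-exponential kernel, the length scale, the toy `measure` function, and the names `acquire_adaptively` and `posterior_variance` are all assumptions for illustration, not the pipeline's actual method.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.1):
    """Squared-exponential kernel between 1-D point sets a and b (unit prior variance)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def posterior_variance(x_obs, x_query, length_scale=0.1, noise=1e-6):
    """GP posterior variance at x_query given observation locations x_obs."""
    K = rbf_kernel(x_obs, x_obs, length_scale) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query, length_scale)
    # diag(K_s^T K^{-1} K_s), computed column-wise without forming the full matrix
    explained = np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return 1.0 - explained

def acquire_adaptively(measure, grid, budget=10, length_scale=0.1):
    """Greedy uncertainty sampling: next measure the grid point of largest posterior variance."""
    chosen = [0, len(grid) - 1]                   # seed with the two ends of the grid
    values = [measure(grid[i]) for i in chosen]
    while len(chosen) < budget:
        var = posterior_variance(grid[chosen], grid, length_scale)
        var[chosen] = -np.inf                     # never re-measure a point
        nxt = int(np.argmax(var))
        chosen.append(nxt)
        values.append(measure(grid[nxt]))         # stands in for the instrument call
    return grid[chosen], np.array(values)

grid = np.linspace(0.0, 1.0, 201)
x_meas, y_meas = acquire_adaptively(lambda x: np.sin(6 * x), grid, budget=10)
```

With a fixed kernel, GP posterior variance depends only on where measurements were taken, not on the measured values. This loop therefore reduces to a space-filling design. A policy that reacts to the data would refit kernel hyperparameters between steps or use an acquisition function that also uses the posterior mean.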

Past Major Contributions

Structural Biology & Crystallography

As Science Lead for the Berkeley Center for Structural Biology (2007-2015), I increased automation capacity by over 5× through computational optimization and developed Xtriage, now adopted by the Protein Data Bank for global crystallographic quality control.

Key achievements:

Free-Electron Laser Science

Developed exascale workflows for FEL scattering data analysis, dramatically boosting throughput and reconstruction speed for serial femtosecond crystallography and other FEL techniques.

Collaborative Research

I maintain active collaborations across:


For specific publications and technical details, see the Publications and Software pages.