Research
Research Areas
My research sits at the intersection of machine learning, uncertainty quantification, and scientific experimentation. I develop methods and systems that enable trustworthy AI-driven discovery in domains where data is expensive, decisions are consequential, and reliability is paramount.
Conformal Prediction & Uncertainty Quantification
I develop rigorous uncertainty quantification methods based on conformal prediction, with emphasis on providing distribution-free, finite-sample guarantees for scientific decision-making.
Key contributions:
- Small Sample Beta Correction (SSBC): Tight PAC coverage guarantees for small-sample regimes
- Operational rate estimation for accountable automation
- Conformalized quantile regression for adaptive experimental control
- Evaluation frameworks for scientific model benchmarking
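For readers new to the area, the kind of distribution-free, finite-sample guarantee mentioned above can be illustrated with standard split conformal prediction. The sketch below is the textbook construction, not SSBC and not code from any of the projects listed here; the function names and the calibration residuals are hypothetical and chosen only for illustration.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the finite-sample-corrected quantile of calibration scores.

    Under exchangeability, an interval built from this quantile covers a
    new point with probability at least 1 - alpha (marginally).
    """
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    # Rank of the order statistic to use: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return math.inf
    return sorted(scores)[k - 1]


def prediction_interval(point_prediction, q):
    """Symmetric interval around a point prediction."""
    return (point_prediction - q, point_prediction + q)


# Hypothetical absolute residuals |y - y_hat| on a held-out calibration set.
cal_scores = [0.3, 0.1, 0.7, 0.5, 0.9, 0.2, 1.0, 0.4, 0.8, 0.6]
q = conformal_quantile(cal_scores, alpha=0.2)
low, high = prediction_interval(2.0, q)
```

With very few calibration points the required rank can exceed the sample size, in which case this construction returns an unbounded interval; that small-sample behaviour is the regime the contributions above are concerned with.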
Autonomous Experimentation
I develop adaptive control systems that achieve 20–100× efficiency gains in experimental data acquisition through intelligent sample selection and real-time decision-making.
Applications:
- Hyperspectral imaging with adaptive point selection
- Mass spectrometry with intelligent acquisition strategies
- Automated feature discovery in geobiology
- Real-time quality control at synchrotron beamlines
Scientific Imaging & Deep Learning
Building foundation models and specialized deep learning systems for scientific image analysis across multiple modalities and scales.
Focus areas:
- Segmentation and anomaly detection for tomography
- Lattice light sheet microscopy analysis
- Electron microscopy image processing
- Representation learning for scientific imaging (SYNAPS-I/reSIFT)
High-Performance Computing & Data Infrastructure
Leading large-scale infrastructure efforts connecting DOE facilities with advanced computing resources.
Current initiatives:
- LAMBDA: Federated AI-ready data infrastructure across DOE facilities
- Cross-facility workflows linking light sources with NERSC
- Exascale workflows for free-electron laser data analysis
- Out-of-core tensor processing (qlty) for constrained hardware
Major Projects
LAMBDA
Federated Data Infrastructure for DOE Facilities
Leading Activity 3: Data Organization and Cross-Facility Workflows. Building AI-ready data infrastructure connecting the Advanced Light Source, NERSC, Molecular Foundry, and other DOE facilities. Developing standards for data organization, API specifications, and authentication systems for federated access.
SYNAPS-I (reSIFT)
Foundation Models for Scientific Imaging
NIH-funded effort developing representation learning approaches for multi-modal scientific data. Building foundation models that can transfer knowledge across imaging modalities and scientific domains.
SSBC
Small Sample Beta Correction for Conformal Prediction
Novel methodology providing tight PAC coverage guarantees in small-sample settings. Enables accountable automation for toxicity screening, molecular property prediction, and structural biology applications.
dlsia
Deep Learning for Scientific Image Analysis
ML toolkit for segmentation, anomaly detection, and analysis across tomography, lattice light sheet microscopy, and electron microscopy. Deployed at multiple DOE facilities and integrated into experimental pipelines.
qlty
Out-of-Core Tensor Processing
Framework enabling efficient ML training and inference on memory-constrained hardware. Allows processing of datasets that exceed available RAM through intelligent chunking and caching strategies.
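The general idea of chunked processing can be sketched as follows. This is a generic illustration of overlapping-window chunking with stitched results, not the qlty API; the helper names `chunk_bounds` and `process_in_chunks` are hypothetical, and an in-memory list stands in for data that would normally be read from disk one slice at a time.

```python
def chunk_bounds(length, window, step):
    """Yield (start, stop) index pairs that cover range(length) with overlap."""
    if window <= 0 or step <= 0 or step > window:
        raise ValueError("require 0 < step <= window")
    start = 0
    while True:
        stop = min(start + window, length)
        yield start, stop
        if stop == length:
            break
        start += step


def process_in_chunks(data, window, step, fn):
    """Apply fn to each chunk and average the outputs where chunks overlap.

    Only one chunk is passed to fn at a time, so fn never sees the full array.
    """
    totals = [0.0] * len(data)
    counts = [0] * len(data)
    for start, stop in chunk_bounds(len(data), window, step):
        out = fn(data[start:stop])  # fn must return one value per input element
        for i, value in enumerate(out, start):
            totals[i] += value
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts) if c]


# Toy usage: an element-wise "model" applied to a sequence, chunk by chunk.
signal = list(range(10))
doubled = process_in_chunks(signal, window=4, step=2, fn=lambda c: [2 * x for x in c])
```

Averaging over overlaps is one simple stitching choice; it reduces edge effects when `fn` is a model whose output is less reliable near chunk boundaries.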
Autonomous IR Spectroscopy
Real-Time Adaptive Acquisition at Scale
Developed real-time adaptive acquisition and evaluation pipelines for infrared spectroscopy at large-scale DOE facilities. These pipelines improve acquisition efficiency through intelligent sample selection.
Past Major Contributions
Structural Biology & Crystallography
As Science Lead for the Berkeley Center for Structural Biology (2007-2015), I increased automation capacity by over 5× through computational optimization and developed Xtriage, now adopted by the Protein Data Bank for global crystallographic quality control.
Key achievements:
- Co-led peer-reviewed service crystallography program (>500 structures, including Nobel Prize-associated work)
- Developed algorithms for crystallographic pathology and twinning detection
- Built SAXS analysis pipelines with 10× speedup
- Core contributor to PHENIX and CCTBX infrastructure
Free-Electron Laser Science
Developed exascale workflows for FEL scattering data analysis, increasing throughput and reconstruction speed for serial femtosecond crystallography and other FEL techniques.
Collaborative Research
I maintain active collaborations across:
- Multiple DOE National Laboratories (LBNL, SLAC, ANL, BNL, ORNL)
- NIH-funded imaging initiatives
- Academic partners at UC Berkeley, Stanford, and international institutions
- Industry partners in biotechnology and materials science
For specific publications and technical details, see the Publications and Software pages.
