Overview
At Microsoft Research AI for Science, we seek highly motivated Postdoctoral Researchers to integrate experimental data into the next Biomolecular Emulator (BioEmu) model.
Microsoft Research AI for Science focuses on the development of machine learning and artificial intelligence methods for transforming molecular simulation and the discovery of novel materials, drugs and chemical reactions. The BioEmu project aims to model the dynamics and function of proteins: how they change shape, bind to each other, and bind small molecules. This approach will help us understand biological function and dysfunction on a structural level and lead to more effective and targeted drug discovery. Our BioEmu-1 model was published in Science (see our blog post for links to our open-source software and other resources and this explainer video).
The successful candidate will have the opportunity to work on the following:
- Design and scale experimental datasets for ML
- Bridge models with real-world biological measurements (e.g., cryo-EM, binding assays)
- Develop workflows that connect noisy experimental signals to actionable model insights
Why this role is exciting
You’ll work on problems that don’t yet have well‑defined benchmarks, where part of the innovation is deciding what to optimise and proving that it matters for biology. It’s an opportunity to bridge state‑of‑the‑art ML with meaningful biomedical impact in a highly collaborative research environment.
Responsibilities
1. Bridging Models with Real-World Experimental Signals
- Develop methods to connect ML models with experimental observables, such as cryo-EM density maps, binding affinity / kinetics assays, and proteomics / sequencing data.
- Enable model inference conditioned on or steered by experimental data.
- Interpret discrepancies between model predictions and experimental outcomes to guide iteration.
- Integrate heterogeneous datasets into coherent representations for modeling.
2. Experimental Data Strategy & Dataset Development
- Design high-quality, ML-ready experimental datasets (e.g., protein interactions, conformational dynamics, binding measurements, cryo-EM density).
- Translate research questions into scalable experimental campaigns with clear success criteria.
- Define dataset standards, metadata, and quality metrics for downstream modeling.
- Identify gaps in existing datasets and propose novel data generation strategies.
3. Model-Aware Experimental Design
- Establish closed-loop workflows where experimental results refine models and vice versa.
- Define evaluation metrics that reflect real-world biological utility, not just benchmarks.
4. Scalable Data Processing & Automation
- Build automated, reproducible pipelines for data ingestion, processing, and analysis (Python-based).
- Develop systems for data curation, QC, and uncertainty estimation on noisy experimental data.
- Leverage modern tooling (databases, distributed compute, LLM-assisted workflows) to scale beyond manual analysis.
5. Collaboration & External Coordination
- Partner with ML researchers, computational biologists, and experimental collaborators (academic + CROs).
- Provide technical guidance on experimental design, data quality, and iteration cycles.
- Translate between disciplines to ensure alignment between model needs and experimental outputs.
6. Independent Research & Impact
- Contribute to novel methods at the model–experiment interface.
- Publish research, release datasets/software, and shape internal research direction.
- Drive projects from ambiguous ideas to high-impact, usable artifacts.
Qualifications
- Completed or nearly completed PhD, or equivalent experience, in a science or engineering discipline.
- Deep expertise in at least one relevant area, such as machine learning for biomolecular systems, molecular modeling and simulation, structural biology, experimental protein assays, or statistical mechanics.
- Strong Python skills and experience building data analysis, modeling, or machine learning pipelines.
- Experience working with real-world biological, structural, experimental, or molecular datasets.
- Ability to work across disciplines and communicate complex ideas clearly.
- Track record of independently owning and delivering research projects.
- Experience connecting computational models to experimental data, such as cryo-EM, X-ray, NMR, SPR, mass spectrometry, NGS, or other assay readouts.
- Background in generative models, diffusion models, representation learning, molecular dynamics, or statistical mechanics for biomolecular systems.
- Experience with large-scale dataset generation, curation, or automated analysis workflows.
- Familiarity with experimental workflows such as protein expression, purification, interaction assays, or high-throughput systems.
- Interest in closing the loop between modeling and experiment.
- Experience or interest in drug discovery, therapeutics, or real-world biomedical applications.
- Ability to collaborate with external partners and align research goals with practical health challenges.
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.