Constraint-Aware Diffusion for Scientific Applications
Teaching diffusion models to obey physics, geometry, and safety constraints — without retraining.
Problem Statement and Motivation
Generative AI — especially diffusion and flow-matching models — has achieved transformative performance across image synthesis, protein structure prediction, and materials design. Yet a critical gap limits their deployment in high-stakes scientific and engineering domains: these models generate statistically plausible outputs that routinely violate physical laws, geometric constraints, and safety specifications.
Consider the challenge. A diffusion model trained on protein structures generates smooth, realistic-looking backbones, but some violate bond-angle constraints critical for function. A trajectory-planning model produces elegant robot paths, but some pass through solid obstacles. A materials-discovery model proposes crystal lattices with impossible interatomic distances. The outputs look right, but they aren’t.
This reflects a fundamental mismatch: diffusion models optimize for statistical faithfulness to training data, not constraint satisfaction. Retrofitting constraints post hoc (filtering, rejection sampling) wastes computation and degrades diversity. Retraining massive foundation models with constraint-aware objectives is prohibitively expensive, and the result is domain-specific.
Our group addresses this from a fundamentally different angle: making constraint enforcement a native, training-free operation inside the diffusion sampling loop.
The Core Framework: Diffusion as Sequential Constrained Optimization
To understand our approach, we first observe that the reverse diffusion process can be recast as a sequence of optimization steps. Denoising Diffusion Probabilistic Models define a forward process that corrupts data \(x_0 \sim p_{\text{data}}(x_0)\) into noise \(x_T\). The learned reverse process recovers data via Gradient Langevin Dynamics (GLD):
\[x_t \leftarrow x_t + \gamma_t \nabla_{x_t} \log p(x_t) + \sqrt{2\gamma_t}\,\epsilon\]
Here \(\gamma_t\) is the step size and \(\epsilon \sim \mathcal{N}(0, I)\) is Gaussian noise. As the variance schedule decreases toward zero as \(t \to 0\), GLD transitions toward deterministic gradient ascent on the learned log-likelihood. This allows us to view each reverse step as solving a local optimization problem and, critically, to augment each step with a feasibility projection:
\[\mathcal{P}_C(x_t) = \arg\min_{y \in C} \|y - x_t\|_2^2\]
Our Constraint-Aware Diffusion Models (CADM) apply this projection at every step of the reverse Markov chain. The result is a formal guarantee: for convex constraint sets, the distance to the feasible region contracts by a factor of \(1 - 2\beta\gamma_t\) per step, reaching an \(\varepsilon\)-feasible sample in \(\mathcal{O}(\gamma_{\min}^{-1} \log(1/\varepsilon))\) steps, with no degradation of sample diversity.
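The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the CADM implementation: the analytic Gaussian score stands in for a learned score network, the feasible set is a simple box (convex, so the projection is exact), and the names `project_box`, `projected_langevin`, and the step-size schedule are hypothetical.

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto the box {y : lo <= y <= hi}, a convex set."""
    return np.clip(x, lo, hi)

def projected_langevin(score_fn, project_fn, x_init, step_sizes, rng):
    """Langevin updates with a feasibility projection after every step."""
    x = np.array(x_init, dtype=float)
    for gamma in step_sizes:
        noise = rng.standard_normal(x.shape)
        # Langevin step: x <- x + gamma * score(x) + sqrt(2 * gamma) * eps
        x = x + gamma * score_fn(x) + np.sqrt(2.0 * gamma) * noise
        # Feasibility projection: x <- argmin_{y in C} ||y - x||^2
        x = project_fn(x)
    return x

# Toy stand-in for a learned score: unit-variance Gaussian centered at mu,
# whose score is -(x - mu). The mean lies outside the feasible box on purpose.
mu = np.array([2.0, -2.0])
score = lambda x: -(x - mu)

rng = np.random.default_rng(0)
step_sizes = np.geomspace(0.1, 1e-4, num=200)  # decaying step sizes (assumed schedule)
sample = projected_langevin(
    score, lambda x: project_box(x, -1.0, 1.0), rng.standard_normal(2), step_sizes, rng
)
```

Because the projection is the last operation of every step, the returned sample lies in the box by construction. The unconstrained dynamics would instead drift toward `mu`, which is infeasible.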
Scaling to Non-Convex and Latent-Space Constraints
Scientific constraints are rarely convex. We address this through the following extensions:
- Augmented Lagrangians: For non-convex constraint sets, we introduce dual multipliers \(\lambda_j \geq 0\) and solve a local augmented problem at each step, enabling rapid inference without exact non-convex projections.
- Proximal updates in latent space: Stable Diffusion operates on latents \(z_t\) in a compressed latent space, while constraint evaluation (e.g., porosity, stress) requires decoding to pixel space. We construct a differentiable computational graph that evaluates \(g(\mathcal{D}(z_t))\), where \(\mathcal{D}\) is the decoder and \(g\) the constraint function, and propagates gradients back through the decoder.
- Simulators-in-the-loop: When constraint evaluators (e.g., Finite Element Analysis) are non-differentiable, we use calibrated Monte Carlo perturbations to estimate pseudo-gradients, enabling the diffusion model to adjust outputs until simulated constraints are met.
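Two of these ideas can be illustrated on a toy problem. The sketch below is an assumption-laden illustration rather than the method used in the papers: `augmented_lagrangian_correction` runs a textbook augmented Lagrangian loop for a single inequality constraint \(g(y) \leq 0\), and `mc_pseudo_gradient` is a generic antithetic Gaussian-perturbation estimator standing in for the calibrated Monte Carlo scheme. All function names, hyperparameters, and the disc constraint are hypothetical.

```python
import numpy as np

def augmented_lagrangian_correction(x, g, grad_g, rho=1.0, outer=30, inner=100, lr=0.05):
    """Approximately solve  min_y ||y - x||^2  s.t.  g(y) <= 0  (one scalar constraint)."""
    x = np.asarray(x, dtype=float)
    y, lam = x.copy(), 0.0
    for _ in range(outer):
        for _ in range(inner):
            # Gradient of ||y - x||^2 + (max(0, lam + rho*g(y))^2 - lam^2) / (2*rho)
            mult = max(0.0, lam + rho * g(y))
            y = y - lr * (2.0 * (y - x) + mult * grad_g(y))
        lam = max(0.0, lam + rho * g(y))  # dual ascent on the multiplier
    return y, lam

def mc_pseudo_gradient(f, x, sigma=0.1, num_samples=64, rng=None):
    """Estimate the gradient of a black-box scalar f from antithetic Gaussian perturbations."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    u = rng.standard_normal((num_samples, x.size))
    slopes = np.array(
        [(f(x + sigma * ui) - f(x - sigma * ui)) / (2.0 * sigma) for ui in u]
    )
    return (slopes[:, None] * u).mean(axis=0)

# Toy non-convex feasible set: stay outside the unit disc, g(y) = 1 - ||y||^2 <= 0.
g = lambda y: 1.0 - float(y @ y)
grad_g = lambda y: -2.0 * y

x0 = np.array([0.3, 0.1])  # infeasible point inside the disc
y_fixed, lam = augmented_lagrangian_correction(x0, g, grad_g)

# Treat g as a black-box "simulator" and compare with its exact gradient -2 * x1.
x1 = np.array([1.0, -0.5])
g_est = mc_pseudo_gradient(g, x1, num_samples=4000, rng=np.random.default_rng(0))
```

In this toy case the corrected point ends up on the unit circle in the direction of `x0`, which is the nearest feasible point, and the perturbation-based estimate approaches the analytic gradient as the number of samples grows. A real simulator would be far more expensive per call, so the sample budget would need to be much smaller than the 4,000 used here.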
Applications Across Scientific Domains
Protein Design with Hard Structural Constraints
Designing novel proteins requires strict adherence to geometric constraints — bond lengths, angles, and inter-residue distances — to ensure structural stability and functional affinity. Standard diffusion-based protein design tools (e.g., RFdiffusion) can hallucinate backbone geometries that fail costly downstream validation.
Our work (Christopher et al., ICLR 2026) integrates proximal feasibility updates with ADMM decomposition into the RFdiffusion generative process, enforcing hard structural constraints while preserving geometric and stereochemical diversity. We introduce a curated benchmark for motif scaffolding in the PDZ domain and demonstrate:
- 100% constraint satisfaction on bonding and geometric constraints — compared to 0% for all three baselines (Standard RFdiffusion, Recentering, Constraint-Guided Diffusion) across nearly 100,000 generated samples.
- 97.8% usable samples on molecule encapsulation / pocket design — approximately 4× the nearest baseline (24.2%).
- 21% viable designs on motif scaffolding overall, reaching 83% for well-posed ligands, while baselines produced zero feasible designs.
The approach is the first to solve pocket-design problems at full backbone resolution with provable constraint adherence.
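To give a feel for what an ADMM decomposition of a feasibility update looks like, here is a small sketch on a toy problem. It is not the decomposition used for RFdiffusion: the two sets (a box and a halfspace) are convex stand-ins for blocks of structural constraints, and `admm_project`, `project_box`, and `project_halfspace` are hypothetical names. The idea it shows is that a projection onto an intersection of sets can be split into cheap per-set projections coordinated by a dual variable.

```python
import numpy as np

def project_box(v, lo, hi):
    """Projection onto the box {y : lo <= y <= hi}."""
    return np.clip(v, lo, hi)

def project_halfspace(v, a, b):
    """Projection onto the halfspace {y : a @ y <= b}."""
    excess = max(0.0, float(a @ v) - b)
    return v - excess * a / float(a @ a)

def admm_project(x, proj_a, proj_b, rho=1.0, iters=100):
    """Scaled-form ADMM for  min_y 0.5*||y - x||^2  s.t.  y in A and y in B."""
    x = np.asarray(x, dtype=float)
    z, u = x.copy(), np.zeros_like(x)
    for _ in range(iters):
        y = proj_a((x + rho * (z - u)) / (1.0 + rho))  # proximity to x, restricted to A
        z = proj_b(y + u)                               # consensus copy, restricted to B
        u = u + y - z                                   # scaled dual update
    return y, z

a = np.array([1.0, 1.0])
y, z = admm_project(
    [2.0, 2.0],
    lambda v: project_box(v, 0.0, 1.0),
    lambda v: project_halfspace(v, a, 1.0),
)
```

For this toy instance the feasible region is the triangle with vertices (0, 0), (1, 0), and (0, 1), and both `y` and `z` converge to (0.5, 0.5), the point of that triangle closest to (2, 2).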
Materials Discovery: Constrained Generation for Physical Properties
Discovering structural materials often requires satisfying dynamic physical constraints — target porosity, specific stress-strain responses, interatomic repulsion, or boundary conditions — that are expensive to verify computationally.
Our work (Zampini et al., 2025) evaluates the framework on three distinct materials-science benchmarks:
- Morphometric constraints (microstructure porosity): 0% constraint violations with FID of 13.5, versus 68.4% violation rate for conditional diffusion — achieving strict constraint adherence while maintaining competitive generation quality.
- Metamaterial inverse design (stress-strain response): MSE of 1.4 ± 0.6, a 4.6× improvement over the specialized Bastek & Kochmann baseline (MSE 6.4 ± 4.6), with only 5% invalid shapes versus 55% for conditional diffusion.
- Copyright-safe generation: 90% safety compliance versus 67% for conditional diffusion and 71% for projected diffusion, with an FID of 65.1 maintaining image quality.
Robotics and Multi-Robot Motion Planning
Motion planning in autonomous systems requires finding collision-free trajectories governed by ODE dynamics for velocity, acceleration, and spatial bounds. Standard generative models that ignore these constraints produce trajectories that collide with obstacles or violate kinematic limits.
We have addressed robotics planning through two complementary systems:
SMD: Simultaneous Multi-Robot Motion Planning (Liang et al., 2025) integrates constrained optimization into the diffusion sampling process to produce trajectories that are simultaneously collision-free and kinematically feasible. On dense maps with 9 robots, SMD achieves 96% success, compared to just 27% for the best baseline (MMD), a 3.6× improvement. In structured environments (corridor maps), SMD reaches 100% success while all baselines score 0%. These results were obtained across 4,000 test instances spanning six environment types.
DGD: Discrete-Guided Diffusion (Liang et al., AAAI 2026) couples discrete multi-agent path finding (MAPF) solvers with constrained generative diffusion, decomposing the non-convex multi-robot planning problem into tractable convex subproblems. DGD achieves over 92% success across standard benchmarks, scales to 100 robots in environments with 104 obstacles (a 2.5× improvement in robot count over prior diffusion methods), and is 20× faster than SMD on comparable settings (12.2s vs 254.3s).
[Figure: example planning environments. Panels: basic map, dense map, corridor map, shelf map, room map.]
Notably, because we explicitly encode physics, these models can generalize out of distribution, generating valid trajectories under gravitational conditions never seen during training (e.g., lunar gravity).
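As a concrete picture of the collision constraints discussed in this section, the sketch below pushes trajectory waypoints out of circular obstacles. It is a deliberately simplified illustration, not SMD or DGD: it handles only static discs, ignores robot-robot collisions, kinematic limits, and collisions along the segments between waypoints, and assumes the obstacles are far enough apart that leaving one never enters another. The function `push_out_of_discs` and the toy scene are hypothetical.

```python
import numpy as np

def push_out_of_discs(traj, centers, radii, margin=1e-6):
    """Move every waypoint inside a circular obstacle to the nearest point just outside it."""
    traj = np.array(traj, dtype=float)  # copy, so the input path is left untouched
    for c, r in zip(centers, radii):
        diff = traj - c
        dist = np.linalg.norm(diff, axis=1)
        inside = dist < r
        # Unit direction from the obstacle center; fall back to +x at the exact center.
        unit = np.where(
            dist[:, None] > 0,
            diff / np.maximum(dist, 1e-12)[:, None],
            np.array([1.0, 0.0]),
        )
        traj[inside] = c + (r + margin) * unit[inside]
    return traj

# Straight-line path from (0, 0) to (4, 0) that cuts through one obstacle.
centers = [np.array([2.0, 0.1])]
radii = [0.5]
path = np.linspace([0.0, 0.0], [4.0, 0.0], num=9)
safe_path = push_out_of_discs(path, centers, radii)
```

Each per-waypoint move is the exact Euclidean projection onto the complement of a disc, which is a non-convex set. In a sampler, a correction of this kind would be applied to the intermediate trajectory at each reverse step rather than once at the end.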
Chemistry: Constrained Discrete Diffusion for Molecular Generation
Chemical design is fundamentally discrete. Generating molecules in SMILES format requires selecting tokens from a finite vocabulary while satisfying valency rules, structural filters, and molecular property targets — simultaneously.
Our Constrained Discrete Diffusion (CDD) framework (Cardei et al., 2025) adapts the CADM principle to discrete sequence spaces by defining projections over the probability simplex that minimize the KL divergence between original logits and feasibility-constrained logits. Key results:
- Zero constraint violations on toxicity mitigation across all threshold levels, versus 33.2%, 21.6%, and 13.1% for GPT-2 and 17–32% for MDLM/UDLM baselines.
- Zero violations on counting and lexical constraints, versus 54.5% and 97.5% for unconstrained MDLM.
- 392 novel non-toxic molecules in the AI4D3 molecular-generation benchmark, versus 108 for MDLM and 5 for autoregressive generation.
- Minimal computational overhead compared to PPLM (85× slowdown) and FUDGE (134–143× slowdown).
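The simplest instance of a KL projection over the probability simplex is worth seeing explicitly. The sketch below assumes a per-token constraint that forbids some vocabulary entries and assumes the direction \(\mathrm{KL}(q \,\|\, p)\); under those assumptions the minimizer is the original distribution restricted to the allowed tokens and renormalized. The sequence-level constraints handled by CDD (toxicity, counting, property targets) need more involved projections, and `kl_project_allowed` and the toy vocabulary are hypothetical.

```python
import numpy as np

def kl_project_allowed(probs, allowed_mask):
    """Return q minimizing KL(q || probs) subject to q[i] = 0 where allowed_mask[i] is False."""
    probs = np.asarray(probs, dtype=float)
    mask = np.asarray(allowed_mask, dtype=bool)
    masked = np.where(mask, probs, 0.0)
    total = masked.sum()
    if total <= 0.0:
        raise ValueError("no allowed token has positive probability")
    # Closed form: restrict to the allowed support, then renormalize.
    return masked / total

vocab = ["C", "N", "O", "Cl"]            # toy SMILES-like vocabulary
p = np.array([0.5, 0.2, 0.2, 0.1])       # model's token distribution at one position
allowed = np.array([True, True, True, False])  # forbid "Cl" at this position
q = kl_project_allowed(p, allowed)
```

The relative odds among allowed tokens are unchanged, which is why this projection perturbs the model's distribution as little as possible while guaranteeing that the forbidden token is never sampled.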
Extending to code and structured text: Shao et al. (2026) apply constrained discrete diffusion to code generation, enforcing syntactic and semantic constraints on program structure directly in the generation process.
The Frontier: Constraint-Aware Flow Matching
Training-free constraint enforcement introduces a fundamental mismatch: the model is trained on unconstrained trajectories but evaluated on constrained ones, inducing distributional shift that can degrade sample quality. Our most recent work eliminates this gap.
Constraint-Aware Flow Matching (CAFM) (Christopher et al., 2026) proposes an end-to-end framework that explicitly incorporates constraint projections into the training objective of flow-matching models. By aligning the model’s learned velocity field with the constrained sampling procedure, CAFM eliminates distributional shift at its root, achieving higher constraint satisfaction rates and better sample quality than training-free methods — with strict per-sample feasibility guarantees.
Search-Augmented Masked Diffusion (SearchDiff) (Ta et al., 2026) takes a complementary approach for discrete domains: augmenting masked diffusion with tree search to explore the token space and find constraint-satisfying sequences, achieving superior performance on constrained generation benchmarks.
Key Results
What constraint-aware generative modeling changes
| Domain | Method | Result to remember |
|---|---|---|
| Protein design | CADM + ADMM (ICLR 2026) | 100% constraint satisfaction vs. 0% for all baselines; 97.8% usable samples, about 4× the nearest baseline. |
| Materials science | Latent constrained diffusion (NeurIPS 2025) | 4.6× improvement in stress-strain MSE; 0% morphometric violations vs. 68.4% for conditional diffusion. |
| Multi-robot planning | SMD (ICML 2025) | 96% success on dense maps vs. 27% for the best baseline; 100% vs. 0% in corridor maps. |
| Scalable planning | DGD (AAAI 2026) | Scales to 100 robots, a 2.5× increase; 20× faster than SMD with 92%+ success across benchmarks. |
| Molecular generation | CDD (2025) | 0% toxicity and lexical violations; 392 novel non-toxic molecules vs. 108 for MDLM. |
| Flow matching | CAFM (2026) | Eliminates the training-sampling distribution shift and improves the quality-satisfaction trade-off. |
Beyond individual results, this body of work establishes a general principle: constraint satisfaction can be treated as a first-class citizen in generative modeling, achieved through the integration of differentiable optimization into the generative process — rather than bolted on as post-processing.
Relevant Citations
- Christopher, J. K., Seamann, A., Cui, J., Khare, S., Fioretto, F. (2026). Constrained Diffusion for Protein Design with Hard Structural Constraints. ICLR 2026. arXiv:2510.14989
- Zampini, S., Christopher, J. K., Oneto, L., Anguita, D., Fioretto, F. (2025). Training-Free Constrained Generation With Stable Diffusion Models. NeurIPS 2025 Spotlight. arXiv:2502.05625
- Cardei, M., Christopher, J. K., Hartvigsen, T., Kailkhura, B., Fioretto, F. (2025). Constrained Discrete Diffusion. NeurIPS 2025. arXiv:2503.09790
- Liang, J., Christopher, J. K., Koenig, S., Fioretto, F. (2025). Simultaneous Multi-Robot Motion Planning with Projected Diffusion Models. ICML 2025. arXiv:2502.03607
- Liang, J., Koenig, S., Fioretto, F. (2026). Discrete-Guided Diffusion for Scalable and Safe Multi-Robot Motion Planning. AAAI 2026. arXiv:2508.20095
- Christopher, J. K., Warner, J. E., Fioretto, F. (2026). Constraint-Aware Flow Matching: Decision Aligned End-to-End Training for Constrained Sampling. arXiv:2605.12754
- Ta, H. B., Cardei, M., Velasquez, A., Fioretto, F. (2026). Search-Augmented Masked Diffusion Models for Constrained Generation. arXiv:2602.02727
- Shao, L., Cardei, M., Xie, Z., Fioretto, F., Wang, W. (2026). Constrained Code Generation with Discrete Diffusion. arXiv:2605.16829