graph TD classDef review fill:#e8f0ff,stroke:#3b6edc classDef code fill:#e8ffe8,stroke:#2e8b57 classDef bench fill:#fff4e6,stroke:#d48806 classDef agent fill:#f6e8ff,stroke:#7a3ddc A[LLMs for Materials Science] A --> B[Reviews] A --> C[Code Generation] A --> D[Benchmarking] A --> E[Agentic Frameworks] B --> B1[Applications of Natural Language Processing and Large Language Models in Materials Discovery] B --> B2[Enabling Large Language Models for Real-World Materials Discovery] B --> B3[Foundation Models for Materials Discovery – Current State and Future Directions] C --> C1[Generative Retrieval-Augmented Ontologic Graph and Multiagent Strategies for Interpretive Large Language Model-Based Materials Design] C --> C2[Developing Large Language Models for Quantum Chemistry Simulation Input Generation] C --> C3[A Fine-Tuned Large Language Model Based Molecular Dynamics Agent for Code Generation to Obtain Material Thermodynamic Parameters] C --> C4[ChronoLLM: customizing language models for physics-based simulation code generation] D --> D1[MatTools: Benchmarking Large Language Models for Materials Science Tools] E --> E1[Agentic Framework for Programmatic Crystal Structure Generation Using a Fine-Tuned Worker–Supervisor Large Language Model] class B,B1,B2,B3 review class C,C1,C2,C3,C4 code class D,D1 bench class E,E1 agent click B1 "https://doi.org/10.1038/s41524-025-01554-0" click B2 "https://doi.org/10.1038/s42256-025-01058-y" click B3 "https://doi.org/10.1038/s41524-025-01538-0" click C1 "https://doi.org/10.1021/acsengineeringau.3c00058" click C2 "https://doi.org/10.1039/d4dd00366g" click C3 "https://doi.org/10.1038/s41598-025-92337-6" click C4 "https://doi.org/10.48550/arXiv.2505.10852" click D1 "https://doi.org/10.48550/arXiv.2505.10852" click E1 "https://doi.org/10.1016/j.egyai.2026.100710"
Literature Notes
This page summarizes papers on the use of large language models in materials science workflows, particularly for simulation code generation, retrieval-augmented generation, and agentic systems. The goal is to track emerging methods and implementation patterns relevant to computational materials research.
Literature Map
Thematic Areas
Review Articles: Survey papers describing the landscape of LLM applications in materials science and identifying challenges
LLMs for Code Generation: Work focusing on generating simulation scripts or computational workflows from natural language prompts.
Benchmarking: Evaluating LLM performance on domain-specific tasks.
Agentic Frameworks: Systems where multiple LLM agents collaborate to generate and validate outputs.
Notes
Generative Retrieval-Augmented Ontologic Graph and Multiagent Strategies for Interpretive Large Language Model-Based Materials Design
Explores how LLMs can support materials design workflows through retrieval and code generation.
Fine-tuned LLM specialized for mechanics of materials
Retrieval-augmented generation
Multi-agent reasoning architecture
Library used: pyscf (Table 10)
Example tasks include calculation of energy and plotting graph
Demonstrates working examples of code generation and RAG queries
Queries outside the training domain produce hallucinations
Generated code often required manual correction and iterative prompting
Notable figure - Table 10
Developing Large Language Models for Quantum Chemistry Simulation Input Generation
DOI: 10.1039/d4dd00366g
Repo: https://git.lwp.rug.nl/pollice-research-group/ORCAInputFileSynthesis
Investigates whether LLMs can generate ORCA quantum chemistry input scripts.
Foundational LLMs (e.g., GPT-3.5 Turbo) used.
Three approaches evaluated: Prompt Engineering, Retrieval-Augmented Generation (RAG), Fine-tuning using synthetic data.
Vector database for RAG implemented with FAIS
Fine-tuning produced the best performance
Even relatively small datasets significantly improved accuracy
Prompt engineering alone has limited reliability
RAG improves grounding but still produces structural errors
Figure 2: summary of experimental flow.
Applications of Natural Language Processing and Large Language Models in Materials Discovery
- DOI: 10.1038/s41524-025-01554-0
- Review Paper
- Literature survey across NLP and machine learning applications in materials discovery.
MatTools: Benchmarking Large Language Models for Materials Science Tools
Develops a benchmark framework for evaluating LLMs on materials science tool usage.
Evaluates different interaction paradigms: Pure LLM, Retrieval-augmented generation, Agentic workflows
Models tested include: Closed-source LLMs, Open-source models (notably Qwen series), Fine-tuned variants
Example library used: pymatge
Agentic approaches often improve reliability
The paper identifies several categories of errors in LLM-generated simulation code, including: syntax errors, incorrect parameterization, invalid workflow logic
Figure 2 and Figure 3.
Enabling Large Language Models for Real-World Materials Discovery
Review paper.
Examines challenges that prevent LLMs from being reliably used in real materials discovery workflows.
Provides numerous examples where LLMs fail on domain-specific scientific tasks
Figure 1 – overview of LLM integration in materials science
Table 2 – possible LLM tasks in the domain (including simulation code generation)
Foundation Models for Materials Discovery – Current State and Future Directions
- DOI: 10.1038/s41524-025-01538-0
- Review paper.
A Fine-Tuned Large Language Model Based Molecular Dynamics Agent for Code Generation to Obtain Material Thermodynamic Parameters
Develops an LLM-based system for generating molecular dynamics simulation code.
Fine-tuned LLMs used
Designed for generating simulation workflows
Focuses on extracting material thermodynamic parameters from simulations
Shows improved code generation performance using fine-tuned models compared to base models.
Reliability still depends on prompt structure and domain constraints.
Figure 2 – architecture diagram
Agentic Framework for Programmatic Crystal Structure Generation Using a Fine-Tuned Worker–Supervisor Large Language Model
Explores whether agentic LLM architectures can improve reliability in materials simulation code generation.
Two-model agent system
- Worker model: generates pymatgen code for crystal structures
- Supervisor model: validates generated structures
The approach still relies on domain-specific validation tools.
ChronoLLM: customizing language models for physics-based simulation code generation
PyChrono, an open source multi-physics dynamics engine for multibody systems, used as simulation tool.
both closed-source and open-source LLMs customized to generate pychrono simulation scripts
the authors report that the generated scripts were rarely correct