Literature Notes

This page summarizes papers on the use of large language models in materials science workflows, particularly for simulation code generation, retrieval-augmented generation, and agentic systems. The goal is to track emerging methods and implementation patterns relevant to computational materials research.


Literature Map

graph TD

classDef review fill:#e8f0ff,stroke:#3b6edc
classDef code fill:#e8ffe8,stroke:#2e8b57
classDef bench fill:#fff4e6,stroke:#d48806
classDef agent fill:#f6e8ff,stroke:#7a3ddc

A[LLMs for Materials Science]

A --> B[Reviews]
A --> C[Code Generation]
A --> D[Benchmarking]
A --> E[Agentic Frameworks]

B --> B1[Applications of Natural Language Processing and Large Language Models in Materials Discovery]
B --> B2[Enabling Large Language Models for Real-World Materials Discovery]
B --> B3[Foundation Models for Materials Discovery – Current State and Future Directions]

C --> C1[Generative Retrieval-Augmented Ontologic Graph and Multiagent Strategies for Interpretive Large Language Model-Based Materials Design]
C --> C2[Developing Large Language Models for Quantum Chemistry Simulation Input Generation]
C --> C3[A Fine-Tuned Large Language Model Based Molecular Dynamics Agent for Code Generation to Obtain Material Thermodynamic Parameters]
C --> C4[ChronoLLM: customizing language models for physics-based simulation code generation]

D --> D1[MatTools: Benchmarking Large Language Models for Materials Science Tools]

E --> E1[Agentic Framework for Programmatic Crystal Structure Generation Using a Fine-Tuned Worker–Supervisor Large Language Model]

class B,B1,B2,B3 review
class C,C1,C2,C3,C4 code
class D,D1 bench
class E,E1 agent

click B1 "https://doi.org/10.1038/s41524-025-01554-0"
click B2 "https://doi.org/10.1038/s42256-025-01058-y"
click B3 "https://doi.org/10.1038/s41524-025-01538-0"
click C1 "https://doi.org/10.1021/acsengineeringau.3c00058"
click C2 "https://doi.org/10.1039/d4dd00366g"
click C3 "https://doi.org/10.1038/s41598-025-92337-6"
click C4 "https://doi.org/10.1007/s11044-026-10152-x"
click D1 "https://doi.org/10.48550/arXiv.2505.10852"
click E1 "https://doi.org/10.1016/j.egyai.2026.100710"


Thematic Areas

  • Review Articles: Survey papers describing the landscape of LLM applications in materials science and identifying challenges.

  • LLMs for Code Generation: Work focusing on generating simulation scripts or computational workflows from natural language prompts.

  • Benchmarking: Evaluating LLM performance on domain-specific tasks.

  • Agentic Frameworks: Systems where multiple LLM agents collaborate to generate and validate outputs.


Notes

Generative Retrieval-Augmented Ontologic Graph and Multiagent Strategies for Interpretive Large Language Model-Based Materials Design

  • DOI: 10.1021/acsengineeringau.3c00058

  • Repo: https://github.com/lamm-mit/MeLM

  • Explores how LLMs can support materials design workflows through retrieval and code generation.

  • Fine-tuned LLM specialized for mechanics of materials

  • Retrieval-augmented generation

  • Multi-agent reasoning architecture

  • Library used: pyscf (Table 10)

  • Example tasks include energy calculations and plotting results

  • Demonstrates working examples of code generation and RAG queries

  • Queries outside the training domain produce hallucinations

  • Generated code often required manual correction and iterative prompting

  • Notable table: Table 10
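The ontologic-graph retrieval idea can be sketched in miniature: collect the concepts connected to a query term and use them to ground a prompt. This is a toy sketch, not the paper's implementation; the graph contents (spider-silk concepts) and function names are invented for illustration.

```python
# Toy illustration of ontology-graph retrieval: walk a small concept
# graph and collect facts reachable from the query term, which would
# then be injected into an LLM prompt as grounding context.
# The graph edges below are invented for illustration.

GRAPH = {  # hypothetical edges: concept -> related concepts
    "spider silk": ["beta-sheet crystals", "high toughness"],
    "beta-sheet crystals": ["hydrogen bonding"],
    "high toughness": [],
    "hydrogen bonding": [],
}

def neighborhood(term: str, depth: int = 2) -> set[str]:
    """Collect concepts reachable from `term` within `depth` hops."""
    seen, frontier = {term}, [term]
    for _ in range(depth):
        frontier = [n for node in frontier for n in GRAPH.get(node, ())
                    if n not in seen]
        seen.update(frontier)
    return seen

context = neighborhood("spider silk")
```

With `depth=2` the walk reaches second-order concepts ("hydrogen bonding"), illustrating how graph retrieval can surface facts that plain keyword matching on the query would miss.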

Developing Large Language Models for Quantum Chemistry Simulation Input Generation

  • DOI: 10.1039/d4dd00366g

  • Repo: https://git.lwp.rug.nl/pollice-research-group/ORCAInputFileSynthesis

  • Investigates whether LLMs can generate ORCA quantum chemistry input scripts.

  • Foundational LLMs (e.g., GPT-3.5 Turbo) used.

  • Three approaches evaluated: Prompt Engineering, Retrieval-Augmented Generation (RAG), Fine-tuning using synthetic data.

  • Vector database for RAG implemented with FAISS

  • Fine-tuning produced the best performance

  • Even relatively small datasets significantly improved accuracy

  • Prompt engineering alone has limited reliability

  • RAG improves grounding but still produces structural errors

  • Figure 2: summary of experimental flow.
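The RAG step can be sketched as follows: retrieve documentation snippets relevant to a request, then assemble a grounded prompt for the input-generating model. This is a minimal illustration, not the paper's pipeline: a real system would embed snippets into a FAISS index, whereas here token overlap stands in for vector similarity, and the snippet texts are invented.

```python
# Minimal sketch of retrieval-augmented prompting for ORCA input
# generation. Token overlap stands in for FAISS vector similarity,
# and the documentation snippets are invented for illustration.

DOC_SNIPPETS = [  # hypothetical documentation fragments
    "B3LYP is a hybrid DFT functional; request it on the '!' keyword line.",
    "def2-SVP is a split-valence basis set suitable for quick DFT runs.",
    "Opt requests a geometry optimization in ORCA.",
]

def score(query: str, snippet: str) -> int:
    """Count shared lowercase tokens between query and snippet."""
    return len(set(query.lower().split()) & set(snippet.lower().split()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k snippets most similar to the query."""
    return sorted(DOC_SNIPPETS, key=lambda s: score(query, s), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Assemble a grounded prompt for the input-generating LLM."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nTask: write an ORCA input for: {query}"

prompt = build_prompt("geometry optimization with B3LYP and def2-SVP")
```

The retrieved context is what distinguishes this from the plain prompt-engineering baseline; the paper's finding that RAG still leaves structural errors suggests the model, not the context, is the bottleneck.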

Applications of Natural Language Processing and Large Language Models in Materials Discovery

  • DOI: 10.1038/s41524-025-01554-0

  • Review paper.

  • Literature survey across NLP and machine learning applications in materials discovery.

MatTools: Benchmarking Large Language Models for Materials Science Tools

  • DOI: 10.48550/arXiv.2505.10852

  • Repo: https://github.com/Grenzlinie/MatTools

  • Develops a benchmark framework for evaluating LLMs on materials science tool usage.

  • Evaluates different interaction paradigms: Pure LLM, Retrieval-augmented generation, Agentic workflows

  • Models tested include: Closed-source LLMs, Open-source models (notably Qwen series), Fine-tuned variants

  • Example library used: pymatgen

  • Agentic approaches often improve reliability

  • The paper identifies several categories of errors in LLM-generated simulation code, including: syntax errors, incorrect parameterization, invalid workflow logic

  • Figure 2 and Figure 3.
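The error taxonomy above suggests a simple automated check. The sketch below is not from the MatTools codebase: it classifies a generated snippet as a syntax error, a runtime error, or a success using Python's own compile/exec machinery; detecting incorrect parameterization or invalid workflow logic would additionally require domain-specific checks.

```python
# Toy harness in the spirit of the benchmark's error taxonomy:
# classify an LLM-generated snippet by how far it gets when run.
# Example snippets are illustrative, not benchmark items.

def classify(snippet: str) -> str:
    """Return 'syntax_error', 'runtime_error', or 'ok' for a snippet."""
    try:
        code = compile(snippet, "<generated>", "exec")
    except SyntaxError:
        return "syntax_error"
    try:
        exec(code, {})  # run in an isolated, empty namespace
    except Exception:
        return "runtime_error"
    return "ok"

print(classify("def f(:"))             # malformed definition
print(classify("x = undefined_name"))  # NameError at runtime
print(classify("x = 1 + 1"))
```

Running untrusted generated code like this needs sandboxing in practice; the point here is only that two of the three error categories are mechanically detectable.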

Enabling Large Language Models for Real-World Materials Discovery

  • DOI: 10.1038/s42256-025-01058-y

  • Review paper.

  • Examines challenges that prevent LLMs from being reliably used in real materials discovery workflows.

  • Provides numerous examples where LLMs fail on domain-specific scientific tasks

  • Figure 1 – overview of LLM integration in materials science

  • Table 2 – possible LLM tasks in the domain (including simulation code generation)

Foundation Models for Materials Discovery – Current State and Future Directions

  • DOI: 10.1038/s41524-025-01538-0

  • Review paper.

A Fine-Tuned Large Language Model Based Molecular Dynamics Agent for Code Generation to Obtain Material Thermodynamic Parameters

  • DOI: 10.1038/s41598-025-92337-6

  • Repo: https://github.com/FredericVAN/PKU_MDAgent

  • Develops an LLM-based system for generating molecular dynamics simulation code.

  • Fine-tuned LLMs used

  • Designed for generating simulation workflows

  • Focuses on extracting material thermodynamic parameters from simulations

  • Shows improved code generation performance using fine-tuned models compared to base models.

  • Reliability still depends on prompt structure and domain constraints.

  • Figure 2 – architecture diagram
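The parameter-extraction step can be illustrated with a toy post-processing script that averages a column from an MD log. The log format and function name here are invented for illustration; real engines (e.g., LAMMPS) use different log layouts.

```python
# Sketch of the post-processing step: extract a thermodynamic parameter
# (here, a mean temperature) from molecular-dynamics log output.
# The log format below is invented for illustration.

import statistics

LOG = """\
step 0 temp 298.1 pe -1250.3
step 100 temp 301.4 pe -1251.0
step 200 temp 299.7 pe -1250.6
"""

def mean_column(log: str, key: str) -> float:
    """Average the value that follows `key` on each log line."""
    values = []
    for line in log.splitlines():
        tokens = line.split()
        if key in tokens:
            values.append(float(tokens[tokens.index(key) + 1]))
    return statistics.mean(values)

avg_temp = mean_column(LOG, "temp")
```

In the paper's setting, code like this is itself the LLM's output; getting the column names and units right for a specific engine is exactly where domain constraints in the prompt matter.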

Agentic Framework for Programmatic Crystal Structure Generation Using a Fine-Tuned Worker–Supervisor Large Language Model

  • DOI: 10.1016/j.egyai.2026.100710

  • Repo: https://github.com/ViktoriiaBaib/T2S_LLM

  • Explores whether agentic LLM architectures can improve reliability in materials simulation code generation.

  • Two-model agent system:

    • Worker model: generates pymatgen code for crystal structures

    • Supervisor model: validates generated structures

  • The approach still relies on domain-specific validation tools.

ChronoLLM: customizing language models for physics-based simulation code generation

  • DOI: 10.1007/s11044-026-10152-x

  • PyChrono, an open-source multi-physics dynamics engine for multibody systems, is used as the simulation tool.

  • Both closed-source and open-source LLMs are customized to generate PyChrono simulation scripts.

  • The authors report that the generated scripts were rarely correct.