Current Benchmark Coverage

73 Test Cases
6 Data Domains
3 Institutions

Data Categories

  • 🔬 Simulation Data (20 cases): CFD, physics simulations, FEM
  • 📦 Object Scans & Industrial CT (19 cases): 3D scanning, industrial imaging
  • 🏥 Medical Imaging (11 cases): CT, MRI, medical visualization
  • 📊 Geometry & Synthetic (10 cases): charts, I/O tests, shapes
  • 🧬 Molecular & Biological (11 cases): molecular dynamics, microscopy
  • 🌍 Climate & Geospatial (4 cases): climate science, geospatial data

Contributing Institutions

  • Argonne National Laboratory (ANL): 20 cases
  • Lawrence Livermore National Laboratory (LLNL): 42 cases
  • University of Notre Dame: 11 cases

Pending Contributions

Track new submissions awaiting review and incorporation into the benchmark. These statistics reflect contributions not yet included in the official benchmark above.

0 Datasets
0 Contributors
0 Test Cases

Pending Contributions Breakdown

Application Domains

Attribute Types

Contributors

Contributor | Institution | # of Questions | Subjects
No contributions yet. Be the first to contribute!

Submit Dataset

Help build a comprehensive benchmark for scientific visualization agents. Submit your dataset along with task descriptions and evaluation criteria.

📁 About File Uploads

Files are uploaded to Firebase Cloud Storage. All submissions are stored securely and will be used for the SciVisAgentBench benchmark.

  • Maximum data size: under 5GB per dataset
  • Ground truth images: PNG, JPG, TIFF, etc. (1024x1024 pixels or larger recommended)
  • Supported source data formats: VTK, NIfTI, RAW, NRRD, HDF5, etc. (a loading sketch follows below)
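
As a purely illustrative sketch (not part of the official submission tooling), here is how an agent or evaluator might load two of the supported source data formats in Python; the file names are hypothetical placeholders.

```python
# Illustrative sketch only: loading two of the supported source data formats.
# File names below are hypothetical placeholders, not benchmark files.
import vtk              # pip install vtk
import nibabel as nib   # pip install nibabel

# Structured VTK image data (.vti)
reader = vtk.vtkXMLImageDataReader()
reader.SetFileName("simulation_volume.vti")  # hypothetical path
reader.Update()
image_data = reader.GetOutput()
print("VTK dimensions:", image_data.GetDimensions())

# NIfTI medical volume (.nii or .nii.gz)
nifti = nib.load("ct_scan.nii.gz")  # hypothetical path
volume = nifti.get_fdata()
print("NIfTI shape:", volume.shape)
```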

Contributor Information

Dataset Information

Application Domain (Data Source)

Attribute Types *

What information does the data represent?

Task Description for LLM Agent *

File Uploads *

  • Any format accepted: VTK, NIfTI, RAW, NRRD, HDF5, etc. Multiple files allowed (max size: 5GB recommended per file).
  • Optional: any format accepted (e.g., a ParaView state file, or state files of other visualization engines). Multiple files allowed.
  • Optional: any format (JSON, YAML, TXT, etc.). Multiple files allowed.

Outcome-Based Evaluation Metrics *

  • Any format accepted: PNG, JPG, TIFF, etc. Upload multiple views of the expected visualization (an illustrative comparison sketch follows below).
  • Optional: any format accepted (e.g., Python, ParaView, Jupyter Notebook, MATLAB, R, or other visualization code). Multiple files allowed.
  • Optional: enter the correct answers to any questions in the task description.
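
To make the outcome-based idea concrete, below is a minimal sketch of one common quantitative check: comparing an agent's rendered image against a contributed ground-truth view with SSIM. This is not the benchmark's official scoring code, and the file names are hypothetical.

```python
# Minimal sketch of an outcome-based check: compare an agent's rendering
# against a ground-truth view using SSIM. Not the official benchmark scorer;
# file names are hypothetical placeholders.
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity as ssim

def load_gray(path, size=(1024, 1024)):
    """Load an image, convert to grayscale, and resize for comparison."""
    return np.asarray(Image.open(path).convert("L").resize(size), dtype=np.float64)

ground_truth = load_gray("ground_truth_view1.png")  # contributed reference image
agent_output = load_gray("agent_rendering.png")     # image produced by the agent

score = ssim(ground_truth, agent_output, data_range=255.0)
print(f"SSIM vs. ground truth: {score:.3f}")  # 1.0 means identical images
```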

Additional Information

About SciVisAgentBench

What is SciVisAgentBench?

SciVisAgentBench is a comprehensive evaluation framework for scientific data analysis and visualization agents. We aim to transform SciVis agents from experimental tools into reliable scientific instruments through systematic evaluation.

Taxonomy of SciVis agent evaluation

The taxonomy is organized into two perspectives: outcome-based evaluation, which assesses the relationship between input specifications and final outputs while treating the agent as a black box, and process-based evaluation, which analyzes the agent's action path, decision rationale, and intermediate behaviors.
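
As a non-authoritative sketch of how this two-part taxonomy might be recorded for a single benchmark case (the field names are assumptions, not the benchmark's schema):

```python
# Illustrative sketch of recording both evaluation perspectives for one case.
# Field names are assumptions, not the official benchmark schema.
from dataclasses import dataclass, field

@dataclass
class OutcomeEvaluation:
    """Black-box view: compare final outputs against the task specification."""
    image_similarity: float        # e.g., SSIM against a ground-truth view
    answer_correct: bool           # for question-style tasks

@dataclass
class ProcessEvaluation:
    """White-box view: inspect the agent's path to the result."""
    actions_taken: list = field(default_factory=list)  # ordered tool/API calls
    decision_rationale: str = ""   # agent's stated reasoning, if logged
    num_retries: int = 0           # intermediate-behavior statistics

@dataclass
class CaseResult:
    case_id: str
    outcome: OutcomeEvaluation
    process: ProcessEvaluation
```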

Why Contribute?

  • Help establish standardized evaluation metrics for visualization agents
  • Drive innovation in autonomous scientific visualization
  • Contribute to open science and reproducible research
  • Be recognized as a contributor to this community effort

Evaluation Taxonomy

Our benchmark evaluates agents across multiple dimensions including outcome quality, process efficiency, and task complexity. We combine LLM-as-a-judge with quantitative metrics for robust assessment.
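
As a hedged illustration of combining an LLM-as-a-judge rating with a quantitative metric (the weights and score scales below are assumptions, not the benchmark's published formula):

```python
# Illustrative sketch of blending an LLM-judge rating with a quantitative
# metric into one score. Weights and scales are assumptions, not the
# benchmark's published formula.
def combined_score(llm_judge_rating: float, ssim: float,
                   judge_weight: float = 0.5) -> float:
    """Blend an LLM judge rating (0-10 scale) with an SSIM value (0-1 scale)."""
    judge_normalized = llm_judge_rating / 10.0
    return judge_weight * judge_normalized + (1.0 - judge_weight) * ssim

print(combined_score(llm_judge_rating=8.0, ssim=0.92))  # -> 0.86
```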

See our GitHub repository for evaluation examples and deployment guides.

Team

The core team of this project is from the University of Notre Dame, Lawrence Livermore National Laboratory, and Vanderbilt University. Main contributors include Kuangshi Ai (kai@nd.edu), Shusen Liu (liu42@llnl.gov), and Haichao Miao (miao1@llnl.gov).