Domains
The public website currently exposes a focused subset of benchmark tasks rather than the entire internal task tree. This page summarizes which scientific domains that public subset covers.
Public Domain Snapshot
The breakdown below is rendered from generated website data.
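As a rough illustration of how a per-domain breakdown could be tallied from generated data, the sketch below counts public tasks by domain. The record layout (`id`, `domain`, `public`), the task ids, and the helper `public_domain_counts` are all hypothetical; the actual website data format is not described on this page.

```python
from collections import Counter

# Hypothetical task records. The real generated website data is not shown
# on this page, so these field names and values are assumptions.
tasks = [
    {"id": "task-a", "domain": "numerical PDE solving", "public": True},
    {"id": "task-b", "domain": "computational biophysics", "public": True},
    {"id": "task-c", "domain": "statistical physics", "public": False},
    {"id": "task-d", "domain": "numerical PDE solving", "public": True},
]


def public_domain_counts(records):
    """Count tasks per domain, skipping records not marked public."""
    return Counter(r["domain"] for r in records if r["public"])


counts = public_domain_counts(tasks)

# Print domains from most to least represented.
for domain, n in counts.most_common():
    print(f"{domain}: {n}")
```

Filtering on a `public` flag before counting mirrors the idea that the snapshot describes only the public subset, not the full internal task tree.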
Why Domain Coverage Matters
ASI-Bench is intended to evaluate scientific workflows rather than one narrow task family. Domain coverage matters because strong agent performance in one area, such as numerical PDE solving, does not automatically transfer to other areas such as computational biophysics or statistical physics.
Even in the current public slice, tasks already differ in:
- scientific framing
- data format and expected outputs
- runtime and package requirements
- evaluation methodology
- how much domain knowledge is needed to make progress
Current Public Policy
The domain summary shown here reflects the public test subset only. It is therefore intentionally narrower than the full internal repository.
As the benchmark matures, this page can evolve into:
- a richer domain map
- a domain comparison chart
- a landing page for domain-specific task browsing