Submit Results

Submit your benchmark results for inclusion on the official leaderboard. This section explains the submission workflow, required artifacts, and review process.

Submission Workflow

1. Run the benchmark    →  ai4sci-bench run / batch-run
2. Package results      →  Collect output directory
3. Submit for review    →  Open a GitHub issue with result metadata
4. Maintainer review    →  Provenance and reproducibility check
5. Leaderboard entry    →  Results appear on the official leaderboard

Requirements Summary

Requirement        Details
Benchmark version  Must target a published task set release
Sandbox mode       --sandbox task or --sandbox os required
Seed               Fixed seed (default: 42) for reproducibility
Prompt levels      At least B1, B2, B3, B4 on all submitted tasks
Artifacts          Complete output directory including run_metadata.json
Provenance         Agent version, model ID, configuration, and CLI version logged
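
A quick pre-flight check can catch missing artifacts before you open an issue. The sketch below is illustrative only: the schema of run_metadata.json is not documented in this section, so the key names in REQUIRED_KEYS (agent_version, model_id, cli_version, sandbox, seed) and the function name check_submission are assumptions and should be adjusted to match what your run actually writes.

```python
import json
from pathlib import Path

# Hypothetical key names: the real run_metadata.json schema may differ.
REQUIRED_KEYS = ("agent_version", "model_id", "cli_version", "sandbox", "seed")
# Sandbox modes accepted for leaderboard submissions, per the table above.
ALLOWED_SANDBOX = {"task", "os"}


def check_submission(output_dir):
    """Return a list of problems found in an output directory (empty if none)."""
    meta_path = Path(output_dir) / "run_metadata.json"
    if not meta_path.is_file():
        return ["missing run_metadata.json"]

    try:
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"run_metadata.json is not valid JSON: {exc}"]
    if not isinstance(meta, dict):
        return ["run_metadata.json must contain a JSON object"]

    # Report every assumed provenance key that is absent.
    problems = [f"missing metadata key: {key}" for key in REQUIRED_KEYS if key not in meta]

    # A recorded sandbox mode must be one of the accepted values.
    sandbox = meta.get("sandbox")
    if sandbox is not None and sandbox not in ALLOWED_SANDBOX:
        problems.append(f"sandbox mode {sandbox!r} is not accepted; use 'task' or 'os'")
    return problems
```

Calling check_submission on your output directory and printing each returned string gives a short checklist to fix before submitting. It does not replace maintainer review.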

Current Status

Early Access

The submission process is currently manual via GitHub issues. Automated submission infrastructure (Hugging Face Space) is planned for a future phase.

How to Submit

  1. Complete your benchmark runs with --sandbox task or --sandbox os
  2. Open a GitHub issue with:
    • Agent name and version
    • Model name and configuration
    • Sandbox mode used
    • A link to, or an upload of, run_metadata.json and batch_records/
  3. A maintainer will review provenance and may request additional logs
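
The issue contents listed above can be assembled into a consistent text body before pasting into GitHub. This is a minimal sketch, not an official template: the function name build_issue_body, its parameters, and the field labels are hypothetical, and no required issue format is specified in this section.

```python
def build_issue_body(agent_name, agent_version, model_name, configuration,
                     sandbox, artifacts_location):
    """Compose a plain-text issue body from the items requested for a submission.

    All parameter names and labels here are illustrative assumptions.
    """
    # Only the two sandbox modes named in this section are accepted.
    if sandbox not in ("task", "os"):
        raise ValueError("sandbox must be 'task' or 'os'")

    lines = [
        f"Agent: {agent_name} {agent_version}",
        f"Model: {model_name}",
        f"Configuration: {configuration}",
        f"Sandbox mode: --sandbox {sandbox}",
        f"Artifacts (run_metadata.json and batch_records/): {artifacts_location}",
    ]
    return "\n".join(lines)
```

Rejecting unsupported sandbox values early mirrors the stated requirement and avoids opening an issue that a maintainer would have to send back.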

Next Steps