AIMAC is a tool for evaluating the accessibility of web pages generated by large language models. It:
- Generates HTML using various LLMs via the OpenRouter API.
- Captures screenshots of the rendered pages using Playwright (parallel by default).
- Runs accessibility checks using Axe-core.
- Scores and compares model performance based on Serious and Critical axe-core violations, with dampening to prevent single rules from dominating (a sketch of the idea follows this list; see ARCHITECTURE.md for the full scoring methodology).
- Provides detailed reports on accessibility violations and compliance.
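A minimal sketch of the dampening idea, using assumed severity weights and square-root compression of per-rule counts (both illustrative; ARCHITECTURE.md defines the actual formula):

```python
# Illustrative sketch of dampened scoring, not AIMAC's actual formula:
# weight Critical above Serious and compress per-rule counts so a single
# rule that fires many times cannot dominate the score. Lower is better.
import math
from collections import Counter

WEIGHTS = {"critical": 3.0, "serious": 1.0}  # assumed weights

def score(violations: list[dict]) -> float:
    """violations: [{"rule": "image-alt", "impact": "critical"}, ...]"""
    per_rule = Counter((v["rule"], v["impact"]) for v in violations
                       if v["impact"] in WEIGHTS)
    # sqrt dampening: 9 hits of one rule count as 3, not 9
    return sum(WEIGHTS[impact] * math.sqrt(count)
               for (_, impact), count in per_rule.items())
```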
View the latest benchmark results at aimac.ai.
Requires Python 3.12+ and uv.
# Clone the repository
git clone https://github.com/GAAD-Foundation/AIMAC.git
cd AIMAC
# Install dependencies (including dev tools for testing)
uv sync --all-extras
# Install Playwright browser (required for screenshots)
uv run playwright install chromium
# Verify installation
uv run aimac --help
Create a .env file with your OpenRouter API key:
cp .env.example .env
# Edit .env and add your OPENROUTER_API_KEY
Run commands via uv run:
uv run aimac init
uv run aimac i # Short alias
Creates the database by applying the SQL from data/schema/*.sql. The database location is configured via AIMAC_DATABASE_PATH (defaults to ./data/aimac.db).
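For orientation, a minimal sketch of what init amounts to, assuming SQLite via Python's sqlite3 and schema files applied in sorted order (the function name and details are illustrative, not AIMAC's actual code):

```python
# Sketch of `aimac init`: apply every SQL file in data/schema/ to the
# SQLite database. apply_schema and the sorted-order convention are
# illustrative assumptions; the real implementation may differ.
import os
import sqlite3
from pathlib import Path

def apply_schema(db_path: str, schema_dir: str = "data/schema") -> None:
    conn = sqlite3.connect(db_path)
    try:
        # Apply files in sorted order so schema setup runs deterministically.
        for sql_file in sorted(Path(schema_dir).glob("*.sql")):
            conn.executescript(sql_file.read_text())
        conn.commit()
    finally:
        conn.close()

apply_schema(os.environ.get("AIMAC_DATABASE_PATH", "./data/aimac.db"))
```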
uv run aimac collect
uv run aimac c # Short alias
# Force a fresh run (bypass all caches)
uv run aimac collect --refresh
# Test specific models only
uv run aimac collect --models anthropic/claude-sonnet-4,openai/gpt-4o
Results are cached automatically; use --refresh to bypass the caches. See ARCHITECTURE.md for caching details.
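One plausible shape for such a cache, keyed on model and prompt (illustrative only; AIMAC's actual cache design is described in ARCHITECTURE.md):

```python
# Illustrative only: a response cache keyed on model + prompt, with a
# refresh flag that mirrors --refresh. Not AIMAC's real implementation.
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path(".cache")  # hypothetical location

def cache_key(model_id: str, prompt: str) -> str:
    payload = json.dumps({"model": model_id, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def get_or_fetch(model_id: str, prompt: str, fetch, refresh: bool = False) -> str:
    path = CACHE_DIR / f"{cache_key(model_id, prompt)}.html"
    if path.exists() and not refresh:      # cache hit unless --refresh
        return path.read_text()
    html = fetch(model_id, prompt)         # e.g. an OpenRouter API call
    CACHE_DIR.mkdir(exist_ok=True)
    path.write_text(html)
    return html
```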
Fetches the top programming models from the OpenRouter API and:
- Saves a snapshot to data/snapshots/models/YYYY-MM-DD_HH-MM.json for inspection
- Upserts models to the database (preserves any manual overrides)
- Executes pending requests asynchronously with retry logic
- Writes artifacts (HTML, JSON) to the output/ directory
- Creates a leaderboard and per-model summaries with rankings based on median accessibility score (lower is better). Ties are broken by: (1) mean score, (2) total violations, (3) cost. Models with identical values across all metrics receive the same rank.
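The documented ranking rules translate to a sort key like the following sketch (field names are illustrative, and competition-style ranking, where tied models share a rank, is an assumption):

```python
# Sketch of the documented ranking rules: sort by median score (lower is
# better), break ties by mean, then total violations, then cost; models
# identical on all four metrics share a rank.
def rank_models(models: list[dict]) -> list[tuple[int, dict]]:
    key = lambda m: (m["median"], m["mean"], m["total_violations"], m["cost"])
    ranked, prev_key, rank = [], None, 0
    for i, model in enumerate(sorted(models, key=key), start=1):
        if key(model) != prev_key:   # new key -> new rank (1, 2, 2, 4, ...)
            rank, prev_key = i, key(model)
        ranked.append((rank, model))
    return ranked
```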
Note: Requires OPENROUTER_API_KEY to be set in your .env file.
After running aimac collect, view the results interactively:
# View leaderboard (all models ranked)
uv run aimac report
uv run aimac r # Short alias
# View specific model details
uv run aimac r model 1 # Top-ranked model
uv run aimac r model claude # Fuzzy search by name
uv run aimac r model anthropic/claude-3.5-sonnet # Exact model ID
# Compare models within a category
uv run aimac r category shopping # Fuzzy search
uv run aimac r category 5 # By category number
Reports are designed to work well with assistive technology:
- TSV default - Tab-separated output works with screen readers, grep, cut, and Unix pipes
- No ANSI colors - Plain text ensures compatibility with all terminals and assistive technology
- No Unicode decorations - Avoids characters that may not render or announce correctly
Default output is TSV (tab-separated values) - optimized for screen readers and Unix tools:
# Default TSV output
uv run aimac r
# Pretty (padded columns for terminals)
uv run aimac r --format pretty
# JSON (for scripts, web)
uv run aimac r --format json
# Note: Leaderboard JSON includes reliability fields `stddev` (Consistency) and `p90` per model.
Set the default format via an environment variable:
# In .env file
AIMAC_REPORT_FORMAT=tsv|pretty|json
# Or export in shell
export AIMAC_REPORT_FORMAT=json
Add the -v flag to see additional columns (company names, reliability metrics):
# Leaderboard with total violations, Consistency (StdDev), and P90
uv run aimac r -v
# Model view with Critical/Serious breakdown
uv run aimac r model 1 -v
# Category view with Critical/Serious breakdown
uv run aimac r category shopping -v
TSV output works seamlessly with Unix tools:
# Sort by cost (4th column)
uv run aimac r | sort -t$'\t' -k4,4n
# Top 5 models
uv run aimac r | head -n 6 # +1 for header
# Filter by company (requires -v for Company column)
uv run aimac r -v | grep "Anthropic"
# Save to file
uv run aimac r --format json > leaderboard.json
Each report includes contextual "Run next:" suggestions to help you explore:
# Leaderboard shows:
Run next: "aimac report model 1"
# Model view shows:
Run next: "aimac report model 2"
# Category view shows:
Run next: "aimac report model <top-performer-id>"
Basic workflow:
# 1. View leaderboard
uv run aimac r
# 2. Examine top model
uv run aimac r model 1
# 3. Compare models in a specific category
uv run aimac r category shopping
# 4. Get detailed severity breakdown
uv run aimac r category shopping -v
Export for sharing:
# Generate Pretty report for sharing in terminals
uv run aimac r --format pretty > reports/leaderboard.txt
# Export all data as JSON
uv run aimac r --format json > reports/leaderboard.json
uv run aimac r model 1 --format json > reports/top_model.json
Scripting with JSON:
# Extract top model ID
TOP_MODEL=$(uv run aimac r --format json | jq -r '.rows[0].model_id')
# Count models with score < 20
uv run aimac r --format json | jq '[.rows[] | select(.median < 20)] | length'
# Main help (shows all commands)
uv run aimac -h
# CLI reporting flags and options
uv run aimac report -h
# CLI reporting examples and workflows
uv run aimac r help
Screenshots run in parallel by default using Playwright, with the worker count auto-tuned to your CPU. Progress prints as "Screenshots progress: N/Total" during execution.
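For orientation, an illustrative sketch of parallel capture with Playwright's async API; the half-the-CPUs worker count here is an assumption, and AIMAC's actual worker formula is documented in ARCHITECTURE.md:

```python
# Illustrative sketch of parallel screenshot capture with Playwright.
# Worker count (half the CPUs) is an assumption, not AIMAC's formula.
import asyncio
import os
from pathlib import Path
from playwright.async_api import async_playwright

async def capture_all(html_files: list[Path], workers: int | None = None) -> None:
    workers = workers or max(1, (os.cpu_count() or 2) // 2)
    semaphore = asyncio.Semaphore(workers)
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        done = 0

        async def shot(html: Path) -> None:
            nonlocal done
            async with semaphore:                      # cap concurrent pages
                page = await browser.new_page()
                await page.goto(html.resolve().as_uri())
                await page.screenshot(path=html.with_suffix(".png"))
                await page.close()
            done += 1
            print(f"Screenshots progress: {done}/{len(html_files)}")

        await asyncio.gather(*(shot(h) for h in html_files))
        await browser.close()

asyncio.run(capture_all(sorted(Path("output").glob("*.html"))))
```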
Artifacts are written to output/ alongside the HTML:
- {request_id}.html
- {request_id}.png
See ARCHITECTURE.md for worker formulas, memory model, and timeout configuration.
uv run -m pytest -q
Tests are hermetic (no real API calls). See ARCHITECTURE.md for test organization and isolation details.
playwright install chromium fails with "command not found"
Run it via uv: uv run playwright install chromium
If you use AIMAC in your research, see CITATION.md for citation formats.
MIT License - see LICENSE for details.