AIMAC

AIMAC is a tool for evaluating the accessibility of web pages generated by various LLMs via the OpenRouter API.

Generates HTML using various LLMs via the OpenRouter API.
Captures screenshots of the rendered pages using Playwright (parallel by default).
Runs accessibility checks using axe-core.
Scores and compares model performance based on Serious and Critical axe-core violations, with dampening to prevent single rules from dominating. See ARCHITECTURE.md for scoring methodology.
Provides detailed reports on accessibility violations and compliance.

Live Results

View the latest benchmark results at aimac.ai.

Quick Start

Requires Python 3.12+ and uv.

# Clone the repository
git clone https://github.com/GAAD-Foundation/AIMAC.git
cd AIMAC

# Install dependencies (including dev tools for testing)
uv sync --all-extras

# Install Playwright browser (required for screenshots)
uv run playwright install chromium

# Verify installation
uv run aimac --help

Environment Setup

Create a .env file with your OpenRouter API key:

cp .env.example .env
# Edit .env and add your OPENROUTER_API_KEY

Usage

Run commands via uv run:

Initialize the Database

uv run aimac init
uv run aimac i                    # Short alias

Creates the database by applying SQL from data/schema/*.sql. The database location is configured via AIMAC_DATABASE_PATH (defaults to ./data/aimac.db).
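The initialization step described above can be sketched roughly as follows. This is a minimal illustration, not AIMAC's actual implementation: it assumes the database is SQLite (suggested by the .db default path but not confirmed here), that schema files are applied in filename order, and the function name init_database is hypothetical.

```python
import os
import sqlite3
from pathlib import Path


def init_database(schema_dir: str = "data/schema") -> Path:
    """Apply every *.sql file in schema_dir to the configured database.

    Hypothetical helper: assumes SQLite and filename-ordered schema files.
    """
    # AIMAC_DATABASE_PATH and its default come from the README.
    db_path = Path(os.environ.get("AIMAC_DATABASE_PATH", "./data/aimac.db"))
    db_path.parent.mkdir(parents=True, exist_ok=True)

    conn = sqlite3.connect(db_path)
    try:
        # Sorting makes the order deterministic (an assumption about intent).
        for sql_file in sorted(Path(schema_dir).glob("*.sql")):
            conn.executescript(sql_file.read_text(encoding="utf-8"))
        conn.commit()
    finally:
        conn.close()
    return db_path
```

If the schema files use CREATE TABLE IF NOT EXISTS, a function like this can be run repeatedly without harm; whether aimac init is idempotent in that way is not stated here.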

Collect Models

uv run aimac collect
uv run aimac c                    # Short alias

# Force a fresh run (bypass all caches)
uv run aimac collect --refresh

# Test specific models only
uv run aimac collect --models anthropic/claude-sonnet-4,openai/gpt-4o

Results are cached automatically. Use --refresh to bypass caches. See ARCHITECTURE.md for caching details.

Fetches the top programming models from the OpenRouter API and:

Saves a snapshot to data/snapshots/models/YYYY-MM-DD_HH-MM.json for inspection
Upserts models to the database (preserves any manual overrides)
Executes pending requests asynchronously with retry logic
Writes artifacts (HTML, JSON) to output/ directory
Creates a leaderboard and per-model summaries with rankings based on median accessibility score (lower is better). Ties are broken by: (1) mean score, (2) total violations, (3) cost. Models with identical values across all metrics receive the same rank.
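The ranking rule in the last item can be illustrated with a short sketch. It is not taken from the AIMAC source: the ModelStats class and its field names are hypothetical, it assumes lower is better for every tie-breaker, and it assumes tied models share a rank with the following rank skipped (1, 1, 3), which may differ from what AIMAC does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelStats:
    """Hypothetical per-model summary; field names are illustrative only."""
    model_id: str
    median: float
    mean: float
    total_violations: int
    cost: float


def sort_key(m: ModelStats) -> tuple[float, float, int, float]:
    # Ascending tuple comparison applies the tie-breakers in order:
    # median score, then mean score, then total violations, then cost.
    return (m.median, m.mean, m.total_violations, m.cost)


def rank_models(models: list[ModelStats]) -> list[tuple[int, ModelStats]]:
    ranked: list[tuple[int, ModelStats]] = []
    prev_key = None
    rank = 0
    for position, m in enumerate(sorted(models, key=sort_key), start=1):
        key = sort_key(m)
        if key != prev_key:
            # New metric values: the rank jumps to the current position.
            # Identical values across all metrics keep the previous rank.
            rank = position
            prev_key = key
        ranked.append((rank, m))
    return ranked
```

Because Python's sort is stable, models that tie on every metric keep their input order while sharing a rank.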

Note: Requires OPENROUTER_API_KEY to be set in your .env file.

View Reports on the Command Line

After running aimac collect, view the results interactively:

# View leaderboard (all models ranked)
uv run aimac report
uv run aimac r                    # Short alias

# View specific model details
uv run aimac r model 1            # Top-ranked model
uv run aimac r model claude       # Fuzzy search by name
uv run aimac r model anthropic/claude-3.5-sonnet  # Exact model ID

# Compare models within a category
uv run aimac r category shopping  # Fuzzy search
uv run aimac r category 5         # By category number

Report Features

Accessibility-First Design

Reports are designed to work well with assistive technology:

TSV default - Tab-separated output works with screen readers, grep, cut, and Unix pipes
No ANSI colors - Plain text ensures compatibility with all terminals and assistive technology
No Unicode decorations - Avoids characters that may not render or announce correctly

Output Formats

Default output is TSV (tab-separated values) - optimized for screen readers and Unix tools:

# Default TSV output
uv run aimac r

# Pretty (padded columns for terminals)
uv run aimac r --format pretty

# JSON (for scripts, web)
uv run aimac r --format json
# Note: Leaderboard JSON includes reliability fields `stddev` (Consistency) and `p90` per model.
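
For readers unfamiliar with the reliability fields, the sketch below shows one common way to derive a median, standard deviation, and 90th percentile from a model's per-request scores. The exact definitions AIMAC uses (sample vs. population standard deviation, percentile interpolation method) are not stated here, so treat the choices below as assumptions; the function name reliability is hypothetical.

```python
import statistics


def reliability(scores: list[float]) -> dict[str, float]:
    """Summarize per-request scores for one model (needs at least 2 scores)."""
    return {
        "median": statistics.median(scores),
        # Assumption: sample standard deviation.
        "stddev": statistics.stdev(scores),
        # Assumption: inclusive interpolation; index 8 of the 9 decile
        # cut points is the 90th percentile.
        "p90": statistics.quantiles(scores, n=10, method="inclusive")[8],
    }
```

A low stddev means a model scores similarly across prompts, while p90 highlights how bad its worse outputs tend to be.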

Set default format via environment variable:

# In .env file (pick one of: tsv, pretty, json)
AIMAC_REPORT_FORMAT=tsv

# Or export in shell
export AIMAC_REPORT_FORMAT=json

Verbose Mode

Add the -v flag to see additional columns (company names, reliability metrics):

# Leaderboard with total violations, Consistency (StdDev), and P90
uv run aimac r -v

# Model view with Critical/Serious breakdown
uv run aimac r model 1 -v

# Category view with Critical/Serious breakdown
uv run aimac r category shopping -v

Unix Composability

TSV output works seamlessly with Unix tools:

# Sort by cost (4th column)
uv run aimac r | sort -t$'\t' -k4,4n

# Top 5 models
uv run aimac r | head -n 6  # +1 for header

# Filter by company (requires -v for Company column)
uv run aimac r -v | grep "Anthropic"

# Save to file
uv run aimac r --format json > leaderboard.json
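
The same TSV output can also be consumed from Python using only the standard library. This is a minimal sketch: the helper names are hypothetical, and the column name you pass in must match whatever header the report actually prints, which is not listed here.

```python
import csv
import io


def read_tsv(text: str) -> list[dict[str, str]]:
    """Parse TSV text with a header row into a list of row dictionaries."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def sort_by_numeric_column(
    rows: list[dict[str, str]], column: str
) -> list[dict[str, str]]:
    """Return rows sorted ascending by a column holding numeric text."""
    return sorted(rows, key=lambda row: float(row[column]))
```

To use it in a pipeline, read the report from standard input, for example read_tsv(sys.stdin.read()) in a script fed by uv run aimac r. Unlike the sort one-liner, parsing by header name keeps working if the column order changes.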

Navigation Hints

Each report includes contextual "Run next:" suggestions to help you explore:

# Leaderboard shows:
Run next: "aimac report model 1"

# Model view shows:
Run next: "aimac report model 2"

# Category view shows:
Run next: "aimac report model <top-performer-id>"

Examples

Basic workflow:

# 1. View leaderboard
uv run aimac r

# 2. Examine top model
uv run aimac r model 1

# 3. Compare models in a specific category
uv run aimac r category shopping

# 4. Get detailed severity breakdown
uv run aimac r category shopping -v

Export for sharing:

# Generate a pretty-format report for sharing in terminals
mkdir -p reports  # the redirects below need this directory to exist
uv run aimac r --format pretty > reports/leaderboard.txt

# Export all data as JSON
uv run aimac r --format json > reports/leaderboard.json
uv run aimac r model 1 --format json > reports/top_model.json

Scripting with JSON:

# Extract top model ID
TOP_MODEL=$(uv run aimac r --format json | jq -r '.rows[0].model_id')

# Count models with score < 20
uv run aimac r --format json | jq '[.rows[] | select(.median < 20)] | length'
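
If jq is not available, the same two queries can be written in Python. The sketch assumes only what the jq examples above imply: a top-level rows array whose entries carry model_id and median. The function names are hypothetical.

```python
import json
import subprocess


def load_leaderboard() -> dict:
    """Run the report command and parse its JSON output."""
    result = subprocess.run(
        ["uv", "run", "aimac", "r", "--format", "json"],
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits with a non-zero status
    )
    return json.loads(result.stdout)


def top_model_id(leaderboard: dict) -> str:
    # Equivalent to: jq -r '.rows[0].model_id'
    return leaderboard["rows"][0]["model_id"]


def count_below(leaderboard: dict, threshold: float) -> int:
    # Equivalent to: jq '[.rows[] | select(.median < 20)] | length'
    return sum(1 for row in leaderboard["rows"] if row["median"] < threshold)
```

For example, count_below(load_leaderboard(), 20) mirrors the second jq command.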

Help Commands

# Main help (shows all commands)
uv run aimac -h

# CLI reporting flags and options
uv run aimac report -h

# CLI reporting examples and workflows
uv run aimac r help

Screenshots

Screenshots are captured in parallel by default using Playwright, with the worker count auto-tuned to your CPU. During execution, progress is printed as "Screenshots progress: N/Total".

Each screenshot is written to output/ alongside its HTML file:

{request_id}.html
{request_id}.png

See ARCHITECTURE.md for worker formulas, memory model, and timeout configuration.

Testing

uv run -m pytest -q

Tests are hermetic (no real API calls). See ARCHITECTURE.md for test organization and isolation details.

Troubleshooting

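"command not found" when running playwright install chromium

Playwright is installed into the project's uv-managed environment rather than on your PATH, so run it via uv: uv run playwright install chromium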

Citation

If you use AIMAC in your research, see CITATION.md for citation formats.

License

MIT License - see LICENSE for details.
