Description
Summary
When two sequential leases use the same AWS account within a 24-hour window, cost attribution between leases may be inaccurate. The current soft 24-hour cooldown should become a hard requirement when accurate per-lease billing is needed.
Problem
Current Behavior
The system prefers accounts not used in the last 24 hours but will fall back to recently-used accounts if no preferred accounts are available:
// source/lambdas/api/innovation-sandbox/src/innovation-sandbox.ts:898-948
if (!lastCleanupTime || parseDatetime(lastCleanupTime) <= twentyFourHoursAgo) {
  preferredAccounts.push(account);
} else {
  fallbackAccounts.push(account); // Still usable
}
A warning is logged but the lease proceeds:
"The account acquired for the lease has been used within the last 24 hours and may result in inaccurate cost data"
Why This Causes Inaccurate Billing
- Cost Explorer data delay: AWS Cost Explorer data is delayed 8-24 hours. When Lease A terminates, its final totalCostAccrued snapshot may miss costs that haven't appeared in Cost Explorer yet.
- No delayed reconciliation: Once a lease terminates, monitoring stops. There's no follow-up to capture costs that appear later.
- Gap period attribution: Costs incurred between Lease A's termination and Lease B's start may not be attributed to either lease.
Example Scenario
Timeline:
09:00 - Lease A starts on Account-123
17:00 - Lease A terminates (final cost snapshot: $50)
17:30 - Lease B starts on Account-123
Next day - Cost Explorer shows $20 from Lease A's final hours
Result:
- Lease A shows $50 (missing $20)
- Lease B shows costs from 17:30 onward
- $20 is lost/unattributed
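The shortfall in this scenario can be reproduced with a small model. This is only a sketch: `CostEntry`, `visibleCost`, and the specific timestamps are hypothetical and do not reflect the Cost Explorer API or the project's data model. It assumes each cost becomes visible some hours after it is incurred.

```typescript
// Hypothetical model of a cost line item; not the Cost Explorer API shape.
interface CostEntry {
  usageTime: string; // when the cost was incurred (ISO 8601)
  visibleAt: string; // when it first appears in cost data (ISO 8601)
  amountUsd: number;
}

// Sum the costs incurred in [start, end) that are already visible at `asOf`.
function visibleCost(
  entries: CostEntry[],
  start: string,
  end: string,
  asOf: string,
): number {
  const startMs = Date.parse(start);
  const endMs = Date.parse(end);
  const asOfMs = Date.parse(asOf);
  return entries
    .filter((entry) => {
      const usedMs = Date.parse(entry.usageTime);
      return (
        usedMs >= startMs &&
        usedMs < endMs &&
        Date.parse(entry.visibleAt) <= asOfMs
      );
    })
    .reduce((sum, entry) => sum + entry.amountUsd, 0);
}

// Lease A from the timeline above, with illustrative visibility delays.
const leaseA = { start: "2024-01-01T09:00:00Z", end: "2024-01-01T17:00:00Z" };
const leaseAEntries: CostEntry[] = [
  { usageTime: "2024-01-01T09:00:00Z", visibleAt: "2024-01-01T17:00:00Z", amountUsd: 50 },
  { usageTime: "2024-01-01T16:00:00Z", visibleAt: "2024-01-02T08:00:00Z", amountUsd: 20 },
];

// Snapshot taken at termination versus the figure available the next day.
const atTermination = visibleCost(leaseAEntries, leaseA.start, leaseA.end, leaseA.end);
const nextDay = visibleCost(leaseAEntries, leaseA.start, leaseA.end, "2024-01-02T12:00:00Z");
const missing = nextDay - atTermination;
```

Under these assumptions the snapshot at termination records $50, while the full cost visible the next day is $70, which leaves the $20 described above outside any lease's recorded total.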
Proposed Solution
Introduce a new "Cooldown" account status (similar to Quarantine) that separates recently-used accounts from the available pool.
How It Works
- After lease termination: Account moves to Cooldown status instead of directly to Available
- Automatic release: A scheduled process moves accounts from Cooldown to Available after 24 hours
- Admin override: Admins can manually move accounts from Cooldown to Available early (accepting billing inaccuracy)
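The automatic release and admin override could look roughly like the following. This is a minimal sketch, not the proposed implementation: `CooldownAccount`, the `cooldownStartedAt` field, `releaseElapsed`, and `adminRelease` are all hypothetical names, and persistence is left out entirely.

```typescript
// Hypothetical record shape; `cooldownStartedAt` is an assumed field name.
interface CooldownAccount {
  awsAccountId: string;
  status: "Cooldown" | "Available";
  cooldownStartedAt: string; // ISO 8601 timestamp
}

const COOLDOWN_MS = 24 * 60 * 60 * 1000;

// Automatic release: return copies in which every account whose 24-hour
// cooldown has elapsed is marked Available; other accounts are unchanged.
function releaseElapsed(accounts: CooldownAccount[], now: Date): CooldownAccount[] {
  return accounts.map((account) =>
    account.status === "Cooldown" &&
    now.getTime() - Date.parse(account.cooldownStartedAt) >= COOLDOWN_MS
      ? { ...account, status: "Available" as const }
      : account,
  );
}

// Admin override: release regardless of elapsed time and return a warning
// message for the caller to log.
function adminRelease(account: CooldownAccount): {
  account: CooldownAccount;
  warning: string;
} {
  return {
    account: { ...account, status: "Available" },
    warning: `Account ${account.awsAccountId} released from Cooldown early; cost data may be inaccurate`,
  };
}
```

Keeping both functions pure (they return new records rather than writing to a store) makes the 24-hour rule easy to test independently of the scheduler.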
Account State Flow
Active (lease in progress)
│
▼
CleanUp (cleanup running)
│
▼
Cooldown (new state - 24hr wait for Cost Explorer)
│
├─► [24 hours elapsed] ─► Available
│
└─► [Admin manual release] ─► Available (with warning logged)
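The flow above can also be expressed as a transition table. This sketch covers only the four statuses in the diagram; the `AccountStatus` type, `allowedTransitions`, and `canTransition` are illustrative names, and the `Available` to `Active` edge is an assumption (a new lease picking up the account) that the diagram does not show.

```typescript
// Only the statuses shown in the diagram; the real enum has others
// (for example Quarantine).
type AccountStatus = "Active" | "CleanUp" | "Cooldown" | "Available";

const allowedTransitions: Record<AccountStatus, readonly AccountStatus[]> = {
  Active: ["CleanUp"],
  CleanUp: ["Cooldown"],
  Cooldown: ["Available"], // after 24 hours, or by admin manual release
  Available: ["Active"], // assumption: a new lease acquires the account
};

// Returns true when moving from `from` to `to` follows the diagram.
function canTransition(from: AccountStatus, to: AccountStatus): boolean {
  return allowedTransitions[from].includes(to);
}
```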
Global Configuration
{
  // ... existing config
  enforceAccountCooldown: boolean // Default: false for backward compatibility
}
When disabled (default): Current behavior - accounts go directly to Available after cleanup.
When enabled: Accounts go to Cooldown status and wait 24 hours before becoming Available.
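A sketch of how cleanup might branch on this flag is shown below. `GlobalConfigSketch` and `statusAfterCleanup` are hypothetical names; treating a missing flag as disabled is an assumption made here to match the backward-compatible default.

```typescript
// Hypothetical subset of the global config; only the proposed field is shown.
interface GlobalConfigSketch {
  enforceAccountCooldown?: boolean;
}

type PostCleanupStatus = "Cooldown" | "Available";

// Decide where an account goes once cleanup succeeds. A missing flag is
// treated as false so existing deployments keep the current behavior.
function statusAfterCleanup(config: GlobalConfigSketch): PostCleanupStatus {
  return config.enforceAccountCooldown === true ? "Cooldown" : "Available";
}
```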
Why Global Only
A per-template option wouldn't work: if Template A enforces cooldown but Template B doesn't, Template B could use an account and pollute the billing window for Template A anyway. The cooldown operates at the account level, so enforcement must be global.
Files to Modify
| File | Change |
|---|---|
| source/common/data/sandbox-account/sandbox-account.ts | Add Cooldown to account status enum |
| source/common/data/innovation-sandbox-config/innovation-sandbox-config.ts | Add enforceAccountCooldown field |
| source/lambdas/api/innovation-sandbox/src/innovation-sandbox.ts | Filter out Cooldown accounts in acquireAvailableAccount(), add admin release endpoint |
| source/lambdas/account-management/account-cleanup/ | Transition to Cooldown instead of Available when config enabled |
| New Lambda: cooldown-release-handler.ts | Scheduled process to move Cooldown → Available after 24 hours |
| source/infrastructure/lib/ | Add EventBridge schedule for cooldown release |
| Frontend | Add Cooldown status display, admin release button, config toggle |
Acceptance Criteria
- Cooldown account status added
- Global config option enforceAccountCooldown added
- Cleanup transitions accounts to Cooldown when enabled
- Scheduled Lambda releases accounts after 24 hours
- Admin API endpoint to manually release accounts early
- Frontend shows Cooldown accounts and release action
- Documentation updated
- Existing behavior unchanged when setting is disabled (default)