Commit 0a89967

jkpe and Copilot authored
feat: reintroduce DigitalOcean API metrics support for load balancer scaling (#2)
* feat: Implement DigitalOcean metrics client and MuxMetrics for routing metric requests
* docs: Enhance README to clarify support for both DigitalOcean API and Prometheus metrics, including updated examples and configuration options.
* feat: Add DigitalOcean Load Balancer configuration examples for HTTP and Network services
* feat: Add DO_API_TOKEN to deployment configuration
* docs: Update README with detailed instructions for creating a DigitalOcean API token, Kubernetes secret, and verifying controller functionality
* Update do_client.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 8fb7d89 commit 0a89967

7 files changed

+394
-27
lines changed

README.md

Lines changed: 225 additions & 17 deletions
@@ -2,14 +2,15 @@
22

33
https://github.com/user-attachments/assets/9ab9f805-df87-49f1-98b4-edceb66a5b2f
44

5-
A lightweight Kubernetes controller that automatically scales a DigitalOcean Load Balancer node size (size unit) based on Prometheus metrics from your ingress controller.
5+
A lightweight Kubernetes controller that automatically scales a DigitalOcean Load Balancer node size (size unit) based on metrics from either the DigitalOcean API or Prometheus.
66

77
## How it works
88

99
- Watches `Service` objects of type `LoadBalancer` that include required annotations.
10-
- Periodically fetches the configured Prometheus query.
11-
- `nginx_ingress_controller_requests` used as an example.
12-
- Uses HTTP-style ingress metrics (e.g., total requests per second) to compute desired nodes.
10+
- Periodically fetches metrics from either:
11+
- **DigitalOcean API**: Direct load balancer metrics (e.g., throughput, requests)
12+
- **Prometheus**: Custom queries for ingress/application metrics
13+
- Uses the configured metric to compute desired nodes.
1314
- Computes the desired `size_unit` with hysteresis and min/max bounds and writes it back to the Service annotation.
1415

1516
DigitalOcean Cloud Controller Manager applies annotation changes to the actual Load Balancer.
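
As a rough illustration of the "compute desired nodes" step, the sketch below divides the observed metric by the per-node target and clamps the result to the min/max bounds. The function name `desired_nodes` and the round-up rule are assumptions made for illustration; the controller's actual rounding is not shown in this README.

```python
import math


def desired_nodes(metric_value: float, target_per_node: int,
                  min_nodes: int, max_nodes: int) -> int:
    """Hypothetical helper: nodes needed so each node stays at or below its target."""
    if target_per_node <= 0:
        raise ValueError("target_per_node must be a positive integer")
    # Assumption: round up so that capacity is never below the observed load.
    raw = math.ceil(metric_value / target_per_node)
    # Clamp to the doks-lb-scale/min-nodes and doks-lb-scale/max-nodes bounds.
    return max(min_nodes, min(max_nodes, raw))
```

With `req=8000`, `min-nodes=1` and `max-nodes=50`, an idle load balancer stays at one node and a very large value is capped at 50. Hysteresis is applied afterwards and is not part of this sketch.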
@@ -18,27 +19,46 @@ DigitalOcean Cloud Controller Manager applies annotation changes to the actual L
1819

1920
- Install from the DigitalOcean Kubernetes Marketplace:
2021
- [Kubernetes Metrics Server](https://marketplace.digitalocean.com/apps/kubernetes-metrics-server)
21-
- [Kubernetes Monitoring Stack](https://marketplace.digitalocean.com/apps/kubernetes-monitoring-stack) (kube-prometheus-stack)
22+
- [Kubernetes Monitoring Stack](https://marketplace.digitalocean.com/apps/kubernetes-monitoring-stack) (kube-prometheus-stack) - *optional, only needed for Prometheus metrics*
2223
- [Nginx Ingress Controller](https://marketplace.digitalocean.com/apps/nginx-ingress-controller) (optional, any ingress controller should work)
2324

2425
## Deploy
2526

27+
- Create a DigitalOcean API token with least privileges:
28+
- Create a token with Custom Scopes following the official guide: [`Create a personal access token`](https://docs.digitalocean.com/reference/api/create-personal-access-token/)
29+
- Grant only these scopes:
30+
- `monitoring:read`
31+
- Create a Kubernetes secret with your DigitalOcean API token:
32+
33+
```bash
34+
kubectl -n kube-system create secret generic doks-lb-scale-secret --from-literal=token="$DO_API_TOKEN"
35+
```
36+
2637
- Apply RBAC and Deployment:
2738

2839
```bash
29-
kubectl apply -f config/rbac.yaml
30-
kubectl apply -f config/deployment.yaml
40+
kubectl apply -f https://raw.githubusercontent.com/jkpe/doks-lb-scale/refs/heads/main/config/rbac.yaml
41+
kubectl apply -f https://raw.githubusercontent.com/jkpe/doks-lb-scale/refs/heads/main/config/deployment.yaml
3142
```
3243

33-
Set the Prometheus URL via the `--prom-url` flag or `PROMETHEUS_URL` env var. The provided deployment sets `PROMETHEUS_URL` to `http://ingress-nginx-controller-metrics:9090` by default; adjust to your cluster.
44+
### Configuration Options
45+
46+
The controller supports two metrics sources:
47+
48+
1. **DigitalOcean API** (default): Set `DO_API_TOKEN` environment variable or `--do-token` flag
49+
2. **Prometheus**: Set `PROMETHEUS_URL` environment variable or `--prom-url` flag
50+
51+
You can configure both sources simultaneously; the controller routes each request by metric prefix: metrics prefixed with `promql:` go to Prometheus, and plain metric names go to the DigitalOcean API.
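
The prefix-based routing can be pictured with a small sketch. The name `route_metric` and the returned labels are hypothetical; the real routing lives in the controller's `MuxMetrics` and may differ in detail.

```python
from typing import Tuple

PROMQL_PREFIX = "promql:"


def route_metric(metric: str) -> Tuple[str, str]:
    """Hypothetical router: return (backend, query_or_metric_name)."""
    metric = metric.strip()
    if metric.startswith(PROMQL_PREFIX):
        # Everything after the prefix is treated as the PromQL expression.
        return "prometheus", metric[len(PROMQL_PREFIX):].strip()
    # Assumption: any name without the prefix is sent to the DigitalOcean API.
    return "digitalocean", metric
```

For example, `route_metric("nlb_tcp_network_throughput")` selects the DigitalOcean backend, while a value starting with `promql:` selects Prometheus.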
3452

3553
## Required annotations
3654

3755
- `kubernetes.digitalocean.com/load-balancer-id`: the DO LB ID.
38-
- `doks-lb-scale/metric`: the metric to use. Must be a Prometheus query prefixed with `promql:`.
39-
- `doks-lb-scale/target-per-node`: REQUIRED: `req=<int>` (requests per second per node target)
40-
41-
Only HTTP/ingress metrics are supported.
56+
- `doks-lb-scale/metric`: the metric to use:
57+
- **DO API metrics**: Direct metric names (e.g., `nlb_tcp_network_throughput`, `requests_per_second`)
58+
- **Prometheus metrics**: Must be prefixed with `promql:` (e.g., `promql:sum(rate(nginx_ingress_controller_requests[1m]))`)
59+
- `doks-lb-scale/target-per-node`: REQUIRED:
60+
- `req=<int>` for request-based metrics (HTTP requests, ingress metrics)
61+
- `nlb=<int>` for NLB throughput metrics (Mbps)
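
To make the `target-per-node` format concrete, here is a minimal parsing sketch. The function name `parse_target` and its error messages are illustrative assumptions, not the controller's actual code.

```python
from typing import Tuple


def parse_target(annotation: str) -> Tuple[str, int]:
    """Hypothetical parser for values such as "req=8000" or "nlb=45"."""
    kind, sep, raw = annotation.partition("=")
    kind = kind.strip()
    if sep != "=" or kind not in ("req", "nlb"):
        raise ValueError(f"expected req=<int> or nlb=<int>, got {annotation!r}")
    value = int(raw)  # raises ValueError if the part after "=" is not an integer
    if value <= 0:
        raise ValueError("target must be a positive integer")
    return kind, value
```

`parse_target("req=8000")` yields `("req", 8000)`; an unknown key such as `cpu=5` is rejected.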
4262

4363
Optional annotations:
4464
- `doks-lb-scale/hysteresis-percent`: default `20`.
@@ -47,7 +67,9 @@ Optional annotations:
4767
- `doks-lb-scale/scale-down-delay-minutes`: optional. If set to a positive integer, delays any scale-down by the specified number of minutes. The controller first sets a not-before timestamp and only applies the scale-down once that time has passed. Scaling up clears any pending delay.
4868
- `service.beta.kubernetes.io/do-loadbalancer-size-unit`: set by controller.
4969

50-
## Example Service (HTTP requests)
70+
## Example Services
71+
72+
### Example 1: Prometheus Metrics (HTTP Requests)
5173

5274
```yaml
5375
apiVersion: v1
@@ -56,6 +78,7 @@ metadata:
5678
name: nginx
5779
annotations:
5880
kubernetes.digitalocean.com/load-balancer-id: "your-load-balancer-id"
81+
service.beta.kubernetes.io/do-loadbalancer-type: "REGIONAL" # DigitalOcean HTTP Load Balancer
5982
service.beta.kubernetes.io/do-loadbalancer-size-unit: "1"
6083
doks-lb-scale/metric: "promql:sum(rate(nginx_ingress_controller_requests{ingress!=\"\",status!=\"\"}[1m]))"
6184
doks-lb-scale/target-per-node: "req=8000" # requests per node
@@ -71,6 +94,56 @@ spec:
7194
targetPort: 80
7295
```
7396
97+
### Example 2: DigitalOcean API Metrics (Network Load Balancer Throughput)
98+
99+
```yaml
100+
apiVersion: v1
101+
kind: Service
102+
metadata:
103+
name: nginx
104+
annotations:
105+
kubernetes.digitalocean.com/load-balancer-id: "your-load-balancer-id"
106+
service.beta.kubernetes.io/do-loadbalancer-type: "REGIONAL_NETWORK" # DigitalOcean Network Load Balancer
107+
service.beta.kubernetes.io/do-loadbalancer-size-unit: "1"
108+
doks-lb-scale/metric: "nlb_tcp_network_throughput"
109+
doks-lb-scale/target-per-node: "nlb=45" # Mbps per node
110+
doks-lb-scale/hysteresis-percent: "20"
111+
doks-lb-scale/min-nodes: "1"
112+
doks-lb-scale/max-nodes: "50"
113+
spec:
114+
type: LoadBalancer
115+
selector:
116+
app: nginx
117+
ports:
118+
- port: 80
119+
targetPort: 80
120+
```
121+
122+
### Example 3: DigitalOcean API Metrics (Requests per Second)
123+
124+
```yaml
125+
apiVersion: v1
126+
kind: Service
127+
metadata:
128+
name: nginx
129+
annotations:
130+
kubernetes.digitalocean.com/load-balancer-id: "your-load-balancer-id"
131+
service.beta.kubernetes.io/do-loadbalancer-type: "REGIONAL" # DigitalOcean HTTP Load Balancer
132+
service.beta.kubernetes.io/do-loadbalancer-size-unit: "1"
133+
doks-lb-scale/metric: "requests_per_second"
134+
doks-lb-scale/target-per-node: "req=8000" # requests per second per node
135+
doks-lb-scale/hysteresis-percent: "20"
136+
doks-lb-scale/min-nodes: "1"
137+
doks-lb-scale/max-nodes: "50"
138+
spec:
139+
type: LoadBalancer
140+
selector:
141+
app: nginx
142+
ports:
143+
- port: 80
144+
targetPort: 80
145+
```
146+
74147
## Example ingress-nginx Helm values
75148
76149
Use the following Helm values to deploy `ingress-nginx` with a `LoadBalancer` Service, metrics enabled for Prometheus scraping, and the required annotations for doks-lb-scale to manage the Load Balancer size:
@@ -82,6 +155,7 @@ controller:
82155
type: LoadBalancer
83156
annotations:
84157
kubernetes.digitalocean.com/load-balancer-id: "your-load-balancer-id"
158+
service.beta.kubernetes.io/do-loadbalancer-type: "REGIONAL"
85159
doks-lb-scale/metric: "promql:sum(rate(nginx_ingress_controller_requests{ingress!=\"\",status!=\"\"}[1m]))"
86160
doks-lb-scale/target-per-node: "req=8000"
87161
doks-lb-scale/hysteresis-percent: "20"
@@ -98,12 +172,28 @@ controller:
98172
prometheus.io/scrape: "true"
99173
```
100174

101-
Pair this with the Prometheus-based example in the previous section (using `promql:sum(rate(nginx_ingress_controller_requests{ingress!="",status!=""}[1m]))`).
175+
## Metric Categories
176+
177+
The controller supports two metric categories:
178+
179+
### Request-based metrics (`req=INT`)
180+
- **DO API**: `requests_per_second`, `http_requests_per_second`
181+
- **Prometheus**: Any custom query prefixed with `promql:`
182+
- **Use case**: HTTP/ingress traffic scaling
183+
184+
### NLB Throughput metrics (`nlb=INT`)
185+
- **DO API**: `nlb_tcp_network_throughput`, `nlb_udp_network_throughput`
186+
- **Prometheus**: Not supported for NLB metrics
187+
- **Use case**: Network load balancer throughput scaling
188+
189+
The controller automatically detects the metric category and validates that the target configuration matches.
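
The category check described above might look like the following sketch. The metric names come from the lists in this section; the function names and the exact validation behaviour are assumptions.

```python
REQUEST_METRICS = {"requests_per_second", "http_requests_per_second"}
NLB_METRICS = {"nlb_tcp_network_throughput", "nlb_udp_network_throughput"}


def metric_category(metric: str) -> str:
    """Hypothetical classifier: return "req" or "nlb" for a metric annotation."""
    if metric.startswith("promql:"):
        return "req"  # Prometheus queries are only supported as request-based metrics
    if metric in REQUEST_METRICS:
        return "req"
    if metric in NLB_METRICS:
        return "nlb"
    raise ValueError(f"unknown metric: {metric!r}")


def validate_target(metric: str, target_kind: str) -> None:
    """Raise if the target kind ("req" or "nlb") does not match the metric category."""
    expected = metric_category(metric)
    if target_kind != expected:
        raise ValueError(f"metric {metric!r} needs a {expected}=<int> target")
```

Pairing `nlb_tcp_network_throughput` with a `req=` target would therefore be rejected instead of silently scaling on the wrong unit.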
102190

103191
## Notes
104192

105-
- The controller performs a Prometheus instant query via `/api/v1/query?query=...` and uses the value from the first result.
106-
- Up to date LB service annotations: [DigitalOcean CCM annotations](https://github.com/digitalocean/digitalocean-cloud-controller-manager/blob/master/docs/controllers/services/annotations.md)
193+
- **DO API metrics**: The controller performs a direct API call to DigitalOcean's monitoring endpoint.
194+
- **Prometheus metrics**: The controller performs a Prometheus instant query via `/api/v1/query?query=...` and uses the value from the first result.
195+
- For up-to-date LB service annotations, see [DigitalOcean CCM annotations](https://github.com/digitalocean/digitalocean-cloud-controller-manager/blob/master/docs/controllers/services/annotations.md).
196+
- For documented DigitalOcean Load Balancer node limits and scaling details, see the [DigitalOcean Load Balancer pricing and limits documentation](https://docs.digitalocean.com/products/networking/load-balancers/details/pricing/#regional-load-balancers).
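
For the Prometheus note above, this sketch builds the instant-query URL and reads the value from the first result. The response shape follows the Prometheus HTTP API for vector results; the helper names, and raising an error on an empty result, are assumptions, since the README does not say how the controller handles that case.

```python
import json
from urllib.parse import urlencode


def query_url(base_url: str, query: str) -> str:
    """Build the instant-query URL for a PromQL expression."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": query})


def first_value(body: str) -> float:
    """Return the sample value of the first result in an instant-query response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed")
    results = payload["data"]["result"]
    if not results:
        # Assumption: treat an empty result as an error rather than as zero.
        raise LookupError("query returned no series")
    # Each vector entry carries "value": [<unix timestamp>, "<value as string>"].
    return float(results[0]["value"][1])
```

A query such as `sum(...)` returns a single series, so taking the first result is unambiguous; queries that return several series would need aggregating in PromQL first.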
107197

108198
## Hysteresis examples
109199

@@ -117,7 +207,125 @@ If desired is within [lower, upper], nothing changes.
117207
Quick examples:
118208
- current 10, pct 20% → window [8,12]; desired 12 = no change; 13 = scale up; 7 = scale down
119209
- current 5, pct 10% → window [4,5]; desired 4 = no change; 6 = scale up; 3 = scale down
120-
- current 1, pct 20% → window [0,1]; desired 1 = no change; ≥2 = scale up (min-nodes still applies)`
210+
- current 1, pct 20% → window [0,1]; desired 1 = no change; ≥2 = scale up (min-nodes still applies)
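
The quick examples above are consistent with a window whose bounds are both rounded down. The sketch below reproduces them under that assumption; the rounding rule is inferred from the examples rather than taken from the controller source, and the function names are hypothetical.

```python
from typing import Tuple


def hysteresis_window(current: int, pct: int) -> Tuple[int, int]:
    """Assumed rule: both bounds are floored, computed with integer arithmetic."""
    lower = current * (100 - pct) // 100
    upper = current * (100 + pct) // 100
    return lower, upper


def decide(current: int, desired: int, pct: int) -> str:
    """Return "scale up", "scale down" or "no change" for a desired node count."""
    lower, upper = hysteresis_window(current, pct)
    if desired > upper:
        return "scale up"
    if desired < lower:
        return "scale down"
    return "no change"
```

For instance, `hysteresis_window(10, 20)` gives `(8, 12)`, so a desired value of 12 leaves the size unchanged while 13 triggers a scale-up.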
211+
212+
## Verifying the Controller is Working
213+
214+
To verify that the doks-lb-scale controller is working properly, check the controller logs and monitor the service annotations.
215+
216+
### Check Controller Logs
217+
218+
View the controller logs to see the reconciliation process:
219+
220+
```bash
221+
kubectl logs -n kube-system deployment/doks-lb-scale-controller -f
222+
```
223+
224+
### Expected Log Output
225+
226+
When the controller starts successfully, you should see:
227+
228+
```log
229+
[2025-08-14 09:39:02] INFO setup → starting manager
230+
[2025-08-14 09:39:02] INFO healthprobe → starting server at [::]:8080
231+
[2025-08-14 09:39:02] INFO leader → attempting to acquire lease: kube-system/doks-lb-scale-controller
232+
[2025-08-14 09:39:17] INFO leader → successfully acquired lease: kube-system/doks-lb-scale-controller
233+
[2025-08-14 09:39:17] INFO service → Starting EventSource (kind: Service)
234+
[2025-08-14 09:39:17] INFO service → Starting Controller (kind: Service)
235+
[2025-08-14 09:39:17] INFO service → Starting workers (count: 1)
236+
```
237+
238+
### Normal Operation Logs
239+
240+
During normal operation, you'll see periodic reconciliation logs:
241+
242+
```log
243+
[2025-08-14 09:39:17] INFO reconcile service=ingress-nginx/ingress-nginx-controller
244+
↳ Reconcile start
245+
↳ Fetching metrics
246+
lbID = 7a016a4b-20cb-4d97-9612-01dd421cea21
247+
metric = promql: sum(rate(nginx_ingress_controller_requests{ingress!="",status!=""}[1m]))
248+
↳ Metrics value
249+
value = 0
250+
↳ Computed desired nodes
251+
current = 2
252+
desired = 1
253+
↳ Within hysteresis window — skipping update
254+
lower = 1
255+
upper = 2
256+
desired = 1
257+
current = 2
258+
```
259+
260+
### Scaling Event Logs
261+
262+
When the controller scales the load balancer, you'll see:
263+
264+
```log
265+
[2025-08-14 09:42:47] INFO reconcile service=ingress-nginx/ingress-nginx-controller
266+
↳ Reconcile start
267+
↳ Fetching metrics
268+
lbID = 7a016a4b-20cb-4d97-9612-01dd421cea21
269+
metric = promql: sum(rate(nginx_ingress_controller_requests{ingress!="",status!=""}[1m]))
270+
↳ Metrics value
271+
value = 2023.3658290843246
272+
↳ Computed desired nodes
273+
current = 2
274+
desired = 3
275+
↳ Updating service size-unit
276+
from = 2
277+
to = 3
278+
↳ Service annotation updated
279+
size-unit = 3
280+
```
281+
282+
### Delayed Scale Down Logs
283+
284+
When using the `doks-lb-scale/scale-down-delay-minutes` annotation, scale-down events are delayed:
285+
286+
```log
287+
[2025-08-14 09:43:32] INFO reconcile service=ingress-nginx/ingress-nginx-controller
288+
↳ Reconcile start
289+
↳ Fetching metrics
290+
lbID = 7a016a4b-20cb-4d97-9612-01dd421cea21
291+
metric = promql: sum(rate(nginx_ingress_controller_requests{ingress!="",status!=""}[1m]))
292+
↳ Metrics value
293+
value = 131.66666666666666
294+
↳ Computed desired nodes
295+
current = 3
296+
desired = 1
297+
↳ Scale down scheduled after delay
298+
delayMinutes = 10
299+
notBefore = 2025-08-14T09:53:32Z
300+
from = 3
301+
to = 1
302+
```
303+
304+
The controller will show the delay being scheduled and then count down the remaining time until the scale-down can occur. If traffic increases during the delay period, the pending scale-down will be cancelled.
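
The delayed scale-down behaviour can be sketched as a small state transition. The function `plan_scale` and its return shape are hypothetical, hysteresis is left out for brevity, and treating `desired == current` as clearing the pending delay is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def plan_scale(current: int, desired: int, delay_minutes: int,
               not_before: Optional[datetime],
               now: datetime) -> Tuple[int, Optional[datetime]]:
    """Return (size to apply now, pending not-before timestamp or None)."""
    if desired >= current:
        # Scaling up (or holding steady, by assumption) clears any pending delay.
        return desired, None
    if delay_minutes <= 0:
        return desired, None  # no delay configured: scale down immediately
    if not_before is None:
        # First time a scale-down is wanted: only record when it may happen.
        return current, now + timedelta(minutes=delay_minutes)
    if now >= not_before:
        return desired, None  # delay has elapsed: apply the scale-down
    return current, not_before  # still waiting
```

Starting from the log above (3 nodes, desired 1, delay 10 at 09:43:32Z), the first call keeps 3 nodes and schedules 09:53:32Z; a later call at or after that time returns 1.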
305+
306+
### Monitor Service Annotations
307+
308+
Check that the controller is updating the service annotation:
309+
310+
```bash
311+
kubectl get service <your-service-name> -o yaml | grep -A 5 -B 5 "do-loadbalancer-size-unit"
312+
```
313+
314+
You should see the `service.beta.kubernetes.io/do-loadbalancer-size-unit` annotation being updated as the controller scales the load balancer.
315+
316+
### Troubleshooting
317+
318+
If you don't see the expected logs:
319+
320+
1. **Check if the controller is running:**
321+
```bash
322+
kubectl get pods -n kube-system | grep doks-lb-scale
323+
```
324+
325+
2. **Verify the service has the required annotations:**
326+
```bash
327+
kubectl get service <your-service-name> -o yaml | grep -A 10 -B 10 "doks-lb-scale"
328+
```
121329

122330
## Contact
123331

config/deployment.yaml

Lines changed: 6 additions & 0 deletions
@@ -20,10 +20,16 @@ spec:
2020
imagePullPolicy: Always
2121
args:
2222
- "--verbose=$(DOKS_LB_SCALE_VERBOSE)"
23+
- "--do-token=$(DO_API_TOKEN)"
2324
- "--prom-url=$(PROMETHEUS_URL)"
2425
env:
2526
- name: DOKS_LB_SCALE_VERBOSE
2627
value: "true"
28+
- name: DO_API_TOKEN
29+
valueFrom:
30+
secretKeyRef:
31+
name: doks-lb-scale-secret
32+
key: token
2733
- name: PROMETHEUS_URL
2834
value: "http://kube-prometheus-stack-prometheus.kube-prometheus-stack.svc:9090"
2935
ports:
Lines changed: 3 additions & 2 deletions
@@ -4,8 +4,9 @@ metadata:
44
name: nginx
55
annotations:
66
kubernetes.digitalocean.com/load-balancer-id: "your-load-balancer-id"
7+
service.beta.kubernetes.io/do-loadbalancer-type: "REGIONAL" # DigitalOcean HTTP Load Balancer
78
service.beta.kubernetes.io/do-loadbalancer-size-unit: "1"
8-
# Scale by total requests per second observed by nginx ingress controller
9+
# Scale by total requests per second observed by nginx ingress controller/Prometheus
910
doks-lb-scale/metric: "promql:sum(rate(nginx_ingress_controller_requests{ingress!=\"\",status!=\"\"}[1m]))"
1011
doks-lb-scale/target-per-node: "req=8000"
1112
doks-lb-scale/hysteresis-percent: "20"
@@ -15,7 +16,7 @@ metadata:
1516
spec:
1617
type: LoadBalancer
1718
selector:
18-
app: nginx
19+
app: whoami-service
1920
ports:
2021
- port: 80
2122
targetPort: 80
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
1+
apiVersion: v1
2+
kind: Service
3+
metadata:
4+
name: nginx
5+
annotations:
6+
kubernetes.digitalocean.com/load-balancer-id: "your-load-balancer-id"
7+
service.beta.kubernetes.io/do-loadbalancer-type: "REGIONAL_NETWORK" # DigitalOcean Network Load Balancer
8+
service.beta.kubernetes.io/do-loadbalancer-size-unit: "1"
9+
# Scale by TCP throughput observed by DigitalOcean Monitoring
10+
doks-lb-scale/metric: "nlb_tcp_network_throughput"
11+
doks-lb-scale/target-per-node: "nlb=45"
12+
doks-lb-scale/hysteresis-percent: "20"
13+
doks-lb-scale/min-nodes: "1"
14+
doks-lb-scale/max-nodes: "50"
15+
doks-lb-scale/scale-down-delay-minutes: "10"
16+
spec:
17+
type: LoadBalancer
18+
selector:
19+
app: whoami-service
20+
ports:
21+
- port: 80
22+
targetPort: 80
