When sidecars actually earn their pod slot for TLS, retries, logging, and config sync, and when a plain library beats one.
It was a Wednesday afternoon at the creator-economy platform I worked at, and I was staring at a Rails service with four different HTTP clients pasted into it. Two did retries. One did mTLS. Zero of them logged the way our central pipeline wanted. The fourth was a fork of the third with one line changed and a comment that said “do not remove.” That afternoon was the first time I really pushed for sidecars, and I want to talk about why.
Sidecars and ambassadors sound clever when you first hear them. A small extra container in the pod that handles networking, auth, telemetry, or config sync. Pulled off well, you stop arguing about which HTTP client library to standardize on. Pulled off poorly, you add a second process to every pod that nobody on call understands.
Use a sidecar when the concern is cross-cutting, polyglot, and operationally owned by a different team than the app. Otherwise, use a library and go to lunch.
The rest is why.
The honest list of what sidecars do well, after years of running them on AWS EKS across thousands of pods at the creator platform: TLS termination and mTLS, retries with jitter, circuit breaking, structured outbound request logging, tracing headers, dynamic config refresh, selective traffic mirroring. Roughly the Envoy feature set. The app speaks plain HTTP to localhost. The sidecar speaks production protocol to the world.
Here’s the pod manifest we’d run for an internal Node.js service that needed mTLS to a billing system, with Envoy as the sidecar.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: subscription-api
spec:
  replicas: 3
  selector:
    matchLabels: { app: subscription-api }
  template:
    metadata:
      labels: { app: subscription-api }
    spec:
      containers:
        - name: app
          image: registry/internal/subscription-api@sha256:abc123
          env:
            - name: BILLING_URL
              value: "http://127.0.0.1:15001"
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet: { path: /healthz, port: 8080 }
        - name: envoy
          image: envoyproxy/envoy:v1.29.0
          args: ["-c", "/etc/envoy/envoy.yaml", "--service-cluster", "subscription-api"]
          volumeMounts:
            - { name: envoy-config, mountPath: /etc/envoy }
            - { name: certs, mountPath: /etc/certs, readOnly: true }
          ports:
            - { containerPort: 15001, name: outbound }
            - { containerPort: 9901, name: admin }
      volumes:
        - name: envoy-config
          configMap: { name: subscription-api-envoy }
        - name: certs
          secret: { secretName: billing-mtls-client }
The app stays boring. It calls http://127.0.0.1:15001/charges and the sidecar handles mTLS, retries, and timeouts. The boring is the whole point.
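To make "boring" concrete, here is a minimal sketch of what the app side of that call might look like. The URL and the /charges path come from the manifest and the text above; the payload shape and the function names are assumptions for illustration, and in the real service the base URL would be read from the BILLING_URL env var.

```typescript
// Hypothetical app-side client: plain HTTP to the sidecar on loopback.
// Hardcoded here to keep the sketch self-contained; the manifest injects it
// as the BILLING_URL env var.
const BILLING_URL = 'http://127.0.0.1:15001';

// Assumed payload shape, not taken from the real billing API.
interface ChargeRequest {
  customerId: string;
  amountCents: number;
}

interface PlainHttpRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Pure helper so the request shape can be checked without a network call.
export function buildChargeRequest(charge: ChargeRequest): PlainHttpRequest {
  return {
    url: `${BILLING_URL}/charges`,
    init: {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(charge),
    },
  };
}

// No TLS options, no client certificate, no retry loop: the sidecar owns those.
export async function createCharge(charge: ChargeRequest): Promise<unknown> {
  const { url, init } = buildChargeRequest(charge);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`billing responded ${res.status}`);
  }
  return res.json();
}
```

Nothing in there knows about certificates or back-off, which is the property you are paying the extra container for.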
A trimmed Envoy outbound config for that same pod, with retries and a circuit breaker:
static_resources:
  listeners:
    - name: billing_listener
      address:
        # Loopback only: the app container is the sole intended client.
        socket_address: { address: 127.0.0.1, port_value: 15001 }
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: billing
                route_config:
                  virtual_hosts:
                    - name: billing
                      domains: ["*"]
                      routes:
                        - match: { prefix: "/" }
                          route:
                            cluster: billing_upstream
                            retry_policy:
                              retry_on: "5xx,gateway-error,reset"
                              num_retries: 3
                              per_try_timeout: 1.5s
                              retry_back_off:
                                base_interval: 0.05s
                                max_interval: 1s
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
    - name: billing_upstream
      type: STRICT_DNS
      connect_timeout: 0.25s
      circuit_breakers:
        thresholds:
          - max_connections: 200
            max_pending_requests: 100
            max_retries: 50
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          common_tls_context:
            tls_certificates:
              - certificate_chain: { filename: /etc/certs/tls.crt }
                private_key: { filename: /etc/certs/tls.key }
      load_assignment:
        cluster_name: billing_upstream
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: billing.internal, port_value: 443 }
Argue the retry budget if you like. You can’t argue that pasting that logic into every service’s HTTP client is fun.
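If you do want to argue the retry budget, the arithmetic is short. This is a sketch, assuming Envoy's documented back-off behaviour: for retry N the delay is drawn from [0, (2^N - 1) × base_interval), capped at max_interval. The type and function names are mine, not Envoy's, and the worst case ignores connect time.

```typescript
// Mirrors the retry_policy fields from the config above, in milliseconds.
interface RetryPolicy {
  numRetries: number;
  perTryTimeoutMs: number;
  baseIntervalMs: number;
  maxIntervalMs: number;
}

// Upper bound of the jittered delay before each retry (retry N is 1-based).
export function backoffUpperBoundsMs(p: RetryPolicy): number[] {
  const bounds: number[] = [];
  for (let n = 1; n <= p.numRetries; n++) {
    bounds.push(Math.min((2 ** n - 1) * p.baseIntervalMs, p.maxIntervalMs));
  }
  return bounds;
}

// Worst case: every attempt burns its full per-try timeout and every
// back-off lands at its upper bound.
export function worstCaseLatencyMs(p: RetryPolicy): number {
  const attempts = 1 + p.numRetries;
  const backoff = backoffUpperBoundsMs(p).reduce((sum, b) => sum + b, 0);
  return attempts * p.perTryTimeoutMs + backoff;
}

const billingPolicy: RetryPolicy = {
  numRetries: 3,
  perTryTimeoutMs: 1500,
  baseIntervalMs: 50,
  maxIntervalMs: 1000,
};

backoffUpperBoundsMs(billingPolicy); // → [50, 150, 350]
worstCaseLatencyMs(billingPolicy);   // → 6550
```

So a caller of this route can wait roughly 6.5 seconds before seeing a failure. Whether that is acceptable for a billing call is the part worth arguing about.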
Dapr is the other one I’ve actually run, on a Node.js side product I CTO. It’s a sidecar too, but the contract is different. Instead of a generic L7 proxy, Dapr exposes a small HTTP/gRPC API for pub/sub, state, secrets, service-to-service calls. Your app calls http://localhost:3500/v1.0/publish/... and Dapr decides whether that goes to Redis, Kafka, RabbitMQ, or AWS SNS based on a YAML file.
import got from 'got';
import pino from 'pino';

// Assumption: the logger was not shown in the original snippet. Any logger
// with a pino-style warn(obj, msg) signature fits here.
const logger = pino();

const DAPR_URL = `http://127.0.0.1:${process.env.DAPR_HTTP_PORT ?? 3500}`;

export async function publishOrderCreated(event: {
  orderId: string;
  customerId: string;
  totalCents: number;
}) {
  try {
    await got.post(`${DAPR_URL}/v1.0/publish/orders-pubsub/orders.created`, {
      json: event,
      timeout: { request: 2000 },
      retry: { limit: 0 }, // no client-side retries; the sidecar owns them
    });
  } catch (err) {
    // Dapr already retried per its component config. If we got here, log
    // and let the outbox table catch the failure on the next sweep.
    logger.warn({ err, orderId: event.orderId }, 'dapr publish failed');
    throw err;
  }
}
What I like about Dapr is that swapping the broker is a YAML change, not a code change. What I don’t like is that you’ve added a process to the pod whose semantics your app team doesn’t fully own. When Dapr’s sidecar restarts mid-publish, your 2xx from localhost does not mean the message landed in Kafka. People learn that the hard way.
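The publish code mentions an outbox sweep, so here is a minimal in-memory sketch of one. The row shape, the `Publish` type, and `sweepOutbox` are all hypothetical; a real outbox is a database table written in the same transaction as the order. Note what it does and does not cover: it retries failures the app can see, but it cannot detect an acknowledgement that was wrong.

```typescript
// Hypothetical outbox row; in a real service this is a database table.
interface OutboxRow {
  id: number;
  topic: string;
  payload: unknown;
  publishedAt: Date | null;
  attempts: number;
}

// Anything that publishes one message, e.g. a wrapper around the Dapr call.
type Publish = (topic: string, payload: unknown) => Promise<void>;

// One sweep: try every unpublished row, mark successes, count failures.
// Delivery is at-least-once, so consumers must tolerate duplicates.
export async function sweepOutbox(
  rows: OutboxRow[],
  publish: Publish,
  now: () => Date = () => new Date(),
): Promise<{ published: number; failed: number }> {
  let published = 0;
  let failed = 0;
  for (const row of rows) {
    if (row.publishedAt !== null) continue; // already delivered
    row.attempts += 1;
    try {
      await publish(row.topic, row.payload);
      row.publishedAt = now();
      published += 1;
    } catch {
      failed += 1; // left unpublished; the next sweep retries it
    }
  }
  return { published, failed };
}
```

The sweep would typically run on a timer, with `publish` pointed at the same sidecar endpoint the request path uses.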
Kubernetes pods are how this pattern lives in practice. Same network namespace, same lifecycle, separate processes. Anything in the pod shares fate with the app. “Shares fate” matters more than people think.
If the sidecar OOMs, your app’s request fails. If the sidecar’s readiness fails, your app gets no traffic. If the sidecar leaks file descriptors until it can’t open new connections, your app’s outbound calls fail with it. Operating sidecars at scale means treating the sidecar as part of the app’s SLO.
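A quick way to see what "part of the app's SLO" costs: when every request has to pass through both containers, their availabilities multiply. The sketch below assumes independent failures, and the 99.95% figures are made up for illustration, not measurements from our fleet.

```typescript
// Serial dependency: a request succeeds only if every container on its path
// is up, so availabilities multiply (assuming independent failures).
export function compositeAvailability(parts: number[]): number {
  return parts.reduce((acc, a) => acc * a, 1);
}

// Expected downtime in minutes over a 30-day window.
export function monthlyDowntimeMinutes(availability: number): number {
  return (1 - availability) * 30 * 24 * 60;
}

// Hypothetical numbers: app and sidecar each at 99.95%.
const appOnly = 0.9995;
const appPlusSidecar = compositeAvailability([0.9995, 0.9995]);

console.log(monthlyDowntimeMinutes(appOnly).toFixed(1));        // about 21.6
console.log(monthlyDowntimeMinutes(appPlusSidecar).toFixed(1)); // about 43.2
```

Two components that each look fine on their own dashboards roughly double the pod's downtime budget burn once they share fate.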
We learned this when a routine Envoy config push had a typo, the sidecar refused to start, and a few hundred pods cycled into CrashLoopBackOff. The platform team’s rollout assumed the sidecar version was independent of the app. It wasn’t. We added a pre-rollout config validator the next week.
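I won't reproduce our validator here, but the idea fits in a few lines. This is a hypothetical sketch, not what we shipped: it takes an already-parsed static config and checks that every route points at a cluster that exists, which is the kind of typo that keeps a proxy from starting. The interface covers only the fields the check reads. Envoy can also check a file itself with `envoy --mode validate -c <file>`, which is worth running in CI alongside anything like this.

```typescript
// Minimal shape of the parsed config; only the fields this check reads.
interface RouteEntry {
  route?: { cluster?: string };
}

interface EnvoyConfig {
  static_resources?: {
    listeners?: Array<{
      name?: string;
      filter_chains?: Array<{
        filters?: Array<{
          typed_config?: {
            route_config?: { virtual_hosts?: Array<{ routes?: RouteEntry[] }> };
          };
        }>;
      }>;
    }>;
    clusters?: Array<{ name?: string }>;
  };
}

// Returns human-readable problems; an empty list means the check passed.
export function validateEnvoyConfig(cfg: EnvoyConfig): string[] {
  const res = cfg.static_resources;
  if (!res) return ['missing static_resources'];

  const errors: string[] = [];
  const listeners = res.listeners ?? [];
  const clusterNames = new Set((res.clusters ?? []).map((c) => c.name));
  if (listeners.length === 0) errors.push('no listeners defined');

  for (const listener of listeners) {
    for (const chain of listener.filter_chains ?? []) {
      for (const filter of chain.filters ?? []) {
        const hosts = filter.typed_config?.route_config?.virtual_hosts ?? [];
        for (const host of hosts) {
          for (const entry of host.routes ?? []) {
            const target = entry.route?.cluster;
            if (target && !clusterNames.has(target)) {
              errors.push(
                `listener ${listener.name ?? '?'} routes to unknown cluster ${target}`,
              );
            }
          }
        }
      }
    }
  }
  return errors;
}
```

Wire it into the rollout so a non-empty result blocks the config push before any pod sees it.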
Sidecars are good at cross-cutting concerns. They’re useless when the root cause is inside the process you’ve put them next to.
Single-language shop, small team, no platform team. You’ll spend less time writing a thin HttpClient wrapper with retries, timeouts, and structured logging than you will operating Envoy. Most of the value of a sidecar comes from polyglot or platform-team-owned scenarios. One runtime, one team, just write the library.
On a community and talent product I CTO on the side, the whole backend is NestJS. One HttpClientService with axios under the hood, retry interceptors, tracing header injector, Datadog log lines. No sidecar. One container, one process. Done.
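For scale, here is roughly how small that library can be. This is a sketch, not our HttpClientService: it swaps axios for the built-in fetch (Node 18+) to stay dependency-free, handles GET only because blind retries of non-idempotent calls are dangerous, and every name in it is hypothetical.

```typescript
// Narrow fetch-like type so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init?: { method?: string; signal?: AbortSignal },
) => Promise<{ ok: boolean; status: number }>;

interface ClientOptions {
  retries: number;      // retries after the first attempt
  timeoutMs: number;    // per-attempt timeout
  baseDelayMs: number;  // base for exponential back-off
  fetchImpl?: FetchLike;
  log?: (line: Record<string, unknown>) => void;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function getWithRetry(url: string, opts: ClientOptions) {
  const doFetch: FetchLike = opts.fetchImpl ?? fetch;
  const log =
    opts.log ?? ((line: Record<string, unknown>) => console.log(JSON.stringify(line)));
  let lastError: unknown;

  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), opts.timeoutMs);
    const started = Date.now();
    try {
      const res = await doFetch(url, { method: 'GET', signal: controller.signal });
      log({ url, attempt, status: res.status, ms: Date.now() - started });
      // Retry only on 5xx; anything else goes back to the caller as-is.
      if (res.status < 500) return res;
      lastError = new Error(`upstream responded ${res.status}`);
    } catch (err) {
      log({ url, attempt, error: String(err), ms: Date.now() - started });
      lastError = err;
    } finally {
      clearTimeout(timer);
    }
    if (attempt < opts.retries) {
      // Full jitter: uniform in [0, baseDelayMs * 2^attempt).
      await sleep(Math.random() * opts.baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}
```

That is the whole operational surface: one function, one process, and nothing extra for on-call to learn.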
Thanks for reading. If you’ve got thoughts, send them my way.