Sidecar and Ambassador Patterns

When sidecars actually earn their pod slot for TLS, retries, logging, and config sync, and when a plain library beats one.

It was a Wednesday afternoon at the creator-economy platform I worked at, and I was staring at a Rails service with four different HTTP clients pasted into it. Two did retries. One did mTLS. Zero of them logged the way our central pipeline wanted. The fourth was a fork of the third with one line changed and a comment that said "do not remove". That afternoon was the first time I really pushed for sidecars, and I want to talk about why.

Sidecars and ambassadors sound clever when you first hear them. A small extra container in the pod that handles networking, auth, telemetry, or config sync. Pulled off well, you stop arguing about which HTTP client library to standardize on. Pulled off poorly, you add a second process to every pod that nobody on call understands.

My one-line stance on sidecars

Use a sidecar when the concern is cross-cutting, polyglot, and operationally owned by a different team than the app. Otherwise, use a library and go to lunch.

The rest is why.

What goes in the sidecar

The honest list, after years running them on AWS EKS across thousands of pods at the creator platform: TLS termination and mTLS, retries with jitter, circuit breaking, structured outbound request logging, tracing headers, dynamic config refresh, selective traffic mirroring. Roughly the Envoy feature set. The app speaks plain HTTP to localhost. The sidecar speaks production protocol to the world.

Here’s the Deployment manifest we’d run for an internal Node.js service that needed mTLS to a billing system, with Envoy as the sidecar.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: subscription-api
spec:
  replicas: 3
  selector:
    matchLabels: { app: subscription-api }
  template:
    metadata:
      labels: { app: subscription-api }
    spec:
      containers:
        - name: app
          image: registry/internal/subscription-api@sha256:abc123
          env:
            - name: BILLING_URL
              value: "http://127.0.0.1:15001"
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet: { path: /healthz, port: 8080 }
        - name: envoy
          image: envoyproxy/envoy:v1.29.0  # tag shown for readability; pin by digest, as with the app image
          args: ["-c", "/etc/envoy/envoy.yaml", "--service-cluster", "subscription-api"]
          volumeMounts:
            - { name: envoy-config, mountPath: /etc/envoy }
            - { name: certs, mountPath: /etc/certs, readOnly: true }
          ports:
            - { containerPort: 15001, name: outbound }
            - { containerPort: 9901, name: admin }
      volumes:
        - name: envoy-config
          configMap: { name: subscription-api-envoy }
        - name: certs
          secret: { secretName: billing-mtls-client }

The app stays boring. It calls http://127.0.0.1:15001/charges and the sidecar handles mTLS, retries, and timeouts. Boring is the whole point.

A trimmed Envoy outbound config for that same pod, with retries and a circuit breaker:

static_resources:
  listeners:
    - name: billing_listener
      address:
        socket_address: { address: 127.0.0.1, port_value: 15001 }  # loopback only; the app is the sole caller
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: billing
                route_config:
                  virtual_hosts:
                    - name: billing
                      domains: ["*"]
                      routes:
                        - match: { prefix: "/" }
                          route:
                            cluster: billing_upstream
                            retry_policy:
                              retry_on: "5xx,gateway-error,reset"
                              num_retries: 3
                              per_try_timeout: 1.5s
                              retry_back_off:
                                base_interval: 0.05s
                                max_interval: 1s
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
    - name: billing_upstream
      type: STRICT_DNS
      connect_timeout: 0.25s
      circuit_breakers:
        thresholds:
          - max_connections: 200
            max_pending_requests: 100
            max_retries: 50
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
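          sni: billing.internal
          common_tls_context:
            tls_certificates:
              - certificate_chain: { filename: /etc/certs/tls.crt }
                private_key: { filename: /etc/certs/tls.key }
            # Trimmed: a real config also needs a validation_context with the
            # billing CA so Envoy verifies the server certificate.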
      load_assignment:
        cluster_name: billing_upstream
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: billing.internal, port_value: 443 }

Argue the retry budget if you like. You can’t argue that pasting that logic into every service’s HTTP client is fun.
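To put numbers on that retry policy: Envoy’s documentation describes its default back-off as fully jittered exponential, where the Nth retry waits a random time in [0, (2^N - 1) x base_interval), capped at max_interval. Here is a rough sketch of the worst case for the route above. backoffCeilingsMs is a name I made up for illustration, and the formula is my reading of the docs, so check it against the Envoy version you run.

```typescript
// Worst-case back-off before each retry, assuming Envoy's documented
// fully jittered exponential back-off: the Nth retry waits a random time
// in [0, (2^N - 1) * base_interval), capped at max_interval.
// backoffCeilingsMs is a hypothetical helper, not part of Envoy.
export function backoffCeilingsMs(
  baseMs: number,
  maxMs: number,
  numRetries: number,
): number[] {
  const ceilings: number[] = [];
  for (let n = 1; n <= numRetries; n++) {
    ceilings.push(Math.min((2 ** n - 1) * baseMs, maxMs));
  }
  return ceilings;
}

// Values from the route above: base_interval 0.05s, max_interval 1s, num_retries 3.
const ceilings = backoffCeilingsMs(50, 1000, 3); // [50, 150, 350]

// Four attempts (1 original + 3 retries) at per_try_timeout 1.5s each,
// plus the worst-case waits in between.
export const worstCaseMs = 4 * 1500 + ceilings.reduce((a, b) => a + b, 0); // 6550
```

So a caller can wait about 6.5 seconds before it sees a failure. That is the number to hold up against the caller’s own timeout when you argue the budget.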

Where Dapr fits in this picture

Dapr is the other one I’ve actually run, on a Node.js side product I CTO. It’s a sidecar too, but the contract is different. Instead of a generic L7 proxy, Dapr exposes a small HTTP/gRPC API for pub/sub, state, secrets, service-to-service calls. Your app calls http://localhost:3500/v1.0/publish/... and Dapr decides whether that goes to Redis, Kafka, RabbitMQ, or AWS SNS based on a YAML file.

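import got from 'got';
import pino from 'pino';

// Assumption: a pino-style logger; swap in whatever your service already uses.
const logger = pino();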

const DAPR_URL = `http://127.0.0.1:${process.env.DAPR_HTTP_PORT ?? 3500}`;

export async function publishOrderCreated(event: {
  orderId: string;
  customerId: string;
  totalCents: number;
}) {
  try {
    await got.post(`${DAPR_URL}/v1.0/publish/orders-pubsub/orders.created`, {
      json: event,
      timeout: { request: 2000 },
      retry: { limit: 0 },
    });
  } catch (err) {
    // Dapr already retried per its component config. If we got here, log
    // and let the outbox table catch the failure on the next sweep.
    logger.warn({ err, orderId: event.orderId }, 'dapr publish failed');
    throw err;
  }
}

What I like about Dapr is that swapping the broker is a YAML change, not a code change. What I don’t like is that you’ve added a process to the pod whose semantics your app team doesn’t fully own. When Dapr’s sidecar restarts mid-publish, a 2xx from localhost does not mean the message landed in Kafka. People learn that the hard way.
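The comment in the publish code leans on an outbox table to catch what the sidecar drops. Here is a minimal in-memory sketch of that sweep. OutboxRow, sweepOutbox, and the row shape are hypothetical names for illustration; a real version would read and update rows in your database.

```typescript
// Hypothetical outbox row: written in the same DB transaction as the order,
// marked sent only after the publish call succeeds.
export interface OutboxRow {
  id: string;
  payload: unknown;
  sentAt: Date | null;
  attempts: number;
}

// Re-publishes every unsent row. `publish` is injected so the sweep does not
// care whether it talks to Dapr, Kafka, or a test double.
export async function sweepOutbox(
  rows: OutboxRow[],
  publish: (payload: unknown) => Promise<void>,
  now: () => Date = () => new Date(),
): Promise<{ sent: number; failed: number }> {
  let sent = 0;
  let failed = 0;
  for (const row of rows) {
    if (row.sentAt !== null) continue; // already delivered
    row.attempts += 1;
    try {
      await publish(row.payload);
      row.sentAt = now();
      sent += 1;
    } catch {
      // Leave sentAt null; the next sweep retries this row.
      failed += 1;
    }
  }
  return { sent, failed };
}
```

A row can be published and then fail before it is marked sent, so this gives at-least-once delivery. Consumers need to dedupe on something like orderId.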

Multi-container pods, the day-to-day

Kubernetes pods are how this pattern lives in practice. Same network namespace, same lifecycle, separate processes. Anything in the pod shares fate with the app, and fate-sharing matters more than people think.

If the sidecar OOMs, your app’s request fails. If the sidecar’s readiness fails, your app gets no traffic. If the sidecar leaks file descriptors, its connections start failing and your requests fail with them. Operating sidecars at scale means treating the sidecar as part of the app’s SLO.
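One way to make that fate-sharing explicit is to have the app’s health endpoint ask the sidecar too. A small sketch under some assumptions: Envoy’s admin interface is listening on 9901 as in the manifest and serves its /ready endpoint there, and podReady and httpProbe are names invented here. A readinessProbe on the sidecar container would be the other way to get the same effect.

```typescript
// A probe resolves to true when its dependency is usable.
type Probe = () => Promise<boolean>;

// Minimal fetch-like signature so a test double can stand in for global fetch.
type Fetcher = (url: string) => Promise<{ ok: boolean }>;

// Builds a probe from an HTTP endpoint; any network error counts as not ready.
export function httpProbe(url: string, fetcher: Fetcher = fetch): Probe {
  return async () => {
    try {
      return (await fetcher(url)).ok;
    } catch {
      return false;
    }
  };
}

// Reports ready only when every probe passes, and names the ones that did not.
export async function podReady(
  probes: Record<string, Probe>,
): Promise<{ ready: boolean; failing: string[] }> {
  const names = Object.keys(probes);
  const results = await Promise.all(names.map((name) => probes[name]()));
  const failing = names.filter((_, i) => !results[i]);
  return { ready: failing.length === 0, failing };
}
```

In a /healthz handler this might be called as podReady({ app: async () => true, envoy: httpProbe('http://127.0.0.1:9901/ready') }), returning a 503 when ready is false.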

We learned this when a routine Envoy config push had a typo, the sidecar refused to start, and a few hundred pods cycled into CrashLoopBackOff. The platform team’s rollout assumed the sidecar version was independent of the app. It wasn’t. We added a pre-rollout config validator the next week.
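A pre-rollout check along those lines can be quite small. The sketch below is not the validator we shipped. It assumes an envoy binary on the CI runner’s PATH and uses Envoy’s --mode validate flag, which loads a config without starting the server. validateConfigs and the Runner type are hypothetical names.

```typescript
import { spawnSync } from 'node:child_process';

// Result of running a command: exit status (null if it never ran) and stderr text.
export interface RunResult {
  status: number | null;
  stderr: string;
}

export type Runner = (cmd: string, args: string[]) => RunResult;

// Default runner shells out to a local envoy binary (assumed to be on PATH).
const runEnvoy: Runner = (cmd, args) => {
  const res = spawnSync(cmd, args, { encoding: 'utf8' });
  return { status: res.status, stderr: res.stderr ?? '' };
};

// Runs `envoy --mode validate -c <file>` for each config and collects failures.
// The runner is injectable so the logic can be tested without Envoy installed.
export function validateConfigs(
  files: string[],
  run: Runner = runEnvoy,
): { file: string; error: string }[] {
  const failures: { file: string; error: string }[] = [];
  for (const file of files) {
    const { status, stderr } = run('envoy', ['--mode', 'validate', '-c', file]);
    if (status !== 0) {
      failures.push({ file, error: stderr.trim() || `exit status ${status}` });
    }
  }
  return failures;
}
```

Wire it in as a CI step that fails the pipeline when the returned list is non-empty, before any ConfigMap reaches the cluster.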

When a sidecar is the wrong call

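Sidecars are good at cross-cutting concerns. They’re useless when the root cause is inside the process you’ve put them next to. Envoy retries won’t rescue a handler that blocks the event loop or a query that holds a lock for ten seconds; they just repeat the slow call.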

When a library actually beats a sidecar

Single-language shop, small team, no platform team. You’ll spend less time writing a thin HttpClient wrapper with retries, timeouts, and structured logging than you will operating Envoy. Most of the value of a sidecar comes from polyglot or platform-team-owned scenarios. One runtime, one team, just write the library.

On a community and talent product I CTO on the side, the whole backend is NestJS. One HttpClientService with axios under the hood, retry interceptors, tracing header injector, Datadog log lines. No sidecar. One container, one process. Done.
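For a sense of scale, here is roughly what the library version of retries, timeouts, a tracing header, and structured log lines looks like. This is a sketch, not that product’s HttpClientService: it takes a fetch-like function instead of axios so it stays self-contained, and getWithRetry, FetchLike, and the x-request-id header choice are all assumptions.

```typescript
// Minimal fetch-like signature; global fetch (Node 18+) satisfies it.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string>; signal: AbortSignal },
) => Promise<{ status: number }>;

export interface RetryOptions {
  retries: number; // extra attempts after the first
  timeoutMs: number; // per-attempt timeout
  baseDelayMs: number; // back-off base
  traceId: string; // propagated to the upstream
  log?: (line: Record<string, unknown>) => void;
  sleep?: (ms: number) => Promise<void>;
}

// GET with per-attempt timeout, retries on 5xx or network errors,
// full-jitter back-off, and one structured log line per attempt.
export async function getWithRetry(
  fetchImpl: FetchLike,
  url: string,
  opts: RetryOptions,
): Promise<{ status: number }> {
  const log =
    opts.log ?? ((line: Record<string, unknown>) => console.log(JSON.stringify(line)));
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), opts.timeoutMs);
    try {
      const res = await fetchImpl(url, {
        headers: { 'x-request-id': opts.traceId },
        signal: controller.signal,
      });
      log({ url, attempt, status: res.status });
      if (res.status < 500) return res; // only 5xx responses are retried
      lastError = new Error(`upstream returned ${res.status}`);
    } catch (err) {
      log({ url, attempt, error: String(err) });
      lastError = err;
    } finally {
      clearTimeout(timer);
    }
    if (attempt < opts.retries) {
      // Full jitter: random wait in [0, baseDelayMs * 2^attempt).
      await sleep(Math.random() * opts.baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}
```

It only covers GET, where a retry is safe to repeat. Roughly fifty lines, one file, no extra container to page someone about.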

Takeaways

Sidecars earn their slot when the concern is cross-cutting and either polyglot or platform-owned. Otherwise write a library.
Envoy is the right answer when you want generic L7 features: mTLS, retries, circuit breaking, traffic shifting.
Dapr is the right answer when you want pluggable infrastructure abstractions and you accept the cost of a process boundary in the publish path.
Treat the sidecar as part of the app’s SLO. Its crash is your crash. Its readiness is your readiness.
A sidecar will not fix a bug whose root cause is inside your app process. Don’t reach for one as a magic wrapper.
Pin image SHAs on every sidecar. Treat sidecar config rollouts like app deploys.

Thanks for reading. If you’ve got thoughts, send them my way.