NestJS Configuration Management

How I run @nestjs/config in production. Typed namespaces, Zod validation that fails fast at boot, AWS Secrets Manager rotation, and the test overrides that keep CI honest.

It was a Saturday afternoon at the combat-sports tournament platform I CTO’d in London. A live tournament was being broadcast publicly, federations watching the standings page, and our standings-projector consumer group started rebalancing every thirty seconds. Five pods had max.poll.interval.ms set to 300s. The sixth pod had 60s. Same deployment, different config. The deployment manifest referenced :latest and one pod had pulled a stale image. The whole group rebalanced because of one number.

That’s the story I tell when someone asks why I’m strict about NestJS configuration. Config isn’t plumbing. Config is the API your runtime exposes to itself, and if you don’t treat it like an API, it bites you on a Saturday during a live broadcast.

This is how I run @nestjs/config on production NestJS services, the way I wish someone had shown me before I learned it the hard way.

Typed namespaces, not raw env

The default ConfigService.get<string>('SOME_KEY') is a stringly-typed escape hatch: the key is an unchecked string, and the generic is an assertion, not a check. It gets you to ship. It also lets a typo in AWS_REGION ship right with you. I use registerAs namespaces and read them through a typed accessor, so the rest of the app never sees process.env.

// src/config/database.config.ts
import { registerAs } from '@nestjs/config';
import { z } from 'zod';

const schema = z.object({
  host: z.string().min(1),
  port: z.coerce.number().int().positive().default(5432),
  username: z.string().min(1),
  password: z.string().min(1),
  database: z.string().min(1),
  poolSize: z.coerce.number().int().min(2).max(200).default(20),
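  // z.coerce.boolean() would run Boolean('false') === true, so accept only the literals
  ssl: z
    .enum(['true', 'false'])
    .default('true')
    .transform((v) => v === 'true'),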
});

export type DatabaseConfig = z.infer<typeof schema>;

export default registerAs('database', (): DatabaseConfig => {
  const parsed = schema.safeParse({
    host: process.env.DB_HOST,
    port: process.env.DB_PORT,
    username: process.env.DB_USERNAME,
    password: process.env.DB_PASSWORD,
    database: process.env.DB_NAME,
    poolSize: process.env.DB_POOL_SIZE,
    ssl: process.env.DB_SSL,
  });

  if (!parsed.success) {
    // surface the exact field that failed, then crash. boot must not continue.
    throw new Error(`database config invalid: ${parsed.error.toString()}`);
  }
  return parsed.data;
});

Two things I care about here. First, the schema lives next to the namespace, not in some global validate() function that nobody updates. Second, the namespace throws on construction. NestJS calls this during module init, before any HTTP listener binds. A bad config never reaches the readiness probe.

I split namespaces by domain. database.config.ts, kafka.config.ts, aws.config.ts, auth.config.ts, featureFlags.config.ts. Each one is a small file with one job. When a new dev joins the team, they can read a single namespace and understand exactly which env vars feed it.
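
kafka.config.ts isn't shown here. The following is a hypothetical sketch of its parse step. It skips Zod so it runs with no dependencies. The env var names (KAFKA_BROKERS, KAFKA_GROUP_ID, KAFKA_MAX_POLL_INTERVAL_MS) and the bounds are assumptions. What matters is the shape: read env in one place, coerce, bound-check, and throw with the field name.

```typescript
// Hypothetical kafka namespace parser. Env var names and bounds are assumptions.
export interface KafkaConfig {
  brokers: string[];
  groupId: string;
  maxPollIntervalMs: number;
}

type Env = Record<string, string | undefined>;

function requireString(env: Env, key: string): string {
  const value = env[key]?.trim();
  if (!value) throw new Error(`kafka config invalid: ${key} is required`);
  return value;
}

function intInRange(env: Env, key: string, min: number, max: number, fallback: number): number {
  const raw = env[key]?.trim();
  if (!raw) return fallback; // unset or blank: use the default
  const n = Number(raw);
  if (!Number.isInteger(n) || n < min || n > max) {
    throw new Error(`kafka config invalid: ${key}=${raw} must be an integer in [${min}, ${max}]`);
  }
  return n;
}

export function parseKafkaConfig(env: Env): KafkaConfig {
  const brokers = requireString(env, 'KAFKA_BROKERS')
    .split(',')
    .map((b) => b.trim())
    .filter((b) => b.length > 0);
  if (brokers.length === 0) throw new Error('kafka config invalid: KAFKA_BROKERS has no entries');
  return {
    brokers,
    groupId: requireString(env, 'KAFKA_GROUP_ID'),
    // 300000 ms is the Java client and librdkafka default for max.poll.interval.ms
    maxPollIntervalMs: intInRange(env, 'KAFKA_MAX_POLL_INTERVAL_MS', 10_000, 900_000, 300_000),
  };
}
```

You would wrap it as registerAs('kafka', () => parseKafkaConfig(process.env)). It then plugs into load like any other namespace. A consumer reads it through @Inject(kafkaConfig.KEY), typed as ConfigType<typeof kafkaConfig>, which is the typed accessor @nestjs/config provides. Bounds catch nonsense values. They cannot catch two pods holding different valid ones.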

Fail fast at boot

If config is wrong, the pod should die. Not log a warning. Not fall back to a default. Die. Kubernetes will notice the crash loop, and the rollout will stall (at the canary stage, if you run one). You'll see the bad change in your rollout dashboard before any traffic hits it.

// src/config/config.module.ts
import { Module } from '@nestjs/common';
import { ConfigModule } from '@nestjs/config';
import databaseConfig from './database.config';
import kafkaConfig from './kafka.config';
import awsConfig from './aws.config';
import authConfig from './auth.config';

@Module({
  imports: [
    ConfigModule.forRoot({
      isGlobal: true,
      cache: true,
      load: [databaseConfig, kafkaConfig, awsConfig, authConfig],
      // validation runs per-namespace via Zod. no validationSchema here.
      ignoreEnvFile: process.env.NODE_ENV === 'production',
    }),
  ],
})
export class AppConfigModule {}

Boot validation alone would not have caught the Kafka consumer rebalance loop at the federation platform. Same deployment manifest, two pods carrying different max.poll.interval.ms because one was running a stale image. And 60s would pass any reasonable schema: it was a valid value, just a different one. A hash of the resolved namespace, exposed on a /internal/config-hash endpoint, would have caught it, because our health-check would have shown the drift before traffic hit it. We added that endpoint the week after. It's three lines of code and it has caught two production drifts since.
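
The endpoint itself isn't shown. Here is a minimal sketch of the hashing half, assuming it hashes the already-parsed namespace objects; configHash is a hypothetical name. Keys are sorted at every depth before hashing. Two pods holding the same values therefore always produce the same SHA-256 digest, whatever the property order.

```typescript
import { createHash } from 'node:crypto';

// Serialise with object keys sorted at every depth, so key order can't change the digest.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((item) => canonicalJson(item)).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(obj[key])}`);
    return `{${fields.join(',')}}`;
  }
  // undefined (and functions) have no JSON form; encode them as null so the output stays valid
  return JSON.stringify(value) ?? 'null';
}

// Hypothetical helper: a stable SHA-256 fingerprint of resolved, non-secret namespaces.
export function configHash(namespaces: Record<string, unknown>): string {
  return createHash('sha256').update(canonicalJson(namespaces)).digest('hex');
}
```

In NestJS this would sit behind a small @Controller('internal') with a @Get('config-hash') handler that returns configHash({ kafka }) for the injected namespaces. The health check then compares digests across pods, which is what surfaces drift. Leave secret-bearing namespaces out of the input.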

Secrets belong somewhere else

The first rule I follow: passwords, API tokens and signing keys never live in env vars baked into the deployment. They live in AWS Secrets Manager. The deployment carries only the secret ARN. At boot, a custom config loader fetches the value using the pod's IRSA role and feeds it into the namespace. Secrets Manager decrypts it server-side with KMS, so the pod never handles ciphertext.

// src/config/secrets.loader.ts
import { SecretsManagerClient, GetSecretValueCommand } from '@aws-sdk/client-secrets-manager';

const client = new SecretsManagerClient({ region: process.env.AWS_REGION });
const cache = new Map<string, { value: string; fetchedAt: number }>();
const TTL_MS = 5 * 60 * 1000;

export async function loadSecret(arn: string): Promise<string> {
  const hit = cache.get(arn);
  if (hit && Date.now() - hit.fetchedAt < TTL_MS) return hit.value;

  const res = await client.send(new GetSecretValueCommand({ SecretId: arn }));
  if (!res.SecretString) {
    throw new Error(`secret ${arn} has no SecretString payload`);
  }
  cache.set(arn, { value: res.SecretString, fetchedAt: Date.now() });
  return res.SecretString;
}

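// usage in a namespace: registerAs('auth', buildAuthConfig)
export async function buildAuthConfig() {
  const arn = process.env.AUTH_JWT_SECRET_ARN;
  if (!arn) {
    // name the missing variable instead of letting the SDK fail on an undefined SecretId
    throw new Error('AUTH_JWT_SECRET_ARN is not set');
  }
  const jwtSigningKey = await loadSecret(arn);
  // parse + validate as usual
  return { jwtSigningKey };
}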

I use Secrets Manager rotation for database credentials. Aurora’s managed rotation flips the password on a schedule, my cache TTL is short enough that the next refresh picks up the new value, and the pool re-establishes connections with the new credential within the next minute. Vault works the same way if you’re not on AWS. The point is: rotation must not require a redeploy.
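
loadSecret calls the SDK directly, which makes the rotation path awkward to test. The sketch below is a hypothetical variant; createSecretCache and FetchSecret are illustrative names. It keeps the same TTL logic but injects the fetch and the clock, so you can exercise it without AWS. It also lets concurrent callers on an expired entry share one request, which the original loader doesn't do. One assumption to keep in mind: a TTL cache only helps if the credential is read again when connections are opened. A namespace resolved once at boot holds whatever value it saw then.

```typescript
// Hypothetical variant of loadSecret with the fetch and clock injected so rotation can be tested.
export type FetchSecret = (id: string) => Promise<string>;

export function createSecretCache(
  fetchSecret: FetchSecret,
  ttlMs: number,
  now: () => number = Date.now,
): (id: string) => Promise<string> {
  const values = new Map<string, { value: string; fetchedAt: number }>();
  const inFlight = new Map<string, Promise<string>>();

  return (id: string) => {
    const hit = values.get(id);
    if (hit && now() - hit.fetchedAt < ttlMs) return Promise.resolve(hit.value);

    // Concurrent callers on an expired entry share one request instead of each calling the API.
    const pending = inFlight.get(id);
    if (pending) return pending;

    const request = fetchSecret(id).then(
      (value) => {
        values.set(id, { value, fetchedAt: now() });
        inFlight.delete(id);
        return value;
      },
      (err: unknown) => {
        inFlight.delete(id); // don't cache failures; the next call retries
        throw err;
      },
    );
    inFlight.set(id, request);
    return request;
  };
}
```

In production, fetchSecret would wrap the GetSecretValueCommand call from loadSecret.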

The other rule. A secret never appears in a log line, never in a Datadog tag, never in a Sentry breadcrumb. The Zod namespace marks secret fields and the redactSecretsInLogs() interceptor strips them out before any structured-log middleware runs.
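
Neither the schema marking nor redactSecretsInLogs() is shown. Here is a minimal sketch of the stripping half. It assumes the secret field names have already been collected into a lowercase set, which a schema could supply. It walks the log payload, replaces the values of matching keys and leaves everything else alone. redactSecrets is a hypothetical helper, not the interceptor named above.

```typescript
// Hypothetical redaction pass. The key set stands in for fields a schema would mark as secret.
const REDACTED = '[REDACTED]';

function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (value === null || typeof value !== 'object') return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

// Returns a redacted copy; the input is never mutated. Assumes an acyclic payload.
export function redactSecrets(value: unknown, secretKeys: ReadonlySet<string>): unknown {
  if (Array.isArray(value)) return value.map((item) => redactSecrets(item, secretKeys));
  if (!isPlainObject(value)) return value; // primitives, Dates, class instances pass through as-is
  const out: Record<string, unknown> = {};
  for (const [key, inner] of Object.entries(value)) {
    out[key] = secretKeys.has(key.toLowerCase()) ? REDACTED : redactSecrets(inner, secretKeys);
  }
  return out;
}
```

Because it returns a copy, it can run inside an interceptor or a logger serializer without mutating the object the handler still holds.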

One env per environment

I keep environment isolation strict across local, ci, staging and production. No shared secrets across them. No shared databases.

Production config and emergency config are two different surfaces. The first is what your service boots with. The second is what you push when prod is on fire. They should never share a code path. NestJS is fine for the first. For the second, I use a separate channel: an S3-backed JSON document, signed, fetched on a 30-second interval by every gateway pod, applied without a restart. The remote-config namespace in NestJS is a thin client that polls and validates.
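
The remote-config client isn't shown. Here is a minimal sketch of its apply step, under stated assumptions. The signature is a detached Ed25519 signature fetched alongside the JSON, and the document has a hypothetical { version, flags } shape. The step verifies before parsing, validates the shape, and keeps the last good document whenever anything fails.

```typescript
import { KeyObject, verify } from 'node:crypto';

// Hypothetical document shape for the emergency channel; the real schema isn't shown.
export interface RemoteConfig {
  version: number;
  flags: Record<string, boolean>;
}

function isRemoteConfig(value: unknown): value is RemoteConfig {
  if (value === null || typeof value !== 'object') return false;
  const doc = value as { version?: unknown; flags?: unknown };
  if (!Number.isInteger(doc.version)) return false;
  const flags = doc.flags;
  if (flags === null || typeof flags !== 'object' || Array.isArray(flags)) return false;
  return Object.values(flags).every((flag) => typeof flag === 'boolean');
}

export class RemoteConfigStore {
  private current: RemoteConfig | null = null;
  private readonly publicKey: KeyObject;

  constructor(publicKey: KeyObject) {
    this.publicKey = publicKey;
  }

  get(): RemoteConfig | null {
    return this.current;
  }

  // Applies a fetched document only if the signature verifies, the shape is valid,
  // and its version is newer than the one held. Otherwise the last good document stays live.
  apply(body: Buffer, signature: Buffer): boolean {
    let verified = false;
    try {
      // Ed25519 signs the raw bytes with no separate digest, hence the null algorithm.
      verified = verify(null, body, this.publicKey, signature);
    } catch {
      verified = false;
    }
    if (!verified) return false;

    let parsed: unknown;
    try {
      parsed = JSON.parse(body.toString('utf8'));
    } catch {
      return false;
    }
    if (!isRemoteConfig(parsed)) return false;
    if (this.current !== null && parsed.version <= this.current.version) return false;
    this.current = parsed;
    return true;
  }
}
```

The 30-second poll loop would fetch both objects from S3 and call apply(). A false return just means the pod keeps running on what it already has. The version check is an added assumption: it stops a replayed, older signed document from rolling a flag back.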

Test overrides without monkey-patching

The thing that breaks teams is the temptation to mutate process.env in tests. Don’t. Override the namespace at module-build time.

// test/order.module.spec.ts
import { Test } from '@nestjs/testing';
import { ConfigModule } from '@nestjs/config';
import databaseConfig from '../src/config/database.config';
import { OrderModule } from '../src/order/order.module';

describe('OrderModule', () => {
  it('boots with overridden database config', async () => {
    const moduleRef = await Test.createTestingModule({
      imports: [
        ConfigModule.forFeature(databaseConfig),
        OrderModule,
      ],
    })
      .overrideProvider(databaseConfig.KEY)
      .useValue({
        host: 'localhost',
        port: 5432,
        username: 'test',
        password: 'test',
        database: 'orders_test',
        poolSize: 5,
        ssl: false,
      })
      .compile();

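    // the override replaced the factory, so no DB_* env var was read
    expect(moduleRef.get(databaseConfig.KEY)).toMatchObject({ database: 'orders_test' });

    const app = moduleRef.createNestApplication();
    await app.init();
    await app.close();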
  });
});

Each test gets the namespace it needs, no others. No process.env.DB_HOST = 'localhost' lines that leak across test files. Apart from the overridden provider, CI runs the same module init that production runs.
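
Spelling out every field in every spec gets noisy. A hypothetical fixture helper fixes that; testDatabaseConfig is an illustrative name. It returns a complete config with safe local defaults and takes per-test overrides, so each spec only names the fields it cares about. The type is restated so the sketch stands alone. In the project it would be the DatabaseConfig inferred from the Zod schema.

```typescript
// Restated for a standalone sketch; in the project, import DatabaseConfig from database.config.ts.
export interface DatabaseConfig {
  host: string;
  port: number;
  username: string;
  password: string;
  database: string;
  poolSize: number;
  ssl: boolean;
}

// Hypothetical fixture: safe local defaults, overridden per test. Returns a fresh object each call.
export function testDatabaseConfig(overrides: Partial<DatabaseConfig> = {}): DatabaseConfig {
  return {
    host: 'localhost',
    port: 5432,
    username: 'test',
    password: 'test',
    database: 'orders_test',
    poolSize: 5,
    ssl: false,
    ...overrides,
  };
}
```

In the spec, that becomes .overrideProvider(databaseConfig.KEY).useValue(testDatabaseConfig({ poolSize: 2 })). Because each call returns a fresh object, one test can't mutate a fixture another test depends on.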

Takeaways

Typed namespaces with Zod, one per domain. No raw process.env in feature code.
Validate at boot. A bad config crashes the pod before traffic arrives.
Secrets live in Secrets Manager or Vault. Rotation must not need a redeploy.
Expose a /internal/config-hash endpoint. Config drift between pods is real.
Production config and emergency config are different surfaces. Don’t share a code path.
Override namespaces in tests with overrideProvider. Never mutate process.env.

Thanks for reading. If you’ve got thoughts, send them my way.