How I run multiple NestJS services in an Nx monorepo with shared libs, affected builds, tag-enforced boundaries, and per-app deploys, plus the production incidents that shaped the setup.
A Tuesday afternoon, three NestJS services in one repo, and a PR that touched a “shared” helper. CI green. Deploy green. Half an hour later one consumer was running yesterday’s binary because nobody redeployed it after the helper changed. Sound familiar? The helper wasn’t really shared. It was three drifting copies.
That’s the version of the problem an Nx monorepo solves, if you set it up like you mean it. I’ve run NestJS monorepos like this at a community and talent product I CTO on the side, and at a logistics-sector hiring platform I CTO on the side. Same shape both times: a few Nest apps, a thick layer of shared libs, and a CI that only rebuilds and redeploys what the diff touches.
Here’s the setup, and the parts I had to learn the hard way.
pnpm workspaces give you hoisting and a tidy pnpm-lock.yaml. They do not give you a dependency graph, affected detection, or remote caching. For two services and one lib you don’t need any of that. For five or six Nest apps and a couple dozen libs, you absolutely do.
The fastest payoff is nx affected. CI doesn’t lint, test, build, or deploy anything you didn’t touch. Changed one route in the billing service, only billing’s pipeline runs. I’ve seen PR feedback drop from 14 minutes to under 3 just by switching from “build everything” to affected. Boring win, but you feel it every day.
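To make "affected" concrete, here is a minimal sketch of the idea: walk the project graph backwards from whatever changed and collect everything that depends on it. This is a simplified model, not Nx's actual implementation. The `ProjectGraph` shape, the `affectedProjects` function, and the project names in the sample graph are illustrative assumptions.

```typescript
// Simplified model of affected detection: a project is affected when it
// changed directly or transitively depends on a project that changed.
// ProjectGraph and affectedProjects are illustrative names, not Nx APIs.
type ProjectGraph = Record<string, string[]>; // project -> projects it depends on

function affectedProjects(graph: ProjectGraph, changed: string[]): string[] {
  // Invert the edges so we can walk from a changed project to its dependents.
  const dependents = new Map<string, string[]>();
  for (const [project, deps] of Object.entries(graph)) {
    for (const dep of deps) {
      const list = dependents.get(dep) ?? [];
      list.push(project);
      dependents.set(dep, list);
    }
  }

  // Breadth-first walk over dependents, starting from the changed projects.
  const affected = new Set<string>();
  const queue = [...changed];
  while (queue.length > 0) {
    const current = queue.shift()!;
    if (affected.has(current)) continue;
    affected.add(current);
    queue.push(...(dependents.get(current) ?? []));
  }
  return Array.from(affected).sort();
}

// Hypothetical graph loosely shaped like the repo layout in this post.
const graph: ProjectGraph = {
  'api-gateway': ['shared-logging'],
  'billing-service': ['billing-feature-invoices', 'shared-logging'],
  'notifications-service': ['notifications-domain', 'shared-logging'],
  'billing-feature-invoices': ['billing-data-access', 'billing-domain'],
  'billing-data-access': ['billing-domain'],
  'billing-domain': [],
  'notifications-domain': [],
  'shared-logging': [],
};
```

With this graph, a change to `billing-domain` only reaches the billing libs and `billing-service`, while a change to `shared-logging` reaches all three apps. That asymmetry is the whole reason the pipeline gets faster.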
The less-boring win is module boundary enforcement. More on that below.
apps/
  api-gateway/
  billing-service/
  notifications-service/
libs/
  shared/
    config/
    logging/
    errors/
  billing/
    domain/
    data-access/
    feature-invoices/
  notifications/
    domain/
    data-access/
nx.json
tsconfig.base.json
Each apps/* is a Nest app with its own main.ts, app.module.ts, and project.json. Each libs/* is a TS library Nest imports the same way it imports @nestjs/common. Path aliases in tsconfig.base.json make it boring:
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "commonjs",
    "moduleResolution": "node",
    "strict": true,
    "experimentalDecorators": true,
    "emitDecoratorMetadata": true,
    "baseUrl": ".",
    "paths": {
      "@org/shared/config": ["libs/shared/config/src/index.ts"],
      "@org/shared/logging": ["libs/shared/logging/src/index.ts"],
      "@org/shared/errors": ["libs/shared/errors/src/index.ts"],
      "@org/billing/domain": ["libs/billing/domain/src/index.ts"],
      "@org/billing/data-access": ["libs/billing/data-access/src/index.ts"],
      "@org/billing/feature-invoices": ["libs/billing/feature-invoices/src/index.ts"],
      "@org/notifications/domain": ["libs/notifications/domain/src/index.ts"]
    }
  }
}
Two rules I won’t bend on. Apps never import from other apps. If api-gateway needs something from billing-service, that something belongs in a lib. And libs are typed in layers: domain knows nothing, data-access knows domain, feature-* knows both. Tags enforce both at lint time so I don’t have to police it in PR reviews.
Nx tags are the part I wish I’d known about earlier. Each project declares what it is (type:domain, type:data-access, type:feature, type:app) and what context it belongs to (scope:billing, scope:notifications, scope:shared). Then a single ESLint rule in eslint.config.mjs decides who’s allowed to import who.
{
  "name": "billing-domain",
  "sourceRoot": "libs/billing/domain/src",
  "projectType": "library",
  "tags": ["type:domain", "scope:billing"],
  "targets": {
    "lint": { "executor": "@nx/eslint:lint" },
    "test": {
      "executor": "@nx/jest:jest",
      "options": { "jestConfig": "libs/billing/domain/jest.config.ts" }
    }
  }
}
// eslint.config.mjs (excerpt)
import nx from '@nx/eslint-plugin';

export default [
  ...nx.configs['flat/typescript'],
  {
    files: ['**/*.ts'],
    rules: {
      '@nx/enforce-module-boundaries': [
        'error',
        {
          enforceBuildableLibDependency: true,
          allow: [],
          depConstraints: [
            // Layering: each type may only import the layers beneath it.
            { sourceTag: 'type:app', onlyDependOnLibsWithTags: ['type:feature', 'type:data-access', 'type:domain', 'scope:shared'] },
            { sourceTag: 'type:feature', onlyDependOnLibsWithTags: ['type:data-access', 'type:domain', 'scope:shared'] },
            { sourceTag: 'type:data-access', onlyDependOnLibsWithTags: ['type:domain', 'scope:shared'] },
            { sourceTag: 'type:domain', onlyDependOnLibsWithTags: ['type:domain', 'scope:shared'] },
            // Scoping: a bounded context may only import itself and shared.
            { sourceTag: 'scope:billing', onlyDependOnLibsWithTags: ['scope:billing', 'scope:shared'] },
            { sourceTag: 'scope:notifications', onlyDependOnLibsWithTags: ['scope:notifications', 'scope:shared'] }
          ]
        }
      ]
    }
  }
];
This is the rule that pays you back forever. A junior on my team once tried to import @org/billing/data-access from notifications-service. Lint failed in the IDE before they hit save. We had a 5-minute conversation about why, and that was the last time anyone tried it.
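If the way two constraints combine feels abstract, this sketch models the check for a single import: every constraint whose sourceTag appears on the importing project has to be satisfied by at least one tag on the imported lib. It is a simplified model of the behaviour, not the plugin's code. `violatedConstraints` is a hypothetical helper, and the sketch ignores details such as how untagged projects are treated.

```typescript
// Simplified model of how tag constraints combine for one import.
// violatedConstraints is a hypothetical helper, not part of @nx/eslint-plugin.
interface DepConstraint {
  sourceTag: string;
  onlyDependOnLibsWithTags: string[];
}

const depConstraints: DepConstraint[] = [
  { sourceTag: 'type:app', onlyDependOnLibsWithTags: ['type:feature', 'type:data-access', 'type:domain', 'scope:shared'] },
  { sourceTag: 'type:feature', onlyDependOnLibsWithTags: ['type:data-access', 'type:domain', 'scope:shared'] },
  { sourceTag: 'type:data-access', onlyDependOnLibsWithTags: ['type:domain', 'scope:shared'] },
  { sourceTag: 'type:domain', onlyDependOnLibsWithTags: ['type:domain', 'scope:shared'] },
  { sourceTag: 'scope:billing', onlyDependOnLibsWithTags: ['scope:billing', 'scope:shared'] },
  { sourceTag: 'scope:notifications', onlyDependOnLibsWithTags: ['scope:notifications', 'scope:shared'] },
];

// Returns every constraint that a source -> target import would break.
// An empty array means the import is allowed under this model.
function violatedConstraints(
  sourceTags: string[],
  targetTags: string[],
  constraints: DepConstraint[],
): DepConstraint[] {
  return constraints
    // Only constraints that apply to the importing project matter.
    .filter((c) => sourceTags.includes(c.sourceTag))
    // A constraint is broken when the target carries none of its allowed tags.
    .filter((c) => !c.onlyDependOnLibsWithTags.some((tag) => targetTags.includes(tag)));
}

// notifications-service importing @org/billing/data-access:
// the type rule passes, the scope rule does not.
const crossScope = violatedConstraints(
  ['type:app', 'scope:notifications'],
  ['type:data-access', 'scope:billing'],
  depConstraints,
);

// billing feature lib importing the billing domain lib: allowed.
const sameScope = violatedConstraints(
  ['type:feature', 'scope:billing'],
  ['type:domain', 'scope:billing'],
  depConstraints,
);
```

The point of tagging on two axes is visible in `crossScope`: the layering rule alone would have let that import through, and only the scope rule stops it.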
The app itself looks like a normal Nest app. Boring on purpose.
// apps/billing-service/src/app/app.module.ts
import { Module } from '@nestjs/common';
import { TerminusModule } from '@nestjs/terminus';
import { ConfigModule } from '@org/shared/config';
import { LoggingModule } from '@org/shared/logging';
import { InvoicesFeatureModule } from '@org/billing/feature-invoices';
import { BillingEnvSchema } from './billing-env.schema'; // file name is an assumption
import { HealthController } from './health.controller';

@Module({
  imports: [
    // Validates env vars against the schema and refuses to boot on failure.
    ConfigModule.forRoot({ schema: BillingEnvSchema }),
    LoggingModule.forRoot({ service: 'billing-service' }),
    TerminusModule,
    InvoicesFeatureModule,
  ],
  controllers: [HealthController],
})
export class AppModule {}
// apps/billing-service/src/main.ts
import { NestFactory } from '@nestjs/core';
import { ValidationPipe } from '@nestjs/common';
import { getLogger } from '@org/shared/logging';
import { AppModule } from './app/app.module';

async function bootstrap() {
  const app = await NestFactory.create(AppModule, { logger: getLogger() });
  app.useGlobalPipes(new ValidationPipe({ whitelist: true, transform: true }));
  app.enableShutdownHooks();
  await app.listen(process.env.PORT ?? 3000);
}

bootstrap().catch((err) => {
  // crash loud, let the orchestrator decide what to do
  console.error(err);
  process.exit(1);
});
The @org/shared/config lib is one of the few places I’m strict. Every env var goes through a Zod schema and the app refuses to boot if anything is missing or malformed. I’ve lost too many hours to a service that started fine then failed at the first request because REDIS_URL was an empty string.
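Here is a minimal sketch of that fail-fast behaviour. To keep it runnable with no dependencies it uses small hand-rolled validators instead of Zod, so treat it as an illustration of the idea and not the lib's real code. `parseEnv`, `EnvSpec`, `exampleEnvSpec`, and the `PORT` rule are all hypothetical.

```typescript
// Fail-fast env validation, sketched without Zod to stay dependency-free.
// A validator returns an error message, or null when the value is fine.
type Validator = (raw: string) => string | null;
type EnvSpec = Record<string, Validator>;

const urlLike: Validator = (raw) => {
  if (raw.trim() === '') return 'must not be empty';
  // Loose check: scheme://something, with no whitespace.
  return /^[a-z][a-z0-9+.-]*:\/\/\S+$/i.test(raw) ? null : 'must look like a URL';
};

const port: Validator = (raw) => {
  const ok = /^\d+$/.test(raw) && Number(raw) >= 1 && Number(raw) <= 65535;
  return ok ? null : 'must be a port between 1 and 65535';
};

// Collects every problem before throwing, so one boot attempt
// reports all bad variables instead of the first one only.
function parseEnv(
  spec: EnvSpec,
  env: Record<string, string | undefined>,
): Record<string, string> {
  const errors: string[] = [];
  const parsed: Record<string, string> = {};
  for (const [key, validate] of Object.entries(spec)) {
    const raw = env[key];
    if (raw === undefined) {
      errors.push(`${key}: is missing`);
      continue;
    }
    const problem = validate(raw);
    if (problem !== null) {
      errors.push(`${key}: ${problem}`);
    } else {
      parsed[key] = raw;
    }
  }
  if (errors.length > 0) {
    throw new Error(`Invalid environment:\n${errors.join('\n')}`);
  }
  return parsed;
}

// Hypothetical spec for illustration only.
const exampleEnvSpec: EnvSpec = { REDIS_URL: urlLike, PORT: port };
```

Called once at startup with the process environment, before the Nest app is created, an empty `REDIS_URL` throws at boot and names the variable, which is a far better failure than a connection error on the first request.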
The CI that pays for the whole thing in a week.
# .github/workflows/ci.yml (excerpt)
name: ci
on:
  pull_request:
  push:
    branches: [main]
jobs:
  affected:
    runs-on: ubuntu-latest
    env:
      NX_CLOUD_ACCESS_TOKEN: ${{ secrets.NX_CLOUD_ACCESS_TOKEN }}
      NX_BRANCH: ${{ github.event.pull_request.head.ref || github.ref_name }}
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: 'pnpm' }
      - run: pnpm install --frozen-lockfile
      # Sets NX_BASE and NX_HEAD so affected diffs against the last successful run on main
      - uses: nrwl/nx-set-shas@v4
      - run: pnpm nx-cloud start-ci-run --stop-agents-after=build
      - run: pnpm nx affected -t lint test build --parallel=3
      - run: pnpm nx affected -t docker-build --parallel=2
Nx Cloud’s remote cache is what actually moves the needle. The first PR touching libs/shared/logging rebuilds every app depending on it. Every PR after that, for branches that didn’t touch logging, pulls prebuilt artifacts from the cache. Build step drops from minutes to seconds. The same cache hits locally too, which is the better magic.
Per-app deploy is whichever app shows up in nx show projects --affected --with-target=deploy. A small script reads that list, builds and pushes the docker image for each affected app, and triggers the matching Argo CD app. Unchanged apps don’t redeploy. Sounds obvious. It is not what most teams actually do.
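The script itself is small. A rough sketch of its core, assuming the affected list arrives as newline-separated project names: parse the list, then expand each app into its build, push, and sync steps. The registry host, the Dockerfile location under `apps/<name>/`, and the convention that the Argo CD app is named after the Nx project are all assumptions made for illustration, as are the `parseProjectList` and `planDeploys` names.

```typescript
// Turns a list of affected apps into a per-app deploy plan.
// parseProjectList and planDeploys are illustrative names; the registry,
// Dockerfile path, and Argo CD app naming are assumptions.
interface DeployStep {
  app: string;
  image: string;
  commands: string[];
}

// Accepts newline-separated project names, tolerating blank lines and padding.
function parseProjectList(stdout: string): string[] {
  return stdout
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

function planDeploys(affectedApps: string[], imageTag: string, registry: string): DeployStep[] {
  return affectedApps.map((app) => {
    const image = `${registry}/${app}:${imageTag}`;
    return {
      app,
      image,
      commands: [
        // Build from the repo root so shared libs are in the build context.
        `docker build -f apps/${app}/Dockerfile -t ${image} .`,
        `docker push ${image}`,
        // Assumes one Argo CD application per Nx app, with the same name.
        `argocd app sync ${app}`,
      ],
    };
  });
}

// Example: two affected apps, one tag, a placeholder registry.
const plan = planDeploys(
  parseProjectList('billing-service\n\n  api-gateway \n'),
  'abc123-9f',
  'registry.example.com',
);
```

Keeping the plan as data makes it easy to print in a dry run before anything gets pushed. An empty affected list produces an empty plan, which is exactly the "unchanged apps don't redeploy" behaviour.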
Setting. A federation tournament platform I CTO’d in London, hundreds of microservices, Kafka as the async backbone, Saturday afternoon, a live combat-sports broadcast going out publicly. The standings-projector consumer ran across six pods.
What went wrong. The group started rebalancing every 30 seconds or so. The page froze at 14:32 local. Three PagerDuty pages in two minutes.
First wrong fix. Rolling restart of the deployment. Consumers re-joined cleanly, then triggered another rebalance about 40 seconds later. I was doing the same dance the group was already doing on its own.
Real fix. One pod, out of six, was running a different container image. Someone had pushed a config-touching fix without bumping the image tag and the deployment had pulled :latest. Its max.poll.interval.ms was 60 seconds against the others’ 300, and a downstream call inside its handler occasionally took 70. Every time that happened the consumer missed its poll deadline, got evicted from the group, and forced a rebalance for everyone. Cordoned the bad pod, group stabilized in 90 seconds. Patched it properly over the weekend by pinning image SHAs on every Kafka-touching deployment.
Cost. 12 minutes of stale standings during a live broadcast. The federation was understanding. The commentators were less so.
The Nx monorepo version of that foot-gun is per-app images that aren’t pinned to a content-addressable tag. Every affected app deploy in my pipeline now stamps ${gitSha}-${nxProjectHash} into the image tag. The Nx project hash includes the hash of every lib that app transitively depends on. Shared lib changed, hash changes, image tag changes, deploy is a different image. No two pods of the same Deployment can ever run different code by accident.
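A sketch of why that tag moves when a shared lib moves: the project hash here is computed over the app plus everything it transitively depends on, so a change anywhere in that closure yields a new tag. This is a simplified stand-in, not how Nx computes its hashes. `imageTag`, `projectHash`, and the per-project `sourceHashes` input are hypothetical, and FNV-1a is used only to avoid pulling in a crypto dependency.

```typescript
// Content-addressed image tags: the tag changes whenever the app or any lib
// in its transitive dependency closure changes. Names here are illustrative.
type Graph = Record<string, string[]>; // project -> direct dependencies

// FNV-1a (32-bit). Non-cryptographic; chosen only to keep the sketch dependency-free.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// The project itself plus every project reachable through its dependencies.
function dependencyClosure(graph: Graph, project: string): string[] {
  const seen = new Set<string>();
  const stack = [project];
  while (stack.length > 0) {
    const current = stack.pop()!;
    if (seen.has(current)) continue;
    seen.add(current);
    stack.push(...(graph[current] ?? []));
  }
  // Sorted so the hash does not depend on traversal order.
  return Array.from(seen).sort();
}

function projectHash(graph: Graph, sourceHashes: Record<string, string>, project: string): string {
  const parts = dependencyClosure(graph, project).map(
    (name) => `${name}:${sourceHashes[name] ?? ''}`,
  );
  return fnv1a(parts.join('|'));
}

function imageTag(
  gitSha: string,
  graph: Graph,
  sourceHashes: Record<string, string>,
  project: string,
): string {
  return `${gitSha}-${projectHash(graph, sourceHashes, project)}`;
}

// Hypothetical inputs: a tiny graph and one content hash per project.
const deps: Graph = {
  'billing-service': ['billing-domain', 'shared-logging'],
  'notifications-service': ['notifications-domain', 'shared-logging'],
  'billing-domain': [],
  'notifications-domain': [],
  'shared-logging': [],
};
const hashes: Record<string, string> = {
  'billing-service': 'a1',
  'notifications-service': 'b1',
  'billing-domain': 'c1',
  'notifications-domain': 'd1',
  'shared-logging': 'e1',
};
```

With these inputs, bumping the `shared-logging` hash changes the tag for both services, while bumping `notifications-domain` leaves the `billing-service` tag untouched.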
nx affected plus Nx Cloud remote cache is the real ROI of going Nx. Set it up on day one, not day ninety.
Enforce layers (domain, data-access, feature, app) and scopes (scope:billing, etc.) with enforce-module-boundaries. The IDE will catch what your code review will miss.
Tag every image with ${gitSha}-${nxProjectHash}. Never deploy :latest to anything that talks to a Kafka group.
Treat nx.json inputs like schema migrations. The build cache is keyed on what you declare in nx.json, so a careless change to those inputs costs you cache validity.
Thanks for reading. If you’ve got thoughts, send them my way.