Why I default to one database per service, with TypeScript code for event-carried state transfer, a CQRS read model, and hard lessons from production.
After running hundreds of microservices across a couple of platforms, my default is firm: every service owns its database, nobody else reads it, and if you need that data somewhere else, you copy it via events. No shared schemas. No “let’s just join across services for this one query”. You think it’s just one query. It’s never just one query.
A service owns its database when no other process, ever, opens a connection to that database. Not for reporting. Not for “a quick analytics pull”. Not for a one-off migration script. The instant another service has the credentials, your boundary is gone. The DBA can see the table, sure. The other service cannot.
That sounds extreme until you’ve lived through the alternative. I have. A few years back, on a Rails monolith hitting Aurora PostgreSQL, we let a “reporting” service query the users table directly because it was read-only and “we’d swap it later.” Six months later, “later” arrived as a schema migration on the users table. The reporting service had three queries hard-coded against a column we needed to rename. Renaming meant coordinating a deploy across two repos owned by two teams, on a Friday afternoon, because the data team had a Monday morning board deck. We picked Friday. I deployed on Friday. I know.
The fix is structural. The service that owns users exposes either an API or a stream of events. Other services consume that. They never touch the table.
Once you accept that boundary, the next question lands fast: what about all the joins? The answer most teams stumble onto is event-carried state transfer. When something changes, the owning service emits an event carrying enough state for downstream services to do their work without calling back.
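```typescript
import { Kafka, logLevel, CompressionTypes } from 'kafkajs';
import { v4 as uuid } from 'uuid';

// Minimal shape of a confirmed order inside the orders service (assumed for this example).
type ConfirmedOrder = {
  id: string;
  traceId: string;
  currency: 'USD' | 'EUR' | 'GBP';
  totalCents: number;
  customer: { id: string; email: string; name: string; country: string };
  items: Array<{ sku: string; qty: number; priceCents: number }>;
};

const kafka = new Kafka({
  clientId: 'orders-service',
  brokers: process.env.KAFKA_BROKERS!.split(','),
  logLevel: logLevel.INFO,
});

const producer = kafka.producer({
  idempotent: true, // experimental in KafkaJS; requires acks = -1, the send() default
  maxInFlightRequests: 1, // one in-flight request keeps retries from reordering messages
  allowAutoTopicCreation: false,
});

// Call once at startup: the producer must be connected before send().
export async function connectProducer() {
  await producer.connect();
}

type OrderConfirmedEvent = {
  eventId: string;
  schemaVersion: 2;
  occurredAt: string;
  order: {
    id: string;
    customerId: string;
    customerEmail: string;
    customerName: string;
    customerCountry: string; // read by the order-search projector
    currency: 'USD' | 'EUR' | 'GBP';
    totalCents: number;
    items: Array<{ sku: string; qty: number; priceCents: number }>;
  };
};

export async function publishOrderConfirmed(order: ConfirmedOrder) {
  const event: OrderConfirmedEvent = {
    eventId: uuid(),
    schemaVersion: 2,
    occurredAt: new Date().toISOString(),
    order: {
      id: order.id,
      customerId: order.customer.id,
      // duplicated on purpose so downstream doesn't call us
      customerEmail: order.customer.email,
      customerName: order.customer.name,
      customerCountry: order.customer.country,
      currency: order.currency,
      totalCents: order.totalCents,
      items: order.items.map((i) => ({
        sku: i.sku,
        qty: i.qty,
        priceCents: i.priceCents,
      })),
    },
  };

  await producer.send({
    topic: 'order.confirmed.v2',
    compression: CompressionTypes.GZIP,
    messages: [
      {
        key: order.id, // same order always lands on the same partition
        value: JSON.stringify(event),
        headers: {
          'x-event-id': event.eventId,
          'x-trace-id': order.traceId,
        },
      },
    ],
  });
}
```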
Notice the duplication. customerEmail and customerName live in the customers service. They’re also baked into the order event. Downstream services keep a copy. If the customer renames themselves, a customer.profile.updated event flows out and downstream services update their local copies. Yes, the data is “duplicated”. That’s the trade you’re making. You’re paying storage and a bit of eventual consistency to buy independence. I’ll take that trade every time.
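Here is a minimal sketch of what “update their local copies” can look like on the consuming side. The `customer.profile.updated` topic name comes from the text, but the payload fields below are an assumption, and an in-memory `Map` stands in for a table in the consumer's own database. The part worth copying is the staleness guard: ignore any event no newer than what you already hold, which also makes Kafka redeliveries harmless.

```typescript
// Assumed payload for customer.profile.updated; field names are illustrative.
type CustomerProfileUpdated = {
  customerId: string;
  occurredAt: string; // ISO-8601, stamped by the customers service
  email: string;
  name: string;
};

type LocalCustomer = { email: string; name: string; asOfMs: number };

// The consuming service's private copy of the customer fields it needs.
// A Map stands in for a table in that service's own database.
class LocalCustomerCopy {
  private readonly rows = new Map<string, LocalCustomer>();

  // Returns true if the event changed the local copy, false if it was stale or a redelivery.
  apply(event: CustomerProfileUpdated): boolean {
    const occurredMs = Date.parse(event.occurredAt);
    const current = this.rows.get(event.customerId);
    // Older-or-equal events are ignored, so out-of-order and duplicate deliveries are no-ops.
    if (current !== undefined && current.asOfMs >= occurredMs) return false;
    this.rows.set(event.customerId, { email: event.email, name: event.name, asOfMs: occurredMs });
    return true;
  }

  get(customerId: string): LocalCustomer | undefined {
    return this.rows.get(customerId);
  }
}
```

Comparing producer timestamps assumes the customers service stamps events from a sane clock. If you control the event schema, a per-customer version number is the more robust thing to compare.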
So how do you actually answer “show me all orders for customers in Germany who bought a yoga mat in the last 30 days”? You don’t run that query across three services. You build a read model.
```typescript
import { Consumer, EachMessagePayload, Kafka } from 'kafkajs';
import { Pool } from 'pg';

const kafka = new Kafka({
  clientId: 'order-search-projector',
  brokers: process.env.KAFKA_BROKERS!.split(','),
});

const consumer: Consumer = kafka.consumer({
  groupId: 'order-search-projector-v3',
  sessionTimeout: 30_000,
  heartbeatInterval: 3_000,
  maxWaitTimeInMs: 500,
});

// The projector's own database; nothing else writes to it.
const pg = new Pool({ connectionString: process.env.READMODEL_DSN });

async function handleOrderConfirmed({ message }: EachMessagePayload) {
  const event = JSON.parse(message.value!.toString());
  if (event.schemaVersion < 2) return; // ignore old shape
  const { order } = event;
  // Upsert keyed on order_id, so redelivered events are harmless.
  // Caveat: Kafka gives no ordering across topics, so a late order event can still
  // overwrite a newer country written by handleCustomerCountryChanged.
  await pg.query(
    `insert into order_search
       (order_id, customer_id, customer_country, total_cents, items, confirmed_at)
     values ($1, $2, $3, $4, $5, $6)
     on conflict (order_id) do update set
       customer_country = excluded.customer_country,
       total_cents = excluded.total_cents,
       items = excluded.items`,
    [
      order.id,
      order.customerId,
      order.customerCountry ?? 'unknown',
      order.totalCents,
      JSON.stringify(order.items),
      event.occurredAt,
    ],
  );
}

async function handleCustomerCountryChanged({ message }: EachMessagePayload) {
  const event = JSON.parse(message.value!.toString());
  await pg.query(
    `update order_search set customer_country = $2 where customer_id = $1`,
    [event.customerId, event.country],
  );
}

export async function start() {
  await consumer.connect();
  await consumer.subscribe({ topic: 'order.confirmed.v2' });
  await consumer.subscribe({ topic: 'customer.country.changed.v1' });
  await consumer.run({
    eachMessage: async (payload) => {
      const topic = payload.topic;
      if (topic === 'order.confirmed.v2') return handleOrderConfirmed(payload);
      if (topic === 'customer.country.changed.v1') return handleCustomerCountryChanged(payload);
    },
  });
}
```
That projector owns its own PostgreSQL database. It listens to a handful of topics, denormalizes into an order_search table tuned for the actual query, and that’s the surface other services hit. CQRS in the small. The write side stays normalized inside the owning service. The read side is whatever shape the question needs.
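To make “tuned for the actual query” concrete, here is a sketch of the Germany/yoga-mat question as a parameterized statement against order_search. It assumes `items` is a `jsonb` column, `confirmed_at` is a `timestamptz`, countries are stored as codes like `'DE'`, and `'YOGA-MAT'` is a made-up SKU; the article doesn't show the DDL. `ordersByCountryAndSku` is a hypothetical helper name.

```typescript
type OrderSearchQuery = { text: string; values: unknown[] };

// Orders from customers in `country` that contained `sku`, confirmed in the last `days` days.
// Assumes items is jsonb (for @>) and confirmed_at is timestamptz.
function ordersByCountryAndSku(country: string, sku: string, days: number): OrderSearchQuery {
  if (!Number.isInteger(days) || days <= 0) {
    throw new RangeError('days must be a positive integer');
  }
  return {
    text: `select order_id, customer_id, total_cents, confirmed_at
             from order_search
            where customer_country = $1
              and items @> $2::jsonb
              and confirmed_at >= now() - make_interval(days => $3)
            order by confirmed_at desc`,
    // jsonb containment: matches any order whose items array holds an element with this sku
    values: [country, JSON.stringify([{ sku }]), days],
  };
}
```

The returned object can be passed to `pool.query()` from `pg`, which accepts a `{ text, values }` config. A GIN index such as `create index on order_search using gin (items jsonb_path_ops)` supports the `@>` test, and a btree on `(customer_country, confirmed_at)` covers the other two predicates.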
One thing to measure on any projector: freshness, not throughput. A consumer that is happily consuming Kafka but silently failing to write to the read store looks healthy on every lag dashboard. The question is not “is the consumer consuming?” but “is the data in the read model current?” Those are different questions with different alert conditions.
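A minimal sketch of a freshness signal, assuming the projector can record two event timestamps: the newest one it has pulled off Kafka, and the newest one whose read-model write actually committed. `ProjectorFreshness` is a hypothetical name; in practice you would export `lagMs()` as a gauge and alert on it instead of on consumer-group lag.

```typescript
class ProjectorFreshness {
  private lastConsumedMs: number | null = null;
  private lastWrittenMs: number | null = null;

  // Call as soon as a message is received, before touching the read store.
  recordConsumed(occurredAt: string): void {
    this.lastConsumedMs = ProjectorFreshness.newest(this.lastConsumedMs, occurredAt);
  }

  // Call only after the read-store write has committed.
  recordWritten(occurredAt: string): void {
    this.lastWrittenMs = ProjectorFreshness.newest(this.lastWrittenMs, occurredAt);
  }

  // How far the read model trails what the consumer has already seen, in event time.
  lagMs(): number {
    if (this.lastConsumedMs === null) return 0; // nothing to project yet
    if (this.lastWrittenMs === null) return Number.POSITIVE_INFINITY; // consuming, never writing
    return Math.max(0, this.lastConsumedMs - this.lastWrittenMs);
  }

  isFresh(maxLagMs: number): boolean {
    return this.lagMs() <= maxLagMs;
  }

  private static newest(current: number | null, occurredAt: string): number {
    const t = Date.parse(occurredAt);
    return current === null ? t : Math.max(current, t);
  }
}
```

Measured this way, a projector that keeps consuming while its writes fail shows a growing lag, even though its consumer-group lag stays flat.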
Most teams aren’t greenfield. You have a shared database and you want out. Don’t try to split everything at once. The move I’ve run twice now is a staged one.
First, draw the boundary in code. Every service gets its own ORM models for the tables it considers part of its bounded context. No more cross-context joins in code, even though the tables still live in one database. Enforce it with a CI check.
```yaml
# .github/workflows/db-boundaries.yml
name: db-boundaries
on: [pull_request]
jobs:
  check-cross-context-queries:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # install ripgrep explicitly rather than relying on the runner image
      - run: sudo apt-get update && sudo apt-get install -y ripgrep
      - run: |
          set -e
          # any single-line SQL JOIN across two bounded-context schemas fails CI (case-insensitive)
          if rg -n -i --type ts 'from\s+(orders|customers|catalog)\.\w+\s+join\s+(orders|customers|catalog)\.\w+' src; then
            echo "cross-context join detected"
            exit 1
          fi
          # any repository file importing another context's entity fails CI
          if rg -n --type ts "from\s+'\.\./\.\./(orders|customers|catalog)/entities/" src/contexts; then
            echo "cross-context entity import detected"
            exit 1
          fi
```
Second, give each context its own schema in the same database, and route writes through the owning service’s API. Reads can still hit the database directly during the transition, but you’re closing one half of the leak at a time. Third, stand up the new service and run dual-write for a couple of weeks: every write goes through the API and lands in both the old shared schema and the new service’s database. Compare the two in a reconciliation job. Once they match for long enough, flip the reads. Then cut the dual-write. Then drop the old schema.
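The core of a reconciliation job is a keyed diff. Here is a minimal sketch, assuming rows from the old shared schema and from the new service’s database have already been fetched (paging and fetching are left out) and share a primary key column. `reconcile` and `isClean` are hypothetical names.

```typescript
type Row = Record<string, unknown>;

type ReconciliationReport = {
  missingInNew: string[];
  missingInOld: string[];
  mismatched: Array<{ id: string; fields: string[] }>;
};

function reconcile(oldRows: Row[], newRows: Row[], key: string, fields: string[]): ReconciliationReport {
  const byKey = (rows: Row[]) => new Map(rows.map((r): [string, Row] => [String(r[key]), r]));
  const oldById = byKey(oldRows);
  const newById = byKey(newRows);
  const report: ReconciliationReport = { missingInNew: [], missingInOld: [], mismatched: [] };

  oldById.forEach((oldRow, id) => {
    const newRow = newById.get(id);
    if (newRow === undefined) {
      report.missingInNew.push(id);
      return;
    }
    // JSON encoding gives a cheap structural comparison for nested values such as line items.
    // It is key-order sensitive, so both sides must serialize objects the same way.
    const diff = fields.filter((f) => JSON.stringify(oldRow[f]) !== JSON.stringify(newRow[f]));
    if (diff.length > 0) report.mismatched.push({ id, fields: diff });
  });

  newById.forEach((_row, id) => {
    if (!oldById.has(id)) report.missingInOld.push(id);
  });

  return report;
}

// "Match for long enough" can then mean: isClean() on every run for N consecutive days.
function isClean(r: ReconciliationReport): boolean {
  return r.missingInNew.length === 0 && r.missingInOld.length === 0 && r.mismatched.length === 0;
}
```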
Sounds slow. It is. It’s also the only way I’ve seen it not melt down halfway through.
A few honest costs. Eventual consistency is real, not theoretical. A customer who just updated their email and then placed an order will sometimes see their old email on the receipt for a second or two. Your product people will ask why. The answer is “you bought independence between services and this is the price of admission”, and they’ll mostly accept it, but you have to be willing to have that conversation.
Schema migrations get harder when you own the database. There is no shared DBA to absorb the blast radius. The service team owns the migration locks, the downtime windows, and the rollback plan. Owning your data means owning the migrations against it.
Thanks for reading. If you’ve got thoughts, send them my way.