Senior Full Stack Software Engineer
I'm a software engineer with 10+ years of experience building scalable, high-performance systems, including systems that reliably handle billions of requests. I specialize in backend-heavy full-stack development: scalable APIs and services behind high-quality web applications. I use and orchestrate AI tools deliberately to ship production-ready applications. I enjoy collaborating across teams and taking ownership of complex projects that push boundaries.
Tech stack
Dec 2025 - Present
Mar 2023 - Jan 2026
Aug 2025 - Present
Jan 2023 - Present
Dec 2020 - Mar 2023
Feb 2021 - Aug 2024
Jan 2020 - Dec 2020
Aug 2016 - Oct 2019
Mar 2014 - Jun 2016
Feb 2012 - Mar 2014
Latest articles and technical writing.
Hard-earned notes on shipping WCAG 2.1 AA across a creator platform: semantic HTML over ARIA, focus management that survives SPA navigation, and audits that actually catch regressions in CI.
Why callback chains rot at scale in Rails, and how I replace them with service objects, domain events, and form objects.
Tracing the Arel AST to the adapter, prepared statement traps on PostgreSQL, and the query cache pitfalls that actually bite in production.
How Action Cable actually moves WebSocket frames through Puma, Redis, and your channels, and the point at which AnyCable starts paying for itself.
Discriminated unions, polymorphic components, template literals, and generics for tables and selects. The patterns I keep, the ones I dropped, and why type gymnastics is usually a smell.
How I think about aggregate sizing in DDD: small aggregates, one per transaction, optimistic concurrency, and how to split a god-aggregate without breaking invariants.
Three ways to persist a DDD aggregate (relational ORM, document store, event sourcing) behind the same repository interface, and the tradeoffs that actually showed up in production.
Why most React animations drop frames, how transform and opacity beat everything else, and when to reach for FLIP, Framer Motion, and Chrome DevTools.
How I actually build Anti-Corruption Layers between bounded contexts, with sync and async translator patterns drawn from real Payment-to-Order work.
Kong, Envoy, and a custom BFF gateway, weighed against a real build-vs-buy call where rate limiting, JWT validation, and response aggregation drove the decision.
How I think about use cases as the unit of work, where transaction boundaries belong, and the line between application and domain exceptions, after years of getting it wrong.
How I build audit logging in NestJS with interceptors, JWT-derived actor attribution, before-and-after entity snapshots, append-only tables, PII masking, and retention rules that hold up to a compliance review.
How 202 Accepted, webhooks, and WebSockets replaced a sync HTTP path that kept timing out under load on a real-time platform.
How I actually use CloudWatch in production: EMF, metric filters, composite alarms, and the cost gotchas I wish someone had warned me about.
Token storage trade-offs, silent refresh, the httpOnly cookie BFF pattern, OAuth2 PKCE in SPAs, and cross-tab logout, written from things I broke in production.
Notes from running multi-VPC AWS networking at scale: NAT Gateway cost traps, when peering breaks down, and why Transit Gateway and PrivateLink are worth the hourly fee.
Fan-out via SNS to SQS, FIFO ordering, DLQ design, idempotent consumers, and visibility-timeout tuning, drawn from async pipelines I actually ran in production.
How includes, preload, and eager_load actually differ, where serializers hide N+1s, and how strict_loading plus a CI check kept the portfolio honest.
Sidekiq vs GoodJob from a creator-platform monolith perspective, idempotency that holds up to Apple's retry storms, and the queue metrics that actually predict outages.
How I built a terminal-native build status dashboard with Ink and React for our branded mobile app pipeline, plus the gotchas you only hit at the edges.
Building a design system at a live-video creator platform with Atomic CSS, semantic tokens, and a versioning strategy that survived a partially migrated codebase. The mistakes I'd avoid next time.
An opinionated take on shared chassis libraries for logging, metrics, tracing, health checks, and config across a fleet of microservices.
How I built a pluggable notification engine in NestJS using dynamic modules, discovery, and lifecycle hooks, and the failure modes that taught me to treat the registry as a public contract.
How I peeled callback-heavy Rails models into explicit in-process pub/sub with Wisper, Rails Event Store, and ActiveSupport::Notifications, without leaving the monolith.
Why I pick TipTap over raw ProseMirror for most product teams, plus schema design, Yjs collab, and serialization gotchas.
How I build production React tables with TanStack Table, virtualization, server-side pagination, inline editing, and memoization that actually holds up.
How to design read model projections in DDD systems: sync versus async, multiple projections per event stream, safe rebuilds, and lag monitoring.
How I build outgoing and incoming webhooks in Rails: HMAC signing, retries with backoff, idempotency keys, and per-endpoint observability that actually catches partner failures.
Why I partition thread pools, semaphores, and connection pools per dependency, and how that pairs with circuit breakers, timeout budgets, and load shedding in production.
Russian doll fragment caching, counter caches, and surrogate-key CDN invalidation on a read-heavy creator page that wouldn't sit still under load.
How I think about capacity planning for production web apps: load modeling, bottleneck hunting, headroom buffers, cost curves, and review cadence.
Cache key design, CloudFront Functions vs Lambda@Edge, origin shield, and multi-origin failover, written up from a few production incidents I'd rather not repeat.
A senior engineer's opinionated take on TCP, Redis, NATS, gRPC, and Kafka transports in NestJS, and when Kafka actually earns its complexity.
A working circuit breaker in TypeScript, the state machine that actually matters, and the two production incidents that made me stop trusting retries.
A Postgres outage caused by a nightly batch, analyst dashboards, and Sidekiq workers all competing for the same connection pool, and what actually fixed it.
Undo, redo, multi-user editing, and real-time sync inside a visual app-builder we shipped at the creator-economy platform. What I shipped, what bit me, and what I'd do differently.
What actually keeps container images safe in production: distroless base images, CI scanning, cosign signing, and Pod Security Standards on EKS.
How I run @nestjs/cqrs in production. Commands, projections, sagas, optimistic concurrency, and the failure modes that taught me to measure freshness instead of throughput.
An opinionated take on splitting command and query services with Postgres writes, Elasticsearch and Redis reads, and event-driven projections.
CSS Modules, Tailwind, StyleX, and vanilla-extract compared on bundle size, DX, design tokens, and migration. Why I keep landing on Tailwind plus a typed token layer for product teams.
Why I default to a transactional outbox plus idempotent consumers across services, with TypeScript code, Debezium CDC, and two production scars that set the rule.
Server state vs client state, query keys as a public API, optimistic updates, and when SWR wins over TanStack Query. Lessons from production frontends.
How I separate data migrations from schema migrations in Rails, with resumable batched backfills, dual-write windows, and zero-downtime cutovers on Aurora.
Why I default to one database per service in production, with TypeScript code for event-carried state transfer, a CQRS read model, and hard lessons from production.
Logical vs physical backups, point-in-time recovery on Aurora, cross-region DR, and why recovery drills matter more than backup configs.
How to apply CQRS inside a DDD monolith without dragging in message brokers, distributed transactions, or eventual-consistency pain you don't need yet.
Anemic models, god aggregates, ORM-as-domain shortcuts, and big-bang migrations. The four DDD mistakes I keep watching teams make, with the refactors that actually pulled us out.
What bounded contexts, context maps, and ubiquitous language actually felt like inside a portfolio-wide DDD migration at a London product agency, and where I'd do it differently.
Entities, value objects, aggregate sizing, and repository design as they actually showed up in TypeScript and Ruby during a portfolio-wide DDD migration.
Why I model DDD aggregates as pure (State, Command) to (NewState, Events) functions with Result types, instead of mutable OO aggregates that throw.
How I actually structure NestJS apps around bounded contexts, with branded types, discriminated unions, and Result types pulling their weight.
How I apply DDD inside a Rails monolith without abandoning ActiveRecord. Value objects, bounded context namespacing, and Rails Event Store, as they showed up in real codebases.
How we carved a native mobile release flow out of a giant Rails monolith using Strangler Fig, bounded contexts, and a database split that did not blow up logins.
How decoupling deploy from release with feature flags turned a high-stress Rails monolith into boring daily ships, and the war stories that pushed me there.
How to make domain events first-class on the aggregate: dispatch timing, payload design, idempotent consumers, and asserting events in tests.
Redis Cluster sharding, L1+L2 cache layering, event-driven invalidation across services, and the stampede-prevention tricks that kept reads alive at peak hours.
How in-process domain events decoupled aggregates inside a monolith at a London product agency, where domain and integration events part ways, and the ordering and testing scars that came with it.
How I rolled out OpenTelemetry across NestJS and Rails services, with W3C context propagation, OTLP, sampling that doesn't lie, and trace-log correlation for SLI-driven debugging.
Why Domain Storytelling beats EventStorming when the team and domain experts haven't agreed on language yet, with a workshop walkthrough and TypeScript code.
Three service types, one decision: where does this method actually go? A pragmatic framework with code that shows the anemic-model trap and the fix.
A senior engineer's honest take on Cursor, Claude Code, and where AI tooling actually pulls weight in production work versus where it gets in the way.
A pragmatic error handling architecture for NestJS: domain errors mapped to HTTP, a stable error code catalog, a standard envelope, and the Sentry and GraphQL bits that catch the messy edges.
A practical take on @nestjs/event-emitter for decoupled side effects like notifications and audit, plus the in-process limits that tell you when to graduate to a real broker.
How I run Confluent Schema Registry with Avro and Protobuf across teams, with compatibility modes, evolution rules, and CI-time breaking-change detection.
How I pick between Kafka, RabbitMQ, and NATS for async microservice communication, with production code, broker config, and two war stories from the Kafka backbone I ran for years.
A practical guide to running an event-sourced aggregate inside a DDD monolith on Postgres: append-only log, replay, snapshots, projections.
How I keep feature flag debt under control: lifecycle policies, jscodeshift codemods that strip stale conditionals, and CI guardrails that block expired flags from shipping.
Why I picked Shrine over ActiveStorage and CarrierWave for direct-to-S3 uploads, image and video processing, and chunked large-file handling on a Rails app at creator-platform scale.
How I run streaming multipart uploads to S3, MIME and magic-number validation, and BullMQ pipelines for image, PDF, and video processing in NestJS. With a storage abstraction that doesn't lie to me.
React Hook Form plus Zod, multi-step wizards, useFieldArray, cross-field and async validation. The patterns I keep reaching for, the ones I've stopped trusting.
How a gateway tier ate 185ms of p99 latency without anyone noticing, and the boring fixes that gave it back. JWKS caching, DNS, keep-alive, and async logs.
Static hosting beats SSR for most products. Why I default to S3 plus CloudFront, preview deploys per PR, content-hashed assets, and rollback as a flip.
How I wire Sentry into React and Next.js apps in production, where Error Boundaries actually go, and why source maps and releases matter more than dashboards.
How I wire i18n into a React/Next.js app that ships in a dozen locales: i18next, ICU plurals, RTL with logical properties, lazy bundles, Intl APIs, and CI checks for missing keys.
Real user monitoring with web-vitals, Sentry Performance, and percentile dashboards. Tie frontend optimization to actual field data, not Lighthouse vibes.
How I shrank bundles, split routes, and cached at the edge on React and Vue apps serving millions, with two production scars to show for it.
Four React state libraries put through the same visual designer feature. Redux Toolkit, Zustand, Jotai, XState. What I'd actually pick today and why.
How I'd lay out a frontend test suite for a team of five-plus engineers, why I put the trophy ahead of the pyramid, and what the CI numbers actually look like.
When PostgreSQL full-text search is enough, when to reach for Searchkick on Elasticsearch, and where Meilisearch fits. Reindexing cost, ranking, facets, and typo tolerance from a Rails app I actually ran in production.
A working engineer's take on Apollo vs urql vs Relay: cache normalization, optimistic updates, pagination, codegen, and which one I default to for React apps.
A practical walk through rolling out an Apollo Federation 2 subgraph in NestJS, with entity resolution, gateway auth, and the path off schema stitching.
Where gRPC actually pays off in service-to-service traffic, where REST still wins, and how Protobuf, streaming, Envoy, and schema evolution shake out in production.
How a portfolio-wide DDD migration at a London product agency landed on a hexagonal monolith, where driving and driven ports sit, and the folder layout that actually survived contact with a real team.
A production take on graphql-ruby in Rails API mode: Dataloader for N+1, complexity limits, persisted queries, ActionCable subscriptions, and why versioned endpoints are the wrong instinct.
How I tune liveness, readiness, and startup probes across NestJS services so rolling updates don't cascade and a slow downstream doesn't kill the whole fleet.
Turbo Frames, Streams, morphing, and Stimulus patterns from a Rails monolith at a creator platform. When Hotwire beats a React SPA and the pitfalls that bite at scale.
A new search feature shipped clean in staging, then sequential-scanned a half-billion-row table on the busiest day of the year. The lessons that became hard CI gates.
Why idempotency keys, DB-level uniqueness, and consumer dedup tables are the actual contract for safe retries on payments, emails, and webhook handlers.
Severity definitions, runbooks that get read at 3 a.m., on-call rotation health, and blameless post-mortems that actually change behavior.
How I structured i18n for a Rails app that shipped into a non-English market. Locale resolution, file layout, DB-backed translations with Mobility, and a CI check that catches missing keys before launch.
How I carved bounded contexts out of a large legacy codebase using bubble contexts, an anti-corruption layer, and a strangler fig migration with event-driven integration.
What a backend engineer actually has to get right about Kubernetes before shipping a service to production. Pods, probes, HPA, and resource sizing from a real production angle.
How I think about k6, Gatling, and Locust for HTTP load testing, why percentile design beats average-based SLOs, and how to load-test production without taking it down.
Notes from running a multi-team Nx monorepo. CODEOWNERS, module boundaries, affected-only CI, and the day Conway's Law stopped being a theory.
Runtime composition with Webpack Module Federation, the sharp edges I hit at Superpeer and Kajabi, and when a monorepo beats it.
Distributed monoliths, nano-services, and shared-library coupling. Detection signals and remediation steps pulled from real architecture reviews.
An opinionated take on independently deployable microservices: mono- vs multi-pipeline, affected-only builds, Pact contract verification, and per-service canary releases.
How I map DDD aggregate roots to single tables, embed value objects as columns, run optimistic concurrency on a version column, and evolve schema without locking hot tables.
Centralizing config across hundreds of services with Consul KV, AWS Parameter Store, and Vault, with hot reloads, secret rotation, and dev/prod parity that actually holds.
How I use event-carried state transfer plus a bootstrap snapshot to keep materialized views in sync across services, and the lag metrics I actually page on.
Why I default to Grafana Loki over ELK for shipping microservice logs, with Fluent Bit configs, trace-ID correlation, and sampling rules from production.
How I run mTLS, JWT propagation, machine-to-machine OAuth2, API key rotation, and Kubernetes NetworkPolicies across a hundreds-of-services topology, with two war stories where trust was the bug.
URL, header, and query-param API versioning compared, what counts as breaking, event upcasting, and consumer-driven contract tests.
How I ran a time-partitioned table migration on Aurora at 50K writes per second, with dual-write, throttled backfill, sequence sync, and a rollback plan that actually held.
How a portfolio-wide DDD migration at a London product agency settled on a modular monolith with strict namespace isolation, internal and published events, and a database-per-module discipline.
How I enforce monorepo package boundaries with TypeScript project references, ESLint import rules, and dependency-cruiser, and how to roll it out without breaking the team.
How I run Rails 7 against Aurora writers, reader replicas, and sharded clusters in production. Role switching, replica-lag stickiness, and migrations that don't take down login.
Named DataSources, replica routing via an interceptor, failover, and keeping transactions on the primary so reads see your own writes.
Schema-per-tenant versus row-based isolation in Rails, default_scope traps that leak data, and the tenant-aware cache keys that survived a real SaaS.
Tenant detection, AsyncLocalStorage context, and schema vs row isolation in NestJS. The cross-tenant leak that taught me to test isolation as a first-class feature.
Passport strategies, JWT with refresh-token rotation and family detection, Redis sessions, OAuth, and the lockout patterns I learned the hard way.
How I layer in-memory cache, Redis, custom keys, event-driven invalidation, and HTTP ETag headers in NestJS to take real load off Postgres.
How I run @nestjs/config in production. Typed namespaces, Zod validation that fails fast at boot, AWS Secrets Manager rotation, and the test overrides that keep CI honest.
How I wire Terminus indicators, readiness vs liveness probes, and shutdown hook ordering in NestJS so Kubernetes rolling updates actually stay zero-downtime.
How Nest's IoC container actually wires the graph, why dynamic modules and request scope bite you at scale, and the custom-decorator habits that kept the codebase testable as the team grew.
How I run multiple NestJS services in an Nx monorepo with shared libs, affected builds, tag-enforced boundaries, and per-app deploys, plus the production incidents that shaped the setup.
How I wire Pino, AsyncLocalStorage correlation IDs, OpenTelemetry, and PII redaction into NestJS, plus what shipping to CloudWatch and Datadog actually looks like under load.
The middleware-to-exception-filter chain, in order, with a rate-limiting guard and an audit-trail interceptor pulled from real production placement decisions.
How I structure class-transformer groups, ClassSerializerInterceptor, role-based field exposure, pagination envelopes, and serialization-driven API versioning in NestJS, with the war stories that pushed me there.
How I write NestJS tests that catch real bugs. Test.createTestingModule with overrides, Supertest E2E, TestContainers for real Postgres, and what to do about flaky tests in CI.
Flow producers, worker concurrency, retry backoff, Redis Cluster connections, and Bull Board in NestJS. The patterns I keep reaching for when the queue has to survive real on-call.
Schema-first Prisma in NestJS. PrismaService lifecycle, migrations on Aurora, interactive transactions, client extensions, and pool tuning that actually holds under load.
Why I swap NestJS's Express adapter for Fastify on real services, what changes underneath, and the plugin, multipart, and Passport gotchas that actually cost time.
Six hours of stale prices on millions of product pages at a creator-economy platform, and the event-driven invalidation, two-level coordination, and freshness monitoring that finally killed the drift.
A short story about an ElastiCache partition that oversold inventory and corrupted sessions, and the rule it left behind: the cache is never the source of truth.
A live sports final pushed our Socket.IO gateway past the point of recovery. Here's how we rebuilt it on AnyCable, topic-sharded fan-out, and Redis Streams.
How I use the Policy pattern to keep variable business rules out of aggregates, with pricing, discount, and shipping examples in TypeScript.
Distributed locking in Rails without adding Redis. How I replaced a SETNX coordination layer with Postgres advisory locks, dropped a dependency, and avoided the usual deadlock traps.
When JSONB beats normalized tables on Aurora and when it bites back. GIN, jsonb_path_ops, expression indexes, and the migration patterns I actually trust in production.
How I planned and executed a major-version PostgreSQL upgrade on a multi-terabyte Aurora cluster using logical replication, with the sequence sync, DDL freeze, cutover, and rollback details that actually matter.
How I actually use pg_stat_statements to find and fix slow queries on a large Aurora cluster, with snapshots for trend tracking and Prometheus/Grafana wiring.
Streaming, logical, and failover replication on Aurora and self-hosted PostgreSQL. What actually breaks, how to monitor lag honestly, and where Patroni earns its keep.
How I use process managers and sagas to keep aggregates consistent across long-running workflows, with compensation, watchdogs, and tests for partial failure.
How I treat Postgres RLS as a backstop behind tenant-aware queries: session-var context, policy types, performance, and isolation tests in CI.
Hypothesis-driven production debugging with flame graphs, continuous profiling, bpftrace, and core dumps, drawn from real incidents on Aurora and WebSocket gateways.
How I ran a dual-boot Rails 7 to Rails 8 upgrade across a portfolio of legacy client apps, with Zeitwerk fixes, a real gem audit, and a feature-flag gated rollout.
URL path vs header vs content negotiation API versioning in Rails, with deprecation headers, sunset dates, and tolerant readers.
When a PWA actually beats native, when it loses, and the service worker, Workbox, Background Sync, and Web Push details I keep getting bitten by.
Devise sitting on top of Warden, custom strategies that survive contact with production, and why I prefer ActionPolicy over Pundit at Aurora scale.
How Zeitwerk maps files to constants, why classic-mode habits break, and how to actually debug uninitialized constant errors in a real Rails monolith.
Carving team boundaries inside a Rails monolith with mountable engines, inter-engine contracts, and private gems, without going microservices.
Structured JSON logging and ActiveSupport::Notifications in a Rails monolith. The cleanup that turned an incident from a needle hunt into a single Datadog query.
Why I moved off @nestjs/throttler to a Redis-backed limiter, and how to pick between token bucket and sliding window without burning a weekend.
How Rack middleware actually runs inside Rails, and the request-ID, IP allowlist, and tenant-header middlewares I shipped at the creator platform when controllers were the wrong layer.
From Rack::Attack defaults to Redis-backed sliding windows with Lua atomics. How we shut down a scraping pattern abusing our branded-app admin API without bricking real creators.
When RSC actually pays off, where the server-client boundary belongs, and two scars I picked up moving real apps onto it.
How I actually profile a slow Rails endpoint at scale, with rack-mini-profiler, Stackprof, memory_profiler, and derailed_benchmarks.
Yjs primitives, awareness, providers, persistence, and the React glue that holds it together. Why I reach for CRDTs over OT in product code now.
How I choose between WebSocket Gateways, SSE, and long polling in NestJS. Redis adapter scaling, JWT-on-handshake auth, and the reconnect storm that taught me backoff lives on the client.
A grounded, step-by-step refactor from anemic transaction scripts to DDD aggregates, with value objects, behavior pushdown, and domain events kept honest by characterization tests.
Why I default to orchestrated sagas across distributed transactions, with TypeScript code, compensating actions, and two production incidents that set the rule.
How I run @nestjs/schedule across multiple pods without double-firing, using Redis locks or Postgres advisory locks, plus the dynamic scheduling and monitoring patterns I trust in production.
Real-world trade-offs between AWS Secrets Manager, Parameter Store, and Vault, plus rotation, Kubernetes wiring, and a compromise response playbook.
An opinionated take on Consul, Eureka, and Kubernetes DNS-based discovery, with a real migration path off Consul and the health-check failure modes that bite in production.
Rails defaults handle most of OWASP for free. The bugs that bite are the ones you write on top of them: leaky cache keys, unsigned webhooks, mass-assignment shortcuts, and sessions that outlive their tokens.
An opinionated take on Istio versus Linkerd, when sidecars earn their resource overhead, and when a service mesh is just expensive YAML.
When a plain PORO is enough, when the interactor gem earns its keep, and where dry-rb command objects actually fit on a Rails monolith at scale.
A first-person scaling story from a real-time trading platform. Connection storms, cache stampedes, replica lag, and the rebuild that actually held.
How a creator-economy platform finally sharded a multi-terabyte Aurora cluster after two years of avoidance, including the shard key call, hot-tenant rebalancing, a mid-flight UUID migration, and pushing cross-shard reporting to the warehouse.
When sidecars actually earn their pod slot for TLS, retries, logging, and config sync, and when a plain library beats one.
A senior Rails engineer's take on Discard, Paranoia, and PaperTrail, GDPR right-to-deletion conflicts, and keeping version tables from eating your Aurora writer.
How I use specifications to keep business rules out of fat services and out of raw SQL, with TypeScript and Prisma examples drawn from real production code.
How we modeled tree state, selection, and multi-level nesting in a visual app builder without the recursive React render storm.
After almost ten years on JetBrains IDEs, I moved to Cursor on top of VS Code. Here's what broke, what stuck, and the few extensions that made the switch worth it.
How to write DDD tests that read like the domain itself, using Given-When-Then naming, aggregate invariant tests, and domain-specific test builders that double as living documentation.
How I keep a large Rails test suite under twelve minutes on CI: factory_bot tuning, parallel execution, fixture trade-offs, and how to quarantine flaky system tests before they block every deploy.
How I structure Terraform for multi-environment AWS infra, with remote state on S3 and DynamoDB locks, plan-on-PR in CI, drift detection, and tfsec plus Infracost in the loop.
Why the testing pyramid flips toward integration in distributed systems, and how I use Pact, TestContainers, and chaos drills to stop production from teaching me lessons.
Ninety seconds. That's how long it took for one slow downstream call to drag a hundreds-of-services topology to its knees. What I learned about circuit breakers, bulkheads, timeout budgets, and the difference between a health check that's honest and one that lies.
A PostgreSQL post-mortem on autovacuum contention, a missing index, and an ORM-generated query that hid in plain sight. Plus why pg_stat_statements is not optional.
A duplicate-charge incident, the two-phase payment fix, and why idempotency keys belong in the contract, not in the optimization backlog.
A Monday morning support ticket taught me that Kafka consumer health is about event freshness, not throughput. Manual commits, idempotency keys, and a real consumer SLO.
A hidden serializer N+1 quietly turned every request into dozens of Aurora reads. Here's how we caught it, what the bill looked like, and the CI guardrails we shipped so it can't happen again.
A three-week hunt for a once-per-million duplicate order bug, a Sidekiq super_fetch surprise, a Redis blip, and the database forensics that finally cracked it.
We took down our own platform with naive retry logic. Here's the 50x amplification math, the false starts, and the retry budgets and circuit breakers that actually fixed it.
Why I run Zod end-to-end in NestJS instead of class-validator, how the OpenAPI document becomes the contract, and the bugs that taught me to treat schemas as the source of truth.
I've shipped all three in NestJS. Here's how they actually compare on DX, migrations, generated SQL, and transactions, and which one I reach for now.
CAP is not a label you stick on a database. It's a per-endpoint choice I make every week on Aurora, Elasticsearch, and real-time price feeds. Here's how I actually pick.
Money, Email, DateRange, Address. How I model value objects in TypeScript, why immutability and structural equality matter, and how to persist them without leaking infra into the domain.
Comlink, transferable objects, SharedArrayBuffer, and how I moved CSV parsing and search indexing off the main thread on a visual app builder. Measured wins and the gotchas that bit me.
How a quiet memory leak quadrupled an AWS bill, why NAT Gateway costs catch teams off guard, and the cost engineering program we built to stop the bleeding.
An honest look at AWS Lambda for backend workloads: cold starts, RDS Proxy, provisioned concurrency, and the specific shapes of work where Lambda is the wrong call.
A failed decomposition that made latency worse and outages louder. Why I now default to a modular monolith and only extract services when there's a real scaling axis.
When a DDD bounded context earns its own service, and when splitting it just gives you a distributed monolith with extra steps.
How I run expand-contract migrations on Aurora and MySQL at scale, with gh-ost, pg-osc, CREATE INDEX CONCURRENTLY, batched backfills, and CI validation against production-like data.
An opinionated take on Temporal vs AWS Step Functions for sagas, human approval steps, and observability from a backend architect lens.