Why in-process events and cron break in multi-core Node.js — and how to fix it properly

The moment you spread a Node.js app across multiple cores with cluster, two things that worked flawlessly in a single process start breaking silently: in-process events (EventEmitter) and decorator-based cron jobs (@Cron). Both have the same cause and the same fix. Below: the mechanism of the problem, why it shows up as "sometimes works, sometimes doesn't," and a practical way to solve it with Redis and BullMQ.

Examples are from the mnazorat backend (NestJS, cluster, 8 cores; 40+ organizations, 400+ employees tracked).

Foundation: cluster makes each worker a separate process

Node.js runs your JavaScript on a single thread, so one process keeps at most one CPU core busy with application code. To use 8 cores, the standard path is cluster: the primary process forks one worker per core, and all workers share one listening port.

import * as cl from 'cluster';
import { cpus } from 'os';

const cluster = cl as unknown as cl.Cluster;

if (cluster.isPrimary) {
  const numCPUs = cpus().length; // 8
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker) => {
    cluster.fork(); // restart the dead worker
  });
} else {
  bootstrap(); // each worker is a fully separate NestJS app
}

Performance-wise this is excellent. But it comes with one fact that everything else follows from:

Each worker is a separate process: separate memory, separate variables. Workers share nothing in memory.

Problem 1: `EventEmitter` only works within its own process

@nestjs/event-emitter is convenient: emit in one place, @OnEvent in another — modules stay decoupled.

// emitter
this.eventEmitter.emit('gps.location.received', payload);

// listener (in the same process)
@OnEvent('gps.location.received')
async handleLocation(payload: GpsPayload) { /* process, persist */ }

EventEmitter is a plain in-memory object. emit() finds a listener only within its own process. An event emitted in Worker A never reaches Worker B.
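The isolation can be reproduced without forking anything. The sketch below is a model, not the real cluster: `workerA` and `workerB` are hypothetical names for two separate EventEmitter instances that stand in for the emitters living in two worker processes.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical stand-ins: each "worker" owns its own emitter, just as each
// forked process owns its own memory.
const workerA = new EventEmitter();
const workerB = new EventEmitter();

const handledByB: string[] = [];
workerB.on('gps.location.received', (pointId: string) => {
  handledByB.push(pointId);
});

// emit() returns true only if the emitter it was called on had listeners.
const deliveredFromA = workerA.emit('gps.location.received', 'point-1');
const deliveredFromB = workerB.emit('gps.location.received', 'point-2');

console.log(deliveredFromA); // false
console.log(deliveredFromB); // true
console.log(handledByB);     // [ 'point-2' ]
```

Note that the lost emit does not throw. The only trace is the `false` return value, which application code rarely checks.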

What this looks like in practice. Our GPS points were processed through this in-process flow. The cluster primary hands each incoming connection to a worker (round-robin by default, except on Windows), so consecutive points from the same device can land on different workers: one on Worker 3, the next on Worker 7. If any link in the chain assumes "we're in one process," a point that landed on another worker can't join that chain and silently disappears: no error, no log.

The result is an intermittent bug: works for one user, not another; works today, not tomorrow — because it all depends on which worker the request hit. It took us ~7 days to find, mostly because we first looked at the database. In reality the data never reached the database at all — it was lost in the gap between workers.

Diagnostic sign: if data is lost "halfway," the database is healthy, and the error reproduces for random users/at random times — the first suspect is not storage but inter-process delivery.

Problem 2: `@Cron` is duplicated in every worker

The same "separate process" nature multiplies scheduled tasks.

@Cron(CronExpression.EVERY_10_MINUTES)
async cleanupOldData() { /* delete old records */ }

Each worker registers this decorator independently. 8 workers = the task runs 8 times in parallel every 10 minutes: 8 concurrent deletes in the database, race conditions, and any report/notification in 8 copies.
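The multiplication is easy to see in a toy model. This is not @nestjs/schedule; `SimulatedWorker`, `registerCron`, and `tick` are hypothetical names that imitate each process registering the same job and firing it on its own.

```typescript
// A toy model: each simulated worker keeps its own list of scheduled tasks,
// the way each forked process registers @Cron independently.
type Task = () => void;

class SimulatedWorker {
  private readonly tasks: Task[] = [];

  registerCron(task: Task): void {
    this.tasks.push(task);
  }

  // Stands in for the moment the cron expression fires in this process.
  tick(): void {
    this.tasks.forEach((task) => task());
  }
}

let cleanupRuns = 0;
const cleanupOldData: Task = () => {
  cleanupRuns += 1;
};

// Every worker boots the same application code, so every worker registers the job.
const workers = Array.from({ length: 8 }, () => new SimulatedWorker());
workers.forEach((worker) => worker.registerCron(cleanupOldData));

// One scheduled moment: each process fires on its own.
workers.forEach((worker) => worker.tick());

console.log(cleanupRuns); // 8
```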

Anti-pattern: "leader worker" gating

The first idea that comes to mind is to designate one worker as "primary" and give singleton tasks only to it:

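for (let i = 0; i < numCPUs; i++) {
  // pass the id through fork()'s env argument; setting process.env in the
  // primary would leak the same value into every later fork
  cluster.fork(i === 0 ? { PROCESS_WORKER_ID: '0' } : {});
}
// then everywhere: if (process.env.PROCESS_WORKER_ID === '0') { ... }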

It works, but it's a band-aid, not a solution. Why:

You have to wrap manually — forget one spot, and the duplicate is back.
No failover — if the "leader" dies, cron jobs stop entirely, with no re-election.
Doesn't solve the event problem — incoming requests still hit a random worker.
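The failover gap can be sketched in a few lines. This is a toy model with hypothetical names (`WorkerEnv`, `runCronIfLeader`), and it assumes the replacement for a crashed leader is started without the leader tag.

```typescript
// A toy model of leader gating.
interface WorkerEnv {
  PROCESS_WORKER_ID?: string;
}

// The same guard as in the gating snippet: only the worker tagged '0' runs the job.
function runCronIfLeader(env: WorkerEnv, job: () => void): void {
  if (env.PROCESS_WORKER_ID === '0') job();
}

let runs = 0;
const job = (): void => {
  runs += 1;
};

// Initial state: worker 0 carries the leader tag, the others do not.
let pool: WorkerEnv[] = [{ PROCESS_WORKER_ID: '0' }, {}, {}, {}];
pool.forEach((env) => runCronIfLeader(env, job));
const runsWithLeader = runs;

// The leader crashes and is replaced by a worker without the tag (assumption).
pool = [{}, {}, {}, {}];
runs = 0;
pool.forEach((env) => runCronIfLeader(env, job));
const runsAfterLeaderDied = runs;

console.log(runsWithLeader, runsAfterLeaderDied); // 1 0
```

Four healthy workers are running, yet the scheduled job never fires again: nothing in the scheme elects a new leader.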

The real question is different: how do workers coordinate with each other?

Solution: move coordination outside the process

Both EventEmitter and @Cron are built for a single process. So multiple processes must coordinate through a shared external point. The practical tool is BullMQ on top of Redis.

1. Move cron to a BullMQ scheduler

Instead of @Cron, use upsertJobScheduler. All workers call it, but BullMQ stores the scheduler in Redis under a single id and merges duplicates (dedupe):

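import { InjectQueue, Processor, WorkerHost } from '@nestjs/bullmq';
import { OnModuleInit } from '@nestjs/common';
import { CronExpression } from '@nestjs/schedule';
import { Job, Queue } from 'bullmq';

// GPS_CLEANUP_QUEUE is an app-level constant holding the queue name
@Processor(GPS_CLEANUP_QUEUE)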
export class GpsSchedule extends WorkerHost implements OnModuleInit {
  constructor(
    @InjectQueue(GPS_CLEANUP_QUEUE) private cleanupQueue: Queue,
  ) {
    super();
  }

  async onModuleInit() {
    // all 8 workers call it — one scheduler remains in Redis
    await this.cleanupQueue.upsertJobScheduler(
      'gps-cleanup-scheduler',
      { pattern: CronExpression.EVERY_DAY_AT_MIDNIGHT, tz: 'Asia/Tashkent' },
      { name: 'cleanup-old-data', data: { type: 'cleanup-old-data' } },
    );
  }

  // the job runs globally in only ONE worker
  async process(job: Job) {
    return this.handleCleanup(job);
  }
}

Result: the cron runs once globally regardless of worker count, PROCESS_WORKER_ID gating is no longer needed, and failover comes for free, because any live worker can pick the job up.
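Why it is safe for every worker to register the scheduler comes down to upsert semantics on a shared key. The sketch below is an in-memory illustration only: `SharedSchedulerStore` is a hypothetical stand-in for the state BullMQ keeps in Redis, not its actual implementation.

```typescript
// In-memory stand-in for the shared store; in the real setup this state lives in Redis.
interface SchedulerSpec {
  pattern: string;
  jobName: string;
}

class SharedSchedulerStore {
  private readonly schedulers = new Map<string, SchedulerSpec>();

  // Upsert semantics: the same id overwrites instead of adding a second entry.
  upsert(id: string, spec: SchedulerSpec): void {
    this.schedulers.set(id, spec);
  }

  count(): number {
    return this.schedulers.size;
  }
}

const store = new SharedSchedulerStore();

// Eight workers boot and each registers the scheduler under the same id.
for (let workerId = 0; workerId < 8; workerId++) {
  store.upsert('gps-cleanup-scheduler', {
    pattern: '0 0 * * *',
    jobName: 'cleanup-old-data',
  });
}

console.log(store.count()); // 1
```

The important part is that the store sits outside every worker. The same eight calls against eight private maps would leave eight schedulers.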

2. Send events that cross the worker boundary through a queue

Data emitted in one process but that must be reliably processed goes through a queue instead of in-process emit() — now it lives in Redis, not in memory:

// instead of emit
await this.gpsQueue.add('location.received', payload);

@Processor('gps')
export class GpsProcessor extends WorkerHost {
  async process(job: Job) {
    await this.processLocation(job.data); // any worker picks it up
  }
}

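Now it doesn't matter which worker the point landed on: the job sits in Redis, so if the worker processing it dies, the job is not lost. BullMQ detects the stalled job and hands it to another worker, and a job whose handler throws is retried when the attempts option is set. This is a guarantee in-process EventEmitter cannot give.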
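The property that matters here is that a job leaves the queue only when a worker confirms it is done. A minimal sketch of that idea follows; `ToyQueue` is a hypothetical in-memory model, not BullMQ's implementation, and the coordinates are sample values.

```typescript
// A toy at-least-once queue: a job is removed only after acknowledgement.
interface Job<T> {
  id: number;
  data: T;
  attemptsMade: number;
}

class ToyQueue<T> {
  private waiting: Job<T>[] = [];
  private active = new Map<number, Job<T>>();
  private nextId = 1;

  add(data: T): void {
    this.waiting.push({ id: this.nextId++, data, attemptsMade: 0 });
  }

  // A worker takes the oldest waiting job; it is tracked as active, not deleted.
  take(): Job<T> | undefined {
    const job = this.waiting.shift();
    if (job) {
      job.attemptsMade += 1;
      this.active.set(job.id, job);
    }
    return job;
  }

  // Success: only now is the job removed.
  ack(jobId: number): void {
    this.active.delete(jobId);
  }

  // Simplification: treats every unacknowledged job as stalled and requeues it.
  requeueStalled(): void {
    this.active.forEach((job) => this.waiting.push(job));
    this.active.clear();
  }

  pending(): number {
    return this.waiting.length + this.active.size;
  }
}

const queue = new ToyQueue<{ lat: number; lon: number }>();
queue.add({ lat: 41.31, lon: 69.24 });

// Worker 3 takes the job and crashes before acknowledging it.
const firstTry = queue.take();
queue.requeueStalled();

// Worker 7 picks the same job up and finishes it.
const secondTry = queue.take();
if (secondTry) queue.ack(secondTry.id);

console.log(firstTry?.id === secondTry?.id); // true
console.log(secondTry?.attemptsMade);        // 2
console.log(queue.pending());                // 0
```

Because a job can be delivered more than once, the handler should be safe to run twice on the same payload.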

When NOT to use a queue

Moving every event into a queue is the opposite mistake. If the emitter and the listener always run in the same worker within a single request (e.g., a signal between modules while handling that request), in-process EventEmitter is appropriate and faster. Add a queue only in two cases:

the data must cross the worker boundary, or
the work must run once across all workers, not once per worker (with retry on failure).

Otherwise you've only added extra infrastructure and latency.
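The two-case rule above can be written down as a small decision helper. The names `EventTraits` and `chooseTransport` are hypothetical; the function only encodes the rule and is not part of any library.

```typescript
type Transport = 'in-process-emitter' | 'queue';

interface EventTraits {
  crossesWorkerBoundary: boolean; // emitter and listener may live in different workers
  mustRunOnceGlobally: boolean;   // e.g. a cleanup or a notification
}

// A queue is justified only when at least one of the two conditions holds.
function chooseTransport(traits: EventTraits): Transport {
  return traits.crossesWorkerBoundary || traits.mustRunOnceGlobally
    ? 'queue'
    : 'in-process-emitter';
}

// An inter-module signal inside one request stays in memory.
const local = chooseTransport({ crossesWorkerBoundary: false, mustRunOnceGlobally: false });
// A GPS point that any worker may have to process goes through the queue.
const shared = chooseTransport({ crossesWorkerBoundary: true, mustRunOnceGlobally: false });

console.log(local);  // in-process-emitter
console.log(shared); // queue
```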

Practical takeaway (checklist)

Before switching an app to cluster or PM2 cluster mode, or to multiple replicas, check:

`@Cron` / `setInterval` / `@Interval` — duplicated in every worker. Move the scheduler to a Redis-backed queue (BullMQ upsertJobScheduler).
`EventEmitter` / `@OnEvent` — if the emitter and listener may end up in different workers, use a queue or Redis Pub/Sub.
In-memory state — a Map/Set/variable cache, sessions, rate-limit counters are not shared across workers. Move shared state to Redis.
"One-off" tasks — don't rely on manual gating like PROCESS_WORKER_ID === '0'; hand dedupe to the infrastructure.

The general rule: in a multi-process setup, ask "where should this state live: inside the process or outside it?" for every global task before you scale out. Most often the answer is "outside," and deciding that up front is cheaper than fixing it later.
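The in-memory state item from the checklist is the easiest one to underestimate, so here is a sketch of it. `InMemoryRateLimiter` is a hypothetical per-process limiter, and handing requests to workers in turn is a simplifying assumption about how they are distributed.

```typescript
// A toy per-process rate limiter: the counter is a plain Map in this worker's memory.
class InMemoryRateLimiter {
  private readonly hits = new Map<string, number>();
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  allow(clientId: string): boolean {
    const used = this.hits.get(clientId) ?? 0;
    if (used >= this.limit) return false;
    this.hits.set(clientId, used + 1);
    return true;
  }
}

const WORKERS = 8;
const LIMIT_PER_CLIENT = 5;

// Each simulated worker builds its own limiter, as each forked process would.
const limiters = Array.from(
  { length: WORKERS },
  () => new InMemoryRateLimiter(LIMIT_PER_CLIENT),
);

// 100 requests from one client, handed to workers in turn.
let allowed = 0;
for (let request = 0; request < 100; request++) {
  const limiter = limiters[request % WORKERS];
  if (limiter?.allow('client-42')) allowed += 1;
}

console.log(allowed); // 40, not the intended 5
```

Each worker enforces the limit correctly against its own counter, so the effective limit is multiplied by the worker count. Keeping the counter in Redis gives all workers one number to check.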
