Implement a task queue that runs at most K tasks in parallel
Maintain an active count and a waiting queue. Each enqueue returns a promise; when active < K, run; otherwise wait. On finish, pull the next waiter. Bonus: cancellation, retries, prioritization.
Concurrency-limited queues are the right answer to "fetch 1000 URLs without nuking the server" or "process 50 file uploads three at a time." The pattern is a tiny scheduler: active count + waiting queue.
Core implementation.
type Task<T> = () => Promise<T>;

export function createPool(limit: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  // Admit the next waiter, if any. The increment happens here,
  // synchronously, before the waiter's start() runs.
  const next = () => {
    if (active < limit && waiting.length) {
      active++;
      waiting.shift()!();
    }
  };

  return function run<T>(task: Task<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = () => {
        task().then(resolve, reject).finally(() => {
          active--; // release the slot whether the task resolved or rejected
          next();
        });
      };
      if (active < limit) {
        active++; // reserve the slot synchronously, before the task starts
        start();
      } else {
        waiting.push(start);
      }
    });
  };
}
Usage:
const pool = createPool(3);
const results = await Promise.all(urls.map(u => pool(() => fetch(u).then(r => r.json()))));
Why a closure over a class? Closures keep the API tiny and prevent leaking internal state. A class is fine too; the real test is whether you can explain why the active counter and the waiter queue must live outside any single call.
The subtle bug to discuss. A naive implementation separates the active < limit check from the increment, for example by incrementing only after the task actually starts, across an async gap. If two callers race, both can observe active < limit and start simultaneously, exceeding the limit. The fix above increments synchronously at admission time, whether the task is admitted immediately or dequeued later.
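To make the race concrete, here is a deliberately broken sketch (the names createNaivePool, demo, and peak are illustrative, not from the article) where an await separates the check from the increment, so two synchronous callers both pass the check:

```typescript
type Task<T> = () => Promise<T>;

// Deliberately buggy: the `active < limit` check and `active++` are
// separated by an await, so two synchronous callers can both be admitted.
function createNaivePool(limit: number) {
  let active = 0;
  return async function run<T>(task: Task<T>): Promise<T> {
    while (active >= limit) {
      await new Promise((r) => setTimeout(r, 1));
    }
    await Promise.resolve(); // simulates any async gap before admission
    active++; // too late: another caller already saw active < limit
    try {
      return await task();
    } finally {
      active--;
    }
  };
}

// Demo: with limit 1, two tasks enqueued synchronously both run at once.
async function demo(): Promise<number> {
  const run = createNaivePool(1);
  let current = 0;
  let peak = 0;
  const task = async () => {
    current++;
    peak = Math.max(peak, current);
    await new Promise((r) => setTimeout(r, 10));
    current--;
  };
  await Promise.all([run(task), run(task)]);
  return peak; // 2, despite limit 1
}
```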
Failure handling. If a task rejects, you must still decrement active and call next(); finally is the right hook because it runs on both paths. Without it, one rejection wedges the queue.
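A quick self-contained check that a rejecting task releases its slot (createPool is the pool from above, reproduced so the sketch runs standalone; demo is an illustrative name):

```typescript
type Task<T> = () => Promise<T>;

function createPool(limit: number) {
  let active = 0;
  const waiting: Array<() => void> = [];
  const next = () => {
    if (active < limit && waiting.length) {
      active++;
      waiting.shift()!();
    }
  };
  return function run<T>(task: Task<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = () => {
        task().then(resolve, reject).finally(() => {
          active--;
          next();
        });
      };
      if (active < limit) {
        active++;
        start();
      } else {
        waiting.push(start);
      }
    });
  };
}

// A rejecting task must still release its slot, or the queued task never runs.
async function demo(): Promise<string> {
  const pool = createPool(1);
  const failed = pool(() => Promise.reject(new Error("boom"))).catch(
    (e: Error) => e.message
  );
  const queued = pool(() => Promise.resolve("ran")); // only runs if the slot is released
  return `${await failed}/${await queued}`; // "boom/ran"
}
```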
Useful extensions.
- Cancellation. Accept an AbortSignal; on abort, remove the task from waiting (reject with an AbortError) or pass the signal through to the running task.
- Retry. Wrap the task: on rejection, decrement the retry count and re-enqueue with backoff.
- Priority. Replace the FIFO queue with a min-heap keyed by priority.
- Per-key concurrency. Map of pools keyed by some attribute (e.g., per-host: limit 6 concurrent requests per origin to mirror browser behavior).
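One of these extensions, retry with exponential backoff, can be layered on without touching the pool itself. The sketch below assumes a pool-compatible run function; the names withRetry, attempts, and baseMs are illustrative:

```typescript
type Task<T> = () => Promise<T>;
type Run = <T>(task: Task<T>) => Promise<T>;

// Re-enqueue through the pool on failure, so retries also respect the limit.
function withRetry<T>(
  run: Run,
  task: Task<T>,
  attempts = 3,
  baseMs = 100
): Promise<T> {
  const attempt = (n: number): Promise<T> =>
    run(task).catch((err) => {
      if (n >= attempts) throw err; // retries exhausted, surface the error
      const delay = baseMs * 2 ** (n - 1); // exponential backoff
      return new Promise<T>((resolve) =>
        setTimeout(() => resolve(attempt(n + 1)), delay)
      );
    });
  return attempt(1);
}
```

Because each attempt goes back through run, a retry waits for a free slot like any other task instead of bypassing the limit.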
Comparison with libraries. p-limit, p-queue, fastq — all production-tested implementations of this pattern with extra features. In an interview, write the bones; in production, use one of these.
Common interview variant. "Run promises in batches of K." Different from concurrency limit: batch waits for all K before starting the next. The pool above is strictly better (starts the next as soon as any finishes).
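For contrast, the batch variant is a short sketch (runInBatches is an illustrative name). Each batch is gated by its slowest member, which is exactly why the pool wins when task durations vary:

```typescript
// Batch variant: start K tasks, wait for ALL of them, then start the next K.
// A slow task in one batch stalls every task in the following batches.
async function runInBatches<T>(
  tasks: Array<() => Promise<T>>,
  k: number
): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < tasks.length; i += k) {
    const batch = tasks.slice(i, i + k).map((t) => t());
    results.push(...(await Promise.all(batch)));
  }
  return results;
}
```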
Follow-up questions
- How do you prevent the off-by-one race when admitting tasks?
- How would you support priorities?
- Why prefer a pool over batching (Promise.all on K at a time)?
- How do browsers limit concurrent connections per origin?
Common mistakes
- Forgetting to decrement active in the rejection path: the queue wedges.
- Checking active < limit without incrementing synchronously: races admit too many tasks.
- Using setTimeout polling instead of an explicit queue: unbounded latency.
- Not exposing a way to await all in-flight tasks (drain).
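A drain can be layered on top of any pool by counting unsettled tasks. This is a sketch under that assumption; the names trackDrain, wrap, and drain are illustrative:

```typescript
type Task<T> = () => Promise<T>;

// Wrap a pool's run function to count unsettled tasks; drain() resolves
// once every admitted and waiting task has settled.
function trackDrain(run: <T>(task: Task<T>) => Promise<T>) {
  let pending = 0;
  let idleWaiters: Array<() => void> = [];
  const wrap = <T>(task: Task<T>): Promise<T> => {
    pending++;
    return run(task).finally(() => {
      pending--;
      if (pending === 0) {
        idleWaiters.forEach((w) => w());
        idleWaiters = [];
      }
    });
  };
  const drain = (): Promise<void> =>
    pending === 0 ? Promise.resolve() : new Promise((r) => idleWaiters.push(r));
  return { wrap, drain };
}
```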
Performance considerations
- Array.shift is O(n); for very large queues use a head index or a linked list.
- Per-task overhead is a closure plus a few microtask hops: negligible versus the work being done.
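The head-index idea can replace the waiting array directly. A minimal sketch (FastQueue and its compaction threshold are illustrative choices, not from a library):

```typescript
// FIFO with amortized O(1) dequeue: advance a head index instead of
// calling Array.shift, compacting occasionally to reclaim memory.
class FastQueue<T> {
  private items: T[] = [];
  private head = 0;

  get length(): number {
    return this.items.length - this.head;
  }

  push(item: T): void {
    this.items.push(item);
  }

  shift(): T | undefined {
    if (this.head >= this.items.length) return undefined;
    const item = this.items[this.head++];
    // Compact when more than half the backing array is dead space.
    if (this.head > 32 && this.head * 2 > this.items.length) {
      this.items = this.items.slice(this.head);
      this.head = 0;
    }
    return item;
  }
}
```

Since it exposes push, shift, and length, it slots into the pool as a drop-in replacement for the waiting array.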
Edge cases
- limit = 0: all tasks wait forever; throw in the factory or treat it as 1.
- limit = Infinity: degenerates to Promise.all.
- Caller cancels: you must also remove the task from waiting, not just abort the running fetch.
Real-world examples
- Browsers typically limit HTTP/1.1 to six concurrent connections per host. The same pattern shows up in fetch crawlers, image preloaders, and file upload queues.