Node.js Streams: Process Large Files Without Running Out of Memory

Master Node.js Streams to process large CSV files, video uploads, and data pipelines without loading everything into memory at once.

Abdur Razzak

Full-Stack Web Developer

May 15, 2025 10 min read

The Problem with Large Files

Reading a 1GB CSV file entirely into memory (for example with fs.readFile) can exhaust the V8 heap and crash your Node.js process with an out-of-memory error, especially once the raw bytes are parsed into strings and objects. Node.js Streams avoid this by processing data in small chunks: read a chunk, process it, and let it be garbage-collected before reading the next one. Memory usage stays roughly constant regardless of file size. This is essential for file upload processing, data import pipelines, and log analysis.
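
As a minimal sketch of chunked reading, the function below counts the lines in a file while holding only one chunk in memory at a time. The name countLines and the 64 KiB chunk size are illustrative assumptions, not something prescribed by Node.js.

```javascript
const fs = require('node:fs');

// Count newline characters in a file without loading the whole file.
// `countLines` and the 64 KiB chunk size are illustrative choices.
async function countLines(filePath) {
  let lines = 0;
  const stream = fs.createReadStream(filePath, { highWaterMark: 64 * 1024 });
  // Readable streams are async iterable; each chunk is a Buffer of at most 64 KiB.
  for await (const chunk of stream) {
    for (const byte of chunk) {
      if (byte === 0x0a) lines++; // 0x0a is '\n'
    }
    // The chunk goes out of scope here, so it can be garbage-collected.
  }
  return lines;
}
```

Calling countLines on a multi-gigabyte log file should use about the same amount of memory as calling it on a small one, because only the current chunk is retained.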

Types of Streams in Node.js

Node.js has four stream types: Readable (source of data — file read, HTTP request body), Writable (destination — file write, HTTP response), Duplex (both readable and writable — TCP sockets), and Transform (a Duplex that transforms data — compression, encryption). Most stream use cases combine a Readable source, one or more Transform streams for processing, and a Writable destination.
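
To make the Transform type concrete, here is a small sketch of a custom Transform that uppercases text as it passes through. The createUppercase name is hypothetical, and the example assumes ASCII input so that a character is never split across two chunks.

```javascript
const { Readable, Transform } = require('node:stream');

// A Transform receives chunks on its writable side and pushes
// modified chunks out of its readable side.
// Assumes ASCII text, so no multi-byte character is split between chunks.
function createUppercase() {
  return new Transform({
    transform(chunk, encoding, callback) {
      // First argument is an error (none here), second is the output chunk.
      callback(null, chunk.toString().toUpperCase());
    },
  });
}

// Usage: a Readable source piped through the Transform into a Writable (stdout).
// Readable.from(['hello ', 'streams\n']).pipe(createUppercase()).pipe(process.stdout);
```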

Piping Streams Together

The pipe() method connects a Readable to a Writable and handles backpressure automatically, but it does not forward errors between stages or destroy the other streams when one of them fails, which can leak file descriptors. For anything beyond a trivial pipe, use the pipeline() function from stream/promises: await pipeline(fs.createReadStream('input.csv'), csvTransformStream, fs.createWriteStream('output.json')), where csvTransformStream is a Transform stream you supply. pipeline() returns a promise that rejects if any stage errors, and it destroys every stream in the chain on failure.
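
A runnable sketch of the same pattern, using the built-in gzip Transform in place of a custom CSV stage. The gzipFile name and its parameters are assumptions for illustration.

```javascript
const fs = require('node:fs');
const zlib = require('node:zlib');
const { pipeline } = require('node:stream/promises');

// Compress a file chunk by chunk: Readable -> Transform -> Writable.
// `gzipFile`, `sourcePath`, and `destPath` are illustrative names.
async function gzipFile(sourcePath, destPath) {
  // The promise resolves once the destination has finished writing and
  // rejects if any stage fails; all three streams are destroyed on error.
  await pipeline(
    fs.createReadStream(sourcePath),
    zlib.createGzip(),
    fs.createWriteStream(destPath)
  );
}
```

Because the call is awaited, an ordinary try/catch around gzipFile is enough to handle a missing input file or a full disk.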

Parsing CSV with Streams

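Use the csv-parse library in streaming mode to parse large CSV files. Create a parser with parse({ columns: true }) so that each row is emitted as a JavaScript object keyed by the header row (without that option, rows are emitted as arrays of strings), and pipe your file Readable into it. Process each row in the stream's data event, or use async iteration (for await...of csvParser) for cleaner syntax. Insert processed rows into MongoDB in batches of 1000 using insertMany for efficient bulk operations; awaiting each insertMany inside the loop keeps the parser from running ahead of the database.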
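
The batching step can be sketched independently of any particular parser or database. In the sketch below, inBatches and importRows are hypothetical helpers, rows is assumed to be any async iterable (such as a csv-parse parser), and insertBatch is a stand-in for a call like collection.insertMany.

```javascript
// Group items from any (async) iterable into arrays of up to `size` items.
// `inBatches` is a hypothetical helper, not part of csv-parse or MongoDB.
async function* inBatches(source, size) {
  let batch = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // flush the final partial batch
}

// `insertBatch` is an assumed async callback, e.g. (docs) => collection.insertMany(docs).
async function importRows(rows, insertBatch, batchSize = 1000) {
  let total = 0;
  for await (const batch of inBatches(rows, batchSize)) {
    // Awaiting here stops the loop from pulling more rows until the insert completes.
    await insertBatch(batch);
    total += batch.length;
  }
  return total;
}

// Assumed wiring with csv-parse (requires `npm install csv-parse`):
// const { parse } = require('csv-parse');
// const rows = fs.createReadStream('input.csv').pipe(parse({ columns: true }));
// await importRows(rows, (docs) => collection.insertMany(docs));
```

At most one batch of rows is held in memory at a time, so peak memory depends on the batch size rather than on the file size.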

Streaming HTTP Responses

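Stream data directly from your database to the HTTP response without buffering everything in memory. In Express, call res.setHeader('Content-Type', 'application/json'), then build the JSON array incrementally: res.write('['), write each record as JSON.stringify(doc) with a comma in front of every record except the first, then res.end(']'). Appending a comma after every record would leave a trailing comma and produce invalid JSON. For file downloads, pipe a file read stream directly to res with fs.createReadStream(filePath).pipe(res), and attach an error handler to the read stream (or use pipeline()) so that a missing file does not crash the process with an unhandled error event.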
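
One way to sketch the JSON array response is an async generator that yields the pieces of the array, handed to pipeline() so that backpressure and cleanup are handled for you. The names jsonArrayChunks and sendJsonArray are hypothetical, docs is assumed to be any async iterable of records (for example a database cursor), and res is assumed to be the Express response or any other Writable.

```javascript
const { Readable } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// Yield the text of a JSON array piece by piece.
// The comma goes before every record except the first, so there is no trailing comma.
async function* jsonArrayChunks(docs) {
  yield '[';
  let first = true;
  for await (const doc of docs) {
    yield (first ? '' : ',') + JSON.stringify(doc);
    first = false;
  }
  yield ']';
}

// `res` is any Writable, such as an Express response (an assumption of this sketch).
async function sendJsonArray(docs, res) {
  // pipeline() only pulls the next record when `res` can accept more data,
  // and ends `res` once the generator is exhausted.
  await pipeline(Readable.from(jsonArrayChunks(docs)), res);
}
```

In a route handler this could be used as res.setHeader('Content-Type', 'application/json') followed by await sendJsonArray(cursor, res), with cursor standing in for whatever async iterable your database driver provides.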

Backpressure and Flow Control

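Backpressure occurs when a fast Readable overwhelms a slow Writable: data piles up in the Writable's internal buffer and causes the same memory problem you were trying to avoid. Node.js streams signal backpressure through the return value of Writable.write(), which is false once the buffered data reaches highWaterMark (meaning stop writing), and through the drain event, which signals that it is safe to resume. pipe() and pipeline() apply this protocol for you, and async iteration only reads the next chunk when your loop asks for it. Prefer pipeline() or async iteration over calling write() and listening for drain by hand.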
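
For the cases where you do have to call write() yourself, the protocol can be sketched as follows. The writeAll name is hypothetical, and chunks is assumed to be any iterable or async iterable of strings or Buffers.

```javascript
const { once } = require('node:events');
const { finished } = require('node:stream/promises');

// Write every chunk to `writable`, pausing whenever its buffer is full.
// `writeAll` is an illustrative helper, not a Node.js API.
async function writeAll(chunks, writable) {
  for await (const chunk of chunks) {
    // write() returns false once buffered data has reached highWaterMark.
    if (!writable.write(chunk)) {
      // Wait until the buffer has been flushed before writing more.
      await once(writable, 'drain');
    }
  }
  writable.end();
  // Resolves once all data has been flushed, rejects on a stream error.
  await finished(writable);
}
```

Ignoring the return value of write() would still deliver the data, but the unwritten chunks would accumulate in memory, which is exactly what streaming is meant to prevent.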
