This project demonstrates the use of Node.js streams to process large amounts of data efficiently. It includes an example of reading a large text file, transforming its contents, and writing the transformed data to a new file.
Streams in Node.js allow for efficient data processing by handling data in chunks rather than loading the entire data set into memory. This project showcases the benefits of using streams for large file transformations.
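As a minimal sketch of the idea (the file name is assumed; any large text file works), compare buffering a whole file against streaming it:

```js
const fs = require('fs');

// Buffered approach: the entire file must fit in memory before the callback runs.
// For a multi-gigabyte file this will exhaust memory or hit Node's buffer limits.
// fs.readFile('large-input.txt', (err, data) => { /* whole file in `data` */ });

// Streaming approach: data arrives in small chunks (64 KB by default),
// so memory use stays roughly constant regardless of file size.
fs.createReadStream('large-input.txt')
  .on('data', (chunk) => {
    // `chunk` is a Buffer holding only a small slice of the file
  })
  .on('end', () => console.log('Finished reading.'));
```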
- Node.js (v14.x or later)
- npm (v6.x or later)
- Python (for generating test data)
- Clone the repository:

```bash
git clone https://github.com/vitorrios1001/stream-learning.git
cd stream-learning
```

- Install dependencies:

```bash
npm install
```
Before running the example, you need to generate a large text file. You can use the provided Python script to create a 10GB file.
- Run the Python script:

```bash
python3 generator.py
```

This will create a file named `large-input.txt` in the project directory.
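The Python generator itself isn't shown in this README. If you prefer to stay in Node.js, a hypothetical equivalent (not part of the repository) could use a Writable Stream and respect backpressure:

```js
// generate.js -- hypothetical Node.js stand-in for generator.py
const fs = require('fs');

const out = fs.createWriteStream('large-input.txt');
const line = 'the quick brown fox jumps over the lazy dog\n';
const targetBytes = 10 * 1024 ** 3; // ~10 GB
let written = 0;

function writeMore() {
  while (written < targetBytes) {
    written += Buffer.byteLength(line);
    // write() returns false when the internal buffer is full;
    // wait for 'drain' instead of buffering the whole file in memory
    if (!out.write(line)) {
      out.once('drain', writeMore);
      return;
    }
  }
  out.end(() => console.log('Done.'));
}

writeMore();
```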
Once you have generated the test data, you can run the Node.js script to process the file.
- Run the Node.js script:

```bash
node src/index.js
```

This script reads the `large-input.txt` file, converts its contents to uppercase, and writes the transformed data to `large-output.txt`.
The Node.js script uses the following components:
- Readable Stream: Reads data from the input file in chunks.
- Transform Stream: Converts each chunk of data to uppercase.
- Writable Stream: Writes the transformed data to the output file.
The script also tracks the progress of the transformation, logging updates every 10% and displaying the total time taken at the end.
```js
// src/index.js
const fs = require('fs');
const { Transform } = require('stream');
const { performance } = require('perf_hooks');

// Paths to the input and output files
const inputFile = 'large-input.txt';
const outputFile = 'large-output.txt';

// Create a Readable Stream from the input file
const readableStream = fs.createReadStream(inputFile, { encoding: 'utf8' });

// Create a Writable Stream for the output file
const writableStream = fs.createWriteStream(outputFile);

// Progress-tracking variables
let totalSize = 0;
let processedSize = 0;
let lastLoggedProgress = 0;
const startTime = performance.now();
let processedLines = 0;

fs.stat(inputFile, (err, stats) => {
  if (err) {
    console.error('Error reading file information:', err);
    return;
  }
  totalSize = stats.size;

  // Pipe the Readable Stream into the Transform Stream, then into the Writable Stream
  readableStream
    .pipe(
      new Transform({
        transform(chunk, encoding, callback) {
          // Use the byte length so progress matches stats.size even for multi-byte characters
          processedSize += Buffer.byteLength(chunk);
          processedLines += chunk.toString().split('\n').length - 1;

          // Convert the data chunk to uppercase
          const upperCaseChunk = chunk.toString().toUpperCase();

          // Invoke the callback with the transformed chunk
          callback(null, upperCaseChunk);

          // Progress logging
          const progress = (processedSize / totalSize) * 100;
          if (progress >= lastLoggedProgress + 10) {
            console.log(
              `Progress: ${Math.floor(progress)}%, lines processed: ${processedLines}`
            );
            lastLoggedProgress = Math.floor(progress);
          }
        },
      })
    )
    .pipe(writableStream)
    .on('finish', () => {
      const endTime = performance.now();
      const timeTaken = ((endTime - startTime) / 1000).toFixed(2);
      console.log('Transformation complete and file saved.');
      console.log(`Total lines processed: ${processedLines}`);
      console.log(`Total time: ${timeTaken} seconds`);
    })
    .on('error', (err) => {
      console.error('Error during transformation:', err);
    });
});
```

- Memory Efficiency: Streams process data in chunks, which avoids loading the entire file into memory. This is crucial for large files, preventing memory exhaustion and improving performance.
- Real-Time Data Processing: Streams allow continuous data processing, starting to process the first chunks of data while still receiving the next ones. This reduces the total processing time.
- Maintaining Responsiveness: By not blocking the Node.js Event Loop, streams help keep the application responsive even during intensive I/O operations.
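As a side note, the script above chains `.pipe()` calls and attaches error handlers manually, which only catches errors from the last stream in the chain. Node's built-in `stream.pipeline` (available since Node 10) wires the same stages together while forwarding errors from every stage to a single callback and destroying all streams on failure. A minimal sketch of the same uppercase transform:

```js
// Alternative wiring using stream.pipeline
const fs = require('fs');
const { pipeline, Transform } = require('stream');

pipeline(
  fs.createReadStream('large-input.txt', { encoding: 'utf8' }),
  new Transform({
    transform(chunk, encoding, callback) {
      callback(null, chunk.toString().toUpperCase());
    },
  }),
  fs.createWriteStream('large-output.txt'),
  (err) => {
    if (err) console.error('Pipeline failed:', err);
    else console.log('Pipeline succeeded.');
  }
);
```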
For more details, take a look at the article.