1,171 changes: 1,041 additions & 130 deletions LocalMind-Backend/package-lock.json

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions LocalMind-Backend/package.json
@@ -55,6 +55,7 @@
"@types/jsonwebtoken": "^9.0.10",
"@types/mongoose": "^5.11.97",
"@types/morgan": "^1.9.10",
"@types/multer": "^2.0.0",
"argon2": "^0.44.0",
"axios": "^1.12.2",
"bcrypt": "^6.0.0",
@@ -70,9 +71,11 @@
"localtunnel": "^2.0.2",
"mongoose": "^8.19.1",
"morgan": "^1.10.1",
"multer": "^2.0.2",
"ngrok": "5.0.0-beta.2",
"nodemailer": "^7.0.10",
"ora": "^9.0.0",
"pdf-parse": "^2.4.5",
"zod": "^4.1.12"
}
}
97 changes: 88 additions & 9 deletions LocalMind-Backend/src/api/v1/DataSet/v1/DataSet.controller.ts
@@ -1,27 +1,106 @@
import { Request, Response } from 'express'

import { CSVLoader } from '@langchain/community/document_loaders/fs/csv'
import path from 'path'
import * as fs from 'fs'
import { SendResponse } from '../../../../utils/SendResponse.utils'
import DataSetService from './DataSet.service'
import FileLoaderUtils from './DataSet.fileLoader'
import { FileFormat } from './DataSet.type'

class DataSetController {
/**
* Upload and process dataset with support for multiple file formats
* Supports: CSV, PDF, TXT, JSON, TSV
*/
public async uploadDataSet(req: Request, res: Response): Promise<void> {
try {
const filePath = path.join(path.resolve(), 'src', 'data', 'Sample.csv')
// Check if file was uploaded
if (!req.file) {
SendResponse.error(res, 'No file uploaded', 400)
return
}

const uploadedFile = req.file
const filePath = uploadedFile.path

// Detect file format
const fileFormat = FileLoaderUtils.detectFileFormat(
uploadedFile.originalname,
uploadedFile.mimetype
)

if (!fileFormat) {
// Clean up uploaded file
if (fs.existsSync(filePath)) {
fs.unlinkSync(filePath)
}
Comment on lines +33 to +35
Contributor

medium

Using fs.unlinkSync is synchronous and blocks the Node.js event loop. In a server environment, this can degrade performance, especially under load. It's recommended to use the asynchronous version, fs.promises.unlink, to avoid blocking. This comment also applies to the other uses of unlinkSync in this file (lines 49, 60, and 71).

Suggested change
if (fs.existsSync(filePath)) {
fs.unlinkSync(filePath)
}
if (fs.existsSync(filePath)) {
await fs.promises.unlink(filePath)
}

SendResponse.error(
res,
'Unsupported file format. Supported formats: CSV, PDF, TXT, JSON, TSV',
400
)
return
}

// Load documents from file
const documents = await FileLoaderUtils.loadFile(filePath, fileFormat)

if (!documents || documents.length === 0) {
if (fs.existsSync(filePath)) {
fs.unlinkSync(filePath)
}
SendResponse.error(res, 'No data found in the uploaded file', 400)
return
}

const loader = new CSVLoader(filePath)
const documents = await loader.load()
// Process the dataset
const processedData = await DataSetService.Prepate_DataSet(documents)
Copilot AI Jan 9, 2026

The variable name Prepare_dataSet mixes capitalized words with an underscore, which is inconsistent with JavaScript/TypeScript naming conventions. It should use camelCase, for example preparedDataSet. The same issue exists in the service method being called, Prepate_DataSet, whose name also misspells "Prepare".

Suggested change
const processedData = await DataSetService.Prepate_DataSet(documents)
const processedData = await (DataSetService as any)['Prepate_DataSet'](documents)

Copilot uses AI. Check for mistakes.

const Prepare_dataSet = await DataSetService.Prepate_DataSet(documents)
// Clean up uploaded file after processing
if (fs.existsSync(filePath)) {
fs.unlinkSync(filePath)
}

SendResponse.success(
res,
'Dataset uploaded and processed successfully',
JSON.parse(Prepare_dataSet)
`Dataset uploaded and processed successfully (Format: ${fileFormat.toUpperCase()})`,
JSON.parse(processedData)
)
} catch (error: any) {
SendResponse.error(res, 'Failed to upload dataset', 500, error)
// Clean up uploaded file on error
if (req.file && fs.existsSync(req.file.path)) {
fs.unlinkSync(req.file.path)
Comment on lines +33 to +71
Copilot AI Jan 9, 2026

Using synchronous file system operations (fs.existsSync, fs.unlinkSync, fs.readFileSync) can block the Node.js event loop. Consider using async alternatives (fs.promises.access, fs.promises.unlink, fs.promises.readFile) with try-catch blocks for better performance, especially since this is already in an async function.

}
Comment on lines +69 to +72
Copilot AI Jan 9, 2026

If the file cleanup fails (e.g., due to permissions or file locks), the error is silently ignored. Consider logging cleanup failures so administrators can identify and resolve issues with orphaned files that couldn't be deleted.

Comment on lines +33 to +72
Copilot AI Jan 9, 2026

The file cleanup logic is duplicated in multiple places (lines 33-35, 48-50, 59-61, 70-72). Consider extracting this into a helper function like cleanupFile(filePath: string) to reduce code duplication and ensure consistent cleanup behavior across all code paths.
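
A minimal sketch of such a helper follows. It is not part of this PR: the name cleanupFile comes from the comment above, while the boolean return value and the decision to treat a missing file as a non-error are assumptions made for illustration. It uses fs.promises.unlink so the event loop is not blocked, and it logs failures other than ENOENT instead of swallowing them.

```typescript
import { promises as fsp } from 'fs'

/**
 * Hypothetical helper (not part of the PR): delete an uploaded temp file
 * without blocking the event loop. Resolves to true if a file was removed.
 */
export async function cleanupFile(filePath: string): Promise<boolean> {
  try {
    await fsp.unlink(filePath)
    return true
  } catch (error) {
    const code = (error as NodeJS.ErrnoException).code
    // A missing file means there is nothing to clean up, so stay quiet.
    if (code !== 'ENOENT') {
      // Log instead of throwing so a cleanup failure never masks the real response.
      console.error(`Failed to delete uploaded file ${filePath}:`, error)
    }
    return false
  }
}
```

Each of the four cleanup sites in uploadDataSet could then become a single await cleanupFile(filePath) call, which also removes the separate fs.existsSync check.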

SendResponse.error(
res,
'Failed to upload and process dataset',
500,
error.message
)
Comment on lines +73 to +78
Contributor

medium

Exposing raw error messages to the client can be a security risk, as it might leak sensitive information about the application's internals (e.g., file paths, library issues). It's better to log the detailed error on the server for debugging and send a generic error message to the client.

Suggested change
SendResponse.error(
res,
'Failed to upload and process dataset',
500,
error.message
)
console.error('Failed to upload and process dataset:', error);
SendResponse.error(
res,
'Failed to upload and process dataset',
500
)

}
}

/**
* Get list of supported file formats
*/
public async getSupportedFormats(
req: Request,
res: Response
): Promise<void> {
try {
const formats = Object.values(FileFormat)
SendResponse.success(res, 'Supported file formats', {
formats,
description: {
csv: 'Comma-separated values file',
xlsx: 'Excel spreadsheet (not yet fully supported)',
tsv: 'Tab-separated values file',
json: 'JSON file with Q&A pairs',
pdf: 'PDF document',
txt: 'Plain text file',
},
Comment on lines +95 to +100
Copilot AI Jan 9, 2026

The description states "Excel spreadsheet (not yet fully supported)" which is misleading. The XLSX format is included in the enum and allowed MIME types, but the implementation explicitly throws an error. Either remove XLSX from the supported formats list until it's implemented, or provide partial support with clear documentation about limitations.

})
Comment on lines +90 to +101
Contributor

medium

The getSupportedFormats endpoint currently lists 'xlsx' as a supported format, but the implementation in DataSet.fileLoader.ts throws an error because it's not yet implemented. This can be misleading for API consumers. It's better to remove 'xlsx' from the list of supported formats until it is fully functional.

      const formats = Object.values(FileFormat).filter(
        (f) => f !== FileFormat.XLSX
      )
      SendResponse.success(res, 'Supported file formats', {
        formats,
        description: {
          csv: 'Comma-separated values file',
          tsv: 'Tab-separated values file',
          json: 'JSON file with Q&A pairs',
          pdf: 'PDF document',
          txt: 'Plain text file',
        },
      })
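
An alternative sketch, under stated assumptions: instead of filtering the enum and maintaining the descriptions separately, both could be derived from one map so they cannot drift apart. The FileFormat enum below is an assumed reconstruction (the real one lives in DataSet.type.ts, which this diff does not show), and IMPLEMENTED_FORMATS and describeSupportedFormats are hypothetical names.

```typescript
// Assumed shape of the enum; the real definition is in DataSet.type.ts (not shown in this diff).
enum FileFormat {
  CSV = 'csv',
  XLSX = 'xlsx',
  TSV = 'tsv',
  JSON = 'json',
  PDF = 'pdf',
  TXT = 'txt',
}

// Only formats with a working loader get an entry; XLSX is deliberately absent.
const IMPLEMENTED_FORMATS: Partial<Record<FileFormat, string>> = {
  [FileFormat.CSV]: 'Comma-separated values file',
  [FileFormat.TSV]: 'Tab-separated values file',
  [FileFormat.JSON]: 'JSON file with Q&A pairs',
  [FileFormat.PDF]: 'PDF document',
  [FileFormat.TXT]: 'Plain text file',
}

/** Build the response payload from the single map above. */
export function describeSupportedFormats(): {
  formats: string[]
  description: Partial<Record<FileFormat, string>>
} {
  return {
    formats: Object.keys(IMPLEMENTED_FORMATS),
    description: IMPLEMENTED_FORMATS,
  }
}
```

With this shape, adding XLSX support later would mean adding one entry to the map once the loader exists.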

} catch (error: any) {
SendResponse.error(res, 'Failed to get supported formats', 500, error)
}
}
}
237 changes: 237 additions & 0 deletions LocalMind-Backend/src/api/v1/DataSet/v1/DataSet.fileLoader.ts
@@ -0,0 +1,237 @@
import { CSVLoader } from '@langchain/community/document_loaders/fs/csv'
import { PDFLoader } from '@langchain/community/document_loaders/fs/pdf'
import { Document } from '@langchain/core/documents'
import { FileFormat, UploadedFileMetadata } from './DataSet.type'
import * as fs from 'fs'
import * as path from 'path'

/**
* File loader class to handle different file formats
*/
class FileLoaderUtils {
/**
* Detect file format from file extension or MIME type
*/
public detectFileFormat(filename: string, mimeType?: string): FileFormat | null {
const extension = path.extname(filename).toLowerCase().slice(1)

const formatMap: Record<string, FileFormat> = {
csv: FileFormat.CSV,
xlsx: FileFormat.XLSX,
xls: FileFormat.XLSX,
tsv: FileFormat.TSV,
json: FileFormat.JSON,
pdf: FileFormat.PDF,
txt: FileFormat.TXT,
}

// Try by extension first
if (formatMap[extension]) {
return formatMap[extension]
}

// Try by MIME type
if (mimeType) {
const mimeFormatMap: Record<string, FileFormat> = {
'text/csv': FileFormat.CSV,
'application/vnd.ms-excel': FileFormat.XLSX,
'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet':
FileFormat.XLSX,
'text/tab-separated-values': FileFormat.TSV,
'application/json': FileFormat.JSON,
'application/pdf': FileFormat.PDF,
'text/plain': FileFormat.TXT,
}

if (mimeFormatMap[mimeType]) {
return mimeFormatMap[mimeType]
}
}

return null
}

/**
* Load documents from a CSV file
*/
private async loadCSV(filePath: string): Promise<Document[]> {
const loader = new CSVLoader(filePath)
return await loader.load()
}

/**
* Load documents from a PDF file
*/
private async loadPDF(filePath: string): Promise<Document[]> {
const loader = new PDFLoader(filePath, {
splitPages: false, // Load entire PDF as one document
})
return await loader.load()
}

/**
* Load documents from a TXT file
*/
private async loadTXT(filePath: string): Promise<Document[]> {
try {
const fileContent = fs.readFileSync(filePath, 'utf-8')

// Split by double newlines to separate Q&A pairs or paragraphs
const sections = fileContent
.split(/\n\n+/)
.filter((section) => section.trim().length > 0)

if (sections.length === 0) {
// If no double newlines, treat entire content as one document
return [
new Document({
pageContent: fileContent,
metadata: { source: filePath },
}),
]
}

// Create a document for each section
const documents = sections.map((section, index) => {
return new Document({
pageContent: section.trim(),
metadata: { source: filePath, section: index },
})
})

return documents
} catch (error) {
throw new Error(`Failed to load TXT file: ${error}`)
Copilot AI Jan 9, 2026

The catch block at line 104 interpolates the caught value directly into the template string (Failed to load TXT file: ${error}). The message is readable, but the original error's stack is lost. Consider using throw new Error(`Failed to load TXT file: ${(error as Error).message}`) to carry over the original error's message.

Suggested change
throw new Error(`Failed to load TXT file: ${error}`)
throw new Error(`Failed to load TXT file: ${(error as Error).message}`)

}
}

/**
* Load documents from a JSON file
*/
private async loadJSON(filePath: string): Promise<Document[]> {
try {
// Read the JSON file
const fileContent = fs.readFileSync(filePath, 'utf-8')
const jsonData = JSON.parse(fileContent)

// Check if it's an array of Q&A pairs
if (Array.isArray(jsonData)) {
// Convert array to documents
const documents = jsonData.map((item, index) => {
const pageContent = JSON.stringify(item)
return new Document({
pageContent,
metadata: { source: filePath, row: index },
})
})
return documents
}

// If it's a single object, wrap it in an array
const pageContent = JSON.stringify(jsonData)
return [
new Document({
pageContent,
metadata: { source: filePath },
}),
]
} catch (error) {
throw new Error(`Failed to parse JSON file: ${error}`)
}
}

/**
* Load documents from a TSV file
*/
private async loadTSV(filePath: string): Promise<Document[]> {
// TSV is similar to CSV but with tab delimiter
const loader = new CSVLoader(filePath, {
separator: '\t',
})
return await loader.load()
}

/**
* Main method to load file based on format
*/
public async loadFile(
filePath: string,
format: FileFormat
): Promise<Document[]> {
// Validate file exists
if (!fs.existsSync(filePath)) {
throw new Error(`File not found: ${filePath}`)
}

try {
switch (format) {
case FileFormat.CSV:
return await this.loadCSV(filePath)

case FileFormat.PDF:
return await this.loadPDF(filePath)

case FileFormat.TXT:
return await this.loadTXT(filePath)

case FileFormat.JSON:
return await this.loadJSON(filePath)

case FileFormat.TSV:
return await this.loadTSV(filePath)

case FileFormat.XLSX:
// For Excel files, you'll need to convert to CSV first
// or use a library like 'xlsx' to parse them
throw new Error(
'XLSX format not yet implemented. Please convert to CSV.'
)

default:
throw new Error(`Unsupported file format: ${format}`)
}
} catch (error) {
throw new Error(`Failed to load file: ${error}`)
Copilot AI Jan 9, 2026

The error handling at lines 139 and 194 wraps errors in a way that loses the original error stack trace. Consider preserving the original error information by using throw new Error(`Failed to parse JSON file: ${(error as Error).message}`, { cause: error }) or a similar pattern to maintain debugging context.

Suggested change
throw new Error(`Failed to load file: ${error}`)
if (error instanceof Error) {
throw new Error(`Failed to load file: ${error.message}`, { cause: error })
}
throw new Error(`Failed to load file: ${String(error)}`)

}
Comment on lines +193 to +195
Contributor

medium

Interpolating the caught value into a new Error's message stringifies it (to [object Object] when the thrown value is a plain object) and discards the original error's stack trace and type. To preserve this information for better debugging, re-throw the original error or create a new error that includes the original error's message.

Suggested change
} catch (error) {
throw new Error(`Failed to load file: ${error}`)
}
} catch (error: any) {
throw new Error(`Failed to load file: ${error.message}`)
}
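
Note that error.message is undefined when the thrown value is not an Error instance. A rough sketch of one way to handle that case is below. describeError and FileLoadError are hypothetical names, not part of this PR, and keeping the original value on an original property is an assumption chosen so the sketch does not depend on the compiler target supporting the cause option.

```typescript
/**
 * Hypothetical helper (not part of the PR): turn an unknown caught value
 * into a readable message without assuming it is an Error instance.
 */
export function describeError(error: unknown): string {
  if (error instanceof Error) return error.message
  if (typeof error === 'string') return error
  try {
    // JSON.stringify returns undefined for values such as undefined or functions.
    return JSON.stringify(error) ?? String(error)
  } catch {
    // Circular structures and BigInt values cannot be serialised.
    return String(error)
  }
}

/** Wrap a failure with context while keeping the original value reachable. */
export class FileLoadError extends Error {
  public readonly original: unknown

  constructor(context: string, original: unknown) {
    super(`${context}: ${describeError(original)}`)
    this.name = 'FileLoadError'
    this.original = original
  }
}
```

In loadFile, the catch block could then read throw new FileLoadError('Failed to load file', error), and the controller could log error.original on the server while still returning a generic message to the client.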

}

/**
* Get file metadata
*/
public getFileMetadata(filePath: string): UploadedFileMetadata {
const stats = fs.statSync(filePath)
Comment on lines +201 to +202
Copilot AI Jan 9, 2026

Using synchronous file system operations (fs.readFileSync, fs.statSync) can block the Node.js event loop. Consider using async alternatives (fs.promises.readFile, fs.promises.stat) with await for better performance in the async methods loadTXT, loadJSON, and getFileMetadata.

Suggested change
public getFileMetadata(filePath: string): UploadedFileMetadata {
const stats = fs.statSync(filePath)
public async getFileMetadata(filePath: string): Promise<UploadedFileMetadata> {
const stats = await fs.promises.stat(filePath)
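
The same change applies to loadTXT. As a sketch only: the version below reads the file with fs.promises.readFile and keeps the blank-line splitting in a pure function that can be tested without touching the disk. TextSection is a stand-in for LangChain's Document, and splitSections and loadTextSections are hypothetical names. Unlike the PR's loadTXT, this sketch returns an empty array for an empty or whitespace-only file.

```typescript
import { promises as fsp } from 'fs'

// Minimal stand-in for LangChain's Document so the sketch has no dependencies.
interface TextSection {
  pageContent: string
  metadata: { source: string; section: number }
}

/** Split text on blank lines, mirroring the split(/\n\n+/) logic in loadTXT. */
export function splitSections(content: string, source: string): TextSection[] {
  return content
    .split(/\n\n+/)
    .map((section) => section.trim())
    .filter((section) => section.length > 0)
    .map((pageContent, section) => ({
      pageContent,
      metadata: { source, section },
    }))
}

/** Non-blocking variant: the read is awaited instead of using readFileSync. */
export async function loadTextSections(
  filePath: string
): Promise<TextSection[]> {
  const content = await fsp.readFile(filePath, 'utf-8')
  return splitSections(content, filePath)
}
```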

const filename = path.basename(filePath)
const format = this.detectFileFormat(filename)

if (!format) {
throw new Error(`Unable to detect file format for: ${filename}`)
}

return {
originalName: filename,
mimeType: this.getMimeType(format),
size: stats.size,
path: filePath,
format,
}
}

/**
* Get MIME type from file format
*/
private getMimeType(format: FileFormat): string {
const mimeTypes: Record<FileFormat, string> = {
[FileFormat.CSV]: 'text/csv',
[FileFormat.XLSX]:
'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
[FileFormat.TSV]: 'text/tab-separated-values',
[FileFormat.JSON]: 'application/json',
[FileFormat.PDF]: 'application/pdf',
[FileFormat.TXT]: 'text/plain',
}

return mimeTypes[format] || 'application/octet-stream'
}
}

export default new FileLoaderUtils()