Implement restore_backup_to_database for BackupJob with integrity checks and streaming support#25
Conversation

Connected to Huly®: MBM-23

Co-authored-by: Nonolanlan1007 <84439677+Nonolanlan1007@users.noreply.github.com>
src/utils/backup_manager.rs (outdated diff)

```rust
    Err(err) => return yield StreamEvent::Error(format!("Failed to parse metadata file: {err}")),
};

yield StreamEvent::Info(format!("Restoring database: {}", metadata.name));
```
Before restoring, you should check the backup's integrity. To do so, you have to create a datastore instance for this backup directory.
Added backup integrity check in commit ab9766f. The code now creates a datastore instance for the backup directory and verifies integrity before starting the restore process.
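The verification step discussed above can be sketched as follows. This is a minimal, self-contained illustration, not the project's code: an in-memory map stands in for a `Datastore` opened over the backup directory, and std's `DefaultHasher` stands in for whatever hash format the backup metadata actually records (a real backup would use a cryptographic hash such as SHA-256).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in checksum; a real backup would use a cryptographic hash (e.g. SHA-256).
fn checksum(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Verifies that every collection file listed in the backup metadata is
/// present in the backup datastore and that its checksum matches, before
/// any restore work starts.
fn verify_backup_integrity(
    datastore: &HashMap<String, Vec<u8>>, // stand-in for a Datastore over backup_dir
    expected: &HashMap<String, u64>,      // collection file -> expected checksum
) -> Result<(), String> {
    for (file, want) in expected {
        let bytes = datastore
            .get(file)
            .ok_or_else(|| format!("missing file in backup: {file}"))?;
        if checksum(bytes) != *want {
            return Err(format!("checksum mismatch for {file}"));
        }
    }
    Ok(())
}
```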
src/utils/backup_manager.rs (outdated diff)

```rust
yield StreamEvent::Info(format!("Restoring collection: {}", collection_name));

let collection_file_path = format!("{}/{}.json", backup_dir, collection_name);
let collection_content = match self.datastore.get_object(collection_file_path) {
```
Here you should create an open_read_stream method in the Datastore trait and use it here, to avoid loading the entire collection into RAM.
Added open_read_stream method to DatastoreTrait in commit ab9766f. The method is now implemented for FilesystemDatastore and S3Datastore, and the restore function uses it to stream collection data instead of loading everything into RAM.
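The shape of that API can be sketched like this. The real trait method is async; this sync version built on `std::io::Read` is only an illustration, and `MemoryDatastore` is a hypothetical in-memory implementation so the example is self-contained. A `FilesystemDatastore` would return the result of `File::open`, and an `S3Datastore` a reader over the object body.

```rust
use std::io::{self, Cursor, Read};

/// Sketch of the streaming accessor the review asks for.
trait Datastore {
    /// Opens an object for incremental reading, instead of returning its
    /// whole contents at once the way a get_object-style method would.
    fn open_read_stream(&self, path: &str) -> io::Result<Box<dyn Read>>;
}

/// Hypothetical in-memory datastore, used only to keep the example runnable.
struct MemoryDatastore {
    objects: std::collections::HashMap<String, Vec<u8>>,
}

impl Datastore for MemoryDatastore {
    fn open_read_stream(&self, path: &str) -> io::Result<Box<dyn Read>> {
        match self.objects.get(path) {
            Some(bytes) => Ok(Box::new(Cursor::new(bytes.clone()))),
            None => Err(io::Error::new(io::ErrorKind::NotFound, path.to_string())),
        }
    }
}
```

The caller can then pull fixed-size chunks from the returned reader, keeping memory usage bounded regardless of the collection size.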
src/utils/backup_manager.rs (outdated diff)

```rust
    self.encryption_key.is_some()
}

pub fn restore_backup_to_database(&self, backup_dir: String) -> impl Stream<Item = StreamEvent> {
```
And what if I want to restore the backup into another database?
Added target_database_name: Option<String> parameter in commit ab9766f. You can now restore to a different database by passing Some("other_db_name"), or use None to restore to the original database from the BackupJob configuration.
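The fallback logic for that parameter amounts to a one-liner; `resolve_target_database` is a hypothetical helper shown only to make the `Option` semantics concrete.

```rust
/// Resolves which database to restore into: an explicit override wins,
/// otherwise fall back to the database name from the BackupJob configuration.
fn resolve_target_database(configured: &str, target_database_name: Option<String>) -> String {
    target_database_name.unwrap_or_else(|| configured.to_string())
}
```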
src/utils/backup_manager.rs (outdated diff)

```rust
// Read the file content for JSON parsing
let mut collection_content = String::new();
if let Err(err) = read_stream.read_to_string(&mut collection_content).await {
```
Here you still load the entire file into memory. You must not do that. Find a way to avoid loading the whole file: load a part, process it, free that data, load the next part, and so on until the end of the file.
Implemented true streaming in commit e24bc53. The code now:
- Reads the collection header separately (without data array)
- Processes the file in 8KB chunks using a buffered reader
- Parses JSON documents incrementally as they're read
- Inserts documents in batches and clears them from memory
- Never loads the entire file into memory
Documents are parsed by tracking brace depth as chunks are read, and only complete documents are deserialized and added to batches for insertion.
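The brace-depth tracking described in the reply can be sketched as a small helper. `extract_documents` is a hypothetical function, not the project's code: it scans a buffer of concatenated JSON objects, ignores braces that occur inside string literals (including escaped quotes), and returns every complete document together with how many bytes were consumed, so the caller can carry a partial trailing document over into the next chunk.

```rust
/// Splits a buffer of concatenated JSON objects into complete documents by
/// tracking brace depth. Returns the complete documents plus the number of
/// bytes consumed; any trailing partial document stays in the buffer.
fn extract_documents(buf: &str) -> (Vec<String>, usize) {
    let mut docs = Vec::new();
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    let mut start = None;
    let mut consumed = 0;
    for (i, ch) in buf.char_indices() {
        if escaped {
            escaped = false;
            continue;
        }
        match ch {
            '\\' if in_string => escaped = true,
            '"' => in_string = !in_string,
            '{' if !in_string => {
                if depth == 0 {
                    start = Some(i);
                }
                depth += 1;
            }
            '}' if !in_string && depth > 0 => {
                depth -= 1;
                if depth == 0 {
                    if let Some(s) = start.take() {
                        docs.push(buf[s..=i].to_string());
                        consumed = i + ch.len_utf8();
                    }
                }
            }
            _ => {}
        }
    }
    (docs, consumed)
}
```

Each returned document is then small enough to hand to `serde_json` individually, and the chunk buffer can be truncated to the unconsumed tail after every pass.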
src/utils/backup_manager.rs (outdated diff)

```rust
// Read lines until we find the "data" field
loop {
    line.clear();
```
The .json files are written on a single line, to avoid wasting disk space.
Fixed in commit 2d98a1e. The code now correctly handles minified JSON files by:
- Reading chunks and searching for the `"data":[` marker instead of using line-by-line parsing
- Extracting the header portion before the marker
- Processing the remaining data stream for documents

The implementation no longer assumes the JSON is formatted with line breaks.
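The marker-search step can be sketched with a hypothetical helper (`split_at_data_marker` is illustrative, not the project's code). Note a real implementation also has to handle the marker straddling a chunk boundary, for example by retaining the last `marker.len() - 1` bytes of the previous chunk before searching again.

```rust
/// Locates the `"data":[` marker in a minified backup file and splits the
/// buffer into the header portion (everything before the marker) and the
/// start of the document stream. Returns None if the marker has not been
/// seen yet, in which case the caller should read more bytes.
fn split_at_data_marker(buf: &str) -> Option<(&str, &str)> {
    const MARKER: &str = "\"data\":[";
    buf.find(MARKER)
        .map(|pos| (&buf[..pos], &buf[pos + MARKER.len()..]))
}
```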
Adds restore capability to `BackupJob` by implementing a `restore_backup_to_database` method that reverses the backup process.

Implementation

- Reads `.database.json` to enumerate collections and verify backup integrity via hashes
- Streams each collection file and locates the `"data":[` marker
- Accepts a `target_database_name` parameter to restore to a different database
- Restores collections with their `CreateCollectionOptions` and `IndexModel` definitions
- Inserts documents in batches (`DOCUMENTS_BATCH_SIZE`)
- Reports progress via `StreamEvent`, matching `execute()` for consistency

New Features Added

- `open_read_stream` in DatastoreTrait: new async method implemented for `FilesystemDatastore` and `S3Datastore` to enable streaming file access

Technical Details

The streaming implementation:

- Uses `BufReader` with 8KB chunks for efficient I/O
- Searches for the `"data":[` marker in the chunk stream to locate the header/data boundary
- Serializes documents with `serde_json::to_string()`

Usage

The method returns `impl Stream<Item = StreamEvent>` for async progress reporting, mirroring the `execute()` pattern.
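A caller's event loop over those progress events can be sketched as follows. The real method yields an async `impl Stream<Item = StreamEvent>`; this sketch models it with a plain iterator (and a simplified two-variant `StreamEvent`) so it stays dependency-free, but the consumption pattern is the same: log `Info` events and stop on the first `Error`.

```rust
/// Simplified stand-in for the project's StreamEvent type.
#[derive(Debug, PartialEq)]
enum StreamEvent {
    Info(String),
    Error(String),
}

/// Drains a sequence of restore events, collecting log lines and stopping
/// on the first error, the way a caller of restore_backup_to_database
/// might consume the returned stream.
fn drain_events(events: impl IntoIterator<Item = StreamEvent>) -> Result<Vec<String>, String> {
    let mut log = Vec::new();
    for event in events {
        match event {
            StreamEvent::Info(msg) => log.push(msg),
            StreamEvent::Error(err) => return Err(err),
        }
    }
    Ok(log)
}
```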