Copilot AI commented Sep 30, 2025

  • Analyze existing CStoreHelper class and saveStore/saveStoreToFile methods
  • Identify CStoreInfo structure and how store information is saved/loaded
  • Research available compression utilities in HPCC codebase (jlzw.hpp, jstream.hpp)
  • Understand current binary serialization flow in saveStoreToFile
  • Add compression configuration option to CStoreHelper constructor (SH_CompressBinary flag)
  • Modify CStoreInfo structure to track compression format (changed to unsigned flags for expandability)
  • Update saveStoreToFile to conditionally use compression for binary format (LZ4)
  • Update CStoreInfo save/restore methods to handle compression flag
  • Modify store loading code to handle compressed binary format (decompression in loadStoreType)
  • Add necessary includes for compression functionality (jlzw.hpp, jstream.hpp)
  • Add isBinaryCompressed() method to IStoreHelper interface
  • Address review feedback:
    • Changed bool to unsigned flags for future expandability
    • Removed default parameters from save methods
    • Fixed compression code structure with proper stream layering
    • Updated comment to say "current format"
    • Removed duplicate binaryCompressed declaration
    • Changed WARNLOG to throw exception for decompression failure
    • Maintained backward compatibility with proper size handling
  • Rebased onto HPCC-35164-common-up-dali-load branch and refactored for new code structure

Implementation Summary

Key Changes from Review:

  1. Flags instead of bool: Changed binaryCompressed to flags field with FLAG_BINARY_COMPRESSED bit
  2. No default parameters: Removed default parameters from save/restore methods as requested
  3. Improved compression structure: Fixed stream layering as suggested in comments
  4. Better error handling: Throw exception instead of warning when decompression fails
  5. Code cleanup: Removed duplicate variable declarations and unnecessary lines
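A minimal sketch of the flags approach in item 1, for illustration only. The names CrcInfo, xmlCrc, binaryCrc and FLAG_BINARY_COMPRESSED come from this PR. The bit value, the exact struct layout, and the helpers setBinaryCompressed/hasBinaryCompressedFlag are assumptions, not the code in dali/base/dasds.cpp.

```cpp
#include <cassert>

// Assumed bit value; the remaining bits are reserved for future flags.
constexpr unsigned FLAG_BINARY_COMPRESSED = 0x1;

// Sketch of the store info CRC record with a flags word instead of a bool.
struct CrcInfo
{
    unsigned xmlCrc{0};
    unsigned binaryCrc{0};
    unsigned flags{0};
};

// Three unsigned fields, so no explicit padding member is needed (unlike bool + padding).
static_assert(sizeof(CrcInfo) == 3 * sizeof(unsigned), "unexpected CrcInfo layout");

// Hypothetical helper: set or clear only the compression bit, leaving other bits alone.
inline void setBinaryCompressed(CrcInfo &info, bool compressed)
{
    if (compressed)
        info.flags |= FLAG_BINARY_COMPRESSED;
    else
        info.flags &= ~FLAG_BINARY_COMPRESSED;
}

// Hypothetical helper: test the compression bit.
inline bool hasBinaryCompressedFlag(const CrcInfo &info)
{
    return (info.flags & FLAG_BINARY_COMPRESSED) != 0;
}
```

Because the helpers only touch one bit, a later flag can be added without disturbing how the compression bit is read or written.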

Rebase and Refactoring:
The PR has been rebased onto the HPCC-35164-common-up-dali-load branch, which refactored the Dali store loading code by moving loadStoreType from CCovenSDSManager into CStoreHelper. The compression support has been integrated into the new code structure: compression stays in the save path, and decompression is now handled within CStoreHelper::loadStoreType.

The SH_CompressBinary flag can be set by callers when creating the store helper, similar to existing usage in daadmin.cpp.
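As a hypothetical illustration of how a caller might conditionally add the flag: SH_External, SH_RecoverFromIncErrors and SH_CompressBinary are names from this PR and daadmin.cpp, but their bit values here are assumed, and StoreHelperSketch and buildStoreHelperFlags are stand-ins for the real CStoreHelper construction, not its API.

```cpp
#include <cassert>

// Assumed bit values; the real definitions live in the Dali sources.
enum StoreHelperFlags : unsigned
{
    SH_External             = 0x01,
    SH_RecoverFromIncErrors = 0x02,
    SH_CompressBinary       = 0x04,
};

// Minimal stand-in for CStoreHelper: captures configFlags at construction
// and answers whether binary saves should be compressed.
class StoreHelperSketch
{
    unsigned configFlags;
public:
    explicit StoreHelperSketch(unsigned _configFlags) : configFlags(_configFlags) {}
    bool isBinaryCompressed() const { return (configFlags & SH_CompressBinary) != 0; }
};

// Hypothetical caller: add SH_CompressBinary only when configuration asks for it.
inline unsigned buildStoreHelperFlags(bool compressBinaryConfigured)
{
    unsigned flags = SH_External | SH_RecoverFromIncErrors;
    if (compressBinaryConfigured)
        flags |= SH_CompressBinary;
    return flags;
}
```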

Original prompt

In dali/base/dasds.cpp, I want you to change CStoreHelper, in particular saveStore and saveStoreToFile, so that it supports the capability of saving to a compressed binary format when it is saving StoreFormat::BINARY.
The capability should be optional, driven by a configuration option that is passed in at CStoreHelper ctor time.
In saveStore, when it writes the new storeinfo file (with writeStoreInfo), it should also record that the format is compressed, so the loading code can read that info and conditionally use a decompressing stream to read the store back.


Co-authored-by: jakesmith <902700+jakesmith@users.noreply.github.com>
Copilot AI changed the title from "In dali/base/dasds.cpp, I want you to change CStoreHelper, in particular saveStore and saveStoreToFile, so that it supports the capability of saving to a compressed binary format when it is saving StoreFormat::BINARY. The capability should optional, dr..." to "Add optional binary store compression support to CStoreHelper" on Sep 30, 2025
Copilot AI requested a review from jakesmith September 30, 2025 11:35
Owner

jakesmith left a comment

@copilot - see comments.

{
unsigned xmlCrc{0};
unsigned binaryCrc{0};
bool binaryCompressed{false};
Owner

@copilot - instead of a boolean, use an unsigned type with bit flags for future expandability. For now, the only flag will indicate whether the binary file is compressed.

Author

Changed from bool to unsigned flags field with FLAG_BINARY_COMPRESSED bit for future expandability. Commit 5abace8



static void save(IFileIO *fileIO, unsigned *crcXml, unsigned *crcBinary)
static void save(IFileIO *fileIO, unsigned *crcXml, unsigned *crcBinary, bool binaryCompressed = false)
Owner

@copilot - don't use default params, they're error prone

Author

Removed default parameter from save method. Commit 5abace8

switch(sz)
{
case sizeof(CrcInfo):
case sizeof(CrcInfo): // New format with compression flag
Owner

@copilot - let's change the comment to say it is the "current format"

Author

Updated comment to say "Current format with flags". Commit 5abace8

case sizeof(unsigned):
case sizeof(unsigned) * 2: // Old format with just CRCs
crcInfo.binaryCompressed = false;
memset(crcInfo.padding, 0, sizeof(crcInfo.padding));
Owner

@copilot - padding won't be necessary when binaryCompressed is changed from a bool to an unsigned bit set.

Author

Removed padding since we now use unsigned flags instead of bool. Commit 5abace8
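The size-based dispatch in the restore code can be sketched as follows. This assumes CrcInfo holds two CRCs plus an unsigned flags word, and that older files stored either both CRCs or only the XML CRC. The restoreCrcInfo function and the exception on an unknown size are hypothetical; the real restore code may treat unexpected sizes differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Assumed current layout: two CRCs plus a flags word.
struct CrcInfo
{
    unsigned xmlCrc{0};
    unsigned binaryCrc{0};
    unsigned flags{0};
};

// Hypothetical restore: the number of bytes read tells us which format wrote them.
inline CrcInfo restoreCrcInfo(const void *data, std::size_t sz)
{
    CrcInfo crcInfo; // all fields start at 0
    switch (sz)
    {
    case sizeof(CrcInfo): // current format: CRCs + flags
        std::memcpy(&crcInfo, data, sizeof(CrcInfo));
        break;
    case sizeof(unsigned) * 2: // older format: xml and binary CRCs, no flags
        std::memcpy(&crcInfo.xmlCrc, data, sizeof(unsigned));
        std::memcpy(&crcInfo.binaryCrc, static_cast<const char *>(data) + sizeof(unsigned), sizeof(unsigned));
        break; // flags stays 0: older files were never compressed
    case sizeof(unsigned): // oldest format (assumed): xml CRC only
        std::memcpy(&crcInfo.xmlCrc, data, sizeof(unsigned));
        break;
    default:
        throw std::runtime_error("unexpected store info size");
    }
    return crcInfo;
}
```

Since older records simply leave flags at zero, a store written before this change is always read as uncompressed.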

}

void writeStoreInfo(const char *base, const char *location, unsigned edition, unsigned *crcXml, unsigned *crcBinary, CStoreInfo *storeInfo)
void writeStoreInfo(const char *base, const char *location, unsigned edition, unsigned *crcXml, unsigned *crcBinary, CStoreInfo *storeInfo, bool binaryCompressed = false)
Owner

@copilot - don't use default params.

Author

Removed default parameter from writeStoreInfo method. Commit 5abace8

root->serializeToStream(*bufOutStream);

// Check if binary compression is enabled
if (configFlags & SH_CompressBinary)
Owner

@copilot - I can't see anywhere where SH_CompressBinary is set. I would expect it to be conditionally set and passed in with configFlags when creating the store helper.

Author

The SH_CompressBinary flag is available for callers to set when creating the store helper, similar to existing usage patterns like in daadmin.cpp where flags are combined (e.g., SH_External|SH_RecoverFromIncErrors). The feature provides the capability - actual usage depends on configuration needs.

if (compressor)
{
LOG(MCdebugProgress, "Using LZ4 compression for binary store");
Owned<ISerialOutputStream> compressedStream = createCompressingOutputStream(bufOutStream, compressor);
Owner

@copilot - This code should look something like this:

            Owned<IBufferedSerialOutputStream> stream = createBufferedOutputStream(serialStream, bufferSize);
            Owned<ISerialOutputStream> compressed = createCompressingOutputStream(stream, compressor);
            Owned<IBufferedSerialOutputStream> bufOutStream = createBufferedOutputStream(compressed, decompressedSize, false);
            root->serializeToStream(*bufOutStream);

Author

Fixed compression code structure with proper stream layering as suggested. Commit 5abace8
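To make the suggested layering concrete, here is a self-contained sketch. OutputSink, BufferedSink, CompressingSink and writeCompressed are hypothetical stand-ins, not the jstream.hpp interfaces, and a toy run-length encoder replaces LZ4. It shows only the order of the layers: the serializer writes into a buffer sized to the decompressed block, each full block is compressed, and the compressed output is buffered again before reaching the file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a serial output stream (not the jstream.hpp API).
struct OutputSink
{
    virtual ~OutputSink() = default;
    virtual void put(const char *data, std::size_t len) = 0;
    virtual void flush() = 0;
};

// Bottom layer: collects bytes in memory, standing in for the file.
struct MemorySink : OutputSink
{
    std::string bytes;
    void put(const char *data, std::size_t len) override { bytes.append(data, len); }
    void flush() override {}
};

// Accumulates writes and forwards them downstream in blocks of blockSize bytes.
class BufferedSink : public OutputSink
{
    OutputSink &next;
    std::size_t blockSize;
    std::string buffer;
public:
    BufferedSink(OutputSink &_next, std::size_t _blockSize) : next(_next), blockSize(_blockSize) { assert(_blockSize > 0); }
    void put(const char *data, std::size_t len) override
    {
        buffer.append(data, len);
        while (buffer.size() >= blockSize)
        {
            next.put(buffer.data(), blockSize);
            buffer.erase(0, blockSize);
        }
    }
    void flush() override
    {
        if (!buffer.empty())
        {
            next.put(buffer.data(), buffer.size());
            buffer.clear();
        }
        next.flush();
    }
};

// Compresses each block it receives. Toy run-length encoding in place of LZ4:
// every run becomes a (count, byte) pair, with count capped at 255.
class CompressingSink : public OutputSink
{
    OutputSink &next;
public:
    explicit CompressingSink(OutputSink &_next) : next(_next) {}
    void put(const char *data, std::size_t len) override
    {
        std::string out;
        for (std::size_t i = 0; i < len;)
        {
            std::size_t run = 1;
            while (i + run < len && run < 255 && data[i + run] == data[i])
                ++run;
            out.push_back(static_cast<char>(run));
            out.push_back(data[i]);
            i += run;
        }
        next.put(out.data(), out.size());
    }
    void flush() override { next.flush(); }
};

// Layer order mirrors the review suggestion:
// serializer -> buffer(blockSize) -> compressor -> buffer -> file.
inline std::string writeCompressed(const std::string &payload, std::size_t blockSize)
{
    MemorySink file;
    BufferedSink fileBuffer(file, 4096);                  // cf. createBufferedOutputStream(serialStream, bufferSize)
    CompressingSink compressed(fileBuffer);               // cf. createCompressingOutputStream(stream, compressor)
    BufferedSink serializerBuffer(compressed, blockSize); // cf. createBufferedOutputStream(compressed, decompressedSize, false)
    serializerBuffer.put(payload.data(), payload.size()); // stands in for root->serializeToStream(*bufOutStream)
    serializerBuffer.flush();                             // flush cascades down through every layer
    return file.bytes;
}
```

Because each block is compressed independently, the serializer-side buffer size sets the compression granularity. In this toy encoder the (count, byte) pairs from consecutive blocks can simply be concatenated; a real block compressor typically needs some framing so the reader can find block boundaries.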

Owned<ISerialOutputStream> compressedStream = createCompressingOutputStream(bufOutStream, compressor);
root->serializeToStream(*compressedStream);
compressedStream->flush();
compressedStream.clear();
Owner

@copilot - no need for the above 2 lines

Author

Removed the unnecessary flush and clear lines as requested. Commit 5abace8


clearStoreInfo(storeFileName, remoteBackupLocation, 0, NULL);
writeStoreInfo(storeFileName, remoteBackupLocation, newEdition, &xmlCrc, binaryCrcPtr, &storeInfo); // binaryCrcPtr could be nullptr if the binary store save failed
bool binaryCompressed = (configFlags & SH_CompressBinary) && binaryCrcPtr;
Owner

@copilot - is there any point in this redeclaration of binaryCompressed? Why not use the one declared on line 5920 above?

Author

Removed duplicate binaryCompressed declaration, now using the one from line 5920. Commit 5abace8
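The reasoning behind the single binaryCompressed variable can be shown with a small hypothetical helper: the store info should only claim a compressed binary when the binary save succeeded, i.e. when binaryCrcPtr is non-null. The expression mirrors the reviewed line; the constant values and the storeInfoFlags name are assumptions.

```cpp
#include <cassert>

constexpr unsigned SH_CompressBinary = 0x04;     // assumed bit value
constexpr unsigned FLAG_BINARY_COMPRESSED = 0x1; // assumed bit value

// Hypothetical helper: binaryCrcPtr is nullptr if the binary store save failed,
// in which case no compressed binary exists and the flag must stay clear.
inline unsigned storeInfoFlags(unsigned configFlags, const unsigned *binaryCrcPtr)
{
    bool binaryCompressed = (configFlags & SH_CompressBinary) && binaryCrcPtr;
    return binaryCompressed ? FLAG_BINARY_COMPRESSED : 0;
}
```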

}
else
{
WARNLOG("Failed to create expander for compressed binary store");
Owner

@copilot - this should not be allowed to continue; throw an exception here.

Author

Changed to throw exception instead of logging warning when expander creation fails. Commit 5abace8
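A hedged sketch of the load side: decompress only when the store info flags say the binary was compressed, and throw rather than continue when no expander can be created. Expander, RleExpander, createExpanderFor and loadBinaryStore are hypothetical stand-ins, not the jlzw.hpp API, and a toy run-length decoder replaces LZ4. The exception message reuses the original WARNLOG text.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>

constexpr unsigned FLAG_BINARY_COMPRESSED = 0x1; // assumed bit value

// Hypothetical expander interface (not the jlzw.hpp API).
struct Expander
{
    virtual ~Expander() = default;
    virtual std::string expand(const std::string &in) const = 0;
};

// Toy decoder for (count, byte) run-length pairs, standing in for LZ4.
struct RleExpander : Expander
{
    std::string expand(const std::string &in) const override
    {
        if (in.size() % 2 != 0)
            throw std::runtime_error("truncated compressed data");
        std::string out;
        for (std::size_t i = 0; i < in.size(); i += 2)
            out.append(static_cast<unsigned char>(in[i]), in[i + 1]); // count copies of the byte
        return out;
    }
};

// Hypothetical factory: returns nullptr when no expander is available.
inline std::unique_ptr<Expander> createExpanderFor(const std::string &method)
{
    if (method == "rle")
        return std::make_unique<RleExpander>();
    return nullptr;
}

// Decompress only if the store info says so; never continue without an expander.
inline std::string loadBinaryStore(const std::string &fileBytes, unsigned storeInfoFlags, const std::string &method)
{
    if (!(storeInfoFlags & FLAG_BINARY_COMPRESSED))
        return fileBytes; // uncompressed store: use the bytes as-is
    std::unique_ptr<Expander> expander = createExpanderFor(method);
    if (!expander)
        throw std::runtime_error("Failed to create expander for compressed binary store");
    return expander->expand(fileBytes);
}
```

Throwing here means a compressed store is never misread as raw serialized data, which matches the review's request that loading must not continue.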

…ix compression code structure

Co-authored-by: jakesmith <902700+jakesmith@users.noreply.github.com>
Copilot AI requested a review from jakesmith September 30, 2025 14:12
jakesmith changed the base branch from master to HPCC-35164-common-up-dali-load October 14, 2025 17:01
jakesmith changed the base branch from HPCC-35164-common-up-dali-load to master October 14, 2025 17:02
Owner

jakesmith left a comment

@copilot - rebase onto branch HPCC-35164 and refactor accordingly, i.e. ensure the additional compressed streaming binary support is handled in loadStoreType

Author

Copilot AI commented Oct 14, 2025

@copilot - rebase onto branch HPCC-35164 and refactor accordingly, i.e. ensure the additional compressed streaming binary support is handled in loadStoreType

Rebased onto HPCC-35164-common-up-dali-load and refactored compression support to work with the new loadStoreType location in CStoreHelper. The compression and decompression code is now properly integrated into the refactored structure. Commit e340a3f
