refactor: Update Base64 to match coding standard #32

Open
Joe-Abraham wants to merge 1 commit into main from base64
Conversation

@Joe-Abraham Joe-Abraham commented Feb 27, 2026

Summary by Sourcery

Refactor Base64 encoding/decoding APIs to use std::string_view and std::string-based interfaces and update all callers accordingly.

Enhancements:

  • Simplify Base64 and Base64 URL encode/decode APIs to operate on std::string_view inputs and std::string outputs instead of raw pointer/size buffers and pairs.
  • Unify Base64 decoded-size and MIME decoded-size helpers to string_view-based signatures and internal size management, improving safety and clarity.
  • Adjust PrestoSQL and SparkSQL Base64/UnBase64 functions, text reader/writer, and plan/expr serialization code to use the updated Base64 interfaces.
  • Update Base64 unit tests to match the new string_view-based APIs and validation behavior for padding and MIME decoding.

# Conflicts:
#	velox/functions/prestosql/BinaryFunctions.h
Copilot AI review requested due to automatic review settings February 27, 2026 09:50
sourcery-ai bot commented Feb 27, 2026

Reviewer's Guide

Refactors the Base64 encoding/decoding API across Velox to use std::string_view + std::string-based interfaces, removes raw buffer/size variants, and updates all call sites and tests to match the new, safer API and padding semantics.

Class diagram for refactored Base64 encoding/decoding API

classDiagram
  namespace facebook_velox_encoding {
    class Base64 {
      <<utility>>
      +static string encode(string_view input, bool includePadding)
      +static void encode(string_view input, string& output)
      +static string encodeUrl(string_view input, bool includePadding)
      +static void encodeUrl(string_view input, string& output)
      +static string encode(const folly_IOBuf* inputBuffer)
      +static string encodeUrl(const folly_IOBuf* inputBuffer)

      +static string decode(string_view input)
      +static Status decode(string_view input, string& output)

      +static string decodeUrl(string_view input)
      +static Status decodeUrl(string_view input, string& output)

      +static Status decodeMime(string_view input, string& output)
      +static void encodeMime(string_view input, string& output)

      +static size_t calculateEncodedSize(size_t inputSize, bool withPadding)
      +static Expected_size_t calculateDecodedSize(string_view input)
      +static Expected_size_t calculateMimeDecodedSize(string_view input)
      +static size_t calculateMimeEncodedSize(size_t binarySize)

      -static bool isPadded(string_view input)
      -static size_t numPadding(string_view input)

      -static Expected_size_t decodeImpl(string_view input, string& output, ReverseIndex& reverseIndex)
      -static Expected_uint8_t base64ReverseLookup(uint8_t character, ReverseIndex& reverseIndex)

      -static const size_t kEncodedBlockByteSize
      -static const size_t kBinaryBlockByteSize
      -static const size_t kReverseIndexSize
      -static const char kPadding
      -static const size_t kMaxLineLength
      -static const char[] kBase64Charset
      -static const char[] kBase64UrlCharset
      -static const uint8_t[] kBase64ReverseIndexTable
      -static const uint8_t[] kBase64UrlReverseIndexTable
    }
  }
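
As a rough illustration of the string_view-in, std::string-out shape shown in the diagram, here is a minimal standalone sketch of an encoder with optional padding. The `sketch` namespace, `kCharset`, `encodedSize`, and both `encode` overloads are hypothetical stand-ins written for this example; they are not Velox's `encodeImpl`, and only assume the standard RFC 4648 alphabet and `=` padding.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

namespace sketch {

// Standard Base64 alphabet (RFC 4648).
constexpr std::string_view kCharset =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encoded length for inputSize bytes, with or without '=' padding.
inline std::size_t encodedSize(std::size_t inputSize, bool withPadding) {
  return withPadding ? (inputSize + 2) / 3 * 4 : (inputSize * 4 + 2) / 3;
}

// Clears output, reserves the final size, then appends characters.
inline void encode(
    std::string_view input, bool includePadding, std::string& output) {
  output.clear();
  output.reserve(encodedSize(input.size(), includePadding));
  const auto byteAt = [&](std::size_t i) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(input[i]));
  };
  std::size_t i = 0;
  // Each full 3-byte block becomes 4 characters.
  for (; i + 3 <= input.size(); i += 3) {
    const std::uint32_t block =
        (byteAt(i) << 16) | (byteAt(i + 1) << 8) | byteAt(i + 2);
    output.push_back(kCharset[(block >> 18) & 0x3f]);
    output.push_back(kCharset[(block >> 12) & 0x3f]);
    output.push_back(kCharset[(block >> 6) & 0x3f]);
    output.push_back(kCharset[block & 0x3f]);
  }
  // A trailing 1- or 2-byte block becomes 2 or 3 characters plus padding.
  const std::size_t remaining = input.size() - i;
  if (remaining > 0) {
    std::uint32_t block = byteAt(i) << 16;
    if (remaining == 2) {
      block |= byteAt(i + 1) << 8;
    }
    output.push_back(kCharset[(block >> 18) & 0x3f]);
    output.push_back(kCharset[(block >> 12) & 0x3f]);
    if (remaining == 2) {
      output.push_back(kCharset[(block >> 6) & 0x3f]);
    }
    if (includePadding) {
      output.append(remaining == 1 ? std::size_t{2} : std::size_t{1}, '=');
    }
  }
}

// Convenience overload that returns the encoded string by value.
inline std::string encode(std::string_view input, bool includePadding) {
  std::string output;
  encode(input, includePadding, output);
  return output;
}

} // namespace sketch
```

With this sketch, `sketch::encode("fo", true)` yields `Zm8=` and `sketch::encode("fo", false)` yields `Zm8`, which is the kind of output difference a caller would see if a default padding flag changed.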

File-Level Changes

Change Details Files
Refactor Base64 core API to use std::string_view and std::string outputs instead of raw char buffers and sizes, adjusting encode/decode implementations and padding helpers accordingly.
  • Add include for memcpy usage where needed.
  • Change Base64::encode/encodeUrl/encodeMime and corresponding decode methods to accept std::string_view input and std::string& output, removing pointer+size overloads and calculateDecodedSize signatures that mutated inputSize.
  • Update encodeImpl to clear/reserve the output std::string, compute encoded size internally, and use push_back instead of direct pointer arithmetic for both standard and URL encodings, with optional padding support.
  • Update decodeImpl and calculateDecodedSize to operate on std::string_view, remove explicit output buffer size checks, manage padding via numPadding(string_view), and build output using push_back.
  • Adjust isPadded and numPadding helpers to work on std::string_view instead of (char*, size_t), and expose calculateEncodedSize/calculateDecodedSize as private helpers consistent with new signatures.
  • Update Base64 MIME encode/decode helpers (encodeMime/decodeMime/calculateMimeDecodedSize) to use std::string_view and std::string& with push_back, simplifying pointer handling.
velox/common/encode/Base64.cpp
velox/common/encode/Base64.h
Update unit tests to align with new Base64 API signatures and behavior, especially around decoded-size calculation and MIME helpers.
  • Simplify calculateDecodedSizeProperSize test to use the new calculateDecodedSize(std::string_view) without mutating input size and to assert only on returned decoded lengths and error cases.
  • Update checksPadding and countsPaddingCorrectly tests to call isPadded/numPadding with string literals instead of pointer+length pairs.
  • Update calculateMimeDecodedSize test to use string_view-based API and error expectations.
  • Adjust decodeMime and encodeMime test helpers to construct std::string outputs, call the new MIME APIs, and use calculateMimeDecodedSize/calculateMimeEncodedSize solely for capacity reservations.
velox/common/encode/tests/Base64Test.cpp
Adapt Presto and Spark SQL function implementations to the new Base64 API and improve safety by using intermediate std::string buffers and memcpy into result vectors.
  • Change FromBase64Function and FromBase64UrlFunction to call encoding::Base64::decode/decodeUrl with std::string_view and std::string decodedStr, then resize and memcpy into out_type on success instead of pre-sizing based on calculateDecodedSize and passing raw buffers.
  • Update UnBase64Function to use calculateMimeDecodedSize(std::string_view) for validation only, then call decodeMime with std::string_view and std::string decodedStr and memcpy into the result.
  • Change ToBase64UrlFunction to build its result via Base64::encodeUrl(std::string_view,bool) instead of manual buffer sizing and encodeUrl(char*,size_t,char*).
  • Change Base64Function (Spark SQL) to use encodeMime(std::string_view,std::string&) and memcpy into the pre-sized Varchar result buffer.
  • Adjust FromHexFunction and FromBase64UrlFunction templates to use concrete arg_type/arg_type where needed, simplifying the type parameters.
velox/functions/prestosql/BinaryFunctions.h
velox/functions/sparksql/UnBase64Function.h
velox/functions/sparksql/Base64Function.h
Update DWIO text reader/writer and core serialization to use the new Base64 API directly with std::string_view rather than pointer+size overloads and manual buffer management.
  • In TextRowReader::readElement, replace manual decoded-size calculation and Base64::decode into preallocated varBinBuf_ with a single Base64::decode(std::string_view,std::string&) call, then copy the decoded string into varBinBuf_ and use StringView on the decoded content; fall back path for invalid Base64 now uses StringView(str) instead of manually constructing from the buffer.
  • In TextWriter::writeCellValue for VARBINARY, replace encode(Base64) pointer+size call with Base64::encode(std::string_view,bool) with padding set to true.
  • In ConstantTypedExpr::serialize and ValuesNode::serialize, replace encode(char*,size_t) with encode(std::string_view) to serialize vectors and plan data via Base64 in a more idiomatic way.
velox/dwio/text/reader/TextReader.cpp
velox/dwio/text/writer/TextWriter.cpp
velox/core/Expressions.cpp
velox/core/PlanNode.cpp
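
The decoded-size change listed above (a helper that takes a std::string_view and no longer mutates an input size) might look roughly like the following sketch. It is an assumption-laden illustration, not the Velox code: it reuses the `isPadded` and `numPadding` names from the class diagram but returns `std::optional` instead of the Expected-style type the diagram shows, and its validation rules are the usual Base64 length rules rather than anything confirmed by the PR.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

namespace sketch {

constexpr char kPadding = '=';

// True when the input ends with at least one '='.
inline bool isPadded(std::string_view input) {
  return !input.empty() && input.back() == kPadding;
}

// Counts trailing '=' characters, capped at two.
inline std::size_t numPadding(std::string_view input) {
  std::size_t count = 0;
  while (count < 2 && count < input.size() &&
         input[input.size() - 1 - count] == kPadding) {
    ++count;
  }
  return count;
}

// Returns the decoded byte count, or std::nullopt for a malformed length.
// The view is read-only, so the caller's size is never adjusted in place.
inline std::optional<std::size_t> decodedSize(std::string_view input) {
  if (input.empty()) {
    return std::size_t{0};
  }
  if (isPadded(input)) {
    // Padded input must be a whole number of 4-character blocks.
    if (input.size() % 4 != 0) {
      return std::nullopt;
    }
    return input.size() / 4 * 3 - numPadding(input);
  }
  // Unpadded input: 2 leftover characters carry 1 byte, 3 carry 2 bytes.
  const std::size_t extra = input.size() % 4;
  if (extra == 1) {
    return std::nullopt;
  }
  return input.size() / 4 * 3 + extra * 3 / 4;
}

} // namespace sketch
```

Because the helper only reports a size, callers can use it purely for validation or for a `reserve` call and let the decoder own the actual output length.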

@Joe-Abraham
Owner Author

/claude-review

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 3 issues, and left some high level feedback:

  • In TextRowReader::readElement, both the success and error paths construct StringView from temporaries (decodedStr and str) rather than from varBinBuf_, which means the StringView can outlive its backing storage; you should instead build the StringView over varBinBuf_->data() with the appropriate size to preserve the existing ownership semantics.
  • The new Base64::encode(std::string_view input, bool includePadding = false) and encodeUrl(..., bool includePadding = false) change the default from padded to unpadded output compared to the previous API; please double‑check all call sites (e.g., ConstantTypedExpr::serialize, PlanNode::serialize) to ensure this behavioral change is intentional and, if not, pass includePadding = true explicitly.
  • For the MIME helpers, callers like UnBase64Function now compute calculateMimeDecodedSize and then decodeMime still grows output incrementally; consider reserving output to the precomputed size inside decodeMime (or passing the expected size in) to avoid repeated reallocations and keep the size logic centralized.

Comment on lines +1108 to +1114
+      std::string decodedStr;
+      Status status = encoding::Base64::decode(str, decodedStr);

       if (status.code() == StatusCode::kOK) {
-        flatVector->set(
-            insertionRow,
-            StringView(varBinBuf_->data(), static_cast<int32_t>(blen.value())));
+        varBinBuf_->resize(decodedStr.size());
+        std::memcpy(varBinBuf_->data(), decodedStr.data(), decodedStr.size());
+        flatVector->set(insertionRow, StringView(decodedStr));
issue (bug_risk): StringView now refers to a temporary std::string which will be destroyed, leading to a dangling pointer.

decodedStr is a local that is destroyed at the end of readElement, so flatVector->set(insertionRow, StringView(decodedStr)) (and similarly StringView(str) in the error path) leaves the vector holding a dangling pointer. Please keep StringView backed by varBinBuf_ (e.g., construct it from varBinBuf_->data() and its size) or otherwise ensure the underlying storage outlives the vector.
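
The ownership pattern this comment asks for can be sketched in isolation. The example below uses `std::string_view` and a hypothetical `BlobHolder` class in place of Velox's `StringView` and `varBinBuf_`, and a plain copy in place of the real decode step. It matters only if the consumer keeps the view rather than copying the bytes, which is what the review comment assumes; the PR text does not show what `flatVector->set` does with the view.

```cpp
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

namespace sketch {

// Hypothetical owner of a reusable byte buffer (stand-in for varBinBuf_).
class BlobHolder {
 public:
  // Copies bytes into the member buffer and returns a view over that buffer.
  // The view stays valid until the next call to store().
  std::string_view store(std::string_view bytes) {
    buffer_.resize(bytes.size());
    if (!bytes.empty()) {
      std::memcpy(buffer_.data(), bytes.data(), bytes.size());
    }
    return std::string_view(buffer_.data(), buffer_.size());
  }

  const char* data() const {
    return buffer_.data();
  }

 private:
  std::vector<char> buffer_;
};

// Mimics the reviewed code path: produce a local string, then hand out a view.
// Returning std::string_view(decoded) here would dangle once decoded is
// destroyed; returning the holder's view does not.
inline std::string_view decodeAndStore(
    BlobHolder& holder, std::string_view encoded) {
  std::string decoded(encoded);  // placeholder for the real decode step
  return holder.store(decoded);
}

} // namespace sketch
```

The view returned by `decodeAndStore` points into the holder's buffer, so it survives the destruction of the local `decoded` string.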

Comment on lines -1117 to 1122
-        flatVector->set(
-            insertionRow,
-            StringView(varBinBuf_->data(), static_cast<int32_t>(blen.value())));
+        varBinBuf_->resize(decodedStr.size());
+        std::memcpy(varBinBuf_->data(), decodedStr.data(), decodedStr.size());
+        flatVector->set(insertionRow, StringView(decodedStr));
       } else {
         // Not valid base64: just copy as-is for compatibility.
         //
issue (bug_risk): Error-path StringView also points to stack-local storage and will dangle.

In the non-base64 case, you copy into varBinBuf_ but still build StringView(str), where str is local to readElement. That StringView will dangle just like on the success path. Instead, construct the StringView over varBinBuf_->data() (or another buffer with a longer lifetime) to avoid use-after-free.

Comment on lines +492 to 496
+Status Base64::decodeMime(std::string_view input, std::string& output) {
+  size_t inputSize = input.size();
   if (inputSize == 0) {
     return Status::OK();
   }
issue (bug_risk): decodeMime appends to output without clearing it, which can cause data from previous calls to leak into the result.

Previously, callers passed a fresh buffer, so there was no risk of stale data. With std::string& output, the function neither clears nor modifies output when inputSize == 0, and otherwise always appends. Callers that reuse the same std::string will see leftover or accumulated data. To preserve prior behavior, clear output at the start and optionally reserve(calculateMimeDecodedSize(inputSize)) for efficiency.
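
The clear-then-reserve convention suggested here can be shown with a small stand-in. `stripLineBreaks` below is a hypothetical function, not part of Velox; it only borrows the `std::string& output` shape and the fact that MIME decoders skip line breaks, so that the effect of reusing one output string is easy to see.

```cpp
#include <string>
#include <string_view>

namespace sketch {

// Hypothetical stand-in for a decoder with a std::string& output parameter:
// it copies the input while skipping CR and LF characters.
inline void stripLineBreaks(std::string_view input, std::string& output) {
  // Drop anything left over from a previous call, even when input is empty.
  output.clear();
  // The input size is an upper bound on the result, so one reservation
  // is enough to avoid repeated growth during push_back.
  output.reserve(input.size());
  for (const char c : input) {
    if (c != '\r' && c != '\n') {
      output.push_back(c);
    }
  }
}

} // namespace sketch
```

Calling the function twice with the same `output` leaves only the second result in it, and an empty input leaves `output` empty rather than untouched.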

Copilot AI left a comment

Pull request overview

Refactors Base64 encode/decode APIs and call sites to use std::string_view inputs and std::string outputs, aligning implementations with updated coding conventions.

Changes:

  • Updated encoding::Base64 public API signatures (favoring std::string_view + std::string out params) and rewrote core encode/decode internals accordingly.
  • Migrated multiple Velox call sites (SparkSQL/PrestoSQL functions, text reader/writer, plan/expression serialization) to the new Base64 API.
  • Simplified and updated Base64 unit tests to match the new API.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.

Show a summary per file
| File | Description |
| --- | --- |
| velox/functions/sparksql/UnBase64Function.h | Switches to string_view + std::string decode path before copying into Velox varbinary. |
| velox/functions/sparksql/Base64Function.h | Switches to string_view + std::string encode path before copying into Velox varchar. |
| velox/functions/prestosql/BinaryFunctions.h | Updates Base64/Base64Url encode/decode paths to new string_view + std::string APIs. |
| velox/dwio/text/writer/TextWriter.cpp | Updates VARBINARY serialization to Base64 encode from string_view. |
| velox/dwio/text/reader/TextReader.cpp | Updates VARBINARY parsing to Base64 decode into std::string then copy. |
| velox/core/PlanNode.cpp | Updates ValuesNode serialization to new Base64::encode overload. |
| velox/core/Expressions.cpp | Updates ConstantTypedExpr serialization to new Base64::encode overload. |
| velox/common/encode/tests/Base64Test.cpp | Updates tests to new API and output-buffer handling style. |
| velox/common/encode/Base64.h | Replaces pointer/size overloads with string_view and std::string& output APIs; adjusts helpers. |
| velox/common/encode/Base64.cpp | Reimplements encode/decode to write into std::string via push_back/reserve; updates MIME helpers. |

Comment on lines +1108 to +1114
+      std::string decodedStr;
+      Status status = encoding::Base64::decode(str, decodedStr);

       if (status.code() == StatusCode::kOK) {
-        flatVector->set(
-            insertionRow,
-            StringView(varBinBuf_->data(), static_cast<int32_t>(blen.value())));
+        varBinBuf_->resize(decodedStr.size());
+        std::memcpy(varBinBuf_->data(), decodedStr.data(), decodedStr.size());
+        flatVector->set(insertionRow, StringView(decodedStr));

Copilot AI Feb 27, 2026

flatVector->set(insertionRow, StringView(decodedStr)) stores a view into decodedStr, which is a local variable and will be destroyed at the end of the scope. This leaves the vector holding a dangling pointer. Use the owned buffer (varBinBuf_) when constructing the StringView (e.g., view over varBinBuf_->data() with the decoded size), or ensure the backing storage outlives the vector.

Copilot uses AI. Check for mistakes.
Comment on lines 1123 to +1131
         varBinBuf_->resize(str.size());

         VELOX_CHECK_NOT_NULL(str.data());

         len = str.size();
         memcpy(varBinBuf_->data(), str.data(), str.size());

         // Use StringView, set(vector_size_t idx, T value) fails because
         // strlen(varBinBuf_->data()) is undefined due to lack of null
         // terminator
-        flatVector->set(
-            insertionRow,
-            StringView(varBinBuf_->data(), static_cast<int32_t>(str.size())));
+        flatVector->set(insertionRow, StringView(str));

Copilot AI Feb 27, 2026

In the non-base64 branch, the code copies bytes into varBinBuf_ but then sets the vector to StringView(str) instead of viewing the copied buffer. This defeats the purpose of the copy and can make the stored view depend on the lifetime of str. Set the value from varBinBuf_ (data pointer + size) to ensure stable ownership.

Comment on lines +492 to 496
+Status Base64::decodeMime(std::string_view input, std::string& output) {
+  size_t inputSize = input.size();
   if (inputSize == 0) {
     return Status::OK();
   }

Copilot AI Feb 27, 2026

decodeMime(..., std::string& output) appends via output.push_back(...) later, but never clears output at the start. If callers reuse the same std::string, results will be concatenated with previous contents. Clear (and ideally reserve) output at function entry.

Comment on lines 600 to 605
  // If there's nothing to encode, do nothing.
  size_t inputSize = input.size();
  if (inputSize == 0) {
    return;
  }

Copilot AI Feb 27, 2026

encodeMime(..., std::string& output) writes using output.push_back(...) but doesn't clear output first. Reusing the same output string will produce concatenated encodings. Clear output at the start (and reserve expected size) to make the API safe and consistent with other encode paths.

Suggested change
-  // If there's nothing to encode, do nothing.
-  size_t inputSize = input.size();
-  if (inputSize == 0) {
-    return;
-  }
+  // Ensure the output starts empty for each call.
+  output.clear();
+  // If there's nothing to encode, leave output empty and return.
+  size_t inputSize = input.size();
+  if (inputSize == 0) {
+    return;
+  }
+  // Reserve the full encoded size, including CRLFs, to avoid reallocations.
+  output.reserve(calculateMimeEncodedSize(inputSize));
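
To make the "full encoded size, including CRLFs" concrete, here is one way such a size could be computed and checked against a line-wrapping routine. This is a sketch under stated assumptions: a 76-character line limit (RFC 2045), CRLF inserted between lines with no trailing CRLF, and padded Base64 output. `mimeEncodedSize` and `wrapLines` are hypothetical names, and the value of Velox's `kMaxLineLength` is not shown in this PR.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

namespace sketch {

constexpr std::size_t kMaxLineLength = 76;  // assumed, per RFC 2045
constexpr std::string_view kNewline = "\r\n";

// Number of CRLF bytes needed to split encodedSize characters into lines.
inline std::size_t lineBreakBytes(std::size_t encodedSize) {
  return encodedSize == 0
      ? 0
      : (encodedSize - 1) / kMaxLineLength * kNewline.size();
}

// Padded Base64 length for binarySize bytes plus the CRLFs between lines.
inline std::size_t mimeEncodedSize(std::size_t binarySize) {
  const std::size_t encoded = (binarySize + 2) / 3 * 4;
  return encoded + lineBreakBytes(encoded);
}

// Splits already-encoded text into lines of at most kMaxLineLength
// characters, separated by CRLF, with no CRLF after the last line.
inline void wrapLines(std::string_view encoded, std::string& output) {
  output.clear();
  output.reserve(encoded.size() + lineBreakBytes(encoded.size()));
  for (std::size_t pos = 0; pos < encoded.size(); pos += kMaxLineLength) {
    if (pos != 0) {
      output.append(kNewline);
    }
    output.append(encoded.substr(pos, kMaxLineLength));
  }
}

} // namespace sketch
```

Under these assumptions, 57 input bytes fit on one 76-character line with no CRLF, while 58 bytes produce 80 characters plus one CRLF. Keeping the size formula and the wrapping loop next to each other makes it easier to confirm that a reservation and the bytes actually written agree.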


-  obj["data"] =
-      encoding::Base64::encode(serializedData.data(), serializedData.size());
+  obj["data"] = encoding::Base64::encode(serializedData);

Copilot AI Feb 27, 2026

This call now uses Base64::encode(std::string_view, bool includePadding = false) with the default includePadding=false. Previously, the encode(const char*, size_t) overload encoded with padding. Changing padding behavior in serialized plan JSON can break backward compatibility for stored/transported plans. Consider explicitly passing includePadding=true here (or otherwise guaranteeing stable serialization).

Suggested change
-  obj["data"] = encoding::Base64::encode(serializedData);
+  obj["data"] = encoding::Base64::encode(serializedData, true);

   auto serializedValue = out.str();
-  obj["valueVector"] = encoding::Base64::encode(
-      serializedValue.data(), serializedValue.size());
+  obj["valueVector"] = encoding::Base64::encode(serializedValue);

Copilot AI Feb 27, 2026

Same concern as in PlanNode.cpp: this now defaults to includePadding=false, potentially changing the serialized expression format compared to the previous padded encoding. If this data is persisted or exchanged, explicitly request padding to preserve compatibility.

Suggested change
-  obj["valueVector"] = encoding::Base64::encode(serializedValue);
+  obj["valueVector"] = encoding::Base64::encode(serializedValue, true);

-void Base64::encode(const char* input, size_t inputSize, char* output) {
-  encodeImpl(std::string_view(input, inputSize), kBase64Charset, true, output);
+void Base64::encode(std::string_view input, std::string& output) {
+  encodeImpl(input, kBase64Charset, false, output);

Copilot AI Feb 27, 2026

Base64::encode(std::string_view, std::string&) hardcodes includePadding=false, while historical base64 defaults typically include padding (and the previous pointer-based encode path did). This is a behavior change for callers migrating to the new API. Consider (a) adding an includePadding parameter to this overload, or (b) defaulting to includePadding=true for standard Base64 to match prior behavior, while keeping URL-safe variants configurable.

Suggested change
-  encodeImpl(input, kBase64Charset, false, output);
+  encodeImpl(input, kBase64Charset, true, output);

auto status = encoding::Base64::decodeMime(
std::string_view(input.data(), input.size()), decodedStr);
if (status.ok()) {
result.resize(decodedSize.value());

Copilot AI Feb 27, 2026

result is resized based on decodedSize.value() but the copy uses decodedStr.size(). If these diverge (e.g., due to input normalization rules), this can either overflow (decodedStr.size() > result.size()) or leave uninitialized trailing bytes (decodedStr.size() < result.size()). Prefer resizing to decodedStr.size() (or validating equality before copying).

Suggested change
-      result.resize(decodedSize.value());
+      result.resize(decodedStr.size());
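
The point of this suggestion is that the destination should be sized from the bytes actually produced, not from an earlier estimate. A minimal sketch, using a `std::vector<char>` as a hypothetical stand-in for the Velox result buffer and `copyDecoded` as an illustrative helper name:

```cpp
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

namespace sketch {

// Copies decoded bytes into a caller-owned result buffer. The buffer is
// resized from decoded.size(), so the copy can neither overflow it nor
// leave stale trailing bytes from a larger precomputed size.
inline void copyDecoded(std::string_view decoded, std::vector<char>& result) {
  result.resize(decoded.size());
  if (!decoded.empty()) {
    std::memcpy(result.data(), decoded.data(), decoded.size());
  }
}

} // namespace sketch
```

If the precomputed size is still wanted as a sanity check, comparing it with `decoded.size()` before the copy keeps the two values from silently diverging.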

-  encoding::Base64::encodeMime(input.data(), input.size(), result.data());
+  std::string encodedStr;
+  encoding::Base64::encodeMime(
+      std::string_view(input.data(), input.size()), encodedStr);

Copilot AI Feb 27, 2026

result is resized based on calculateMimeEncodedSize(input.size()), but the code copies encodedStr.size() bytes without verifying the sizes match. If encodeMime ever produces an output length different from the calculated size, this can write past result or leave extra garbage at the end. Consider resizing result to encodedStr.size() after encoding (or explicitly asserting the sizes match) before copying.

Suggested change
-      std::string_view(input.data(), input.size()), encodedStr);
+      std::string_view(input.data(), input.size()), encodedStr);
+  result.resize(encodedStr.size());

-      Base64::calculateMimeDecodedSize(in.data(), in.size()).value();
-  std::string out(decSize, '\0');
-  auto result = Base64::decodeMime(in.data(), in.size(), out.data());
+  const auto decSizeOpt = Base64::calculateMimeDecodedSize(in);

Copilot AI Feb 27, 2026

decSizeOpt.value() is used without checking decSizeOpt.hasError(). If a caller reuses this helper with invalid input later, this would throw/assert unexpectedly. Consider asserting decSizeOpt.hasValue() (or failing the test with the error message) before using .value().

Suggested change
-  const auto decSizeOpt = Base64::calculateMimeDecodedSize(in);
+  const auto decSizeOpt = Base64::calculateMimeDecodedSize(in);
+  if (decSizeOpt.hasError()) {
+    VELOX_USER_FAIL(decSizeOpt.error().message());
+  }
