3634: Improve code quality and increase PHPStan level#30
3634: Improve code quality and increase PHPStan level#30martin-augment wants to merge 2 commits intomainfrom
Conversation
WalkthroughThis pull request refactors the Apache Avro PHP library to improve code consistency and type safety. Changes include renaming properties and parameters from snake_case to camelCase throughout multiple classes, adding explicit type hints and return type declarations to method signatures, and updating PHPDoc annotations from ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Code Review: PR #30 - Improve code quality and increase PHPStan levelSummaryThis PR makes significant improvements to the PHP codebase by modernizing type hints, improving naming conventions, and increasing PHPStan level to 6. Overall, this is a positive change that enhances code quality and maintainability. ✅ Strengths1. Excellent Type System Modernization
2. Better Code Quality
3. Cleaner Architecture
4. PHPStan Level 6
|
| Aspect | Rating | Notes |
|---|---|---|
| Type Safety | ⭐⭐⭐⭐⭐ | Excellent improvements |
| Naming Conventions | ⭐⭐⭐⭐⭐ | Consistent camelCase |
| Documentation | ⭐⭐⭐⭐ | Good PHPDoc, could document baseline |
| Test Coverage | ⭐⭐⭐ | Relies on existing tests, needs new ones |
| Breaking Changes | ⭐⭐⭐ | Minor concerns, needs documentation |
✅ Verdict: Approve with Recommendations
This PR significantly improves code quality and is ready to merge with the understanding that:
- Follow-up work is needed to address the 37 baselined PHPStan errors
- Additional test coverage should be added (can be in a follow-up PR)
- Breaking changes should be documented in release notes
- The
if.alwaysFalsesuppression should be investigated
The pragmatic approach of using a baseline for incremental improvement is sound. The code is cleaner, safer, and more maintainable after these changes.
📝 Action Items
Before Merge:
- Investigate the
if.alwaysFalsesuppression in AvroDataIOReader.php:100
Follow-up PRs:
- Add unit tests for extracted compression methods
- Create issues to track resolution of baselined PHPStan errors
- Document breaking changes in CHANGELOG/release notes
- Add migration guide for class extension scenarios
Great work on modernizing the codebase! 🎉
Summary of ChangesHello @martin-augment, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the code quality and maintainability of the PHP Avro library by introducing comprehensive type hinting, adhering to stricter static analysis rules (PHPStan level 6), and refactoring several core components for better readability and error handling. Key changes include consistent variable naming, explicit type declarations for parameters and return values, and modularization of compression/decompression logic. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
🤖 Augment PR SummarySummary: This PR focuses on improving the PHP implementation’s code quality and raising static analysis coverage. Changes:
Technical Notes: The PHPStan config now includes the baseline, analyzes 🤖 Was this summary useful? React with 👍 or 👎 |
| case self::ERROR_SCHEMA: | ||
| case self::REQUEST_SCHEMA: | ||
| if (!($expected_schema instanceof AvroRecordSchema)) { | ||
| if (!$expected_schema instanceof AvroRecordSchema) { |
There was a problem hiding this comment.
!$expected_schema instanceof AvroRecordSchema is affected by operator precedence (it’s parsed as (!$expected_schema) instanceof ...), so this guard will never trigger and non-record schemas could be treated as records.
The same ! ... instanceof precedence issue also appears in other locations.
Other Locations
lang/php/lib/Datum/AvroIODatumReader.php:231
🤖 Was this useful? React with 👍 or 👎
| foreach ($fieldSchema->fields() as $field) { | ||
| $field_name = $field->name(); | ||
| if (!$json_val = $default_value[$field_name]) { | ||
| if (!$json_val = $defaultValue[$field_name]) { |
| break; | ||
| case self::SEEK_CUR: | ||
| if (0 > $this->current_index + $whence) { | ||
| if (0 > $this->currentIndex + $whence) { |
| if (!extension_loaded('snappy')) { | ||
| throw new AvroException('Please install ext-snappy to use snappy compression.'); | ||
| } | ||
| $crc32 = unpack('N', substr((string) $compressed, -4))[1]; |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| return $writersSchemaType === $readersSchemaType; | ||
| } | ||
|
|
||
| switch ($readers_schema_type) { |
There was a problem hiding this comment.
Primitive type promotion broken by early return
High Severity
The new primitive type check in schemasMatch returns false when writer and reader types differ, even when type promotion is valid per the Avro spec. For example, if a writer uses int and a reader uses long, both are primitives, so the condition passes, but since 'int' !== 'long', it returns false immediately. The type promotion logic at lines 185-208 (allowing int→long, long→float, float→double) is never reached, breaking schema evolution for primitive types.
Additional Locations (1)
There was a problem hiding this comment.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lang/php/lib/Datum/AvroIODatumReader.php (1)
457-526: Handle falsy default values correctly in record defaults.
The current check treats valid defaults like0,false, or""as missing and can also emit undefined-index notices. Preferarray_key_existsto distinguish “missing” from “falsy.”🐛 Proposed fix
- foreach ($fieldSchema->fields() as $field) { + foreach ($fieldSchema->fields() as $field) { $field_name = $field->name(); - if (!$json_val = $defaultValue[$field_name]) { - $json_val = $field->default_value(); - } + if (is_array($defaultValue) && array_key_exists($field_name, $defaultValue)) { + $json_val = $defaultValue[$field_name]; + } else { + $json_val = $field->default_value(); + } $record[$field_name] = $this->readDefaultValue( $field->type(), $json_val ); }
🤖 Fix all issues with AI agents
In `@lang/php/lib/Avro.php`:
- Around line 63-70: Update the docblock on the usesGmp() method to reference
the renamed property $bigIntegerMode (instead of Avro::$biginteger_mode);
mention the dependency on Avro::check64Bit() (or Avro::checkPlatform()) setting
Avro::$bigIntegerMode so the documentation matches the actual property name and
casing.
In `@lang/php/lib/DataFile/AvroDataIOReader.php`:
- Around line 214-221: The docblock for readBlockHeader() currently says
"@return int" but the method signature returns string|int; update the docblock
to "@return string|int" (and adjust the descriptive text if present) so the
PHPDoc matches the method signature, referencing the readBlockHeader() method
and its use of $this->decoder->readLong(); alternatively, if you want to
guarantee an int, change the signature to return int and cast the decoder result
to int where readLong() is returned.
In `@lang/php/lib/DataFile/AvroDataIOWriter.php`:
- Around line 86-96: The writers schema passed into AvroDataIOWriter is not
being applied to the injected datumWriter, leaving datumWriter potentially
unconfigured; after setting $this->metadata[AvroDataIO::METADATA_SCHEMA_ATTR]
and before writeHeader() in the AvroDataIOWriter initialization, assign the
passed $writersSchema to the datum writer (i.e., set the datumWriter's
writersSchema property or equivalent on $this->datumWriter) so
$this->datumWriter and $this->metadata stay in sync.
In `@lang/php/lib/Datum/AvroIODatumWriter.php`:
- Around line 136-140: The throw in AvroIODatumWriter is accessing
$writersSchema->type as a property while the rest of the class (e.g., earlier at
line 76) uses the method $writersSchema->type(); update the throw to call
$writersSchema->type() instead of accessing the property, and scan
AvroIODatumWriter for any other occurrences of ->type property access to make
them consistent with the method form.
In `@lang/php/lib/IO/AvroStringIO.php`:
- Around line 132-136: In the SEEK_CUR branch of AvroStringIO (case
self::SEEK_CUR) the boundary check incorrectly uses $whence instead of $offset;
update the check to use $offset so it reads the intended condition (e.g., verify
0 > $this->currentIndex + $offset) and keep the subsequent update
$this->currentIndex += $offset unchanged, ensuring seeks before the buffer start
are detected correctly.
In `@lang/php/lib/Protocol/AvroProtocol.php`:
- Around line 33-38: Update the phpstan array-shape alias
AvroProtocolDefinitionArray to mark the "types" and "messages" keys as optional
(use the nullable/optional `?` shape syntax) so it matches how realParse()
checks `array_key_exists("types", $avro)` and uses `$avro["messages"] ?? null`;
edit the AvroProtocolDefinitionArray type (and the similar shape around lines
71-75) so `types` and `messages` are optional in the shape, ensuring the static
type matches the runtime checks in realParse().
In `@lang/php/lib/Protocol/AvroProtocolMessage.php`:
- Around line 32-35: Update the PHPStan type alias for the Avro protocol message
to make response optional to match the constructor's behavior: locate the
AvroProtocolMessage class/type definitions (symbols
AvroProtocolMessageDefinitionArray and the other similar alias around lines
43-50) and change the shape so response is optional (e.g., use response?: string
or response: string|null) in both occurrences; ensure any dependent
phpstan-import-type references remain unchanged so the constructor's
array_key_exists('response', $avro) and the ?AvroSchema typed property are
consistent with the type.
In `@lang/php/lib/Schema/AvroEnumSchema.php`:
- Around line 94-98: Update the AvroEnumSchema::symbolIndex docblock to reflect
the tightened signature and new exception behavior: change/add `@param` string
$symbol and `@return` int, add `@throws` \TypeError to indicate non-string arguments
will raise a TypeError, and keep/clarify `@throws` AvroException for cases where
the symbol is not present; include a short note that callers must pass a string
to avoid TypeError (this clarifies the public API BC change).
In `@lang/php/lib/Schema/AvroRecordSchema.php`:
- Around line 64-65: The destructuring assigns an unused variable $x from
nameAndNamespace(); update the assignment in AvroRecordSchema (around the call
to nameAndNamespace()) to discard the first element (e.g., use an
underscore-prefixed variable like $_ or a two‑slot list with the first slot
ignored) so only $namespace is used, then pass $namespace into
self::parseFields($fields, $namespace, $schemata); keep parseFields and
nameAndNamespace() unchanged.
In `@lang/php/test/InterOpTest.php`:
- Around line 23-38: In InterOpTest::setUp, guard the file_get_contents call
before assigning to the typed property $projectionJson and before parsing into
$projection: call file_get_contents(AVRO_INTEROP_SCHEMA) into a local variable,
check if the result === false (or !is_string), and if so call
$this->fail("Unable to read AVRO_INTEROP_SCHEMA: file missing or unreadable") or
throw a clear exception; otherwise assign the string to $this->projectionJson
and then call AvroSchema::parse($this->projectionJson) to populate
$this->projection.
🧹 Nitpick comments (16)
lang/php/lib/Schema/AvroArraySchema.php (1)
62-68: PHPDoc tag correction looks good, but return type union may be inconsistent.The
@returns→@returnfix is correct. However, the declared return typeAvroName|AvroSchemaappears inconsistent with the actual implementation: the$itemsproperty (line 32) is typed asAvroSchema, and the method simply returns$this->items. TheAvroNamevariant in the union type seems unreachable.Consider whether the return type should be just
AvroSchema, or if the property type needs updating to match the declared union.lang/php/lib/IO/AvroStringIO.php (1)
82-84: Stale TODO comment references old naming convention.The TODO comment on line 82 still mentions
current_indexbut the property has been renamed tocurrentIndex.📝 Proposed fix
/** - * `@todo` test for fencepost errors wrt updating current_index + * `@todo` test for fencepost errors wrt updating currentIndex * `@param` mixed $len * `@return` string bytes read from buffer */lang/php/lib/AvroDebug.php (1)
57-65: PHPDoc references old parameter name.The docblock on line 58 still references
$debug_levelbut the parameter has been renamed to$debugLevel.📝 Proposed fix
/** - * `@return` bool true if the given $debug_level is equivalent + * `@return` bool true if the given $debugLevel is equivalent * or more verbose than than the current debug level * and false otherwise. */lang/php/lib/AvroGMP.php (3)
78-103: Missing explicit return type declaration.The
shiftLeftmethod has@return \GMPin its PHPDoc but lacks an explicit return type in the method signature. For consistency with the private helper methods (e.g.,gmp_0xfs(): \GMP), consider adding the return type.Also, line 79 has a typo:
@interalshould be@internal.♻️ Proposed fix
/** - * `@interal` Only works up to shift 63 (doesn't wrap bits around). + * `@internal` Only works up to shift 63 (doesn't wrap bits around). * `@param` int $shift number of bits to shift left * `@return` \GMP $g shifted left */ - public static function shiftLeft(int|string|\GMP $g, int $shift) + public static function shiftLeft(int|string|\GMP $g, int $shift): \GMP
105-111: Missing explicit return type declaration.The
gmpTwosComplementmethod has@return \GMPin PHPDoc but lacks an explicit return type declaration for consistency.♻️ Proposed fix
/** * `@return` \GMP resource 64-bit two's complement of input. */ - public static function gmpTwosComplement(int|string|\GMP $g) + public static function gmpTwosComplement(int|string|\GMP $g): \GMP
113-141: Missing explicit return type declaration.The
shiftRightmethod has@return \GMPin PHPDoc but lacks an explicit return type declaration for consistency.♻️ Proposed fix
/** * Arithmetic right shift * `@param` int $shift number of bits to shift right * `@return` \GMP $g shifted right $shift bits */ - public static function shiftRight(int|string|\GMP $g, int $shift) + public static function shiftRight(int|string|\GMP $g, int $shift): \GMPlang/php/lib/DataFile/AvroDataIOReader.php (2)
94-94: Use strict comparison for consistency.Line 94 uses loose comparison (
==) while other codec checks in this file use strict comparison (===). For consistency and to avoid type coercion issues, use strict comparison here as well.Proposed fix
- if (0 == $this->blockCount) { + if (0 === $this->blockCount) {
274-291: Potential issue withunpack()return value.
unpack()can returnfalseon failure. While unlikely with controlled input, accessing[1]without validating the return could cause an error.Defensive fix
private function snappyUncompress(string $compressed): string { if (!extension_loaded('snappy')) { throw new AvroException('Please install ext-snappy to use snappy compression.'); } - $crc32 = unpack('N', substr((string) $compressed, -4))[1]; + $unpacked = unpack('N', substr($compressed, -4)); + if (false === $unpacked) { + throw new AvroException('snappy uncompression failed - could not read CRC32.'); + } + $crc32 = $unpacked[1]; - $datum = snappy_uncompress(substr((string) $compressed, 0, -4)); + $datum = snappy_uncompress(substr($compressed, 0, -4)); if (false === $datum) { throw new AvroException('snappy uncompression failed.'); } if ($crc32 !== crc32($datum)) { throw new AvroException('snappy uncompression failed - crc32 mismatch.'); } return $datum; }Note: The
(string)casts on lines 279-280 are redundant since$compressedis already typed asstring.lang/php/lib/Schema/AvroRecordSchema.php (1)
152-167: Consider memoization forfieldsByAlias().Unlike
fieldsHash()which memoizes the result,fieldsByAlias()rebuilds the hash on every call. If this method is called frequently, consider applying the same memoization pattern.Proposed memoization pattern
+ /** + * `@var` null|array<string, AvroField> map of field aliases to field objects. + */ + private ?array $fieldsByAliasHash = null; + /** * `@return` array<string, AvroField> */ public function fieldsByAlias(): array { + if (is_null($this->fieldsByAliasHash)) { $hash = []; foreach ($this->fields as $field) { if ($field->hasAliases()) { foreach ($field->getAliases() as $a) { $hash[$a] = $field; } } } + $this->fieldsByAliasHash = $hash; + } - return $hash; + return $this->fieldsByAliasHash; }lang/php/lib/Datum/AvroIODatumWriter.php (3)
143-159: Inconsistent parameter naming.The
writeBytesmethod still uses$writers_schema(snake_case) while other methods use$writersSchema(camelCase). Update for consistency with the rest of the file.Proposed fix
- private function writeBytes(AvroSchema $writers_schema, string $datum, AvroIOBinaryEncoder $encoder): void + private function writeBytes(AvroSchema $writersSchema, string $datum, AvroIOBinaryEncoder $encoder): void { - $logicalType = $writers_schema->logicalType(); + $logicalType = $writersSchema->logicalType();
178-193: Incomplete camelCase rename in writeMap.The local variable
$datum_counton line 184 should be renamed to$datumCountfor consistency with the camelCase refactoring in this PR.Proposed fix
private function writeMap(AvroMapSchema $writersSchema, array $datum, AvroIOBinaryEncoder $encoder): void { - $datum_count = count($datum); - if ($datum_count > 0) { - $encoder->writeLong($datum_count); + $datumCount = count($datum); + if ($datumCount > 0) { + $encoder->writeLong($datumCount);
252-271: Incomplete camelCase rename in writeUnion.The local variables
$datum_schema_indexand$datum_schemaon lines 254-255 should be renamed to camelCase for consistency.Proposed fix
private function writeUnion(AvroUnionSchema $writersSchema, mixed $datum, AvroIOBinaryEncoder $encoder): void { - $datum_schema_index = -1; - $datum_schema = null; + $datumSchemaIndex = -1; + $datumSchema = null; foreach ($writersSchema->schemas() as $index => $schema) { if (AvroSchema::isValidDatum($schema, $datum)) { - $datum_schema_index = $index; - $datum_schema = $schema; + $datumSchemaIndex = $index; + $datumSchema = $schema; break; } } - if (is_null($datum_schema)) { + if (is_null($datumSchema)) { throw new AvroIOTypeException($writersSchema, $datum); } - $encoder->writeLong($datum_schema_index); - $this->writeValidatedData($datum_schema, $datum, $encoder); + $encoder->writeLong($datumSchemaIndex); + $this->writeValidatedData($datumSchema, $datum, $encoder); }lang/php/lib/DataFile/AvroDataIO.php (2)
132-165: Inconsistent parameter naming in openFile.The parameter
$file_pathuses snake_case while other parameters use camelCase ($schemaJson). Consider renaming to$filePathfor consistency.Proposed fix
public static function openFile( - string $file_path, + string $filePath, string $mode = AvroIO::READ_MODE, ?string $schemaJson = null, string $codec = self::NULL_CODEC ): AvroDataIOReader|AvroDataIOWriter {
170-173: Simplify null check in isValidCodec.Since
$codecis typed as?string, theis_string($codec)check is essentially a null check. Consider using!is_null($codec)or simply$codec !== nullfor clarity.Proposed fix
public static function isValidCodec(?string $codec): bool { - return is_string($codec) && in_array($codec, self::validCodecs()); + return null !== $codec && in_array($codec, self::validCodecs(), true); }Note: Also added
trueas the third parameter toin_arrayfor strict comparison.lang/php/lib/Datum/AvroIOBinaryDecoder.php (2)
161-164: Minor: Remove empty line in skipNull.The empty line in the method body is unnecessary.
Proposed fix
public function skipNull(): void { - }
218-221: Consider suppressing unused parameter warning.The
$writersSchemaparameter inskipEnumis unused but likely kept for API consistency with other skip methods. Add a PHPDoc annotation to suppress the warning and document the intent.Proposed fix
+ /** + * `@param` AvroSchema $writersSchema kept for API consistency with other skip methods + * `@SuppressWarnings`(PHPMD.UnusedFormalParameter) + */ public function skipEnum(AvroSchema $writersSchema, AvroIOBinaryDecoder $decoder): void { $decoder->skipInt(); }
| * @return bool true if the PHP GMP extension is used and false otherwise. | ||
| * @internal Requires Avro::check64Bit() (exposed via Avro::checkPlatform()) | ||
| * to have been called to set Avro::$biginteger_mode. | ||
| */ | ||
| public static function usesGmp(): bool | ||
| { | ||
| return self::GMP_BIGINTEGER_MODE === self::$biginteger_mode; | ||
| return self::GMP_BIGINTEGER_MODE === self::$bigIntegerMode; | ||
| } |
There was a problem hiding this comment.
Update the docblock to reference the renamed property.
Line 65 still references Avro::$biginteger_mode but the property was renamed to $bigIntegerMode.
📝 Suggested fix
/**
* `@return` bool true if the PHP GMP extension is used and false otherwise.
* `@internal` Requires Avro::check64Bit() (exposed via Avro::checkPlatform())
- * to have been called to set Avro::$biginteger_mode.
+ * to have been called to set Avro::$bigIntegerMode.
*/📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| * @return bool true if the PHP GMP extension is used and false otherwise. | |
| * @internal Requires Avro::check64Bit() (exposed via Avro::checkPlatform()) | |
| * to have been called to set Avro::$biginteger_mode. | |
| */ | |
| public static function usesGmp(): bool | |
| { | |
| return self::GMP_BIGINTEGER_MODE === self::$biginteger_mode; | |
| return self::GMP_BIGINTEGER_MODE === self::$bigIntegerMode; | |
| } | |
| /** | |
| * `@return` bool true if the PHP GMP extension is used and false otherwise. | |
| * `@internal` Requires Avro::check64Bit() (exposed via Avro::checkPlatform()) | |
| * to have been called to set Avro::$bigIntegerMode. | |
| */ | |
| public static function usesGmp(): bool | |
| { | |
| return self::GMP_BIGINTEGER_MODE === self::$bigIntegerMode; | |
| } |
🤖 Prompt for AI Agents
In `@lang/php/lib/Avro.php` around lines 63 - 70, Update the docblock on the
usesGmp() method to reference the renamed property $bigIntegerMode (instead of
Avro::$biginteger_mode); mention the dependency on Avro::check64Bit() (or
Avro::checkPlatform()) setting Avro::$bigIntegerMode so the documentation
matches the actual property name and casing.
| * @return int length in bytes of the block. | ||
| */ | ||
| private function readBlockHeader(): string|int | ||
| { | ||
| $this->block_count = $this->decoder->readLong(); | ||
| $this->blockCount = $this->decoder->readLong(); | ||
|
|
||
| return $this->decoder->readLong(); | ||
| } |
There was a problem hiding this comment.
Return type mismatch between docblock and signature.
The docblock states @return int but the method signature declares string|int. Either update the docblock to match the actual return type or narrow the implementation if only int is ever returned.
Proposed fix if int is always returned
/**
* Reads the block header (which includes the count of items in the block
* and the length in bytes of the block)
* `@return` int length in bytes of the block.
*/
- private function readBlockHeader(): string|int
+ private function readBlockHeader(): int
{
- $this->blockCount = $this->decoder->readLong();
+ $this->blockCount = (int) $this->decoder->readLong();
- return $this->decoder->readLong();
+ return (int) $this->decoder->readLong();
}🤖 Prompt for AI Agents
In `@lang/php/lib/DataFile/AvroDataIOReader.php` around lines 214 - 221, The
docblock for readBlockHeader() currently says "@return int" but the method
signature returns string|int; update the docblock to "@return string|int" (and
adjust the descriptive text if present) so the PHPDoc matches the method
signature, referencing the readBlockHeader() method and its use of
$this->decoder->readLong(); alternatively, if you want to guarantee an int,
change the signature to return int and cast the decoder result to int where
readLong() is returned.
| if ($writersSchema) { | ||
| if (!AvroDataIO::isValidCodec($codec)) { | ||
| throw new AvroDataIOException( | ||
| sprintf('codec %s is not supported', $codec) | ||
| ); | ||
| } | ||
|
|
||
| $this->sync_marker = self::generateSyncMarker(); | ||
| $this->syncMarker = self::generateSyncMarker(); | ||
| $this->metadata[AvroDataIO::METADATA_CODEC_ATTR] = $this->codec = $codec; | ||
| $this->metadata[AvroDataIO::METADATA_SCHEMA_ATTR] = (string) $writers_schema; | ||
| $this->metadata[AvroDataIO::METADATA_SCHEMA_ATTR] = (string) $writersSchema; | ||
| $this->writeHeader(); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, check the AvroIODatumWriter class structure
echo "=== AvroIODatumWriter class definition ==="
rg -n "class AvroIODatumWriter" lang/php/lib/Datum/AvroIODatumWriter.php -A 20
echo -e "\n=== writersSchema property in AvroIODatumWriter ==="
rg -n "writersSchema|writers_schema" lang/php/lib/Datum/AvroIODatumWriter.php
echo -e "\n=== AvroDataIOWriter class context ==="
rg -n "class AvroDataIOWriter" lang/php/lib/DataFile/AvroDataIOWriter.php -A 15
echo -e "\n=== datumWriter initialization in AvroDataIOWriter ==="
rg -n "datumWriter" lang/php/lib/DataFile/AvroDataIOWriter.php -B 2 -A 2Repository: martin-augment/avro
Length of output: 5183
🏁 Script executed:
#!/bin/bash
# Find usages and tests of AvroDataIOWriter constructor
echo "=== Files that instantiate AvroDataIOWriter ==="
rg -n "new AvroDataIOWriter" --type php
echo -e "\n=== Test files for AvroDataIOWriter ==="
fd ".*Test.*" --type f | grep -i "writer\|dataio" | head -20
# Check test files for context on how datumWriter is passed
echo -e "\n=== Look for test patterns ==="
rg -n "AvroDataIOWriter" lang/php/tests/ -B 2 -A 2 | head -50Repository: martin-augment/avro
Length of output: 893
🏁 Script executed:
#!/bin/bash
# Examine the actual usages
echo "=== AvroDataIO.php constructor usage ==="
rg -n "new AvroDataIOWriter" lang/php/lib/DataFile/AvroDataIO.php -B 5 -A 5
echo -e "\n=== write_read.php example ==="
cat -n lang/php/examples/write_read.php | sed -n '60,80p'
echo -e "\n=== StringIOTest.php ==="
cat -n lang/php/test/StringIOTest.php | sed -n '70,95p'
echo -e "\n=== Check how datumWriter is created before passing ==="
rg -n "new AvroIODatumWriter" lang/php -B 2 -A 2Repository: martin-augment/avro
Length of output: 6019
Sync $writersSchema to datumWriter for consistency and robustness.
The existing file path (line 104) explicitly sets $this->datumWriter->writersSchema from the file metadata, but the new file path (lines 86-96) doesn't sync the passed $writersSchema to the injected datumWriter instance. While all current callers pre-configure the datumWriter with schema before passing it, this creates inconsistency and makes the code fragile if someone passes an unconfigured writer with $writersSchema. Add this pattern to match the existing precedent:
Suggested change
$this->syncMarker = self::generateSyncMarker();
$this->metadata[AvroDataIO::METADATA_CODEC_ATTR] = $this->codec = $codec;
$this->metadata[AvroDataIO::METADATA_SCHEMA_ATTR] = (string) $writersSchema;
+ if ($writersSchema instanceof AvroSchema) {
+ $this->datumWriter->writersSchema = $writersSchema;
+ } else {
+ $this->datumWriter->writersSchema = AvroSchema::parse($writersSchema);
+ }
$this->writeHeader();🤖 Prompt for AI Agents
In `@lang/php/lib/DataFile/AvroDataIOWriter.php` around lines 86 - 96, The writers
schema passed into AvroDataIOWriter is not being applied to the injected
datumWriter, leaving datumWriter potentially unconfigured; after setting
$this->metadata[AvroDataIO::METADATA_SCHEMA_ATTR] and before writeHeader() in
the AvroDataIOWriter initialization, assign the passed $writersSchema to the
datum writer (i.e., set the datumWriter's writersSchema property or equivalent
on $this->datumWriter) so $this->datumWriter and $this->metadata stay in sync.
| throw new AvroException(sprintf( | ||
| 'Unknown type: %s', | ||
| $writers_schema->type | ||
| $writersSchema->type | ||
| )); | ||
| } |
There was a problem hiding this comment.
Inconsistent property access vs method call.
Line 138 accesses $writersSchema->type as a property, while line 76 uses $writersSchema->type() as a method. Use the method consistently.
Proposed fix
default:
throw new AvroException(sprintf(
'Unknown type: %s',
- $writersSchema->type
+ $writersSchema->type()
));📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| throw new AvroException(sprintf( | |
| 'Unknown type: %s', | |
| $writers_schema->type | |
| $writersSchema->type | |
| )); | |
| } | |
| throw new AvroException(sprintf( | |
| 'Unknown type: %s', | |
| $writersSchema->type() | |
| )); | |
| } |
🤖 Prompt for AI Agents
In `@lang/php/lib/Datum/AvroIODatumWriter.php` around lines 136 - 140, The throw
in AvroIODatumWriter is accessing $writersSchema->type as a property while the
rest of the class (e.g., earlier at line 76) uses the method
$writersSchema->type(); update the throw to call $writersSchema->type() instead
of accessing the property, and scan AvroIODatumWriter for any other occurrences
of ->type property access to make them consistent with the method form.
| case self::SEEK_CUR: | ||
| if (0 > $this->current_index + $whence) { | ||
| if (0 > $this->currentIndex + $whence) { | ||
| throw new AvroIOException('Cannot seek before beginning of file.'); | ||
| } | ||
| $this->current_index += $offset; | ||
| $this->currentIndex += $offset; |
There was a problem hiding this comment.
Bug: Wrong variable used in SEEK_CUR boundary check.
Line 133 uses $whence instead of $offset for the boundary check. Since $whence equals self::SEEK_CUR (constant value 1), this check is incorrect and will not properly detect seeks before the beginning of the buffer.
🐛 Proposed fix
case self::SEEK_CUR:
- if (0 > $this->currentIndex + $whence) {
+ if (0 > $this->currentIndex + $offset) {
throw new AvroIOException('Cannot seek before beginning of file.');
}
$this->currentIndex += $offset;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| case self::SEEK_CUR: | |
| if (0 > $this->current_index + $whence) { | |
| if (0 > $this->currentIndex + $whence) { | |
| throw new AvroIOException('Cannot seek before beginning of file.'); | |
| } | |
| $this->current_index += $offset; | |
| $this->currentIndex += $offset; | |
| case self::SEEK_CUR: | |
| if (0 > $this->currentIndex + $offset) { | |
| throw new AvroIOException('Cannot seek before beginning of file.'); | |
| } | |
| $this->currentIndex += $offset; |
🤖 Prompt for AI Agents
In `@lang/php/lib/IO/AvroStringIO.php` around lines 132 - 136, In the SEEK_CUR
branch of AvroStringIO (case self::SEEK_CUR) the boundary check incorrectly uses
$whence instead of $offset; update the check to use $offset so it reads the
intended condition (e.g., verify 0 > $this->currentIndex + $offset) and keep the
subsequent update $this->currentIndex += $offset unchanged, ensuring seeks
before the buffer start are detected correctly.
| * @phpstan-type AvroProtocolDefinitionArray array{ | ||
| * types: AvroSchemaDefinitionArray, | ||
| * protocol: string, | ||
| * namespace: string, | ||
| * messages: array<string, AvroProtocolMessageDefinitionArray> | ||
| * } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cd lang/php/lib/Protocol && wc -l AvroProtocol.phpRepository: martin-augment/avro
Length of output: 83
🏁 Script executed:
cd lang/php/lib/Protocol && cat -n AvroProtocol.php | sed -n '1,100p'Repository: martin-augment/avro
Length of output: 4058
🏁 Script executed:
cd lang/php/lib/Protocol && grep -n "realParse" AvroProtocol.php | head -20Repository: martin-augment/avro
Length of output: 245
Mark types and messages as optional in the PHPStan array-shape.
The realParse() method checks array_key_exists("types", $avro) before accessing types (line 78) and uses $avro["messages"] ?? null for messages (line 83), making both keys optional. The current type definition incorrectly marks them as required, which can cause PHPStan false positives. Update the shape at lines 33-38 to mark these keys with ?:
Proposed adjustment
- * `@phpstan-type` AvroProtocolDefinitionArray array{
- * types: AvroSchemaDefinitionArray,
- * protocol: string,
- * namespace: string,
- * messages: array<string, AvroProtocolMessageDefinitionArray>
- * }
+ * `@phpstan-type` AvroProtocolDefinitionArray array{
+ * protocol: string,
+ * namespace: string,
+ * types?: AvroSchemaDefinitionArray,
+ * messages?: array<string, AvroProtocolMessageDefinitionArray>
+ * }Also applies to: 71-75
🤖 Prompt for AI Agents
In `@lang/php/lib/Protocol/AvroProtocol.php` around lines 33 - 38, Update the
phpstan array-shape alias AvroProtocolDefinitionArray to mark the "types" and
"messages" keys as optional (use the nullable/optional `?` shape syntax) so it
matches how realParse() checks `array_key_exists("types", $avro)` and uses
`$avro["messages"] ?? null`; edit the AvroProtocolDefinitionArray type (and the
similar shape around lines 71-75) so `types` and `messages` are optional in the
shape, ensuring the static type matches the runtime checks in realParse().
| /** | ||
| * @phpstan-import-type AvroSchemaDefinitionArray from AvroSchema | ||
| * @phpstan-type AvroProtocolMessageDefinitionArray array{request: AvroSchemaDefinitionArray, response: string} | ||
| */ |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the file structure and content
cat -n lang/php/lib/Protocol/AvroProtocolMessage.php | head -60Repository: martin-augment/avro
Length of output: 2467
🏁 Script executed:
# Continue reading the constructor to see response handling
cat -n lang/php/lib/Protocol/AvroProtocolMessage.php | sed -n '60,85p'Repository: martin-augment/avro
Length of output: 687
Make response optional in the PHPStan message shape.
The constructor uses array_key_exists('response', $avro) to conditionally process the response, allowing it to be absent from the array. The property is also nullable (?AvroSchema), but the type definition currently requires it.
Proposed adjustment
- * `@phpstan-type` AvroProtocolMessageDefinitionArray array{request: AvroSchemaDefinitionArray, response: string}
+ * `@phpstan-type` AvroProtocolMessageDefinitionArray array{request: AvroSchemaDefinitionArray, response?: string}Also applies to: 43-50
🤖 Prompt for AI Agents
In `@lang/php/lib/Protocol/AvroProtocolMessage.php` around lines 32 - 35, Update
the PHPStan type alias for the Avro protocol message to make response optional
to match the constructor's behavior: locate the AvroProtocolMessage class/type
definitions (symbols AvroProtocolMessageDefinitionArray and the other similar
alias around lines 43-50) and change the shape so response is optional (e.g.,
use response?: string or response: string|null) in both occurrences; ensure any
dependent phpstan-import-type references remain unchanged so the constructor's
array_key_exists('response', $avro) and the ?AvroSchema typed property are
consistent with the type.
| * @throws AvroException | ||
| * @return int the index of the given $symbol in the enum schema | ||
| */ | ||
| public function symbolIndex($symbol) | ||
| public function symbolIndex(string $symbol): int | ||
| { |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
rg -nP '\bsymbolIndex\s*\(' -g'*.php' -B2 -A2Repository: martin-augment/avro
Length of output: 921
🏁 Script executed:
# Search more comprehensively for symbolIndex calls
rg 'symbolIndex' -g'*.php' -n
# Also check if there are any tests related to AvroEnumSchema
fd -e php -e phpt | xargs grep -l 'AvroEnumSchema\|symbolIndex' 2>/dev/null | head -20Repository: martin-augment/avro
Length of output: 443
🏁 Script executed:
# Check AvroIODatumReader.php for symbolIndex usage
rg -n 'symbolIndex' lang/php/lib/Datum/AvroIODatumReader.php -B2 -A2
# Also check the implementation in AvroEnumSchema.php
cat -n lang/php/lib/Schema/AvroEnumSchema.php | head -110 | tail -30Repository: martin-augment/avro
Length of output: 1149
🏁 Script executed:
# Check git log for recent changes to symbolIndex
git log --oneline -n 20 -- lang/php/lib/Schema/AvroEnumSchema.php
# Check if there's a CHANGELOG or version file
fd -e md -e txt | grep -iE '(changelog|version|release)' | head -10Repository: martin-augment/avro
Length of output: 636
🏁 Script executed:
# Get the diff for the commit that touched AvroEnumSchema.php
git show fc8392377 -- lang/php/lib/Schema/AvroEnumSchema.phpRepository: martin-augment/avro
Length of output: 3966
🏁 Script executed:
# Check if AvroEnumSchema.php existed in previous commits
git log --follow --all --oneline -- lang/php/lib/Schema/AvroEnumSchema.php
# Check the parent of fc8392377 to see the state before
git show fc8392377^:lang/php/lib/Schema/AvroEnumSchema.php 2>&1 | head -20Repository: martin-augment/avro
Length of output: 501
🏁 Script executed:
# Get the previous version from the commit before fc8392377
git show 3f0a45a28:lang/php/lib/Schema/AvroEnumSchema.php 2>&1 | grep -A10 "symbolIndex"Repository: martin-augment/avro
Length of output: 365
Document public API BC change: symbolIndex(string $symbol): int.
Parameter type was tightened from untyped to string and return type added. Non-string arguments now throw TypeError instead of AvroException, but the docblock still claims @throws AvroException. Internal call sites are compatible (only one found: AvroIODatumWriter.php:237). Update documentation to clarify the strict type requirement and the new exception behavior for external consumers.
🤖 Prompt for AI Agents
In `@lang/php/lib/Schema/AvroEnumSchema.php` around lines 94 - 98, Update the
AvroEnumSchema::symbolIndex docblock to reflect the tightened signature and new
exception behavior: change/add `@param` string $symbol and `@return` int, add
`@throws` \TypeError to indicate non-string arguments will raise a TypeError, and
keep/clarify `@throws` AvroException for cases where the symbol is not present;
include a short note that callers must pass a string to avoid TypeError (this
clarifies the public API BC change).
| [$x, $namespace] = $name->nameAndNamespace(); | ||
| $this->fields = self::parseFields($fields, $namespace, $schemata); |
There was a problem hiding this comment.
Unused variable $x.
The variable $x from the destructuring assignment is never used. Consider using an underscore prefix to indicate intentional discard, or restructure to only extract the needed value.
Proposed fix
- [$x, $namespace] = $name->nameAndNamespace();
+ [, $namespace] = $name->nameAndNamespace();📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| [$x, $namespace] = $name->nameAndNamespace(); | |
| $this->fields = self::parseFields($fields, $namespace, $schemata); | |
| [, $namespace] = $name->nameAndNamespace(); | |
| $this->fields = self::parseFields($fields, $namespace, $schemata); |
🧰 Tools
🪛 PHPMD (2.15.0)
64-64: Avoid unused local variables such as '$x'. (undefined)
(UnusedLocalVariable)
🤖 Prompt for AI Agents
In `@lang/php/lib/Schema/AvroRecordSchema.php` around lines 64 - 65, The
destructuring assigns an unused variable $x from nameAndNamespace(); update the
assignment in AvroRecordSchema (around the call to nameAndNamespace()) to
discard the first element (e.g., use an underscore-prefixed variable like $_ or
a two‑slot list with the first slot ignored) so only $namespace is used, then
pass $namespace into self::parseFields($fields, $namespace, $schemata); keep
parseFields and nameAndNamespace() unchanged.
| use Apache\Avro\AvroIO; | ||
| use Apache\Avro\DataFile\AvroDataIO; | ||
| use Apache\Avro\IO\AvroFile; | ||
| use Apache\Avro\Schema\AvroSchema; | ||
| use PHPUnit\Framework\Attributes\DataProvider; | ||
| use PHPUnit\Framework\TestCase; | ||
|
|
||
| class InterOpTest extends TestCase | ||
| { | ||
| private string $projection_json; | ||
| private string $projectionJson; | ||
| private AvroSchema $projection; | ||
|
|
||
| public function setUp(): void | ||
| { | ||
| $interop_schema_file_name = AVRO_INTEROP_SCHEMA; | ||
| $this->projection_json = file_get_contents($interop_schema_file_name); | ||
| $this->projection = AvroSchema::parse($this->projection_json); | ||
| $this->projectionJson = file_get_contents($interop_schema_file_name); | ||
| $this->projection = AvroSchema::parse($this->projectionJson); |
There was a problem hiding this comment.
Guard file_get_contents() before assigning to a typed property.
If the schema file is missing, file_get_contents() returns false, causing a TypeError without a clear test failure message. Add an explicit check.
🛠️ Proposed fix
- $this->projectionJson = file_get_contents($interop_schema_file_name);
- $this->projection = AvroSchema::parse($this->projectionJson);
+ $contents = file_get_contents($interop_schema_file_name);
+ if ($contents === false) {
+ $this->fail("Could not read schema file: {$interop_schema_file_name}");
+ }
+ $this->projectionJson = $contents;
+ $this->projection = AvroSchema::parse($this->projectionJson);🤖 Prompt for AI Agents
In `@lang/php/test/InterOpTest.php` around lines 23 - 38, In InterOpTest::setUp,
guard the file_get_contents call before assigning to the typed property
$projectionJson and before parsing into $projection: call
file_get_contents(AVRO_INTEROP_SCHEMA) into a local variable, check if the
result === false (or !is_string), and if so call $this->fail("Unable to read
AVRO_INTEROP_SCHEMA: file missing or unreadable") or throw a clear exception;
otherwise assign the string to $this->projectionJson and then call
AvroSchema::parse($this->projectionJson) to populate $this->projection.
There was a problem hiding this comment.
Code Review
This pull request primarily focuses on enhancing type safety and code clarity across various Avro PHP library files. Key changes include updating PHPDoc @returns tags to @return, adding explicit type hints to method parameters and return types, and renaming numerous variables and properties from snake_case to camelCase for consistency. Specifically, compression/decompression logic in AvroDataIOReader and AvroDataIOWriter was refactored into dedicated private helper methods, complete with robust error handling and extension checks. Additionally, redundant @param PHPDoc tags were removed where type hints were already present in function signatures, and declare(strict_types=1) was added to AvroIOSchemaMatchException.php. Review comments highlighted the removal of some @param tags that still corresponded to parameters in the function signature, and raised concerns about removing explicit is_string() checks in AvroStringIO's constructor and write method, suggesting potential clarity issues or compatibility concerns in non-strict environments, despite the introduction of strict types.
|
|
||
| /** | ||
| * @param string $str | ||
| * @param string $joiner string used to join |
There was a problem hiding this comment.
| * @param string $str | ||
| * @returns string[] array of hex representation of each byte of $str | ||
| * @return string[] array of hex representation of each byte of $str | ||
| */ |
| * @param string $format format to represent bytes | ||
| * @returns string[] array of each byte of $str formatted using $format | ||
| * @return string[] array of each byte of $str formatted using $format | ||
| */ |
| * @returns string of bytes of $str represented in decimal format | ||
| * @return string of bytes of $str represented in decimal format | ||
| * @uses decArray() | ||
| */ |
| public static function decArray($str) | ||
| public static function decArray(string $str): array | ||
| { | ||
| return self::bytesArray($str, '%3d'); |
| /** | ||
| * @param \GMP $g resource | ||
| * @return \GMP resource 64-bit two's complement of input. | ||
| */ |
| * @param int $shift number of bits to shift right | ||
| * @returns resource $g shifted right $shift bits | ||
| * @return \GMP $g shifted right $shift bits | ||
| */ |
| /** @phpstan-ignore if.alwaysFalse */ | ||
| if ($this->isEof()) { |
There was a problem hiding this comment.
The /** @phpstan-ignore if.alwaysFalse */ annotation is present here. While it might be necessary for PHPStan, it indicates a potential logical path that is marked as unreachable by static analysis but might be reachable at runtime, or a misunderstanding of the isEof() behavior. It would be beneficial to either clarify why this ignore is needed or refactor the logic to avoid it if possible.
| */ | ||
| public function __construct(string $str = '') | ||
| { | ||
| $this->is_closed = false; | ||
| $this->string_buffer = ''; | ||
| $this->current_index = 0; | ||
|
|
||
| if (is_string($str)) { | ||
| $this->string_buffer .= $str; | ||
| } else { | ||
| throw new AvroIOException( | ||
| sprintf('constructor argument must be a string: %s', gettype($str)) | ||
| ); | ||
| } | ||
| $this->isClosed = false; | ||
| $this->stringBuffer = $str; | ||
| $this->currentIndex = 0; |
There was a problem hiding this comment.
The original constructor included a runtime check for $str being a string and threw an AvroIOException if it wasn't. While declare(strict_types=1); and the type hint string $str enforce this at the engine level, removing the explicit if (is_string($str)) check removes a defensive layer that might be useful in environments where strict types are not always enforced or for clearer error messages specific to AvroIOException.
|
|
||
| throw new AvroIOException( | ||
| sprintf( | ||
| 'write argument must be a string: (%s) %s', | ||
| gettype($bytes), | ||
| var_export($bytes, true) | ||
| ) | ||
| ); | ||
| return $this->appendStr($bytes); |
There was a problem hiding this comment.
Similar to the constructor, the write method previously had an is_string($bytes) check. Although the parameter is now strictly typed, removing this explicit check might reduce the clarity of error messages if a non-string somehow reaches this point (e.g., via reflection or older PHP versions without strict types).
3634: To review by AI