@shivendra-dev54
Contributor

Fixes #757

Reason for this PR

Spark and Java tests currently rely on the GAR_TEST_DATA environment variable.
This PR adds fallback logic to locate the testing directory from the project
root or its parent when the environment variable is not set, avoiding test
failures in local environments.

What changes are included in this PR?

The following test files were updated to add fallback test-data resolution logic:

  1. maven-projects/java/src/test/java/org/apache/graphar/graphinfo/GraphInfoTest.java
  2. maven-projects/spark/graphar/src/test/scala/org/apache/graphar/BaseTestSuite.scala

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@codecov-commenter

codecov-commenter commented Jan 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.17%. Comparing base (23c46c2) to head (204e3aa).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #824      +/-   ##
============================================
- Coverage     77.18%   77.17%   -0.02%     
  Complexity      607      607              
============================================
  Files            85       84       -1     
  Lines          8858     8854       -4     
  Branches       1043     1043              
============================================
- Hits           6837     6833       -4     
  Misses         1781     1781              
  Partials        240      240              
Flag Coverage Δ
pyspark 95.11% <ø> (ø)
spark 81.32% <ø> (ø)

@shivendra-dev54
Contributor Author

@yangxk1
@Thespica
I have added the logic described in issue #757.
Also, why is the Python pipeline failing?

@yangxk1
Contributor

yangxk1 commented Jan 11, 2026

Hi @shivendra-dev54, the Python SDK is built with pybind11 and depends on the C++ library.

@shivendra-dev54
Contributor Author

shivendra-dev54 commented Jan 11, 2026

@yangxk1 Thanks for the clarification.
Just to confirm: do the current changes meet the requirements, or is anything else needed from my side?
I added the logic where it was missing. Am I missing something?

@yangxk1
Contributor

yangxk1 commented Jan 11, 2026

Let me look into why this is happening. Please wait a moment.

Copilot AI left a comment

Pull request overview

This PR adds fallback logic for locating test data in Spark and Java tests when the GAR_TEST_DATA environment variable is not set. The implementation checks the environment variable first, then a system property, and finally searches for the testing directory in common relative paths.

Changes:

  • Added resolveTestData() method in BaseTestSuite.scala to locate test data with fallback logic
  • Added resolveTestData() method in GraphInfoTest.java to locate test data with fallback logic
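
The lookup order described in the overview (environment variable, then system property, then relative candidates) can be sketched in Java roughly as follows. This is a minimal illustration, not the code merged in this PR: the class name TestDataLocator, the resolve signature, and the choice to pass the environment and property values in as arguments are assumptions made so the order can be exercised in isolation. The marker path, the gar.test.data property name, and the candidate directories are taken from the snippets quoted in this review.

```java
import java.io.File;

/** Hypothetical helper; names are illustrative and not part of the PR. */
final class TestDataLocator {
    // Marker file quoted in the review comments.
    static final String MARKER = "ldbc_sample/csv/ldbc_sample.graph.yml";
    // Relative candidates quoted from the Scala snippet under review.
    static final String[] CANDIDATES = {"../../testing", "../testing", "testing"};

    private TestDataLocator() {}

    /**
     * Resolution order: explicit environment value, then system property value,
     * then the first candidate under baseDir that contains the marker file.
     */
    static String resolve(String envValue, String propValue, File baseDir) {
        if (envValue != null) {
            return envValue;
        }
        if (propValue != null) {
            return propValue;
        }
        for (String candidate : CANDIDATES) {
            File dir = new File(baseDir, candidate);
            if (new File(dir, MARKER).isFile()) {
                return dir.getAbsolutePath();
            }
        }
        throw new IllegalStateException("GAR_TEST_DATA not found");
    }

    /** Convenience overload that reads the real environment and working directory. */
    static String resolve() {
        return resolve(
                System.getenv("GAR_TEST_DATA"),
                System.getProperty("gar.test.data"),
                new File(System.getProperty("user.dir")));
    }
}
```

Passing the values in, rather than reading them inside the method, is only a convenience for testing the order; like the snippets under review, this sketch does not validate a path supplied through the environment variable or the system property.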

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File | Description
maven-projects/spark/graphar/src/test/scala/org/apache/graphar/BaseTestSuite.scala | Replaces direct environment variable check with a new resolveTestData() method that tries environment variable, system property, and relative path candidates
maven-projects/java/src/test/java/org/apache/graphar/graphinfo/GraphInfoTest.java | Adds a static resolveTestData() method that implements the same fallback logic for Java tests

return dir.getAbsolutePath();
}
}
throw new IllegalStateException("GAR_TEST_DATA not found");

Copilot AI Jan 11, 2026

The exception type differs from the one used in TestUtil.java, which uses RuntimeException. Consider using RuntimeException instead of IllegalStateException for consistency across the codebase, or document why a different exception type is appropriate here.

Suggested change
- throw new IllegalStateException("GAR_TEST_DATA not found");
+ throw new RuntimeException("GAR_TEST_DATA not found");

Copilot uses AI. Check for mistakes.
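
For context on the exception-type suggestion: IllegalStateException and IllegalArgumentException are both unchecked subclasses of RuntimeException, so the proposed swap changes how specific the thrown type is, not whether a caller's catch (RuntimeException e) sees it. The small Java sketch below illustrates that point only; the class and method names are hypothetical and do not appear in the PR or in TestUtil.java.

```java
/** Hypothetical demo class; not part of the PR. */
final class ExceptionChoiceDemo {
    private ExceptionChoiceDemo() {}

    /** Returns true if running the action fails with something caught as RuntimeException. */
    static boolean caughtAsRuntimeException(Runnable action) {
        try {
            action.run();
            return false;
        } catch (RuntimeException e) {
            return true;
        }
    }

    static void failWithIllegalState() {
        throw new IllegalStateException("GAR_TEST_DATA not found");
    }

    static void failWithIllegalArgument() {
        throw new IllegalArgumentException("GAR_TEST_DATA not found");
    }

    static void failWithRuntime() {
        throw new RuntimeException("GAR_TEST_DATA not found");
    }
}
```

All three failures are handled by the same catch block, so the choice between them is mainly about consistency and how much the type should say about the cause.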
)
.map(_.getAbsolutePath)
.getOrElse(
throw new IllegalArgumentException("GAR_TEST_DATA not found")

Copilot AI Jan 11, 2026

The exception type differs from the one used in TestUtil.java, which uses RuntimeException. Consider using RuntimeException instead of IllegalArgumentException for consistency across the codebase, or document why a different exception type is appropriate here.

Suggested change
- throw new IllegalArgumentException("GAR_TEST_DATA not found")
+ throw new RuntimeException("GAR_TEST_DATA not found")

Comment on lines 45 to 50
return dir.getAbsolutePath();
}
}
throw new IllegalStateException("GAR_TEST_DATA not found");
}
return path;

Copilot AI Jan 11, 2026

The validation approach differs from TestUtil.java which performs additional verification after finding a path. TestUtil.java re-validates the chosen path to ensure both the directory and the marker file exist. This double-checking approach is more robust and catches edge cases. Consider adding a final validation step after selecting a path to ensure it's valid before returning it.

Suggested change
-             return dir.getAbsolutePath();
-         }
-     }
-     throw new IllegalStateException("GAR_TEST_DATA not found");
- }
- return path;
+             path = dir.getAbsolutePath();
+             break;
+         }
+     }
+     if (path == null) {
+         throw new IllegalStateException("GAR_TEST_DATA not found");
+     }
+ }
+ java.io.File baseDir = new java.io.File(path).getAbsoluteFile();
+ java.io.File markerFile =
+         new java.io.File(baseDir, "ldbc_sample/csv/ldbc_sample.graph.yml");
+ if (!baseDir.isDirectory() || !markerFile.exists()) {
+     throw new IllegalStateException("GAR_TEST_DATA not found");
+ }
+ return baseDir.getAbsolutePath();

Comment on lines 32 to 50
+ def resolveTestData(): String = {
+   Option(System.getenv("GAR_TEST_DATA"))
+     .orElse(Option(System.getProperty("gar.test.data")))
+     .getOrElse {
+       val candidates = Seq("../../testing", "../testing", "testing")
+       candidates
+         .map(p => new java.io.File(p).getAbsoluteFile)
+         .find(d =>
+           new java.io.File(d, "ldbc_sample/csv/ldbc_sample.graph.yml")
+             .exists()
+         )
+         .map(_.getAbsolutePath)
+         .getOrElse(
+           throw new IllegalArgumentException("GAR_TEST_DATA not found")
+         )
+     }
+ }
- testData = System.getenv("GAR_TEST_DATA")

+ testData = resolveTestData()

Copilot AI Jan 11, 2026

The validation approach differs from TestUtil.java which performs additional verification after finding a path. TestUtil.java re-validates the chosen path to ensure both the directory and the marker file exist. This double-checking approach is more robust and catches edge cases. Consider adding a final validation step after selecting a path to ensure it's valid before returning it.

Contributor

@shivendra-dev54 Please consider this comment; it looks good.

Comment on lines 44 to 45
if (new java.io.File(dir, "ldbc_sample/csv/ldbc_sample.graph.yml").exists()) {
return dir.getAbsolutePath();

Copilot AI Jan 11, 2026

The validation logic is inconsistent with the existing implementation in TestUtil.java. The current code only checks if the file exists, but TestUtil.java also verifies that the directory exists and is actually a directory. This could lead to false positives if the parent path doesn't exist. Consider adding an additional check to verify that the parent directory exists and is a directory before checking for the specific file.

Suggested change
- if (new java.io.File(dir, "ldbc_sample/csv/ldbc_sample.graph.yml").exists()) {
-     return dir.getAbsolutePath();
+ if (dir.exists() && dir.isDirectory()) {
+     java.io.File graphFile =
+             new java.io.File(dir, "ldbc_sample/csv/ldbc_sample.graph.yml");
+     if (graphFile.exists()) {
+         return dir.getAbsolutePath();
+     }

Comment on lines 40 to 41
new java.io.File(d, "ldbc_sample/csv/ldbc_sample.graph.yml")
.exists()

Copilot AI Jan 11, 2026

The validation logic is inconsistent with the existing implementation in TestUtil.java. The Scala code only checks if the file exists, but TestUtil.java also verifies that the directory exists and is actually a directory. This could lead to false positives if the parent path doesn't exist. Consider adding an additional check to verify that the parent directory exists and is a directory before checking for the specific file.

Suggested change
- new java.io.File(d, "ldbc_sample/csv/ldbc_sample.graph.yml")
-   .exists()
+ d.exists() && d.isDirectory &&
+ new java.io.File(d, "ldbc_sample/csv/ldbc_sample.graph.yml")
+   .isFile

)
.map(_.getAbsolutePath)
.getOrElse(
throw new IllegalArgumentException("GAR_TEST_DATA not found")

Copilot AI Jan 11, 2026

The error message is less informative than the one used in TestUtil.java. TestUtil.java provides a more detailed message that explains both the environment variable option and the directory structure requirement. Consider updating the message to match the clarity of TestUtil.java's error message which states: "GAR_TEST_DATA not found or invalid. Please set GAR_TEST_DATA environment variable to point to the testing directory or ensure the testing directory exists with ldbc_sample/csv/ldbc_sample.graph.yml".

Suggested change
- throw new IllegalArgumentException("GAR_TEST_DATA not found")
+ throw new IllegalArgumentException(
+   "GAR_TEST_DATA not found or invalid. Please set GAR_TEST_DATA environment variable to point to the testing directory or ensure the testing directory exists with ldbc_sample/csv/ldbc_sample.graph.yml"
+ )

Contributor

@shivendra-dev54, please consider adopting this suggestion; it makes the error message clearer.

@yangxk1
Contributor

yangxk1 commented Jan 11, 2026

@shivendra-dev54 This problem seems to be solved.
In addition, I am sorry that my earlier conclusion was incorrect. The failure that appeared was in the pyspark module, not the python module. pyspark depends on the spark module, so changes to spark affect pyspark.

@shivendra-dev54
Contributor Author

@yangxk1
OK, thank you for the help ✨
Is there anything else you want me to do here, or is this logic enough?
That is, does this solve issue #757?

@shivendra-dev54
Contributor Author

@yangxk1
I have updated the code with the changes you suggested

@yangxk1
Contributor

yangxk1 commented Jan 12, 2026

Great! In the java/ module, a better approach would be to create a public method, even though it would only be used in one place. This is not a key issue, though.
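
One possible reading of the public-method idea, sketched in Java under stated assumptions: the check lives in a small utility with a public static method that any test class could call. The class name TestDataPaths and the method name requireValid are hypothetical and are not taken from the PR or from TestUtil.java; the marker path and the error message are the ones quoted in the review comments above.

```java
import java.io.File;

/** Hypothetical shared test utility; names are illustrative only. */
final class TestDataPaths {
    // Marker file quoted in the review comments.
    static final String MARKER = "ldbc_sample/csv/ldbc_sample.graph.yml";

    private TestDataPaths() {}

    /**
     * Validates that path is a directory containing the marker file and
     * returns its absolute form; throws IllegalStateException otherwise.
     */
    public static String requireValid(String path) {
        File baseDir = new File(path).getAbsoluteFile();
        File markerFile = new File(baseDir, MARKER);
        if (!baseDir.isDirectory() || !markerFile.isFile()) {
            throw new IllegalStateException(
                    "GAR_TEST_DATA not found or invalid. Please set GAR_TEST_DATA "
                            + "environment variable to point to the testing directory or "
                            + "ensure the testing directory exists with " + MARKER);
        }
        return baseDir.getAbsolutePath();
    }
}
```

A test class could then call TestDataPaths.requireValid(...) on whichever path it resolved, so the validation and its error message live in one place.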

Contributor

@yangxk1 yangxk1 left a comment

LGTM, thank you~ @shivendra-dev54

@yangxk1 yangxk1 merged commit ccf9a65 into apache:main Jan 12, 2026
5 checks passed
@yangxk1
Contributor

yangxk1 commented Jan 12, 2026

Oh! I forgot to check the PR title. Please remind me next time.

@shivendra-dev54
Contributor Author

@yangxk1
I will.
Is there a format you can share with me?

@yangxk1
Contributor

yangxk1 commented Jan 12, 2026

See https://github.com/apache/incubator-graphar/blob/main/CONTRIBUTING.md#title for the format. This will make it easier to classify PRs.

Development

Successfully merging this pull request may close these issues.

[maven-project] get test data by default path, rather than env variable

3 participants