Skip to content

PLUGIN-1936: Read data in a streaming manner#131

Open
sgarg-CS wants to merge 5 commits intodata-integrations:developfrom
cloudsufi:feature/PLUGIN-1936
Open

PLUGIN-1936: Read data in a streaming manner#131
sgarg-CS wants to merge 5 commits intodata-integrations:developfrom
cloudsufi:feature/PLUGIN-1936

Conversation

@sgarg-CS
Copy link
Contributor

@sgarg-CS sgarg-CS commented Nov 10, 2025

PLUGIN-1936

Issue : OutOfMemory (OOM) Errors seen in the ServiceNow Connector frequently when transfering large number of records

RCA: The entire response is converted into string and the JSON is parsed in memory leading to OOM issues.

Proposed Fix: Read the records incrementally i.e. one by one in a streaming manner.

Changes :

  • Refactored the existing method: prepareResponse :in the RESTAPIResponse class to return a reusable InputStream in the RESTAPIResponse object.
  • Added openNextPage to initialise the Gson's JsonReader and read the result array in the response and closeCurrentPage to close the JsonReader
  • nextKeyValue() : Refactored the existing logic to stream one record at a time.
  • Changed the return type of the following methods from Map<String, String> to RESTAPIResponse
    • fetchData()
    • fetchTableRecordsRetryableMode()
    • fetchTableRecords()

Testing
image

@sgarg-CS sgarg-CS marked this pull request as ready for review November 25, 2025 04:20
}

/**
* Check whether the result is empty or not.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also explain how are we checking is result is empty

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It specifically looks for a top-level key named result. Once found, it opens the associated array and checks for the presence of a first element. If it does contain at least one element, then it return false, otherwise true.

Updated the method comment. ba5eb53

return GSON.fromJson(ja, type);
}

public List<Map<String, String>> parseResponseToResultListOfMap(InputStream in) throws ServiceNowAPIException {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we creating a List of map? Instead we can directly read it as a JsonObject, something like:

gson.fromJson(reader, ServiceNowRecordObject.class);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored the code to parse the the RestAPIResponse into result. ba5eb53

APIResponse apiResponse = GSON.fromJson(new JsonReader(new InputStreamReader(in, StandardCharsets.UTF_8)), APIResponse.class);
return apiResponse.getResult();

The calling method : serviceNowTableAPIClient.parseResponseToResultListOfMap in ServiceNowConnector.getTableData expects a result of type: List<Map<String, String>>

Can you check the APIResponse class once? When you say ServiceNowRecordObject class, do you expect that class to hold the result array returned by the ServiceNow Table REST API ? Please clarify.

String startDate, String endDate, int offset,
int limit) throws ServiceNowAPIException {
final List<Map<String, String>> results = new ArrayList<>();
//final List<Map<String, String>> results = new ArrayList<>();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed ba5eb53

final RestAPIResponse[] restAPIResponse = new RestAPIResponse[1];
Callable<Boolean> fetchRecords = () -> {
results.addAll(fetchTableRecords(tableName, valueType, startDate, endDate, offset, limit));
// results.addAll(fetchTableRecords(tableName, valueType, startDate, endDate, offset, limit));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed ba5eb53

Callable<Boolean> fetchRecords = () -> {
results.addAll(fetchTableRecords(tableName, valueType, startDate, endDate, offset, limit));
// results.addAll(fetchTableRecords(tableName, valueType, startDate, endDate, offset, limit));
restAPIResponse[0] = fetchTableRecords(tableName, valueType, startDate, endDate, offset, limit);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please explain what are we trying to do here? why restAPIResponse is an array

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Java, when we use a variable from an outer scope inside a lambda expression or a Callable, that variable must be effectively final. This means its reference cannot change after it is initialised. so, I was creating an array earlier for RestAPIResponse and returning only the first element. I've now replaced it with AtomicReference. ba5eb53

private final ServiceNowMultiSourceConfig multiSourcePluginConf;
private ServiceNowTableAPIClientImpl restApi;
private final Gson gson = new Gson();
private final Type mapType = new TypeToken<Map<String, String>>() { }.getType();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should use an object instead of map

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to replace TypeToken<Map<String, String>>() with TypeToken<Object>() here? Please confirm.

return false;
}
this.jsonReader = new JsonReader(new InputStreamReader(in, StandardCharsets.UTF_8));
this.jsonReader.setLenient(true);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, the ServiceNow Table REST API provides standard-compliant JSON that does not strictly require jsonreader.setlenient(true) for basic parsing. However, there are specific scenarios where enabling leniency becomes necessary or is a recommended safeguard.

If the API call fails (e.g., 401 Unauthorised, 404 Not Found, or a ServiceNow "maintenance" page), the server might return an HTML body instead of JSON. A strict parser will fail immediately but a lenient one may provide a more readable error or allow you to handle the string manually.

JsonToken top;
try {
top = jsonReader.peek();
LOG.info("Peeking JSON token for table {}: {}", tableName, top);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove, it should be a debug log

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Changed it to DEBUG ba5eb53

}

public void closeCurrentPage() {
LOG.info("Closing current page for table {}", tableName);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, had added this logger during dev testing to troubleshoot an issue. Removed it now. ba5eb53

return restAPIResponse;
}

private boolean openNextPage() throws IOException, ServiceNowAPIException {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks redundant, as same function is present in src/main/java/io/cdap/plugin/servicenow/source/ServiceNowMultiRecordReader.java

can we abstract these out in a common file. Same comment for nextKeyValue()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the implementation of nextKey(), openNextPage() and closeCurrentPage() to the ServiceNowBaseRecordReader class ba5eb53

}

/**
* Determines if the "result" array in a ServiceNow REST API response is empty. It specifically looks for a top-level

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please format it as per java doc.
/** One line summary
*

  • Additional details

  • params
    *@return
    */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatted 6115173

int limit) throws ServiceNowAPIException {
final List<Map<String, String>> results = new ArrayList<>();
// Using AtomicReference to capture a value inside a lambda that needs to be accessed outside.
AtomicReference<RestAPIResponse> responseRef = new AtomicReference<>();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like a overkill, it's better to abstract the logic in a function instead of lambda to simplify the code here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored the code by separating out the logic in a private method. PTAL. 6115173

String tableName) {
List<Map<String, String>> result = parseResponseToResultListOfMap(restAPIResponse.getResponseBody());
String tableName) throws ServiceNowAPIException {
List<Map<String, String>> result = parseResponseToResultListOfMap(restAPIResponse.getBodyAsStream());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a Map, we should use well defined Java Objects.
Since we are already refactoring the code we can always change the caller functions, I don't see that as a blocker

List<Map<String, String>> result = parseResponseToResultListOfMap(restAPIResponse.getResponseBody());
String tableName) throws ServiceNowAPIException {
List<Map<String, String>> result = parseResponseToResultListOfMap(restAPIResponse.getBodyAsStream());
// List<Map<String, String>> result = parseResponseToResultListOfMap(restAPIResponse.getResponseBody());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can still see the comment

return GSON.fromJson(ja, type);
}

public List<Map<String, String>> parseResponseToResultListOfMap(InputStream in) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use well defined Objects here instead of Map

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it's not clear to me what's the ask here. Can you please elaborate more what you mean by well defined objects ? Are you looking for a POJO class here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand it correctly this Map is used to store the response of the API, if yes then response can be represented by a POJO class instead of a generic map

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is used to store the ServiceNow API Response. Can we use this POJO class ?

inputStream, MAX_PAGE_BYTES + 1); // +1 to detect overflow
responseBody = IOUtils.toByteArray(boundedInputStream);
LOG.info("RAW JSON: {}", new String(responseBody, StandardCharsets.UTF_8));
if (responseBody.length > MAX_PAGE_BYTES) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loading the complete page defeats the whole purpose of this change altogether. We are making this change to avoid OOM issues when we load the entire page into memory and loading it into a bounded buffer is not solving that problem.

Regarding multiple access of the input stream: We can always refactor out code(validation etc.) such that all the processing happens record by record instead of the whole page.
Basically when the connector request for a record we can do all the processing/validation is required, instead of doing it beforehand. This would solve the problem of multiple access of InputStream.


JsonObject responseJSON = jsonParser.parse(apiResponse.getResponseBody()).getAsJsonObject();
JsonObject responseJSON = jsonParser.parse(
new InputStreamReader(apiResponse.getBodyAsStream(), StandardCharsets.UTF_8)).getAsJsonObject();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we switching to stream for POST methods also? Can't we restrict it to read/GET operations only?

}

/**
* The refactored nextKeyValue() — uses Gson JsonReader to stream one record at a time.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rephrase the comment here, refactored doesn't provide any context to the reader, the purpose of comment is not to provide the history of the change but explain the logic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrased. PTAL. 6115173

closeCurrentPage();
LOG.debug("Fetching data for table {} at offset {}", tableName, split.getOffset());
RestAPIResponse resp = fetchData();
LOG.debug("Fetched data for table {} at offset {}", tableName, split.getOffset());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need these many debug logs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, these were more for dev-testing. I can remove them. 4b89ca5

@sgarg-CS sgarg-CS force-pushed the feature/PLUGIN-1936 branch 5 times, most recently from 5a062f7 to 47c753b Compare February 16, 2026 14:31
@sgarg-CS sgarg-CS force-pushed the feature/PLUGIN-1936 branch from 26c0bb1 to c24f96b Compare February 16, 2026 14:41
Copy link
Contributor

@itsankit-google itsankit-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the compilation failure in cdap-e2e-tests:

Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile (default-testCompile) on project servicenow-plugins: Compilation failure
[ERROR] /tmp/servicenow-plugins/servicenow-plugins/plugin/src/e2e-test/java/io/cdap/plugin/servicenowsink/actions/ServiceNowSinkPropertiesPageActions.java:[66,78] incompatible types: com.google.gson.JsonObject cannot be converted to java.util.Map<java.lang.String,java.lang.String>

@itsankit-google itsankit-google self-requested a review February 16, 2026 16:07
Copy link
Contributor

@itsankit-google itsankit-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return GSON.fromJson(ja, type);
}

public List<JsonObject> parseResponseToResultListOfMap(InputStream in) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is returning List<JsonObject> but the name of method is parseResponseToResultListOfMap, please fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed the method accordingly. 016cdfb

pom.xml Outdated
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.5</version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why using 2.5 when newer versions are available?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unused now. Removed it. 016cdfb

pom.xml Outdated
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please move the version to properties section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants