
add flinksql document parsing for block-level submissions #3314

Draft
Dave Shoup (shouples) wants to merge 3 commits into main from djs/flinksql-parser

@shouples

Summary of Changes

Click-testing instructions

Optional: Any additional details or context that should be provided?

Pull request checklist

Please check if your PR fulfills the following (if applicable):

Tests

  • Added new
  • Updated existing
  • Deleted existing

Release notes

  • Does anything in this PR need to be mentioned in the user-facing CHANGELOG?

Copilot AI review requested due to automatic review settings February 28, 2026 04:37
Copilot AI left a comment

Pull request overview

Adds Flink SQL document parsing so the extension can identify “executable blocks” (e.g., SET/USE statements paired with the next executable statement) and drive block-level “Submit Statement” CodeLens behavior.

Changes:

  • Introduces a Flink SQL parser that splits statements safely (ignoring semicolons in strings/comments) and groups them into executable blocks.
  • Updates the Flink Statements submission command to optionally submit a provided parsed block instead of the whole document.
  • Updates the Flink SQL CodeLens provider to create CodeLenses per executable block and highlight the active block; adds/updates unit tests.
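As a rough illustration of the first bullet, a splitter can walk the text once and skip over quoted strings, `--` line comments, and `/* ... */` block comments before treating a semicolon as a statement terminator. This is a minimal sketch and not the PR's implementation; `splitStatements` is a hypothetical name, a doubled quote such as `''` is handled as one string closing and another opening, an unclosed string or comment simply runs to the end of the input, and a comment-only remainder after the last semicolon is returned as its own entry.

```typescript
// Hypothetical helper (not from the PR): split SQL text on semicolons that
// sit outside quoted strings, `--` line comments, and `/* ... */` comments.
function splitStatements(sql: string): string[] {
  const statements: string[] = [];
  let start = 0; // offset where the current statement begins
  let i = 0;
  while (i < sql.length) {
    const ch = sql[i];
    const next = sql[i + 1];
    if (ch === "'" || ch === '"' || ch === "`") {
      // Jump to just past the matching quote, or to the end if unclosed.
      const close = sql.indexOf(ch, i + 1);
      i = close === -1 ? sql.length : close + 1;
    } else if (ch === "-" && next === "-") {
      // Line comment: skip through the end of the line.
      const eol = sql.indexOf("\n", i);
      i = eol === -1 ? sql.length : eol + 1;
    } else if (ch === "/" && next === "*") {
      // Block comment: skip past the closing `*/`, or to the end if unclosed.
      const close = sql.indexOf("*/", i + 2);
      i = close === -1 ? sql.length : close + 2;
    } else if (ch === ";") {
      const text = sql.slice(start, i + 1).trim();
      if (text.length > 1) statements.push(text); // ignore a bare ";"
      start = i + 1;
      i += 1;
    } else {
      i += 1;
    }
  }
  const rest = sql.slice(start).trim();
  if (rest) statements.push(rest); // trailing text without a semicolon
  return statements;
}
```

For an input like `SET 'a;b' = '1'; -- note; here` followed by `SELECT ';' /* ; */ FROM t;` on the next line, only the two terminating semicolons split the text, so the sketch yields two statements.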

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| src/documentParsing/flinkSql.ts | New Flink SQL statement/block parsing utilities (splitting, classification, grouping, block lookup). |
| src/documentParsing/flinkSql.test.ts | New unit tests for parsing/classification/block grouping and edge cases. |
| src/commands/flinkStatements.ts | Extends submit command to accept a parsed block and submit block text. |
| src/codelens/flinkSqlProvider.ts | Generates CodeLenses per parsed block, caches parsed blocks, and adds editor decorations for current block. |
| src/codelens/flinkSqlProvider.test.ts | Updates tests for new listener count and verifies per-block CodeLens arguments. |

export enum StatementType {
  /** {@see https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/set/} */
  SET = "SET",
  /** {@see https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/use/ */
Copilot AI Feb 28, 2026

The JSDoc {@see ...} tag on the USE enum member is malformed (missing the closing }), which can break doc tooling and IDE rendering. Fix the tag so the comment is valid JSDoc.

Suggested change
- /** {@see https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/use/ */
+ /** {@see https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/use/} */

Copilot uses AI. Check for mistakes.
Comment on lines +81 to +83
} catch (error) {
  logger.error("Error parsing Flink SQL document", error);
  return [];
Copilot AI Feb 28, 2026

In the catch path, this uses logger.error(...) and then returns []. For consistency with the rest of the extension’s error capture (stack traces + context), prefer calling logError(error, ...) with the document URI in extra, while keeping the graceful empty-result fallback.

Comment on lines +116 to +119
const textUpToPosition = content.substring(0, position);
const lines = textUpToPosition.split("\n");
const line = lines.length - 1;
const column = lines[line].length;
Copilot AI Feb 28, 2026

positionAt() is O(n) per call (substring + split) and is invoked for every statement start/end. Since parsing can run frequently (codelens + selection changes), consider using openDocument.positionAt() from getEditorOrFileContents() when available, or precomputing line start offsets once per parse to avoid repeated full-string scans.

Suggested change
-   const textUpToPosition = content.substring(0, position);
-   const lines = textUpToPosition.split("\n");
-   const line = lines.length - 1;
-   const column = lines[line].length;
+   // Clamp position to valid range to avoid out-of-bounds issues.
+   const clampedPosition = Math.max(0, Math.min(position, content.length));
+   let line = 0;
+   let lastLineStart = 0;
+   for (let index = 0; index < clampedPosition; index += 1) {
+     // '\n'
+     if (content.charCodeAt(index) === 10) {
+       line += 1;
+       lastLineStart = index + 1;
+     }
+   }
+   const column = clampedPosition - lastLineStart;
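The other option raised in this comment, precomputing line start offsets once per parse, could look roughly like the sketch below. It is an illustration under assumptions rather than code from the PR: `createPositionAt` is a hypothetical name, and it returns a plain `{ line, column }` object instead of a VS Code `Position`. Building the table is a single pass over the content, and each lookup afterwards is a binary search.

```typescript
// Hypothetical helper (not from the PR): build the line-start table once,
// then answer each offset lookup with a binary search.
function createPositionAt(
  content: string,
): (offset: number) => { line: number; column: number } {
  const lineStarts: number[] = [0];
  for (let i = 0; i < content.length; i += 1) {
    // 10 is the char code for '\n'; the next line starts right after it.
    if (content.charCodeAt(i) === 10) lineStarts.push(i + 1);
  }
  return (offset: number) => {
    // Clamp so out-of-range offsets map to the start or end of the document.
    const clamped = Math.max(0, Math.min(offset, content.length));
    let lo = 0;
    let hi = lineStarts.length - 1;
    // Find the last line whose start offset is <= clamped.
    while (lo < hi) {
      const mid = (lo + hi + 1) >> 1;
      if (lineStarts[mid] <= clamped) lo = mid;
      else hi = mid - 1;
    }
    return { line: lo, column: clamped - lineStarts[lo] };
  };
}

// Usage: create the converter once per parse, then reuse it for every
// statement start and end offset.
const positionAt = createPositionAt("SET a;\nUSE b;\n\nSELECT 1;");
const startOfSelect = positionAt(15); // line 3, column 0
```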

Comment on lines +154 to +162

  return {
    text: statementText,
    type: classifyStatementType(statementText),
    range: new Range(positionAt(content, actualStart), positionAt(content, endPos)),
  };
}

/**
Copilot AI Feb 28, 2026

The Range end for a statement is created from the raw endPos, even though statementText is trimmed. If the document ends with trailing whitespace/newlines (common), this can make block/decoration ranges extend onto blank lines. Consider trimming trailing whitespace for the range end (while still preserving any attached trailing -- comment).

Suggested change
-   return {
-     text: statementText,
-     type: classifyStatementType(statementText),
-     range: new Range(positionAt(content, actualStart), positionAt(content, endPos)),
-   };
- }
- /**
+   const adjustedEnd = getTrimmedStatementEnd(content, startPos, endPos);
+   return {
+     text: statementText,
+     type: classifyStatementType(statementText),
+     range: new Range(positionAt(content, actualStart), positionAt(content, adjustedEnd)),
+   };
+ }
+ /**
+  * Compute the end position for a statement range, trimming trailing whitespace and newlines
+  * while preserving any trailing single-line `--` comment that is attached to the statement.
+  *
+  * @param content - Full document content
+  * @param startPos - Start character position of the statement
+  * @param endPos - Original end character position of the statement
+  * @returns Adjusted end character position for the statement range
+  */
+ function getTrimmedStatementEnd(content: string, startPos: number, endPos: number): number {
+   const slice = content.substring(startPos, endPos);
+   if (!slice.length) {
+     return endPos;
+   }
+   // If there's a trailing single-line comment on the last line, keep the entire comment.
+   const trailingCommentMatch = slice.match(/--[^\n]*$/);
+   if (trailingCommentMatch && typeof trailingCommentMatch.index === "number") {
+     return startPos + trailingCommentMatch.index + trailingCommentMatch[0].length;
+   }
+   // Otherwise, trim trailing whitespace (including newlines) from the slice.
+   const trimmedSliceLength = slice.replace(/\s+$/u, "").length;
+   return startPos + trimmedSliceLength;
+ }
+ /**

Comment on lines +134 to +137
// TODO: handle SET/USE statements through the language client in a follow-up PR
const executableStatements: ParsedStatement[] = block.statements.filter(
  (statement) => classifyStatementType(statement.text) === StatementType.EXECUTABLE,
);
Copilot AI Feb 28, 2026

block.statements already include a type field from the parser, but this code re-classifies each statement from text. Filtering via statement.type === StatementType.EXECUTABLE avoids duplicated logic and prevents any future divergence between parsing-time classification and submission-time classification.
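A minimal sketch of the filter described here, relying on the `type` already attached at parse time. The type shapes are simplified stand-ins inferred from the snippets in this review (the `range` field is omitted, the `"EXECUTABLE"` enum value is an assumption, and `getExecutableStatements` is a hypothetical name), so treat it as an illustration rather than the PR's code.

```typescript
// Simplified stand-ins for the PR's types; the string value of EXECUTABLE
// is an assumption, and the `range` field is left out for brevity.
enum StatementType {
  SET = "SET",
  USE = "USE",
  EXECUTABLE = "EXECUTABLE",
}

interface ParsedStatement {
  text: string;
  type: StatementType;
}

interface ExecutableBlock {
  statements: ParsedStatement[];
}

// Hypothetical helper: reuse the parser's classification instead of
// re-classifying each statement from its text at submission time.
function getExecutableStatements(block: ExecutableBlock): ParsedStatement[] {
  return block.statements.filter(
    (statement) => statement.type === StatementType.EXECUTABLE,
  );
}
```

Keeping classification in one place means a later change to the parser's rules cannot silently disagree with what the submit command filters on.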

Comment on lines +521 to +524

// unclosed string regex fails to match, so the ' is treated as code and the
// semicolon is found normally. the language server handles the syntax error.
assert.ok(blocks.length >= 0);
Copilot AI Feb 28, 2026

This assertion is always true and doesn’t validate behavior, so the test won’t catch regressions. Please assert the expected parsed blocks/statements for the unclosed-string input (e.g., expected block count and statement text).

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +534 to +537
// unclosed comment regex fails to match, so /* is treated as code and both
// semicolons are found. the language server handles the syntax error.
assert.ok(blocks.length >= 1);
});
Copilot AI Feb 28, 2026

This assertion is too weak to validate behavior (>= 1 allows many incorrect outcomes). Please assert the concrete expected parsed blocks/statements for the unclosed multi-line comment input so the test provides real regression coverage.

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +56 to +57
// subtle background coloring for the statement block, to help the user visualize what will be
// sent when they click the "Submit Statement" codelens
Copilot AI Feb 28, 2026

The decoration is described as showing what will be sent on "Submit Statement", but submission currently filters out SET/USE statements from a block. For blocks with config statements, the highlighted range will not match what is actually submitted. Either submit the full block text, or adjust the decoration/UI to reflect that config statements are ignored (or warn the user).

Suggested change
- // subtle background coloring for the statement block, to help the user visualize what will be
- // sent when they click the "Submit Statement" codelens
+ // Subtle background coloring for the parsed statement block, to help the user understand the
+ // block boundaries. Note: configuration statements (e.g. SET/USE) within the highlighted block
+ // are not submitted as part of the "Submit Statement" command.

Comment on lines +42 to +44
// cache of parsed blocks per document URI
private readonly documentBlocksCache = new Map<string, ExecutableBlock[]>();

Copilot AI Feb 28, 2026

documentBlocksCache entries are invalidated on edits but never removed when a document is closed. Over long sessions opening many Flink SQL documents, this Map can grow without bound. Add a workspace.onDidCloseTextDocument handler (or similar) to delete closed documents from the cache.
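One way to picture the eviction is a small cache wrapper keyed by document URI, with a `delete` method meant to be called from the close handler. This is a sketch under assumptions: `ParsedBlockCache`, `DocumentLike`, and `UriLike` are hypothetical stand-ins so the example runs without the `vscode` module, and the PR itself stores a plain `Map<string, ExecutableBlock[]>`.

```typescript
// Minimal stand-ins for the parts of a VS Code TextDocument used here
// (assumptions for this sketch, so it runs outside the extension host).
interface UriLike {
  toString(): string;
}

interface DocumentLike {
  uri: UriLike;
}

// Hypothetical wrapper around the per-URI cache of parsed blocks.
class ParsedBlockCache<T> {
  private readonly entries = new Map<string, T[]>();

  get(document: DocumentLike): T[] | undefined {
    return this.entries.get(document.uri.toString());
  }

  set(document: DocumentLike, blocks: T[]): void {
    this.entries.set(document.uri.toString(), blocks);
  }

  // Intended to be called when the document is edited or closed.
  delete(document: DocumentLike): boolean {
    return this.entries.delete(document.uri.toString());
  }

  get size(): number {
    return this.entries.size;
  }
}
```

In the extension, this could be wired up as `workspace.onDidCloseTextDocument((doc) => cache.delete(doc))`, with the returned disposable registered alongside the provider's other listeners.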

Comment on lines +129 to +133
const uriKey = document.uri.toString();
let blocks = this.documentBlocksCache.get(uriKey);
if (!blocks) {
  blocks = await parseFlinkSqlDocument(document.uri);
  this.documentBlocksCache.set(uriKey, blocks);
Copilot AI Feb 28, 2026

updateBlockDecorations() is async and invoked via void from event handlers. If parsing is slow and selections change quickly, earlier calls can resolve after later ones and overwrite decorations with stale results. Consider a per-document request id / cancellation guard so only the latest invocation updates decorations.
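The per-document request id guard mentioned here might be sketched as follows. `LatestOnly` is a hypothetical helper, not part of the PR: each call records a new id for its key before awaiting, and the result is applied only if no newer call for the same key has started in the meantime.

```typescript
// Hypothetical guard: only the most recent invocation per key may apply
// its result; older invocations that resolve late are silently dropped.
class LatestOnly {
  private readonly latestId = new Map<string, number>();

  async run<T>(
    key: string,
    task: () => Promise<T>,
    apply: (result: T) => void,
  ): Promise<void> {
    // Record this call as the newest one for the key before awaiting.
    const id = (this.latestId.get(key) ?? 0) + 1;
    this.latestId.set(key, id);
    const result = await task();
    // A newer call has started since: this result is stale, so skip it.
    if (this.latestId.get(key) !== id) return;
    apply(result);
  }
}
```

In `updateBlockDecorations()`, the key would presumably be the document URI string, the task the parse (or cache lookup), and `apply` the call that sets the editor decorations. Note that this only discards stale results; it does not cancel the underlying parse.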

@sonarqube-confluent
Quality Gate failed

Failed conditions
76.8% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube

@shouples Dave Shoup (shouples) added the "on hold" label (Blocked until additional functionality is available or more discussions have happened.) Feb 28, 2026