add flinksql document parsing for block-level submissions#3314
add flinksql document parsing for block-level submissions#3314Dave Shoup (shouples) wants to merge 3 commits intomainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds Flink SQL document parsing so the extension can identify “executable blocks” (e.g., SET/USE statements paired with the next executable statement) and drive block-level “Submit Statement” CodeLens behavior.
Changes:
- Introduces a Flink SQL parser that splits statements safely (ignoring semicolons in strings/comments) and groups them into executable blocks.
- Updates the Flink Statements submission command to optionally submit a provided parsed block instead of the whole document.
- Updates the Flink SQL CodeLens provider to create CodeLenses per executable block and highlight the active block; adds/updates unit tests.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| src/documentParsing/flinkSql.ts | New Flink SQL statement/block parsing utilities (splitting, classification, grouping, block lookup). |
| src/documentParsing/flinkSql.test.ts | New unit tests for parsing/classification/block grouping and edge cases. |
| src/commands/flinkStatements.ts | Extends submit command to accept a parsed block and submit block text. |
| src/codelens/flinkSqlProvider.ts | Generates CodeLenses per parsed block, caches parsed blocks, and adds editor decorations for current block. |
| src/codelens/flinkSqlProvider.test.ts | Updates tests for new listener count and verifies per-block CodeLens arguments. |
| export enum StatementType { | ||
| /** {@see https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/set/} */ | ||
| SET = "SET", | ||
| /** {@see https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/use/ */ |
There was a problem hiding this comment.
The JSDoc {@see ...} tag on the USE enum member is malformed (missing the closing }), which can break doc tooling and IDE rendering. Fix the tag so the comment is valid JSDoc.
| /** {@see https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/use/ */ | |
| /** {@see https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/use/} */ |
| } catch (error) { | ||
| logger.error("Error parsing Flink SQL document", error); | ||
| return []; |
There was a problem hiding this comment.
In the catch path, this uses logger.error(...) and then returns []. For consistency with the rest of the extension’s error capture (stack traces + context), prefer calling logError(error, ...) with the document URI in extra, while keeping the graceful empty-result fallback.
| const textUpToPosition = content.substring(0, position); | ||
| const lines = textUpToPosition.split("\n"); | ||
| const line = lines.length - 1; | ||
| const column = lines[line].length; |
There was a problem hiding this comment.
positionAt() is O(n) per call (substring + split) and is invoked for every statement start/end. Since parsing can run frequently (codelens + selection changes), consider using openDocument.positionAt() from getEditorOrFileContents() when available, or precomputing line start offsets once per parse to avoid repeated full-string scans.
| const textUpToPosition = content.substring(0, position); | |
| const lines = textUpToPosition.split("\n"); | |
| const line = lines.length - 1; | |
| const column = lines[line].length; | |
| // Clamp position to valid range to avoid out-of-bounds issues. | |
| const clampedPosition = Math.max(0, Math.min(position, content.length)); | |
| let line = 0; | |
| let lastLineStart = 0; | |
| for (let index = 0; index < clampedPosition; index += 1) { | |
| // '\n' | |
| if (content.charCodeAt(index) === 10) { | |
| line += 1; | |
| lastLineStart = index + 1; | |
| } | |
| } | |
| const column = clampedPosition - lastLineStart; |
|
|
||
| return { | ||
| text: statementText, | ||
| type: classifyStatementType(statementText), | ||
| range: new Range(positionAt(content, actualStart), positionAt(content, endPos)), | ||
| }; | ||
| } | ||
|
|
||
| /** |
There was a problem hiding this comment.
The Range end for a statement is created from the raw endPos, even though statementText is trimmed. If the document ends with trailing whitespace/newlines (common), this can make block/decoration ranges extend onto blank lines. Consider trimming trailing whitespace for the range end (while still preserving any attached trailing -- comment).
| return { | |
| text: statementText, | |
| type: classifyStatementType(statementText), | |
| range: new Range(positionAt(content, actualStart), positionAt(content, endPos)), | |
| }; | |
| } | |
| /** | |
| const adjustedEnd = getTrimmedStatementEnd(content, startPos, endPos); | |
| return { | |
| text: statementText, | |
| type: classifyStatementType(statementText), | |
| range: new Range(positionAt(content, actualStart), positionAt(content, adjustedEnd)), | |
| }; | |
| } | |
| /** | |
| * Compute the end position for a statement range, trimming trailing whitespace and newlines | |
| * while preserving any trailing single-line `--` comment that is attached to the statement. | |
| * | |
| * @param content - Full document content | |
| * @param startPos - Start character position of the statement | |
| * @param endPos - Original end character position of the statement | |
| * @returns Adjusted end character position for the statement range | |
| */ | |
| function getTrimmedStatementEnd(content: string, startPos: number, endPos: number): number { | |
| const slice = content.substring(startPos, endPos); | |
| if (!slice.length) { | |
| return endPos; | |
| } | |
| // If there's a trailing single-line comment on the last line, keep the entire comment. | |
| const trailingCommentMatch = slice.match(/--[^\n]*$/); | |
| if (trailingCommentMatch && typeof trailingCommentMatch.index === "number") { | |
| return startPos + trailingCommentMatch.index + trailingCommentMatch[0].length; | |
| } | |
| // Otherwise, trim trailing whitespace (including newlines) from the slice. | |
| const trimmedSliceLength = slice.replace(/\s+$/u, "").length; | |
| return startPos + trimmedSliceLength; | |
| } | |
| /** |
| // TODO: handle SET/USE statements through the language client in a follow-up PR | ||
| const executableStatements: ParsedStatement[] = block.statements.filter( | ||
| (statement) => classifyStatementType(statement.text) === StatementType.EXECUTABLE, | ||
| ); |
There was a problem hiding this comment.
block.statements already include a type field from the parser, but this code re-classifies each statement from text. Filtering via statement.type === StatementType.EXECUTABLE avoids duplicated logic and prevents any future divergence between parsing-time classification and submission-time classification.
|
|
||
| // unclosed string regex fails to match, so the ' is treated as code and the | ||
| // semicolon is found normally. the language server handles the syntax error. | ||
| assert.ok(blocks.length >= 0); |
There was a problem hiding this comment.
This assertion is always true and doesn’t validate behavior, so the test won’t catch regressions. Please assert the expected parsed blocks/statements for the unclosed-string input (e.g., expected block count and statement text).
| // unclosed comment regex fails to match, so /* is treated as code and both | ||
| // semicolons are found. the language server handles the syntax error. | ||
| assert.ok(blocks.length >= 1); | ||
| }); |
There was a problem hiding this comment.
This assertion is too weak to validate behavior (>= 1 allows many incorrect outcomes). Please assert the concrete expected parsed blocks/statements for the unclosed multi-line comment input so the test provides real regression coverage.
| // subtle background coloring for the statement block, to help the user visualize what will be | ||
| // sent when they click the "Submit Statement" codelens |
There was a problem hiding this comment.
The decoration is described as showing what will be sent on "Submit Statement", but submission currently filters out SET/USE statements from a block. For blocks with config statements, the highlighted range will not match what is actually submitted. Either submit the full block text, or adjust the decoration/UI to reflect that config statements are ignored (or warn the user).
| // subtle background coloring for the statement block, to help the user visualize what will be | |
| // sent when they click the "Submit Statement" codelens | |
| // Subtle background coloring for the parsed statement block, to help the user understand the | |
| // block boundaries. Note: configuration statements (e.g. SET/USE) within the highlighted block | |
| // are not submitted as part of the "Submit Statement" command. |
| // cache of parsed blocks per document URI | ||
| private readonly documentBlocksCache = new Map<string, ExecutableBlock[]>(); | ||
|
|
There was a problem hiding this comment.
documentBlocksCache entries are invalidated on edits but never removed when a document is closed. Over long sessions opening many Flink SQL documents, this Map can grow without bound. Add a workspace.onDidCloseTextDocument handler (or similar) to delete closed documents from the cache.
| const uriKey = document.uri.toString(); | ||
| let blocks = this.documentBlocksCache.get(uriKey); | ||
| if (!blocks) { | ||
| blocks = await parseFlinkSqlDocument(document.uri); | ||
| this.documentBlocksCache.set(uriKey, blocks); |
There was a problem hiding this comment.
updateBlockDecorations() is async and invoked via void from event handlers. If parsing is slow and selections change quickly, earlier calls can resolve after later ones and overwrite decorations with stale results. Consider a per-document request id / cancellation guard so only the latest invocation updates decorations.
|


Summary of Changes
Click-testing instructions
Optional: Any additional details or context that should be provided?
Pull request checklist
Please check if your PR fulfills the following (if applicable):
Tests
Release notes