feat: add tableau-extract-calcs script and tableau-upstream slash command #3473

Closed
GabyRangelB wants to merge 14 commits into main from claude/feat/tableau-upstream-tool

Conversation

@GabyRangelB
Contributor

Summary

  • Adds scripts/tableau-extract-calcs.py to extract calculated fields from Tableau workbooks (local .twbx/.twb or downloaded from Tableau Server). Supports exposure-aware mode that resolves the workbook LSID and depends_on models directly from models/exposures/tableau.yml.
  • Adds .claude/commands/tableau-upstream.md — a /tableau-upstream slash command that guides Claude through the full upstream workflow: extract calcs → identify target rpt_tableau__ model → translate Tableau formula syntax to BigQuery SQL → update model SQL and YAML → validate with dbt show.

Closes #3472

Test plan

  • Run uv run scripts/tableau-extract-calcs.py --exposure gradebook_and_gpa_dashboard --list-only and confirm it prints the workbook LSID and depends_on models from tableau.yml without making network calls
  • Run the script against a local .twbx file and confirm the markdown table of calculated fields is printed correctly
  • Run /tableau-upstream in a new Claude session and confirm the guided workflow prompts correctly at each step

🤖 Generated with Claude Code

…mand

Adds two artifacts to support moving Tableau calculated fields upstream
into dbt models:

- scripts/tableau-extract-calcs.py: extracts calculated fields from a
  .twbx/.twb file (local or downloaded from Tableau Server). Supports
  exposure-aware mode that looks up the workbook LSID and depends_on
  models directly from models/exposures/tableau.yml.

- .claude/commands/tableau-upstream.md: /tableau-upstream slash command
  that guides Claude through the full workflow — run the extraction
  script, identify the target rpt_tableau__ model, translate Tableau
  formula syntax to BigQuery SQL, update the model SQL and YAML, and
  validate with dbt show.

Closes #3472

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@GabyRangelB GabyRangelB requested a review from a team as a code owner March 17, 2026 23:10
@github-actions
Contributor

github-actions bot commented Mar 17, 2026

Claude finished @GabyRangelB's task in 1m 10s


Code Review: PR #3473 — tableau-extract-calcs + tableau-upstream command

Tasks

  • Gather context and read changed files
  • Review scripts/tableau-extract-calcs.py
  • Review .claude/commands/tableau-upstream.md
  • Post final review

scripts/tableau-extract-calcs.py

Overall this is clean, well-structured, and follows the project's uv inline-script convention correctly. A few issues worth addressing:

Bug: extract_twb_bytes_from_bytes defined after it is called

extract_twb_bytes_from_bytes is called at lines 256 and 272 but is defined at line 278 — after main(). In Python this only works if main() is called at runtime (which it is via if __name__ == "__main__"), so it won't cause a NameError at import time. But it's confusing ordering. Move the function up near extract_twb_bytes (around line 32) so all XML-extraction helpers are co-located.

Unhandled StopIteration in zip traversal

# lines 30-31 and 281
twb_name = next(n for n in zf.namelist() if n.endswith(".twb"))

If the zip contains no .twb entry this raises a bare StopIteration. Should be:

twb_name = next((n for n in zf.namelist() if n.endswith(".twb")), None)
if twb_name is None:
    raise ValueError(f"No .twb file found in archive: {path}")

No signout call after Tableau Server auth

signin() acquires a server token but there is no corresponding sign-out call (POST /auth/signout in the Tableau REST API). For an admin-level personal access token this may not matter, but for a username/password session it leaves an active session on the server. Consider adding a signout() helper and calling it in a try/finally block.
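A minimal sketch of the try/finally shape, with the download and sign-out steps injected as callables (the names here are illustrative, not the script's actual helpers):

```python
def run_with_signout(download, signout):
    """Run a download step, always invalidating the session afterward.

    `download` and `signout` are placeholder callables; in the real script,
    signout would POST to /api/<version>/auth/signout with the X-Tableau-Auth
    header before the token is discarded.
    """
    try:
        return download()
    finally:
        # Runs on success and on error, so no server session is left open.
        signout()
```

Injecting the two steps keeps the cleanup guarantee testable without a live server.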

stream=True on download but resp.content used

Line 89–91:

resp = requests.get(url, ..., stream=True, timeout=60)
resp.raise_for_status()
return resp.content

stream=True defers body loading, but resp.content then loads the entire response into memory anyway. Either use stream=True with chunked reading (appropriate for very large workbooks), or drop stream=True and keep resp.content. For typical .twbx sizes, dropping stream=True is simpler.
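If large workbooks ever make chunked reading worthwhile, the accumulation side can be sketched independently of requests; with requests you would feed it `resp.iter_content(chunk_size=...)` as the chunk iterable:

```python
def collect_chunks(chunks):
    """Accumulate an iterable of byte chunks into a single bytes object,
    using a bytearray to avoid repeated bytes concatenation."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
    return bytes(buf)

# With requests this pairing would look like (sketch, not the script's code):
#   resp = requests.get(url, stream=True, timeout=60)
#   body = collect_chunks(resp.iter_content(chunk_size=1 << 20))
```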

Credential handling

Passwords are collected via getpass.getpass() and passed as plain strings. This is fine for a CLI script, but worth noting: the password is briefly present in the signin() JSON payload in memory. No issue for typical use, just a heads-up if secrets scanning is a concern.

Minor: workbook_id can be None at line 244 check

get_workbook_id_from_exposure returns str | None. The if not workbook_id: check at line 244 correctly handles None and "". This is fine.

Minor: --list-only with --file silently ignored

If a user passes --file myfile.twbx --list-only, the --list-only flag is silently ignored (the file path hits the early return at line 226 before the list_only check). Consider either:

  • Wiring --list-only to be only valid with --exposure (via argparse group constraints), or
  • At minimum noting in the --list-only help string that it only applies to --exposure mode.
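A sketch of the post-parse guard: argparse group constraints cannot express "flag A requires flag B" directly, so a manual check after parse_args is the usual pattern (flag names follow the PR description; the helper names are illustrative):

```python
import argparse


def build_parser():
    p = argparse.ArgumentParser(prog="tableau-extract-calcs")
    p.add_argument("--file", help="local .twbx/.twb path")
    p.add_argument("--exposure", help="exposure name from tableau.yml")
    p.add_argument(
        "--list-only",
        action="store_true",
        help="print the workbook LSID and depends_on models only "
             "(only meaningful with --exposure)",
    )
    return p


def parse_and_validate(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.list_only and not args.exposure:
        # Fail loudly instead of silently ignoring the flag.
        parser.error("--list-only requires --exposure")
    return args
```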

.claude/commands/tableau-upstream.md

Clean, well-structured command. The step-by-step flow is logical and the translation table covers the common cases. A few observations:

Step 2 hard-codes kipptaf path

src/dbt/kipptaf/models/extracts/tableau/rpt_tableau__<name>.sql

The script itself hard-codes EXPOSURES_PATH to kipptaf as well. This is fine since the Tableau extract workflow only applies to kipptaf, but worth a short comment in both places so future contributors know this is intentional.

Step 3 translation table — missing ATTR() and MIN()/MAX() as scalar

Tableau authors frequently use ATTR([Field]) (returns the single value of a dimension if unambiguous) and MIN([Field])/MAX([Field]) as row-level scalars (not aggregates). These translate to a plain column reference in BigQuery SQL. Not blocking, but adding them would prevent confusion when encountered.

Step 5 validation sequence

The dagster-dbt project prepare-and-package command is listed as a prerequisite for dbt show, which is correct. However, the instruction says "Then via the dbt MCP tool or terminal" — it may be worth being explicit that when using the dbt MCP tool, prepare-and-package may already have been run by a previous step, so Claude should check rather than re-run it unconditionally (it can be slow).

No mention of dbt compile as a faster first check

Running dbt show on a model with new columns added is the right final check, but dbt compile --select <model> is faster for catching SQL syntax errors before hitting the warehouse. Consider adding it between editing the SQL and running dbt show.


Summary

The core logic is solid and the script is a genuine productivity tool. The main items to address before merging:

  1. Bug: Move extract_twb_bytes_from_bytes before main() (function ordering)
  2. Bug: Handle StopIteration in .twb zip name lookup
  3. Improvement: Clarify --list-only only applies to --exposure mode
  4. Nice-to-have: Drop stream=True or use it with chunked reading consistently

The slash command is ready to merge as-is; the notes above are optional improvements.

GabyRangelB and others added 8 commits March 17, 2026 23:20
- Replace xml.etree.ElementTree with defusedxml to resolve Bandit B314
- Default --server and --site from TABLEAU_SERVER_ADDRESS / TABLEAU_SITE_ID
  env vars (already injected via 1Password for all team members)
- Authenticate via TABLEAU_TOKEN_NAME + TABLEAU_PERSONAL_ACCESS_TOKEN env
  vars by default (no password prompt); fall back to --username/password
  only when PAT env vars are absent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
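The fallback order described above can be sketched as a small resolver (the env var names come from the commit message; the return shape is illustrative):

```python
import os


def resolve_auth(env=None):
    """Prefer PAT env vars; otherwise signal that the caller should prompt."""
    env = os.environ if env is None else env
    token_name = env.get("TABLEAU_TOKEN_NAME")
    token_secret = env.get("TABLEAU_PERSONAL_ACCESS_TOKEN")
    if token_name and token_secret:
        return ("pat", token_name, token_secret)
    # Caller would prompt via getpass here rather than reading a password env var.
    return ("password", None, None)
```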
Tableau Server returns XML by default. Without the Accept header the
signin and workbook list endpoints return XML, causing a JSONDecodeError
when the script tries to parse the response.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
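The request shape can be sketched with stdlib urllib for illustration (the real script uses requests, and the API version and URL here are placeholders):

```python
import json
import urllib.request


def build_signin_request(server, payload, api_version="3.22"):
    """Build a signin request that forces JSON: Tableau Server answers in XML
    unless the Accept header asks for application/json."""
    return urllib.request.Request(
        f"{server}/api/{api_version}/auth/signin",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",  # without this, XML comes back
        },
        method="POST",
    )
```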
Without --datasource, the script now prints a table of unique data
sources and their calc counts, then exits with instructions to re-run
with -d. This avoids flooding output with 983 fields across all sources
when the user only wants one.

Also fixes Bandit B314 cleanly by dropping the stdlib xml import
entirely (defusedxml.ElementTree handles all parsing) and removing
the type annotation from private helpers where the Element type was
only needed as a hint.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… command

Script: when run with no arguments, launches a guided prompt sequence
— numbered exposure list, Tableau Server download, numbered datasource
picker — so users don't need to know any CLI flags.

Slash command: rewritten to drive the same workflow through Claude chat.
Removed all workbook-specific examples (replaced with generic
placeholders) so the command works for any exposure. Also fixed
markdown lint warnings (table alignment, fenced code block languages).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Design for smart classification of Tableau calculated fields into
READY / NEEDS WORK / LOD / SKIP categories, with semantic layer
recommendations for LOD expressions and slash command UX updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Design for Step 4b in /tableau-upstream: trace calculated field refs
through the dbt DAG, recommend closest-to-extract insertion point,
draft BigQuery SQL at that layer, and let the analyst choose how far
upstream to push each field.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Covers field categorization (READY/NEEDS WORK/LOD/SKIP), --model flag,
LOD mart-matching, grouped script output, and /tableau-upstream slash
command updates (Step 4b lineage tracing, output format preference,
LOD/NEEDS WORK handling). Reviewed and approved via automated spec
review loop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@GabyRangelB GabyRangelB requested a review from a team as a code owner March 18, 2026 00:29
GabyRangelB and others added 2 commits March 18, 2026 00:34
…iterals upstream

- Re-prompt with valid range when user enters an out-of-bounds number
  for workbook or datasource selection (Steps 1 and 2)
- Before marking bare org-name string literals as SKIP, check
  int_extracts__student_enrollments for a district/equivalent column
  and propose adding it to the extract instead

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
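The re-prompt loop described above can be sketched with injected I/O so it is testable (function and parameter names are illustrative):

```python
def pick_option(options, read=input, write=print):
    """Prompt until the user enters a number in 1..len(options); return the 0-based index."""
    while True:
        raw = read(f"Select 1-{len(options)}: ").strip()
        if raw.isdigit() and 1 <= int(raw) <= len(options):
            return int(raw) - 1
        # Re-prompt with the valid range instead of crashing on bad input.
        write(f"Please enter a number between 1 and {len(options)}.")
```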
@GabyRangelB
Contributor Author

@claude please review this PR for vulnerabilities and/or exposed sensitive information

@github-actions
Contributor

github-actions bot commented Mar 18, 2026

Claude finished @GabyRangelB's task in 1m 12s


Security Review: PR #3473

Tasks

  • Gather context
  • Read scripts/tableau-extract-calcs.py for security issues
  • Read .claude/commands/tableau-upstream.md for sensitive info
  • Read docs files for sensitive info
  • Post final security review

Security Review Results

No hardcoded credentials or sensitive information found. The PR is clean in that regard. Below is a full assessment of security-relevant patterns.


scripts/tableau-extract-calcs.py

✅ Strengths

  • No hardcoded secrets. All credentials (TABLEAU_TOKEN_NAME, TABLEAU_PERSONAL_ACCESS_TOKEN, TABLEAU_SERVER_ADDRESS, TABLEAU_SITE_ID) are read from environment variables. Correct pattern for a CLI tool.
  • defusedxml used for XML parsing (line 18). This is the right choice — it protects against XML bomb (billion laughs), XXE (XML External Entity), and related XML injection attacks. A plain xml.etree.ElementTree parse of untrusted .twbx files would be exploitable.
  • TLS verification not disabled. No verify=False on any requests call, so certificate validation is active by default.
  • No shell execution. The script never uses subprocess, os.system, or eval. No command injection surface.
  • Password collected via getpass (line 168), not via sys.argv or an env var — avoids leaking credentials into shell history or ps output.

⚠️ Findings

1. No HTTPS enforcement on server URL — medium risk for internal tool

Lines 121, 139, 182, 195: The server URL is taken from --server or $TABLEAU_SERVER_ADDRESS and used directly in requests without validating that it starts with https://. If an operator accidentally configures http://, the PAT or password is transmitted in plaintext over the network.

# line 121 — no scheme check before sending credentials
url = f"{server}/api/{TABLEAU_API_VERSION}/auth/signin"

Recommend adding a guard in get_auth_token or main:

if not server.startswith("https://"):
    raise SystemExit("Server URL must use HTTPS to protect credentials.")

2. Auth token never signed out — low risk

signin_pat / signin_password (lines 119–150) return a session token, but there is no corresponding POST /auth/signout call anywhere. For PAT-based auth the token is short-lived and scoped, so the practical impact is low. For username/password sessions it leaves an active session on the server until it expires server-side. A try/finally signout would be cleaner hygiene.

3. Tableau API filter value not sanitized — very low risk for internal use

Line 198:

params={"filter": f"name:eq:{workbook_name}"},

Tableau's REST API filter syntax uses commas and colons as delimiters. A workbook name containing these characters (e.g. "foo:eq:bar,name:neq") could potentially manipulate the filter query. Since workbook_name comes from the operator's own CLI argument and the tool is used internally, the practical risk is negligible — but worth noting if the script is ever called with user-supplied input.
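The delimiter-stripping guard is tiny; a sketch (the helper name is illustrative, though a later commit in this PR strips ':' and ',' the same way):

```python
def sanitize_filter_value(value):
    """Strip Tableau REST filter delimiters (':' and ',') from a filter value
    so a crafted workbook name cannot inject extra filter clauses."""
    return value.replace(":", "").replace(",", "")


# usage sketch:
#   params = {"filter": f"name:eq:{sanitize_filter_value(workbook_name)}"}
```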

4. yaml.safe_load used correctly — no issue

Line 213: yaml.safe_load (not yaml.load) is used to parse the exposures YAML. This is the safe pattern and avoids arbitrary code execution via crafted YAML.


.claude/commands/tableau-upstream.md and docs/ files

  • No credentials, tokens, server addresses, or internal system names that aren't already documented elsewhere in the repo.
  • The docs reference generic model naming conventions (rpt_tableau__*), BigQuery dataset names, and dbt MCP tool names — none of these are sensitive.
  • No PII, no API keys, no internal hostnames.

Summary

| Issue | Severity | Location |
| --- | --- | --- |
| No HTTPS enforcement before sending credentials | Medium | get_auth_token, lines 164–169 |
| Session token not signed out after use | Low | signin_pat/signin_password, lines 119–150 |
| Tableau API filter value not sanitized | Very low / informational | find_workbook_id_by_name, line 198 |
| Hardcoded credentials | None | |
| Exposed secrets in docs or command file | None | |
| XML injection risk | None (mitigated by defusedxml) | |

The only item worth addressing before merge is the HTTPS enforcement check — it's a one-liner guard that prevents accidental credential exposure if the env var or flag is misconfigured. The rest are informational.

GabyRangelB and others added 2 commits March 18, 2026 13:50
…adable names

- Add _build_calc_name_map() to map Calculation_* and *(copy)_* XML names
  to their human captions within a datasource element
- Add _resolve_calc_refs() to substitute those IDs in formulas before display
- Wire both into _extract_fields() so all output (and future classification)
  sees human-readable field names instead of opaque Tableau internal IDs
- Add Task 0 to the implementation plan covering this with TDD steps

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
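One way to sketch the substitution step (regex-based; the helper name comes from the commit message, but its body here is an assumption):

```python
import re


def resolve_calc_refs(formula, name_map):
    """Replace internal Tableau IDs like [Calculation_123] inside a formula
    with their human-readable captions, leaving unknown refs untouched."""
    def _sub(match):
        inner = match.group(1)
        return f"[{name_map.get(inner, inner)}]"
    return re.sub(r"\[([^\]]+)\]", _sub, formula)
```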
- Enforce HTTPS before sending credentials: get_auth_token() now raises
  SystemExit if the server URL does not start with https://
- Sign out session token after use: add signout() and wrap all three
  download paths (interactive, --exposure, --workbook) in try/finally
  so the token is always invalidated, even on error
- Sanitize Tableau API filter value: strip ':' and ',' from workbook
  names in find_workbook_id_by_name() to prevent filter injection

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…anatory comment

CodeQL flagged the bare `except: pass` as an empty handler with no explanation.
Keep best-effort semantics but name the exception, add a comment clarifying the
intent, and print to stderr when TABLEAU_DEBUG is set.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
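The named-exception, debug-gated shape described above looks roughly like this (the helper name is illustrative):

```python
import os
import sys


def best_effort(action, label="cleanup"):
    """Run a best-effort action; swallow failures, but surface them on stderr
    when TABLEAU_DEBUG is set, so the handler is no longer a silent `except: pass`."""
    try:
        action()
    except Exception as exc:  # intentional: cleanup must never mask the real error
        if os.environ.get("TABLEAU_DEBUG"):
            print(f"{label} failed: {exc}", file=sys.stderr)
```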
Member

cbini commented Mar 30, 2026

Superseded by PR #3537: tableau-analyze-workbook.py replaces tableau-extract-calcs.py with viz usage tracking, a 4-bucket verdict system, and tests. See #3500.

@cbini cbini closed this Mar 30, 2026
@cbini cbini deleted the claude/feat/tableau-upstream-tool branch March 30, 2026 22:08
@cbini cbini restored the claude/feat/tableau-upstream-tool branch March 31, 2026 14:32


Development

Successfully merging this pull request may close these issues.

feat: tableau-to-dbt upstream workflow tool
