Shared VS Code workspace for the Data Engineering team with pre-configured AI assistant instructions, MCP integrations, and linked repositories.
Required Python version: 3.11.14

Check your current version:

```bash
python --version
```

If you don't have Python 3.11.14, install it using pyenv:

```bash
# Install pyenv (if not already installed)
brew install pyenv

# Install Python 3.11.14
pyenv install 3.11.14
# If you get "already exists", it's already installed - skip to the next step

# Set it as the local version for this workspace
cd mrge-data-team-ai-vscode
pyenv local 3.11.14

# Verify
python --version  # Should show 3.11.14
```

Important: After setting the local pyenv version, restart your terminal or run:

```bash
# Reload shell configuration
exec $SHELL

# Verify again
python --version
```

Already have 3.11.14 installed? If `pyenv install 3.11.14` says "already exists", you just need to:

```bash
cd mrge-data-team-ai-vscode
pyenv local 3.11.14
exec $SHELL
python --version  # Verify it shows 3.11.14
```

Clone the repository together with its submodules:

```bash
git clone --recurse-submodules git@gitlab.codility.net:data-engineering-team/mrge-data-team-ai-vscode.git
cd mrge-data-team-ai-vscode
```

If you already cloned without `--recurse-submodules`:

```bash
git submodule update --init --recursive
```

The workspace uses Poetry for unified dependency management across all data platform projects.
Automated setup (recommended):

```bash
# One command to set up everything
make setup
```

Note: The setup will automatically check your Python version. If you don't have 3.11.14, you'll get clear instructions on how to install it.
This will:
- ✅ Check Python version (3.11.14 required)
- ✅ Install Poetry (if not already installed)
- ✅ Create virtual environment
- ✅ Install all dependencies (including dev tools)
- ✅ Set up pre-commit hooks
- ✅ Run initial code quality checks
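The Python-version gate that `make setup` performs can be sketched like this (a hypothetical helper for illustration only, not the actual Makefile logic):

```python
import sys

REQUIRED = (3, 11, 14)  # must match .python-version


def check_python(version_info=None):
    """Return None if the interpreter matches REQUIRED, else an error message."""
    actual = tuple(version_info or sys.version_info)[:3]
    if actual == REQUIRED:
        return None
    want = ".".join(map(str, REQUIRED))
    got = ".".join(map(str, actual))
    return f"Python {want} required, found {got} - run: pyenv install {want} && pyenv local {want}"
```

A failing check prints the exact pyenv commands to run, which is the behavior the note above describes.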
Manual setup (if you prefer step-by-step):

```bash
# Install Poetry (if not already installed)
make poetry-install-tool

# Create virtual environment and install all dependencies
make poetry-env-create

# Install pre-commit hooks
make pre-commit-install

# Activate the environment
poetry shell
```

Verify setup:

```bash
make poetry-env-info
```

Open the workspace in VS Code:

```bash
code .
```

When VS Code opens, install these extensions if you don't have them already:
- GitHub Copilot (`GitHub.copilot`)
- GitHub Copilot Chat (`GitHub.copilot-chat`)
- dbt (`dbtlabs.vscode-dbt`) - Official dbt extension for syntax highlighting and IntelliSense
- dbt Power User (`innoverio.vscode-dbt-power-user`) - dbt modeling, lineage, and documentation
- Markdown Preview Enhanced (`shd101wyy.markdown-preview-enhanced`) - Enhanced markdown preview
VS Code will automatically prompt you to install these when you open the workspace.
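That prompt is driven by a workspace recommendations file; `.vscode/extensions.json` presumably looks something like this (an illustrative sketch built from the list above - the checked-in file is authoritative):

```json
{
  "recommendations": [
    "GitHub.copilot",
    "GitHub.copilot-chat",
    "dbtlabs.vscode-dbt",
    "innoverio.vscode-dbt-power-user",
    "shd101wyy.markdown-preview-enhanced"
  ]
}
```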
VS Code Configuration:
The workspace includes these pre-configured settings in .vscode/settings.json:
- Autosave enabled: Files save automatically 1 second after you stop typing
- dbt syntax highlighting: All `.sql` files in dbt projects use `jinja-sql` syntax
- Python environment: Points to the Poetry virtual environment at `.venv/bin/python`
- dbt project paths: Pre-configured for both `data-platform-etl` and `data-platform-dagster-group` dbt projects
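Those bullets map onto standard VS Code settings keys; a minimal sketch of what `.vscode/settings.json` likely contains (illustrative values only - the `.sql` glob is an assumption, and the dbt project-path keys are omitted because they depend on the extension used):

```json
{
  "files.autoSave": "afterDelay",
  "files.autoSaveDelay": 1000,
  "files.associations": {
    "**/models/**/*.sql": "jinja-sql"
  },
  "python.defaultInterpreterPath": ".venv/bin/python"
}
```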
The workspace comes with pre-configured MCP servers in .vscode/mcp.json:
| Server | What it does | Auth | Setup Instructions |
|---|---|---|---|
| Atlassian | Jira + Confluence access | Browser OAuth (auto-prompt) | None - authenticates on first use |
| GitHub | Repository operations, PR/Issue management | Fine-grained PAT | See GitHub MCP Setup below |
| Databricks | SQL queries, Unity Catalog access | Token auth | See Databricks MCP Setup below |
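For orientation, a server entry in `.vscode/mcp.json` generally has this shape (a sketch only - the command, package name, and env-var wiring shown here are assumptions; the checked-in file is authoritative):

```json
{
  "servers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${env:MRGE_GITHUB_PERSONAL_ACCESS_TOKEN}"
      }
    }
  }
}
```

The `${env:...}` substitution is how the config picks up the environment variables described in the setup sections below.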
The GitHub MCP server requires a Fine-grained Personal Access Token:
- Create token: Go to https://github.com/settings/personal-access-tokens/new
- Configure:
- Token name: "MRGE Data Team MCP" (or similar)
- Resource Owner: "mrge-group"
- Expiration: Choose your preference (90 days, 1 year, etc.)
- Repository access: Select "All repositories"
- Repository permissions:
- Read access to: actions, attestations api, code, codespaces metadata, deployments, merge queues, metadata, pages, and repository hooks
- Read and Write access to: commit statuses, discussions, and pull requests
- Generate token and copy it
- Set environment variable:

  ```bash
  # Add to your ~/.zshrc or ~/.bashrc
  export MRGE_GITHUB_PERSONAL_ACCESS_TOKEN="github_pat_..."
  ```

- Reload shell:

  ```bash
  exec $SHELL
  ```

- Reload VS Code (⌘⇧P → "Developer: Reload Window") for the MCP server to connect.
Note: The MCP configuration reads the `MRGE_GITHUB_PERSONAL_ACCESS_TOKEN` environment variable directly.
The Databricks MCP server requires two environment variables:
```bash
# Add to your ~/.zshrc or ~/.bashrc
export HOST_NAME="dbc-3ccb0e4a-5869.cloud.databricks.com"
export DATABRICKS_TOKEN="dapi..."  # Get from Databricks Settings → Developer → Access Tokens
```

After adding these, reload your shell:

```bash
exec $SHELL
```

Then reload VS Code (⌘⇧P → "Developer: Reload Window") for the MCP server to connect.
Note: These are the same environment variables used for dbt development (see dbt.md).
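Before reloading VS Code, you can sanity-check that all three MCP-related variables are exported with a small script (a hypothetical helper, not part of the workspace):

```python
import os

# Env vars read by the GitHub and Databricks MCP servers (per the setup above)
REQUIRED_VARS = (
    "MRGE_GITHUB_PERSONAL_ACCESS_TOKEN",
    "HOST_NAME",
    "DATABRICKS_TOKEN",
)


def missing_vars(env=None):
    """Return names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    print("All set" if not missing else "Missing: " + ", ".join(missing))
```

If anything prints as missing, recheck your `~/.zshrc` / `~/.bashrc` and run `exec $SHELL` again.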
Open Copilot Chat (⌘⇧I on macOS) and ask questions about the codebase. The AI agent has context about:
- ETL job structure and patterns
- Data architecture (layers, tables, configs)
- Airflow DAGs and schedules
- Coralogix observability (app names, subsystems)
- Jira/Confluence integration
- Infrastructure (Terraform, Atlantis)
Pull the latest workspace config and submodule changes:
```bash
# Update workspace config
git pull

# Update all submodules to their latest branch tips
make update

# Update Python dependencies
make poetry-env-update

# Or all in one go
git pull && make update && make poetry-env-update
```

The workspace provides a unified Poetry environment with all dependencies from `data-platform-etl` and other active projects.
```bash
# View all available commands
make help

# Create environment (first time)
make poetry-env-create

# Install/sync dependencies (after git pull)
make poetry-install

# Update all dependencies to latest versions
make poetry-env-update

# Export requirements.txt (for Docker, MWAA, etc.)
make poetry-export

# Activate virtual environment
poetry shell

# Run commands without activating shell
poetry run <command>

# Show environment info
make poetry-env-info

# Clean and rebuild environment
make poetry-clean
make poetry-env-create
```

The environment includes pre-configured code quality tools:
```bash
# Format code
make format     # black + isort

# Lint code
make lint       # black, isort, flake8 (check only)

# Type check
make typecheck  # mypy

# Run tests
make test       # pytest

# Run all checks
make check-all  # lint + typecheck + test
```

```bash
# Start Jupyter Lab
make jupyter

# Start IPython
make ipython
```

All tool configurations are in `pyproject.toml`:
- black: 120 char line length, Python 3.11 target
- isort: black-compatible profile
- pytest: Auto-coverage reporting
- mypy: Type checking with lenient settings
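In `pyproject.toml` these translate to sections roughly like the following (illustrative values consistent with the bullets above; the checked-in file is authoritative):

```toml
[tool.black]
line-length = 120
target-version = ["py311"]

[tool.isort]
profile = "black"

[tool.pytest.ini_options]
addopts = "--cov --cov-report=term-missing"

[tool.mypy]
python_version = "3.11"
ignore_missing_imports = true
```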
The workspace uses Python 3.11.14 (specified in .python-version). Make sure you have this version installed:
```bash
# Check version
python --version

# Install with pyenv
pyenv install 3.11.14
pyenv local 3.11.14
```

```
.github/
├── copilot-instructions.md   # Core AI instructions (always loaded)
└── copilot-docs/             # Detailed docs (loaded on demand by AI)
    ├── dbt.md                # dbt development workflow
    ├── databricks.md         # Databricks Unity Catalog & querying
    ├── airflow.md            # DAG schedules & orchestration
    ├── aws-cli.md            # AWS CLI usage & authentication
    ├── github.md             # GitHub workflows, PRs, repository info
    ├── atlassian.md          # Jira + Confluence MCP usage
    └── pull-requests.md      # PR creation & description guidelines

.vscode/
└── mcp.json                  # MCP server configs (Atlassian, GitHub, Databricks)

Submodules:
├── de-etl-jobs/              # Main ETL repository
├── airflow/                  # DAGs & job configs
├── codility/                 # Monolith source code
├── solution-similarity/      # Similarity inference API
└── infra-core/               # Terraform IaC
```
The default branch is `main`. Direct pushes to `main` are blocked.
To update AI instructions or workspace config:
- Create a feature branch: `git checkout -b feat/update-instructions`
- Edit files in `.github/` or `.vscode/`
- Commit and push: `git push -u origin feat/update-instructions`
- Create a Pull Request to merge into `main`
- After merge, teammates run `git pull` to get the updates
Branch naming conventions:
- `feat/*` - New features
- `fix/*` - Bug fixes
- `chore/*` - Maintenance (dependencies, configs, submodule updates)
- `docs/*` - Documentation updates
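The convention above can be expressed as a simple check (a hypothetical validator, e.g. for a pre-commit or CI hook; not part of the workspace):

```python
import re

# Allowed prefixes from the team's branch naming convention
BRANCH_PATTERN = re.compile(r"^(feat|fix|chore|docs)/[a-z0-9][a-z0-9._-]*$")


def is_valid_branch(name: str) -> bool:
    """True if the branch name follows the feat/fix/chore/docs convention."""
    return BRANCH_PATTERN.match(name) is not None
```

The slug rule (lowercase letters, digits, dots, underscores, hyphens) is an assumption; adjust the regex if the team allows other characters.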