Open
73 commits
c1b86aa
add devcontainer
Oct 25, 2022
6e303d5
create tests folder
maye-msft Oct 25, 2022
c58e80f
add test for streaming job
maye-msft Oct 25, 2022
ec688cd
add local debug settings
maye-msft Oct 25, 2022
81f2dce
add pytest.ini
maye-msft Oct 25, 2022
407a004
add streaming job assertion
maye-msft Oct 26, 2022
4b329a2
Merge pull request #4 from gary918/dev
maye-msft Oct 26, 2022
8ec6422
Merge pull request #7 from Azure/ci/yem/5-unit-test-of-batch-pipeline
maye-msft Oct 26, 2022
5fafb42
Merge pull request #8 from Azure/ci/yem/6-unit-test-of-streaming-pipe…
maye-msft Oct 26, 2022
7405ac7
Create python-app.yml
gary918 Oct 27, 2022
674d0bc
Update python-app.yml
gary918 Oct 28, 2022
c6f84ee
add devcontainer
Oct 25, 2022
aaafd5b
create tests folder
maye-msft Oct 25, 2022
e456d28
add test for streaming job
maye-msft Oct 25, 2022
945c996
add streaming job assertion
maye-msft Oct 26, 2022
feb3870
Merge branch 'dev' of https://github.com/gary918/config-driven-data-p…
Oct 28, 2022
44cb86a
Create pylint.yml
gary918 Oct 31, 2022
5b9476e
Merge pull request #1 from gary918/dev-1
gary918 Oct 31, 2022
b81f0bf
Merge branch 'dev' of https://github.com/gary918/config-driven-data-p…
Oct 31, 2022
c1783e3
update python-app.yml
Oct 31, 2022
7a229e0
remove pylint.yml
Oct 31, 2022
4afc6f8
update python-app.yml
Oct 31, 2022
844655e
add container options
Oct 31, 2022
f797d1b
fix permission error
Oct 31, 2022
e48dafd
update sudo pytest
Oct 31, 2022
efbae5f
use python -m
Oct 31, 2022
0298e5c
use image
Oct 31, 2022
b283058
use python
Oct 31, 2022
b9a3bc8
chmod +rw
Oct 31, 2022
70ec7a6
change FileStore
Oct 31, 2022
7c01171
add pytest
Oct 31, 2022
e7d16ba
add coverage
Oct 31, 2022
df9d6dc
add cov term
Oct 31, 2022
d960707
add pytest-cov
Oct 31, 2022
c653c21
add cov xml
Oct 31, 2022
0d0efe4
add cov html
Oct 31, 2022
6d05d71
add cov comment
Oct 31, 2022
985b53a
add cov comment
Oct 31, 2022
b06e9b6
add cov comment
Oct 31, 2022
3d6ebdf
cov term
Oct 31, 2022
2506671
add badge
Nov 1, 2022
9298e63
badge for dev
Nov 1, 2022
48afd67
change name to CI
Nov 1, 2022
efa736f
html to xml
Nov 1, 2022
7dcae76
xml to html
Nov 1, 2022
d18f130
add CI pipeline
Nov 1, 2022
9cd1ed8
add trigger of PR to dev
Nov 1, 2022
f88346e
remove permission
Nov 1, 2022
132b8ec
Merge pull request #10 from gary918/dev
maye-msft Nov 3, 2022
76a74ee
add trigger to feat and ci branches
maye-msft Nov 15, 2022
4273f1d
-fix UI issue
maye-msft Nov 24, 2022
6ddce7b
add config ui for ADLS Gen2 and JDBC
maye-msft Dec 2, 2022
e2748cd
Merge pull request #13 from Azure/feat/yem/12-support-ingestion-of-adls
gary918 Dec 2, 2022
35f1b54
- add dockerfile to run Flask
maye-msft Dec 5, 2022
3881f90
Adding the missing parameter to the run_pipeline in _init_.py
buhongw7583c Dec 6, 2022
0c544b8
-let devcontainer use the dockfile in sec folder
maye-msft Dec 6, 2022
0786e02
quick fix ADLS_g_2 and add ADLS config json
Nick287 Dec 7, 2022
e3689a9
Merge pull request #24 from buhongw7583c/task#23_Hong
buhongw7583c Dec 7, 2022
c5d67df
Merge pull request #22 from Azure/feat/yem/21-add-docker-file-to-run-…
maye-msft Dec 8, 2022
eff0bbc
Merge pull request #28 from Azure/feat/bwa/14-validate-ingestion-from…
Nick287 Dec 8, 2022
00ab591
Add pipeline for processing parking sensor data (#31)
gary918 Dec 12, 2022
ca86ac8
implement the pipeline export function (#20)
maye-msft Dec 12, 2022
cf1905e
change the UI layout
maye-msft Dec 12, 2022
b9445e1
add parking-sensor dashboard and related changes to deploy
cchenshu Dec 12, 2022
06030b3
add task dependency
maye-msft Dec 12, 2022
28000c1
Add multi-line SQL\Py support (#38)
maye-msft Dec 12, 2022
3e4a3ed
create a markdown file for parking sensors demo
cchenshu Dec 13, 2022
bec5bba
create a markdown file for parking sensors demo
cchenshu Dec 14, 2022
1026b99
Merge pull request #34 from Azure/feat/chenshu/16-parking-sensor-demo
cchenshu Dec 14, 2022
515dd26
feat: add new pipeline creating document (#40)
cchenshu Dec 21, 2022
9f49491
fix: update mermaid related CDN urls
thurstonchen Mar 22, 2023
9188e25
Merge pull request #50 from Azure/fix/mermaid-rendering-issue
gary918 Mar 22, 2023
32becdc
feat: enable jobs to run on synapse (#36)
siliang-j-1225 Sep 13, 2023
10 changes: 10 additions & 0 deletions .devcontainer/devcontainer.json
@@ -0,0 +1,10 @@
{
"name": "PySpark Sample",
"dockerFile": "../Dockerfile",
"context": "..",
"extensions": [
"ms-python.python",
"njpwerner.autodocstring"
],
"onCreateCommand": "pip install -r ./requirements_dev.txt"
}
3 changes: 3 additions & 0 deletions .devcontainer/settings.vscode.json
@@ -0,0 +1,3 @@
{
"python.pythonPath": "/usr/local/bin/python"
}
44 changes: 44 additions & 0 deletions .github/workflows/CI.yml
@@ -0,0 +1,44 @@
# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python

name: CI

on:
  workflow_dispatch:
  push:
    branches: [ "dev", "feat/**", "ci/**" ]
  pull_request:
    branches: [ "main","dev" ]

permissions:
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Set up Python 3.10
      uses: actions/setup-python@v3
      with:
        python-version: "3.10"
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install flake8 pytest
        if [ -f requirements_dev.txt ]; then pip install -r requirements_dev.txt; fi
    - name: Lint with flake8
      run: |
        # stop the build if there are Python syntax errors or undefined names (ignore F821)
        flake8 . --count --select=E9,F63,F7,F82 --ignore=F821 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        # flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with pytest
      run: |
        # pytest and show the report in term and save it as html files
        pytest --cov=src/ tests/ --ignore=integration --cov-report term --cov-report html:code-coverage
    - name: Archive code coverage results
      uses: actions/upload-artifact@v3
      with:
        name: code-coverage-report
        path: ./code-coverage
3 changes: 3 additions & 0 deletions .gitignore
@@ -130,3 +130,6 @@ dmypy.json
spark-warehouse/
data/storage/
tmp/
FileStore/
stg_data1/
.DS_Store
34 changes: 32 additions & 2 deletions .vscode/launch.json
@@ -10,7 +10,25 @@
"request": "launch",
"program": "src/main.py",
"console": "integratedTerminal",
- "args": ["--config-path", "./example/pipeline_fruit.json", "--working-dir", "./tmp", "--show-result", "True", "--build-landing-zone", "True"],
+ "args": ["--config-path", "./example/pipeline_fruit.json", "--working-dir", "./tmp", "--show-result", "--build-landing-zone"],
"justMyCode": true
},
{
"name": "Python: main.py parking sensor",
"type": "python",
"request": "launch",
"program": "src/main.py",
"console": "integratedTerminal",
"args": ["--config-path", "./example/pipeline_parking_sensors.json", "--working-dir", "./tmp", "--show-result", "--cleanup-database"],
"justMyCode": true
},
{
"name": "Python: main.py ADLS",
"type": "python",
"request": "launch",
"program": "src/main.py",
"console": "integratedTerminal",
"args": ["--config-path", "./example/pipeline_fruit_batch_ADLS.json", "--working-dir", "./tmp", "--show-result", "--cleanup-database"],
"justMyCode": true
},
{
@@ -19,7 +37,7 @@
"request": "launch",
"program": "src/main.py",
"console": "integratedTerminal",
- "args": ["--config-path", "./example/pipeline_fruit_parallel.json", "--working-dir", "./tmp", "--show-result", "True", "--await-termination", "30", "--build-landing-zone", "True", "--cleanup-database", "True"],
+ "args": ["--config-path", "./example/pipeline_fruit_parallel.json", "--working-dir", "./tmp", "--show-result", "--await-termination", "30", "--build-landing-zone", "--cleanup-database"],
"justMyCode": true
},
{
@@ -34,6 +52,18 @@
"args": ["run", "--no-debugger", "--no-reload"],
"jinja": true,
"justMyCode": true
},
{
"name": "Python: pytest",
"type": "python",
"request": "launch",
"module": "pytest",
"cwd": "${workspaceRoot}",
"env": {
},
"args": ["tests"],
"jinja": true,
"justMyCode": true
}
]
}
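The `args` edits in `launch.json` drop the literal `"True"` after `--show-result`, `--build-landing-zone`, and `--cleanup-database`, which is consistent with those options becoming value-less switches. `src/main.py` is not part of this excerpt, so the parser below is only a hypothetical sketch of how such switches are commonly declared with `argparse`. The function name `build_parser`, the help strings, and the `int` type for `--await-termination` are assumptions, not the project's code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags seen in launch.json (not the project's code)."""
    parser = argparse.ArgumentParser(description="Run a config-driven pipeline (sketch)")
    parser.add_argument("--config-path", required=True, help="Path to the pipeline JSON config")
    parser.add_argument("--working-dir", required=True, help="Working directory for pipeline output")
    # Value-less switches: present means True, absent means False.
    parser.add_argument("--show-result", action="store_true")
    parser.add_argument("--build-landing-zone", action="store_true")
    parser.add_argument("--cleanup-database", action="store_true")
    # This option still takes a value; treating it as an integer is an assumption.
    parser.add_argument("--await-termination", type=int, default=None)
    return parser


# Same argument list as the first launch configuration after the change.
args = build_parser().parse_args([
    "--config-path", "./example/pipeline_fruit.json",
    "--working-dir", "./tmp",
    "--show-result",
    "--build-landing-zone",
])
print(args.show_result, args.build_landing_zone, args.cleanup_database)  # True True False
```

Under this assumed parser the old style `--show-result True` would be rejected, because `True` is left over as an unrecognized argument. That may be why every caller (launch configurations and README commands) is updated in the same change.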
32 changes: 32 additions & 0 deletions Dockerfile
@@ -0,0 +1,32 @@
FROM python:3.9.13

# Install pylint
RUN pip install --upgrade pip && \
pip install pylint

# Install git, process tools
RUN apt-get update && apt-get -y install git procps

# Install OpenJDK-11
RUN apt-get update && \
apt-get install -y openjdk-11-jdk && \
apt-get install -y ant && \
apt-get clean;


ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64/
RUN export JAVA_HOME

WORKDIR /usr/src/app
ENV FLASK_APP=./src/app.py
ENV FLASK_RUN_HOST=0.0.0.0
#Server will reload itself on file changes if in dev mode
ENV FLASK_ENV=development
# Add path for pytest
ENV PYTHONPATH /workspaces/config-driven-data-pipeline/src

COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY ./src ./src
COPY ./web ./web
CMD ["flask", "run"]
24 changes: 20 additions & 4 deletions README.md
@@ -1,6 +1,10 @@
# Config-Driven Data Pipeline

- [![pypi](https://img.shields.io/pypi/v/cddp.svg)](https://pypi.org/project/cddp)
+ [![pypi](https://img.shields.io/pypi/v/cddp.svg)](https://pypi.org/project/cddp)
[![CI](https://github.com/Azure/config-driven-data-pipeline/actions/workflows/CI.yml/badge.svg?branch=dev)](https://github.com/Azure/config-driven-data-pipeline/actions/workflows/CI.yml)
[![Docker pulls](https://img.shields.io/docker/pulls/mayemsft/cddp-docker.svg)](https://hub.docker.com/r/mayemsft/cddp-docker)



## Why this solution

@@ -67,15 +71,15 @@ Spark SQL are used in the standardization block and the serving block, one is me
Run the batch mode pipeline in local PySpark environment:

```bash
- python src/main.py --config-path ./example/pipeline_fruit_batch.json --working-dir ./tmp --show-result True --build-landing-zone True --cleanup-database True
+ python src/main.py --config-path ./example/pipeline_fruit_batch.json --working-dir ./tmp --show-result --build-landing-zone --cleanup-database
```

Here is [another example](example/pipeline_fruit_streaming.json) of streaming based data pipeline.

Run the streaming mode pipeline in local PySpark environment:

```bash
- python src/main.py --config-path ./example/pipeline_fruit_streaming.json --working-dir ./tmp --await-termination 60 --show-result True --build-landing-zone True --cleanup-database True
+ python src/main.py --config-path ./example/pipeline_fruit_streaming.json --working-dir ./tmp --await-termination 60 --show-result --build-landing-zone --cleanup-database
```

After running the pipeline, the result will show in the console.
@@ -89,7 +93,19 @@ After running the pipeline, the result will show in the console.
3| Orange| 28.0
6| Banana| 17.0
2| Peach| 39.0


## Run the CDDP UI with Docker

```bash
docker pull mayemsft/cddp-docker:latest
```

```bash
docker run -d -p 8080:5000 mayemsft/cddp-docker:latest
```
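
In the `docker run` command above, `-p 8080:5000` publishes container port 5000 on host port 8080, so the UI would be expected at `http://localhost:8080` once the container is up. This assumes the published image serves Flask on its default port 5000, as the `Dockerfile` in this change suggests. The following is a minimal sketch for checking that from Python. The helper name `wait_for_ui` and its default values are hypothetical and not part of the project.

```python
import time
import urllib.error
import urllib.request


def wait_for_ui(url: str = "http://localhost:8080",
                timeout_s: float = 30.0,
                interval_s: float = 1.0) -> bool:
    """Poll `url` until any HTTP response arrives or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s):
                return True
        except urllib.error.HTTPError:
            # The server answered with a non-2xx status, so it is reachable.
            return True
        except OSError:
            # Connection refused, reset, or timed out: wait and retry.
            time.sleep(interval_s)
    return False


# Example usage after `docker run` (assumes the host port from the command above):
#     print("UI reachable:", wait_for_ui())
```

Polling rather than making a single request allows for the short delay between `docker run -d` returning and the server inside the container accepting connections.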



## Reference

- [Medallion Architecture – Databricks](https://www.databricks.com/glossary/medallion-architecture)
File renamed without changes.