47 changes: 47 additions & 0 deletions .github/workflows/publish-docs.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
# This workflow builds and publishes the latest docs to
# the `gh-pages` branch.
# For more details: https://github.com/marketplace/actions/deploy-to-github-pages
name: Publish docs

on:
  release:
    types: [created]
  workflow_dispatch:

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pages: write
    defaults:
      run:
        shell: bash -l {0}
    steps:
      - uses: actions/checkout@v2
        with:
          # fetch all tags so `versioneer` can properly determine current version
          fetch-depth: 0
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.11'
      - name: Install pandoc
        uses: pandoc/actions/setup@v1
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install -r requirements-ml.txt
          pip install -r requirements-reports.txt
          pip install -r requirements-docs.txt
          pip install -e .

      - name: Build
        run: |
          cd _docs/docs
          python update_documentation.py
      - name: Publish
        uses: JamesIves/github-pages-deploy-action@v4
        with:
          branch: gh-pages
          folder: _docs/docs/LATEST/html
52 changes: 52 additions & 0 deletions .github/workflows/publish-package.yml
@@ -0,0 +1,52 @@
# This workflow publishes the package to pypi.
# For more details:
# https://docs.github.com/en/actions/guides/building-and-testing-python#publishing-to-package-registries
name: Publish to PyPi

on:
  release:
    types: [created]
  workflow_dispatch:

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        # fetch all tags so `versioneer` can properly determine current version
        with:
          fetch-depth: 0
      - name: Check if current commit is tagged
        # fails and cancels release if the current commit is not tagged
        run: |
          git describe --exact-match --tags
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
          if [ -f requirements-ml.txt ]; then pip install -r requirements-ml.txt; fi
          if [ -f requirements-reports.txt ]; then pip install -r requirements-reports.txt; fi
          pip install setuptools wheel twine
      - name: Build
        env:
          TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
          TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
          TWINE_REPOSITORY: pypi
        run: |
          python setup.py sdist bdist_wheel
      - name: Test build
        # fails and cancels release if the built package fails to import
        run: |
          pip install dist/*.whl
          python -c 'import dataprofiler; print(dataprofiler.__version__)'
      - name: Publish
        env:
          TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
          TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
          TWINE_REPOSITORY: pypi
        run: |
          twine upload dist/*
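For trying the tag guard outside CI, the following is a minimal Python sketch of the check that the `Check if current commit is tagged` step performs with `git describe --exact-match --tags`. The helper name `current_commit_tag` and its injectable `run` parameter are assumptions made for illustration; neither is part of the workflow or of DataProfiler.

```python
import subprocess


def current_commit_tag(run=subprocess.run):
    """Return the tag pointing at HEAD, or None if HEAD is untagged.

    `run` is injectable (a hypothetical convenience) so the sketch can be
    exercised without a real repository; by default it shells out to git.
    """
    result = run(
        ["git", "describe", "--exact-match", "--tags"],
        capture_output=True,
        text=True,
    )
    # git exits non-zero when no tag matches the current commit exactly.
    if result.returncode != 0:
        return None
    return result.stdout.strip()
```

Calling `current_commit_tag()` from the repository root returns the tag name on a tagged commit and `None` otherwise, which mirrors the pass/fail behavior of the workflow step.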
38 changes: 0 additions & 38 deletions .github/workflows/publish-python-package.yml

This file was deleted.

@@ -7,16 +7,14 @@ on:
   pull_request:
     branches:
       - 'main'
-      - 'feature/**'
-      - 'dev'

 jobs:
   build:

     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: [3.8, 3.9, "3.10"]
+        python-version: ["3.10", "3.11"]

     steps:
       - uses: actions/checkout@v4
@@ -38,4 +36,4 @@ jobs:
           pre-commit run --all-files
       - name: Test with pytest
         run: |
-          DATAPROFILER_SEED=0 pytest --forked --cov=dataprofiler --cov-fail-under=80
+          DATAPROFILER_SEED=0 pytest --cov=dataprofiler --cov-fail-under=80
21 changes: 12 additions & 9 deletions .pre-commit-config.yaml
@@ -5,6 +5,7 @@ repos:
     rev: 22.3.0
     hooks:
       - id: black
+        exclude: (versioneer.py|dataprofiler/_version.py|_docs/)
         types: [file, python]
         language_version: python3
   # Isort: sort import statements
@@ -15,6 +16,7 @@ repos:
     rev: 5.12.0
     hooks:
       - id: isort
+        exclude: _docs/
         language_version: python3
   # Flake8: complexity and style checking
   # https://flake8.pycqa.org/en/latest/user/using-hooks.html
@@ -23,39 +25,39 @@ repos:
     hooks:
       - id: flake8
         additional_dependencies: [flake8-docstrings]
-        exclude: (^docs/|^dataprofiler/tests/|^.*/__init__.py)
+        exclude: (^docs/|^dataprofiler/tests/|^.*/__init__.py|_docs/)
         language_version: python3
   # General fixers: format files for white spaces and trailing new lines, warn on debug statements
   # https://github.com/pre-commit/pre-commit-hooks#hooks-available
   - repo: https://github.com/pre-commit/pre-commit-hooks
     rev: v4.0.1
     hooks:
       - id: trailing-whitespace
-        exclude: (^dataprofiler/tests/data/|^dataprofiler/tests/speed_tests/data/)
+        exclude: (^dataprofiler/tests/data/|^dataprofiler/tests/speed_tests/data/|_docs/)
       - id: debug-statements
       - id: end-of-file-fixer
-        exclude: (^dataprofiler/tests/data/)
+        exclude: (^dataprofiler/tests/data/|_docs/)
   # Mypy: Optional static type checking
   # https://github.com/pre-commit/mirrors-mypy
   - repo: https://github.com/pre-commit/mirrors-mypy
     rev: v0.982
     hooks:
       - id: mypy
-        exclude: (^dataprofiler/tests/|^resources/|^examples|venv*/)
+        exclude: (^dataprofiler/tests/|^resources/|^examples|venv*/|versioneer.py|dataprofiler/_version.py|_docs/)
         language_version: python3
         additional_dependencies: # Keep up-to-date with the respective requirement files
           [
             # requirements.txt
             h5py>=2.10.0,
             wheel>=0.33.1,
-            numpy>=1.22.0,
+            numpy<2.0.0,
             pandas>=1.1.2,
             python-dateutil>=2.7.5,
             pytz>=2020.1,
             pyarrow>=1.0.1,
             chardet>=3.0.4,
             fastavro>=1.0.0.post1,
-            python-snappy>=0.5.4,
+            python-snappy>=0.7.1,
             charset-normalizer>=1.3.6,
             psutil>=4.0.0,
             scipy>=1.4.1,
@@ -80,7 +82,7 @@ repos:

             # requirements-ml.txt
             scikit-learn>=0.23.2,
-            'keras>=2.4.3,<3.0.0',
+            'keras>=2.4.3,<=3.4.0',
             rapidfuzz>=2.6.1,
             "tensorflow>=2.6.4,<2.15.0; sys.platform != 'darwin'",
             "tensorflow>=2.6.4,<2.15.0; sys_platform == 'darwin' and platform_machine != 'arm64'",
@@ -93,7 +95,6 @@ repos:

             # requirements-test.txt
             coverage>=5.0.1,
-            dask>=2.29.0,
             fsspec>=0.3.3,
             pytest>=6.0.1,
             pytest-cov>=2.8.1,
@@ -108,7 +109,7 @@ repos:
     rev: "0.48"
     hooks:
       - id: check-manifest
-        additional_dependencies: ['h5py', 'wheel', 'future', 'numpy', 'pandas',
+        additional_dependencies: ['h5py', 'wheel', 'future', 'numpy<2.0.0', 'pandas',
           'python-dateutil', 'pytz', 'pyarrow', 'chardet', 'fastavro',
           'python-snappy', 'charset-normalizer', 'psutil', 'scipy', 'requests',
           'networkx','typing-extensions', 'HLL', 'datasketches', 'boto3']
@@ -118,11 +119,13 @@ repos:
     hooks:
       - id: pyupgrade
         args: ["--py38-plus"]
+        exclude: (versioneer.py|dataprofiler/_version.py|_docs/)
   # Autoflake - cleanup unused variables and imports
   - repo: https://github.com/PyCQA/autoflake
     rev: v2.0.0
     hooks:
       - id: autoflake
+        exclude: _docs/
         args:
           - "--in-place"
           - "--ignore-pass-statements"
3 changes: 3 additions & 0 deletions .whitesource
@@ -0,0 +1,3 @@
{
"settingsInheritedFrom": "capitalone/whitesource-config"
}
16 changes: 15 additions & 1 deletion MANIFEST.in
@@ -1,4 +1,5 @@
 global-exclude .DS_Store
+global-exclude */__pycache__/*

 include *.txt
 include CODEOWNERS
@@ -16,4 +17,17 @@ recursive-include resources *.json
 recursive-include resources *.pb
 recursive-include resources *.py

-recursive-include dataprofiler/labelers/embeddings/ *.txt
+recursive-include dataprofiler/labelers/embeddings *.txt
+include versioneer.py
+include dataprofiler/_version.py
+include .whitesource
+
+recursive-exclude _docs *.html
+recursive-exclude _docs *.cfg
+exclude _docs/LICENSE
+recursive-exclude _docs *.md
+recursive-exclude _docs *.nojekyll
+recursive-exclude _docs *.png
+recursive-exclude _docs *.py
+recursive-exclude _docs *.rst
+recursive-exclude _docs Makefile
59 changes: 59 additions & 0 deletions _docs/README.md
@@ -0,0 +1,59 @@
Visit our [documentation page.](https://capitalone.github.io/DataProfiler)

### How to properly write documentation:

#### Packages
In any package directory, put overall package comments in that directory's
\_\_init\_\_.py. Write them at the very top of the file, between triple
quotes, so that they form the package docstring.

#### Classes
In any class file, put overall class comments at the top of the file
between triple quotes, in the docstring of the \_\_init\_\_ method, or both.
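
A minimal sketch of both placements, assuming a hypothetical `ExampleLabeler` class that exists only for illustration. The final lines use the standard-library `inspect.getdoc` to show how the `__init__` docstring can be read back:

```python
"""
Hypothetical example module holding a single labeler class.

Overall class comments go here, at the top of the file.
"""
import inspect


class ExampleLabeler:
    def __init__(self, verbose=False):
        """
        Initialize the example labeler.

        :param verbose: A flag to determine verbosity
        :type verbose: bool
        """
        self.verbose = verbose


# inspect.getdoc strips the indentation and the surrounding blank lines.
init_summary = inspect.getdoc(ExampleLabeler.__init__).splitlines()[0]
print(init_summary)
```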

#### Functions
reStructuredText Docstring Format is the standard. Here is an example:

    def format_data(self, predictions, verbose=False):
        """
        Formats word level labeling of the Unstructured Data Labeler as you want

        :param predictions: A 2D list of word level predictions/labeling
        :type predictions: Dict
        :param verbose: A flag to determine verbosity
        :type verbose: Bool
        :return: JSON structure containing specified formatted output
        :rtype: JSON

        :Example:
            Look at this test. Don't forget the double colons to make a code block::
                This is a codeblock
                Type example code here
        """

### How to update the documentation:


1. Set up your local environment
```bash
# install sphinx requirements
# install the requirements from the feature branch
pip install pandoc &&
pip install -r requirements.txt &&
pip install -r requirements-ml.txt &&
pip install -r requirements-reports.txt &&
pip install -r requirements-docs.txt &&
pip install -e .

```
2. From the root of `DataProfiler`, run the following commands to generate the Sphinx documentation:
```bash
cd _docs/docs
python update_documentation.py

```

3. View new docs
```bash
open index.html
```
20 changes: 20 additions & 0 deletions _docs/docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = buildcode

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)