A formatting tool for your Databricks notebooks.
Install:
$ pip install blackbricks
You probably also want to install databricks-cli in order to use blackbricks directly on your notebooks.
$ pip install databricks-cli
$ databricks configure # Required in order to use `blackbricks` on remote notebooks.
You can use blackbricks on Python notebook files stored locally, or directly on the notebooks stored in Databricks.
For the most part, blackbricks operates very similarly to black.
$ blackbricks notebook1.py notebook2.py # Formats both notebooks.
$ blackbricks notebook_directory/ # Formats every notebook under the directory (recursively).
An important difference is that blackbricks will ignore any file that does not contain the # Databricks notebook source header on the first line. Databricks adds this line to all Python notebooks. This means you can happily run blackbricks on a directory with both notebooks and regular Python files, and blackbricks won't touch the latter.
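The header check described above can be sketched in a few lines of Python. This is an illustration only, not blackbricks' actual implementation; the names `NOTEBOOK_HEADER` and `looks_like_databricks_notebook` are hypothetical.

```python
from pathlib import Path

# Header that Databricks adds as the first line of Python notebook files.
NOTEBOOK_HEADER = "# Databricks notebook source"


def looks_like_databricks_notebook(path):
    """Return True if the first line of the file is the notebook header.

    Hypothetical helper for illustration; not part of blackbricks.
    """
    with Path(path).open(encoding="utf-8") as f:
        first_line = f.readline()
    return first_line.strip() == NOTEBOOK_HEADER
```

A regular Python file whose first line is anything else (or an empty file) returns False here, which mirrors why blackbricks leaves such files alone.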
If you specify the -r or --remote flag, blackbricks will work directly on your notebooks stored in Databricks.
$ blackbricks --remote /Users/username/notebook.py
When working on remote files, you cannot add whole directories.
$ blackbricks --help
Usage: blackbricks [OPTIONS] [FILENAMES]...
Formatting tool for Databricks python notebooks.
Python cells are formatted using `black`, and SQL cells are formatted by
`sqlparse`.
Local files (without the `--remote` option):
- Only files that look like Databricks (Python) notebooks will be
processed. That is, they must start with the header `# Databricks
notebook source`
- If you specify a directory as one of the file names, all files in that
directory will be added, including any subdirectory.
Remote files (with the `--remote` option):
- Make sure you have installed the Databricks CLI (``pip install
databricks_cli``)
- Make sure you have configured at least one profile (`databricks
configure`). Check the file `~/.databrickscfg` if you are not sure.
- File paths should start with `/`. Otherwise they are interpreted as
relative to `/Users/username`, where `username` is the username
specified in the Databricks profile used.
Arguments:
[FILENAMES]... Path to the notebook(s) to format.
Options:
-r, --remote If this option is used, all filenames are
treated as paths to notebooks on your
Databricks host (i.e. not local files).
[default: False]
-p, --profile NAME If using --remote, which Databricks profile
to use. [default: DEFAULT]
--line-length INTEGER How many characters per line to allow.
[default: 88]
--sql-upper / --no-sql-upper SQL keywords should be UPPERCASE or
lowercase. [default: True]
--indent-with-two-spaces / --no-indent-with-two-spaces
Use two spaces for indentation in Python
cells instead of Black's default of four.
Databricks uses two spaces. [default: True]
--check Don't write the files back, just return the
status. Return code 0 means nothing would
change.
--diff Don't write the files back, just output a
diff for each file on stdout.
--version Display version information and exit.
--help Show this message and exit.
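The `--check` option lends itself to scripted use, for example in a CI step. The following is a minimal sketch that relies only on the `--check` and `--line-length` options and the return-code behaviour listed in the help text above; the helper names `build_check_command` and `is_formatted` are hypothetical, and `is_formatted` assumes blackbricks is installed and on your PATH.

```python
import subprocess


def build_check_command(paths, line_length=88):
    """Build a `blackbricks --check` command line for the given paths.

    Hypothetical helper for illustration; not part of blackbricks.
    """
    return ["blackbricks", "--check", "--line-length", str(line_length), *paths]


def is_formatted(paths, line_length=88):
    """Return True if blackbricks reports that nothing would change.

    Per the help text, return code 0 means nothing would change.
    Assumes the `blackbricks` executable is available on PATH.
    """
    result = subprocess.run(build_check_command(paths, line_length))
    return result.returncode == 0
```

Keeping the command construction separate from the subprocess call makes it easy to log or inspect the exact command before running it.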
Use pre-commit. Add a .pre-commit-config.yaml file
to your repo with the following content (changing/removing the args as you
wish):
repos:
  - repo: https://github.com/bsamseth/blackbricks
    rev: 0.6.0
    hooks:
      - id: blackbricks
        args: [--line-length=120, --indent-with-two-spaces]
Set the rev attribute to the most recent version of blackbricks.
The args are optional and can be used to set any of blackbricks' options.
If you find blackbricks useful, feel free to say so with a star. If you think it is utterly broken, you are more than welcome to contribute improvements. Please open an issue first to discuss what you want added or fixed, unless you are just adding tests; in that case, your pull request is extremely likely to be merged right away.
Certain SQL statements might not be parsed and indented properly by sqlparse, which can result in jumbled formatting. You can disable SQL formatting for a cell by adding -- nofmt to the very first line of the cell:
%sql -- nofmt
select this,
sql_will, -- be kept just
like_this
from if_that_is.what_you_need
First, make sure you have set up databricks-cli on your system (see
installation), and that you have at least one profile set up in
~/.databrickscfg. As an example:
# File: ~/.databrickscfg
[DEFAULT]
host = https://dbc-b23456-a1243.cloud.databricks.com/
username = username@example.com
password = dapi12345678901234567890
[OTHERPROFILE]
host = https://dbc-c54321-d234.cloud.databricks.com
username = name.user@example.com
password = dapi09876543211234567890
You should use access tokens instead of your actual password.
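The username in the selected profile also determines how relative remote paths are resolved: per the help text, a path that does not start with `/` is interpreted relative to `/Users/username`. The sketch below illustrates that rule using Python's standard `configparser`. It is not blackbricks' actual code; the function name `resolve_remote_path` is hypothetical, and it takes the config file's contents as a string so it can be tried without touching `~/.databrickscfg`.

```python
import configparser
import posixpath


def resolve_remote_path(path, cfg_text, profile="DEFAULT"):
    """Resolve a remote notebook path against a profile's username.

    Hypothetical helper for illustration; not part of blackbricks.
    Absolute paths (starting with "/") are returned unchanged; other
    paths are placed under /Users/<username of the chosen profile>.
    """
    if path.startswith("/"):
        return path
    # Disable interpolation so "%" characters in values are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    username = parser[profile]["username"]
    return posixpath.join("/Users", username, path)
```

With the example profiles above, `notebook.py` would resolve under `/Users/username@example.com` for DEFAULT and under `/Users/name.user@example.com` for OTHERPROFILE.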
You can then do:
$ blackbricks --remote /Users/username@example.com/notebook.py # Uses DEFAULT profile.
$ blackbricks --remote notebook.py # Equivalent to the above.
$ blackbricks --remote --profile OTHERPROFILE /Users/name.user@example.com/notebook.py
$ blackbricks --remote --profile OTHERPROFILE notebook.py # Equivalent to the above.
This means you had an old version of click installed from before, and your installation didn't upgrade it automatically. Updating your installation should do the trick, e.g. pip install -U blackbricks or similar depending on your installation method of choice.