Commit 3f85e35

Merge pull request #15 from KnowWhereGraph/file-chunking
Support file chunking and add support for s2 coverings
2 parents 1e43902 + e1c5abd commit 3f85e35

28 files changed, +1301 -536 lines changed

README.md

Lines changed: 46 additions & 34 deletions
@@ -3,71 +3,83 @@ Tool for creating index-free s2 coverings, at any level
 
 ## Background
 
-[S2](http://s2geometry.io/) is a spatial grid system with hierarchy, designed to be easily indexed and queried. Knowledge graphs commonly make use of different geospatial indices for linking spatial data to areas.
+[S2](http://s2geometry.io/) is a spatial grid system with hierarchy, designed to be easily indexed and queried. Knowledge graphs commonly make use of geospatial indices as ways to connect geospatial data.
 
-At the moment, many graph databases don't have native support for making use of the s2 index system. That's where this tool comes into play.
 
-Rather than relying on geosparql functions (which in turn rely on geosparql support and indices), you can instead pre-materialize the relations between cells and query them through the KnowWhereGraph ontology.
+Rather than relying on geosparql functions (which in turn rely on geosparql support), you can instead
+1. Generate the global spatial index (create s2 cells, as rdf statements)
+2. Pre-materialize the relations between cells (connect s2 cells with RCC8 relations)
+3. Integrate your own geometries with the S2 cells (connect the geometry to s2 cells with RCC*)
 
 This breaks the reliance on the need for the graph database to support s2 indexing and instead makes use of the predicate index from the pre-materialized spatial relations.
 
 ## Cell Generation and Integration
 
 There are two tools:
 
-1. s2.py: This generates the S2 cell structure at a desired layer. For example, generating cells at level 3 and 4.
+1. s2.py: This generates the S2 cell structure at a desired layer, or for an existing set of geometries
 2. integrate.py: This performs s2 integrations against existing geometries. These may be your own geometries, they may be the output of the s2 tool.
 
-## Running
-
-### Docker
 
+## Running
 The project dependencies can be difficult to install; docker images are provided so that the code can be run in different environments without needing to install dependencies. Rather than offering a docker image for each script, both scripts are included in the image and they can be called externally
 
-#### Generating S2 Cells
+### Generating S2 Cells for a Level
 
+Given a target S2 level, the s2 generation script will generate s2 cells at the target level.
+
+**Current Production image**
 ```bash
-git clone https://github.com/KnowWhereGraph/s2-coverings.git
-cd s2-coverings
 docker run -v ./:/s2 ghcr.io/knowwheregraph/s2-coverings:main python3 src/s2.py --level <level>
 ```
 
-#### S2 Integration
+**Running locally**
+```commandline
+git clone
+cd s2-coverings
+docker build -t s2-coverings .
+docker run -v ./:/s2 s2-coverings python3 src/s2.py --level 2
+```
 
+### Generating S2 Cells Over a Geometry
 
-```bash
-docker run -v ./:/s2 ghcr.io/knowwheregraph/s2-coverings:main python3 src/integrate.py --path <path to geometries>
-```
+Given a folder of RDF that describes s2 cells under the geosparql ontology, it's possible to generate new RDF of all the
+S2 cells that overlap the geometry at a certain level.
 
-A complete list of options can be found by running the help command on each tool. For example,
+The level is dictated by the min_level and max_level cli arguments. _These should be the same value_.
+
+**Current Production image**
 ```bash
-python3 src/s2.py --help
-options:
-  -h, --help show this help message and exit
-  --level LEVEL Level at which the s2 cells are generated for
-  --format [FORMAT] The format to write the RDF in. Options are xml, n3, turtle, nt, pretty-xml, trix, trig, nquads, json-ld, hext
-  --compressed [COMPRESSED]
-    use the S2 hierarchy to write a compressed collection of relations at various levels
+docker run -v ./:/s2 ghcr.io/knowwheregraph/s2-coverings:main python3 src/s2.py --path <path_to_geometries> --output_path=output/ --min_level=5 --max_level=5
 ```
-Results will be written to the `output/` folder. The results can then be loaded into your graph database and queried. For more information on querying with the KnowWhereGraph ontology, visit the [docs site](https://knowwheregraph.github.io/#/).
-
-### Locally
-
-Due to the steps involved with installing the s2 library bindings and different approaches needed for each architecture - running outside of Docker isn't supported. If you're inspired, the Dockerfile has all necessary steps to install the requirements to run the tool.
 
-## Development
+**Running locally**
+```commandline
+git clone
+cd s2-coverings
+docker build -t s2-coverings .
+docker run -v ./:/s2 s2-coverings python3 src/s2.py --geometry_path <path_to_geometries> --output_path=output/ --min_level=5 --max_level=5
+```
 
-### Running Locally With Docker
+### Integrating S2 Cells With Another Layer
 
-During development, you'll need a way to run the local codebase with your changes. To do this run the following,
+Given a folder of S2 cells, described with the geosparql ontology, it's possible to create the following spatial relations between the s2 cells
+on disk and the layer of your choice.
+The primary relation materialized through this process is `kwg-ont:sfWithin`. This effectively "connects" the S2 cells with the target layer.
 
+**Current Production image**
 ```bash
-docker compose up -d
-docker exec -it s2 bash
-python3 src/s2.py --level <s2_level>
+docker run -v ./:/s2 ghcr.io/knowwheregraph/s2-coverings:main python3 src/integrate.py --path <path to geometries>
 ```
 
-The source code in the container will stay up to date with the local filesystem, so there's no need to rebuild the image after each code change.
+**Using locally**
+```commandline
+git clone
+cd s2-coverings
+docker build -t s2-coverings .
+docker run -v ./:/s2 s2-coverings python3 src/integrate.py --path some_path_to_data
+
+```
 
 ### Linting
 
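The README's point about querying through a predicate index rather than a spatial index can be illustrated with a minimal sketch in plain Python. This is not the project's code: the cell and region identifiers and the `ex:touches` predicate are invented for illustration, and a real graph database maintains such an index internally. Only `kwg-ont:sfWithin` comes from the README.

```python
from collections import defaultdict

# Hypothetical pre-materialized relations as (subject, predicate, object) triples.
# All identifiers except kwg-ont:sfWithin are made up for this sketch.
triples = [
    ("cell:A", "kwg-ont:sfWithin", "region:R1"),
    ("cell:B", "kwg-ont:sfWithin", "region:R1"),
    ("cell:C", "kwg-ont:sfWithin", "region:R2"),
    ("cell:A", "ex:touches", "cell:B"),
]


def build_predicate_index(triples):
    """Group (subject, object) pairs by predicate, mimicking a predicate index."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[predicate].append((subject, obj))
    return index


def subjects_related_to(index, predicate, obj):
    """Return every subject linked to `obj` through `predicate`, sorted."""
    return sorted(s for s, o in index.get(predicate, []) if o == obj)


index = build_predicate_index(triples)
cells_in_r1 = subjects_related_to(index, "kwg-ont:sfWithin", "region:R1")
print(cells_in_r1)
```

The lookup never evaluates a geometry: once the relations are written out as statements, answering "which cells are within this region" is a plain predicate lookup.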

docker-compose.yaml

Lines changed: 0 additions & 10 deletions
This file was deleted.

src/integrate.py

Lines changed: 48 additions & 3 deletions
@@ -1,8 +1,9 @@
 from __future__ import annotations
 
 import argparse
+from pathlib import Path
 
-from lib.integrator import Integrator
+from lib.integration.integrator import Integrator
 
 if __name__ == "__main__":
     parser = argparse.ArgumentParser()
@@ -13,12 +14,56 @@
         nargs="?",
         default="./output",
     )
+    parser.add_argument(
+        "--output_path",
+        help="The path to where the files will be written to. Default is ./output",
+        type=Path,
+        nargs="?",
+        default="/output/",
+    )
     parser.add_argument(
         "--compressed",
         help="use the S2 hierarchy to write a compressed collection of relations at various levels",
         type=bool,
         nargs="?",
-        default=True,
+        default=False,
+    )
+    parser.add_argument(
+        "--tolerance",
+        help="Tolerance used during spatial operations. Defaults to 1e-2",
+        type=float,
+        nargs="?",
+        default=1e-2,
+    )
+    parser.add_argument(
+        "--min_level",
+        help="The level where generation starts",
+        type=int,
+        nargs="?",
+        default=1,
+    )
+    parser.add_argument(
+        "--max_level",
+        help="The level where generation ends",
+        type=int,
+        nargs="?",
+        default=1,
+    )
+    parser.add_argument(
+        "--format",
+        help="The format to write the RDF in. Options are xml, n3, turtle, nt, pretty-xml, trix, trig, nquads, "
+        "json-ld, hext",
+        type=str,
+        nargs="?",
+        default="ttl",
     )
     args = parser.parse_args()
-    Integrator(args.compressed, args.path)
+    Integrator(
+        args.compressed,
+        Path(args.path),
+        args.output_path,
+        args.tolerance,
+        args.min_level,
+        args.max_level,
+        args.format,
+    )
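A note on the `--compressed` option in this file: argparse applies `type` to the raw command-line string, and `bool("False")` is `True` because any non-empty string is truthy. With `type=bool`, `--compressed False` therefore parses as `True`. One possible workaround, sketched below, is a small parsing helper; `str_to_bool` and `build_parser` are hypothetical names and are not part of the repository.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse common textual booleans (hypothetical helper, not project code)."""
    lowered = value.strip().lower()
    if lowered in {"true", "t", "yes", "y", "1"}:
        return True
    if lowered in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with only the --compressed option, for illustration."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--compressed",
        type=str_to_bool,
        nargs="?",
        const=True,  # a bare `--compressed` with no value means True
        default=False,
    )
    return parser


parser = build_parser()
print(parser.parse_args([]).compressed)
print(parser.parse_args(["--compressed"]).compressed)
print(parser.parse_args(["--compressed", "False"]).compressed)
```

If the flag never needs an explicit value, `action="store_true"` is a simpler alternative, at the cost of changing how the option is passed on the command line.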

src/lib/config.py

Lines changed: 0 additions & 10 deletions
This file was deleted.

src/lib/geo/__init__.py

Whitespace-only changes.
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+from s2geometry import S2RegionCoverer
+
+
+class ConstrainedS2RegionCoverer(S2RegionCoverer):
+    """
+    An S2RegionCoverer, commonly used throughout the cli
+    """
+
+    def __init__(self, min_level, max_level):
+        super().__init__()
+        self.set_max_level(max_level)
+        self.set_min_level(min_level)
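The constructor above forwards `min_level` and `max_level` to the coverer unchecked. As a hedged sketch of what a caller could verify first, the snippet below checks the pair against the S2 level range (0 through 30) and computes how many cells exist at a level, using the fact that S2 starts from 6 cube faces and splits each cell into 4 children per level. `validate_levels` and `s2_cell_count` are hypothetical helpers, not part of this commit, and the sketch deliberately avoids importing `s2geometry`.

```python
from typing import Tuple

S2_MIN_LEVEL = 0   # level 0 is the six cube faces
S2_MAX_LEVEL = 30  # finest level defined by S2


def validate_levels(min_level: int, max_level: int) -> Tuple[int, int]:
    """Return (min_level, max_level) if they form a valid S2 level range."""
    for name, level in (("min_level", min_level), ("max_level", max_level)):
        if not S2_MIN_LEVEL <= level <= S2_MAX_LEVEL:
            raise ValueError(f"{name} must be between 0 and 30, got {level}")
    if min_level > max_level:
        raise ValueError("min_level must not exceed max_level")
    return min_level, max_level


def s2_cell_count(level: int) -> int:
    """Total number of cells at a level: 6 faces, 4 children per subdivision."""
    validate_levels(level, level)
    return 6 * 4**level


print(validate_levels(5, 5))
print(s2_cell_count(5))
```

The count grows by a factor of four per level, which is worth keeping in mind when choosing the level for a global cell generation run.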
