If you are beginning your journey with [Senzing], please start with [Senzing Quick Start guides].
You are in the [Senzing Garage] where projects are "tinkered" on. Although this GitHub repository may help you understand an approach to using Senzing, it's not considered to be "production ready" and is not considered to be part of the Senzing product. Heck, it may not even be appropriate for your application of Senzing!
Transform JSON output from the Senzing SDK for use with graph technologies, semantics, and downstream LLM integration.
This library uses poetry for demos:

```bash
poetry update
```

Otherwise, to use the library:

```bash
pip install sz_semantics
```

For the gRPC server, if you don't already have Senzing and its gRPC server installed, pull the latest Docker container:

```bash
docker pull senzing/serve-grpc:latest
```

Mask the PII values within Senzing JSON output with tokens which can be substituted back later. For example, mask PII values before calling a remote service (such as an LLM-based chat), then unmask the returned text after the roundtrip, to maintain data privacy.
```python
import json

from sz_semantics import Mask

data: dict = { "ENTITY_NAME": "Robert Smith" }

sz_mask: Mask = Mask()
masked_data: dict = sz_mask.mask_data(data)
masked_text: str = json.dumps(masked_data)
print(masked_text)

unmasked: str = sz_mask.unmask_text(masked_text)
print(unmasked)
```

For an example, run the demo1.py script with a data file which captures Senzing JSON output:

```bash
poetry run python3 demo1.py data/get.json
```

The two lists `Mask.KNOWN_KEYS` and `Mask.MASKED_KEYS` enumerate, respectively, the:
- keys for known elements which do not require masking
- keys for PII elements which require masking
Any other keys encountered will be masked by default and reported as warnings in the log. Adjust these lists as needed for a given use case.
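Conceptually, the token-substitution roundtrip works like the following simplified sketch. This is NOT the `sz_semantics` implementation, merely an illustration of the idea; the class and token names here are invented:

```python
# Simplified illustration of token-based PII masking -- a sketch only,
# not the sz_semantics implementation.
class SimpleMask:
    # keys for known elements which pass through unmasked
    KNOWN_KEYS = [ "RECORD_TYPE" ]
    # keys for PII elements which require masking
    MASKED_KEYS = [ "ENTITY_NAME", "ADDR_FULL" ]

    def __init__ (self):
        # map each generated token back to the original PII value
        self.token_map: dict = {}

    def mask_data (self, data: dict) -> dict:
        masked: dict = {}
        for key, val in data.items():
            if key in self.KNOWN_KEYS:
                masked[key] = val
            else:
                # PII keys -- and unknown keys, by default -- get masked
                token: str = f"PII_{len(self.token_map)}"
                self.token_map[token] = val
                masked[key] = token
        return masked

    def unmask_text (self, text: str) -> str:
        # replace longer tokens first, so "PII_1" never clobbers "PII_10"
        for token, val in sorted(self.token_map.items(), key = lambda kv: -len(kv[0])):
            text = text.replace(token, val)
        return text
```

The essential property is that masking is reversible only by the holder of the token map, so text sent to a remote service never contains the raw PII values.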
For work with large numbers of entities, subclass KeyValueStore to
provide a distributed key-value store (other than the Python built-in
dict default) to use for scale-out.
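As a sketch of that pattern, assuming the store exposes a dict-like interface (`__getitem__`, `__setitem__`, `__contains__`) -- the actual `KeyValueStore` base class in `sz_semantics` may differ -- here is a SQLite-backed stand-in; a production scale-out deployment would substitute a Redis or similar distributed client:

```python
import sqlite3

# Hypothetical sketch: assumes a dict-like key-value interface. The real
# sz_semantics KeyValueStore base class may differ; swap in a distributed
# backend (e.g., Redis) for true scale-out.
class SqliteKeyValueStore:
    def __init__ (self, path: str = ":memory:"):
        self.con = sqlite3.connect(path)
        self.con.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def __setitem__ (self, key: str, val: str) -> None:
        self.con.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, val))
        self.con.commit()

    def __getitem__ (self, key: str) -> str:
        row = self.con.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __contains__ (self, key: str) -> bool:
        return self.con.execute("SELECT 1 FROM kv WHERE k = ?", (key,)).fetchone() is not None
```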
To use `SzClient` to simplify access to the Senzing SDK, first launch the serve-grpc container and run it in the background:

```bash
docker run -it --publish 8261:8261 --rm senzing/serve-grpc
```

For example code which runs entity resolution on the "truthset" collection of datasets:
```python
import pathlib
import tomllib
import typing

from sz_semantics import SzClient

with open(pathlib.Path("config.toml"), mode = "rb") as fp:
    config: dict = tomllib.load(fp)

data_sources: typing.Dict[ str, str ] = {
    "CUSTOMERS": "data/truth/customers.json",
    "WATCHLIST": "data/truth/watchlist.json",
    "REFERENCE": "data/truth/reference.json",
}

sz: SzClient = SzClient(config, data_sources)
sz.entity_resolution(data_sources)

for ent_json in sz.sz_engine.export_json_entity_report_iterator():
    print(ent_json)
```

For a demo of running entity resolution on the "truthset", run the demo2.py script:

```bash
poetry run python3 demo2.py
```

This produces the export.json file which is JSONL representing the
results of a "get entity" call on each resolved entity.
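Since the export is JSONL -- one JSON object per line -- it can be consumed with just the standard library. A minimal reader sketch (the `iter_entities` helper is hypothetical, not part of `sz_semantics`):

```python
import json
import pathlib
import typing

def iter_entities (export_path: pathlib.Path) -> typing.Iterator[dict]:
    # yield one parsed entity per JSONL line, skipping blank lines
    with open(export_path, "r", encoding = "utf-8") as fp:
        for line in fp:
            line = line.strip()
            if line:
                yield json.loads(line)
```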
Note: to show the redo processing, be sure to restart the container
each time before re-running the demo2.py script -- although the
entity resolution results will be the same even without a container
restart.
Starting with a small SKOS-based taxonomy
in the domain.ttl file, parse the Senzing
entity resolution (ER) results to generate an
RDFlib semantic graph.
In other words, generate the "backbone" for constructing an Entity Resolved Knowledge Graph, as a core component of a semantic layer.
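For reference, a small SKOS-based taxonomy of the kind kept in domain.ttl generally looks like the following hypothetical Turtle fragment (the actual Senzing taxonomy definitions will differ; the `ex:` namespace here is invented):

```
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <https://example.com/taxonomy#> .

ex:Entity a skos:Concept ;
    skos:prefLabel "Entity"@en .

ex:Person a skos:Concept ;
    skos:prefLabel "Person"@en ;
    skos:broader ex:Entity .
```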
The example code below serializes the thesaurus generated from
Senzing ER results as "thesaurus.ttl" combined with the Senzing
taxonomy definitions, which can be used for constructing knowledge
graphs:
```python
import pathlib

from sz_semantics import Thesaurus

thesaurus: Thesaurus = Thesaurus()
thesaurus.load_source(Thesaurus.DOMAIN_TTL)

export_path: pathlib.Path = pathlib.Path("data/truth/export.json")

with open(export_path, "r", encoding = "utf-8") as fp_json:
    for line in fp_json:
        for rdf_frag in thesaurus.parse_iter(line, language = "en"):
            thesaurus.load_source_text(
                Thesaurus.RDF_PREAMBLE + rdf_frag,
                format = "turtle",
            )

thesaurus_path: pathlib.Path = pathlib.Path("thesaurus.ttl")
thesaurus.save_source(thesaurus_path, format = "turtle")
```

For an example, run the demo3.py script to process the JSON file data/truth/export.json which captures Senzing ER exported results:

```bash
poetry run python3 demo3.py data/truth/export.json
```

Check the resulting RDF definitions in the generated thesaurus.ttl
file.
License and Copyright
Source code for sz_semantics plus any logo, documentation, and
examples have an Apache license
which is succinct and simplifies use in commercial applications.
All materials herein are Copyright © 2025 Senzing, Inc.
Kudos to @brianmacy, @jbutcher21, @docktermj, @cj2001, @jesstalisman-ia, and the kind folks at GraphGeeks for their support.
