Clairemc3/json-anonymiser

JSON Anonymisation Tools

A collection of shell scripts for anonymising and processing JSON data using jq. These tools allow you to replace sensitive data with fake alternatives while preserving the original JSON structure.

Prerequisites

  • bash (version 4.0 or later)
  • jq (JSON processor) - Install jq
  • grep, wc, tr (standard Unix utilities)

Scripts Overview

| Script | Purpose | Use Case |
|---|---|---|
| anonymise | Generic property anonymisation | Replace any property values with fake data |
| anonymise_emails | Email-specific anonymisation | Replace email addresses using regex pattern matching |
| reduce | Array reduction | Limit the number of items in JSON arrays |
| filter | Array filtering | Filter arrays based on reference relationships |

Installation

  1. Clone or download the scripts to your desired directory
  2. Make scripts executable:
chmod +x anonymise anonymise_emails reduce filter

Core Scripts

anonymise - Generic Property Anonymisation

Replaces any property values in JSON with fake data from a text file.

Syntax

bash anonymise INPUT_FILE OUTPUT_FILE PROPERTY_PATH FAKE_DATA_FILE

Parameters

  • INPUT_FILE: Path to input JSON file
  • OUTPUT_FILE: Path to output JSON file (can be same as input for in-place editing)
  • PROPERTY_PATH: JSON property path to anonymise
  • FAKE_DATA_FILE: Text file containing fake replacement values (one per line)

Supported JSON Structures

  1. Nested properties in objects with arrays:

    bash anonymise data.json output.json people.FirstName fakeData/firstNames.txt

    For JSON like: {"people": [{"FirstName": "John"}, {"FirstName": "Jane"}]}

  2. Direct arrays of objects:

    bash anonymise data.json output.json name fakeData/firstNames.txt

    For JSON like: [{"name": "John"}, {"name": "Jane"}]

  3. Simple property arrays:

    bash anonymise data.json output.json names fakeData/firstNames.txt

    For JSON like: {"names": ["John", "Jane", "Charlie"]}
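To picture what the first case involves, here is a minimal jq sketch. It is not the anonymise script's actual code: the temporary files, the variable names, and the choice to assign replacements by cycling through the list by position are all assumptions made for illustration.

```shell
# Hypothetical sketch, NOT the anonymise script's real implementation.
# Replaces .people[].FirstName with values from a list, cycling by position.
workdir=$(mktemp -d)
printf '%s\n' '{"people": [{"FirstName": "John"}, {"FirstName": "Jane"}]}' > "$workdir/data.json"
printf '%s\n' 'Alex' 'Sam' > "$workdir/firstNames.txt"

# -R reads raw text and -s slurps the whole file into one string;
# splitting on newlines and dropping empty lines gives a JSON array of names.
fakes=$(jq -R -s -c 'split("\n") | map(select(length > 0))' "$workdir/firstNames.txt")

# to_entries on an array yields {key: index, value: element} pairs,
# so .key can be used to pick a replacement by position (assumed strategy).
result=$(jq -c --argjson fakes "$fakes" '
  .people |= [ to_entries[]
               | .value + {FirstName: $fakes[.key % ($fakes | length)]} ]
' "$workdir/data.json")

echo "$result"
rm -rf "$workdir"
```

The other two cases would differ only in the path expression: `.[]` in place of `.people[]` for a top-level array, and plain strings rather than objects for a simple property array.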

Examples

# Anonymise people first names
bash anonymise input.json output.json people.FirstName fakeData/firstNames.txt

# Anonymise last names in-place
bash anonymise data.json data.json people.LastName fakeData/lastNames.txt

# Anonymise email addresses
bash anonymise input.json output.json emails.EMailAddress fakeData/emails.txt

# Anonymise a flat array structure
bash anonymise flat.json flat.json name fakeData/firstNames.txt

anonymise_emails - Email-Specific Anonymisation

Anonymises email addresses found by regular-expression matching, so no property path is needed.

Syntax

bash anonymise_emails INPUT_FILE OUTPUT_FILE

Parameters

  • INPUT_FILE: Path to input JSON file
  • OUTPUT_FILE: Path to output JSON file

Features

  • Uses regex pattern to find email addresses anywhere in the JSON
  • Automatically uses fakeData/emails.txt for replacements
  • Provides detailed summary of anonymisation
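The idea of finding email addresses anywhere in the JSON can be sketched with jq's `walk` and `gsub`. This is an illustration under stated assumptions, not the script's own code: the regular expression is a simplified pattern chosen here, and a single constant replacement is used where the script draws from fakeData/emails.txt.

```shell
# Hypothetical sketch: replace every email-looking substring in any string value.
# The pattern and the constant replacement address are assumptions.
input='{"contact":{"note":"mail john.doe@corp.com today"},"emails":["jane@corp.org"]}'

anonymised=$(printf '%s' "$input" | jq -c '
  walk(
    if type == "string"
    then gsub("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"; "user@example.com")
    else .
    end
  )
')

echo "$anonymised"
```

Because `walk` visits every value, the property name holding the address does not matter, which is why no PROPERTY_PATH argument is needed.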

Example

bash anonymise_emails input.json anonymised_output.json

reduce - Reduce Array Size

Limits the number of items in specified JSON arrays.

Syntax

bash reduce INPUT_FILE OUTPUT_FILE PROPERTY_PATH LIMIT

Parameters

  • INPUT_FILE: Path to input JSON file
  • OUTPUT_FILE: Path to output JSON file
  • PROPERTY_PATH: Path to the array property (supports dot notation)
  • LIMIT: Maximum number of items to keep (positive integer)

Examples

# Keep only first 50 people
bash reduce input.json output.json people 50

# Keep only 10 items from nested property
bash reduce input.json output.json data.people 10

# Reduce in-place
bash reduce large_file.json large_file.json records 100
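For reference, the same kind of truncation can be written directly in jq. The sketch below is an assumption about one reasonable approach, not the reduce script's code; the variable names are hypothetical. It splits a dot-notation path into segments and keeps the first LIMIT items of the array found there.

```shell
# Hypothetical sketch: keep the first $limit items of the array at a dotted path.
input='{"data":{"people":[{"id":1},{"id":2},{"id":3}]}}'
path='data.people'
limit=2

reduced=$(printf '%s' "$input" | jq -c --arg path "$path" --argjson limit "$limit" '
  ($path | split(".")) as $p            # "data.people" -> ["data","people"]
  | setpath($p; getpath($p) | .[:$limit])
')

echo "$reduced"
```

A slice such as `.[:$limit]` returns the whole array when it holds fewer than `$limit` items, so an oversized limit is harmless in this sketch.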

filter - Filter Arrays by Reference

Filters one array based on relationships with another array.

Syntax

bash filter INPUT_FILE OUTPUT_FILE FILTER_PATH REFERENCE_PATH

Parameters

  • INPUT_FILE: Path to input JSON file
  • OUTPUT_FILE: Path to output JSON file
  • FILTER_PATH: Path to array and property to filter (e.g., addresses.person_id)
  • REFERENCE_PATH: Path to reference array and property (e.g., people.id)

Example

# Keep only addresses that have matching person IDs in people array
bash filter input.json output.json addresses.person_id people.id
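The relationship this example describes can be sketched in plain jq as follows. It is an illustrative equivalent under the assumption that matching means exact equality of the two property values; the filter script's real logic is not shown in this document.

```shell
# Hypothetical sketch: keep addresses whose person_id appears among people ids.
input='{"people":[{"id":1},{"id":3}],"addresses":[{"person_id":1,"city":"Paris"},{"person_id":2,"city":"Tokyo"},{"person_id":3,"city":"London"}]}'

filtered=$(printf '%s' "$input" | jq -c '
  [.people[].id] as $ids                 # collect the reference values once
  | .addresses |= map(
      select(.person_id as $pid | any($ids[]; . == $pid))
    )
')

echo "$filtered"
```

Here the address with person_id 2 is dropped because no person has that id, while the people array itself is left untouched.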

Fake Data Files

The fakeData/ directory contains replacement data:

fakeData/firstNames.txt

Contains 40 common first names for anonymisation.

fakeData/lastNames.txt

Contains 40 common last names for anonymisation.

fakeData/emails.txt

Contains 40 safe fake email addresses for email anonymisation.

Creating Custom Fake Data Files

Create text files with one item per line:

# Example: fakeData/cities.txt
New York
London
Tokyo
Paris
Sydney

Then use with the anonymise script:

bash anonymise data.json output.json addresses.city fakeData/cities.txt
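Since the format is one item per line, it can help to normalise a hand-made list before use. The sketch below uses only the utilities listed under Prerequisites; the file names are hypothetical, and whether the scripts themselves tolerate blank lines or Windows line endings is not stated here, so treat this as a precaution.

```shell
# Hypothetical sketch: clean a custom list so it has exactly one value per line.
workdir=$(mktemp -d)
printf 'New York\r\nLondon\r\n\r\nTokyo\r\n' > "$workdir/cities_raw.txt"

# Strip carriage returns, then drop blank or whitespace-only lines.
tr -d '\r' < "$workdir/cities_raw.txt" \
  | grep -v '^[[:space:]]*$' > "$workdir/cities.txt"

# Count the usable replacement values (tr removes padding some wc versions add).
count=$(wc -l < "$workdir/cities.txt" | tr -d ' ')
first=$(head -n 1 "$workdir/cities.txt")

echo "$count values, first is: $first"
rm -rf "$workdir"
```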

In-Place Editing

All scripts support in-place editing by using the same file for input and output:

bash anonymise data.json data.json people.FirstName fakeData/firstNames.txt
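When running jq by hand rather than through these scripts, note that `jq ... file > file` does not work, because the shell truncates the file before jq reads it. A common pattern is to write to a temporary file and move it into place only on success. The sketch below shows that pattern; how the scripts implement in-place editing internally is not described here, and the file names are hypothetical.

```shell
# Hypothetical sketch of safe in-place editing with a temporary file.
workdir=$(mktemp -d)
file="$workdir/data.json"
printf '%s\n' '{"records":[1,2,3,4]}' > "$file"

tmp=$(mktemp "$workdir/tmp.XXXXXX")
if jq -c '.records |= .[:2]' "$file" > "$tmp"; then
  mv "$tmp" "$file"          # replace the original only after jq succeeded
else
  rm -f "$tmp"               # on failure the original stays untouched
  echo "jq failed; $file left unchanged" >&2
fi

content=$(cat "$file")
echo "$content"
```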

Troubleshooting

Common Issues

  1. "Command not found" errors

    # Make scripts executable
    chmod +x anonymise anonymise_emails reduce filter
  2. "jq: command not found"

    Install jq, for example through your system's package manager, and make sure it is on your PATH.

Validation

Test your JSON structure first:

# Check if JSON is valid
jq empty input.json

# Check property exists
jq '.people[0].FirstName' input.json

# Check array structure
jq 'type' input.json
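These checks can be combined into a small pre-flight step that relies on jq's exit status. The function names and sample files below are hypothetical and not part of the repository; `jq -e` is used so that a `false` or `null` result produces a non-zero exit code.

```shell
# Hypothetical pre-flight helpers built on jq exit codes.
is_valid_json() {
  jq empty "$1" > /dev/null 2>&1
}

has_first_name() {
  # Succeeds only if the first element of .people has a FirstName key.
  jq -e '.people[0] | has("FirstName")' "$1" > /dev/null 2>&1
}

workdir=$(mktemp -d)
printf '%s\n' '{"people": [{"FirstName": "John"}]}' > "$workdir/good.json"
printf '%s\n' '{"people": [}' > "$workdir/bad.json"

if is_valid_json "$workdir/good.json"; then good_status=valid; else good_status=invalid; fi
if is_valid_json "$workdir/bad.json"; then bad_status=valid; else bad_status=invalid; fi
if has_first_name "$workdir/good.json"; then prop_status=present; else prop_status=missing; fi

echo "good.json: $good_status, bad.json: $bad_status, FirstName: $prop_status"
```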

Debugging

Useful jq commands for debugging manually:

# Test property extraction
jq '.people[].FirstName' input.json

# Test unique values
jq '[.people[].FirstName] | unique' input.json
