Privacy-by-design code scanner for data flow mapping across all storage mediums, third-party and AI integrations, including shadow AI

HoundDog.ai - Privacy-by-Design Code Scanner

What is it?

Shift Left on Privacy. No Retrofitting. No Headaches.

If your company builds applications, do not let privacy be an afterthought. Most privacy teams spend hours chasing data maps; HoundDog.ai automates this process completely. No more blind spots from privacy tools that miss hidden AI or third-party integrations, and no more chasing app owners for the latest data flows.

HoundDog.ai's static code scanner embeds privacy from the IDE to CI. It maps sensitive data flows across AI and third-party integrations (including shadow AI), detects privacy risks before code is deployed, and generates audit-ready Records of Processing Activities and Privacy Impact Assessments pre-filled with detected data flows and risks.

Download it to your machine and try it for free. You can view the output in the CLI console or export it to a Markdown file. See a sample report here.

Our scanner can be used as a CLI that installs locally to scan cloned code repositories, or as IDE plugins that flag sensitive data leaks as code is being written. The IDE plugins are available for VS Code, Cursor, JetBrains, and Eclipse. The HoundDog.ai Cloud Platform (part of the paid plan) also integrates with source code management platforms, connecting directly to GitHub, GitLab, and Bitbucket (both cloud and enterprise versions) to automatically scan code, block PRs, and leave actionable PR comments.

Features (Free vs. Paid)

Supported Languages

  • Free: Python, JavaScript, TypeScript
  • Paid: all languages in the free plan, plus Java, C#, Go, OpenAPI, GraphQL, SQL

Data Elements

  • Free: 100+ sensitive data elements with extensive coverage of auth tokens, PII, PHI, and CHD
  • Paid: all data elements in the free plan, plus:
    • User-defined data elements - add custom patterns to detect sensitive data unique to your organization.
    • [Coming Soon] AI-detected data elements, enabled through integration with any LLM running in your environment.

Data Sinks

  • Free:
    • Risky mediums in traditional apps: logs, files, JWT tokens, local storage, cookies
    • Privacy risks in AI applications: prompt analysis (tracking the types of sensitive data exposed in OpenAI, Anthropic, and Gemini prompts), prompt logging, and saving prompts to temporary files
  • Paid: all data sinks in the free plan, plus:
    • Other third-party integrations (SDK + API) - more than 100 integrations covering monitoring, sales/marketing, web analytics, etc.
    • [Coming Soon] AI-detected data sinks, leveraging an integration with any LLM running within your environment
Features

Sensitive Data Leak Vulnerabilities
Identify when sensitive data is exposed in risky mediums, often due to entire user objects or tainted variables leaking into sinks. Includes AI-specific cases like LLM prompts capturing excessive data.
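
As an illustration only (not actual scanner output; the user record, field names, and `redact` helper are hypothetical), this is the kind of pattern such a check typically flags, along with a minimal remediation:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Hypothetical user record containing PII
user = {"name": "Jane Doe", "email": "jane@example.com", "ssn": "123-45-6789"}

# Risky: logging the entire object sends the SSN into the log sink
logger.info("Processing user: %s", user)

def redact(record: dict, allowed: frozenset = frozenset({"name"})) -> dict:
    """Keep only fields allowlisted as safe for logging."""
    return {k: v for k, v in record.items() if k in allowed}

# Safer: only the allowlisted fields reach the log
logger.info("Processing user: %s", redact(user))
```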

Sensitive Data Map
View all sensitive data elements detected in the scanned codebase, along with their sensitivity levels and number of occurrences.

IDE Plugins
Detect sensitive data leak issues as code is being written. Available for VS Code, JetBrains, and Eclipse.

Data Flow Intelligence

  • Visualize the flow of sensitive data across all storage mediums and third-party integrations, providing evidence-based data mapping that eliminates guesswork and minimizes errors.

Automated Privacy Compliance

  • Automate the creation of Records of Processing Activities (RoPA), Privacy Impact Assessments (PIA), and Data Protection Impact Assessments (DPIA), all pre-populated with the data flows and privacy risks detected by the scanner.
  • Catch data processing agreement (DPA) violations caused by sensitive data oversharing with third-party integrations early, avoiding costly production issues.
  • Get real-time alerts when new types of sensitive data elements are introduced to the codebase, categorized by sensitivity level.

Developer Workflow Integration

  • Integrate with GitHub, GitLab, Bitbucket (Cloud & Enterprise).
  • Automatically scan code, block non-compliant PRs, and get actionable comments.

Enterprise-Ready Platform

  • SAML/SSO (Okta, Entra ID, Google).
  • Email & Slack alerting, Jira integration, and SIEM-compatible audit logs.
  • SOC 2-Compliant platform.

Requirements

For standalone binary:

  • Operating System: Linux, macOS, Windows
  • CPU Architecture: AMD64 (x86-64), ARM64
  • Shell: Bash, Zsh, Fish (Linux/macOS), or PowerShell (Windows)
  • Memory: 2GB+ of free memory

For Docker image:

  • Docker Engine (Linux) or Docker Desktop (Windows/macOS)
  • Memory: 4GB+ allocated to Docker

We recommend at least 4 CPU cores and 8GB of memory for optimal performance.

Installation

Run the commands below in your terminal to install the scanner or to upgrade to the latest version.

Linux and macOS

To install in user directory at ~/.hounddog/bin/hounddog:

curl -fsSL https://raw.githubusercontent.com/hounddogai/hounddog/main/install.sh | sh

To install system-wide at /usr/local/bin/hounddog:

curl -fsSL https://raw.githubusercontent.com/hounddogai/hounddog/main/install.sh | sudo sh

Windows

To install in user directory at %LocalAppData%\hounddog\bin\hounddog.exe:

irm https://raw.githubusercontent.com/hounddogai/hounddog/main/install.ps1 | iex

To install system-wide at C:\Program Files\hounddog\bin\hounddog.exe, run the same command in an elevated PowerShell session (run as administrator):

irm https://raw.githubusercontent.com/hounddogai/hounddog/main/install.ps1 | iex

Manual Download

Download the standalone binary from our releases page.

Usage

Free Version

To scan a directory using the standalone binary:

hounddog scan [DIRPATH] [OPTIONS]

To scan a directory using the Docker image:

docker run --pull=always -it --rm -v <DIRPATH>:/data hounddogai/hounddog hounddog scan [OPTIONS]

Use --help to see all available command-line options:

# For standalone binary
hounddog scan --help

# For Docker image
docker run --pull=always -it --rm hounddogai/hounddog hounddog scan --help

HoundDog.ai respects your .gitignore file. To ignore additional files or folders, create a .hounddogignore file at the root of the target repository using the .gitignore pattern format.
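
For example, a minimal .hounddogignore (the paths below are hypothetical) might exclude test fixtures and generated or vendored code from scans:

```gitignore
# Uses the same pattern syntax as .gitignore
tests/fixtures/
*.generated.ts
vendor/
```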

Paid Version

To use the paid features, export the API key (generated from the HoundDog.ai Cloud Platform) before running the hounddog scan command.

export HOUNDDOG_API_KEY="your_hounddog_api_key_here"

If you are using the Docker image, you must provide the -e option in the docker run command to pass the environment variable from your host to the Docker container:

docker run -v <path>:/data -e HOUNDDOG_API_KEY=$HOUNDDOG_API_KEY hounddogai/hounddog hounddog scan

Please refer to our documentation for details on using a HoundDog.ai API key to unlock paid features.

Quickstart + Markdown Reports

For quick demonstration, we provide a test application with deliberate privacy flaws.

First, clone the repository:

git clone https://github.com/hounddogai/hounddog-test-python-app.git

Scan it with the --output-format=markdown option to generate an offline Markdown report:

hounddog scan hounddog-test-python-app --output-format=markdown

Open the generated file hounddog-test-python-app/hounddog-report-{timestamp}.md in your browser. We recommend using the Markdown Viewer Chrome extension with the mermaid and toc settings enabled. See this for more details.

See a sample report here.

Uninstallation

Linux and macOS

If installed in user directory at ~/.hounddog/bin/hounddog:

rm -r ~/.hounddog

If installed system-wide at /usr/local/bin/hounddog:

sudo rm /usr/local/bin/hounddog

Windows

If installed in user directory at %LocalAppData%\hounddog\bin\hounddog.exe:

Remove-Item -Recurse -Force "$env:LocalAppData\hounddog"

If installed system-wide, run in elevated PowerShell session (run as administrator):

Remove-Item -Recurse -Force "$env:ProgramFiles\hounddog"

Use Cases

Early prevention of sensitive data leaks in logs (and other risky mediums)

Sponsoring Team

  • Data Security
  • Privacy

Team Owning the Solution

  • Application Security (given their role in managing other code scanners in the CI pipelines)

The Challenge

When sensitive data leaks into logs (or other risky mediums), it’s a clear violation of:

  • GDPR, CCPA, and similar privacy laws for PII
  • HIPAA for PHI
  • PCI DSS for CHD

Relying on DLP is reactive, unreliable, and painfully slow. Teams often spend weeks scrubbing logs, tracing exposure across downstream systems, and patching the code after the fact.

The Solution

  • HoundDog.ai analyzes code early in the development lifecycle to catch sensitive data exposure in risky mediums such as logs, files, local storage, and cookies. Most issues are caused by entire user objects or tainted variables leaking into risky data sinks, often due to unintentional developer mistakes or AI-generated code.
  • For AI applications, the scanner also detects leaks in AI-specific mediums like prompt logs, temporary files, and LLM prompts that capture more sensitive data than intended. This proactive approach reduces dependence on reactive tools like DLP or downstream sanitization of LLM inputs and outputs.
  • Enables data minimization from the earliest stages of development, preventing issues before they reach production.
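
The data-minimization point for LLM prompts can be sketched as follows (the member record, field names, and `build_prompt` helper are hypothetical illustrations, not part of HoundDog.ai's API):

```python
member = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "diagnosis": "hypertension",  # PHI the task does not need
}

# Risky: interpolating the whole record exposes PHI in the LLM prompt
risky_prompt = f"Summarize plan options for this member: {member}"

def build_prompt(record: dict, fields: tuple = ("name",)) -> str:
    """Build a prompt from only the fields the task actually needs."""
    minimal = {k: record[k] for k in fields}
    return f"Summarize plan options for this member: {minimal}"

# Safer: the prompt carries only the allowlisted fields
safe_prompt = build_prompt(member)
```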

Evidence-based data mapping for all internally-built applications

Sponsoring Team

  • Privacy

Team Owning the Solution

  • Application Security (given their role in managing other code scanners in the CI pipelines)

The Challenge

  • Data mapping (documenting all types of data collected, processed, and shared) is the cornerstone of all major privacy frameworks.
  • Today, many companies rely on manual surveys and spreadsheets for data collection, leading to incomplete and outdated data maps that fail to reflect the latest code changes.
  • Data privacy platforms still rely on reactive data collection, with discovery mechanisms that depend heavily on sampling and surface-level scans, making them prone to missing critical data flows.
  • These platforms require prior knowledge of all third-party tools in use, making them blind to shadow AI and third-party integrations introduced directly in the code by developers.
  • Operating post-deployment and disconnected from code-level changes, these tools create a significant lag in identifying and mitigating risks.

The Solution

  • HoundDog.ai analyzes code early to deliver evidence-based data mapping at the speed of development.
  • Privacy teams can accurately document sensitive data flows across all storage mediums (e.g., logs, files, local storage, databases) as well as AI and third-party integrations (APIs and SDKs).
  • Real-time alerts notify teams when new sensitive data elements are introduced in the code, allowing time to review and address issues before they reach production.
  • Seamless integration across the development lifecycle (IDE, CI/CD) enables privacy by design at scale.
  • Automates the generation of RoPA, PIA, and DPIA reports, pre-populated with detected data flows and privacy risks - eliminating manual data collection via surveys and spreadsheets.

Sensitive Data Leak Prevention: Tool Comparison

Each method below is compared on its pros and cons and on the risky mediums it typically covers.
HoundDog.ai

Pros:

  • Early detection across all stages of development from IDE to CI.
  • Extensive out-of-the-box coverage with support for 100+ sensitive data types (PII, PHI, PIFI, CHD, etc.), risky data sinks (hundreds of third-party SDKs), and sanitization gaps (flags only unsanitized data to reduce noise).
  • Deep coverage of AI-specific flows, including unsanitized inputs to and outputs from LLMs.
  • Highly extensible with custom data types and granular allowlists to enforce data policies and uphold DPAs.
  • [Coming Soon] AI-powered and integrated with any LLM running within the environment to extend coverage with minimal tuning.

Cons:

  • May miss data generated only at runtime

Traditional Risky Mediums:

  • Logs
  • Files
  • Local Storage
  • Cookies
  • Third-Party (API + SDK)

AI-Specific Risky Mediums:

  • Prompt Logs
  • Temp Files
  • Prompt I/O

DIY SAST

Pros:

  • Customizable - rules can be tailored to specific data types

Cons:

  • Very time consuming, as it requires significant effort to create and maintain rules.
  • Brittle RegEx patterns are hard to scale and need frequent updates as the codebase evolves.
  • Lacks context around data sensitivity and sanitization.
  • Poor at tracking data sinks - typically limited to logs.
  • Fails to scale effectively across large or complex environments.

Traditional Risky Mediums:

  • Logs
  • Rarely covers other mediums

AI-Specific Risky Mediums:

  • Prompt Logs (with effort)

DLP

Pros:

  • Detects sensitive data in transit or at rest across network and storage layers.

Cons:

  • Reactive rather than preventative - typically catches issues after data exposure has occurred.
  • Remediation is slow and operationally intensive, often taking weeks: teams must scrub logs or storage, stop data ingestion, and work backward to trace the source of the leak with little context.
  • Lacks code-level visibility, making it difficult to pinpoint the exact logic or source responsible.
  • Limited insight into business logic, SDKs, or AI-specific data handling.

Traditional Risky Mediums:

  • Logs
  • Files

AI-Specific Risky Mediums:

  • None

License

View license information for HoundDog.ai's software.

Contact

If you need any help or would like to send us feedback, please create a GitHub issue or shoot us an email at support@hounddog.ai.
