Commit 262b227

Merge pull request #60 from Jalen-Stephens/55-feature-modify-c2patool-for-integration-into-ml-model
2 parents 42c3cfb + 4f6ff61 commit 262b227

8 files changed: +595 -210 lines changed

citations.md

Lines changed: 143 additions & 0 deletions
@@ -3518,3 +3518,146 @@ Used AI to diagnose and patch key backend issues affecting C2PA tool invocation
### **Attribution Statement**

Portions of this commit were generated with assistance from OpenAI ChatGPT (GPT-5) on November 19, 2025. All AI-generated recommendations and code were reviewed, tested, and validated by the development team prior to inclusion.


### **Commit / Ticket Reference**

* **Commit:** `[feat] Prepare C2PAtool for integration into ML model (#55)`
* **Ticket:** `#55 — [Feature] Modify C2PATool for Future ML Model Integration`
* **NOTE:** `Also created integration tests for C2paToolInvoker replacing previous unit tests`
* **Date:** November 22, 2025
* **Team Member:** Isaac Schmidt

---

### **AI Tool Information**

* **Tool Used:** OpenAI ChatGPT (GPT-5.1 Thinking)
* **Access Method:** ChatGPT Web (.edu academic access)
* **Configuration:** Default model settings
* **Cost:** $0 (no paid API calls)

---

### **Purpose of AI Assistance**

Used AI to design and implement a C2PA metadata extraction layer that is future-proof for ML integration and resilient to tool/manifest failures. Assistance included:

* Defining a stable, ML-friendly output schema for C2PA metadata with primitive fields:
  * `c2pa_hasManifest`, `c2pa_manifestCount`, `c2pa_claimGenerator`,
    `c2pa_claimGeneratorIsAI`, `c2pa_errorFlag`, `c2pa_errorMessage`.
* Refactoring `C2paToolInvoker` from a raw JSON-returning method to an API that always returns a fully-populated metadata object instead of throwing on common failure cases (e.g., “no claim found”).
* Designing soft-failure semantics so missing manifests and CLI errors are represented as numeric flags instead of exceptions, making the pipeline safe for later logistic regression / feature-vector work.
* Planning how this C2PA metadata will become the first feature block in a larger computer-vision + ML pipeline, with OpenCV-derived features to be appended later.

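The soft-failure semantics described above can be sketched as follows. This is a minimal illustration, not the committed code: the field names and the `noManifest()` / `error(String)` factories come from this entry, while `SoftFailure.fromFailedRun`, `numericFeatures()`, the case-insensitive substring check, and the zeroed manifest fields in the error case are assumptions made for the example. The caller is assumed to invoke `fromFailedRun` only after c2patool has exited with a non-zero code.

```java
import java.util.Locale;

/** Sketch of the ML-facing result: numeric flags plus two nullable strings. */
final class C2paMetadata {
    final int c2pa_hasManifest;
    final int c2pa_manifestCount;
    final String c2pa_claimGenerator;   // null when no generator was found
    final int c2pa_claimGeneratorIsAI;
    final int c2pa_errorFlag;
    final String c2pa_errorMessage;     // null when there was no error

    C2paMetadata(int hasManifest, int manifestCount, String claimGenerator,
                 int claimGeneratorIsAI, int errorFlag, String errorMessage) {
        this.c2pa_hasManifest = hasManifest;
        this.c2pa_manifestCount = manifestCount;
        this.c2pa_claimGenerator = claimGenerator;
        this.c2pa_claimGeneratorIsAI = claimGeneratorIsAI;
        this.c2pa_errorFlag = errorFlag;
        this.c2pa_errorMessage = errorMessage;
    }

    /** Soft case: the tool ran, but the file carries no claim. */
    static C2paMetadata noManifest() {
        return new C2paMetadata(0, 0, null, 0, 0, null);
    }

    /** Hard case: a CLI/IO/JSON failure reported as a flag instead of an exception. */
    static C2paMetadata error(String message) {
        // Assumption: manifest fields are zeroed when the result is unknown.
        return new C2paMetadata(0, 0, null, 0, 1, message);
    }

    /** Hypothetical helper: the numeric fields in a fixed order for a feature vector. */
    double[] numericFeatures() {
        return new double[] {
            c2pa_hasManifest, c2pa_manifestCount, c2pa_claimGeneratorIsAI, c2pa_errorFlag
        };
    }
}

/** Hypothetical classifier for a c2patool run that exited with a non-zero code. */
final class SoftFailure {
    static C2paMetadata fromFailedRun(int exitCode, String output) {
        String raw = output == null ? "" : output;
        // "no claim found" means the image simply has no manifest: not an error.
        if (exitCode != 0 && raw.toLowerCase(Locale.ROOT).contains("no claim found")) {
            return C2paMetadata.noManifest();
        }
        // Everything else becomes errorFlag = 1 with a populated message.
        return C2paMetadata.error("c2patool exited with code " + exitCode + ": " + raw.trim());
    }
}
```
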
---

### **Prompts / Interaction Summary**

* “Can AI produce images with metadata indicating the image was taken on a camera?”
* “Does C2patool work with HEIC images / does iPhone use C2PA data / does Instagram retain C2PA data?”
* “How would I generate a test case for a 'present but invalid' manifest? Also which exit code can I expect from C2patool for this response?”
* “Ensure that it fulfills this ticket description. It will be integrated into a Logarithmic regression model in the next iteration.”
* “Modify C2paToolInvoker so every invocation returns ML-ready metadata instead of throwing on ‘no claim found’.”
* “Generate an AI image that will fail C2patool via invalid manifest” → guidance on tampering a valid C2PA-signed image.
* “Create the unit test” → requested JUnit 5 tests using the repo-local `./tools/c2patool/c2patool` binary.
* “This is the output I received once uploading an AI generated image with a valid manifest…” → diagnosing why `c2pa_claimGenerator` was null and how to read `claim_generator_info`.

---

### **Resulting Artifacts**

* **New ML-ready C2PA metadata schema** implemented in `C2paToolInvoker`:
  * Introduced `C2paMetadata` value type with fields:
    * `int c2pa_hasManifest`
    * `int c2pa_manifestCount`
    * `String c2pa_claimGenerator`
    * `int c2pa_claimGeneratorIsAI`
    * `int c2pa_errorFlag`
    * `String c2pa_errorMessage`
  * Added factory methods:
    * `C2paMetadata.noManifest()` for the soft “no claim found” case.
    * `C2paMetadata.error(String message)` for hard CLI/JSON failures.
* **Refactored C2PA invocation logic**:
  * Replaced the old `extractManifest(File)` (throwing `IOException` on errors) with `extractMetadata(File)` that:
    * Invokes `./tools/c2patool/c2patool` with `-d` for detailed JSON.
    * Interprets non-zero exit codes with `"no claim found"` as a **soft success** (no manifest, no error).
    * Converts all other CLI/IO/JSON issues into `c2pa_errorFlag = 1` and a populated `c2pa_errorMessage`.
  * Logs raw JSON from c2patool at debug level for local debugging without exposing it to the ML layer.
* **JSON parsing and claim generator extraction**:
  * Implemented a JSON parser that:
    * Counts manifests via the top-level `manifests` object to populate `c2pa_manifestCount`.
    * Uses `active_manifest` to identify the primary manifest, with a fallback to the first manifest.
    * Extracts the generator from the modern field:
      * `claim.claim_generator_info.name`
    * Falls back to legacy `claim.claim_generator` if present.
  * Applies a configurable keyword list to set `c2pa_claimGeneratorIsAI` (e.g., matches “ChatGPT”, “DALL·E”, “midjourney”, “stable diffusion”, “gpt”, etc. via lowercase substring matching).
* **Integration with analysis pipeline (`AnalyzeService`)**:
  * Updated `runExtractionAndFinalize` to:
    * Call `c2paToolInvoker.extractMetadata(tempFile)` instead of returning raw manifest JSON.
    * Serialize `C2paMetadata` to JSON via `ObjectMapper` and store it in `AnalysisReport.details`.
    * Treat C2PA-related issues as:
      * **DONE** with soft “no manifest” metadata when appropriate.
      * **FAILED** only for IO-level or unexpected exceptions (e.g., download errors), reusing the existing `handleGenericFailure` path.
* **Test scaffolding and integration tests**:
  * Designed JUnit 5 tests (`C2paToolInvokerIntegrationTest`) that:
    * Use repo-local c2patool at `./tools/c2patool/c2patool` (no system install required).
    * Expect test images under `src/test/resources/c2pa/`:
      * `valid_ai.png`: AI-generated image with a valid C2PA manifest.
      * `no_manifest.jpg`: ordinary image with no C2PA provenance.
    * Generate a tampered image in a temporary directory by flipping a byte in the file to simulate “manifest present but invalid”.
  * Each test asserts that:
    * Valid AI image → `c2pa_hasManifest = 1`, `c2pa_manifestCount >= 1`, `c2pa_errorFlag = 0`, non-null `c2pa_claimGenerator`, and `c2pa_claimGeneratorIsAI = 1`.
    * Tampered AI image → still reports `c2pa_hasManifest = 1` and a consistent schema; reserved for future “manifestValid” flag extension.
    * No-manifest image → `c2pa_hasManifest = 0`, `c2pa_manifestCount = 0`, `c2pa_errorFlag = 0`, and `c2pa_errorMessage = null`.

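The byte-flip tampering step used by the integration tests could look roughly like the sketch below. The class and method names (`Tamper`, `flipByte`, `writeTamperedCopy`) and the `tampered_` file prefix are hypothetical, and this entry does not record which offset the real tests flip, so the offset is left to the caller. How c2patool reacts may depend on where the flipped byte falls in the file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical test helper: corrupts a copy of a signed image by one byte. */
final class Tamper {

    /** Returns a copy of the input with every bit of the byte at offset inverted. */
    static byte[] flipByte(byte[] original, int offset) {
        if (offset < 0 || offset >= original.length) {
            throw new IllegalArgumentException(
                "offset " + offset + " is outside a file of " + original.length + " bytes");
        }
        byte[] copy = original.clone();       // never modify the caller's array
        copy[offset] = (byte) ~copy[offset];  // bitwise NOT always changes the value
        return copy;
    }

    /** Writes the tampered copy into targetDir (for example a JUnit temp directory). */
    static Path writeTamperedCopy(Path source, Path targetDir, int offset) throws IOException {
        byte[] tampered = flipByte(Files.readAllBytes(source), offset);
        Path target = targetDir.resolve("tampered_" + source.getFileName());
        return Files.write(target, tampered);
    }
}
```
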
---

### **Verification**

* **Local functional verification**:
  * Ran the updated analysis pipeline with:
    * AI-generated image containing a valid C2PA manifest → observed JSON like:
      ```json
      {
        "c2pa_hasManifest": 1,
        "c2pa_manifestCount": 2,
        "c2pa_claimGenerator": "ChatGPT",
        "c2pa_claimGeneratorIsAI": 1,
        "c2pa_errorFlag": 0,
        "c2pa_errorMessage": null
      }
      ```
    * Same image after byte-level tampering → manifest still detected as present, schema stable, reserved for future “manifest validity” feature.
    * A plain image with no C2PA data → confirmed it returns:
      ```json
      {
        "c2pa_hasManifest": 0,
        "c2pa_manifestCount": 0,
        "c2pa_claimGenerator": null,
        "c2pa_claimGeneratorIsAI": 0,
        "c2pa_errorFlag": 0,
        "c2pa_errorMessage": null
      }
      ```
  * Verified that no exceptions are thrown to the caller for C2PA-specific issues; all errors are converted into numeric flags.
* **Build and test flow**:
  * Maven build and tests:
    ```bash
    mvn clean test
    ```
  * Confirmed:
    * `C2paToolInvoker` runs successfully using `./tools/c2patool/c2patool`.
    * `AnalyzeService` stores the new C2PA metadata schema in `AnalysisReport.details`.
    * “No manifest” and other C2PA edge cases no longer cause FAILED reports unless there is a true IO or unexpected runtime error.
* **Manual inspection / logging**:
  * Reviewed debug logs containing raw c2patool JSON output to confirm:
    * `manifests` and `active_manifest` are parsed correctly.
    * `claim_generator_info.name` is correctly mapped to `c2pa_claimGenerator`.
    * AI keyword matching behaves as expected (e.g., “ChatGPT” → `c2pa_claimGeneratorIsAI = 1`).

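The AI keyword matching checked above (lowercase substring matching against a configurable list) can be illustrated with a small sketch. `AiGeneratorMatcher` and `isAi` are hypothetical names, and the committed code may load or store its keyword list differently. One consequence of substring matching worth keeping in mind: a short keyword such as “gpt” matches any generator name that happens to contain those letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical matcher: flags a claim generator as AI via lowercase substring checks. */
final class AiGeneratorMatcher {
    private final List<String> keywords = new ArrayList<>();

    AiGeneratorMatcher(List<String> configuredKeywords) {
        // Normalize once so each lookup only lowercases the generator name.
        for (String keyword : configuredKeywords) {
            keywords.add(keyword.toLowerCase(Locale.ROOT));
        }
    }

    /** Returns 1 if any keyword occurs in the generator name, else 0 (also for null). */
    int isAi(String claimGenerator) {
        if (claimGenerator == null) {
            return 0;
        }
        String name = claimGenerator.toLowerCase(Locale.ROOT);
        for (String keyword : keywords) {
            if (name.contains(keyword)) {
                return 1;
            }
        }
        return 0;
    }
}
```
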
---

### **Attribution Statement**

Portions of this commit were generated and refined with assistance from OpenAI ChatGPT (GPT-5.1 Thinking) on November 22, 2025. All AI-generated code, tests, and design recommendations were reviewed, adapted, and validated by the developer (Isaac Schmidt) before being committed to the repository.
