`citations.md`: 143 additions, 0 deletions
@@ -3518,3 +3518,146 @@ Used AI to diagnose and patch key backend issues affecting C2PA tool invocation
### **Attribution Statement**
Portions of this commit were generated with assistance from OpenAI ChatGPT (GPT-5) on November 19, 2025. All AI-generated recommendations and code were reviewed, tested, and validated by the development team prior to inclusion.
### **Commit / Ticket Reference**

* **Commit:** `[feat] Prepare C2PAtool for integration into ML model (#55)`
* **Ticket:** `#55 — [Feature] Modify C2PATool for Future ML Model Integration`
* **NOTE:** `Also created integration tests for C2paToolInvoker replacing previous unit tests`
* **Date:** November 22, 2025
* **Team Member:** Isaac Schmidt

---

### **AI Tool Information**

* **Tool Used:** OpenAI ChatGPT (GPT-5.1 Thinking)
* **Access Method:** ChatGPT Web (.edu academic access)
* **Configuration:** Default model settings
* **Cost:** $0 (no paid API calls)

---

### **Purpose of AI Assistance**

Used AI to design and implement a C2PA metadata extraction layer that is ready for later ML integration and resilient to tool and manifest failures. Assistance included:

* Defining a stable, ML-friendly output schema for C2PA metadata with primitive fields:
* Refactoring `C2paToolInvoker` from a method that returns raw JSON to an API that always returns a fully populated metadata object instead of throwing on common failure cases (e.g., “no claim found”).
* Designing soft-failure semantics so that missing manifests and CLI errors are represented as numeric flags instead of exceptions, making the pipeline safe for later logistic regression / feature-vector work.
* Planning how this C2PA metadata will become the first feature block in a larger computer-vision + ML pipeline, with OpenCV-derived features to be appended later.

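
The feature-vector idea in the last two points can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: `C2paFeatures`, `toVector`, and `FeatureBlocks.concat` are hypothetical names, the field order is arbitrary, and the later block stands in for whatever OpenCV-derived values are eventually appended. Because every field is numeric and always present, a downstream model never has to handle an exception or a missing column.

```java
import java.util.Arrays;

/** Hypothetical holder for the numeric C2PA flags (names are illustrative only). */
record C2paFeatures(int hasManifest, int manifestCount, int claimGeneratorIsAI, int errorFlag) {

    /** Flattens the flags into a fixed-order numeric block. */
    double[] toVector() {
        return new double[] {hasManifest, manifestCount, claimGeneratorIsAI, errorFlag};
    }
}

final class FeatureBlocks {
    private FeatureBlocks() {}

    /** Appends a later feature block (e.g. image-derived values) after the C2PA block. */
    static double[] concat(double[] c2paBlock, double[] laterBlock) {
        double[] out = Arrays.copyOf(c2paBlock, c2paBlock.length + laterBlock.length);
        System.arraycopy(laterBlock, 0, out, c2paBlock.length, laterBlock.length);
        return out;
    }
}
```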
---
### **Prompts / Interaction Summary**

* “Can AI produce images with metadata indicating the image was taken on a camera?”
* “Does C2patool work with HEIC images / does iPhone use C2PA data / does Instagram retain C2PA data?”
* “How would I generate a test case for a 'present but invalid' manifest? Also which exit code can I expect from C2patool for this response?”
* “Ensure that it fulfills this ticket description. It will be integrated into a Logarithmic regression model in the next iteration.”
* “Modify C2paToolInvoker so every invocation returns ML-ready metadata instead of throwing on ‘no claim found’.”
* “Generate an AI image that will fail C2patool via invalid manifest” → guidance on tampering a valid C2PA-signed image.
* “Create the unit test” → requested JUnit 5 tests using the repo-local `./tools/c2patool/c2patool` binary.
* “This is the output I received once uploading an AI generated image with a valid manifest…” → diagnosing why `c2pa_claimGenerator` was null and how to read `claim_generator_info`.

---

### **Resulting Artifacts**

* **New ML-ready C2PA metadata schema** implemented in `C2paToolInvoker`:
  * Introduced `C2paMetadata` value type with fields:
    * `int c2pa_hasManifest`
    * `int c2pa_manifestCount`
    * `String c2pa_claimGenerator`
    * `int c2pa_claimGeneratorIsAI`
    * `int c2pa_errorFlag`
    * `String c2pa_errorMessage`
  * Added factory methods:
    * `C2paMetadata.noManifest()` for the soft “no claim found” case.
    * `C2paMetadata.error(String message)` for hard CLI/JSON failures.
* **Refactored C2PA invocation logic**:
  * Replaced the old `extractManifest(File)` (which threw `IOException` on errors) with `extractMetadata(File)`, which:
    * Invokes `./tools/c2patool/c2patool` with `-d` for detailed JSON.
    * Interprets a non-zero exit code accompanied by `"no claim found"` as a **soft success** (no manifest, no error).
    * Converts all other CLI/IO/JSON issues into `c2pa_errorFlag = 1` and a populated `c2pa_errorMessage`.
    * Logs raw JSON from c2patool at debug level for local debugging without exposing it to the ML layer.
* **JSON parsing and claim generator extraction**:
  * Implemented a JSON parser that:
    * Counts manifests via the top-level `manifests` object to populate `c2pa_manifestCount`.
    * Uses `active_manifest` to identify the primary manifest, with a fallback to the first manifest.
    * Extracts the generator from the modern field:
      * `claim.claim_generator_info.name`
    * Falls back to legacy `claim.claim_generator` if present.
    * Applies a configurable keyword list to set `c2pa_claimGeneratorIsAI` (e.g., matches “ChatGPT”, “DALL·E”, “midjourney”, “stable diffusion”, “gpt”, etc. via lowercase substring matching).
* **Integration with analysis pipeline (`AnalyzeService`)**:
  * Updated `runExtractionAndFinalize` to:
    * Call `c2paToolInvoker.extractMetadata(tempFile)` instead of returning raw manifest JSON.
    * Serialize `C2paMetadata` to JSON via `ObjectMapper` and store it in `AnalysisReport.details`.
    * Treat C2PA-related issues as:
      * **DONE** with soft “no manifest” metadata when appropriate.
      * **FAILED** only for IO-level or unexpected exceptions (e.g., download errors), reusing the existing `handleGenericFailure` path.
* **Test scaffolding and integration tests**:
  * Designed JUnit 5 tests (`C2paToolInvokerIntegrationTest`) that:
    * Use repo-local c2patool at `./tools/c2patool/c2patool` (no system install required).
    * Expect test images under `src/test/resources/c2pa/`:
      * `valid_ai.png`: AI-generated image with a valid C2PA manifest.
      * `no_manifest.jpg`: ordinary image with no C2PA provenance.
    * Generate a tampered image in a temporary directory by flipping a byte in the file to simulate “manifest present but invalid”.
  * Each test asserts that:
    * Valid AI image → `c2pa_hasManifest = 1`, `c2pa_manifestCount >= 1`, `c2pa_errorFlag = 0`, non-null `c2pa_claimGenerator`, and `c2pa_claimGeneratorIsAI = 1`.
    * Tampered AI image → still reports `c2pa_hasManifest = 1` and a consistent schema; reserved for future “manifestValid” flag extension.
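
The soft-failure handling and keyword matching described in this list could look roughly like the sketch below. It is an illustration under assumptions, not the committed implementation: the field names and the two factory methods follow the list above, while `C2paSoftFailure`, `fromFailure`, `claimGeneratorIsAI`, the exact keyword list, the error-message wording, and the choice to report `c2pa_hasManifest = 0` on hard errors are all hypothetical. JSON parsing is left out so the block stays dependency-free.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Sketch of the value type; field names follow the schema in the list above. */
record C2paMetadata(
        int c2pa_hasManifest,
        int c2pa_manifestCount,
        String c2pa_claimGenerator,
        int c2pa_claimGeneratorIsAI,
        int c2pa_errorFlag,
        String c2pa_errorMessage) {

    /** Soft "no claim found" case: no manifest, and no error either. */
    static C2paMetadata noManifest() {
        return new C2paMetadata(0, 0, null, 0, 0, null);
    }

    /** Hard CLI/IO/JSON failure: error flag set, message populated (other values assumed). */
    static C2paMetadata error(String message) {
        return new C2paMetadata(0, 0, null, 0, 1, message);
    }
}

final class C2paSoftFailure {
    private C2paSoftFailure() {}

    // Assumed keyword list; the real one is described as configurable.
    static final List<String> AI_KEYWORDS =
            List.of("chatgpt", "dall·e", "midjourney", "stable diffusion", "gpt");

    /** Lowercase substring match of the claim generator against the keyword list. */
    static int claimGeneratorIsAI(String claimGenerator) {
        if (claimGenerator == null) {
            return 0;
        }
        String lower = claimGenerator.toLowerCase(Locale.ROOT);
        return AI_KEYWORDS.stream().anyMatch(lower::contains) ? 1 : 0;
    }

    /**
     * Maps a finished CLI run to metadata without throwing.
     * An empty Optional means exit code 0: the caller should go on to parse the JSON.
     */
    static Optional<C2paMetadata> fromFailure(int exitCode, String output) {
        if (exitCode == 0) {
            return Optional.empty();
        }
        String text = output == null ? "" : output;
        if (text.toLowerCase(Locale.ROOT).contains("no claim found")) {
            return Optional.of(C2paMetadata.noManifest());
        }
        return Optional.of(C2paMetadata.error("c2patool exited with code " + exitCode + ": " + text.strip()));
    }
}
```

Returning a value in every branch is what keeps the caller free of C2PA-specific `try`/`catch` blocks; only the error flag and message distinguish a hard failure from an ordinary image with no provenance.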
* Same image after byte-level tampering → manifest still detected as present, schema stable, reserved for future “manifest validity” feature.
* A plain image with no C2PA data → confirmed it returns:

  ```json
  {
    "c2pa_hasManifest": 0,
    "c2pa_manifestCount": 0,
    "c2pa_claimGenerator": null,
    "c2pa_claimGeneratorIsAI": 0,
    "c2pa_errorFlag": 0,
    "c2pa_errorMessage": null
  }
  ```

* Verified that no exceptions are thrown to the caller for C2PA-specific issues; all errors are converted into numeric flags.
* **Build and test flow**:
  * Maven build and tests:

    ```bash
    mvn clean test
    ```

  * Confirmed:
    * `C2paToolInvoker` runs successfully using `./tools/c2patool/c2patool`.
    * `AnalyzeService` stores the new C2PA metadata schema in `AnalysisReport.details`.
    * “No manifest” and other C2PA edge cases no longer cause FAILED reports unless there is a true IO or unexpected runtime error.
* **Manual inspection / logging**:
  * Reviewed debug logs containing raw c2patool JSON output to confirm:
    * `manifests` and `active_manifest` are parsed correctly.
    * `claim_generator_info.name` is correctly mapped to `c2pa_claimGenerator`.
    * AI keyword matching behaves as expected (e.g., “ChatGPT” → `c2pa_claimGeneratorIsAI = 1`).

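
The byte-level tampering used for the “present but invalid” case can be sketched as below. This is a hedged illustration rather than the test code from the commit: `TamperHelper` and `flipByte` are hypothetical names, and which offset actually lands inside the signed region of a given image is not stated in this entry, so the caller has to choose it. The helper writes to a fresh temporary directory so the checked-in test resource is never modified.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class TamperHelper {
    private TamperHelper() {}

    /**
     * Copies {@code source} into a new temporary directory and inverts the bits
     * of the single byte at {@code offset}; the original file is left untouched.
     */
    static Path flipByte(Path source, int offset) throws IOException {
        byte[] bytes = Files.readAllBytes(source);
        if (offset < 0 || offset >= bytes.length) {
            throw new IllegalArgumentException("offset outside file: " + offset);
        }
        bytes[offset] = (byte) ~bytes[offset];

        Path dir = Files.createTempDirectory("c2pa-tamper");
        Path target = dir.resolve("tampered_" + source.getFileName());
        Files.write(target, bytes);
        return target;
    }
}
```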
---
### **Attribution Statement**
Portions of this commit were generated and refined with assistance from OpenAI ChatGPT (GPT-5.1 Thinking) on November 22, 2025. All AI-generated code, tests, and design recommendations were reviewed, adapted, and validated by the developer (Isaac Schmidt) before being committed to the repository.