Commit 84b0038

Merge pull request #74 from Jalen-Stephens/66-feature-implementation-of-logistic-regression-ml-model-for-confidence-score
66 feature implementation of logistic regression ml model for confidence score
2 parents 1d8c2d9 + e75e64c commit 84b0038

18 files changed: +681 -18 lines changed

bin/.DS_Store

4 KB
Binary file not shown.

citations.md

Lines changed: 117 additions & 0 deletions
@@ -1,3 +1,47 @@
+### **Commit / Ticket Reference**
+- **Commit:** test: add coverage for model loader and logistic regression service
+- **Ticket:** none
+- **Date:** 2026-02-17
+- **Team Member:** Jalen Stephens
+
+---
+
+### **AI Tool Information**
+- **Tool Used:** OpenAI ChatGPT (GPT-5) via Codex CLI
+- **Access Method:** Local Codex CLI (sandboxed; no paid API calls)
+- **Configuration:** Default model settings
+- **Cost:** $0 (course-provided access)
+
+---
+
+### **Purpose of AI Assistance**
+Added unit tests to raise branch/instruction coverage for model loading and logistic regression inference, including cache validation, path resolution, invalid weight handling, C2PA flag behavior, and sigmoid branches.
+
+---
+
+### **Prompts / Interaction Summary**
+- “give me a commit message and fill out a template in citations.md for the work we did”
+- “can you write test for these to increase branch and instruction coverage please”
+
+---
+
+### **Resulting Artifacts**
+- `src/test/java/dev/coms4156/project/metadetect/service/ModelLoaderTest.java`
+- `src/test/java/dev/coms4156/project/metadetect/service/LogisticRegressionServiceTest.java`
+- `src/test/resources/model/test-model.json`
+
+---
+
+### **Verification**
+- `./mvnw -q -Dtest=ModelLoaderTest,LogisticRegressionServiceTest test`
+
+---
+
+### **Attribution Statement**
+> Portions of this work were generated with assistance from OpenAI ChatGPT (GPT-5) on 2026-02-17. All AI-generated content was reviewed and finalized by the development team.
+
+---
+
 ### **Commit / Ticket Reference**
 - **Commit:** fix(storage): encode Supabase paths and normalize project base URL
 - **Ticket:** N/A (prod bugfix)
@@ -3062,3 +3106,76 @@ Expanded CI coverage and optional live E2E hook:
 > Portions of this work were generated with assistance from OpenAI ChatGPT (GPT-5) on 2026-02-17. All AI-generated content was reviewed and finalized by the development team.

 ---
+
+### **Commit / Ticket Reference**
+- **Commit:** [feat] Implemented and Trained Logistic Regression Model
+- **Ticket:** (#66) Implementation of Logistic Regression ML Model for Confidence Score
+- **Date:** 12/1/2025
+- **Team Member:** Isaac Schmidt
+
+---
+
+### **AI Tool Information**
+- **Tool Used:** OpenAI ChatGPT (GPT-5.1)
+- **Access Method:** ChatGPT Web (.edu academic access)
+- **Configuration:** Default model settings
+- **Cost:** $0 (no paid API calls)
+
+---
+
+### **Purpose of AI Assistance**
+AI assistance was used to design, structure, and validate the machine-learning component of the MetaDetect system. This included help with:
+- Creating a feature extraction–based ML pipeline for AI-image detection
+- Designing the workflow for offline model training (without including Python code in the repository)
+- Advising on the correct model type, dataset preparation, cross-validation strategy, and model export
+- Generating the Java inference architecture (ModelLoader, LogisticRegressionModel, AnalyzeService integration)
+- Debugging dataset preparation issues and ensuring compatibility between training-time features and runtime inference
+
+---
+
+### **Prompts / Interaction Summary**
+Key interactions included:
+- Requesting recommendations for ML models appropriate for OpenCV feature vectors
+- Asking how to train a logistic regression model offline and export weights for Java inference
+- Debugging DatasetBuilder and CSV formatting issues to generate valid ML training data
+- Setting up cross-validation for model evaluation
+- Requesting a final AnalyzeService integration that correctly combines C2PA overrides with ML fallback
+- Asking how and where `model.json` should be loaded in the service layer
+- Requesting fixes and refactoring for ModelLoader, LogisticRegressionService, and FeatureExtractor interactions
+- Clarifying model runtime behavior, including how C2PA features interact with ML predictions
+
+---
+
+### **Resulting Artifacts**
+The following deliverables were created or refined with AI assistance:
+- **DatasetBuilder.java**: Generates ML-ready feature CSVs from raw images and metadata
+- **train_model.py (offline use only)**: Script used externally to train the logistic regression model
+- **export_model.py (offline use only)**: Exports trained LR weights to a Java-readable `model.json`
+- **model.json**: Serialized logistic regression weights and bias used in production
+- **LogisticRegressionModel.java**: Runtime inference implementation compatible with exported weights
+- **ModelLoader.java**: Loads `model.json` from classpath and constructs the inference model
+- **LogisticRegressionService.java**: Bridges feature extraction and ML prediction
+- **Updated AnalyzeService.java**: Integrates C2PA logic + ML fallback with clear override hierarchy
+- Various debugging utilities, architectural recommendations, and corrections to CSV parsing logic
+
+---
+
+### **Verification**
+AI-assisted work was validated by:
+- Manual inspection and testing of DatasetBuilder output
+- Successful cross-validation runs on ~80,000 training samples
+- Confirming stable and consistent LR validation metrics across folds
+- Verifying that exported weights from Python produced correct inference behavior in Java
+- Manually testing AnalyzeService end-to-end with multiple categories of images:
+  - Images with valid AI manifests
+  - Images with valid camera manifests
+  - Images with no C2PA manifest
+  - Images with corrupted or tampered manifests
+- Ensuring the Java inference pipeline correctly loads model.json from classpath and returns deterministic probability scores
+
+---
+
+### **Attribution Statement**
+> Portions of this commit or configuration were generated with assistance from OpenAI ChatGPT (GPT-5) on 12/1/2025. All AI-generated content was reviewed, verified, and finalized by the development team.
+
+---
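
The entry above describes `model.json` as the serialized weights and bias, and the service code in this commit reads them through `weights()`, `bias()`, and `version()` accessors. The `ModelLoader` source is not visible in this view, so the following is only a minimal sketch of what a validated in-memory holder could look like. The record name `ModelParametersSketch` and every validation rule in it are assumptions for illustration, not the project's implementation.

```java
import java.util.Objects;

/** Hypothetical holder for exported logistic-regression parameters (not the project's class). */
record ModelParametersSketch(double[] weights, double bias, String version) {

  /** Compact constructor: rejects values that would silently corrupt a score. */
  ModelParametersSketch {
    Objects.requireNonNull(weights, "weights");
    Objects.requireNonNull(version, "version");
    if (weights.length == 0) {
      throw new IllegalArgumentException("weights must not be empty");
    }
    for (double w : weights) {
      if (!Double.isFinite(w)) {
        throw new IllegalArgumentException("non-finite weight: " + w);
      }
    }
    if (!Double.isFinite(bias)) {
      throw new IllegalArgumentException("non-finite bias: " + bias);
    }
    // Defensive copy so later changes to the caller's array do not affect the model.
    weights = weights.clone();
  }

  public static void main(String[] args) {
    // Made-up values; a real model.json would supply these.
    ModelParametersSketch model =
        new ModelParametersSketch(new double[] {0.1, -0.2}, 0.05, "v1");
    System.out.println(model.weights().length + " weights, version " + model.version());
  }
}
```

Failing fast at load time is one possible design choice; whether the real loader throws, logs, or falls back to defaults is not shown in this diff.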

pom.xml

Lines changed: 11 additions & 0 deletions
@@ -226,6 +226,17 @@
         </configuration>
       </plugin>

+      <!-- ML Model -->
+      <plugin>
+        <groupId>org.codehaus.mojo</groupId>
+        <artifactId>exec-maven-plugin</artifactId>
+        <version>3.1.0</version>
+        <configuration>
+          <cleanupDaemonThreads>false</cleanupDaemonThreads>
+        </configuration>
+      </plugin>
+
+
       <!-- PMD -->
       <plugin>
         <groupId>org.apache.maven.plugins</groupId>

src/main/java/dev/coms4156/project/metadetect/config/SecurityConfig.java

Lines changed: 22 additions & 3 deletions
@@ -8,6 +8,7 @@
 import org.springframework.context.annotation.Configuration;
 import org.springframework.core.annotation.Order;
 import org.springframework.http.HttpMethod;
+import org.springframework.http.HttpStatus;
 import org.springframework.security.config.Customizer;
 import org.springframework.security.config.annotation.web.builders.HttpSecurity;
 import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
@@ -19,6 +20,7 @@
 import org.springframework.security.oauth2.jwt.JwtValidators;
 import org.springframework.security.oauth2.jwt.NimbusJwtDecoder;
 import org.springframework.security.web.SecurityFilterChain;
+import org.springframework.security.web.authentication.HttpStatusEntryPoint;
 import org.springframework.web.cors.CorsConfiguration;
 import org.springframework.web.cors.CorsConfigurationSource;
 import org.springframework.web.cors.UrlBasedCorsConfigurationSource;
@@ -54,6 +56,8 @@ public SecurityFilterChain apiSecurityFilterChain(HttpSecurity http) throws Exce
         // Everything else under /api/** requires auth
         .anyRequest().authenticated()
       )
+      .exceptionHandling(e -> e.authenticationEntryPoint(
+        new HttpStatusEntryPoint(HttpStatus.UNAUTHORIZED)))
       .oauth2ResourceServer(oauth -> oauth.jwt(Customizer.withDefaults()));

     return http.build();
@@ -90,12 +94,27 @@ public SecurityFilterChain webSecurityFilterChain(HttpSecurity http) throws Exce
           "/js/**",
           "/images/**",
           "/fonts/**",
-          "/webjars/**"
+          "/webjars/**",
+
+          // Swagger / OpenAPI docs
+          "/swagger-ui.html",
+          "/swagger-ui/**",
+          "/v3/api-docs/**",
+          "/api-docs/**"
+        ).permitAll()
+
+        // Public non-API endpoints (health/auth pages used by tests + clients)
+        .requestMatchers(
+          "/health",
+          "/actuator/**",
+          "/auth/**"
         ).permitAll()

-        // Everything else (non-API) is allowed
-        .anyRequest().permitAll()
+        // Everything else (non-API) requires authentication
+        .anyRequest().authenticated()
       );
+    http.exceptionHandling(e -> e.authenticationEntryPoint(
+      new HttpStatusEntryPoint(HttpStatus.UNAUTHORIZED)));

     return http.build();
   }

src/main/java/dev/coms4156/project/metadetect/dto/Dtos.java

Lines changed: 3 additions & 1 deletion
@@ -67,7 +67,9 @@ public record AnalysisManifestResponse(
   public record AnalyzeConfidenceResponse(
       String analysisId,
       String status,
-      Double score // nullable until we implement a real scorer
+      Double confidenceScore,
+      boolean c2paUsed,
+      String modelVersion
   ) { }

   /**
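
A small sketch of how the reshaped record behaves, assuming a standalone copy of it (the project nests it inside `Dtos`). `confidenceScore` remains a nullable `Double`, so a report without a score can still carry `null`, while `c2paUsed` is a primitive `boolean` and therefore always has a value. The identifiers, the `PENDING` status string, and the score below are made up for illustration.

```java
/** Standalone copy of the record shape from the hunk above, for illustration only. */
record AnalyzeConfidenceResponse(
    String analysisId,
    String status,
    Double confidenceScore,
    boolean c2paUsed,
    String modelVersion) { }

final class ConfidenceResponseDemo {

  public static void main(String[] args) {
    // No score yet: the boxed Double may legitimately be null.
    AnalyzeConfidenceResponse pending =
        new AnalyzeConfidenceResponse("analysis-1", "PENDING", null, false, "v1");

    // Completed report with a probability-style score.
    AnalyzeConfidenceResponse done =
        new AnalyzeConfidenceResponse("analysis-2", "COMPLETED", 0.87, true, "v1");

    System.out.println(pending);
    System.out.println(done);
  }
}
```

Because record component names usually become the serialized field names, renaming `score` to `confidenceScore` is likely visible to API clients; that is worth confirming against the project's JSON configuration.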

src/main/java/dev/coms4156/project/metadetect/service/AnalyzeService.java

Lines changed: 31 additions & 3 deletions
@@ -2,13 +2,15 @@

 import static dev.coms4156.project.metadetect.model.AnalysisReport.ReportStatus;

+import com.fasterxml.jackson.databind.JsonNode;
 import com.fasterxml.jackson.databind.ObjectMapper;
 import dev.coms4156.project.metadetect.c2pa.C2paToolInvoker;
 import dev.coms4156.project.metadetect.dto.Dtos;
 import dev.coms4156.project.metadetect.model.AnalysisReport;
 import dev.coms4156.project.metadetect.model.AnalysisReport.ReportStatus;
 import dev.coms4156.project.metadetect.model.Image;
 import dev.coms4156.project.metadetect.repository.AnalysisReportRepository;
+import dev.coms4156.project.metadetect.service.LogisticRegressionService.InferenceResult;
 import dev.coms4156.project.metadetect.service.errors.MissingStoragePathException;
 import dev.coms4156.project.metadetect.service.errors.NotFoundException;
 import java.io.File;
@@ -44,6 +46,7 @@ public class AnalyzeService {
   private final AnalysisReportRepository analysisRepo;
   private final SupabaseStorageService storage;
   private final UserService userService;
+  private final LogisticRegressionService logisticRegressionService;
   private final Clock clock;

   // Lightweight mapper for error JSON assembly.
@@ -64,12 +67,14 @@ public AnalyzeService(C2paToolInvoker c2paToolInvoker,
                         AnalysisReportRepository analysisRepo,
                         SupabaseStorageService storage,
                         UserService userService,
+                        LogisticRegressionService logisticRegressionService,
                         Clock clock) {
     this.c2paToolInvoker = c2paToolInvoker;
     this.imageService = imageService;
     this.analysisRepo = analysisRepo;
     this.storage = storage;
     this.userService = userService;
+    this.logisticRegressionService = logisticRegressionService;
     this.clock = clock;
   }

@@ -163,7 +168,9 @@ public Dtos.AnalyzeConfidenceResponse getConfidence(UUID analysisId) {
     return new Dtos.AnalyzeConfidenceResponse(
       report.getId().toString(),
       report.getStatus().name(),
-      report.getConfidence() // null until a real scorer exists
+      report.getConfidence(),
+      deriveC2paUsed(report.getDetails()),
+      logisticRegressionService.getModelVersion()
     );
   }

@@ -207,11 +214,17 @@ private void runExtractionAndFinalize(UUID analysisId, String storagePath) {
       // 2) Run C2PA extraction into ML-ready metadata
       C2paToolInvoker.C2paMetadata meta = c2paToolInvoker.extractMetadata(tempFile);

-      // 3) Serialize metadata and mark COMPLETED
+      // 3) Compute logistic-regression score using OpenCV + C2PA features
+      InferenceResult inference = logisticRegressionService.predict(
+          tempFile.getAbsolutePath(),
+          meta
+      );
+
+      // 4) Serialize metadata and mark COMPLETED with a confidence score
       String json = objectMapper.writeValueAsString(meta);

       // The details field now stores the C2PA metadata schema, not raw manifest JSON.
-      markCompleted(analysisId, json, /*confidence*/ null);
+      markCompleted(analysisId, json, inference.confidenceScore());

     } catch (IOException ioe) {
       // IO-level failures (download, JSON serialization) are genuine failures.
@@ -298,6 +311,21 @@ private Instant now() {
     return Instant.now(clock);
   }

+  private boolean deriveC2paUsed(String detailsJson) {
+    if (!StringUtils.hasText(detailsJson)) {
+      return false;
+    }
+    try {
+      JsonNode node = objectMapper.readTree(detailsJson);
+      int hasManifest = node.path("c2paHasManifest").asInt(0);
+      int errorFlag = node.path("c2paErrorFlag").asInt(0);
+      return hasManifest == 1 && errorFlag == 0;
+    } catch (Exception e) {
+      // If parsing fails, default to false so the field is conservative.
+      return false;
+    }
+  }
+
   /** Truncates a string to a maximum length, null-safe. */
   private static String truncate(String s, int max) {
     if (s == null) {
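
The rule in `deriveC2paUsed` reduces to two integer flags: a manifest must be present and no error may be flagged, and a missing field falls back to 0 through `asInt(0)`. Below is a dependency-free sketch of that truth table. It uses nullable `Integer` arguments as a stand-in for absent JSON fields, and the class and method names are hypothetical rather than part of the project.

```java
/** Illustrative truth table for the C2PA flag; not the project's code. */
final class C2paUsedSketch {

  /**
   * Returns true only when a manifest is present (1) and no error is flagged (0).
   * A null argument models a missing JSON field and is treated as 0.
   */
  static boolean c2paUsed(Integer hasManifest, Integer errorFlag) {
    int manifest = hasManifest == null ? 0 : hasManifest;
    int error = errorFlag == null ? 0 : errorFlag;
    return manifest == 1 && error == 0;
  }

  public static void main(String[] args) {
    System.out.println(c2paUsed(1, 0));       // true: valid manifest
    System.out.println(c2paUsed(1, 1));       // false: manifest present but flagged
    System.out.println(c2paUsed(0, 0));       // false: no manifest
    System.out.println(c2paUsed(null, null)); // false: fields missing
    System.out.println(c2paUsed(1, null));    // true: missing error flag counts as 0
  }
}
```

The last case shows a consequence of defaulting to 0: a record with a manifest flag but no error field is reported as C2PA-backed.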

src/main/java/dev/coms4156/project/metadetect/service/FeatureExtractor.java

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,7 @@
 import org.opencv.core.Size;
 import org.opencv.imgcodecs.Imgcodecs;
 import org.opencv.imgproc.Imgproc;
+import org.springframework.stereotype.Service;


 /**
@@ -29,6 +30,7 @@
  * NOTE: C2PA metadata is obtained separately via C2paToolInvoker. This class does
  * not call C2PA directly, but is designed to combine its results into the final feature vector.
  */
+@Service
 public class FeatureExtractor {

   static {

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
+package dev.coms4156.project.metadetect.service;
+
+import dev.coms4156.project.metadetect.c2pa.C2paToolInvoker.C2paMetadata;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.springframework.stereotype.Service;
+
+/**
+ * Performs logistic regression inference against the feature vector produced by
+ * {@link FeatureExtractor}. The model weights and bias are loaded once from JSON via
+ * {@link ModelLoader}.
+ */
+@Service
+public class LogisticRegressionService {
+
+  private static final Logger log = LoggerFactory.getLogger(LogisticRegressionService.class);
+
+  private final FeatureExtractor featureExtractor;
+  private final ModelLoader modelLoader;
+
+  public LogisticRegressionService(FeatureExtractor featureExtractor, ModelLoader modelLoader) {
+    this.featureExtractor = featureExtractor;
+    this.modelLoader = modelLoader;
+  }
+
+  /**
+   * Generates an AI confidence score for the given image.
+   *
+   * @param imagePath path to the downloaded image on disk
+   * @param c2pa pre-extracted C2PA metadata (never null in current pipeline)
+   * @return inference result containing the probability, c2pa usage flag, and model version
+   */
+  public InferenceResult predict(String imagePath, C2paMetadata c2pa) {
+    ModelLoader.ModelParameters model = modelLoader.loadModel();
+    double[] features = featureExtractor.extractAllFeatures(imagePath, c2pa);
+    double z = dot(model.weights(), features) + model.bias();
+    double probability = sigmoid(z);
+    boolean c2paUsed = c2pa != null
+        && c2pa.getc2paHasManifest() == 1
+        && c2pa.getc2paErrorFlag() == 0;
+
+    return new InferenceResult(probability, c2paUsed, model.version());
+  }
+
+  /** Returns the loaded model version to surface in responses. */
+  public String getModelVersion() {
+    return modelLoader.loadModel().version();
+  }
+
+  private double dot(double[] weights, double[] features) {
+    int len = Math.min(weights.length, features.length);
+    if (weights.length != features.length) {
+      log.warn("Model/feature length mismatch (w={}, f={}); truncating to {}", weights.length,
+          features.length, len);
+    }
+
+    double sum = 0.0;
+    for (int i = 0; i < len; i++) {
+      sum += weights[i] * features[i];
+    }
+    return sum;
+  }
+
+  /** Stable sigmoid implementation to avoid overflow for large magnitudes. */
+  private double sigmoid(double z) {
+    if (z >= 0) {
+      double exp = Math.exp(-z);
+      return 1.0 / (1.0 + exp);
+    }
+    double exp = Math.exp(z);
+    return exp / (1.0 + exp);
+  }
+
+  /** Immutable inference result. */
+  public record InferenceResult(double confidenceScore, boolean c2paUsed, String modelVersion) { }
+}
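
The scoring math in `predict` is a dot product plus bias passed through a two-branch sigmoid. The sketch below isolates those two helpers so they can be run without Spring, OpenCV, or the model file. The class name `LogisticScoreSketch` and the weights and features are hypothetical; the logic mirrors the private methods in the diff, including truncation to the shorter array on a length mismatch (the warning log is omitted here).

```java
/** Standalone sketch of the scoring math from the diff; names and values are illustrative. */
final class LogisticScoreSketch {

  /** Dot product over the shared prefix of both arrays, as the service does on a mismatch. */
  static double dot(double[] weights, double[] features) {
    int len = Math.min(weights.length, features.length);
    double sum = 0.0;
    for (int i = 0; i < len; i++) {
      sum += weights[i] * features[i];
    }
    return sum;
  }

  /** Two-branch sigmoid: each branch only exponentiates a non-positive number. */
  static double sigmoid(double z) {
    if (z >= 0) {
      return 1.0 / (1.0 + Math.exp(-z));
    }
    double e = Math.exp(z);
    return e / (1.0 + e);
  }

  public static void main(String[] args) {
    double[] weights = {0.5, -0.25}; // hypothetical model weights
    double[] features = {2.0, 4.0};  // hypothetical feature vector
    double bias = 0.0;

    double z = dot(weights, features) + bias; // 0.5*2.0 - 0.25*4.0 = 0.0
    System.out.println(sigmoid(z));           // prints 0.5
    System.out.println(sigmoid(1000.0));      // prints 1.0
    System.out.println(sigmoid(-1000.0));     // prints 0.0
  }
}
```

Because `Math.exp` is only ever called with a non-positive argument, no intermediate value overflows to infinity for large `|z|`. Truncating on a length mismatch keeps inference running, but it can hide a training/runtime feature drift; throwing instead would be a stricter alternative that this commit does not adopt.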
