Changes made as part of CCR tutorial paper revisions #27

atomashevic · 2025-09-16T12:32:43Z

Adds emotion evaluation suite (R/evaluate_emotions.R, tests/testthat/test_evaluate_emotions.R, vignettes/evaluate_emotions.Rmd) with accuracy/F1/AUROC/ECE/Krippendorff metrics, plotting support, and docs.
Introduces FindingEmo dataset tooling (R/datasets_findingemo.R, inst/python/download_findingemo.py, tests/testthat/test_findingemo.R, new inst/extdata/ assets) for download, caching, quality checks, and annotation loading.
Implements a local RAG stack (R/rag.R, R/rag_retrievers.R, R/rag_json.R, R/rag_evaluation.R, R/model_registry.R, R/model_management.R) covering retriever registry, JSON parsing/validation, evaluation, and model lifecycle helpers.
Extends VAD mapping and GPU/Python integration (R/map_discrete_to_vad.R, R/vad_scores.R, inst/python/image.py, inst/python/video.py, supporting tests) for smoother discrete→VAD conversions and refreshed registries.

To quickly test the uv install and the text and image scores, run:

Rscript -e 'devtools::test()'

- rag: concise table outputs (labels+conf), sort by confidence; gate per-doc logs behind retriever_params; suppress device message unless progress
- bm25: use run_query in strict retry; add k1/b params; Unicode-aware tokenizer with override
- hf_auth: avoid huggingface_hub login() when token present (suppress note)
- demo: quieter (progress=FALSE) and rely on concise tables
- build: ignore WARP.md in .Rbuildignore
- misc: guard undefined embedding extraction in rag()

chatgpt-codex-connector

Codex Review: Here are some suggestions.


chatgpt-codex-connector · 2025-09-16T12:37:03Z

R/datasets_findingemo.R

+    if (!is.null(annotations_df)) {
+      # Ensure expected columns exist
+      if (!"image_path" %in% names(annotations_df)) {
+        # Try common alternatives
+        alt <- intersect(c("filepath", "path", "image"), names(annotations_df))
+        if (length(alt) > 0) {
+          annotations_df$image_path <- annotations_df[[alt[1]]]
+        }
+      }
+      if (!"index" %in% names(annotations_df)) {
+        # Fallback to row index if no explicit id
+        annotations_df$index <- seq_len(nrow(annotations_df))
+      }
+      if (!"emotion" %in% names(annotations_df)) {
+        # Try to find an emotion/label column
+        emo_col <- grep("emotion|label", names(annotations_df), value = TRUE, ignore.case = TRUE)
+        if (length(emo_col) > 0) annotations_df$emotion <- annotations_df[[emo_col[1]]]
+      }
+
+  # Derive file name (guard against missing image_path)
+  ipath <- annotations_df$image_path
+  ipath[is.null(ipath)] <- ""
+  ipath[is.na(ipath)] <- ""
+  annotations_df$image_file <- basename(ipath)


[P1] Handle annotations without image_path before deriving filenames

The download helper unconditionally accesses annotations_df$image_path and subsets it (ipath <- annotations_df$image_path; ipath[is.null(ipath)] <- ""). When the annotations returned from load_findingemo_annotations() do not contain an image_path (the bundled Python loader only provides columns such as image_id, valence, arousal), annotations_df$image_path is NULL and the subsetting call throws object of type 'NULL' is not subsettable. This means a successful dataset download will still fail during post-processing on any schema that lacks an image_path column. Guard against the column being absent by creating a safe default (e.g. derive names from image_id or skip matching) before trying to index.
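
One way to apply the guard suggested above is sketched here. The helper name `derive_image_file()` is hypothetical and not part of the PR, and using `image_id` as a stand-in for the file name is an assumption taken from the reviewer's suggestion, not confirmed behaviour of the loader. The sketch checks for the column before touching it, so the result is well defined for every schema.

```r
# Hypothetical helper (not part of the PR): derive an image_file column
# without assuming that image_path exists.
derive_image_file <- function(annotations_df) {
  if ("image_path" %in% names(annotations_df)) {
    ipath <- as.character(annotations_df$image_path)
    ipath[is.na(ipath)] <- ""          # missing paths become empty names
    annotations_df$image_file <- basename(ipath)
  } else if ("image_id" %in% names(annotations_df)) {
    # Assumption: image_id can serve as a file stem when no path is given
    annotations_df$image_file <- as.character(annotations_df$image_id)
  } else {
    # Nothing to match on; mark as missing so callers can skip matching
    annotations_df$image_file <- rep(NA_character_, nrow(annotations_df))
  }
  annotations_df
}

# Illustrative inputs (made up for this sketch)
with_path <- data.frame(image_path = c("a/b/x.jpg", NA), stringsAsFactors = FALSE)
no_path   <- data.frame(image_id = c(1, 2), valence = c(0.1, -0.3))
no_id     <- data.frame(valence = 0.5)

res_path <- derive_image_file(with_path)
res_id   <- derive_image_file(no_path)
res_none <- derive_image_file(no_id)
```

Returning `NA` in the last branch, instead of an empty string, lets downstream code tell "no file name available" apart from "path was present but empty".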


chatgpt-codex-connector · 2025-09-16T12:37:03Z

R/rag_evaluation.R

+  # Optional label set enforcement (lowercase normalization)
+  if (!is.null(labels_set) && length(predictions$pred) > 0) {
+    allowed <- tolower(labels_set)
+    pred_low <- tolower(predictions$pred)
+    keep <- pred_low %in% allowed
+    if (any(!keep)) {
+      # Keep rows matching allowed set; drop others with a warning
+      n_drop <- sum(!keep)
+      if (n_drop > 0) {
+        warning(sprintf("Filtered %d predictions not in labels_set", n_drop), call. = FALSE)
+      }
+      predictions <- predictions[keep, , drop = FALSE]
+    }
+    predictions$pred <- pred_low[keep]
+  }
+
+  # Create evaluation dataset
+  eval_data <- create_evaluation_dataset(predictions, ground_truth, id_col)
+
+  # Forward to evaluate_emotions (probability columns not available here)
+  results <- evaluate_emotions(
+    data = eval_data,
+    truth_col = "truth",
+    pred_col = "pred",
+    probs_cols = NULL,
+    classes = labels_set,


[P1] Normalise ground truth and class labels when enforcing labels_set

In validate_rag_predictions() predictions are forced to lowercase when labels_set is supplied, but labels_set itself and the ground_truth vector are left untouched while the lowercased predictions are evaluated against the original labels_set. As a result, if callers pass labels in their natural casing (e.g. c("Joy","Anger")) the predictions become "joy"/"anger" and evaluate_emotions() receives mismatched class levels (classes = labels_set). Those rows are dropped as NAs and metrics are computed on an empty or truncated dataset. Convert labels_set and the ground_truth column to the same normalised case as predictions (or avoid altering the case at all) before delegating to evaluate_emotions().
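
A minimal sketch of the first option (normalising everything to one case) follows. The helper `normalise_labels()` is hypothetical and not part of the PR. It also assumes that predictions and ground truth are already aligned row by row, whereas the PR joins them by `id_col` in `create_evaluation_dataset()`.

```r
# Hypothetical helper (not part of the PR): bring predictions, ground truth,
# and the allowed label set to one casing before evaluation.
normalise_labels <- function(pred, truth, labels_set) {
  allowed   <- unique(tolower(labels_set))
  pred_low  <- tolower(pred)
  truth_low <- tolower(truth)

  # Drop rows whose prediction is outside the allowed set, with a warning
  keep <- pred_low %in% allowed
  if (any(!keep)) {
    warning(sprintf("Filtered %d predictions not in labels_set", sum(!keep)),
            call. = FALSE)
  }

  list(
    pred    = pred_low[keep],
    truth   = truth_low[keep],
    classes = allowed            # pass this, not the raw labels_set, as `classes`
  )
}

# Illustrative inputs (made up); the third prediction is outside the label set.
# suppressWarnings() only keeps the example output quiet.
out <- suppressWarnings(normalise_labels(
  pred       = c("joy", "Anger", "surprise"),
  truth      = c("Joy", "Joy", "Anger"),
  labels_set = c("Joy", "Anger")
))
```

With all three vectors in the same case, labels supplied as `c("Joy", "Anger")` no longer produce mismatched class levels against lowercased predictions.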


atomashevic added 30 commits August 12, 2025 10:10

working on findingemo dataset

9887ba0

script fix

72f5e8c

progress on findingemo and evaluation

9d5532e

roxygen update

9a166bf

VAD implemented with tests

2586e2f

Delete findingemo_eval_300 directory

86df682

Delete findingemo_eval_50 directory

d48c327

Delete test_findingemo_evaluation.R

194106d

Delete test_findingemo_evaluation_batch.R

f20aead

Delete test_findingemo_evaluation_large.R

6724ea6

Delete test_findingemo_evaluation_large_batch.R

4f09897

Delete test_findingemo_quick_eval_adj.R

0fa07ac

Added: model registry and support for BLIP and ALIGN models

c107997

First short at extended RAG

fd53c5e

rag update

e743032

fix json push

fe40245

new llms, hf auth, pyenv checks

a1c1c94

rag man

cadd1bc

hg auth fix

353d4b6

gemma remote code fix

01a68ee

bumped transformers to 4.51.0 with gemma support

79f874b

rag json fix

37c6f22

fix json push

54c8483

fix json push

41b7d64

rag finalized

c25fb1c

rag fix, enhance and test script

ef6a0ff

work on rag

a666366

Update RAG with more controls and updated the docs

2c7a4b5

done, ready for github test

1603afe

atomashevic added 25 commits September 9, 2025 11:36

environment creation fix

ad5410c

conda forge fix

ebe50bf

switch to uv from conda

86e1643

fix uv binary missing

daf4b7d

fix uv binary missing or pyenv/conda activations

e61a488

fix uv binary missing or pyenv/conda activations

859f563

fix uv binary missing or pyenv/conda activations

caa1392

fixing smaller issues with tokenizer and uv

3ba5694

freeze opencv

ff3bc0b

update ignore files

0042c98

Delete .claude/agents directory

882effa

Delete .create_metrics_table

e9359c4

Delete VAD.md

9fc79ab

Delete trump1.jpg

0c35373

small fixes, devtools checks OK

512ddfa

added new issues

e30204c

uv installer updated, github actions integrated

90f519e

small fix

5ee27e9

small workflow fix

83056d4

uv install update

0a59844

fix windows uv install

d68b763

hf auth fix

9097812

added basic transformer scores test

dc04b3f

added image scores basic test

42bc9da

Merge branch 'main' into CCR

92e0774

chatgpt-codex-connector bot reviewed Sep 16, 2025


atomashevic merged commit fe7d613 into main Sep 16, 2025
10 checks passed
