Create new_code.R #1

Open

nateschor wants to merge 1 commit into main from LLM-experiments

Conversation

@nateschor (Owner)

No description provided.

    ggrepel,
    kableExtra)

    source("code/01_calculate_sentiment.R")
@nateschor (Owner, Author)

@CodiumAI-Agent /review

@nateschor (Owner, Author)

@CodiumAI-Agent /improve

@QodoAI-Agent

PR Code Suggestions ✨

Category: General
Fix inconsistent object naming

The naming is inconsistent with the other filtered datasets (*_filtered), increasing
risk of referencing the wrong object later. Align the object name to maintain
consistency and prevent accidental misuse.

new_code.R [179]

-df_snowball_filter <- Remove_Stopwords(df_snowball_stop_words)
+df_snowball_filtered <- Remove_Stopwords(df_snowball_stop_words)
Suggestion importance[1-10]: 7


Why: Correcting df_snowball_filter to df_snowball_filtered matches the established naming pattern and prevents likely downstream reference errors; small change with clear benefit.

Impact: Medium
Stabilize label layer mapping

geom_text_repel uses a filtered data frame without guaranteed matching aesthetics,
which can cause label misplacement or errors if rows are missing. Explicitly map x
and y for the label layer and guard against empty data to avoid failures.

new_code.R [144-152]

 p_sentiment <- ggplot(df_sentiment, aes(x = song_number, y = Sentiment)) +
   geom_line() +
   geom_hline(color = "red", yintercept = 0) +
   geom_point() +
   facet_wrap(~ `Sentiment Source`, ncol = 1, scales = "free_y") +
   theme_minimal() +
   labs(x = "Song Number") +
-  geom_text_repel(data = df_sentiment %>% filter(song_number %in% c(10, 11, 23)),
-             aes(label = title))
+  geom_text_repel(
+    data = df_sentiment %>% filter(song_number %in% c(10, 11, 23)),
+    aes(x = song_number, y = Sentiment, label = title),
+    na.rm = TRUE
+  )
Suggestion importance[1-10]: 6


Why: Explicitly mapping x and y in geom_text_repel and adding na.rm = TRUE reduces risks of aesthetic mismatch or empty data issues; it's a solid, low-risk improvement with modest impact.

Impact: Low
Category: Possible issue
Preserve text and id alignment

The regex stripping non-ASCII characters can mangle tokens and shrink text length,
causing element_id misalignment with df_raw. Preserve character integrity and
explicitly carry element_id through the pipeline to guarantee correct joins.

new_code.R [53-62]

 df_sentimentr <- df_raw %>% 
-  mutate(
-    replace = gsub('[^ -~]', '', lines)
+  transmute(
+    element_id = row_number(),
+    text = lines
   ) %>% 
-  pull(replace) %>% 
+  mutate(text = stringi::stri_trans_general(text, "Any-NFKC")) %>% 
+  pull(text) %>% 
   get_sentences() %>% 
   sentiment() %>% 
   as_tibble() %>% 
   select(element_id, sentiment)
Suggestion importance[1-10]: 5


Why: Highlighting potential ID misalignment due to regex stripping is reasonable, and carrying element_id explicitly improves robustness, though the proposed change alters preprocessing semantics and may not be strictly necessary; impact is moderate.

Impact: Low

@@ -0,0 +1,182 @@

    pacman::p_load(tidyverse,
@nateschor (Owner, Author)

/ask suggest substitute packages

    sentimentr,
    topicmodels,
    ggrepel,
    kableExtra)
@nateschor (Owner, Author)

@CodiumAI-Agent /ask suggest substitute packages

@nateschor (Owner, Author)

@CodiumAI-Agent /ask suggest substitute packages here https://github.com/nateschor/R-Hamilton/pull/1/files#diff-2df9c9583bf9967b2a0eecdf15fce59d45dd358fd0303f6f815442423df1cd31R8

@QodoAI-Agent

Ask

suggest substitute packages here https://github.com/nateschor/R-Hamilton/pull/1/files

Answer:

Here are package substitution suggestions for the newly added code, focusing on reducing dependencies, improving stability, and aligning with common R workflows.

General recommendation: avoid pacman in production scripts. Prefer base install/load or renv for reproducibility.

  • pacman::p_load

    • Substitute with: base install/load logic or renv
      • Example:
        • if (!requireNamespace("tidyverse", quietly = TRUE)) install.packages("tidyverse")
        • library(tidyverse)
      • Or manage dependencies with renv::init() and renv::snapshot() for lockfile reproducibility.
  • tidyverse

    • You’re using readr, dplyr, tidyr, stringr, ggplot2, purrr, tibble. You can import selectively to reduce footprint:
      • readr, dplyr, tidyr, stringr, ggplot2, purrr, tibble
    • If you want even lighter:
      • data.table for fast IO and manipulation (fread/fwrite, :=, setorder) replacing many dplyr verbs.
      • But given current tidytext/topicmodels usage, sticking with tidyverse components is fine.
  • tidytext

    • Keep: you rely on unnest_tokens, get_sentiments, stop_words.
    • Alternative: quanteda + quanteda.textstats (tokens(), dfm(), stopwords()).
      • Pros: faster on large corpora, richer text infrastructure.
      • Cons: would require adapting pipelines (non-tidy verbs).
  • sentimentr

    • Alternative 1: sentiment.ai (transformer-based), better accuracy but heavier dependency.
    • Alternative 2: syuzhet (NRC, Afinn, Bing, Syuzhet methods in one package), overlaps with tidytext lexicons and offers sentence extraction.
    • Alternative 3: textdata + text2vec to build custom models if you want ML-based sentiment.
    • If you only need lexicon-based methods (Afinn/Bing/NRC), you could drop sentimentr and rely solely on tidytext, unless you specifically need the sentence parsing that sentimentr provides.
  • topicmodels

    • Alternatives:
      • stm (Structural Topic Model) for document-level covariates and better diagnostics.
      • textmineR (fast LDA, coherence, scoring utilities).
      • seededlda::textmodel_lda if moving to the quanteda ecosystem (quanteda.textmodels itself provides supervised and scaling models, not LDA).
    • For simple LDA, topicmodels is fine; stm often provides more modern tooling.
  • ggrepel

    • Keep: it’s lightweight and solves overplotting elegantly.
    • Alternative: directlabels or ggtext for annotations, but ggrepel is the standard.
  • kableExtra

    • Alternatives:
      • gt for table rendering (HTML/LaTeX via gt + gtsave).
      • flextable (Word/PowerPoint) and HTML output.
      • knitr::kable + tinytex/xtable for simpler LaTeX tables if you want minimal deps.
    • If you only need a simple LaTeX table, knitr::kable with basic options may suffice.
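To make the quanteda alternative above concrete, here is a minimal sketch of what a tidytext unnest_tokens() + stop_words pipeline would look like in quanteda. It assumes a data frame df_raw with a character column lines, as in the PR code; the object names are illustrative.

```r
# Sketch only: rough quanteda equivalent of tokenizing lines
# and removing English stopwords.
library(quanteda)

corp <- corpus(df_raw, text_field = "lines")
toks <- tokens(corp, remove_punct = TRUE) |>
  tokens_remove(stopwords("en"))
dfm_lines <- dfm(toks)
topfeatures(dfm_lines, 10)  # most frequent tokens after stopword removal
```

The main cost, as noted above, is that downstream dplyr-style verbs would need to be adapted to quanteda's non-tidy objects (tokens, dfm).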

Other ecosystem swaps if you want a coherent alternative stack:

  • Quanteda stack: quanteda, quanteda.textstats, quanteda.textplots, quanteda.textmodels to replace tidytext + topicmodels. Add stopwords package for multilingual stopword lists.
  • Visualization labels: ggrepel remains best-in-class within ggplot2.

Minimal-change recommendations for this PR:

  • Replace pacman::p_load with explicit library() calls and recommend renv.
  • Keep tidytext, ggrepel.
  • Consider dropping sentimentr if sentence-level sentiment is not essential; replicate with tidytext lexicons you already use.
  • Keep topicmodels unless you need covariate-aware topics; then consider stm.
  • If LaTeX table needs are simple, switch kableExtra to knitr::kable; otherwise keep.
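If sentimentr is dropped as recommended above, line-level lexicon sentiment can be approximated with tidytext alone. A minimal sketch, assuming df_raw has a lines column as in the PR; note this scores whole lines by word counts and does not replicate sentimentr's sentence-level parsing or valence shifters.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Score each line as (# positive words) - (# negative words)
# using the Bing lexicon bundled with tidytext.
df_line_sentiment <- df_raw |>
  mutate(element_id = row_number()) |>
  unnest_tokens(word, lines) |>
  inner_join(get_sentiments("bing"), by = "word") |>
  count(element_id, sentiment) |>
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) |>
  mutate(sentiment = positive - negative)
```

One caveat: lines with no lexicon matches drop out of the inner join, so a left join back to df_raw (filling NA with 0) would be needed to keep every element_id.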

Example header refactor without pacman:

    needed <- c("readr", "dplyr", "tidyr", "stringr", "ggplot2", "purrr", "tibble",
                "tidytext", "sentimentr", "topicmodels", "ggrepel", "kableExtra")
    to_install <- needed[!sapply(needed, requireNamespace, quietly = TRUE)]
    if (length(to_install)) install.packages(to_install)
    invisible(lapply(needed, library, character.only = TRUE))

Or use renv to manage versions and installation reproducibly.
