Artificial intelligence (AI) and large language models (LLMs)

End to end datascience

https://docs.skore.probabl.ai/stable/auto_examples/index.html

Automated prompt engineering

AI going rogue

https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/

health / mental health benchmarks

https://docs.medperf.org/what_is_medperf/
https://research.google/blog/advancing-medical-ai-with-med-gemini/
https://openai.com/index/healthbench/
Belli, L., Bentley, K., Alexander, W., Ward, E., Hawrilenko, M., Johnston, K., ... & Chekroud, A. (2025). Vera-mh concept paper. arXiv preprint arXiv:2510.15297. https://arxiv.org/abs/2510.15297
Bentley, K. H., Belli, L., Chekroud, A. M., Ward, E. J., Dworkin, E. R., Van Ark, E., ... & Hawrilenko, M. (2026). VERA-MH: Reliability and Validity of an Open-Source AI Safety Evaluation in Mental Health. arXiv preprint arXiv:2602.05088. https://arxiv.org/abs/2602.05088

Datasets

Daniel DiPietro, Vivek Hazari, Soroush Vosoughi (2022). Robin: A Novel Online Suicidal Text Corpus of Substantial Breadth and Scale. https://arxiv.org/abs/2209.05707

Text classification / qualitative coding with LLMs

Langextract https://github.com/google/langextract
https://www.comet.com
CodeLLM
Paid services (ideal for many qualitative research applications and for ease-of-use, but may not be super flexible either in terms of what model you can use and how you can use them):
- Atlas TI: https://atlasti.com/
- Nvivo
- Dedoose
- OpenAI Deep Research?

LLM APIs

openrouter: access all models using the same code (similar to litellm but I like it more)
lambda: rent time on GPU to download and run specific huggingface model pretty automatically (for models which might not be on openrouter)
embeddings: huggingface or litellm

Formatting outputs

Grammar: https://github.com/ggml-org/llama.cpp/blob/master/grammars/README.md
- only works with open source llama cpp (platform, not model) models
- restricts tokens before they are generated instead of post-processing.

AI Code Assistants

https://www.builder.io/blog/cursor-vs-windsurf-vs-github-copilot

AI Agents

AI for scientific disovery

https://platform.futurehouse.org/

AI and humor

https://pln-fing-udelar.github.io/semeval-2026-humor-gen/

AI sovereign, open source

https://publicai.co/utility

RAG

https://microsoft.github.io/graphrag/

AI for Scientific discovery

https://www.futurehouse.org/research-announcements/launching-futurehouse-platform-ai-agents

AI and values

https://www.nejm.org/doi/abs/10.1056/NEJMra2214183

AI privacy

Nasr et al (2023). Scalable Extraction of Training Data from (Production) Language Models. https://arxiv.org/abs/2201.10351
Vamosi, S., Platzer, M., & Reutterer, T. (2022). AI-based re-identification of behavioral clickstream data. arXiv preprint arXiv:2201.10351.

LLM and GPT style of writing

Over-use of the em dash: https://www.nytimes.com/2025/09/18/magazine/chatgpt-dash-hyphen-writing-communication.html?smid=nytcore-ios-share&referringSource=articleShare

Dashboards

grafana: good for monitoring time series and complex hardware or servers

AI, LLMs, and psychology/mental health

Assessing psychological constructs in text with LLMs

Low, D., Mair, P., Nock, M., & Ghosh, S. Text Psychometrics: Assessing Psychological Constructs in Text Using Natural Language Processing.
Törnberg, P. ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning. arXiv [cs.CL] (2023).
Rathje, S., Mirea, D. M., Sucholutsky, I., Marjieh, R., Robertson, C. E., & Van Bavel, J. J. (2024). GPT is an effective tool for multilingual psychological text analysis. Proceedings of the National Academy of Sciences, 121(34), e2308950121.
Marshall, D. T. & Naff, D. B. The ethics of using artificial Intelligence in qualitative research. J. Empir. Res. Hum. Res. Ethics 19, 92–102 (2024).

LLMs and Psychology

Feuerriegel, S., Maarouf, A., Bär, D., Geissler, D., Schweisthal, J., Pröllochs, N., ... & Van Bavel, J. J. (2025). Using natural language processing to analyse text data in behavioural science. Nature Reviews Psychology, 4(2), 96-111.
Mihalcea, R., Biester, L., Boyd, R. L., Jin, Z., Perez-Rosas, V., Wilson, S., & Pennebaker, J. W. (2024). How developments in natural language processing help us in understanding human behaviour. Nature Human Behaviour, 8(10), 1877-1889.
Analyzing EMA and deep phenotyping data: https://mentalhealth.bmj.com/content/28/1/e301817#T1

AI and psychotherapy chatbots

First or one of the first RCTs for psychotherapy: Heinz, M. V., Mackin, D. M., Trudeau, B. M., Bhattacharya, S., Wang, Y., Banta, H. A., ... & Jacobson, N. C. (2025). Randomized trial of a generative ai chatbot for mental health treatment. NEJM AI, 2(4), AIoa2400802.
Chen, L., Preece, D. A., Sikka, P., Gross, J. J., & Krause, B. (2024). A framework for evaluating appropriateness, trustworthiness, and safety in mental wellness ai chatbots. arXiv preprint arXiv:2407.11387.
Stade, E. C., Toward Responsible Development and Evaluation of LLMs in Psychotherapy
Several LLMs have been shown to provide responses to clinical questions that are empathetic, accurate, and high quality:
- Ayers JW, Poliak A, Dredze M, Leas EC, Zhu Z, Kelley JB, et al. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med. 2023;183: 589–596.
- Giorgi S, Isman K, Liu T, Fried Z, Sedoc J, Curtis B. Evaluating generative AI responses to real-world drug-related questions. Psychiatry Res. 2024;339: 116058.
when you control for who the person thinks is answering (AI vs human response), people prefer responses from human even if an AI model authored it (and they think it was a human): Rubin, M., Li, J. Z., Zimmerman, F., Ong, D. C., Goldenberg, A., & Perry, A. (2025). Comparing the value of perceived human versus AI-generated empathy. Nature Human Behaviour, 1-15.
Stade, E. C., Stirman, S. W., Ungar, L. H., Boland, C. L., Schwartz, H. A., Yaden, D. B., ... & Eichstaedt, J. C. (2024). Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation. NPJ Mental Health Research, 3(1), 12.
Google: Lawrence HR, Schneider RA, Rubin SB, Mataric ́ MJ, McDuff DJ, Jones Bell M. The Opportunities and Risks of Large Language Models in Mental Health JMIR Ment Health 2024;11:e59479 doi: 10.2196/59479

LLMs and sychophancy

The problem of sychophancy (over-agreeing and not challenging enough):

Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S. R., ... & Perez, E. (2023). Towards understanding sycophancy in language models. arXiv preprint arXiv:2310.13548. - Malmqvist, L. (2025, June). Sycophancy in large language models: Causes and mitigations. In Intelligent Computing-Proceedings of the Computing Conference (pp. 61-74). Cham: Springer Nature Switzerland.

AI companions

https://arxiv.org/abs/2508.19258

Deployed Mental health chatbots

Psychedelic counselor: https://chatgpt.com/g/g-680126b5bc288191a17c8b01d4cc4773-psychedelic-hotline?mc_cid=f4f0eb10e7&mc_eid=bf933975ba
Kai AI: you chat with both a real human (on weekdays) and AI models via whatsapp or other messaging services. Audio or text. It automatically creates summaries, diaries, and dashboards. Automatically creating recommendations on what modules to work on (different subtypes of CBT, e.g., for insomnia; motivational interviewing; substance use). Use synthetic data to fine-tune SOTA models. 24/7 safety team. Constant prompt engineering to adapt to techniques and artificial therapist styles (young, shorter messages, etc).

Data Science

Python

Where to run code? IDE environments

VScode (open source) or Cursor (similar to VSCode but with paid AI Agent support)
Next-generation Python notebooks: https://marimo.io/ (instead of colab or jupyter lab)

Get Data

Open-source: Crawl website with LLM: https://github.com/unclecode/crawl4ai

Machine learning courses

https://substack.com/@hodgesj/note/c-143072067?r=25ypvo&utm_medium=ios&utm_source=notes-share-action

Design

Typst: instead of latex, https://typst.app/
Canva: for nondesigners
Figma: for designers. modular, save section in one page and it populates everywhere. Prototype function for a demo since all pages are linked.
Miro: mind map, etc. Example: https://miro.com/app/board/uXjVKqp1I6U=/

Prototyping Apps

Streamlit
Gradio
https://render.com/

Mental health

Reducing stigma

World Health Organization. (2024). Mosaic toolkit to end stigma and discrimination in mental health. In Mosaic toolkit to end stigma and discrimination in mental health. https://www.who.int/europe/publications/i/item/9789289061384

Suicide and Suicidal thoughts and behaviors

Suicide risk assessment

Donnelly (2025). Exploring the Potential of Large Language Models for Automated Safety Plan Scoring in Outpatient Mental Health Settings.

Speech processing and acoustic analysis

Speech to text / automated speech recognition

fast and open source: https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2

Speech as a tool in clinical trials

Screening tool: https://www.sciencedirect.com/science/article/abs/pii/S0165178124003901

FilesExpand file tree

resources.md

Latest commit

History