Add Conversational AI Safety section to README #46
Proposed Addition: Conversational AI Safety
Why This Category Is Needed
Conversational AI safety has emerged as a critical and distinct subdomain requiring specialized tools and approaches. Three developments validate this as a standalone category:
1. Common Sense Media AI Risk Assessment (2025)
Common Sense Media, in partnership with Stanford's Brainstorm Lab for Mental Health Innovation, released comprehensive risk assessments concluding that AI companion apps pose "unacceptable risks" to users under 18. Their testing of Character.AI, Replika, Nomi, and others revealed systemic failures in crisis detection, grooming prevention, and harmful content moderation. 73% of teens have now used AI companions, yet safety infrastructure remains critically underdeveloped.
2. Google's DICES Dataset (NeurIPS 2023)
Google Research released the DICES (Diversity In Conversational AI Evaluation for Safety) dataset, the first large-scale benchmark specifically designed for evaluating safety in conversational AI systems. DICES contains 1,340 adversarial human-bot conversations rated by 296 demographically diverse raters across 24 safety criteria. Its design reflects the premise that conversational AI safety requires evaluation approaches distinct from general content moderation.
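For anyone who wants to explore DICES before this category lands in the README, here is a minimal sketch of loading one split with pandas and checking how overall safety judgments vary across rater groups. The file path and column names used below are assumptions based on the public google-research-datasets/dices-dataset release and should be verified against the actual CSV headers.

```python
# Minimal sketch: load one DICES split and look at how overall safety
# judgments vary across rater demographic groups. The file path and the
# column names (item_id, rater_id, rater_gender, Q_overall) are assumptions
# based on the public dices-dataset release, not guaranteed to match exactly.
import pandas as pd

df = pd.read_csv("dices-dataset/350/diverse_safety_adversarial_dialog_350.csv")

# Each row is one rater's judgment of one conversation, so a conversation
# appears once per rater who annotated it.
print(f"{df['item_id'].nunique()} conversations, {df['rater_id'].nunique()} raters")

# Distribution of overall safety judgments, broken down by rater gender.
print(df.groupby("rater_gender")["Q_overall"].value_counts(normalize=True))
```

Because each conversation carries judgments from many raters, dialogue-level safety labels can be derived by majority vote or kept as a full rating distribution; preserving that rater disagreement is part of what distinguishes DICES from single-label moderation datasets.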
3. Safety4ConvAI Workshop Series (2020-2024)
The academic community has formalized this domain through the Safety for Conversational AI (Safety4ConvAI) workshop series, now in its third iteration at LREC-COLING 2024. Organized by researchers from Heriot-Watt University, Bocconi University, Google, and Meta AI, the workshop focuses specifically on:
Detecting safety-critical situations in dialogue (self-harm, medical advice, crisis states)
Conversational abuse detection and mitigation
Privacy leaks in conversational contexts
Benchmarks for dialogue-level safety evaluation
Workshop: https://sites.google.com/view/safety-conv-ai-workshop
Proposed Category Structure