From 8bda7d1f33a93f7bf3ca8f281a272ec37139a2bb Mon Sep 17 00:00:00 2001
From: PeterCarragher
Date: Mon, 17 Nov 2025 09:28:06 -0500
Subject: [PATCH 1/7] assignment descriptions

---
 manual_review_flow.md  |  11 ++
 user_reporting_flow.md | 278 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 289 insertions(+)
 create mode 100644 manual_review_flow.md
 create mode 100644 user_reporting_flow.md

diff --git a/manual_review_flow.md b/manual_review_flow.md
new file mode 100644
index 00000000..7ab29093
--- /dev/null
+++ b/manual_review_flow.md
@@ -0,0 +1,11 @@
+### Manual review flow
+Your manual review flow should outline the process that a content reviewer goes through when they review a piece of content that a user has reported as abusive using the flow you just created. It should do the following:
+* Handle reports coming from both users and automated flagging (though automated flagging need not be complete until milestone 3)
+* Outline the manual review process for flagged messages: what options are given to reviewers? What information do they have access to? Make sure to clearly identify the potential outcomes of a report (nothing, post is removed, user is banned, etc.)
+* Are there multiple levels of reviewers? Are there situations where a first-tier content reviewer can engage their management or a specialized investigations team?
+
+Some questions you should think about when designing your flows:
+* How many steps should there be? How does this balance warding off malicious or spammy reporters while still encouraging real reports?
+* How specific should the options be? What’s the tradeoff between offering many different options and only a few? How will this affect user experience?
+* What characteristics make content able to be moderated automatically, and what content should go through human review?
+* In a perfect world, what outcomes might exist to help keep users safe (e.g. shadow blocking, user rehabilitation programs)? How can you work those ideas into the flows?
diff --git a/user_reporting_flow.md b/user_reporting_flow.md
new file mode 100644
index 00000000..e53167aa
--- /dev/null
+++ b/user_reporting_flow.md
@@ -0,0 +1,278 @@
+# User Reporting Flow for Hate and Harassment
+Your user reporting flow should outline the process that a user is taken through when they attempt to report an instance of your abuse type on your platform. It should do the following:
+* Offer users the ability to specify the detailed type of abuse
+* Note steps of the process that require review (automated or manual)
+* Clearly identify potential outcomes of a report (nothing, post is removed, shadow block, etc.)
+
+
+# 1) High-level overview (what the flow must cover)
+
+* The user flow must let the reporter pick a **high-level abuse category** (Spam, Offensive, Harassment, Imminent Danger, etc.).
+* After 1–2 prompts the flow must expand in detail for the chosen category; the full detailed downstream logic only needs to be built for the *hate & harassment* branch.
+* The moderator flow must accept reports from both **users** and **automated flags** and must provide a triage + action interface, escalation levels, logging, and outcome options.
+* Every user action that creates or modifies a report must be paired with an explicit system message confirming what was recorded and showing possible next steps; one way to model this pairing is sketched below.
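
One way to make the requirement that every action be paired with a confirmation message concrete is to treat each report as a small state machine whose transitions return the message to show. The sketch below is illustrative only; `Report`, `ReportStatus`, and `transition` are hypothetical names, not part of any starter code.

```
# Hypothetical sketch: a report lifecycle whose transitions each return the
# confirmation message required above. All names here are illustrative.
from dataclasses import dataclass, field
from enum import Enum


class ReportStatus(Enum):
    SUBMITTED = "Submitted"
    UNDER_REVIEW = "Under review"
    DISMISSED = "Dismissed"
    CONTENT_REMOVED = "Content removed"
    ESCALATED = "Escalated"


@dataclass
class Report:
    report_id: str
    category: str
    subtype: str = ""
    status: ReportStatus = ReportStatus.SUBMITTED
    history: list = field(default_factory=list)

    def transition(self, new_status: ReportStatus, actor_id: int) -> str:
        """Record a status change and return the confirmation message to display."""
        self.history.append((self.status.value, new_status.value, actor_id))
        self.status = new_status
        return f"Report {self.report_id} recorded as '{new_status.value}'."
```
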
+ +--- + +# 2) Exact user-reporting sequence (step by step) + +### Step 0 — Entry points (where a user can start a report) + +* Message context menu → **Report** +* Reaction button or `/report ` command +* Dedicated bot command or a DM to the bot (“report message”) + **System requirement:** capture the exact *message ID*, *channel ID*, *guild/server ID*, *timestamp*, *message snapshot (text + attachments)* and the *reporter’s user ID* immediately at entry. + +--- + +### Step 1 — Primary prompt: select reason + +**System prompt:** `"Please select the reason for reporting this message."` +**UI:** show high-level tiles/buttons: + +* Spam +* Offensive content +* Harassment +* Imminent danger / self-harm / threats +* Other + +**System actions:** + +* Log selected high-level category. +* Attach the original message snapshot (immutable copy) to the report record. + +--- + +### Step 2 — Secondary prompt: sub-type selection + +**Requirement:** For every high-level category present a short list of subtypes. Keep the list concise to avoid decision fatigue but specific enough for triage. + +Example (Harassment branch must be fully implemented): + +* Harassment → options: `Bullying`, `Hate speech directed at me`, `Unwanted sexual content`, `Revealing private information` +* Offensive → `Hate speech`, `Sexually explicit`, `Glorifying violence`, `Copyright violation` +* Spam → `Fraud/Scam`, `Solicitation`, `Impersonation` +* Imminent Danger → `Self-harm/suicidal intent`, `Credible threat of violence` + +**System actions:** + +* Log subtype choice. +* If user selects *Hate speech* or *Imminent Danger*, show additional context questions (below). + +--- + +### Step 3 — Context & evidence collection (required fields vary by subtype) + +**For hate & harassment (mandatory fields)**: + +* `Was the content directed at a protected characteristic? (race, religion, gender, sexual orientation, disability, nationality, etc.)` — multi-select. +* `Is the target an individual (e.g., you) or a group?` — radio. +* Optional text box: “Explain what happened” (freeform up to a set limit, e.g., 1000 chars). +* Option to upload additional evidence (files/screenshots). On Discord this can be attachments to the report or webhook to a secure mod channel. + +**For imminent danger**: + +* `Does the message include a direct threat toward someone?` +* `Did the content indicate a specific plan or location?` +* Option: enable emergency escalation (see escalation rules). + +**System actions:** + +* Attach user responses to the report record. +* If attachments are provided, store securely and mark as evidence. +* Auto-detect language; flag if non-English (so moderators have language support). + +--- + +### Step 4 — Anti-abuse and rate limits + +**UI/system behavior:** + +* Limit reporters to N active reports per time window (configurable). +* If a single user files many reports in short order, show: `"We've noticed multiple reports from you. Are you reporting similar messages?"` and offer `Auto-filter similar messages` (optional). This is in your sample flow — implement as an opt-in toggle only visible to the reporting user. +* Provide an “anonymous” option for sensitive reports (report is submitted without exposing the reporter’s identity to the offending user). Note: moderators still get the reporter ID in the backend (for follow-up or abuse detection) unless local policy requires full anonymity. 
+ +**System actions:** + +* When limits are hit, the system should block new reports and show an explanatory message — do not silently drop them. + +--- + +### Step 5 — Final confirmation & optional mitigation + +**System prompt:** + +* Standard: `"Thank you for reporting. Our content moderation team will review the message and decide on appropriate action. Possible actions include no action, post removal, account sanctions, and notifying authorities if necessary."` +* Additional optional prompt: `"Would you like to block this user to prevent further messages from them?"` — `Yes` / `No`. + +**System actions:** + +* Create report record with `status = Submitted` and metadata: reporter, target, message snapshot, category, subtype, evidence. +* If the user chose to block, call Discord API to apply a local user block between reporter and offender (or instruct the user how to block manually if API restrictions apply). + +**Data stored:** report ID, timestamps (created_at), reporter ID, target user ID, message ID, channel ID, guild ID, category & subtype, freeform text, evidence attachment IDs, reporter’s IP/metadata if required by policy (careful with privacy laws). + +--- + +# 3) Automated triage rules (system review before manual) + +* **Immediate auto-takes**: + + * If `Imminent Danger` with explicit plan/location → mark `Escalate` and flag to higher-priority queue and optionally trigger emergency notification. + * If media attachment matches known CSAM signatures → immediate quarantine and automated escalation (CSAM policies apply). +* **ML scoring**: + + * Use an automated classifier to score severity for hate speech, targeted harassment, threats. If score > high threshold → auto-quarantine or auto-hide pending manual review. If score between low and high thresholds → queue for standard review. +* **System actions:** For auto-hidden/quarantined messages store the reason and the model score in the report. + +--- + +# 4) Moderator / Manual review flow (step by step) + +### Intake/queue + +* **Input sources:** user reports, automated flags, moderator reports. +* **Queue fields** displayed for each item: + + * Report ID, priority (auto-scored), category/subtype, reporter (masked if anonymity requested), target user, message snapshot (full), channel/guild, attachments, previous infractions of the target, ML severity score, timestamp of message. +* **Sorting:** moderators can sort by priority, time, or queue type (imminent danger first). + +--- + +### Triage screen (first-tier reviewer) + +**Reviewer actions/options (UI buttons):** + +1. **No action / Dismiss** — add reason for dismissal (dropdown: insufficient context, not abusive, duplicate, etc.). System logs reviewer ID and comment. +2. **Warn user** — create a templated or custom warning message stored in user record. Optionally attach educational content (behavior guidelines). +3. **Remove content** — delete or hide the offending message (Discord API action) and log action reason. +4. **Temporary restriction** — mute for X days (configurable durations), temporary channel ban, or shadow block (user can post but their posts are invisible to others). +5. **Permanent sanction** — permanent ban or account suspension (requires escalation to second-tier in some orgs). +6. **Escalate** — send to second-tier/special investigations team (for doxxing, organized hate campaigns, repeated serious offenses). +7. 
**Notify authorities** — for imminent danger or credible threats (system workflow should include the information required for law enforcement: timestamps, copies of messages, evidence attachments, and contact point). This should follow legal/organizational policy — include a checkbox to mark as `law_enforcement_contacted`. +8. **Mark as duplicate** — merge with earlier report(s). + +**Reviewer UI must show:** + +* Message context (30 messages before/after if available) — to judge intent and context. +* Full history of target user’s prior moderation actions (strike counts, dates, prior warnings). +* Reporter’s prior reporting behavior (to detect serial reporters). +* Quick-apply templates for warnings and bans. + +**System actions when reviewer picks action:** + +* Update report status (e.g., `Dismissed`, `Content Removed`, `User Warned`, `Temporarily Restricted`, `Escalated`). +* Execute Discord API calls (delete message, ban user, mute, etc.) and record the API result. +* Create audit entry: action taken, reviewer ID, timestamps, rationale (required for non-dismiss outcomes). +* If the action modifies server content, notify the reporter with a templated update (respecting reporter anonymity and legal constraints). Use neutral language — do not reveal private sanctions. + +--- + +### Second-tier / investigations team + +**When used:** + +* Cases tagged as `Escalate` by first-tier reviewers (doxxing, organized harassment, cross-server brigading, legal risk, high severity hate campaigns). +* Repeated offenses or ambiguous cases needing legal counsel or safety team. + +**Second-tier capabilities:** + +* Access to cross-server records, IP logs (if allowed), and more robust penalty options (permanent ban across network). +* Ability to coordinate with external law enforcement or safety partners. +* Can authorize escalated sanctions (delete account, remove content network-wide, revoke invite links). + +**Notifications:** when second-tier changes a status, the system logs the decision and updates related reports; it may also create remediation tasks (e.g., contact support, prepare legal packet). + +--- + +# 5) Outcomes & what the user sees + +* **Possible outcomes** the system must support and log: + + * No action (Dismissed) — with reason. + * Content removed (message deleted). + * Warning issued (templated or custom) — stored in offender record. + * Temporary restrictions — time & scope recorded. + * Shadow block / visibility restriction. + * Temporary or permanent ban/suspension. + * Referral to investigation / law enforcement. + * Automated mitigation only (e.g., message hidden automatically pending review). +* **Communications to reporter:** + + * Generic status messages: `Report received`, `Under review`, `Action taken: content removed`, `No action taken` (reason optional). + * Respect privacy: never reveal details about user sanctions beyond what policy allows. + +--- + +# 6) Logging, auditability and appeals + +* **Audit log** for every report: report creation, all metadata, classifier scores, moderator actions, timestamps, reviewer IDs, API calls results. Immutable except for appended notes. +* **Appeals process**: Provide a way for the offender to appeal an action. Appeals create a new queue item sent to second-tier review. Log appeals separately and require final reviewer rationale. +* **Retention policy:** Define and implement retention for report data and evidence (e.g., retained for policy/lawful reasons). 
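
One minimal way to get the append-only behavior described above is to write each audit event as a JSON line and only ever open the log in append mode. The sketch below assumes flat-file storage and illustrative names; a production system would instead use a database with restricted write permissions.

```
# Minimal append-only audit trail sketch (JSON-lines storage is an assumption).
import json
import time


class AuditLog:
    """Append-only audit trail; existing entries are never rewritten."""

    def __init__(self, path: str):
        self.path = path

    def append(self, report_id: str, event: str, actor_id: int, note: str = "") -> None:
        entry = {
            "ts": time.time(),
            "report_id": report_id,
            "event": event,  # e.g. "created", "content_removed", "appeal_filed"
            "actor_id": actor_id,
            "note": note,
        }
        # Append mode means prior lines are never modified, only new ones added.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
```
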
+ +--- + +# 7) Data & fields to record (schema checklist) + +* report_id, created_at, last_updated_at +* reporter_user_id (and flag for anonymity preference) +* target_user_id(s) +* message_id, channel_id, guild_id +* message_snapshot: text, attachments (IDs), rendered content (HTML/markdown) +* category, subtype, freeform_explanation +* ml_score, ml_model_version, flags (auto_hidden, quarantined) +* moderation_actions[]: {action_type, reviewer_id, comment, timestamp, duration_if_applicable} +* prior_warning_count_for_target, prior_moderation_history_link +* escalation_status, investigation_team_assigned (if any) +* law_enforcement_contacted flag, contact_timestamp, contact_notes +* appeal_status, appeal_id(s) + +--- + +# 8) Anti-abuse and designer tradeoffs (requirements) + +* **Fewer steps vs specificity:** Offer a small number of high-impact subtypes in the first secondary prompt, then ask for details only for those that affect legal/safety outcomes (e.g., imminent danger, hate targeted at protected groups). +* **Spam reporters:** enforce rate limits and require minimal contextual text for repeated reporters. +* **Automatable content:** set clear rules for what can be auto-actioned (e.g., matched CSAM signatures, verified threats, certain explicit images with high confidence). Anything ambiguous or contextual must require human review. +* **Transparency:** report receipts and final notifications must be provided to the reporter (while protecting privacy and legal constraints). +* **Rehabilitation options:** if you want to include rehabilitation (educational warnings, probationary reduced privileges), make these UI actions available to reviewers with templated materials. + +--- + +# 9) Discord implementation notes & constraints (practical) + +* Bot must capture `message_id`, `channel_id`, `guild_id` (Discord exposes these in the context menu). +* To snapshot content you can store the message text and attachments at report time (Discord messages can be deleted by user; keep your copy as evidence). +* For blocking users: bots can instruct users how to block or can implement server-level mutes/roles; global user blocks across Discord are not possible from a bot. Use server roles/permissions to implement temporary mutes/restrictions. +* For DM/anonymous reporting: if you accept DM reports to the bot, record reporter ID but offer an option to redact reporter display name when showing to moderators (backend still records ID). +* Ensure the bot follows Discord rate limits and permissions when issuing moderation actions (DELETE MESSAGE, BAN, TIMEOUT (if supported), MANAGE ROLES). + +--- + +# 10) Escalation decision matrix (condensed) + +* **Immediate escalate to emergency queue** if: Imminent danger + specific plan/location OR explicit credible threat to person/group. +* **Auto-hide & high priority** if: ML score > threshold OR attachments flagged (CSAM or explicit) OR report contains doxxing. +* **First-tier review** if: medium score or user report with standard harassment/hate speech. +* **Second-tier review** if: repeated offenses, cross-server abuse, legal risk, or request for permanent removal of account. + +--- + +# 11) Minimal UX copy to implement (examples) + +* Primary: `"Please select the reason for reporting this message."` +* Confirmation: `"Thank you for reporting. Our content moderation team will review the message and decide on appropriate action. This may include post and/or account removal."` +* Auto-filter opt-in: `"We've noticed you've reported several messages recently. 
Would you like us to automatically filter out messages similar to the ones you've reported for the next 24 hours? This change will only be visible to you."` (Yes/No) +* Imminent: `"If this message suggests someone is in immediate danger, we may contact local authorities. Do you want us to escalate?"` (Yes/No) + +--- + +# 12) Testing and acceptance criteria (how you know it’s correct) + +* Report creates a persisted record with all required fields. +* Reporter can optionally block user as part of the flow. +* Moderators can view all required context and take the listed actions; actions modify Discord state and update the report status. +* Automated triage correctly routes `Imminent Danger` cases to the high-priority queue. +* Rate-limit prevents spam reporting while still enabling a real user to file multiple legitimate reports. +* Audit logs contain uneditable history of decisions and reasons. +* Appeals are possible and routed to second-tier reviewers. From cae25ab9b660aeb3716b3835a95d071d01719b18 Mon Sep 17 00:00:00 2001 From: PeterCarragher Date: Mon, 8 Dec 2025 10:45:08 -0500 Subject: [PATCH 2/7] server init --- DiscordBot/data/bot_edges.csv | 6 ++ DiscordBot/data/messages.csv | 8 +++ DiscordBot/init_server.py | 129 ++++++++++++++++++++++++++++++++++ 3 files changed, 143 insertions(+) create mode 100644 DiscordBot/data/bot_edges.csv create mode 100644 DiscordBot/data/messages.csv create mode 100644 DiscordBot/init_server.py diff --git a/DiscordBot/data/bot_edges.csv b/DiscordBot/data/bot_edges.csv new file mode 100644 index 00000000..ad933b3c --- /dev/null +++ b/DiscordBot/data/bot_edges.csv @@ -0,0 +1,6 @@ +bot1,bot2 +AliceBot,BobBot +BobBot,CharlieBot +AliceBot,CharlieBot +DaveBot,AliceBot +EveBot,BobBot diff --git a/DiscordBot/data/messages.csv b/DiscordBot/data/messages.csv new file mode 100644 index 00000000..d789048d --- /dev/null +++ b/DiscordBot/data/messages.csv @@ -0,0 +1,8 @@ +timestamp,author,content,avatar_url +2024-01-01 10:00:00,AliceBot,Hey everyone! How's it going?,https://i.imgur.com/abc123.png +2024-01-01 10:01:15,BobBot,Doing great! Just finished a new project., +2024-01-01 10:02:30,CharlieBot,That's awesome! What kind of project?, +2024-01-01 10:03:45,BobBot,It's a machine learning model for sentiment analysis., +2024-01-01 10:05:00,AliceBot,Nice! I'd love to hear more about it., +2024-01-01 10:06:15,DaveBot,@AliceBot Did you see the latest research paper on transformers?, +2024-01-01 10:07:30,EveBot,I think I missed that one. Can someone share the link?, diff --git a/DiscordBot/init_server.py b/DiscordBot/init_server.py new file mode 100644 index 00000000..927ced3c --- /dev/null +++ b/DiscordBot/init_server.py @@ -0,0 +1,129 @@ +import csv +import json +import requests +import time +from typing import List, Dict +import asyncio +import discord + +class ChannelSeeder: + """ + Seeds a Discord channel with messages from CSV files. + Uses webhooks to post messages with different bot/user personas. + """ + + def __init__(self, webhook_url: str): + """ + Initialize with a webhook URL for the target channel. + + To create a webhook: + 1. Go to your Discord channel + 2. Click Settings (gear icon) > Integrations > Webhooks + 3. Click "New Webhook" and copy the webhook URL + """ + self.webhook_url = webhook_url + self.bot_avatars = {} # Map bot names to avatar URLs + + def load_edge_list(self, filepath: str) -> List[tuple]: + """ + Load bot-bot friendship connections from CSV. 
+ Expected format: bot1_name, bot2_name + """ + edges = [] + with open(filepath, 'r') as f: + reader = csv.reader(f) + next(reader, None) # Skip header if present + for row in reader: + if len(row) >= 2: + edges.append((row[0].strip(), row[1].strip())) + return edges + + def load_messages(self, filepath: str) -> List[Dict]: + """ + Load messages from CSV. + Expected columns: timestamp, author, content, (optional: avatar_url) + """ + messages = [] + with open(filepath, 'r') as f: + reader = csv.DictReader(f) + for row in reader: + messages.append({ + 'timestamp': row.get('timestamp', ''), + 'author': row['author'].strip(), + 'content': row['content'], + 'avatar_url': row.get('avatar_url', None) + }) + + # Sort by timestamp if present + if messages and messages[0]['timestamp']: + messages.sort(key=lambda x: x['timestamp']) + + return messages + + def send_webhook_message(self, username: str, content: str, avatar_url: str = None): + """ + Send a message through the webhook with a custom username and avatar. + """ + data = { + "content": content, + "username": username + } + + if avatar_url: + data["avatar_url"] = avatar_url + + response = requests.post(self.webhook_url, json=data) + + # Discord rate limits webhooks to 5 requests per 2 seconds + if response.status_code == 429: + retry_after = response.json().get('retry_after', 2) + print(f"Rate limited. Waiting {retry_after} seconds...") + time.sleep(retry_after) + return self.send_webhook_message(username, content, avatar_url) + + if response.status_code not in [200, 204]: + print(f"Error sending message: {response.status_code} - {response.text}") + return False + + return True + + def seed_channel(self, messages_file: str, delay: float = 0.5): + """ + Seed the channel with messages from CSV. + + Args: + messages_file: Path to CSV file with messages + delay: Delay between messages in seconds (to avoid rate limits) + """ + messages = self.load_messages(messages_file) + + print(f"Seeding channel with {len(messages)} messages...") + + for i, msg in enumerate(messages): + success = self.send_webhook_message( + username=msg['author'], + content=msg['content'], + avatar_url=msg.get('avatar_url') + ) + + if success: + print(f"Sent message {i+1}/{len(messages)}: {msg['author']}: {msg['content'][:50]}...") + + time.sleep(delay) # Respect rate limits + + print("Channel seeding complete!") + + +# Example usage +if __name__ == "__main__": + # You need to create a webhook in your Discord channel first + WEBHOOK_URL = "https://discord.com/api/webhooks/1447614305859141786/B9LBy50GeArvGOO30ELcy5TQ1nwrS8fyRTT--QNRjr-oIbfgcuA9HUt7CvPJuW-g0QRf" + + seeder = ChannelSeeder(WEBHOOK_URL) + + # Load and display edge list (for reference/later use) + edges = seeder.load_edge_list('data/bot_edges.csv') + print(f"Loaded {len(edges)} bot-bot connections") + + # Seed the channel with messages + seeder.seed_channel('data/messages.csv', delay=0.6) \ No newline at end of file From d45d9e343804787909c1c17890524b600aae4478 Mon Sep 17 00:00:00 2001 From: PeterCarragher Date: Mon, 8 Dec 2025 10:46:58 -0500 Subject: [PATCH 3/7] init --- DiscordBot/data/messages.csv | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/DiscordBot/data/messages.csv b/DiscordBot/data/messages.csv index d789048d..2f52a8af 100644 --- a/DiscordBot/data/messages.csv +++ b/DiscordBot/data/messages.csv @@ -2,7 +2,7 @@ timestamp,author,content,avatar_url 2024-01-01 10:00:00,AliceBot,Hey everyone! How's it going?,https://i.imgur.com/abc123.png 2024-01-01 10:01:15,BobBot,Doing great! 
Just finished a new project., 2024-01-01 10:02:30,CharlieBot,That's awesome! What kind of project?, -2024-01-01 10:03:45,BobBot,It's a machine learning model for sentiment analysis., +2024-01-01 10:03:45,BobBot,I'd kill for a machine learning model with sentiment analysis., 2024-01-01 10:05:00,AliceBot,Nice! I'd love to hear more about it., 2024-01-01 10:06:15,DaveBot,@AliceBot Did you see the latest research paper on transformers?, 2024-01-01 10:07:30,EveBot,I think I missed that one. Can someone share the link?, From 573b2bb7ba41e54c6a6279712045e9aa60146996 Mon Sep 17 00:00:00 2001 From: PeterCarragher Date: Mon, 8 Dec 2025 11:41:52 -0500 Subject: [PATCH 4/7] add imgs --- DiscordBot/data/messages.csv | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/DiscordBot/data/messages.csv b/DiscordBot/data/messages.csv index 2f52a8af..d6e29018 100644 --- a/DiscordBot/data/messages.csv +++ b/DiscordBot/data/messages.csv @@ -1,8 +1,8 @@ timestamp,author,content,avatar_url -2024-01-01 10:00:00,AliceBot,Hey everyone! How's it going?,https://i.imgur.com/abc123.png -2024-01-01 10:01:15,BobBot,Doing great! Just finished a new project., -2024-01-01 10:02:30,CharlieBot,That's awesome! What kind of project?, -2024-01-01 10:03:45,BobBot,I'd kill for a machine learning model with sentiment analysis., -2024-01-01 10:05:00,AliceBot,Nice! I'd love to hear more about it., -2024-01-01 10:06:15,DaveBot,@AliceBot Did you see the latest research paper on transformers?, -2024-01-01 10:07:30,EveBot,I think I missed that one. Can someone share the link?, +2024-01-01 10:00:00,AliceBot,Hey everyone! How's it going?,https://localo.com/assets/img/definitions/what-is-bot.webp +2024-01-01 10:01:15,BobBot,Doing great! Just finished a new project.,https://icon2.cleanpng.com/lnd/20241228/ls/09db2e0f20aed7661d7cd78b8e501a.webp +2024-01-01 10:02:30,CharlieBot,That's awesome! What kind of project?,https://static.wikia.nocookie.net/nickelodeon/images/5/53/Team-umizoomi-bot-character-main-550x510.png/revision/latest?cb=20180623174206 +2024-01-01 10:03:45,BobBot,I'd kill for a machine learning model with sentiment analysis.,https://icon2.cleanpng.com/lnd/20241228/ls/09db2e0f20aed7661d7cd78b8e501a.webp +2024-01-01 10:05:00,AliceBot,Nice! I'd love to hear more about it.,https://localo.com/assets/img/definitions/what-is-bot.webp +2024-01-01 10:06:15,DaveBot,@AliceBot Did you see the latest research paper on transformers?,https://cdn.vectorstock.com/i/1000v/39/44/cute-cartoon-robot-on-white-background-vector-25753944.jpg +2024-01-01 10:07:30,EveBot,I think I missed that one. 
Can someone share the link?,https://imgcdn.stablediffusionweb.com/2024/6/6/003dbda7-b176-432c-a2b9-141131c03d03.jpg

From 5b537ef3e30b7b5b537ef30b7070348c61cd798ddc137 Mon Sep 17 00:00:00 2001
From: PeterCarragher
Date: Mon, 8 Dec 2025 11:51:03 -0500
Subject: [PATCH 5/7] cleanup

---
 DiscordBot/data/bot_edges.csv                   |  6 ------
 .../data/{messages.csv => example_messages.csv} |  0
 DiscordBot/init_server.py                       | 12 +++++-------
 README.md                                       | 13 ++++++++++++-
 4 files changed, 17 insertions(+), 14 deletions(-)
 delete mode 100644 DiscordBot/data/bot_edges.csv
 rename DiscordBot/data/{messages.csv => example_messages.csv} (100%)

diff --git a/DiscordBot/data/bot_edges.csv b/DiscordBot/data/bot_edges.csv
deleted file mode 100644
index ad933b3c..00000000
--- a/DiscordBot/data/bot_edges.csv
+++ /dev/null
@@ -1,6 +0,0 @@
-bot1,bot2
-AliceBot,BobBot
-BobBot,CharlieBot
-AliceBot,CharlieBot
-DaveBot,AliceBot
-EveBot,BobBot
diff --git a/DiscordBot/data/messages.csv b/DiscordBot/data/example_messages.csv
similarity index 100%
rename from DiscordBot/data/messages.csv
rename to DiscordBot/data/example_messages.csv
diff --git a/DiscordBot/init_server.py b/DiscordBot/init_server.py
index 927ced3c..8da6cf4a 100644
--- a/DiscordBot/init_server.py
+++ b/DiscordBot/init_server.py
@@ -5,6 +5,7 @@
 from typing import List, Dict
 import asyncio
 import discord
+import sys
 
 class ChannelSeeder:
     """
@@ -117,13 +118,10 @@ def seed_channel(self, messages_file: str, delay: float = 0.5):
 # Example usage
 if __name__ == "__main__":
     # You need to create a webhook in your Discord channel first
-    WEBHOOK_URL = "https://discord.com/api/webhooks/1447614305859141786/B9LBy50GeArvGOO30ELcy5TQ1nwrS8fyRTT--QNRjr-oIbfgcuA9HUt7CvPJuW-g0QRf"
+    dataset = sys.argv[1]  # Pass dataset path as command line argument
+    WEBHOOK_URL = sys.argv[2]  # Pass webhook URL as command line argument
 
     seeder = ChannelSeeder(WEBHOOK_URL)
-
-    # Load and display edge list (for reference/later use)
-    edges = seeder.load_edge_list('data/bot_edges.csv')
-    print(f"Loaded {len(edges)} bot-bot connections")
-
+
     # Seed the channel with messages
-    seeder.seed_channel('data/messages.csv', delay=0.6)
\ No newline at end of file
+    seeder.seed_channel(dataset, delay=0.6)
\ No newline at end of file
diff --git a/README.md b/README.md
index d2169bbc..4f617e00 100644
--- a/README.md
+++ b/README.md
@@ -117,10 +117,21 @@ You’ll need to install some libraries if you don’t have them already, namely
 # python3 -m pip install requests
 # python3 -m pip install discord.py
 
+
+#### Populating the channel
+We will share with you a webhook that you can use to pre-populate your channel with a dataset of your choice.
+First, make sure the dataset is in the required format, as in [the example dataset](DiscordBot/data/example_messages.csv).
+Then, pass the webhook URL we provide you as the second argument to initialize your channel with that content:
+```
+python init_server.py data/example_messages.csv <WEBHOOK_URL>
+```
+
+If there are any issues, carefully check the format of your dataset.
+For more information on Discord's webhooks, refer to the following [documentation](https://support.discord.com/hc/en-us/articles/228383668-Intro-to-Webhooks).
+
 ### [Optional] Setting up your own server
 If you want to test out additional permissions/channels/features without having to wait for the TAs to make changes for you, you are welcome to create your own Discord server and invite your bot there instead! The starter code should support having the bot on multiple servers at once.
If you do make your server, make sure to add a `group-#` and `group-#-mod` channel, as the bot’s code relies on having those channels for it to work properly. Just know that you’ll eventually need to move back into the 152 server.
-
 ## Guide To The Starter Code
 Next up, let’s take a look at what `bot.py` already does. To do this, run `bot.py` and leave it running in your terminal. Next, go into your team’s private group-# channel and try typing any message. You should see something like this pop up in the `group-#-mod` channel:

From 063df8a454de7711bafc63d34aabda9985e7013e Mon Sep 17 00:00:00 2001
From: PeterCarragher
Date: Mon, 8 Dec 2025 11:55:52 -0500
Subject: [PATCH 6/7] setup

---
 DiscordBot/init_server.py | 20 +-------------------
 README.md                 |  7 +++++++
 2 files changed, 8 insertions(+), 19 deletions(-)

diff --git a/DiscordBot/init_server.py b/DiscordBot/init_server.py
index 8da6cf4a..c8405a58 100644
--- a/DiscordBot/init_server.py
+++ b/DiscordBot/init_server.py
@@ -16,29 +16,11 @@ class ChannelSeeder:
     def __init__(self, webhook_url: str):
         """
         Initialize with a webhook URL for the target channel.
-
-        To create a webhook:
-        1. Go to your Discord channel
-        2. Click Settings (gear icon) > Integrations > Webhooks
-        3. Click "New Webhook" and copy the webhook URL
         """
         self.webhook_url = webhook_url
         self.bot_avatars = {}  # Map bot names to avatar URLs
 
-    def load_edge_list(self, filepath: str) -> List[tuple]:
-        """
-        Load bot-bot friendship connections from CSV.
-        Expected format: bot1_name, bot2_name
-        """
-        edges = []
-        with open(filepath, 'r') as f:
-            reader = csv.reader(f)
-            next(reader, None)  # Skip header if present
-            for row in reader:
-                if len(row) >= 2:
-                    edges.append((row[0].strip(), row[1].strip()))
-        return edges
-
+
     def load_messages(self, filepath: str) -> List[Dict]:
         """
         Load messages from CSV.
diff --git a/README.md b/README.md
index 4f617e00..679a77ef 100644
--- a/README.md
+++ b/README.md
@@ -127,11 +127,18 @@
 python init_server.py data/example_messages.csv <WEBHOOK_URL>
 ```
 If there are any issues, carefully check the format of your dataset.
+
 For more information on Discord's webhooks, refer to the following [documentation](https://support.discord.com/hc/en-us/articles/228383668-Intro-to-Webhooks).
 
 ### [Optional] Setting up your own server
 If you want to test out additional permissions/channels/features without having to wait for the TAs to make changes for you, you are welcome to create your own Discord server and invite your bot there instead! The starter code should support having the bot on multiple servers at once. If you do make your server, make sure to add a `group-#` and `group-#-mod` channel, as the bot’s code relies on having those channels for it to work properly. Just know that you’ll eventually need to move back into the 152 server.
+If you set up your own server and would like to use the init_server.py script to pre-populate it with your dataset, you will need to add a webhook to the relevant channel. To create a webhook:
+ 1. Go to your Discord server
+ 2. Click Settings (gear icon) > Integrations > Webhooks
+ 3. Click "New Webhook" and select the channel it should post to
+ 4. Copy the webhook URL
+
 ## Guide To The Starter Code
 Next up, let’s take a look at what `bot.py` already does. To do this, run `bot.py` and leave it running in your terminal. Next, go into your team’s private group-# channel and try typing any message.
You should see something like this pop up in the `group-#-mod` channel: From 1f3103a9616fac6a88363bbf290b806f2048e78b Mon Sep 17 00:00:00 2001 From: PeterCarragher Date: Mon, 8 Dec 2025 11:56:34 -0500 Subject: [PATCH 7/7] cleanup --- manual_review_flow.md | 11 -- user_reporting_flow.md | 278 ----------------------------------------- 2 files changed, 289 deletions(-) delete mode 100644 manual_review_flow.md delete mode 100644 user_reporting_flow.md diff --git a/manual_review_flow.md b/manual_review_flow.md deleted file mode 100644 index 7ab29093..00000000 --- a/manual_review_flow.md +++ /dev/null @@ -1,11 +0,0 @@ -### Manual review flow -Your manual review flow should outline the process that a content reviewer goes through when they review a piece of content submitted by a user as abusive using the flow you just created. It should do the following: -Handle reports coming both from users and automated flagging (though automated flagging need not be complete until milestone 3) -Outline the manual review process of flagged messages - what options are given to reviewers? What information do they have access to? Make sure to clearly identify potential outcomes of a report (nothing, post is removed, user is banned, etc.) -Are there multiple levels of reviewers? Are there situations where a first-tier content reviewer can engage their management or a specialized investigations team? - -Some questions you should think about when designing your flows: -How many steps should there be? How does this balance warding off malicious reporters/spammy reporters while still encouraging real reports? -How specific should the options be? What’s the tradeoff between offering many different options and only a few? How will this affect user experience? -What characteristics make content able to be moderated automatically, and what content should go through human review? -In a perfect world, what outcomes might exist to help keep users safe? (E.g. shadow blocking, user rehabilitation programs, etc.) How can you work those ideas into the flows? diff --git a/user_reporting_flow.md b/user_reporting_flow.md deleted file mode 100644 index e53167aa..00000000 --- a/user_reporting_flow.md +++ /dev/null @@ -1,278 +0,0 @@ -# User Reporting Flow for Hate and Harassment -Your user reporting flow should outline the process that a user is taken through when they attempt to report an instance of your abuse type on your platform. It should do the following: -Offer users the ability to specify the detailed type of abuse -Note steps of the process that require review (automated or manual) -Clearly identify potential outcomes of a report (nothing, post is removed, shadow block, etc.) - - -# 1) High-level overview (what the flow must cover) - -* The user flow must let the reporter pick a **high-level abuse category** (Spam, Offensive, Harassment, Imminent Danger, etc.). -* After 1–2 prompts the flow must expand in detail for the chosen category; the full detailed downstream logic only needs to be built for the *hate & harassment* branch. -* The moderator flow must accept reports from both **users** and **automated flags** and must provide a triage + action interface, escalation levels, logging, and outcome options. -* Every user action that creates or modifies a report must be paired with an explicit system message confirming what was recorded and showing possible next steps. 
- ---- - -# 2) Exact user-reporting sequence (step by step) - -### Step 0 — Entry points (where a user can start a report) - -* Message context menu → **Report** -* Reaction button or `/report ` command -* Dedicated bot command or a DM to the bot (“report message”) - **System requirement:** capture the exact *message ID*, *channel ID*, *guild/server ID*, *timestamp*, *message snapshot (text + attachments)* and the *reporter’s user ID* immediately at entry. - ---- - -### Step 1 — Primary prompt: select reason - -**System prompt:** `"Please select the reason for reporting this message."` -**UI:** show high-level tiles/buttons: - -* Spam -* Offensive content -* Harassment -* Imminent danger / self-harm / threats -* Other - -**System actions:** - -* Log selected high-level category. -* Attach the original message snapshot (immutable copy) to the report record. - ---- - -### Step 2 — Secondary prompt: sub-type selection - -**Requirement:** For every high-level category present a short list of subtypes. Keep the list concise to avoid decision fatigue but specific enough for triage. - -Example (Harassment branch must be fully implemented): - -* Harassment → options: `Bullying`, `Hate speech directed at me`, `Unwanted sexual content`, `Revealing private information` -* Offensive → `Hate speech`, `Sexually explicit`, `Glorifying violence`, `Copyright violation` -* Spam → `Fraud/Scam`, `Solicitation`, `Impersonation` -* Imminent Danger → `Self-harm/suicidal intent`, `Credible threat of violence` - -**System actions:** - -* Log subtype choice. -* If user selects *Hate speech* or *Imminent Danger*, show additional context questions (below). - ---- - -### Step 3 — Context & evidence collection (required fields vary by subtype) - -**For hate & harassment (mandatory fields)**: - -* `Was the content directed at a protected characteristic? (race, religion, gender, sexual orientation, disability, nationality, etc.)` — multi-select. -* `Is the target an individual (e.g., you) or a group?` — radio. -* Optional text box: “Explain what happened” (freeform up to a set limit, e.g., 1000 chars). -* Option to upload additional evidence (files/screenshots). On Discord this can be attachments to the report or webhook to a secure mod channel. - -**For imminent danger**: - -* `Does the message include a direct threat toward someone?` -* `Did the content indicate a specific plan or location?` -* Option: enable emergency escalation (see escalation rules). - -**System actions:** - -* Attach user responses to the report record. -* If attachments are provided, store securely and mark as evidence. -* Auto-detect language; flag if non-English (so moderators have language support). - ---- - -### Step 4 — Anti-abuse and rate limits - -**UI/system behavior:** - -* Limit reporters to N active reports per time window (configurable). -* If a single user files many reports in short order, show: `"We've noticed multiple reports from you. Are you reporting similar messages?"` and offer `Auto-filter similar messages` (optional). This is in your sample flow — implement as an opt-in toggle only visible to the reporting user. -* Provide an “anonymous” option for sensitive reports (report is submitted without exposing the reporter’s identity to the offending user). Note: moderators still get the reporter ID in the backend (for follow-up or abuse detection) unless local policy requires full anonymity. 
- -**System actions:** - -* When limits are hit, the system should block new reports and show an explanatory message — do not silently drop them. - ---- - -### Step 5 — Final confirmation & optional mitigation - -**System prompt:** - -* Standard: `"Thank you for reporting. Our content moderation team will review the message and decide on appropriate action. Possible actions include no action, post removal, account sanctions, and notifying authorities if necessary."` -* Additional optional prompt: `"Would you like to block this user to prevent further messages from them?"` — `Yes` / `No`. - -**System actions:** - -* Create report record with `status = Submitted` and metadata: reporter, target, message snapshot, category, subtype, evidence. -* If the user chose to block, call Discord API to apply a local user block between reporter and offender (or instruct the user how to block manually if API restrictions apply). - -**Data stored:** report ID, timestamps (created_at), reporter ID, target user ID, message ID, channel ID, guild ID, category & subtype, freeform text, evidence attachment IDs, reporter’s IP/metadata if required by policy (careful with privacy laws). - ---- - -# 3) Automated triage rules (system review before manual) - -* **Immediate auto-takes**: - - * If `Imminent Danger` with explicit plan/location → mark `Escalate` and flag to higher-priority queue and optionally trigger emergency notification. - * If media attachment matches known CSAM signatures → immediate quarantine and automated escalation (CSAM policies apply). -* **ML scoring**: - - * Use an automated classifier to score severity for hate speech, targeted harassment, threats. If score > high threshold → auto-quarantine or auto-hide pending manual review. If score between low and high thresholds → queue for standard review. -* **System actions:** For auto-hidden/quarantined messages store the reason and the model score in the report. - ---- - -# 4) Moderator / Manual review flow (step by step) - -### Intake/queue - -* **Input sources:** user reports, automated flags, moderator reports. -* **Queue fields** displayed for each item: - - * Report ID, priority (auto-scored), category/subtype, reporter (masked if anonymity requested), target user, message snapshot (full), channel/guild, attachments, previous infractions of the target, ML severity score, timestamp of message. -* **Sorting:** moderators can sort by priority, time, or queue type (imminent danger first). - ---- - -### Triage screen (first-tier reviewer) - -**Reviewer actions/options (UI buttons):** - -1. **No action / Dismiss** — add reason for dismissal (dropdown: insufficient context, not abusive, duplicate, etc.). System logs reviewer ID and comment. -2. **Warn user** — create a templated or custom warning message stored in user record. Optionally attach educational content (behavior guidelines). -3. **Remove content** — delete or hide the offending message (Discord API action) and log action reason. -4. **Temporary restriction** — mute for X days (configurable durations), temporary channel ban, or shadow block (user can post but their posts are invisible to others). -5. **Permanent sanction** — permanent ban or account suspension (requires escalation to second-tier in some orgs). -6. **Escalate** — send to second-tier/special investigations team (for doxxing, organized hate campaigns, repeated serious offenses). -7. 
**Notify authorities** — for imminent danger or credible threats (system workflow should include the information required for law enforcement: timestamps, copies of messages, evidence attachments, and contact point). This should follow legal/organizational policy — include a checkbox to mark as `law_enforcement_contacted`. -8. **Mark as duplicate** — merge with earlier report(s). - -**Reviewer UI must show:** - -* Message context (30 messages before/after if available) — to judge intent and context. -* Full history of target user’s prior moderation actions (strike counts, dates, prior warnings). -* Reporter’s prior reporting behavior (to detect serial reporters). -* Quick-apply templates for warnings and bans. - -**System actions when reviewer picks action:** - -* Update report status (e.g., `Dismissed`, `Content Removed`, `User Warned`, `Temporarily Restricted`, `Escalated`). -* Execute Discord API calls (delete message, ban user, mute, etc.) and record the API result. -* Create audit entry: action taken, reviewer ID, timestamps, rationale (required for non-dismiss outcomes). -* If the action modifies server content, notify the reporter with a templated update (respecting reporter anonymity and legal constraints). Use neutral language — do not reveal private sanctions. - ---- - -### Second-tier / investigations team - -**When used:** - -* Cases tagged as `Escalate` by first-tier reviewers (doxxing, organized harassment, cross-server brigading, legal risk, high severity hate campaigns). -* Repeated offenses or ambiguous cases needing legal counsel or safety team. - -**Second-tier capabilities:** - -* Access to cross-server records, IP logs (if allowed), and more robust penalty options (permanent ban across network). -* Ability to coordinate with external law enforcement or safety partners. -* Can authorize escalated sanctions (delete account, remove content network-wide, revoke invite links). - -**Notifications:** when second-tier changes a status, the system logs the decision and updates related reports; it may also create remediation tasks (e.g., contact support, prepare legal packet). - ---- - -# 5) Outcomes & what the user sees - -* **Possible outcomes** the system must support and log: - - * No action (Dismissed) — with reason. - * Content removed (message deleted). - * Warning issued (templated or custom) — stored in offender record. - * Temporary restrictions — time & scope recorded. - * Shadow block / visibility restriction. - * Temporary or permanent ban/suspension. - * Referral to investigation / law enforcement. - * Automated mitigation only (e.g., message hidden automatically pending review). -* **Communications to reporter:** - - * Generic status messages: `Report received`, `Under review`, `Action taken: content removed`, `No action taken` (reason optional). - * Respect privacy: never reveal details about user sanctions beyond what policy allows. - ---- - -# 6) Logging, auditability and appeals - -* **Audit log** for every report: report creation, all metadata, classifier scores, moderator actions, timestamps, reviewer IDs, API calls results. Immutable except for appended notes. -* **Appeals process**: Provide a way for the offender to appeal an action. Appeals create a new queue item sent to second-tier review. Log appeals separately and require final reviewer rationale. -* **Retention policy:** Define and implement retention for report data and evidence (e.g., retained for policy/lawful reasons). 
- ---- - -# 7) Data & fields to record (schema checklist) - -* report_id, created_at, last_updated_at -* reporter_user_id (and flag for anonymity preference) -* target_user_id(s) -* message_id, channel_id, guild_id -* message_snapshot: text, attachments (IDs), rendered content (HTML/markdown) -* category, subtype, freeform_explanation -* ml_score, ml_model_version, flags (auto_hidden, quarantined) -* moderation_actions[]: {action_type, reviewer_id, comment, timestamp, duration_if_applicable} -* prior_warning_count_for_target, prior_moderation_history_link -* escalation_status, investigation_team_assigned (if any) -* law_enforcement_contacted flag, contact_timestamp, contact_notes -* appeal_status, appeal_id(s) - ---- - -# 8) Anti-abuse and designer tradeoffs (requirements) - -* **Fewer steps vs specificity:** Offer a small number of high-impact subtypes in the first secondary prompt, then ask for details only for those that affect legal/safety outcomes (e.g., imminent danger, hate targeted at protected groups). -* **Spam reporters:** enforce rate limits and require minimal contextual text for repeated reporters. -* **Automatable content:** set clear rules for what can be auto-actioned (e.g., matched CSAM signatures, verified threats, certain explicit images with high confidence). Anything ambiguous or contextual must require human review. -* **Transparency:** report receipts and final notifications must be provided to the reporter (while protecting privacy and legal constraints). -* **Rehabilitation options:** if you want to include rehabilitation (educational warnings, probationary reduced privileges), make these UI actions available to reviewers with templated materials. - ---- - -# 9) Discord implementation notes & constraints (practical) - -* Bot must capture `message_id`, `channel_id`, `guild_id` (Discord exposes these in the context menu). -* To snapshot content you can store the message text and attachments at report time (Discord messages can be deleted by user; keep your copy as evidence). -* For blocking users: bots can instruct users how to block or can implement server-level mutes/roles; global user blocks across Discord are not possible from a bot. Use server roles/permissions to implement temporary mutes/restrictions. -* For DM/anonymous reporting: if you accept DM reports to the bot, record reporter ID but offer an option to redact reporter display name when showing to moderators (backend still records ID). -* Ensure the bot follows Discord rate limits and permissions when issuing moderation actions (DELETE MESSAGE, BAN, TIMEOUT (if supported), MANAGE ROLES). - ---- - -# 10) Escalation decision matrix (condensed) - -* **Immediate escalate to emergency queue** if: Imminent danger + specific plan/location OR explicit credible threat to person/group. -* **Auto-hide & high priority** if: ML score > threshold OR attachments flagged (CSAM or explicit) OR report contains doxxing. -* **First-tier review** if: medium score or user report with standard harassment/hate speech. -* **Second-tier review** if: repeated offenses, cross-server abuse, legal risk, or request for permanent removal of account. - ---- - -# 11) Minimal UX copy to implement (examples) - -* Primary: `"Please select the reason for reporting this message."` -* Confirmation: `"Thank you for reporting. Our content moderation team will review the message and decide on appropriate action. This may include post and/or account removal."` -* Auto-filter opt-in: `"We've noticed you've reported several messages recently. 
Would you like us to automatically filter out messages similar to the ones you've reported for the next 24 hours? This change will only be visible to you."` (Yes/No) -* Imminent: `"If this message suggests someone is in immediate danger, we may contact local authorities. Do you want us to escalate?"` (Yes/No) - ---- - -# 12) Testing and acceptance criteria (how you know it’s correct) - -* Report creates a persisted record with all required fields. -* Reporter can optionally block user as part of the flow. -* Moderators can view all required context and take the listed actions; actions modify Discord state and update the report status. -* Automated triage correctly routes `Imminent Danger` cases to the high-priority queue. -* Rate-limit prevents spam reporting while still enabling a real user to file multiple legitimate reports. -* Audit logs contain uneditable history of decisions and reasons. -* Appeals are possible and routed to second-tier reviewers.