Browse All Ideas
Browse all ideas generated at the T&S Hackathons
T&S Hackathon Idea List
We’re always generating interesting ideas at T&S Hackathons! If you’d like to learn more about any of the ideas below, or be connected to the group that came up with one, please email hello [at] tshackathon.org anytime. Note that these ideas appear in no particular order, so be sure to use the filter options below.
Title | Description | Topic | Event | Voted |
---|---|---|---|---|
AgeGuard | Develop an LLM to predict a user’s age with a high degree of confidence. A dataset to train, test, and validate the model could be constructed by verifying the ages of a fixed number of users (through parental-control features such as Instagram Family Center or Google Family Link) and then analyzing factors associated with them to understand trends, such as the ages of the accounts these users are connected to and the login times of users of a specific age. If a user’s predicted age doesn’t match their reported age, action is taken, such as not showing them certain content (e.g., politics or sexually inappropriate material). | Child Safety | San Francisco 2023 | |
Undersight Board | Voice of the Community System for Policy Development and Localization – an LLM-based persona-building system for moderators, policy managers, and operations teams to generate community-specific input for T&S policy. | Governance | San Francisco 2023 | Top Idea |
TrustShare | A generative chatbot that empowers trust professionals. This chatbot will be trained on all publicly available trust and safety information across the internet, including transparency reports, video transcripts from trust and safety talks, current regulations, and researcher input. | Governance | San Francisco 2023 | |
Safe Speak | AI chatbot that empowers users to take control of distressing situations. Prompts users during a challenging exchange to report content and automatically plugs into existing reporting flows. | User Education | San Francisco 2023 | |
Safety Interventions | Uses behavioral analytics, fraud-detection machine learning models, and textual analysis to detect scam and fraud signals in user-to-user interactions. Provides warnings and fraud education to users to prevent further engagement with bad actors. Highly customizable so each platform can address its own needs. | Scams & Fraud | San Francisco 2023 | |
Clarif.ai | A tool that uses AI to tailor communications and interventions to each individual user in a way that maximizes comprehension and application. Using both proprietary user data and public data, the model learns who the user is, adapts to their understanding and usage of the product, and helps protect users with tailored communication and education. | Policy | San Francisco 2023 | |
SafetyGPT | SafetyGPT is a web-based interface that lets users quickly access easy-to-understand information about how to stay safe, or take action when experiencing harm, on top platforms. It’s a one-stop shop for all things policy, community guidelines, and safety and integrity tools. | Policy | San Francisco 2023 | |
UnsafeSets | Real-world datasets for training models on harmful content are siloed and insufficient to support emerging platform needs. We propose UnsafeSets: generating synthetic datasets to train models on harmful, high-sensitivity, low-prevalence content. | Harmful Content | San Francisco 2023 | |
AI Safety Network | An AI-powered connection model for sharing cross-platform abuse data. Any industry could spin up its own central clearing house for sharing hashed content identifying abusive users, e.g., making it easy for Bumble to tell Tinder about a user who committed a sexual assault. A cross-sector effort (public, private, and civic) to define policies and privacy standards. | Data Sharing | San Francisco 2023 | |
SafetySherpa | We are designing a T&S educational chatbot for social media platforms that educates users on policies and abuses when they appear to be interacting with suspected violating content. | User Education | San Francisco 2023 | |
XCALibR | XCALibR (Xplatform Content Abuse Library Reference): cross-platform online safety collaboration software. The database attempts to unify content policy and ethical standards across all platforms to find common ground. Based on that policy, ML models are built to learn from existing enforcement data that is accessible to every platform, emphasizing transparency, customizable tolerance levels, and inclusivity. | Data Sharing | San Francisco 2023 | Top Idea |
TrustAlign | AI simplifies the user agreement process by comparing the user’s values with the platform’s policies and terms of agreement, highlighting differences in simple terms. If a user is about to violate platform policies, the AI informs them of the violation. The long-term goal of the tool is to ensure accountability while maintaining privacy as a priority. | Transparency | San Francisco 2023 | |
One-Stop Policy Shop | We will leverage publicly available platform policies, vetted academic research on T&S, site safety centers, and existing regulations. In the short term, we will leverage DeepL and Google crowdsource for initial language translation for the LLMs. Long-term, we will red-team our LLM with researchers and civil society volunteers. | Policy | San Francisco 2023 | Top Idea |
B.L.O.B.I. (Big List of Bad Ideas) | An AI model designed to detect and track emerging hate groups by analyzing and learning from existing datasets of hate speech. In collaboration with trusted partners, we aim to continually refine the dataset to improve the accuracy and relevance of the training data. Built with T&S professionals, researchers, and NGOs in mind, this database will provide real-time updates on emerging hate speech patterns, helping them respond effectively to protect users. | Data Sharing | San Francisco 2023 | |
WatchDog | An information-sharing consortium to facilitate the transfer of bad-actor behavioral patterns and signals between Trust & Safety teams. WatchDog consumes shared information and develops user-level, behavior-focused AI models to detect and take enforcement action against accounts and networks that cause harm. A continuous feedback loop allows platforms to feed data back into WatchDog as patterns and behaviors change over time. | Data Sharing | San Francisco 2023 | |
AI Intervention | AI intervention that checks and analyzes behavioral traits to detect scam activity. Aimed at platforms that allow purchases (for example, via UPI), platforms with existing detection systems, and platforms looking for proactive scam-detection solutions. | User Education | San Francisco 2023 | |
AlgoSpeakEasy | Train an AI model on diverse multilingual takedown datasets to identify and flag instances of moderation evasion across platforms, enabling the labeling of posts and profiles that engage in such tactics. Outputs an API that layers over existing trust and safety frameworks to continuously detect new evasion tactics. Categorizes types of behaviors and the communities associated with them and flags them to humans, allowing platforms and moderators to keep up with multi-platform abuse. A bi-directional model lets platforms contribute collaboratively; data is anonymized and behavior-focused to remove privacy and compliance risk. Mimics GIFCT’s Hash-Sharing Database. | Data Sharing | San Francisco 2023 | Top Idea |
AI Gold Standards | Methodology to evaluate an AI system to eliminate biases, increase accuracy, and enhance contextualization. | Governance | San Francisco 2023 | |
Chippy | A prosocial AI-powered chatbot that identifies and prevents negative online experiences. Chippy takes a different approach to community moderation by empowering users themselves with better education, tools, and perks, making them more invested and proactive in the communities they are in. Builds trust in products, with a positive impact on company growth and user engagement. | User Education | San Francisco 2023 | |
360.ai | A third-party AI-based application that uses LLMs, response metrics, and sentiment and speech analysis to analyze electronic communication and identify exclusionary behaviors such as bullying, discrimination, and hate. Aggregates corporate data to: analyze text and speech to detect tone and prevent harassment; measure employees against their own baseline to detect patterns of exclusion or discrimination; support HR compliance; and use trend analysis to validate companies’ compliance with DEI values in the workplace. | User Education | San Francisco 2023 | |
SafeTrace Collaborative | STC is a first-of-its-kind, open-source AI resource dedicated to protecting children from malicious harm online. This unique repository brings together data from various social media companies, collaborating to detect and combat malicious patterns associated with child sexual abuse and trafficking through advanced AI detection. STC compiles and presents the data, mapping trends to deliver comprehensive insights and mitigate threats against children, ensuring their safety and wellbeing in the digital sphere. | Data Sharing | San Francisco 2023 | |
Salama Button | A dedicated in-site button to improve accessibility and reduce turnaround time for victims of online abuse. Online abuse of children is increasing while cases remain under-reported, which is attributed to the complexity of existing reporting systems across platforms, systems that don’t accommodate the brain’s responses during a state of panic and fear. The Salama Button aims to improve accessibility and reduce turnaround time for victims of online abuse through a personalized safety process. | User Education | San Francisco 2023 | |
Acorn | An interactive, AI-driven safety plug-in to empower young users and help keep them safe online. Acorn drives awareness and educates users on online safety through minor-friendly design and communication, and grows with the child until they reach the age of 18. It helps explain policies, gives guidance on what content is visible to the user, and provides safety advice. | User Education | Paris 2023 | |
Unified Safety Platform | Unify and democratise access to T&S knowledge, solutions, and signals, based on two pillars: 1) provision of actionable regulatory information; 2) provision of plug-and-play content intelligence from open-access and proprietary sources. | Data Sharing | Paris 2023 | |
Think Twice | A custom AI chatbot embedded in the platform offers tailored advice aligned with community guidelines. The bot monitors and alerts users when they try to publish content that might violate community guidelines. | User Education | Paris 2023 | |
Policy Mapper | A policy development and refinement tool that helps policy teams at new and small platforms create policies benchmarked against industry and legal standards for illegal content. This tool bridges the gap between the law and the interpretation of the law. | Policy | Paris 2023 | |
Safe Sharing | A collaborative approach to reducing sexual harassment recidivism across platforms. Safe Sharing creates a network between platforms and a third-party knowledge source to offer awareness-raising programs to violating users so they can safely return to platforms, as well as real-time insights on emerging threats. | Data Sharing | Paris 2023 | |
easyFlag | Centralised portal which: (i) connects content reporters to the appropriate trusted flagger specialised in the relevant region or subject matter; (ii) provides useful sorting & review features for TFs in a cost-effective & scalable manner | Governance | Paris 2023 | |
Genuine | An independent body that is held accountable by the public via its elected board of directors, drawing in competitive and diverse talent to stay ahead of evolving online risks. | Governance | Paris 2023 | |
Nudge | Nudge is a real-time, scam-specific detection system powered by AI and scam-specific datasets for LLM training, beginning with sextortion and expanding to more scam use cases over time. When a scammer sends a message to a potential victim, Nudge detects the high-risk message using a scam-specific optimized dataset, providing a risk score along with associated data on previous scam incidents. The user gets a nudge informing them of the potential scam, with links to the Nudge resource center, which includes a scam-specific chatbot, “NudgeAI”. | AI / ML | Paris 2023 | Top Idea |
V Game | “V” is an integration game that aims to crowdsource content moderation by letting users flag and moderate illegal content. Players earn points for accurately reporting and moderating violations, and in doing so train algorithms to better detect such content, getting rewarded for it. | Content Moderation | Paris 2023 | |
AI Safety Education Initiative | Creation and maintenance of modular materials for education and intervention. | User Education | Paris 2023 | |
CrossGuard | An open-source platform sharing public records of cross-platform abusive accounts. Our idea is a proactive online safety platform that seamlessly integrates with popular social media apps. It receives non-restricted data about an account enforced on one platform and uses it to check for matching or similar profiles on other platforms; AI then compiles a “risk assessment” and alerts users to potentially fraudulent or malicious accounts. | Data Sharing | Paris 2023 | |
TrustNet | A knowledge-sharing platform for T&S professionals to share actionable intelligence on emerging online threats, aimed at facilitating early detection and coordinated responses to new trends. | Data Sharing | Paris 2023 | Top Idea |
#IsItReal | #IsItReal is a web-based chatbot for individuals and citizen real-time fact-checkers that helps them with 360° fact-checking of media files. | Governance | Safety by Design 2024 | |
3 Design Changes | Rank on quality, not engagement; rate-limit new and untrusted users to impede over-targeting of large groups of users; make privacy settings the default. | Design | Safety by Design 2024 | |
Red Team Upstream | Create a joint government/civil society Safety-by-Design “Red Team” working “upstream” with emerging tech sources, e.g. university departments and science hubs. Later, develop a “downstream” version for companies. | Partnerships | Safety by Design 2024 | |
The Cultural Context Database | A global database of traditional cultural expressions to ensure that automated AI systems can make the policy exceptions needed to protect free speech and local cultures. | Data Sharing | Safety by Design 2024 | |
STAMPed | A Certification Program, run by a not-for-profit, to help platforms demonstrate their commitment to Safety by Design to their business partners. | Certifications | Safety by Design 2024 | |
Trust Tech Summit | Set minimum standards; bring simplicity and clarity to regulations; have regulators run T&S trainings for founders, legal teams, policy teams, etc. | Industry Standards | Safety by Design 2024 | |
User Safety as a Common Standard | Apps would be required to be assessed against Safety by Design principles, empowering users to be safe by giving them visibility into apps’ safety features. | Industry Standards | Safety by Design 2024 | |
Facelock | A database of “protected likenesses” where biometric data is stored in a matchable, but not reconstructible, state (e.g. hashes or tokens). Platforms can submit suspicious photos and videos for comparison against this database. This centralized database would significantly reduce likeness theft by enabling a rapid, actionable, and accurate matching mechanism, available to any participating platform, without any increased friction at onboarding. | Data Sharing | Safety by Design 2024 | Top Idea |
WhatsApp, What’s up? | Interrupting the social dynamics of small-group or 1:1 harms on end-to-end encrypted platforms across many harm types. More oversight of app and technology safety while mitigating the risks that E2EE brings to these innovations, addressing them with proactive and preventive measures. | Design | Safety by Design 2024 | |
Bridging Safety Gaps | Sporadically include AI-generated ads/content customized to users’ interests, along with a detailed list of things to look out for when they interact with the content/ad. | User Education | Safety by Design 2024 | |
SafeRoute BluePrint | Next-generation assessment engine that helps tech startups evaluate their trust and safety features against industry standards. Offers personalized recommendations and roadmaps tailored to each company for easy prioritization. | Design | Safety by Design 2024 | |
User-Driven Safety Scorecard | Create a global survey for users worldwide to evaluate safety risks on the top 20 platforms, using a 7-point Likert scale on crucial safety dimensions. | User Education | Safety by Design 2024 | |
Safer Together: Inclusive Safety by Design | Building products inclusively in the Safety by Design process with members of the communities most at risk for trust and safety issues such as bullying, fraud, and harassment, including: people with disabilities, people in gender transition, children, neurodivergent communities, LGBTQ+ people, and seniors. | Design | Safety by Design 2024 | |
Project Safety Sidekick: Safety by Design AI Assistant | Our customized “Safety by Design” AI assistant takes a feature description as input, analyzes similar features across the T&S industry, and provides an output highlighting: a) the potential risks or harms of the feature or business; b) the relevant user policies the platform should have in place; c) the safety features that should be accessible to users of the app; and d) the data-analysis methods that should be used to quantify the effectiveness of the safety features implemented. | Design | Safety by Design 2024 | Top Idea |
Safety Unified Solution (SBD Toolkit) | A toolkit built and distributed by an NGO or professional organization that provides information and resources that can be easily implemented by any organization. | Partnerships | Safety by Design 2024 | |
Safety Steward (Safety Stu) | We propose a “forward defence” public tech AI agent, available opt-in to each citizen, that: renders ToS readable by a real person (currently fewer than 1% read ToS); synthesizes cookie/tracking metadata and historical data-governance information to make clear to users, in natural language, where their data is collected and sent; and assists with dynamic rendering of sites for accessibility concerns. | Design | Safety by Design 2024 | |
Standardizing T&S Evaluation for LLM-based models | Develop a standardized framework for specific T&S threats to apply between model validation and deployment of LLM-based models, and create a stakeholder hub for signatories (AI companies, investors) to co-create and adopt this framework. | Industry Standards | Safety by Design 2024 | Top Idea |
Foresight | Synthesize new threats with LLMs that can red-team your current policy; augment private, historical examples of abuse detected under your current policy for stress testing; allow engineers to automatically generate real-time threats for ML development; and share synthetically generated data across companies for collaboration. | Design | Safety by Design 2024 | Top Idea |
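Several of the ideas above (AI Safety Network, AlgoSpeakEasy, Facelock) rest on the same mechanism: platforms share hashes of abusive content or likenesses that are matchable but not reconstructible, so media can be compared without ever being transmitted. The sketch below illustrates that mechanism with a toy average-hash and Hamming-distance lookup; all function names are illustrative, and production systems use robust perceptual hashes such as PDQ or PhotoDNA rather than this simplified example.

```python
# Toy sketch of "matchable but not reconstructible" media matching,
# as assumed by hash-sharing ideas like Facelock and AlgoSpeakEasy.

def average_hash(pixels):
    """Compute a 64-bit perceptual hash from an 8x8 grayscale image.

    `pixels` is a list of 64 brightness values (0-255). Each bit records
    only whether that pixel is brighter than the image's mean, so the
    hash cannot be inverted back to the original image.
    """
    mean = sum(pixels) / len(pixels)
    bits = 0
    for value in pixels:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming_distance(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def is_match(candidate_hash, shared_hashes, threshold=10):
    """True if the candidate is within `threshold` bits of any shared hash."""
    return any(hamming_distance(candidate_hash, h) <= threshold
               for h in shared_hashes)

# Usage: a platform hashes a suspicious image locally and queries the
# shared database without transmitting the image itself.
known = average_hash([10] * 32 + [200] * 32)   # hash contributed by platform A
probe = average_hash([12] * 32 + [198] * 32)   # slightly altered copy on platform B
print(is_match(probe, {known}))                # → True
```

The threshold-based comparison is what lets near-duplicates (re-encodes, minor edits) still match, which is why these proposals favor perceptual hashes over exact cryptographic ones.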
Got an idea for the T&S community to work on? Let us know about it!