Privacy & user-safety
PII detection and end-user / community-governance tooling. We wrap Presidio for PII in trainguard; most user-safety tools sit adjacent to a CSAM-detection pipeline.
6 projects — 1 use · 2 learn from · 3 out of scope.
Project descriptions are adapted from awesome-safety-tools (maintained by ROOST); the verdicts and analysis are ours. Snapshot: June 2026 — a point-in-time view that complements, and does not replace, their living list.
Presidio
Use · by Microsoft · pairs with trainguard
This is the PII engine we bolt onto trainguard: rather than build our own detectors, we wrap Presidio to answer 'is this training dataset safe to use?' by flagging and redacting names, identifiers, and PII text embedded in images before data ever reaches a model.
Open-source framework for detecting and anonymizing personally identifiable information (PII) and sensitive data in text, images, and structured data, using NLP, pattern matching, and customizable recognizers.
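To make the pattern-matching half of that description concrete, here is a minimal standard-library sketch of the detect-then-redact flow. It is not Presidio's API and not trainguard's wrapper: the PATTERNS table, the Finding type, and the detect and redact helpers are hypothetical names chosen for illustration, and the two regexes are deliberately narrow.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately narrow patterns for illustration only.
# Real recognizers are broader and also use NLP context and validation.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass(frozen=True)
class Finding:
    """One detected span: entity label plus character offsets (end exclusive)."""
    entity_type: str
    start: int
    end: int


def detect(text: str) -> list[Finding]:
    """Run every pattern over the text and return findings ordered by position."""
    findings = [
        Finding(name, match.start(), match.end())
        for name, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: f.start)


def redact(text: str, findings: list[Finding]) -> str:
    """Replace each finding with a <LABEL> placeholder.

    Spans are replaced from the end of the string backwards so that earlier
    offsets stay valid. Overlapping findings are not handled in this sketch.
    """
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        text = text[:f.start] + f"<{f.entity_type}>" + text[f.end:]
    return text


sample = "Contact jane@example.com, SSN 123-45-6789."
found = detect(sample)
cleaned = redact(sample, found)
```

Separating detection from redaction mirrors the "detecting and anonymizing" split in the description above, and it lets a screening step report findings without modifying the data.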
SquadBox
Learn from · by UW Social Futures Lab
Aimed at end-user harassment rather than platform CSAM, so it's adjacent rather than core — but its friend-sourced, human-in-the-loop moderation model is a concept worth borrowing for how trusted reviewers triage flagged content.
Tool that lets someone facing online harassment invite trusted friends to act as moderators for their inbox, reviewing and filtering incoming messages on the owner's behalf according to their preferences.
Uli
Learn from · by Tattle
Built for gender-based-violence response, not CSAM, so it's out of trainguard's direct lane — but its crowdsourced abuse-detection lists and evidence-archiving approach are concretely relevant to building content classifiers and preserving reportable evidence.
Browser plugin and accompanying resources for mitigating online gender-based violence in India, helping people of marginalized genders detect abusive content, archive evidence, and respond collectively to abuse on social media.
Fawkes
Out of scope · by Shawn Shan (SANDLab, University of Chicago)
Genuinely clever adversarial-ML work, but it protects an individual's face from facial-recognition models rather than screening a platform's content for CSAM, so it sits outside trainguard's pipeline. Worth watching as a data-poisoning technique that could show up in datasets we screen.
Image-cloaking system that adds imperceptible pixel-level perturbations to photos so unauthorized facial-recognition models (e.g. Clearview-style scrapers) can't reliably identify the person, intended for personal privacy protection and academic research.
Frankly
Out of scope · by Applied Social Media Lab (Berkman Klein Center)
A deliberation and community-dialogue tool, not a detection or screening system — adjacent to online safety through healthier conversation design, but it doesn't touch the CSAM-detection problem trainguard addresses.
Open-source platform for hosting video-enabled deliberative conversations, with survey-based participant matching and structured event templates to support constructive group dialogue.
PolicyKit
Out of scope · by UW Social Futures Lab
Focused on community self-governance and moderation policy rather than automated abuse detection, so it's out of scope for a CSAM pipeline. Still a useful reference for how platforms encode and enforce their own safety rules above the detection layer.
Governance-authoring toolkit that lets online communities design and execute their own governance and moderation rules (drawing on Ostrom's commons framework) across platforms like Slack, Reddit, and Discourse.
Red-teaming & evaluation
Adversarial testing harnesses. We recommend pairing promptshield with one of these, and plan to contribute the CSAM-intent probes the generalist harne
Investigation & signal-sharing
Threat-signal sharing and investigation tooling. Meta ThreatExchange / python-threatexchange set the bar for hashstream; disinformation and platform-o