# c2pa-lite (/docs/c2pa-lite)

**c2pa-lite** is a Rust library that stamps generated media with recoverable, cryptographically signed C2PA content credentials, giving a "this is synthetic, from this model, at this time" provenance signal. It targets the long tail of open and self-hosted image and video generators that ship no provenance today.

## Install [#install]

```bash
cargo add c2pa-lite
```

## What it does [#what-it-does]

* Wraps contentauth/c2pa-rs to build, sign, embed, and verify C2PA manifests; it does not reimplement C2PA.
* Fills C2PA's empty soft-binding slot with an OSS watermarker (default: Adobe TrustMark via ONNX) through a pluggable `SoftBinder` trait, so the watermark is swappable without touching the signing path.
* Exposes a three-call surface: `credential` (mark + sign + embed), `verify` (manifest + validation + watermark decode), and `recover` (refetch a stripped manifest from the store).
* Supports JPEG and PNG, with experimental MP4/H.264 video via per-segment soft binding plus fragmented-MP4 hard binding.
* Signs with local PKCS#8 keys for development and KMS/PKCS#11 for production, against a documented test-anchor trust list.
* Emits a normalized JSON verdict that is wire-compatible with Adobe/CAI Verify (signed? trusted issuer? watermark decoded? recovered from store?).

## Quickstart [#quickstart]

```rust
use c2pa_lite::{credential, verify};

// Assumes `image_bytes` and `signing_key` are already in scope, inside a
// function that returns a `Result` so `?` can propagate errors.

// Mark + sign + embed a credential into a generated image.
let signed = credential(&image_bytes, "stable-diffusion-xl", &signing_key)?;
std::fs::write("output.signed.jpg", &signed)?;

// Later, validate the asset and inspect the verdict.
let report = verify(&signed)?;
println!("signed: {}", report.signed);
println!("trusted issuer: {}", report.trusted_issuer);
println!("watermark decoded: {}", report.watermark_decoded);
```

## Status [#status]

Pre-release: the first crates.io publish is still pending, so pin versions and expect the API to move before a stable release.
Signing and manifest handling delegate to c2pa-rs upstream, and robust watermarking is where active work is focused. Invisible marks are removable, so c2pa-lite publishes adversarial decode-survival numbers and never claims to be tamper-proof.

## Source [#source]

[`packages/c2pa-lite`](https://github.com/digitalharm/fight-csam/tree/main/packages/c2pa-lite)

# csam-shield (/docs/csam-shield)

`csam-shield` is one-line middleware for upload pipelines that wires detectors such as PhotoDNA, PDQ, Arachnid Shield, and Cloudflare behind a single normalized decision interface (`match`, `nomatch`, `pending`, or `error`). It ships as a TypeScript package for Express, Fastify, and Hono, with a Python sibling for FastAPI, Starlette, and Flask.

## Install [#install]

```bash
# Node
npm install @digitalharm/csam-shield

# Python
pip install csam-shield
```

## What it does [#what-it-does]

* Fans out one upload to an ordered list of detectors and normalizes every result into one `MatchResponse` with a `match` / `nomatch` / `pending` / `error` decision plus per-detector results.
* Combines detectors with a configurable strategy: `any-match` (default), `majority`, or `consensus`.
* Applies per-detector timeouts and a retry policy, with a fail-open (`allow`) or fail-closed (`deny`) `onError` stance for failed scans.
* Includes a working PDQ-list detector that Hamming-matches against an operator-supplied hash set and requires no external credentials.
* Never persists the scanned bytes; it emits a redacted `logSummary` and an `onDecision` hook for your own audit sink.
* Ships framework adapters for Express, Fastify, and Hono (Node) and FastAPI, Starlette, and Flask (Python).

## Quickstart [#quickstart]

```ts
import { createShield } from "@digitalharm/csam-shield";

// Note: the Cloudflare and PhotoDNA adapters are scaffold stubs today (see Status).
const shield = createShield({
  detectors: [
    { detector: "cloudflare-csam-scanning", config: { token: process.env.CF_TOKEN! } },
    { detector: "photodna", config: { apiKey: process.env.PHOTODNA_KEY! } },
  ],
  strategy: "any-match",
  // `myAuditLog` stands in for your own audit sink.
  onDecision: async (resp) => {
    await myAuditLog.write(resp);
  },
});

// `data` holds the uploaded bytes.
const result = await shield.scan({ kind: "image-bytes", data, contentType: "image/jpeg" });
if (result.decision === "match") {
  // Block the upload and escalate for CyberTipline reporting (the shield never auto-reports).
}
```

## Status [#status]

Pre-release: the public API surface and detector dispatch are implemented, but the first npm and PyPI publishes are still pending, so treat names and signatures as subject to change. Only the PDQ-list detector runs end to end today, and it needs an operator-supplied hash list. The PhotoDNA, Cloudflare, and NCMEC adapters remain scaffold stubs until each upstream's credentialing path is unblocked, and the shield never auto-reports or auto-bans.

## Source [#source]

[`packages/csam-shield`](https://github.com/digitalharm/fight-csam/tree/main/packages/csam-shield)

# cybertip-cli (/docs/cybertip-cli)

`cybertip-cli` files NCMEC CyberTipline reports: the statutory §2258A submission step that platforms must perform themselves once detection and takedown are done. It ships a CLI plus a stable library API, with Node and Python bindings over a shared core.

## Install [#install]

```bash
# Node
npm install @digitalharm/cybertip-cli

# Python
pip install cybertip-cli
```

## What it does [#what-it-does]

* Builds a typed NCMEC report model generated from the published XSD, with a builder that fails closed on missing mandatory fields.
* Runs an idempotent `submit` → `upload` → `fileinfo` → `finish` state machine with a crash-resumable write-ahead log (WAL) and exponential-backoff retries.
* Defaults to a sandbox/dry-run mode that produces the wire payload without network I/O; live filing requires an explicit flag plus valid credentials.
* Keeps an append-only, hash-chained audit log of every request and response, with a one-command redacted export for §2258A preservation.
* Strips reporter PII and internal IDs from local audit copies via a declared redaction policy.
* Never decides what is CSAM; it stays a downstream formatter that consumes a detection result plus evidence.

## Quickstart [#quickstart]

```python
from cybertip_cli import CyberTipReport, submit_dry_run

report = CyberTipReport(
    client_reference="cybertip-myorg-001",
    reporting_person={
        "org_name": "MyOrg",
        "esp_id": "ESP-001",
        "contact_email": "trust@myorg.example",
    },
    incident={
        "incident_type": "csam-distribution",
        "incident_datetime_iso": "2026-05-30T12:00:00Z",
        "description": "Detected via CSAM-Shield on upload.",
        "severity": "A",
        "evidence_refs": ["urn:evidencevault:abc123"],
    },
)

# Dry run: builds and validates the wire payload without network I/O.
result = submit_dry_run(report)
if not result.ok:
    for err in result.errors:
        print(f"ERROR: {err}")
```

## Status [#status]

Pre-release: the first publish to npm and PyPI is still pending. Production submission is counsel-gated: it stays blocked until outside-counsel sign-off and an active NCMEC ESP credential are in place, so only sandbox and dry-run modes are usable for now.

## Source [#source]

[`packages/cybertip-cli`](https://github.com/digitalharm/fight-csam/tree/main/packages/cybertip-cli)

# detectkit-test (/docs/detectkit-test)

`detectkit-test` ships deterministic synthetic fixtures so engineers can prove their CSAM detection plumbing works end to end in CI, with zero real CSAM. It is a Python package: a versioned corpus, fingerprint generators, and a pytest plugin plus CLI that feed fixtures through your own pipeline adapter.

## Install [#install]

```bash
pip install detectkit-test
```

## What it does [#what-it-does]

* Ships a versioned corpus of procedurally generated, provably non-CSAM synthetic images and short clips, each with a manifest of expected fingerprints (MD5, SHA-1, PDQ, TMK+PDQF, vPDQ).
* Regenerates those fingerprints deterministically, so fixtures can be rebuilt and audited.
* Engineers near-duplicate pairs at exact target Hamming distances (0, 10, 31, 32, 90) to exercise match-threshold boundaries.
* Runs fixtures through a \~12-line `MatcherAdapter` you implement, asserting hit/miss/score and failing CI on plumbing regressions such as truncation or wrong byte order.
* Emits a false-positive characterization table (recall, precision, FP rate) as JSON plus JUnit XML, with a prebuilt GitHub Action.

## Quickstart [#quickstart]

```python
from detectkit_test import MatcherAdapter, run_suite


class MyAdapter(MatcherAdapter):
    def match(self, fixture):
        # Feed the fixture through your detection pipeline (`my_pipeline` is yours)
        # and return whether it matched, plus the score.
        result = my_pipeline.scan(fixture.bytes)
        return result.is_match, result.score


report = run_suite(MyAdapter())
print(report.fp_rate, report.recall)
report.to_junit("detectkit-results.xml")
```

You can also run the bundled suite from the command line:

```bash
detectkit run --adapter myproject.adapters:MyAdapter
```

## Status [#status]

Pre-release: the first PyPI publish is still pending, and APIs may change before v0.1. Every fixture is synthetic and non-CSAM by construction (procedurally generated noise, gradients, and shapes), and the project never ingests real CSAM or real hash lists.

## Source [#source]

[`packages/detectkit-test`](https://github.com/digitalharm/fight-csam/tree/main/packages/detectkit-test)

# evidencevault (/docs/evidencevault)

`evidencevault` is a Go service that preserves CSAM-report evidence defensibly, keeping a tamper-evident chain of custody and per-jurisdiction preservation timers designed to hold up in court. It governs sealed evidence after a report fires and sits downstream of the detector, so illegal bytes never leave your trust boundary.

## Install [#install]

```bash
go get github.com/digitalharm/fight-csam/packages/evidencevault
```

## What it does [#what-it-does]

* Exposes the custody and retention lifecycle over an HTTP API (`evidencevaultd serve`), backed by either an in-memory store or a disk store that persists one JSON package per id and survives restarts.
* Stores content-addressed evidence packages keyed by id, each carrying a `content_ref_hash` plus operator-supplied ciphertext.
* Records an append-only chain of custody: every store, access, hold, and delete is logged with actor and reason, and the chain becomes terminal once a package is deleted.
* Tracks per-jurisdiction retention so `GET /expired` reports which packages would be eligible for destruction, while a litigation hold suspends expiry.
* Stays out of the plaintext path: the operator wraps the blob with their own KMS and hands the vault only ciphertext.

## Quickstart [#quickstart]

Build and run the service against an in-memory store, then walk a package through its lifecycle with curl.

```bash
# Assumes the working directory is the package directory in a checkout of the repo.
go build -o evidencevaultd ./cmd/evidencevaultd
./evidencevaultd serve --store=memory:: --addr=127.0.0.1:8080
```

```bash
BASE=http://127.0.0.1:8080

# Store (ciphertext is base64-encoded, operator-wrapped bytes)
curl -X POST "$BASE/packages" -H 'Content-Type: application/json' \
  -d '{"id":"ev-1","ciphertext":"aGVsbG8=","content_ref_hash":"sha256-abc","operator":"ts-op"}'

# Get (logs an access entry on the custody chain)
curl "$BASE/packages/ev-1?operator=auditor&purpose=subpoena-2026-014"

# Place a litigation hold (suspends expiry)
curl -X POST "$BASE/packages/ev-1/hold" -H 'Content-Type: application/json' \
  -d '{"operator":"counsel","hold_ref":"lit-2026-001"}'

# Delete: zeroes the ciphertext, keeps metadata + custody log
curl -X DELETE "$BASE/packages/ev-1?operator=retention-bot"
```

## Status [#status]

Pre-release: the first publish is still pending, so treat names and signatures as subject to change. Retention enforcement, real KMS, and other production paths are counsel-gated for now. This build ships a noop KMS that stores ciphertext as given (no confidentiality), and retention is queryable but not timer-enforced: deletion is always an explicit, audited `DELETE`.
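
A chain of custody becomes tamper-evident when each entry commits to the hash of the entry before it, so editing, reordering, or removing an earlier entry breaks every later link. The sketch below illustrates that general technique in plain Python with `hashlib`. It is an assumption for illustration, not evidencevault's storage format: the `CustodyLog` class, its single shared chain, and the field names (`package_id`, `action`, `actor`, `reason`, `prev`) are all hypothetical. It also mirrors the "terminal after delete" rule by refusing new entries for a deleted package.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


@dataclass
class CustodyLog:
    """Hypothetical append-only, hash-chained custody log (illustration only)."""

    entries: list = field(default_factory=list)

    @staticmethod
    def _digest(body: dict) -> str:
        # Canonical JSON (sorted keys, compact separators) keeps the hash reproducible.
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def append(self, package_id: str, action: str, actor: str, reason: str) -> dict:
        # Once a package is deleted, its chain is terminal: refuse further entries.
        if any(e["package_id"] == package_id and e["action"] == "delete" for e in self.entries):
            raise ValueError(f"custody chain for {package_id} is terminal (deleted)")
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {
            "package_id": package_id,
            "action": action,
            "actor": actor,
            "reason": reason,
            "prev": prev,
        }
        entry = {**body, "hash": self._digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every link; an edited, reordered, or removed earlier entry breaks the chain.
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

For example, append a `store` and an `access` entry, then change the first entry's `actor`: `verify()` flips from `True` to `False`. A chain on its own cannot detect truncation of the newest entries. Periodically anchoring the latest hash somewhere outside the log covers that gap.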
## Source [#source]

[`packages/evidencevault`](https://github.com/digitalharm/fight-csam/tree/main/packages/evidencevault)

# hashkit-match (/docs/hashkit-match)

`hashkit-match` is an in-memory multi-index Hamming matcher for PDQ hashes. It pairs with [hashkit](/docs/hashkit): once you can compute PDQ hashes, this layer matches incoming hashes against a known-bad hash set.

## Install [#install]

```bash
cargo add hashkit-match
```

## What it does [#what-it-does]

* Provides a multi-index Hamming (MIH) matcher over the 256-bit PDQ hash space, avoiding naive linear comparison at scale.
* Matches incoming hashes against caller-supplied hash sets received from sources such as NCMEC, IWF, or Project Arachnid.
* Uses a configurable distance threshold, defaulting to 31/256 (the PhotoDNA-equivalent threshold).
* Is a pure data structure and ships no hash lists.
* Offers bindings parallel to hashkit's across Rust, WASM, Node, Deno, Bun, and Python.

## Quickstart [#quickstart]

```rust
use hashkit_match::Matcher;

// Build an index from a caller-supplied set of known PDQ hashes
// (`known_hash` and `incoming_hash` are assumed to be in scope).
let mut matcher = Matcher::new();
matcher.insert(known_hash);

// Query an incoming hash; the default threshold is 31/256.
if let Some(hit) = matcher.query(incoming_hash) {
    println!("matched within distance {}", hit.distance);
}
```

## Status [#status]

Pre-release: the first crates.io publish is still pending, and the crate is planned to ship alongside hashkit. Treat names and signatures as subject to change until the initial release.

## Source [#source]

[`packages/hashkit-match`](https://github.com/digitalharm/fight-csam/tree/main/packages/hashkit-match)

# hashkit (/docs/hashkit)

**hashkit** computes Meta's PDQ (image) and TMK+PDQF (video) perceptual hashes from raw pixel data, backed by a frozen, NCMEC-cross-checked conformance vector suite.
It is for trust-and-safety and platform teams who need to match user-generated content against known-CSAM hash lists and prove their hashes are byte-identical to the reference.

## Install [#install]

```bash
cargo add digitalharm-hashkit
```

The crate is published as `digitalharm-hashkit` but imported as `hashkit` (e.g. `use hashkit::...`).

## What it does [#what-it-does]

* Computes a 256-bit PDQ image hash plus a 0–100 quality score from a single-channel luma buffer.
* Offers a PDQ-Dihedral variant that returns 8 hashes, one per dihedral transform (4 rotations × 2 mirrors), for robustness to rotation and mirroring.
* Compares hashes by Hamming distance (0–256 bits); matches typically fall below a threshold of 31.
* Takes raw RGB/luma planes, never image codecs: the host decodes, which keeps the core deterministic across runtimes.
* Ships zero hash lists: the algorithm lives here, while the known-CSAM lists stay with NCMEC, IWF, and Project Arachnid.
* Is gated on a versioned conformance corpus, so a release fails closed on any one-bit drift from the reference.

## Quickstart [#quickstart]

```rust
use hashkit::pdq;

// `luma` is single-channel, row-major, 1 byte per pixel.
// Decode and downsample to luma yourself (hashkit takes no image codecs).
let result = pdq::hash_from_luma(&luma, width, height)?;
println!("hash: {}", result.hash.to_hex());
println!("quality: {}", result.quality.0);

// Two hashes are a likely match when their Hamming distance is below ~31
// (`other` is assumed to be another result computed the same way).
let distance = result.hash.hamming(&other.hash);
let is_match = distance < 31;
```

## Status [#status]

Pre-release: the first crates.io publish is still pending. PDQ image hashing (`hash_from_luma` and `hash_dihedral_from_luma`) is implemented by delegating to the maintained `pdqhash` crate, while the conformance corpus, WebAssembly bindings, and TMK+PDQF video features are still in progress toward v1.0. Pin versions and expect APIs to move before the first stable release.
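
Once you have hashes, matching is plain bit arithmetic. The sketch below is standalone Python, not hashkit's API. It assumes hashes arrive as 64-character hex strings (256 bits) and mirrors the quickstart's `distance < 31` check. For PDQ-Dihedral output, it takes the smallest distance across the variants; that policy is an illustrative assumption, not documented hashkit behavior. All function names are hypothetical.

```python
PDQ_HEX_LEN = 64        # 256 bits = 64 hex characters
DEFAULT_THRESHOLD = 31  # mirrors the quickstart's `distance < 31` check


def hamming_hex(a: str, b: str) -> int:
    """Count differing bits between two 256-bit hashes given as hex strings."""
    if len(a) != PDQ_HEX_LEN or len(b) != PDQ_HEX_LEN:
        raise ValueError("expected 64 hex characters (256 bits)")
    # XOR leaves a 1 exactly where the bits differ; count those 1s.
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def is_match(a: str, b: str, threshold: int = DEFAULT_THRESHOLD) -> bool:
    """Likely match when the distance is strictly below the threshold."""
    return hamming_hex(a, b) < threshold


def best_dihedral_distance(variants: list[str], known: str) -> int:
    """Smallest distance from any dihedral variant (up to 8) to one known hash."""
    return min(hamming_hex(v, known) for v in variants)
```

For instance, `"0" * 64` and `"f" + "0" * 63` differ in the 4 bits of one hex digit, so they match. Two hashes that differ in 32 bits do not match. A brute-force loop like this compares a query against every known hash, which is the linear cost a multi-index structure such as hashkit-match is meant to avoid.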
## Source [#source]

[`packages/hashkit`](https://github.com/digitalharm/fight-csam/tree/main/packages/hashkit)

# hashstream (/docs/hashstream)

`hashstream` distributes and syncs signed, versioned snapshots of CSAM hash lists. It is a Go service plus a TypeScript client SDK: operators publish snapshots from the lists they already hold, and consumers sync the latest version with a verifiable provenance trail.

## Install [#install]

```bash
# Go server / library
go get github.com/digitalharm/fight-csam/packages/hashstream

# TypeScript client SDK
npm install @digitalharm/hashstream-sdk
```

## What it does [#what-it-does]

* Publishes immutable, versioned snapshots of a hash list, each Ed25519-signed so consumers can verify provenance before trusting it.
* Ingests operator-supplied hash files (MD5, SHA-1, PDQ, PhotoDNA) and normalizes them into canonical, content-addressed rows.
* Lets consumers sync a full snapshot or only the diff since a known version, with stable pagination.
* Records every published and served snapshot in an append-only, hash-chained audit log that answers "which version was active at time T?"
* Ships no hash lists of its own; it only moves and attests to lists you already have rights to.

## Quickstart [#quickstart]

```typescript
import { HashStreamClient } from "@digitalharm/hashstream-sdk";

const client = new HashStreamClient({
  endpoint: "https://hashstream.internal",
  publicKey: process.env.HASHSTREAM_PUBLIC_KEY!, // Ed25519 verify key
});

// Pull the latest snapshot; the signature is verified before the promise resolves.
const snapshot = await client.latest();
console.log(`version ${snapshot.version}: ${snapshot.entries.length} hashes`);
```

## Status [#status]

Pre-release: the first publish of both the Go module and the npm SDK is still pending, so treat versions, names, and signatures as subject to change until the initial release.
hashstream operates only on operator-supplied lists and ships no hash lists of its own; you must hold your own NCMEC, IWF, or Project Arachnid agreements.

## Source [#source]

[`packages/hashstream`](https://github.com/digitalharm/fight-csam/tree/main/packages/hashstream)

# FightCSAM (/docs)

**FightCSAM** is the developer front door to **eleven open-source, Apache-2.0 building blocks** for fighting child sexual abuse material (CSAM). Each tool does one job well and composes with the others into a compliance-defensible pipeline.

## Detect [#detect]

Find known and near-duplicate CSAM in user-generated content.

## Report & preserve [#report--preserve]

Meet statutory reporting and evidence-preservation obligations (§2258A).

## Prevent (AI generation) [#prevent-ai-generation]

Stop CSAM from being generated or trained on in the first place.

## Provenance & care [#provenance--care]

Sign what you generate; protect the humans who review.

## Verify [#verify]

Prove your pipeline works in CI, without touching real CSAM.

## Built for coding agents [#built-for-coding-agents]

FightCSAM treats an AI coding agent as a first-class visitor:

* **[`/llms.txt`](/llms.txt)**: a curated, machine-readable index of the whole site.
* **[`/llms-full.txt`](/llms-full.txt)**: the entire docs corpus as one Markdown file.
* **Per-page raw Markdown** at `/llms.mdx/...`: no HTML scraping, no JS execution.
* **Static export**: every page is in the initial HTML, so `curl` gets the full content.

Coming next (see the [release plan](https://github.com/digitalharm/fight-csam/blob/main/docs/ops/v2-release-plan.md)): a `/.well-known/fightsam.json` package manifest, the guided **golden path**, the `create-fightcsam` scaffolder, and a docs **MCP server**.

# promptshield (/docs/promptshield)

**promptshield** is a drop-in classifier that screens text prompts for CSAM-generation intent before they reach a text-to-image or text-to-video model, so abusive requests are blocked before a single GPU cycle is spent.
It is a Python library for self-hosted generation stacks that ship with no input-side defense of their own.

## Install [#install]

```bash
pip install digitalharm-promptshield
```

The distribution is `digitalharm-promptshield` but imported as `promptshield` (e.g. `import promptshield`).

## What it does [#what-it-does]

* Exposes a one-line `guard(prompt, negative_prompt=None)` that returns a structured verdict (`allow`, `block`, or `review`) with a calibrated score and the list of matched signals.
* Runs a fast, deterministic first stage (homoglyph, leetspeak, and whitespace normalization plus a curated lexicon) on CPU, with no GPU required.
* Keys on the conjunction of a minor-indicator concept and a sexual concept, so neither alone trips the gate, which keeps false positives down.
* Handles negative-prompt evasion, such as `adult` or `mature` stuffed into negatives to disguise intent.
* Logs no prompt content by default, with an optional hash-only audit trail for NCMEC-reporting integrations.
* Is one input-side layer of a defense-in-depth stack, never a substitute for an output classifier plus known-CSAM hash matching.

## Quickstart [#quickstart]

```python
import promptshield

# `prompt` and `negative_prompt` are the user's inputs to your generator.
verdict = promptshield.guard(prompt, negative_prompt=negative_prompt)

# verdict.action is one of: "allow", "block", "review"
if verdict.action == "block":
    raise ValueError("Prompt rejected by promptshield")

print(verdict.score, verdict.matched_signals)
```

A `review` verdict is for borderline prompts you want to route to a human or a stricter policy rather than allow or block outright.

## Status [#status]

Pre-release: the first PyPI publish is still pending, so `pip install digitalharm-promptshield` does not work yet. The deterministic first stage, the fine-tuned ONNX classifier, the framework adapters, and the adversarial test suite are still in progress toward v0.1. Pin versions and expect APIs to move before the first stable release.
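
The normalization part of a deterministic first stage can be sketched with the standard library alone. This is an illustration of the general technique, not promptshield's code. NFKC folds compatibility forms such as fullwidth letters. A small, hypothetical table then maps a few Cyrillic look-alikes and leetspeak characters to Latin letters, and runs of whitespace collapse to single spaces. The tables are deliberately tiny and far from complete.

```python
import unicodedata

# Illustrative subsets only; a real system needs much larger, curated tables.
HOMOGLYPHS = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
}
LEET = {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
_TABLE = str.maketrans({**HOMOGLYPHS, **LEET})


def normalize(text: str) -> str:
    # 1. NFKC folds compatibility characters (e.g. fullwidth letters to ASCII).
    text = unicodedata.normalize("NFKC", text)
    # 2. Lowercase first so uppercase look-alikes hit the lowercase table.
    text = text.lower()
    # 3. Map homoglyphs and leetspeak characters to plain Latin letters.
    text = text.translate(_TABLE)
    # 4. Collapse runs of whitespace (spaces, tabs, newlines) into single spaces.
    return " ".join(text.split())
```

For example, `normalize("H3LL0   W0rld")` yields `"hello world"`. Mapping digits to letters also rewrites ordinary numbers, so output like this is only suitable as input to lexicon matching, never for display or logging.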
## Source [#source]

[`packages/promptshield`](https://github.com/digitalharm/fight-csam/tree/main/packages/promptshield)

# safemod (/docs/safemod)

`safemod` is a moderator-wellbeing layer for content-review queues. It renders media blurred by default, enforces per-shift exposure limits, and reports only k-anonymous aggregate signals, so no individual's wellbeing data is ever exposed. It is written in Rust, has zero dependencies, and is built with `#![forbid(unsafe_code)]`.

## Install [#install]

```bash
cargo add safemod
```

## What it does [#what-it-does]

* Blur-by-default rendering with explicit click-to-reveal and automatic re-shroud after a timeout.
* Per-moderator, per-shift exposure caps (reveal count and cumulative reveal-seconds) returning allow / soft-warn / hard-lockout decisions.
* Weighted case rotation, so no moderator gets a run of the most severe category beyond a configurable streak.
* Anonymized wellbeing intake keyed only to an opaque pseudonym: no names, no raw identifiers.
* A k-anonymity floor on every aggregate: cohorts below a configurable threshold (default k=5) are suppressed.

## Quickstart [#quickstart]

```rust
use safemod::{Decision, ExposureBudget};

let budget = ExposureBudget::per_shift()
    .max_reveals(40)
    .max_reveal_seconds(600);

let mut session = budget.start("mod-pseudonym-7f3a");

// Record a 12-second reveal. `render_blurred`, `prompt_break`, and `end_shift`
// stand in for your own review-UI handlers.
match session.record_reveal(/* seconds = */ 12) {
    Decision::Allow => render_blurred(item),
    Decision::SoftWarn => prompt_break(item),
    Decision::HardLockout => end_shift(),
}
```

## Status [#status]

Pre-release: the first crates.io publish is pending, so APIs may change before 0.1. It is privacy-preserving by design: it stores no identifiers and enforces a k-anonymity floor on all aggregate output.

## Source [#source]

[`packages/safemod`](https://github.com/digitalharm/fight-csam/tree/main/packages/safemod)

# Coding-agent skill (/docs/skill)

**`csam-safety`** is a Claude Agent Skill.
Drop it into your coding agent and it already knows the 11 FightCSAM tools, the top ecosystem tools, the golden-path pipeline, and the hard legal and credential gates. You can say *"add CSAM scanning to my upload pipeline"* and it wires in the right tools with the correct install commands, keeping the no-hash-list and NCMEC/counsel gates intact.

## What it knows [#what-it-knows]

* **The 11 FightCSAM tools**: purpose, verified install/import commands, when to use each, and current status/gates.
* **\~64 top ecosystem projects**: what to use, what to learn from, and what to wrap vs. build.
* **The golden path**: assess → detect → report → prevent → provenance → care → verify.
* **Compliance & gates**: §2258A, UK OSA, TAKE IT DOWN, and EU DSA mapped to tools, plus the rules that must never be bypassed (ship no hash lists; honor the NCMEC-ESP and outside-counsel gates).

## Install [#install]

It ships in the FightCSAM repo under `.claude/skills/csam-safety`. Copy it into your agent's skills directory:

```bash
# personal: available across all your projects
cp -r fight-csam/.claude/skills/csam-safety ~/.claude/skills/

# or per-project
cp -r fight-csam/.claude/skills/csam-safety /.claude/skills/
```

Your agent then consults it automatically whenever you work on uploads, content moderation, NCMEC reporting, perceptual hashing, AI-generation safety, or Fediverse moderation, even if you don't mention "FightCSAM" or "CSAM".

## Source [#source]

[`.claude/skills/csam-safety`](https://github.com/digitalharm/fight-csam/tree/main/.claude/skills/csam-safety)

# trainguard (/docs/trainguard)

`trainguard` is a Python pre-flight gate that screens a training corpus against known-bad CSAM hash lists before training begins. Point it at your dataset shards, match every image against the lists you hold, and get a signed, machine-readable compliance report you can hand to counsel.
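
Without decoding, signing, and adapters, a pre-flight gate comes down to four steps. It hashes each sample, compares it against the operator-supplied list, records only identifiers and distances for human review, and turns the outcome into a process exit code. The sketch below illustrates that flow and is not trainguard's implementation. It assumes samples arrive as `(sample_id, pdq_hex)` pairs and uses a brute-force scan with the hashkit quickstart's `distance < threshold` convention. The names `screen`, `ScreeningReport`, and `exit_code` are hypothetical, as are the `"block"` and `"warn"` mode strings.

```python
from dataclasses import dataclass, field
from typing import Iterable


def hamming_hex(a: str, b: str) -> int:
    # Bit-level Hamming distance between two equal-length hex hashes.
    return bin(int(a, 16) ^ int(b, 16)).count("1")


@dataclass
class ScreeningReport:
    scanned: int = 0
    # (sample_id, distance) pairs flagged for human review; no media bytes are kept.
    flagged: list = field(default_factory=list)

    @property
    def matched(self) -> int:
        return len(self.flagged)


def screen(samples: Iterable[tuple[str, str]], known: list[str], threshold: int = 31) -> ScreeningReport:
    report = ScreeningReport()
    for sample_id, pdq_hex in samples:
        report.scanned += 1
        # Brute force over the list; an index (e.g. multi-index hashing) scales better.
        best = min((hamming_hex(pdq_hex, k) for k in known), default=None)
        if best is not None and best < threshold:
            report.flagged.append((sample_id, best))
    return report


def exit_code(report: ScreeningReport, mode: str = "block") -> int:
    # "block": any match fails the pipeline; "warn": surface matches but let it proceed.
    if mode not in ("block", "warn"):
        raise ValueError(f"unknown mode: {mode}")
    return 1 if mode == "block" and report.matched else 0
```

A CI step would call `sys.exit(exit_code(report, mode))` after writing the report. Flagged entries carry only sample IDs and distances, so the gate never needs to keep the matched media itself.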
## Install [#install]

```bash
pip install trainguard
```

## What it does [#what-it-does]

* Streams samples from local directories and WebDataset tar shards, decoding each image once and computing PDQ perceptual hashes in parallel.
* Matches hashes against an operator-supplied hash file with a configurable Hamming-distance threshold; the built-in PDQ matching needs no external credentials.
* Activates operator-credentialed adapters (Project Arachnid Shield first) for live lists, with keys that are never logged or persisted.
* Emits a deterministic JSON report plus a rendered PDF, signed with the operator's X.509 key (COSE) for a tamper-evident chain of custody.
* Flags matches for human review with confidence bands. It never auto-deletes, never auto-reports, and never writes matched media bytes to disk.
* Returns a non-zero exit code on a match, so it can gate a CI training pipeline in block-on-match or warn-only mode.

## Quickstart [#quickstart]

```python
from trainguard import ScreeningEngine

engine = ScreeningEngine(
    hash_file="arachnid-pdq.txt",  # operator-supplied list
    hamming_threshold=31,
)

report = engine.scan("s3://corpus/shards/")
print(f"scanned {report.scanned}, matched {report.matched}")

# Signed JSON + PDF for your trust-and-safety team.
report.write_signed("trainguard-report", signing_key="operator.pem")
```

## Status [#status]

Pre-release: the first PyPI publish is still pending, so treat the package name, API, and report format as subject to change until the initial release. trainguard operates only on operator-supplied lists and ships no hash lists of its own; you must hold your own NCMEC, IWF, or Project Arachnid agreements.

## Source [#source]

[`packages/trainguard`](https://github.com/digitalharm/fight-csam/tree/main/packages/trainguard)

# Classifiers & AI-safety models (/docs/ecosystem/classifiers)

ML classifiers and LLM guardrails.
FightCSAM ships no general model: csam-shield is built to wrap the best of these as swappable detector backends, and promptshield focuses narrowly on CSAM-generation intent.

**22 projects**: 9 use · 10 learn from · 2 reference · 1 out of scope.

*Project descriptions are adapted from [awesome-safety-tools](https://github.com/roostorg/awesome-safety-tools) (maintained by [ROOST](https://roost.tools)); the verdicts and analysis are ours. Snapshot: June 2026, a point-in-time view that complements, and does not replace, their living list.*

### [Content Safety API](https://protectingchildren.google/tools-for-partners/) [#content-safety-api]

**Use** · by Google · pairs with [csam-shield](/docs/csam-shield)

This is the one classifier on the list aimed squarely at the gap hashing cannot cover: novel, previously unseen CSAM. A FightCSAM user runs it alongside known-material hash matching to catch first-seen content, which is exactly the role csam-shield is built to orchestrate as a detector backend. It is gated and closed-weight, but free with registration and reachable through ROOST.

> Google's ML classifiers for detecting CSAM, nudity, and explicit content in images and video, offered to qualifying partners free of charge but gated behind registration; accessible to members via the ROOST coop.

### [CoPE](https://huggingface.co/zentropi-ai/cope-a-9b) [#cope]

**Use** · by Zentropi · pairs with [promptshield](/docs/promptshield)

Policy-as-prompt classifiers let a team encode a CSAM-specific policy without training a model, which is precisely the leverage promptshield needs for prompt-intent screening. It is open-weight on Hugging Face, so it slots in as a promptshield companion or csam-shield text backend with a CSAM-tuned policy.

> A small (9B) language model for steerable content classification that scores text against policies a developer writes in plain language rather than a fixed label set.
### [gpt-oss-safeguard](https://github.com/openai/gpt-oss-safeguard) [#gpt-oss-safeguard]

**Use** · by OpenAI · pairs with [promptshield](/docs/promptshield)

Bring-your-own-policy reasoning models are a strong fit for nuanced CSAM-intent calls that a keyword filter misses, and an open license means a team can run it on-prem where sending prompts to a third party is a non-starter. We wrap it as a promptshield companion or csam-shield text backend with a CSAM policy.

> An open-weight reasoning model that classifies text against safety policies supplied at inference time, returning a judgment with its reasoning rather than a fixed-taxonomy label.

### [Granite Guardian](https://github.com/ibm-granite/granite-guardian) [#granite-guardian]

**Use** · by IBM Research · pairs with [csam-shield](/docs/csam-shield)

A permissively licensed, genuinely open guardrail that is easy to self-host, which matters when prompts cannot leave your infrastructure. We wrap it as a csam-shield detector backend or promptshield companion; its broad-harm coverage complements, rather than replaces, dedicated CSAM signals.

> An Apache-2.0 family of input/output guardrail models from IBM covering general harm, RAG groundedness (hallucination), and agentic/function-calling risks.

### [Guardrails AI](https://github.com/guardrails-ai/guardrails) [#guardrails-ai]

**Use** · by Guardrails AI · pairs with [promptshield](/docs/promptshield)

A FightCSAM user already standardized on Guardrails can add CSAM-intent screening as one validator in their existing guard, and promptshield is naturally exposed as a Hub validator. It is a harness rather than a detector itself, but it is the right place to plug our checks in.

> A Python framework that validates LLM inputs and outputs against predefined risks, with a Hub of community validators that can be composed into a guard.
### [Kanana Safeguard](https://huggingface.co/kakaocorp/kanana-safeguard-8b) [#kanana-safeguard]

**Use** · by Kakao · pairs with [csam-shield](/docs/csam-shield)

Architecturally the same class of wrappable guardrail as Llama Guard or ShieldGemma, and its multilingual strength is valuable where English-centric guards underperform. A FightCSAM user can reach for it as a csam-shield text backend or promptshield companion; note it covers general harm, so a CSAM policy still does the narrowing.

> An open-weight 8B harmful-content detection model from Kakao for moderating LLM inputs and outputs, with notable multilingual (including Korean) coverage.

### [Llama Guard](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3) [#llama-guard]

**Use** · by Meta · pairs with [csam-shield](/docs/csam-shield)

One of the strongest open, self-hostable text guardrails, and its built-in child-exploitation category makes it a natural CSAM-text signal. We wrap it as a csam-shield detector backend or promptshield companion rather than reimplementing classification.

> Meta's open-weight content-moderation model that classifies both prompts and responses in text interactions against a safety taxonomy that includes a child-exploitation category.

### [Llama Prompt Guard 2](https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Prompt-Guard-2/86M/MODEL_CARD.md) [#llama-prompt-guard-2]

**Use** · by Meta · pairs with [promptshield](/docs/promptshield)

Adversaries jailbreak a generator to coax out CSAM, so injection detection is a real layer of CSAM-prevention defense in depth. Tiny and cheap to run inline, it pairs with promptshield as the anti-jailbreak companion to CSAM-intent screening.

> A small (86M) open-weight Meta classifier specialized in detecting prompt-injection and jailbreak attempts against LLMs.
### [ShieldGemma](https://www.kaggle.com/code/fernandosr85/shieldgemma-web-content-safety-analyzer) [#shieldgemma]

**Use** · by Google DeepMind · pairs with [csam-shield](/docs/csam-shield)

Open-weight and spanning both text and image (via ShieldGemma 2), so it can back both modalities a CSAM pipeline cares about. We wrap it as a csam-shield detector backend or promptshield companion; as with other general guards, a CSAM policy narrows it to the target harm.

> A Gemma-based toolkit of open-weight models from Google DeepMind for detecting and mitigating harmful LLM content across safety categories, with ShieldGemma 2 extending coverage to images.

### [Detoxify](https://github.com/unitaryai/detoxify) [#detoxify]

**Learn from** · by Unitary AI

A clean, widely-copied reference for packaging a text classifier as a pip-installable model, which informs how we ship detector backends. It is general toxicity, not CSAM: useful as an architectural lesson rather than something a FightCSAM pipeline reaches for to catch abuse material.

> A set of pretrained models for detecting generalized toxic language in text, trained on the Jigsaw toxic-comment datasets.

### [NSFW Keras Model](https://github.com/GantMan/nsfw_model) [#nsfw-keras-model]

**Learn from** · by Gant Laborde

A useful pattern for a self-hostable image classifier, but it detects adult NSFW content, which is a different problem from CSAM. We note it as general-purpose: it informs the detector-backend shape without being something a FightCSAM user wires in to find abuse material.

> A CNN-based Keras/TensorFlow model that classifies images into explicit categories (porn, hentai, sexy, neutral, drawing).
### [OpenGuardrails](https://github.com/openguardrails/openguardrails) [#openguardrails]

**Learn from** · by OpenGuardrails · pairs with [promptshield](/docs/promptshield)

A useful reference for the proxy-gateway enforcement topology, applying safety at a network choke point rather than in app code, which is one way to deploy promptshield-style screening. It is a general LLM-security gateway rather than a CSAM tool, so we take the deployment pattern as the lesson.

> A security gateway that fronts OpenAI-compatible APIs as a reverse proxy, applying safety protections to traffic passing through it.

### [OSmod (Moderator)](https://github.com/conversationai/conversationai-moderator) [#osmod-moderator]

**Learn from** · by Jigsaw · pairs with [csam-shield](/docs/csam-shield)

A reference design for routing model scores into a human-review queue, which is the orchestration problem csam-shield solves. It is general comment moderation rather than CSAM-specific, so we treat it as an architectural lesson for the detect-then-review loop, not a drop-in detector.

> An open-source moderation toolkit combining ML models, APIs, and a review UI to help platforms triage and act on user comments at scale.

### [Perspective API](https://github.com/conversationai/perspectiveapi) [#perspective-api]

**Learn from** · by Jigsaw

The canonical example of an attribute-scoring moderation API, instructive for how we expose detector confidence scores. It is general toxicity and a closed hosted service, so we note it as general-purpose rather than a CSAM detector a FightCSAM user calls.

> A hosted ML API that scores text for attributes like toxicity, insult, and threat to help platforms moderate conversations.

### [Private Detector](https://github.com/bumble-tech/private-detector) [#private-detector]

**Learn from** · by Bumble

A strong, production-proven example of open-sourcing a lewd-image detector, useful for how we structure and document an image backend.
It targets adult lewd content rather than CSAM, so we flag it as adjacent and general-purpose, not a FightCSAM detection component.

> A pretrained, open-sourced model from Bumble for detecting lewd (unsolicited nude) images.

### [RoGuard](https://github.com/Roblox/RoGuard-1.0) [#roguard]

**Learn from** · by Roblox

An instructive open peer to the guardrails we wrap, showing how a large platform tunes a general output-safety model to its own policy. It is general-harm and Roblox-shaped rather than CSAM-specific, so we treat it as an architectural reference rather than a drop-in backend.

> An open LLM-safeguard model from Roblox for moderating text generation against a platform safety policy.

### [Sentinel](https://github.com/Roblox/Sentinel) [#sentinel]

**Learn from** · by Roblox

This is the capability we most want to emulate: behavioral, conversation-level grooming detection sits upstream of the image/text hashing FightCSAM covers and can catch abuse before any media exists. It is a model of how to surface rare harmful patterns from sparse signal, and a clear direction for where CSAM-safety tooling should grow.

> An open-source system from Roblox that uses contrastive learning to flag rare, hard-to-spot text classes such as grooming and other harmful behavioral patterns in real time.

### [Toxic Prompt RoBERTa](https://huggingface.co/Intel/toxic-prompt-roberta) [#toxic-prompt-roberta]

**Learn from** · by Intel · pairs with [promptshield](/docs/promptshield)

It sits in the same place in the stack as promptshield, screening prompts before they reach a model, which makes its packaging and latency profile directly instructive. It classifies general toxicity rather than CSAM intent, so we learn from the prompt-screening approach while noting the label set is general-purpose.

> A RoBERTa-based classifier from Intel that detects toxic prompts and responses in LLM interactions.
### [Voice Safety Classifier](https://github.com/Roblox/voice-safety-classifier) [#voice-safety-classifier]

**Learn from** · by Roblox

Voice is a modality FightCSAM does not cover, and grooming-adjacent harm in real-time audio is a real vector worth learning from. We treat it as an architectural lesson for real-time, modality-specific detection rather than a CSAM tool a user wires in today.

> An open-source ML model from Roblox that classifies harmful content in real-time voice chat.

### [Purple Llama](https://github.com/meta-llama/PurpleLlama) [#purple-llama]

**Reference** · by Meta

The parent collection rather than a single component: the directly wrappable pieces, Llama Guard and Llama Prompt Guard 2, are catalogued on their own. We point here as the canonical home and orientation for Meta's open safety stack.

> Meta's umbrella project of tools to assess and improve LLM security, bundling Llama Guard, the CyberSec Eval benchmarks, and Code Shield.

### [Risk Atlas Nexus](https://github.com/IBM/risk-atlas-nexus) [#risk-atlas-nexus]

**Reference** · by IBM Research

Governance and taxonomy tooling rather than a runtime detector: it maps risks to controls instead of classifying content. Useful as a reference when authoring the policies that drive promptshield or CoPE and when arguing coverage to compliance, but nothing a pipeline calls at inference time.

> A knowledge-graph toolkit from IBM that links AI risk taxonomies to evaluations, mitigations, and controls so teams can reason about coverage across frameworks.

### [NSFW Filtering](https://github.com/nsfw-filter/nsfw-filter) [#nsfw-filtering]

**Out of scope** · by nsfw-filter

This is an end-user safety extension, not developer infrastructure: it protects the viewer at the browser, with no API or pipeline integration point. It sits outside the platform-side CSAM-detection problem FightCSAM addresses.

> A browser extension that blurs or blocks NSFW images in the browser for the person using it.
# Datasets & benchmarks (/docs/ecosystem/datasets)

Training and evaluation datasets. We anchor promptshield’s evaluation to NVIDIA Aegis 2.0 and borrow Tattle / Uli annotation methodology; the rest are listed for reference.

**33 projects** — 33 reference.

*Project descriptions are adapted from [awesome-safety-tools](https://github.com/roostorg/awesome-safety-tools) (maintained by [ROOST](https://roost.tools)); the verdicts and analysis are ours. Snapshot: June 2026 — a point-in-time view that complements, and does not replace, their living list.*

### [Aegis Content Safety 2.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0) [#aegis-content-safety-20]

**Reference** · by NVIDIA · pairs with [promptshield](/docs/promptshield)

Our eval anchor for promptshield: CC-BY-4.0 licensing plus a "Sexual (minor)" subset make it usable for CSAM-intent benchmarking.

> Content-moderation and toxicity dataset spanning a broad LLM-safety taxonomy, including a dedicated "Sexual (minor)" category. Released under CC-BY-4.0.

### [AI Alignment (RLHF)](https://atlas.nomic.ai/map/anthropic_rlhf) [#ai-alignment-rlhf]

**Reference** · by Anthropic

Foundational helpful/harmless preference data for alignment research.

> RLHF alignment data, explorable as a Nomic Atlas map.

### [AILuminate](https://github.com/mlcommons/ailuminate) [#ailuminate]

**Reference** · by MLCommons

Standardized industry safety benchmark across many harm categories.

> Human-created prompts spanning a standardized set of harm categories.

### [ALERT](https://huggingface.co/datasets/Babelscape/ALERT) [#alert]

**Reference** · by Babelscape

Pairs standard and adversarial variants for safety stress-testing.

> Standard and adversarial red-team prompts organized by a safety taxonomy.

### [Aya Red-teaming](https://huggingface.co/datasets/CohereForAI/aya_redteaming) [#aya-red-teaming]

**Reference** · by Cohere

Good for multilingual red-team coverage beyond English.
> Multilingual red-team prompts for probing model safety across languages.

### [badwords](https://github.com/hughsie/badwords) [#badwords]

**Reference** · by Richard Hughes

Quick multilingual profanity seed list, not a substitute for a real classifier.

> Bad-word lists compiled across multiple locales.

### [CCP Sensitive Prompts](https://huggingface.co/datasets/promptfoo/CCP-sensitive-prompts) [#ccp-sensitive-prompts]

**Reference** · by Promptfoo

Niche set for probing political censorship behavior in models.

> Prompts on topics sensitive to the Chinese Communist Party.

### [DarkBench](https://huggingface.co/datasets/apart/darkbench) [#darkbench]

**Reference** · by Apart

Useful for evaluating manipulative / dark-pattern model tendencies.

> Benchmark for detecting dark design patterns in LLM behavior.

### [DEFCON Red Teaming](https://github.com/humane-intelligence/ai_village_defcon_grt_data) [#defcon-red-teaming]

**Reference** · by Humane Intelligence

Real crowd-sourced red-team attempts from a large public event.

> Data from the DEF CON AI Village generative red-teaming event.

### [Do Not Answer](https://huggingface.co/datasets/LibrAI/do-not-answer) [#do-not-answer]

**Reference** · by LibrAI

Targeted refusal-behavior eval for questions a model should decline.

> Questions designed to test whether a model correctly refuses.

### [Forbidden Questions](https://huggingface.co/datasets/TrustAIRLab/forbidden_question_set) [#forbidden-questions]

**Reference** · by TrustAIRLab

Policy-grounded prompts for testing disallowed-content refusals.

> Questions derived from categories in the OpenAI usage policy.

### [HackAPrompt](https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset) [#hackaprompt]

**Reference** · by HackAPrompt

Big corpus of human-crafted prompt-injection attacks.

> Large dataset of prompt-injection and jailbreaking attempts.
### [HarmBench](https://github.com/centerforaisafety/HarmBench) [#harmbench]

**Reference** · by CAIS

Standard harness for automated red-teaming comparisons.

> Standardized evaluation dataset for automated red-teaming.

### [HiroKachi Jailbreak](https://sizu.me/love) [#hirokachi-jailbreak]

**Reference**

Community-sourced jailbreak collection; treat provenance with caution.

> Collection of adversarial prompt-attack examples.

### [Jailbreak Prompt Generator](https://huggingface.co/tsq2000/Jailbreak-generator) [#jailbreak-prompt-generator]

**Reference**

Generator rather than a fixed set, for synthesizing attack prompts at scale.

> A model that generates jailbreak prompts.

### [JailbreakBench](https://huggingface.co/datasets/JailbreakBench/JBB-Behaviors) [#jailbreakbench]

**Reference** · by JailbreakBench

Standard behaviors set for benchmarking jailbreak defenses.

> Curated harmful behaviors for evaluating jailbreak robustness.

### [JailbreakHub](https://huggingface.co/datasets/walledai/JailbreakHub) [#jailbreakhub]

**Reference** · by WalledAI

Prompt+response pairs, handy for studying what jailbreaks actually elicit.

> Jailbreak prompts paired with model responses.

### [LLM-LAT harmful](https://huggingface.co/datasets/LLM-LAT/harmful-dataset) [#llm-lat-harmful]

**Reference** · by LLM-LAT

General harmful-behavior probe set, often used in latent adversarial training.

> Prompts for assessing harmful model behaviors.

### [MedSafetyBench](https://github.com/AI4LIFE-GROUP/med-safety-bench) [#medsafetybench]

**Reference** · by AI4LIFE-GROUP

Domain-specific eval for medical-safety failure modes.

> Medical-safety prompts for evaluating models in clinical contexts.

### [Multilingual Vulnerability](https://github.com/CarsonDon/Multilingual-Vuln-LLMs) [#multilingual-vulnerability]

**Reference**

Probes safety gaps that appear in non-English languages.

> Multilingual prompts that surface LLM vulnerabilities.
### [PKU-SafeRLHF](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF) [#pku-saferlhf]

**Reference** · by PKU-Alignment

Good for safety-preference / RLHF training signal on response harmfulness.

> Prompts paired with RLHF safety markers identifying unsafe responses.

### [Red Team Resistance Leaderboard](https://huggingface.co/spaces/HaizeLabs/red-teaming-resistance-benchmark) [#red-team-resistance-leaderboard]

**Reference** · by Haize Labs

Comparative ranking of model attack-resistance rather than a raw dataset.

> Leaderboard ranking models by their resistance to attacks.

### [Rentry Jailbreak](https://rentry.org/gpt0721) [#rentry-jailbreak]

**Reference**

Informal community jailbreak dump; unversioned, verify before use.

> A collected set of jailbreak prompts.

### [SidFeel Jailbreak](https://github.com/sidfeels/PromptsDB) [#sidfeel-jailbreak]

**Reference**

Another community jailbreak prompt collection for attack coverage.

> A collection of jailbreak prompts.

### [SorryBench](https://huggingface.co/datasets/sorry-bench/sorry-bench-202503) [#sorrybench]

**Reference** · by SorryBench

Tests refusal robustness under paraphrase and linguistic mutation.

> Adversarial prompts augmented with linguistic mutations.

### [SOSBench](https://huggingface.co/datasets/SOSBench/SOSBench) [#sosbench]

**Reference** · by SOSBench

Good for evaluating dangerous scientific-capability refusals (e.g. chem/bio).

> Regulation-grounded hazard benchmark spanning six scientific domains.

### [TDC23-RedTeaming](https://huggingface.co/datasets/walledai/TDC23-RedTeaming) [#tdc23-redteaming]

**Reference** · by WalledAI

Competition-derived red-team prompts for benchmark continuity.

> Prompts from the TDC23 red-teaming track.

### [Toxic Chat](https://huggingface.co/datasets/lmsys/toxic-chat) [#toxic-chat]

**Reference** · by LMSYS

Realistic in-the-wild toxicity from live chat, good for conversational moderation eval.

> Toxic conversations drawn from real user interactions with Vicuna.
### [Toxicity](https://huggingface.co/datasets/google/jigsaw_toxicity_pred) [#toxicity]

**Reference** · by Jigsaw

License-clean (CC0) baseline for generic toxicity classification.

> Wikipedia comments labeled for toxicity. Released under CC0.

### [Transphobia Awareness](https://doi.org/10.5281/zenodo.15482694) [#transphobia-awareness]

**Reference**

Targeted set for evaluating anti-trans hate detection.

> Transphobia-related queries with annotations.

### [Uli Dataset](https://github.com/tattle-made/uli_dataset) [#uli-dataset]

**Reference** · by Tattle · pairs with [detectkit-test](/docs/detectkit-test)

Its expert per-annotator methodology informs how we structure our CSAM-intent eval.

> Gendered-abuse dataset built with an expert, per-annotator labeling methodology.

### [VTC](https://github.com/unitaryai/VTC) [#vtc]

**Reference** · by Unitary AI

Reference for multimodal (video + comment) toxicity work.

> Video-text-comments dataset and method for multimodal moderation.

### [XSTest](https://github.com/paul-rottger/exaggerated-safety) [#xstest]

**Reference**

Catches over-refusal: safe prompts a model wrongly declines.

> Prompts testing exaggerated-safety (over-refusal) behaviors.

# Decentralized & Fediverse (/docs/ecosystem/decentralized-fediverse)

AT-Protocol and Fediverse moderation — FightCSAM’s #1 target. Our planned Bluesky adapter fills the perceptual-hash gap in hepa and emits to Ozone.

**5 projects** — 2 use · 1 learn from · 1 reference · 1 out of scope.

*Project descriptions are adapted from [awesome-safety-tools](https://github.com/roostorg/awesome-safety-tools) (maintained by [ROOST](https://roost.tools)); the verdicts and analysis are ours.
Snapshot: June 2026 — a point-in-time view that complements, and does not replace, their living list.*

### [Automod (hepa)](https://github.com/bluesky-social/indigo/tree/main/automod) [#automod-hepa]

**Use** · by Bluesky

Automod hands rules the raw media bytes but ships no perceptual-hash hook, so image-similarity matching is exactly the gap left open. FightCSAM's planned AT-Proto adapter slots in here as a hepa blob rule wrapping hashkit + hashkit-match, then emits to Ozone.

> A 'rules engine' framework that augments human moderators on the AT Protocol network by proactively identifying patterns of behavior and content. It processes firehose events (new posts, handle changes) via the hepa service daemon, maintains metadata caches and counters, and can fire outcomes like account reports and content labels.

### [Ozone](https://github.com/bluesky-social/ozone) [#ozone]

**Use** · by Bluesky

Ozone is the natural sink for our AT-Proto adapter: the hepa rule emits labels and reports straight into its queue. We also plan a safemod skin for its reviewer pane so hash-match context lands in front of human reviewers.

> A self-hostable web interface for labeling and moderating content on AT Protocol / Bluesky. Moderators triage, escalate, and action reports; apply labels and takedowns to content and accounts; review profiles and post threads (including some removed content) in a reviewer pane; and send templated moderation emails.

### [FIRES](https://github.com/fedimod/fires) [#fires]

**Learn from** · by FediMod

FIRES is the model for advisory-style distribution we want to interoperate with: we plan a FIRES-compatible output for hashstream so Fediverse admins can subscribe to our recommendations and decide for themselves, rather than receiving forced blocks.

> A protocol and reference server (Fediverse Intelligence Replication Endpoint Server) for exchanging moderation advisories and recommendations across the Fediverse.
Trust & safety teams publish research-backed recommendations that client servers pull and periodically refresh; it is explicitly not designed for creating denylists, leaving final decisions to each moderator.

### [FediCheck](https://connect.iftas.org/library/iftas-documentation/fedicheck/) [#fedicheck]

**Reference** · by IFTAS

FediCheck operates at the domain/instance layer rather than per-media, so it sits adjacent to our hash-matching work. It's a useful reference for how Fediverse admins consume shared trust-and-safety lists, and a complement to the FIRES-style advisory output we plan.

> A Moderation-as-a-Service tool for ActivityPub providers (e.g. Mastodon) that synchronizes a server's domain-level denylist with curated upstream lists such as IFTAS's CARIAD, sparing admins from manually researching and blocking problem domains. After IFTAS wound down operations, it moved toward being open-sourced so anyone can run the service against their own upstream providers.

### [Fediverse Spam Filtering](https://github.com/MarcT0K/Fediverse-Spam-Filtering) [#fediverse-spam-filtering]

**Out of scope** · by Marc Damie

This targets text-spam classification, a different problem from the perceptual-hash media matching FightCSAM focuses on. It's an interesting PoC for Fediverse moderation tooling, but out of scope for our AT-Proto adapter.

> A proof-of-concept spam filter for Fediverse platforms (e.g. Mastodon) using a Naive Bayes classifier over status features like content words, spoiler text, media attachments, tags, and sensitivity flags. It exposes REST endpoints for prediction, outlier review, and model import/export, and minimizes admin workload by surfacing outliers and random samples for labeling.

# Overview (/docs/ecosystem)

A developer’s map of the open-source online-safety landscape — not just *what exists*, but **how each project fits (or doesn’t) a CSAM-safety pipeline** built around the FightCSAM tools. It covers **113 projects** across 9 categories.
For each, we give a verdict and a short take. Our own tools live in [Tools](/docs); this section is everything *around* them.

## How to read the verdicts [#how-to-read-the-verdicts]

* **Use** — recommend integrating alongside FightCSAM (24).
* **Learn from** — a leader or alternative on an axis we also build (35).
* **Reference** — a dataset, benchmark, or knowledge resource (41).
* **Out of scope** — an adjacent problem FightCSAM deliberately does not address (13).

## Categories [#categories]

| Category | Projects |
| --- | --- |
| [Perceptual hashing & matching](/docs/ecosystem/perceptual-hashing) | 11 |
| [Classifiers & AI-safety models](/docs/ecosystem/classifiers) | 22 |
| [Rules, decisioning & clustering](/docs/ecosystem/rules-decisioning) | 9 |
| [Infrastructure, queues & review](/docs/ecosystem/infrastructure-review) | 9 |
| [Red-teaming & evaluation](/docs/ecosystem/red-teaming) | 8 |
| [Privacy & user-safety](/docs/ecosystem/privacy-user-safety) | 6 |
| [Investigation & signal-sharing](/docs/ecosystem/investigation) | 10 |
| [Decentralized & Fediverse](/docs/ecosystem/decentralized-fediverse) | 5 |
| [Datasets & benchmarks](/docs/ecosystem/datasets) | 33 |

## Credit & scope [#credit--scope]

This directory is built on **[awesome-safety-tools](https://github.com/roostorg/awesome-safety-tools)**, the community-maintained list curated by [ROOST](https://roost.tools). They maintain the canonical, living catalogue; we add an opinionated layer on top — categorization, a build-vs-wrap verdict, and how each piece slots into a defensible CSAM-detection, reporting, and prevention pipeline. Like their list, **inclusion here is not an endorsement** — it is an attempt to map the landscape so a developer can choose well.
Verdicts reflect FightCSAM’s specific lens (un-gated, self-hostable CSAM safety) and are a **June 2026 snapshot**; projects move fast, so treat this as a starting point and check the source.

*Project descriptions are adapted from [awesome-safety-tools](https://github.com/roostorg/awesome-safety-tools) (maintained by [ROOST](https://roost.tools)); the verdicts and analysis are ours. Snapshot: June 2026 — a point-in-time view that complements, and does not replace, their living list.*

# Infrastructure, queues & review (/docs/ecosystem/infrastructure-review)

Queues, abuse-management plumbing, and moderator review surfaces to depend on rather than rebuild.

**9 projects** — 3 use · 4 learn from · 2 out of scope.

*Project descriptions are adapted from [awesome-safety-tools](https://github.com/roostorg/awesome-safety-tools) (maintained by [ROOST](https://roost.tools)); the verdicts and analysis are ours. Snapshot: June 2026 — a point-in-time view that complements, and does not replace, their living list.*

### [BullMQ](https://github.com/taskforcesh/bullmq) [#bullmq]

**Use** · by Taskforce.sh · pairs with [csam-shield](/docs/csam-shield)

A battle-tested job queue of exactly the kind we wrap rather than build — durable retries and rate limiting are precisely what a hash-then-act pipeline needs under load. A FightCSAM user scaling csam-shield's processing should reach for BullMQ (or similar) as the execution substrate beneath the orchestration logic.

> A Redis-based message queue and batch-processing system for Node and Python, providing durable jobs, retries, rate limiting, and concurrency for background work.

### [Content Review Filters](https://github.com/facebook/content-review-filters) [#content-review-filters]

**Use** · by Meta · pairs with [safemod](/docs/safemod)

The validated reviewer-wellness render patterns, shipped as adoptable React components — the exact client-side mutation layer SafeMod needs.
A FightCSAM user deploying SafeMod gets these as its render surface, with SafeMod adding the exposure caps, rotation, and k-anonymized wellbeing tracking the components alone do not provide.

> A TypeScript/React component library that lets moderation tools apply protective filters — blur, grayscale, reduced-detail stylization, auto-mute, and opt-in interstitials — to graphic images and video so reviewers can manage exposure.

### [RabbitMQ](https://github.com/rabbitmq) [#rabbitmq]

**Use** · by RabbitMQ / VMware · pairs with [csam-shield](/docs/csam-shield)

Core messaging infrastructure of the kind we wrap, never reimplement — the broker that decouples detection, matching, and actioning stages. A FightCSAM user running csam-shield across services can use RabbitMQ as the transport between pipeline stages rather than building bespoke queueing.

> A widely deployed open-source message broker for queue-based communication between application components, supporting AMQP and other protocols with flexible routing and delivery guarantees.

### [AbuseIO](https://github.com/AbuseIO/AbuseIO) [#abuseio]

**Learn from** · by AbuseIO · pairs with [cybertip-cli](/docs/cybertip-cli)

A mature inbound-abuse-complaint workflow — parse, deduplicate, ticket, and notify — adjacent to the outbound statutory reporting CyberTip CLI handles. A FightCSAM user building a complaint-intake or case-tracking surface around cybertip-cli can study its ticketing and correspondence model rather than the report-filing path itself.

> An open-source toolkit to receive, process, and respond to abuse reports, correlating incoming complaints into tickets and notifying the responsible parties.

### [Mjolnir](https://github.com/matrix-org/mjolnir) [#mjolnir]

**Learn from** · by Matrix.org · pairs with [csam-shield](/docs/csam-shield)

A production policy-enforcement and actioning engine for one federated protocol, turning shared rule lists into bans, redactions, and ACL changes.
A FightCSAM user wiring csam-shield's detect-then-act loop into a Matrix or chat platform can mine its policy-list distribution and automated-actioning patterns.

> An all-in-one moderation bot for Matrix that enforces content policies through shareable ban lists, redactions, anti-spam, server ACLs, and room shutdowns across rooms and homeservers.

### [NCMEC Reporting](https://github.com/ello/ncmec_reporting) [#ncmec-reporting]

**Learn from** · by ello · pairs with [cybertip-cli](/docs/cybertip-cli)

The sole open-source predecessor to a maintained CyberTipline filer, but abandonware and unlicensed, so it cannot be adopted — only read. A FightCSAM user should treat it as the protocol-and-retention reference behind cybertip-cli and EvidenceVault, while cybertip-cli is the supported, audited successor that closes the gap it left.

> A Ruby client for filing reports to NCMEC's CyberTipline, implementing the ESP submission workflow — the only prior open-source CyberTipline client.

### [Owlculus](https://github.com/be0vlk/owlculus) [#owlculus]

**Learn from** · by be0vlk · pairs with [evidencevault](/docs/evidencevault)

A working investigator-facing case-management UI — cases, entities, evidence, and collaboration — the surface EvidenceVault deliberately lacks above its custody engine. A FightCSAM user who needs a review or case-workflow layer over EvidenceVault's sealed packages can study its case and evidence-organization model directly.

> An open-source OSINT toolkit and case-management platform (Python/FastAPI backend, Vue frontend) with multi-user cases, role-based permissions, entity and evidence organization, and cross-case correlation.

### [Access](https://github.com/discord/access) [#access]

**Out of scope** · by Discord

A general internal access-request and entitlement-governance portal, not a content-safety surface.
It overlaps only loosely with the dual-control, audited access grants EvidenceVault enforces over sealed evidence, and solves a different problem at the org-IT layer.

> A centralized portal for managing access to internal systems, letting employees request and owners grant time-bound, audited entitlements across an organization's tools.

### [Open Truss](https://github.com/open-truss/open-truss) [#open-truss]

**Out of scope** · by GitHub

A general-purpose no-code internal-tooling framework with no content-safety specialization. It is a way to build admin UIs in the abstract, not a moderation, detection, or evidence primitive, so it sits outside the FightCSAM stack.

> A React framework for building internal tools through YAML configuration rather than code, letting developers expose display components and data sources that non-developers assemble into tools without redeploying.

# Investigation & signal-sharing (/docs/ecosystem/investigation)

Threat-signal sharing and investigation tooling. Meta ThreatExchange / python-threatexchange set the bar for hashstream; disinformation and platform-observability work is deliberately out of our scope.

**10 projects** — 3 learn from · 1 reference · 6 out of scope.

*Project descriptions are adapted from [awesome-safety-tools](https://github.com/roostorg/awesome-safety-tools) (maintained by [ROOST](https://roost.tools)); the verdicts and analysis are ours. Snapshot: June 2026 — a point-in-time view that complements, and does not replace, their living list.*

### [Feluda](https://github.com/tattle-made/feluda) [#feluda]

**Learn from** · by Tattle · pairs with [csam-shield](/docs/csam-shield)

Feluda's operator pattern — letting an operator escalate from cheap hash-based matching to heavier ML models depending on budget and need — is exactly the tiered-analysis design we want to carry into csam-shield. It is strong architecture worth borrowing.
> A configurable engine for analyzing multi-lingual, multi-modal content (text, images, video), built around a modular operator pattern that lets you swap analysis techniques.

### [python-threatexchange](https://github.com/facebook/ThreatExchange/tree/main/python-threatexchange) [#python-threatexchange]

**Learn from** · by Meta · pairs with [hashstream](/docs/hashstream)

Its SignalExchangeAPI — a checkpointable, source-agnostic interface for pulling signals from NCMEC, StopNCII, TCAP and beyond — is the bar we hold hashstream to. We intend to ship a SignalExchangeAPI plugin so its existing users can adopt hashstream with minimal friction.

> A Python library and CLI for media hash exchange, built on a pluggable SignalExchangeAPI with fetchers for NCMEC, StopNCII, Tech Against Terrorism (TCAP), and Meta's own ThreatExchange.

### [ThreatExchange](https://github.com/facebook/ThreatExchange) [#threatexchange]

**Learn from** · by Meta · pairs with [hashstream](/docs/hashstream)

The reference signal-sharing platform for trust & safety, and the model we measure hashstream against. We plan to interoperate so its participants can adopt us rather than choose between the two.

> Meta's platform and toolset for privacy-compliant sharing of threat and safety-harm signals between organizations, spanning REST APIs and open-source hashing/matching projects.

### [ThreatExchange PHP client](https://github.com/certly/threatexchange) [#threatexchange-php-client]

**Reference** · by Certly · pairs with [hashstream](/docs/hashstream)

A handy illustration of what a thin language-specific ThreatExchange client looks like, but it's effectively frozen (last release 2016). Useful as a reference point, not a dependency.

> A PHP client library for authenticating against and querying Meta's ThreatExchange API.
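The checkpointed, source-agnostic pull model attributed to python-threatexchange's SignalExchangeAPI can be sketched in plain Python. This is a minimal illustration, not that library's interface: `Signal`, `SignalSource`, `InMemorySource`, and `sync` are hypothetical names, and the integer-offset checkpoint is an assumption (a real source might use a timestamp or an opaque cursor).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class Signal:
    signal_type: str  # e.g. "pdq"
    value: str        # the hash itself
    source: str       # which source supplied it


class SignalSource(Protocol):
    """Hypothetical source-agnostic interface: return new signals plus a checkpoint."""

    name: str

    def fetch_since(self, checkpoint: Optional[int]) -> Tuple[List[Signal], int]:
        ...


@dataclass
class InMemorySource:
    """Toy source over an append-only list; the checkpoint is a list offset."""

    name: str
    records: List[Signal] = field(default_factory=list)

    def fetch_since(self, checkpoint: Optional[int]) -> Tuple[List[Signal], int]:
        start = checkpoint or 0
        return self.records[start:], len(self.records)


def sync(
    sources: Iterable[SignalSource],
    checkpoints: Dict[str, int],
    store: Dict[Tuple[str, str], Signal],
) -> int:
    """Pull only signals newer than each source's checkpoint; return how many were added."""
    added = 0
    for source in sources:
        batch, next_checkpoint = source.fetch_since(checkpoints.get(source.name))
        for signal in batch:
            key = (signal.signal_type, signal.value)  # dedupe across sources
            if key not in store:
                store[key] = signal
                added += 1
        # Advance the checkpoint only after the whole batch is stored, so an
        # interrupted sync re-fetches the batch instead of skipping it.
        checkpoints[source.name] = next_checkpoint
    return added
```

Running `sync` twice in a row adds nothing the second time, because the checkpoint already points past the last record; that incremental, resumable property is what makes a checkpointed fetch cheap to run on a schedule against many sources.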
### [CIB Mango Tree](https://github.com/CIB-Mango-Tree/CIB-Mango-Tree-Website) [#cib-mango-tree]

**Out of scope** · by CIB Mango Tree

Solid tooling for the disinformation and influence-operations research community, but coordinated inauthentic behavior is deliberately outside FightCSAM's remit, which centers on CSAM detection, reporting, and prevention. Different problem, different field.

> A collection of tools to help researchers detect and analyze coordinated inauthentic behavior (CIB) — organized, deceptive activity on social platforms.

### [Crossover](https://crossover.social/) [#crossover]

**Out of scope** · by Crossover

Valuable work on algorithmic transparency and recommender-system observability, but its disinformation and election-integrity focus sits outside FightCSAM's CSAM-centric scope. We point to it respectfully rather than fold it in.

> Dashboards that monitor social-network recommendation algorithms, surfacing how disinformation and election-related content is amplified.

### [DAU Dashboard](https://github.com/tattle-made/dau-dashboard) [#dau-dashboard]

**Out of scope** · by Tattle

A useful collaborative investigation surface, but its deepfake and synthetic-media analysis focus is a different problem from FightCSAM's CSAM detection and reporting mission. Worthwhile in its own lane.

> A collaborative space for the Deepfake Analysis Unit — an Elixir/Phoenix web app where teams jointly examine and investigate suspected deepfake media.

### [Interference](https://github.com/DFRLab/interference2024) [#interference]

**Out of scope** · by DFRLab

Important election-integrity and foreign-influence research, but tracking interference campaigns falls outside FightCSAM's CSAM scope. We credit the work without treating it as part of our toolchain.

> A database tracking alleged instances of foreign interference in the 2024 US election.
### [OpenMeasures](https://gitlab.com/openmeasures) [#openmeasures]

**Out of scope** · by OpenMeasures

A capable platform-observability and trend-investigation toolkit, but its disinformation and harmful-narrative focus is deliberately outside FightCSAM's CSAM-centric mission. Valuable work on an adjacent problem.

> A platform for investigating internet trends, harmful narratives, and emerging online movements across fringe and mainstream platforms.

### [TikTok Observatory](https://github.com/aiforensics/tkobservatory) [#tiktok-observatory]

**Out of scope** · by AI Forensics

Sharp work on recommender-system accountability and platform observability, but algorithmic promotion/demotion on TikTok is outside FightCSAM's CSAM detection and reporting scope. We acknowledge it as strong work on a different question.

> Tooling that monitors TikTok's recommendation algorithm, measuring which content the platform promotes or demotes.

# Perceptual hashing & matching (/docs/ecosystem/perceptual-hashing)

Image and video perceptual hashing and matching — the axis hashkit, hashkit-match, and csam-shield build on. Meta's PDQ, TMK, and vPDQ sit upstream of our hashing and serve as our conformance source; we never claim to beat them.

**11 projects** — 10 learn from · 1 reference.

*Project descriptions are adapted from [awesome-safety-tools](https://github.com/roostorg/awesome-safety-tools) (maintained by [ROOST](https://roost.tools)); the verdicts and analysis are ours. Snapshot: June 2026 — a point-in-time view that complements, and does not replace, their living list.*

### [Altitude](https://github.com/jigsaw-code/altitude) [#altitude]

**Learn from** · by Jigsaw (Google) · pairs with [hashkit-match](/docs/hashkit-match)

A self-contained reviewer-facing front end over a hash matcher — the same hash-then-action loop a CSAM pipeline runs, just aimed at a different harm vertical.
FightCSAM users building a moderation queue on top of hashkit-match can study its review-UI and actioning patterns directly.

> A free, open-source web UI and hash-matching tool that helps platforms find and review violent-extremism and terrorism content, drawing on shared industry hash sets.

### [Hasher-Matcher-Actioner (HMA)](https://github.com/facebook/ThreatExchange/tree/main/hasher-matcher-actioner) [#hasher-matcher-actioner-hma]

**Learn from** · by Meta · pairs with [csam-shield](/docs/csam-shield)

The reference end-to-end hash → match → act service, and the closest external analogue to csam-shield's orchestration role. A FightCSAM user choosing between adopting HMA wholesale and wiring in csam-shield's unified middleware should treat it as the leading alternative on the orchestration axis.

> A turnkey service that combines a hashing algorithm, a matching function, and the ability to hook into actions, so platforms can stand up content matching end to end.

### [hma-matrix](https://github.com/matrix-org/hma-matrix) [#hma-matrix]

**Learn from** · by Matrix.org · pairs with [csam-shield](/docs/csam-shield)

A real-world example of fitting a generic HMA matcher to one platform's data model and moderation flow. A FightCSAM user integrating csam-shield into a federated or chat platform can mine it for protocol-binding patterns.

> Matrix-specific extensions to Meta's HMA, adapting the hasher-matcher-actioner for the Matrix messaging ecosystem.

### [Lattice Extract](https://github.com/adobe/lattice_extract) [#lattice-extract]

**Learn from** · by Adobe · pairs with [hashkit-match](/docs/hashkit-match)

A precision aid that addresses a known perceptual-hash failure mode — collages and grid layouts that collide spuriously. A FightCSAM user tuning hashkit-match thresholds could run it as a pre-filter to cut false positives before review.
> Grid and lattice detection intended to guard against false positives in hash matching by identifying repeating structured patterns in images.

### [MediaModeration](https://github.com/wikimedia/mediawiki-extensions-MediaModeration) [#mediamoderation]

**Learn from** · by Wikimedia · pairs with [csam-shield](/docs/csam-shield)

A production CSAM hash-matching integration built into a specific CMS, with the credentialed-list and review plumbing that implies. A FightCSAM user adding csam-shield to a content platform can study how it sequences upload, match, and escalation.

> A MediaWiki extension that performs CSAM hash matching on uploaded media for Wikimedia projects.

### [PDQ](https://github.com/facebook/ThreatExchange/tree/main/pdq) [#pdq]

**Learn from** · by Meta · pairs with [hashkit](/docs/hashkit)

The upstream image-hashing standard that the NCMEC Hash Sharing API accepts and that hashkit implements and conforms to — it is our source of truth, not a competitor. A FightCSAM user should treat PDQ as the canonical algorithm and hashkit as the byte-identical, multi-runtime way to run it.

> An open-source perceptual hash algorithm for images, producing a compact 256-bit fingerprint; the smaller the Hamming distance between two fingerprints, the more visually similar the images.

### [Perception](https://github.com/thorn-oss/perception) [#perception]

**Learn from** · by Thorn · pairs with [hashkit](/docs/hashkit)

A mature multi-algorithm hashing toolkit covering much of the same ground as hashkit's core, with strong benchmarking and deployment utilities. A FightCSAM user working in Python may reach for Perception's breadth, while hashkit's draw is cross-runtime byte-identical output and NCMEC-verified vectors.

> A Python library offering a common wrapper around popular perceptual hashes (such as ImageHash), with tooling to benchmark and deploy them.
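
The PDQ entry above rests on one operation: counting differing bits between two fingerprints. A minimal sketch, assuming hashes arrive in the common 64-character hex encoding of PDQ's 256 bits; the helper names and the 31-bit default threshold are assumptions to tune against your own data, not hashkit's API.

```python
PDQ_BITS = 256
PDQ_HEX_LEN = PDQ_BITS // 4  # 64 hex characters


def hamming_distance(hex_a: str, hex_b: str) -> int:
    """Number of differing bits between two hex-encoded PDQ hashes."""
    if len(hex_a) != PDQ_HEX_LEN or len(hex_b) != PDQ_HEX_LEN:
        raise ValueError("expected 64-character hex PDQ hashes")
    # XOR leaves a 1 wherever the bits differ; count those 1s.
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def is_near_match(hex_a: str, hex_b: str, threshold: int = 31) -> bool:
    """Treat hashes within `threshold` bits as visually similar (threshold assumed)."""
    return hamming_distance(hex_a, hex_b) <= threshold


zero = "0" * 64
one_nibble = "f" + "0" * 63    # 4 bits differ from `zero`
quarter = "f" * 16 + "0" * 48  # 64 bits differ from `zero`
print(hamming_distance(zero, one_nibble), is_near_match(zero, one_nibble))  # 4 True
print(hamming_distance(zero, quarter), is_near_match(zero, quarter))        # 64 False
```

Checking one hash against a large list this way is a linear scan; it illustrates the metric rather than an efficient index.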
### [RocketChat CSAM](https://github.com/prostasia/rocketchatcsam) [#rocketchat-csam]

**Learn from** · by Prostasia Foundation · pairs with [csam-shield](/docs/csam-shield)

A concrete CSAM hash-matching plug-in for one chat platform, showing what it takes to retrofit detection into existing self-hosted software. A FightCSAM user running Rocket.Chat-style infrastructure can compare it against wiring csam-shield's middleware in directly.

> An integration that adds CSAM hash matching to Rocket.Chat deployments.

### [TMK+PDQF](https://github.com/facebook/ThreatExchange/tree/main/tmk) [#tmkpdqf]

**Learn from** · by Meta · pairs with [hashkit](/docs/hashkit)

The upstream video-hashing standard accepted by the NCMEC Hash Sharing API, and one of the algorithms hashkit implements and conforms to — our reference, never something we claim to beat. A FightCSAM user can rely on TMK+PDQF for known-video matching; hashkit's role is producing identical hashes across every runtime.

> An open-source algorithm for visual-similarity matching of videos (TMK+PDQF), producing a fixed-length video fingerprint.

### [vPDQ](https://github.com/facebook/ThreatExchange/tree/main/vpdq) [#vpdq]

**Learn from** · by Meta · pairs with [hashkit](/docs/hashkit)

An upstream PDQ-based approach to video matching that complements TMK, and part of the Meta conformance lineage hashkit follows rather than competes with. A FightCSAM user choosing a video-hashing strategy should evaluate vPDQ and TMK+PDQF on their own merits, with hashkit supplying portable, verified implementations.

> An open-source method for visual-similarity matching of videos that applies the PDQ image algorithm frame by frame.

### [HMA-CLIP demo](https://github.com/juanmrad/HMA-CLIP-demo) [#hma-clip-demo]

**Reference** · by juanmrad · pairs with [csam-shield](/docs/csam-shield)

A worked example of bolting a new signal type (CLIP embeddings) onto an HMA-style pipeline.
Useful to a FightCSAM user weighing how to extend csam-shield beyond perceptual hashing toward classifier or embedding signals.

> A demo extending HMA with CLIP embeddings, offered as a reference for how to add new signal/format extensions to the Hasher-Matcher-Actioner.

# Privacy & user-safety (/docs/ecosystem/privacy-user-safety)

PII detection and end-user / community-governance tooling. We wrap Presidio for PII in trainguard; most user-safety tools sit adjacent to a CSAM-detection pipeline.

**6 projects** — 1 use · 2 learn from · 3 out of scope.

*Project descriptions are adapted from [awesome-safety-tools](https://github.com/roostorg/awesome-safety-tools) (maintained by [ROOST](https://roost.tools)); the verdicts and analysis are ours. Snapshot: June 2026 — a point-in-time view that complements, and does not replace, their living list.*

### [Presidio](https://github.com/microsoft/presidio) [#presidio]

**Use** · by Microsoft · pairs with [trainguard](/docs/trainguard)

This is the PII engine we bolt onto trainguard: rather than build our own detectors, we wrap Presidio to answer "is this training dataset safe to use?" by flagging and redacting names, contact details, and other identifiers before data ever reaches a model.

> Open-source framework for detecting and anonymizing personally identifiable information (PII) and sensitive data in text, images, and structured data, using NLP, pattern matching, and customizable recognizers.

### [SquadBox](https://github.com/amyxzhang/squadbox) [#squadbox]

**Learn from** · by UW Social Futures Lab

Aimed at end-user harassment rather than platform CSAM, so it's adjacent rather than core — but its friend-sourced, human-in-the-loop moderation model is a concept worth borrowing for how trusted reviewers triage flagged content.

> Tool that lets someone facing online harassment invite trusted friends to act as moderators for their inbox, reviewing and filtering incoming messages on the owner's behalf according to their preferences.
### [Uli](https://github.com/tattle-made/Uli) [#uli]

**Learn from** · by Tattle

Built for gender-based-violence response, not CSAM, so it's out of trainguard's direct lane — but its crowdsourced abuse-detection lists and evidence-archiving approach are concretely relevant to building content classifiers and preserving reportable evidence.

> Browser plugin and accompanying resources for mitigating online gender-based violence in India, helping people of marginalized genders detect abusive content, archive evidence, and respond collectively to abuse on social media.

### [Fawkes](https://github.com/Shawn-Shan/fawkes) [#fawkes]

**Out of scope** · by Shawn Shan (SANDLab, University of Chicago)

Genuinely clever adversarial-ML work, but it protects an individual's face from recognition AIs rather than screening a platform's content for CSAM, so it sits outside trainguard's pipeline. Worth watching as a data-poisoning technique that could show up in datasets we screen.

> Image-cloaking system that adds imperceptible pixel-level perturbations to photos so unauthorized facial-recognition models (e.g., Clearview-style scrapers) can't reliably identify the person, intended for personal privacy protection and academic research.

### [Frankly](https://github.com/berkmancenter/frankly/) [#frankly]

**Out of scope** · by Applied Social Media Lab (Berkman Klein Center)

A deliberation and community-dialogue tool, not a detection or screening system — adjacent to online safety through healthier conversation design, but it doesn't touch the CSAM-detection problem trainguard addresses.

> Open-source platform for hosting video-enabled deliberative conversations, with survey-based participant matching and structured event templates to support constructive group dialogue.
### [PolicyKit](https://github.com/policykit/policykit) [#policykit]

**Out of scope** · by UW Social Futures Lab

Focused on community self-governance and moderation policy rather than automated abuse detection, so it's out of scope for a CSAM pipeline. Still a useful reference for how platforms encode and enforce their own safety rules above the detection layer.

> Governance-authoring toolkit that lets online communities design and execute their own governance and moderation rules (drawing on Ostrom's commons framework) across platforms like Slack, Reddit, and Discourse.

# Red-teaming & evaluation (/docs/ecosystem/red-teaming)

Adversarial testing harnesses. We recommend pairing promptshield with one of these, and plan to contribute the CSAM-intent probes the generalist harnesses deliberately omit.

**8 projects** — 5 use · 3 learn from.

*Project descriptions are adapted from [awesome-safety-tools](https://github.com/roostorg/awesome-safety-tools) (maintained by [ROOST](https://roost.tools)); the verdicts and analysis are ours. Snapshot: June 2026 — a point-in-time view that complements, and does not replace, their living list.*

### [Aymara](https://github.com/aymara-ai/aymara-sdk-python) [#aymara]

**Use** · by Aymara · pairs with [promptshield](/docs/promptshield)

Aymara's safety and jailbreak evals are a clean way to score whether adversarial prompts still reach a model after promptshield screens them; we ship the guard and recommend pairing it with this kind of adversarial eval rather than trusting the filter blindly. It has no CSAM-intent coverage out of the box, which is the gap our planned probe pack fills.

> A Python SDK of automated evaluation tools for AI safety, accuracy, and jailbreak vulnerability, scoring model responses against configurable policies.
### [Garak](https://github.com/NVIDIA/garak) [#garak]

**Use** · by NVIDIA · pairs with [promptshield](/docs/promptshield)

Garak's breadth of off-the-shelf probes makes it our first recommendation for stress-testing promptshield and the csam-shield prompt path across many attack classes at once. It deliberately omits CSAM specifics, so we plan to contribute CSAM-intent probes as a plugin rather than rebuilding the harness.

> NVIDIA's framework for adversarial testing and evaluation of LLMs, shipping a broad library of probes for jailbreaks, prompt injection, toxicity, and data leakage.

### [Prompt Fuzzer](https://github.com/prompt-security/ps-fuzz) [#prompt-fuzzer]

**Use** · by Prompt Security · pairs with [promptshield](/docs/promptshield)

Prompt Fuzzer is a fast way to probe whether crafted injections slip past promptshield before reaching the model, which is exactly the adversarial pairing we recommend alongside shipping the guard. Its generalist injection focus means CSAM-intent cases are still on you to add.

> An interactive tool from Prompt Security for testing the prompt-injection and jailbreak resilience of an LLM system's configuration and system prompt.

### [Promptfoo](https://github.com/promptfoo/promptfoo) [#promptfoo]

**Use** · by Promptfoo · pairs with [promptshield](/docs/promptshield)

Promptfoo's developer experience and OWASP/NIST-mapped attack strategies make it our recommended pick for repeatable, reportable evals of promptshield and the csam-shield path in CI. We plan to contribute CSAM-intent strategies as a plugin, since the built-in packs intentionally leave that domain out.

> An automated LLM evaluation and red-teaming framework with report generation and ready-to-use attack strategies mapped to OWASP and NIST frameworks.
### [PyRIT](https://github.com/Azure/PyRIT) [#pyrit]

**Use** · by Microsoft · pairs with [promptshield](/docs/promptshield)

PyRIT, with its multi-turn orchestration, is the harness we reach for when an attack only emerges across a conversation, which is the harder case for any single-prompt guard like promptshield to catch. It ships no CSAM-intent content, so our planned red-team pack plugs into PyRIT rather than replacing it.

> Microsoft's Python Risk Identification Tool for generative AI, built for automated red teaming including multi-turn, conversational attack orchestration.

### [Counterfit](https://github.com/Azure/counterfit/) [#counterfit]

**Learn from** · by Microsoft

Counterfit's harness-and-attack abstraction is worth studying as a model for orchestrating adversarial tests, though its focus on classic adversarial-ML perturbations sits further from the prompt-screening path promptshield guards. Useful as a reference for structuring an attack suite more than as a day-to-day guard validator.

> A command-line automation tool from Microsoft for assessing the security and robustness of AI models, wrapping adversarial-ML attacks behind a common interface.

### [LLM Canary](https://github.com/LLM-Canary/LLM-Canary) [#llm-canary]

**Learn from** · by LLM Canary

LLM Canary's vulnerability benchmark is a useful reference for the categories worth tracking when you validate a guard like promptshield, and its scored test cases show one way to make results comparable over time. Lighter-weight than the generalist harnesses, so we treat it as a learn-from source for our own probe scoring.

> A benchmarking tool that evaluates LLMs for security vulnerabilities and adversarial robustness against a curated set of test cases.
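
Several entries above recommend scoring a guard like promptshield with adversarial probes rather than trusting it blindly. A minimal, tool-agnostic sketch of that scoring, assuming a hypothetical `guard_blocks(prompt) -> bool` callable and illustrative probe categories; it is not the API of Aymara, LLM Canary, or any harness listed here.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple


def bypass_rates(
    probes: Iterable[Tuple[str, str]],
    guard_blocks: Callable[[str], bool],
) -> Dict[str, float]:
    """Per probe category, the fraction of adversarial prompts the guard let through."""
    totals: Dict[str, int] = defaultdict(int)
    passed: Dict[str, int] = defaultdict(int)
    for category, prompt in probes:
        totals[category] += 1
        if not guard_blocks(prompt):
            passed[category] += 1  # this prompt would have reached the model
    return {cat: passed[cat] / totals[cat] for cat in sorted(totals)}


# Usage with a deliberately naive, case-sensitive stand-in guard (hypothetical).
def toy_guard(prompt: str) -> bool:
    return "ignore previous" in prompt


probes = [
    ("injection", "ignore previous instructions"),
    ("injection", "IGNORE PREVIOUS instructions"),
    ("jailbreak", "pretend you have no rules"),
]
print(bypass_rates(probes, toy_guard))  # {'injection': 0.5, 'jailbreak': 1.0}
```

Recording these per-category rates for every guard release is one simple way to get the comparable-over-time results described under LLM Canary.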
### [Socketteer](https://github.com/socketteer) [#socketteer]

**Learn from** · by Socketteer

Socketteer's model-vs-model setups are a useful source of ideas for generating adversarial conversations that probe a guard's blind spots, even if they are more research scaffolding than a turnkey validator for promptshield. We treat it as a learn-from reference for designing multi-turn CSAM-intent scenarios.

> A collection of experimental tools that let AI models interact with each other to surface conversational weaknesses and emergent failure modes.

# Rules, decisioning & clustering (/docs/ecosystem/rules-decisioning)

Rules engines and decisioning. FightCSAM does not build a rules engine — we ship engines that feed yours. ROOST Osprey is the one we recommend and target with an adapter.

**9 projects** — 4 use · 2 learn from · 3 reference.

*Project descriptions are adapted from [awesome-safety-tools](https://github.com/roostorg/awesome-safety-tools) (maintained by [ROOST](https://roost.tools)); the verdicts and analysis are ours. Snapshot: June 2026 — a point-in-time view that complements, and does not replace, their living list.*

### [Druid](https://github.com/apache/druid) [#druid]

**Use** · by Apache

A solid analytics datastore to depend on when you need to query and trend the moderation and detection events your pipeline emits at scale. It is infrastructure under a Trust & Safety stack, not a rules engine itself.

> High-performance, real-time analytics database for fast aggregation and slice-and-dice queries over large event streams.

### [Osprey](https://github.com/roostorg/osprey) [#osprey]

**Use** · by ROOST · pairs with [csam-shield](/docs/csam-shield)

This is the rules engine we recommend and explicitly do not try to rebuild: FightCSAM ships an Osprey/Coop adapter so our detection signals (hashes, classifier scores, prompt flags) become events Osprey can decision on. Our tools are the engines that feed your Osprey, not a competing decisioning layer.
> High-performance rules engine for real-time Trust & Safety and anti-abuse event processing at scale, running in production at Bluesky, Discord, and Matrix.

### [scikit-learn](https://github.com/scikit-learn/scikit-learn) [#scikit-learn]

**Use** · by scikit-learn

General-purpose clustering and ML infrastructure you can depend on to group near-duplicate reports, cluster abuse signals, or triage event streams before they reach your decisioning engine. Foundational tooling, not CSAM-specific.

> Mature Python machine-learning library that includes clustering algorithms such as K-Means, DBSCAN, and hierarchical clustering.

### [SpamAssassin](https://spamassassin.apache.org) [#spamassassin]

**Use** · by Apache

Battle-tested filtering infrastructure to depend on for the spam-and-abuse layer that sits alongside CSAM detection; its scored-rules approach is a useful upstream signal source for a decisioning engine. General-purpose anti-abuse infra rather than a T&S rules engine.

> Mature anti-spam platform combining text analysis, Bayesian filtering, and DNS blocklists to score and classify messages.

### [bogofilter](https://bogofilter.sourceforge.io/) [#bogofilter]

**Learn from** · by bogofilter

A clean, lightweight example of a classifier that improves from reviewer feedback, a pattern worth studying for any human-in-the-loop moderation signal. It is a mail-focused Bayesian filter, so it informs design rather than slotting directly into a CSAM pipeline.

> Fast Bayesian spam filter that classifies messages and continuously learns from human corrections.

### [Marble](https://github.com/checkmarble/marble) [#marble]

**Learn from** · by Checkmarble · pairs with [evidencevault](/docs/evidencevault)

The strongest open-source case-management and decisioning UI in this space; its immutable audit trail is a model worth studying for evidencevault's chain-of-custody and reviewer-audit views.
It is built for fintech fraud rather than CSAM, so we learn from its patterns rather than wrap it.

> Real-time fraud-detection and compliance engine for fintech, with a rules layer, an immutable audit trail, and a built-in case manager.

### [RulesEngine](https://microsoft.github.io/RulesEngine/) [#rulesengine]

**Reference** · by Microsoft

A well-scoped library for expressing rules in JSON if your stack is .NET, useful as a reference for rule-definition ergonomics. It is a general business-rules library, not a high-throughput T&S event engine like Osprey, which is what a real-time abuse pipeline needs.

> A .NET library for abstracting business rules as JSON-defined expressions that can be evaluated at runtime.

### [SmiteSpam](https://github.com/wikimedia/mediawiki-extensions-SmiteSpam) [#smitespam]

**Reference** · by Wikimedia

A narrowly scoped spam-identification tool tied to MediaWiki, useful as a reference for wiki operators but not a general decisioning component. For a CSAM pipeline it serves mainly as a worked example of platform-specific spam triage.

> A MediaWiki extension that identifies likely spam pages so administrators can review and clean them up.

### [SQRL](https://github.com/sqrl-lang/sqrl) [#sqrl]

**Reference** · by Smyte

Historically influential as a purpose-built rules language for abuse detection and worth reading for its stateful-stream design, but it has been inactive since 2023. Treat it as a reference for ideas rather than a dependency; Osprey is the maintained engine we point teams to.

> Smyte Query and Rules Language, a safe stateful language for writing rules over event streams; the project is inactive as of 2023.
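
The scored-rules approach noted under SpamAssassin boils down to summing rule weights and comparing the total against a threshold; the score and rule hits are the kind of upstream signal a decisioning engine such as Osprey could consume. This is an illustration only: the `Rule` type, rule names, event fields, and weights are hypothetical, and the 5.0 threshold simply mirrors SpamAssassin's default `required_score`.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical scored rule: a name, a weight, and a predicate over an event."""
    name: str
    score: float
    test: Callable[[dict], bool]


def evaluate(event: dict, rules: List[Rule], threshold: float = 5.0) -> Tuple[float, List[str], bool]:
    """Sum the weights of every matching rule and flag the event at or above threshold."""
    hits = [rule for rule in rules if rule.test(event)]
    total = sum((rule.score for rule in hits), 0.0)
    return total, [rule.name for rule in hits], total >= threshold


# Illustrative rules over hypothetical event fields.
rules = [
    Rule("PDQ_NEAR_MATCH", 4.0, lambda e: e.get("pdq_distance", 256) <= 31),
    Rule("NEW_ACCOUNT", 1.5, lambda e: e.get("account_age_days", 10_000) < 2),
    Rule("PROMPT_FLAGGED", 2.0, lambda e: e.get("prompt_flagged", False)),
]

print(evaluate({"pdq_distance": 12, "account_age_days": 1}, rules))
# (5.5, ['PDQ_NEAR_MATCH', 'NEW_ACCOUNT'], True)
print(evaluate({"pdq_distance": 80}, rules))
# (0.0, [], False)
```

Returning the rule hits alongside the score lets a downstream engine or reviewer see why an event crossed the line, not just that it did.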