From the Lab

PII Detection Without Cloud AI — Local SLM + Layered Architecture

Most PII detection is pattern matching — deterministic, sub-millisecond, and free. A layered architecture handles 98% without a language model. A local SLM covers the remaining 2% without cloud tokens or data leaving your network.

By Krunal Sabnis

Most PII Detection Doesn’t Need a Model

When teams add AI to regulated data pipelines, there’s a natural pull toward using a language model for PII detection. It feels like the right tool — models understand context, they can catch subtle patterns, and they’re already in the stack.

But most PII detection is pattern matching. Credit card numbers have checksums. SSNs have formats. IBANs follow country-specific templates. Phone numbers match known patterns. This is deterministic work — and 98% of detection decisions fall into this category.
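As a sketch of what that deterministic work looks like, here is a minimal credit-card check in Python: a format regex plus the Luhn checksum. The regex and sample text are illustrative, not a production recogniser.

```python
import re

# Candidate 16-digit card numbers, allowing spaces or dashes between digits.
CARD_RE = re.compile(r"\b(?:\d[ -]?){15}\d\b")

def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    digits = [int(d) for d in number if d.isdigit()]
    checksum = 0
    # Double every second digit from the right; subtract 9 if the result > 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

text = "Card: 4111 1111 1111 1111, ref: 1234 5678 9012 3456"
matches = [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]
# Only the first sequence passes the checksum; the second is rejected.
```

The checksum is what separates "16 digits that look like a card" from "a number that could actually be issued" — no model involved, and every rejection is explainable.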

Running deterministic tasks through a cloud API means paying per token for work a regex can do in microseconds. It means adding network latency to something that should be sub-millisecond. And under GDPR, it means your data leaves the perimeter before you’ve decided whether it’s safe to leave — which is the compliance event, regardless of what the provider does afterward.

There’s a more efficient architecture: deterministic layers first, a small local model only for the genuinely ambiguous remainder.

The Pattern: Layered Detection

The architecture that works for regulated data pipelines separates detection into layers, each doing what it’s good at:

Layer 1: Pattern recognisers (regex + context keywords)
         → catches structured entities: SSNs, credit cards, IBANs, phone numbers
         → zero ambiguity, zero latency, fully auditable

Layer 2: Statistical NER (trained ML model, ~400MB)
         → catches named entities: person names, organisations, locations
         → runs locally, no API calls, ~22ms per request

Layer 3: Local SLM (1.5B parameters, no cloud tokens)
         → handles the ambiguous remainder: implicit sensitivity, context judgment
         → only invoked when layers 1-2 say "I don't know"
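The routing between layers can be sketched as follows. The layer functions here are toy stand-ins (the pattern layer is a single SSN regex; NER and SLM return nothing) — real implementations would sit behind the same signatures.

```python
import re
from dataclasses import dataclass

@dataclass
class Detection:
    entity: str   # e.g. "US_SSN", "PERSON"
    text: str     # matched span
    score: float  # confidence in [0, 1]
    layer: str    # which layer produced it, for the audit trail

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def run_pattern_recognisers(prompt: str) -> list[Detection]:
    return [Detection("US_SSN", m.group(), 0.95, "patterns")
            for m in SSN_RE.finditer(prompt)]

def run_statistical_ner(prompt: str) -> list[Detection]:
    return []  # placeholder: a local NER model would go here

def run_local_slm(prompt: str) -> list[Detection]:
    return []  # placeholder: only invoked on the ambiguous remainder

CONFIDENCE_THRESHOLD = 0.60

def detect_pii(prompt: str) -> list[Detection]:
    # Layers 1-2 always run: cheap, deterministic, local.
    hits = run_pattern_recognisers(prompt) + run_statistical_ner(prompt)
    confident = [h for h in hits if h.score >= CONFIDENCE_THRESHOLD]
    if confident:
        return confident
    # Only the ambiguous remainder reaches the SLM -- still no cloud call.
    return [h for h in run_local_slm(prompt)
            if h.score >= CONFIDENCE_THRESHOLD]
```

The control flow is the architecture: the expensive layer is behind a guard, not in the default path.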

Each layer has a different cost profile:

| Layer | Latency | Auditability | What it catches |
| --- | --- | --- | --- |
| Pattern recognisers | under 1ms | Every match logged with rule ID | Structured formats (SSN, credit card, phone) |
| Statistical NER | ~22ms | Entity type + confidence score | Names, organisations, locations |
| Local SLM (when needed) | ~150ms | Prompt + response logged | "My therapist said…", obfuscated patterns |

The key insight: layers 1 and 2 handle 98% of detection decisions. The SLM only fires on the remaining 2% — the genuinely ambiguous cases where you need contextual judgment. No cloud API call. No token cost. No data leaving your network.

Why This Matters Architecturally

1. Auditability is binary — you have it or you don’t

When a pattern recogniser flags a credit card number, you can point to the exact regex, the matched string, and the confidence boost from surrounding keywords like “card number.” A compliance officer can review this.
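As an illustration, a recogniser that records exactly that — rule ID, matched string, and whether a context keyword boosted the score — might look like this sketch (rule name, scores, and window size are made up for the example):

```python
import re

RULE_ID = "CREDIT_CARD_16"
CARD_RE = re.compile(r"\b(?:\d[ -]?){15}\d\b")
CONTEXT_KEYWORDS = ("card number", "credit card", "cc#")
BASE_SCORE, CONTEXT_BOOST = 0.60, 0.35  # illustrative values

def scan(text: str) -> list[dict]:
    findings = []
    for m in CARD_RE.finditer(text):
        # Look for supporting keywords in a window just before the match.
        window = text[max(0, m.start() - 40):m.start()].lower()
        boosted = any(kw in window for kw in CONTEXT_KEYWORDS)
        findings.append({
            "rule_id": RULE_ID,   # the exact rule a reviewer can inspect
            "match": m.group(),   # the exact string that fired it
            "score": BASE_SCORE + (CONTEXT_BOOST if boosted else 0.0),
            "context_boost": boosted,
        })
    return findings
```

Every field in that record is something a compliance officer can verify by hand — which is precisely what a raw model probability is not.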

When a language model says “this contains PII,” you have a probability and a prompt. Try explaining that to a regulator.

For regulated industries, the audit trail isn’t a nice-to-have. It’s the whole point. Deterministic layers give you this for free. Probabilistic layers require building a separate governance wrapper around them.

2. Failure modes are fundamentally different

A pattern recogniser either matches or doesn’t. You can enumerate every case it handles and test them. When it misses something, you add a rule — the fix is permanent and predictable.
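That enumerability can be pinned down in a regression suite — something no probabilistic detector can offer. A minimal example with an assumed SSN pattern:

```python
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Known-good and known-bad cases can be enumerated and locked in forever.
MUST_MATCH = ["123-45-6789", "My SSN is 078-05-1120."]
MUST_NOT_MATCH = ["123-456-789", "phone: 555-0199", "1234-56-7890"]

def test_ssn_recogniser():
    for text in MUST_MATCH:
        assert SSN_RE.search(text), f"missed: {text}"
    for text in MUST_NOT_MATCH:
        assert not SSN_RE.search(text), f"false positive: {text}"
```

When a miss is found in production, the fix is a new rule plus a new line in `MUST_MATCH` — and that guarantee holds on every future run.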

A language model fails probabilistically. The same input might be caught in one run and missed in the next. You can’t write a regression test that guarantees it will always detect a specific pattern. In regulated data pipelines, non-deterministic failure modes are unacceptable for the core detection path.

3. The threshold is a policy decision, not a technical one

Detection sensitivity should be set by your compliance team, not your ML team. A financial services company might want every SSN-shaped string flagged. A marketing platform might only care about validated formats.

In a layered architecture, this is a configuration change — adjust the confidence threshold. In a model-first architecture, it’s a prompt engineering exercise with unpredictable outcomes.
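A minimal sketch of that idea — sensitivity as a configuration value owned by compliance, with hypothetical domains and thresholds:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DetectionPolicy:
    domain: str
    score_threshold: float  # set by the compliance team, not the ML team
    require_checksum: bool  # e.g. must a card number pass Luhn to be flagged?

# Illustrative policies: finance flags aggressively, marketing only
# accepts high-confidence, validated detections.
POLICIES = {
    "finance":   DetectionPolicy("finance", 0.40, require_checksum=False),
    "marketing": DetectionPolicy("marketing", 0.75, require_checksum=True),
}

def accept(domain: str, score: float) -> bool:
    """Apply the domain's policy to a detection score."""
    return score >= POLICIES[domain].score_threshold
```

Changing behaviour per domain is an edit to `POLICIES`, reviewable in version control — not a prompt rewrite with unknown side effects.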

What We Measured

We tested this architecture on 60 synthetic prompts across healthcare, finance, and telecom — each labelled with the PII entities it contains.

Default configuration: 76.4% recall (42 of 55 entities detected).

After diagnosing the failures, we found most weren’t detection limitations:

  • Checksum validation rejecting edge-case formats
  • Missing recognisers for domain-specific entities (date of birth, medical licence)
  • Threshold filtering valid detections (scored 0.40, threshold was 0.60)

After adding five custom pattern recognisers and tuning the threshold: 98.2% recall (54 of 55 entities). No change in latency. No GPU. No LLM in the detection path.
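One of those custom recognisers might look like the following sketch — a hypothetical date-of-birth detector that only fires when a supporting context keyword appears nearby (the pattern, keywords, and window size are illustrative, not the ones used in the benchmark):

```python
import re

# A date pattern alone is ambiguous (invoices, deadlines), so require a
# nearby keyword before treating it as a date of birth.
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
DOB_KEYWORDS = ("date of birth", "dob", "born on", "birthdate")

def find_dob(text: str) -> list[str]:
    lowered = text.lower()
    hits = []
    for m in DATE_RE.finditer(text):
        window = lowered[max(0, m.start() - 30):m.start()]
        if any(kw in window for kw in DOB_KEYWORDS):
            hits.append(m.group())
    return hits
```

The context requirement is what keeps recall gains from turning into false-positive noise: the same date string is flagged in one context and ignored in another, and both decisions are explainable.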

The single remaining miss was a routing error — the prompt was sent to the wrong pipeline. The detector never ran. That’s an orchestration problem, not a detection problem.

The Principle Generalises

This isn’t just about PII. The layered pattern applies to any regulated data pipeline where you need:

  • High recall on known patterns (use deterministic rules)
  • Flexibility for edge cases (use a small model, locally)
  • Full auditability (deterministic layers give you this; probabilistic layers require governance wrappers)
  • No data exfiltration (everything runs locally, nothing leaves your perimeter)

The same architecture works for document classification, content moderation, and compliance screening. The layers change — different rules, different models — but the principle is the same: deterministic first, probabilistic only where necessary, governance around everything.

The Token Economy

The industry defaults to the biggest model available. This is the equivalent of running every database query through a distributed analytics cluster when most of them are simple key-value lookups.

Consider the cost profile of processing 10,000 prompts per day:

  • Pattern recognisers: zero tokens, zero API cost, sub-millisecond
  • Statistical NER: zero tokens, zero API cost, ~22ms on CPU
  • Local SLM (1.5B): zero cloud tokens — runs on your hardware, ~150ms
  • Cloud LLM: ~500 tokens per call × $0.003/1K tokens = $15/day for 10K calls
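The arithmetic behind that last line, as a back-of-envelope script (token counts and prices are the assumed figures above, not a vendor quote):

```python
# Compare a model-first pipeline against layered detection at 10K prompts/day.
PROMPTS_PER_DAY = 10_000
TOKENS_PER_CALL = 500
PRICE_PER_1K_TOKENS = 0.003  # USD, assumed

# Model-first: every prompt goes through the cloud LLM.
cloud_only = PROMPTS_PER_DAY * TOKENS_PER_CALL / 1000 * PRICE_PER_1K_TOKENS

# Layered: 98% handled by free deterministic layers, 2% by the local SLM,
# so cloud token spend for detection is zero.
layered = 0.0

print(f"cloud-only: ${cloud_only:.2f}/day, layered: ${layered:.2f}/day")
```

Fifteen dollars a day sounds small until it scales with traffic; the structural point is that the layered cost stays flat at zero regardless of volume.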

With layered detection, 98% of decisions cost nothing. The SLM handles 2% locally. Cloud LLM usage drops to near zero for detection — reserved only for tasks that genuinely require large-model reasoning.

Right-sizing means each layer earns its place by handling what the previous layer can’t. Nothing runs unless it needs to.

Practical Implications

If you’re designing a data pipeline for regulated industries:

  1. Start with rules. Enumerate the PII patterns in your domain. Build recognisers. Test them. This gets you to 80%+ recall before you touch any model.

  2. Add statistical NER for names and entities. A trained NER model is not an LLM — it’s a purpose-built classifier. Fast, predictable, auditable.

  3. Reserve the SLM for the ambiguous remainder. “My therapist said…” has no named entity, but it’s sensitive. That’s where a language model adds value — as a second opinion, not the primary detector.

  4. Make the threshold configurable per domain. Healthcare and finance detect aggressively. Marketing filters more. This is a policy decision exposed as configuration.

  5. Log everything. Every layer, every match, every confidence score. The audit trail is the product in regulated environments.


This architecture pattern emerged from our work on governed AI routing pipelines. For the detailed implementation — custom recognisers, threshold tuning methodology, and three-mode benchmark (rules-only vs hybrid vs SLM-only) — see our labs.

Architecture · PII · GDPR · Production · SLM · Healthcare · Finance · Telecom
