SPF `~all` vs `-all`: rollout strategy to move to hard fail without breaking legitimate senders

If a domain is still publishing ~all, the usual reason is not that softfail is inherently better. It is that the team does not yet trust its sender inventory enough to say, with confidence, "everything not listed is unauthorized."

That instinct is healthy.

Moving from ~all to -all should be treated as an inventory and rollout project, not as a one-line DNS cleanup. The safe goal is simple: get to hard fail only after every legitimate sender is known, aligned where needed, and tested under real traffic.

What ~all and -all actually mean

In RFC 7208, the SPF qualifiers map to different results:

  • ~all returns softfail
  • -all returns fail

The same RFC describes softfail as a weak statement that the host is probably not authorized, while fail is an explicit statement that the host is not authorized.

That difference matters operationally even though receivers still apply their own local policy. A hard-fail SPF record is the domain owner saying: "this list is complete enough that everything else should be treated as unauthorized."
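
As a minimal sketch of that mapping, the Python below reads the terminal all term from a record string and reports the result an unlisted host would get. The helper name terminal_result is hypothetical, and the parsing is deliberately naive: it assumes a whitespace-separated record and ignores redirect= and macros.

```python
# Qualifier-to-result mapping from RFC 7208; "+" is the default when omitted.
QUALIFIER_RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def terminal_result(spf_record: str):
    """Return the result the record's `all` term gives, or None if absent.

    Hypothetical helper: naive whitespace split, no redirect= or macro handling.
    """
    for term in spf_record.split():
        lowered = term.lower()
        if lowered == "all":
            return QUALIFIER_RESULTS["+"]
        if len(lowered) == 4 and lowered[1:] == "all" and lowered[0] in QUALIFIER_RESULTS:
            return QUALIFIER_RESULTS[lowered[0]]
    return None
```

Run against a record ending in ~all it returns "softfail"; the same record ending in -all returns "fail". Nothing else about the record changes.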

The first important nuance: DMARC does not treat softfail as a pass

This is where many rollout discussions drift off course.

DMARC passes only when at least one authentication mechanism, SPF or DKIM, produces a pass result on an aligned domain, per RFC 7489. A message with SPF softfail does not have an SPF pass for DMARC purposes. Neither does a message with SPF fail.

So if the question is "will switching from ~all to -all make DMARC start passing?" the answer is no.

The real difference is:

  • ~all gives receivers a weaker SPF signal for non-authorized sources
  • -all gives receivers a definitive SPF fail for non-authorized sources
  • both still require sender inventory discipline if legitimate mail lacks aligned DKIM
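
The pass rule can be written down in a few lines. This is a sketch of the decision logic only, assuming the SPF and DKIM results and their alignment have already been determined elsewhere; the function name dmarc_passes and its parameters are illustrative, not taken from any library.

```python
def dmarc_passes(spf_result: str, spf_aligned: bool,
                 dkim_result: str, dkim_aligned: bool) -> bool:
    """DMARC passes if SPF or DKIM yields "pass" AND that identifier is aligned."""
    spf_ok = spf_result == "pass" and spf_aligned
    dkim_ok = dkim_result == "pass" and dkim_aligned
    return spf_ok or dkim_ok


# Softfail and fail are equally "not a pass" as far as DMARC is concerned.
for spf in ("softfail", "fail"):
    # No aligned DKIM: DMARC fails for both SPF results.
    print(spf, dmarc_passes(spf, True, "none", False))
    # Aligned DKIM pass: DMARC passes for both SPF results.
    print(spf, dmarc_passes(spf, True, "pass", True))
```

The loop makes the point of this section concrete: swapping softfail for fail never flips the DMARC outcome. Only an aligned pass does.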

If DMARC alignment needs a refresher, "Return-Path vs From: practical implications" is the companion read.

Why many domains start with ~all

There is a practical reason Google documentation still shows ~all in many setup examples, while Microsoft documentation recommends -all for Microsoft 365 domains once the setup is understood.

Those docs are solving slightly different problems:

  • Google's setup guidance assumes many admins are still identifying all senders and should avoid breaking mail while that discovery work is incomplete.
  • Microsoft's guidance assumes you should get to a definitive authorization boundary, especially when DKIM and DMARC are also in place.

Neither position is irrational. They reflect different stages of maturity.

That is the right mental model for rollout too: ~all is often a transitional state, not the destination.

What actually breaks when teams switch too early

Changing ~all to -all does not usually break well-configured mail streams that already have aligned DKIM and correct SPF authorization.

What it exposes is all the mail the organization forgot about.

Typical casualties are:

  • old CRM or ticketing systems still sending from the main domain
  • web forms using the organizational domain from an app server not in SPF
  • printers, scanners, and appliances relaying directly
  • regional business tools bought outside central IT
  • vendors that send with the right visible From but the wrong bounce domain
  • sources that were surviving only because receivers were lenient about softfail

This is why "Building a sender inventory with DMARC reports" should usually happen before the SPF qualifier change, not after it.

The safe rollout strategy

The shortest good strategy is:

  1. inventory every sender
  2. fix alignment and authorization gaps
  3. separate risky streams onto subdomains where appropriate
  4. validate with live traffic and reports
  5. only then change ~all to -all

The rest of this post expands each step.

Step 1: Build a real sender inventory

Do not trust tribal knowledge here.

Ask a narrower question than "what systems send email?" Ask: "what systems send mail that uses this exact domain in the SMTP envelope sender or the visible From header?"

Sources to check:

  • DMARC aggregate reports
  • existing SPF include: and ip4: or ip6: terms
  • ESP and CRM admin panels
  • application configs and SMTP relays
  • support platforms, billing systems, and forms
  • security devices and multifunction printers
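
DMARC aggregate reports are usually the richest of these sources, and they are plain XML. The sketch below tallies message counts per source IP and flags sources where neither the DKIM nor the SPF result in policy_evaluated was a pass. The element names follow the aggregate report schema in RFC 7489; the sample report, the IP addresses, and the function names are made up for illustration, and the parser assumes a report without XML namespaces.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented two-record report; real reports also carry metadata and auth_results.
SAMPLE_REPORT = """<feedback>
  <record>
    <row>
      <source_ip>192.0.2.10</source_ip>
      <count>120</count>
      <policy_evaluated><dkim>pass</dkim><spf>pass</spf></policy_evaluated>
    </row>
    <identifiers><header_from>example.com</header_from></identifiers>
  </record>
  <record>
    <row>
      <source_ip>198.51.100.7</source_ip>
      <count>4</count>
      <policy_evaluated><dkim>fail</dkim><spf>fail</spf></policy_evaluated>
    </row>
    <identifiers><header_from>example.com</header_from></identifiers>
  </record>
</feedback>"""


def summarize_sources(report_xml: str) -> Counter:
    """Sum message counts per (source_ip, dkim, spf) tuple in one report."""
    totals = Counter()
    root = ET.fromstring(report_xml)
    for record in root.iter("record"):
        row = record.find("row")
        key = (
            row.findtext("source_ip"),
            row.findtext("policy_evaluated/dkim"),
            row.findtext("policy_evaluated/spf"),
        )
        totals[key] += int(row.findtext("count"))
    return totals


def unexplained_sources(totals: Counter) -> list:
    """Source IPs where neither the DKIM nor the SPF evaluation passed."""
    return sorted(ip for (ip, dkim, spf) in totals if dkim != "pass" and spf != "pass")
```

Every IP that unexplained_sources returns is either a forgotten legitimate sender or spoofed traffic. The inventory is not done until each one has been put in one bucket or the other.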

If the apex domain SPF record already feels crowded or unclear, that is usually a sign the domain is carrying too many unrelated streams. In that case, "SPF flattening vs includes: tradeoffs, failure modes, and safer alternatives" and "Transactional vs marketing email separation" are directly relevant.

Step 2: For each sender, decide how it is supposed to pass DMARC

This is the step teams skip when they focus only on SPF syntax.

Every live sender should have an intended authentication path:

  • SPF pass with an aligned 5321.MailFrom domain
  • DKIM pass with an aligned d= domain
  • ideally both

If a sender can only survive because receivers are tolerant of SPF softfail, that sender is already fragile.

Common examples:

Sender A: properly authorized platform

example.com. IN TXT "v=spf1 include:_spf.example.net ~all"

The sender is authorized by the include and signs with aligned DKIM. Changing the terminal qualifier to -all is probably safe for this source because authorized mail will still SPF-pass before evaluation reaches all.
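
The reason is SPF's left-to-right, first-match evaluation. The sketch below shows it with a stripped-down evaluator that understands only ip4:, ip6:, and all terms. It does no DNS work, so the test records use a hypothetical ip4: range in place of Sender A's include; a real evaluator must resolve include:, a, mx, and the rest.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def evaluate(spf_record: str, client_ip: str) -> str:
    """First-match evaluation over ip4:, ip6:, and all terms only.

    Simplified sketch: include:, a, mx, and other DNS-backed mechanisms
    are skipped here, which a real SPF evaluator must not do.
    """
    ip = ipaddress.ip_address(client_ip)
    for term in spf_record.split()[1:]:          # skip the "v=spf1" version tag
        qualifier = "+"                          # default qualifier when omitted
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            if ip in ipaddress.ip_network(value, strict=False):
                return QUALIFIERS[qualifier]     # first match wins
        elif name == "all":
            return QUALIFIERS[qualifier]         # all always matches
    return "neutral"                             # no term matched
```

With "v=spf1 ip4:192.0.2.0/24 ~all" and the same record ending in -all, an address inside 192.0.2.0/24 gets pass in both cases. Only addresses outside the range see the difference between softfail and fail.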

Sender B: visible From looks right, bounce path is wrong

Mail is sent as billing@example.com, but the actual 5321.MailFrom domain is mailer.vendor-example.net, and there is no aligned DKIM.

That sender was never in a healthy state. Moving to -all did not create the problem. It merely removed the ambiguity around it.
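
Sender B's problem is one of alignment, and a rough check is easy to script. The sketch below implements relaxed alignment under a loudly stated simplification: it treats the last two labels of a name as the organizational domain, whereas real DMARC implementations consult the Public Suffix List. The function names and bounce.example.com are hypothetical.

```python
def org_domain(domain: str) -> str:
    """Naive organizational domain: the last two labels.

    Assumption for illustration only; real implementations consult the
    Public Suffix List, so this is wrong for suffixes such as co.uk.
    """
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def relaxed_aligned(header_from_domain: str, authenticated_domain: str) -> bool:
    """Relaxed alignment: both domains share an organizational domain."""
    return org_domain(header_from_domain) == org_domain(authenticated_domain)


# Sender B: From domain example.com, envelope domain mailer.vendor-example.net.
print(relaxed_aligned("example.com", "mailer.vendor-example.net"))
# A bounce domain under the same organizational domain would align.
print(relaxed_aligned("example.com", "bounce.example.com"))
```

The first check is False regardless of what the vendor's own SPF record says, which is why the terminal qualifier on example.com was never the thing keeping this sender alive.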

Step 3: Remove stale senders before adding new exceptions

A common mistake is to discover uncertainty and respond by stuffing more includes into the main SPF record "just in case."

That usually creates two new problems:

  • more lookup pressure toward the SPF 10-lookup limit
  • a permanent record full of legacy authorizations nobody wants to revisit

If an old platform has not sent legitimate mail in months, delete it rather than preserving it as a hypothetical future need.

If lookup budgeting is already tight, "SPF 10-DNS-lookup limit: why it happens, how to audit includes, and mitigation patterns" is a better next step than adding more guesswork.
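
A rough first pass at lookup budgeting can be scripted. The sketch below counts the terms in a single record that trigger a DNS lookup under RFC 7208 (include, a, mx, ptr, exists, and the redirect modifier). It looks at the top level only: lookups inside included records also count toward the limit, and seeing those means resolving each include, which this hypothetical helper does not do.

```python
LOOKUP_MECHANISMS = {"include", "a", "mx", "ptr", "exists"}


def count_lookup_terms(spf_record: str) -> int:
    """Count terms in ONE record that trigger a DNS lookup.

    Top level only: lookups inside included records also count toward
    the limit of 10, and seeing them requires resolving each include.
    """
    count = 0
    for term in spf_record.split()[1:]:          # skip the "v=spf1" version tag
        term = term.lower().lstrip("+-~?")       # drop any qualifier
        if term.startswith("redirect="):
            count += 1
            continue
        # Strip ":domain" and "/cidr" suffixes to get the mechanism name.
        name = term.split(":", 1)[0].split("/", 1)[0]
        if name in LOOKUP_MECHANISMS:
            count += 1
    return count
```

Treat the number as a floor, not the real total. Three includes at the top level can easily expand to more than ten lookups once nested records are followed.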

Step 4: Move third-party or higher-risk streams to subdomains

This is often the cleanest way to reach -all on the main domain faster.

For example:

  • example.com for employee and core transactional mail
  • marketing.example.com for campaign traffic
  • support.example.com for ticketing systems
  • alerts.example.com for product notifications

Microsoft explicitly recommends subdomains for email services that are not under your direct control. Operationally, that advice is solid even outside Microsoft 365.

It reduces blast radius and lets the main domain reach a tighter SPF posture without waiting for every external platform to become equally well managed.

Step 5: Keep DKIM from being the hidden blocker

Many teams frame this migration as an SPF-only change, but the dangerous cases are usually hybrid ones:

  • SPF is incomplete
  • DKIM is absent or not aligned
  • DMARC reports are not being watched closely

If a sender has aligned DKIM, then switching from ~all to -all is much less likely to affect legitimate delivery. If a sender has no aligned DKIM and marginal SPF, then the environment is already brittle.

That is one reason Microsoft says to deploy DKIM and DMARC alongside SPF, and why Google's sender guidance also expects authenticated mail rather than SPF alone.

Step 6: Roll out with a transition window, not a cliff

RFC 7208 explicitly warns that when SPF records change, there should be a transition period so the old policy remains valid long enough for legitimate mail already in transit to be checked under the expected policy.

In practice, that means:

  • avoid making the change during an outage or major campaign launch
  • allow for DNS TTL and queued-mail lag
  • watch DMARC aggregate data and support tickets after the cutover
  • keep rollback simple if a forgotten sender appears

The DNS edit is small. The monitoring window after it is the real rollout.
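
One way to make "allow for DNS TTL and queued-mail lag" concrete is to compute when the new record can reasonably be assumed to be in effect everywhere. The sketch below is simple arithmetic under stated assumptions: resolvers may cache the old record for up to its old TTL, and queue_lag is a placeholder allowance for mail already queued. The function name and the 4-hour default are illustrative, not values from RFC 7208.

```python
from datetime import datetime, timedelta, timezone


def monitoring_start(changed_at: datetime, old_ttl_seconds: int,
                     queue_lag: timedelta = timedelta(hours=4)) -> datetime:
    """Earliest time to assume receivers evaluate mail against the new record.

    old_ttl_seconds: TTL of the record BEFORE the change (caches may hold it that long).
    queue_lag: assumed allowance for mail already queued; 4 hours is a
    placeholder, not a recommendation from RFC 7208.
    """
    return changed_at + timedelta(seconds=old_ttl_seconds) + queue_lag


cutover = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(monitoring_start(cutover, old_ttl_seconds=3600))
```

Failures reported before that point may still reflect the old record, so they are weak evidence either way. Lowering the TTL ahead of the change shortens the window and makes rollback faster too.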

A practical sequence that works well

For a busy production domain, this pattern is usually safer than debating ~all versus -all in the abstract:

  1. Confirm exactly one SPF record exists.
  2. Remove obviously dead includes or IPs.
  3. Confirm aligned DKIM on critical employee, billing, and transactional streams.
  4. Move marketing or vendor-heavy streams to dedicated subdomains where possible.
  5. Review DMARC aggregate reports until unknown sources are explained.
  6. Change the terminal qualifier from ~all to -all.
  7. Monitor for a few days. If a missed sender appears, fix that sender; revert only if legitimate mail is actively being rejected.

That sequence is boring. Boring is good here.
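
Steps 1 and 6 of that sequence are mechanical enough to script. The sketch below takes the TXT strings published at a name (fetched however you like; no DNS library is assumed), refuses to proceed unless exactly one of them is an SPF record, and returns that record with the terminal ~all changed to -all. The function names and the sample TXT values are hypothetical.

```python
def spf_records(txt_values: list) -> list:
    """Filter TXT strings down to SPF records (version tag "v=spf1")."""
    return [v for v in txt_values
            if v.lower() == "v=spf1" or v.lower().startswith("v=spf1 ")]


def harden(txt_values: list) -> str:
    """Return the single SPF record with a terminal ~all changed to -all.

    Raises ValueError if the name does not publish exactly one SPF
    record, or if the record does not end in ~all.
    """
    records = spf_records(txt_values)
    if len(records) != 1:
        raise ValueError(f"expected exactly one SPF record, found {len(records)}")
    terms = records[0].split()
    if terms[-1].lower() != "~all":
        raise ValueError("record does not end in ~all; review it by hand")
    return " ".join(terms[:-1] + ["-all"])


txt = ["google-site-verification=abc123", "v=spf1 include:_spf.google.com ~all"]
print(harden(txt))
```

Failing loudly on zero or multiple SPF records is deliberate: multiple records cause a permerror at receivers, and that is worth fixing before any qualifier change.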

Example: safe before-and-after design

Before

example.com. IN TXT "v=spf1 include:_spf.google.com include:spf.protection.outlook.com include:mailer.vendor-example.net ~all"

Problems:

  • employee mail and vendor mail are mixed together
  • the vendor may not be under direct operational control
  • the team is using ~all because they do not trust that the record is complete

Better target state

example.com. IN TXT "v=spf1 include:_spf.google.com include:spf.protection.outlook.com -all"
marketing.example.com. IN TXT "v=spf1 include:mailer.vendor-example.net -all"

Why this is better:

  • the main domain gets a clearer authorization boundary
  • the vendor stream has its own policy surface
  • troubleshooting becomes much easier
  • reputation and breakage are isolated by stream

When it is still reasonable to stay on ~all

Staying on ~all is defensible for a while if any of these are true:

  • the sender inventory is still incomplete
  • multiple business units can launch mail without central review
  • key senders still lack aligned DKIM
  • a domain migration or ESP migration is in progress
  • DMARC reporting is not yet giving enough visibility

But treat that as temporary technical debt, not a best practice to preserve forever.

When -all is the right move

Moving to -all is usually justified when:

  • legitimate senders are known and documented
  • stale authorizations have been removed
  • important streams have aligned DKIM
  • third-party streams are segmented where practical
  • DMARC reports have stopped revealing surprises

At that point, -all is not an aggressive setting. It is simply an honest one.

Softfail is often a discovery posture. Hard fail is an enforcement posture. The operational question is not which one feels safer. It is whether the sender inventory is complete enough for enforcement.

Bottom line

The move from SPF ~all to -all should happen after inventory, segmentation, and validation, not before.

If a domain still depends on ~all to avoid breaking legitimate senders, the real work is to identify those senders, fix their SPF or DKIM path, and move high-variance services onto their own subdomains. Once that is done, -all becomes the natural end state rather than a risky leap.
