The first few DMARC aggregate reports usually feel harmless.
Then a domain starts sending more mail, more receivers participate, forwarding paths multiply, and suddenly the rua mailbox turns into an operations problem instead of a reporting checkbox.
At that point the question is not "how do we get DMARC XML at all?" It is "how do we keep receiving it without letting storage, parsing, and retention drift into a mess?"
This post is about the practical side of operating DMARC aggregate reporting at volume: mailbox sizing, handling both .zip and .gz attachments safely, and deciding how long to keep the data.
If the XML structure itself is still new, start with DMARC reporting 101 and DMARC aggregate reports explained. This article assumes the basic DMARC report format is already familiar.
DMARC aggregate reports are designed to scale better than per-message failure reporting, but "scalable" does not mean "operationally free."
Volume grows from a few different directions at once:
RFC 7489 explicitly positions DMARC as something that must work at Internet scale and defines aggregate feedback as the normal reporting mechanism for that reason. It also allows domain owners to publish destination URIs with optional maximum-size hints such as mailto:reports@example.com!50m for report delivery requests.
That is useful context, but it does not solve the mailbox engineering for you.
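To make the size hint concrete, here is a minimal sketch of splitting a rua destination into its URI and optional byte limit. The function name `parse_rua_destination` and the regular expression are illustrative assumptions, not part of any DMARC library. The unit multipliers follow the RFC 7489 convention of k, m, g, and t as powers of 1024.

```python
import re
from typing import Optional, Tuple

# Unit multipliers for the optional size suffix on a DMARC URI (RFC 7489).
UNIT_BYTES = {"": 1, "k": 2**10, "m": 2**20, "g": 2**30, "t": 2**40}

# "<uri>" optionally followed by "!<digits><unit>". Hypothetical helper pattern.
RUA_PATTERN = re.compile(
    r"^(?P<uri>[^!]+)(?:!(?P<num>\d+)(?P<unit>[kmgt]?))?$", re.IGNORECASE
)


def parse_rua_destination(value: str) -> Tuple[str, Optional[int]]:
    """Return (uri, max_report_bytes); the limit is None when no hint is given."""
    match = RUA_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised rua destination: {value!r}")
    if match.group("num") is None:
        return match.group("uri"), None
    multiplier = UNIT_BYTES[match.group("unit").lower()]
    return match.group("uri"), int(match.group("num")) * multiplier


uri, limit = parse_rua_destination("mailto:reports@example.com!50m")
print(uri, limit)
```

The limit is only a request to report generators, so intake code still needs its own size checks.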
This is the mistake that causes the most avoidable pain.
Teams often size only for the attachment as delivered over email. What matters to downstream processing is the expanded XML size and the number of reports arriving per day.
Those are not the same number.
For example:
A 200 KB .gz attachment can expand to 3 MB of XML.
The practical outcome is simple:
Compressed attachment size is not a safe proxy for final storage footprint. Plan for the expanded XML and the parsed dataset, not just the email attachment.
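One way to see the gap is to measure it on a sample. The sketch below builds a synthetic, highly repetitive XML payload (made-up data, not a real report) and compares compressed and expanded sizes. `expansion_ratio` is a hypothetical helper. It decompresses without a limit, which is acceptable only because the input here is generated locally.

```python
import gzip


def expansion_ratio(compressed: bytes) -> float:
    """Expanded size divided by compressed size. Trusted input only: no size cap."""
    expanded = gzip.decompress(compressed)
    return len(expanded) / len(compressed)


# Synthetic stand-in for a report with many near-identical records.
record = (
    b"<record><row><source_ip>192.0.2.1</source_ip>"
    b"<count>1</count></row></record>\n"
)
sample_xml = b"<feedback>" + record * 5000 + b"</feedback>"
attachment = gzip.compress(sample_xml)

print("compressed bytes:", len(attachment))
print("expanded bytes:  ", len(sample_xml))
print("ratio:           ", round(expansion_ratio(attachment), 1))
```

Real reports are less repetitive than this sample, so it is worth measuring the ratio on your own traffic before using it for planning.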
There is no universal number that fits every domain, but a planning model works well.
Estimate these four things:
Then plan mailbox capacity around at least several days of buffered intake, not just one day.
Why several days? Because the real outage pattern is usually this:
For many teams, a reasonable operational target is:
The last metric is the most important one. A mailbox can still look "small enough" while processing is already falling behind.
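A planning model like this can be written down in a few lines. The inputs below (reports per day, average message size, buffer days, headroom factor) are assumptions chosen for illustration and may not match the exact quantities you decide to estimate. The numbers in the usage line are placeholders, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class IntakeEstimate:
    reports_per_day: int      # messages arriving at the rua address per day
    avg_message_bytes: int    # full message size as stored, including encoding overhead
    buffer_days: int          # days of intake the mailbox must absorb if processing stalls
    headroom: float = 1.5     # safety factor for growth and bursts (assumed value)


def required_mailbox_bytes(estimate: IntakeEstimate) -> int:
    """Capacity needed to buffer `buffer_days` of intake with headroom."""
    daily_bytes = estimate.reports_per_day * estimate.avg_message_bytes
    return int(daily_bytes * estimate.buffer_days * estimate.headroom)


plan = IntakeEstimate(reports_per_day=1000, avg_message_bytes=100_000, buffer_days=7)
print(required_mailbox_bytes(plan) / 2**30, "GiB")
```

This sizes the mailbox as a buffer only. Expanded XML and parsed data live elsewhere and need their own estimate.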
The rua mailbox should be an intake queue, not the permanent system of record.
That distinction matters because mailboxes are usually weak at all of the things DMARC operations eventually need:
The better pattern is:
That keeps the mailbox from becoming a second unmanaged archive.
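As a rough illustration of treating the mailbox as a queue, the sketch below pulls attachments out of a raw message with Python's standard `email` package so they can be handed to a separate store. How messages are fetched (IMAP, API, maildir) is left out, and `extract_attachments` is a hypothetical helper name.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def extract_attachments(raw_message: bytes) -> list:
    """Return (filename, payload_bytes) for every named part of the message."""
    message = message_from_bytes(raw_message, policy=policy.default)
    found = []
    for part in message.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        filename = part.get_filename()
        if not filename:
            continue  # body text and other unnamed parts
        found.append((filename, part.get_payload(decode=True)))
    return found


# Build a small test message instead of reading a real mailbox.
demo = EmailMessage()
demo["Subject"] = "Report Domain: example.com"
demo.set_content("Aggregate report attached.")
demo.add_attachment(
    b"placeholder bytes",
    maintype="application",
    subtype="gzip",
    filename="report.xml.gz",
)
attachments = extract_attachments(demo.as_bytes())
print([(name, len(data)) for name, data in attachments])
```

Once the attachment bytes are stored and processed, the message itself can be deleted or expired on a short schedule.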
Treat .gz and .zip as normal, expected report formats
At scale, compressed reports are not an edge case. They are normal.
In practice, DMARC aggregate XML often arrives as one of these:
report.xml.gz
Your ingestion logic should treat gzip and ZIP handling as baseline functionality, not as optional polish.
The operational differences matter:
The gzip format is straightforward and common for DMARC reporting. Python's gzip module reads and decompresses gzip content directly, and it also handles multi-member gzip data.
For DMARC operations, gzip is usually the simpler case:
ZIP is more flexible, which also means more things can vary:
That does not make ZIP unsafe by definition, but it does mean your code should inspect the archive rather than assuming a single expected filename.
Python's zipfile documentation also calls out decompression pitfalls such as invalid archives, unsupported compression methods, and resource exhaustion from oversized extraction. That is directly relevant to DMARC pipelines that process attachments automatically.
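A minimal sketch of inspecting a ZIP attachment before trusting it might look like this. The limits and the helper name `read_xml_members` are assumptions to adapt. The declared `file_size` comes from the archive itself, so the read is capped as well.

```python
import io
import zipfile


def read_xml_members(data: bytes, max_members: int = 5,
                     max_member_bytes: int = 50 * 2**20) -> list:
    """Return (name, content) for .xml members, enforcing count and size limits."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile as exc:
        raise ValueError("attachment is not a readable ZIP archive") from exc
    with archive:
        infos = [info for info in archive.infolist() if not info.is_dir()]
        if len(infos) > max_members:
            raise ValueError("too many members in archive")
        results = []
        for info in infos:
            if not info.filename.lower().endswith(".xml"):
                continue  # skip unexpected members instead of extracting them
            if info.file_size > max_member_bytes:
                raise ValueError("declared member size exceeds limit")
            # open() can raise NotImplementedError (unsupported compression)
            # or RuntimeError (encrypted member); callers should handle both.
            with archive.open(info) as member:
                content = member.read(max_member_bytes + 1)
            if len(content) > max_member_bytes:
                raise ValueError("expanded member exceeds limit")
            results.append((info.filename, content))
        return results


# Build a small archive in memory to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as out:
    out.writestr("receiver.example!example.com!1700000000!1700086400.xml", b"<feedback/>")
    out.writestr("readme.txt", b"ignored")
members = read_xml_members(buffer.getvalue())
print([(name, len(content)) for name, content in members])
```

Nothing is written to disk here, which sidesteps path handling for member names entirely.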
This is the boring part that saves incidents.
Filenames help, but they are not a trust boundary.
A report named something.xml.gz might not actually be valid gzip. A .zip file might contain unexpected members. Validate the attachment format before processing it.
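One cheap validation step is to look at the leading bytes instead of the filename. The sketch below checks the well-known gzip and ZIP signatures. `sniff_format` is a hypothetical helper, and a match only means the file claims that format, so decompression can still fail later.

```python
GZIP_MAGIC = b"\x1f\x8b"       # gzip member header
ZIP_MAGIC = b"PK\x03\x04"      # ZIP local file header (non-empty archives)
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_format(data: bytes) -> str:
    """Classify an attachment as 'gzip', 'zip', 'xml', or 'unknown' by content."""
    if data.startswith(GZIP_MAGIC):
        return "gzip"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    head = data[:64]
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    if head.lstrip().startswith(b"<"):
        return "xml"  # uncompressed report, to be confirmed by the XML parser
    return "unknown"


print(sniff_format(b"\x1f\x8b\x08\x00"))   # claims to be gzip
print(sniff_format(b"plain text"))          # neither archive nor XML
```

Routing on the sniffed format rather than the extension means a mislabelled file is rejected or handled correctly instead of crashing the wrong decoder.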
Do not allow an attachment with a modest compressed size to expand without bounds.
Even if the sender is a legitimate receiver, corrupt files and accidental oversized payloads still happen. Put limits around:
This is not paranoia. It is standard hygiene for any automated archive handling.
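For gzip, a bounded read is straightforward: decompress in chunks and stop once a ceiling is crossed. This is a sketch under assumed limits. `gunzip_bounded` and the 50 MiB default are illustrative, not values taken from any standard.

```python
import gzip
import io
import zlib


def gunzip_bounded(data: bytes, max_bytes: int = 50 * 2**20) -> bytes:
    """Decompress gzip data, refusing to produce more than max_bytes of output."""
    chunks = []
    total = 0
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as stream:
            while True:
                chunk = stream.read(64 * 1024)
                if not chunk:
                    break
                total += len(chunk)
                if total > max_bytes:
                    raise ValueError("expanded size exceeds limit")
                chunks.append(chunk)
    except (OSError, EOFError, zlib.error) as exc:
        # Covers bad headers, truncated streams, and corrupt deflate data.
        raise ValueError("attachment is not valid gzip") from exc
    return b"".join(chunks)


payload = gzip.compress(b"<feedback/>" * 1000)
xml_bytes = gunzip_bounded(payload)
print(len(payload), "->", len(xml_bytes))
```

Because the limit is checked per chunk, a small attachment that expands far beyond expectations is rejected after reading at most one chunk past the ceiling.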
Most DMARC aggregate reports contain one XML document. Still, the parser should verify:
If the archive contains multiple XML files, decide deliberately whether to reject, process all, or process only known-valid members.
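A structural check on the expanded XML can be sketched as follows. It assumes the RFC 7489 aggregate format, where the root element is `feedback` and contains `report_metadata`, `policy_published`, and one or more `record` elements. The helper names are hypothetical, and the check is deliberately shallow rather than full schema validation.

```python
import xml.etree.ElementTree as ET

REQUIRED_CHILDREN = {"report_metadata", "policy_published", "record"}


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix if the reporter declared one."""
    return tag.rsplit("}", 1)[-1]


def validate_aggregate_xml(xml_bytes: bytes) -> ET.Element:
    """Parse the document and confirm it has the basic aggregate-report shape."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        raise ValueError("not well-formed XML") from exc
    if local_name(root.tag) != "feedback":
        raise ValueError("root element is not <feedback>")
    present = {local_name(child.tag) for child in root}
    missing = REQUIRED_CHILDREN - present
    if missing:
        raise ValueError("missing elements: " + ", ".join(sorted(missing)))
    return root


SAMPLE = b"""<?xml version="1.0"?>
<feedback>
  <report_metadata><org_name>receiver.example</org_name>
    <report_id>abc-123</report_id>
    <date_range><begin>1700000000</begin><end>1700086400</end></date_range>
  </report_metadata>
  <policy_published><domain>example.com</domain></policy_published>
  <record><row><source_ip>192.0.2.1</source_ip><count>1</count></row></record>
</feedback>"""

root = validate_aggregate_xml(SAMPLE)
print(local_name(root.tag), len(list(root)))
```

Applying the same check to every XML member gives a clear basis for the reject, process-all, or process-valid-only decision.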
At minimum, store enough report-level metadata to avoid processing the same report repeatedly.
Useful keys usually include:
That matters because retries, forwarding, mailbox rules, and manual re-imports all create duplicate-processing risk.
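As an illustration, a deduplication key can be derived from report-level metadata. The field choice below (reporter org name, report ID, date range, and policy domain) is an assumption about what identifies a report; adjust it to whatever your data shows to be stable. The sketch also assumes un-namespaced XML, and `report_key` is a hypothetical helper.

```python
import hashlib
import xml.etree.ElementTree as ET

KEY_PATHS = (
    "report_metadata/org_name",
    "report_metadata/report_id",
    "report_metadata/date_range/begin",
    "report_metadata/date_range/end",
    "policy_published/domain",
)


def report_key(xml_bytes: bytes) -> str:
    """Stable hash of the metadata fields that identify one aggregate report."""
    root = ET.fromstring(xml_bytes)
    parts = []
    for path in KEY_PATHS:
        node = root.find(path)
        parts.append((node.text or "").strip() if node is not None else "")
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


TEMPLATE = (
    "<feedback><report_metadata><org_name>receiver.example</org_name>"
    "<report_id>{report_id}</report_id>"
    "<date_range><begin>1700000000</begin><end>1700086400</end></date_range>"
    "</report_metadata><policy_published><domain>example.com</domain>"
    "</policy_published></feedback>"
)

seen = set()
skipped = 0
for report_id in ("abc-123", "abc-123", "def-456"):
    key = report_key(TEMPLATE.format(report_id=report_id).encode("utf-8"))
    if key in seen:
        skipped += 1  # duplicate delivery: do not ingest again
        continue
    seen.add(key)
print(len(seen), "unique,", skipped, "skipped")
```

In production the `seen` set would be a unique constraint in the datastore, so re-imports fail safely instead of double-counting.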
Throwing away raw attachments immediately can make parser debugging painful.
Keeping them forever is usually unnecessary.
The balanced pattern is short-lived raw retention for troubleshooting, with longer retention applied only to parsed or normalized report data.
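Short-lived raw retention can be as simple as a scheduled purge. The sketch below assumes raw attachments are stored as files in one directory and uses file modification time as the age signal. Both are assumptions, and `purge_raw_attachments` is a hypothetical helper. The demo runs against a temporary directory.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Optional


def purge_raw_attachments(directory: Path, max_age_days: float,
                          now: Optional[float] = None) -> list:
    """Delete files older than max_age_days and return the removed paths."""
    current = time.time() if now is None else now
    cutoff = current - max_age_days * 86400
    removed = []
    for path in directory.iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp)
    old_file = raw_dir / "old.xml.gz"
    new_file = raw_dir / "new.xml.gz"
    old_file.write_bytes(b"old")
    new_file.write_bytes(b"new")
    ten_days_ago = time.time() - 10 * 86400
    os.utime(old_file, (ten_days_ago, ten_days_ago))  # backdate one file

    removed_names = [p.name for p in purge_raw_attachments(raw_dir, max_age_days=7)]
    remaining_names = sorted(p.name for p in raw_dir.iterdir())

print("removed:", removed_names, "kept:", remaining_names)
```

Object stores usually offer lifecycle rules that achieve the same effect without custom code.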
If the rua mailbox is growing fast, a few habits make a disproportionate difference.
Do not mix DMARC reports with human support mail or abuse workflows.
A dedicated address makes it much easier to:
If you use an external destination, DMARC external report destinations covers the authorization side.
Mailbox quota alerts are too late.
Add monitoring for:
Those metrics show whether the system is keeping up.
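A backlog check is one example of a metric that shows whether processing keeps up. The sketch below assumes you can list the arrival times of messages that have not been processed yet. The 24-hour threshold and the names `BacklogStatus` and `backlog_status` are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class BacklogStatus:
    pending: int              # messages waiting to be processed
    oldest_age: timedelta     # age of the oldest waiting message
    falling_behind: bool      # True when the oldest message exceeds the threshold


def backlog_status(pending_arrivals: list, now: datetime,
                   max_age: timedelta = timedelta(hours=24)) -> BacklogStatus:
    """Summarise the unprocessed queue from the arrival times of waiting messages."""
    if not pending_arrivals:
        return BacklogStatus(0, timedelta(0), False)
    oldest_age = now - min(pending_arrivals)
    return BacklogStatus(len(pending_arrivals), oldest_age, oldest_age > max_age)


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
arrivals = [now - timedelta(hours=2), now - timedelta(hours=30)]
status = backlog_status(arrivals, now)
print(status)
```

Alerting on `falling_behind` fires while there is still quota left, which is the point of watching backlog age instead of mailbox size.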
A broken parser can make a healthy DMARC deployment look blind.
If the issue is mailbox growth or decompression failure, that is an ingestion problem, not evidence that SPF, DKIM, or DMARC alignment suddenly got worse.
Keep those dashboards separate.
Retention policy should follow operational need and privacy discipline, not vague instinct.
RFC 7489's privacy considerations exist for a reason: even aggregate reports are still operational telemetry about who is sending with your domain, from which source IPs, and with what authentication outcomes.
That does not mean aggregate reports are too sensitive to keep.
It means they should be retained intentionally.
For most teams, it helps to think in three layers:
Keep short-term.
Purpose:
This is usually the layer to expire first.
Keep only if there is a concrete operational reason.
Purpose:
Many teams can keep this for less time than normalized results, or avoid keeping it at all after successful ingestion.
Keep the longest.
Purpose:
This is generally the form that delivers the real operational value.
The right window depends on how the data is used, but these questions help:
In practice, many teams benefit from keeping normalized DMARC reporting data longer than the raw attachments.
That keeps operational history while reducing clutter and unnecessary copies of the same information in different formats.
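A layered policy like this can be captured in a small table that every cleanup job reads. The layer names below are an assumed reading of the three layers (mailbox copies, raw attachments, normalized data), and the windows are placeholders that only show the ordering, not recommended values.

```python
from datetime import datetime, timedelta, timezone

# Placeholder windows: shortest for mailbox copies, longest for normalized data.
RETENTION = {
    "mailbox_message": timedelta(days=7),
    "raw_attachment": timedelta(days=30),
    "normalized_data": timedelta(days=400),
}


def is_expired(layer: str, stored_at: datetime, now: datetime) -> bool:
    """True when an item in the given layer has outlived its retention window."""
    return now - stored_at > RETENTION[layer]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
stored = now - timedelta(days=45)
for layer in RETENTION:
    print(layer, is_expired(layer, stored, now))
```

Keeping the windows in one place makes the policy reviewable, which helps when privacy or compliance questions come up.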
For the privacy angle in more depth, DMARC report privacy & compliance is the companion article.
If the current state is "reports go into a mailbox and someone hopes the parser keeps up," start here:
Use a dedicated rua mailbox.
That is enough to move from "DMARC reporting exists" to "DMARC reporting is actually operable at scale."
At scale, DMARC aggregate reporting is partly an email-authentication topic and partly a data-ingestion topic.
The domains that stay sane operationally are the ones that treat the rua mailbox as a short-lived intake point, handle .gz and .zip as first-class formats, and keep only the data forms that continue to provide value.
If mailbox growth is becoming noticeable, that is the right time to tighten the pipeline. Waiting until the mailbox is full usually means the parser, retention policy, and observability were already behind.