Store & Forward (S&F) — Quick Guide
This guide explains how to collect monitoring data at the edge, buffer it safely to files, and then ingest it into the cloud database with the Orchestrator — all using the single entrypoint main.py.
Overview
- Edge Monitoring writes NDJSON files (optionally gzip-compressed) instead of writing directly to the database.
- Files are rotated by size/lines/time and moved to a transfer folder.
- A transport mechanism (e.g., shared folder or sync agent) delivers files to the cloud inbox.
- The S&F Orchestrator watches the inbox, sorts records by timestamp, and performs idempotent inserts/updates.
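As a minimal sketch of the first step (buffering to files instead of the database), a record can be appended as one NDJSON line, optionally gzip-compressed. The helper name `append_record` is hypothetical, not part of `main.py`:

```python
import gzip
import json

def append_record(path: str, record: dict, compress: bool = False) -> None:
    """Append one monitoring record as a single NDJSON line."""
    line = json.dumps(record, separators=(",", ":")) + "\n"
    if compress:
        # Appending creates concatenated gzip members, which gzip
        # readers transparently decode as one stream.
        with gzip.open(path, "at", encoding="utf-8") as f:
            f.write(line)
    else:
        with open(path, "a", encoding="utf-8") as f:
            f.write(line)
```

One line per record keeps partially written files recoverable: a crash loses at most the last line.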
Edge — How to Enable S&F
- Command:
  `python main.py --monitoring <DATASOURCE> --store-and-forward --saf-source-id edge-01`
- Optional flags:
  - `--saf-max-lines 10000` (default)
  - `--saf-max-bytes 8` (MB, default)
  - `--saf-max-age 30` (seconds, default)
  - `--saf-outbox instance/saf/outbox`
  - `--saf-transfer instance/saf/transfer`
  - `--saf-compress gzip` (default)
  - `--saf-storage-limit 100` (MB; warn at 90%, delete oldest at 100%)
- Behavior:
  - Outbox: temporary staging for active `.ndjson.part` files while they are being written.
  - Transfer: finalized files are moved to `transfer/` (as `.ndjson` or `.ndjson.gz`). This is the folder you sync to the cloud.
  - If storage reaches 100% of the limit, the oldest files are deleted and a warning is logged.
Cloud — Orchestrator
- Command (headless):
  `python main.py --saf-orchestrator --saf-inbox instance/saf/inbox`
- With Web UI:
  `python main.py --ux --saf-orchestrator --saf-inbox instance/saf/inbox`
  - Navigate to `/monitoring/saf-orchestrator`
- Behavior:
  - Reads top-level files in the inbox (ignores subfolders), validates JSON, sorts globally by `value_ts`, and performs idempotent upserts.
  - Moves processed files to `inbox/processed/`. Invalid files go to `inbox/quarantine/`.
  - Processed retention: the orchestrator enforces a size limit on `inbox/processed/` (default 500 MB). At 90% it logs a warning; at 100% it deletes the oldest files.
Record Format (NDJSON)
Each line is a JSON object with minimal fields:
{
"source_id": "edge-01",
"source_seq": 123,
"created_at": "2025-10-17T10:00:00Z",
"datasource": "rest-api",
"stream_id": "stream-1",
"stream_name": "S1",
"value": { ... },
"value_ts": "2025-10-17T10:00:00Z",
"status": "online",
"event_type": "scan_success"
}
`source_seq` is assigned at the edge per record; the cloud uses it for idempotency. `value_ts` drives chronological ordering across files.
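A minimal validation sketch for one such line, assuming the hypothetical helper name `parse_record` and checking only the fields the guide calls essential:

```python
import json
from datetime import datetime

REQUIRED = ("source_id", "source_seq", "value_ts", "value")

def parse_record(line: str) -> dict:
    """Parse and minimally validate one NDJSON record line."""
    rec = json.loads(line)
    missing = [k for k in REQUIRED if k not in rec]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # value_ts must be ISO 8601; normalize the 'Z' suffix for fromisoformat
    datetime.fromisoformat(rec["value_ts"].replace("Z", "+00:00"))
    return rec
```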
Tips & Troubleshooting
- Avoid running multiple Monitoring processes on the same edge directory (to prevent duplicate work).
- If you see repeated ingest logs, ensure files are moving to `processed/` and the inbox contains only fresh `.ndjson(.gz)` files at the top level.
- If `Recent Files` is empty after changing the inbox, it will display the latest from `processed/` as a fallback once available.
- For large backlogs, increase `--saf-max-lines`/`--saf-max-bytes`/`--saf-max-age` and consider a larger storage limit.
- If streams are missing in the cloud DB, create `MonitoredStream` entries (future option: auto-create placeholders).
Safety Notes
- Compression is enabled; there is no encryption in Phase 1.
- Storage policy (edge transfer / cloud processed): warn at 90%; at 100% delete oldest files to keep the system running.
- Idempotency: re-ingesting the same files updates existing records; no duplicates.
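The warn-at-90%, delete-oldest-at-100% policy applied to both the edge `transfer/` folder and the cloud `processed/` folder can be sketched as follows (`enforce_storage_limit` is a hypothetical name, not the actual implementation):

```python
import os

def enforce_storage_limit(directory: str, limit_bytes: int,
                          warn=print) -> list[str]:
    """Warn at 90% of the limit; delete oldest files until under it."""
    files = [os.path.join(directory, n) for n in os.listdir(directory)]
    files = [p for p in files if os.path.isfile(p)]
    files.sort(key=os.path.getmtime)           # oldest first
    total = sum(os.path.getsize(p) for p in files)
    if total >= 0.9 * limit_bytes:
        warn(f"storage at {total / limit_bytes:.0%} of limit")
    deleted = []
    while total > limit_bytes and files:
        oldest = files.pop(0)
        total -= os.path.getsize(oldest)
        os.remove(oldest)
        deleted.append(oldest)
    return deleted
```

Deleting oldest-first trades the oldest buffered data for continued operation, which matches the "keep the system running" goal stated above.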