Architecture

Eneru is intentionally split into small, boring pieces. The daemon has to keep making shutdown decisions while networks are flaky, notification endpoints are unreachable, SQLite is slow, or a remote server is already half gone. Most architectural choices come from that constraint.

This page is a map for operators and contributors. It explains the main runtime paths and points to the files to read next.

System shape

Eneru does not talk to UPS hardware directly. NUT owns drivers and hardware communication; Eneru consumes NUT data and runs policy.

  Hardware and drivers                 Eneru policy and action

+-----------------------+     +-----------------------+     +-----------------------+
| UPS hardware          |     | NUT server / upsc     |     | Eneru monitor         |
| USB, SNMP, vendor     +---->| driver-specific data  +---->| triggers and policy   |
| protocol              |     | UPS variables         |     | shutdown coordinator  |
+-----------------------+     +-----------------------+     +-----------+-----------+
                                                                  |
                  +-------------------------------+---------------+----------------+
                  |                               |                                |
                  v                               v                                v
       +----------+-----------+       +-----------+----------+        +------------+---------+
       | Shutdown phases      |       | Observability        |        | Notifications        |
       | VMs, containers, SSH |       | SQLite, API, metrics |        | Apprise queue, retry |
       | filesystems, local   |       | graphs, state file   |        | coalescing           |
       +----------------------+       +----------------------+        +----------------------+
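
The "consumes NUT data" step can be sketched as a small parser for the name: value lines that upsc prints. This is only an illustration: the function name parse_upsc_output and the sample values are hypothetical, and the real parsing code in src/eneru/monitor.py may look different.

```python
def parse_upsc_output(text: str) -> dict[str, str]:
    """Parse upsc output (one 'name: value' pair per line) into a dict."""
    variables: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first ": " only; values may themselves contain colons.
        name, sep, value = line.partition(": ")
        if sep:
            variables[name.strip()] = value.strip()
    return variables


# Hypothetical sample; real output contains many more variables.
SAMPLE = """battery.charge: 100
battery.runtime: 1820
ups.status: OL CHRG
"""

snapshot = parse_upsc_output(SAMPLE)
# ups.status is a space-separated list of flags; OB means "on battery".
on_battery = "OB" in snapshot["ups.status"].split()
```

Lines without a separator are skipped rather than raising, which suits a loop that has to keep running on partial or noisy input.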

Start reading in src/eneru/monitor.py. The module owns the per-UPS monitor loop, UPS polling, trigger evaluation, state updates, and shutdown orchestration.

Per-UPS monitor loop

Each UPS group runs the same core loop. The loop polls NUT, updates state, records a stats sample, evaluates health checks, and triggers shutdown when a policy condition is met.

UPSGroupMonitor loop

    +-----------------------+
    | poll NUT via upsc     |
    | one snapshot/cycle    |
    +-----------+-----------+
                |
                v
    +-----------+-----------+        +-------------------------+
    | update MonitorState   +------->| state file              |
    | status, timers, flags |        | SQLite sample buffer    |
    +-----------+-----------+        +-------------------------+
                |
                v
    +-----------+-----------+
    | health checks         |
    | voltage, AVR, bypass  |
    | overload, battery     |
    +-----------+-----------+
                |
                v
    +-----------+-----------+
    | shutdown triggers     |
    | FSD, battery, runtime |
    | depletion, extended   |
    +-----------+-----------+
                |
                v
    +-----------+-----------+
    | shutdown sequence     |
    | only when triggered   |
    +-----------------------+

Key files:

File	What to read there
`src/eneru/monitor.py`	`UPSGroupMonitor`, polling, trigger evaluation, shutdown sequence, lifecycle cleanup
`src/eneru/state.py`	`MonitorState`, the in-memory state shared by the loop, TUI snapshots, and redundancy evaluator
`src/eneru/health/voltage.py`	Voltage thresholds, auto-detect re-snap, hysteresis, AVR, bypass, overload
`src/eneru/health/battery.py`	Depletion-rate history and battery anomaly confirmation
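
One cycle of the loop described above can be sketched as follows. The MonitorState here is a hypothetical, trimmed-down stand-in for the real class, run_cycle and the low_battery threshold are invented for illustration, and the health-check step is left out to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class MonitorState:
    """Hypothetical stand-in for the real MonitorState."""
    status: str = "UNKNOWN"
    battery_charge: float = 100.0
    samples: list[dict] = field(default_factory=list)


def run_cycle(state: MonitorState, snapshot: dict[str, str],
              low_battery: float = 20.0) -> bool:
    """Run one poll cycle; return True when a shutdown trigger fires."""
    # 1. Update state from the single snapshot taken this cycle.
    state.status = snapshot.get("ups.status", "UNKNOWN")
    state.battery_charge = float(
        snapshot.get("battery.charge", state.battery_charge)
    )
    # 2. Record a stats sample (the real loop buffers these for SQLite).
    state.samples.append(
        {"status": state.status, "charge": state.battery_charge}
    )
    # 3. Evaluate triggers: forced shutdown, or low charge while on battery.
    flags = state.status.split()
    if "FSD" in flags:
        return True
    return "OB" in flags and state.battery_charge <= low_battery
```

Returning a boolean keeps the decision separate from the shutdown sequence itself, which matches the "only when triggered" box in the diagram.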

Shutdown phases are mixins

Shutdown behavior is decomposed into mixins instead of one large monitor file. UPSGroupMonitor owns the sequence; each phase owns its own implementation.

UPSGroupMonitor
  |
  +-- VMShutdownMixin
  |     libvirt / virsh graceful shutdown, force-destroy fallback
  |
  +-- ContainerShutdownMixin
  |     Docker / Podman detection, compose stacks, remaining containers
  |
  +-- FilesystemShutdownMixin
  |     sync, per-mount unmount, timeout handling
  |
  +-- RemoteShutdownMixin
        SSH phases, pre-shutdown actions, final shutdown command

The sequence stays readable in monitor.py, while the mechanics live in small files:

File	Phase
`src/eneru/shutdown/vms.py`	Libvirt/KVM VMs
`src/eneru/shutdown/containers.py`	Docker, Podman, compose
`src/eneru/shutdown/filesystems.py`	Sync and unmount
`src/eneru/shutdown/remote.py`	SSH-based remote shutdown
`src/eneru/actions.py`	Predefined remote action command templates

This makes shutdown extensions easier to review. A new phase should be a small mixin, wired into UPSGroupMonitor, documented in the config reference, and covered by unit and E2E tests.
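
A minimal sketch of the mixin pattern, assuming hypothetical method names and a simplified phase order. The class names of the mixins match the tree above, but Monitor, the method names, and the "keep going after a failed phase" behaviour are assumptions made for illustration, not the real implementation.

```python
class VMShutdownMixin:
    def shutdown_vms(self) -> None:
        self.log.append("vms")


class ContainerShutdownMixin:
    def shutdown_containers(self) -> None:
        self.log.append("containers")


class FilesystemShutdownMixin:
    def unmount_filesystems(self) -> None:
        self.log.append("filesystems")


class Monitor(VMShutdownMixin, ContainerShutdownMixin, FilesystemShutdownMixin):
    """Owns the order of phases; each mixin owns how its phase works."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def run_shutdown_sequence(self) -> list[str]:
        phases = (
            self.shutdown_vms,
            self.shutdown_containers,
            self.unmount_filesystems,
        )
        for phase in phases:
            try:
                phase()
            except Exception as exc:
                # Assumption: one failing phase should not stop later phases.
                self.log.append(f"failed: {exc}")
        return self.log
```

Adding a phase in this shape means writing one new mixin class and adding one entry to the phases tuple, which keeps the diff small and reviewable.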

Multi-UPS coordination

Single-UPS mode runs one UPSGroupMonitor. Multi-UPS mode creates one monitor per UPS group and shares the logger, notification worker, and local-shutdown coordination.

                    +------------------------+
                    | MultiUPSCoordinator    |
                    | starts one monitor per |
                    | configured UPS group   |
                    +-----------+------------+
                                |
       +------------------------+------------------------+
       |                        |                        |
       v                        v                        v
+------+---------+      +-------+--------+       +-------+--------+
| UPS group A    |      | UPS group B    |       | UPS group C    |
| monitor thread |      | monitor thread |       | monitor thread |
+------+---------+      +-------+--------+       +-------+--------+
       |                        |                        |
       +------------------------+------------------------+
                                |
                                v
                    +-----------+------------+
                    | shared services        |
                    | notification worker    |
                    | local shutdown lock    |
                    | lifecycle coordination |
                    +------------------------+

Only the group marked is_local: true can manage local resources such as VMs, containers, and filesystems. Remote servers can belong to any UPS group, but validation prevents duplicate ownership.

Read src/eneru/multi_ups.py for coordinator lifecycle, monitor thread management, local shutdown locking, and signal handling.
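
The idea behind the shared local shutdown lock can be sketched with one thread per group and a guard that only one caller can claim. LocalShutdownGuard and run_groups are hypothetical names; the real coordinator in src/eneru/multi_ups.py handles lifecycle and signals that this sketch ignores.

```python
import threading
from typing import Optional


class LocalShutdownGuard:
    """Lets exactly one caller win the right to run local shutdown."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._claimed_by: Optional[str] = None

    def claim(self, group: str) -> bool:
        with self._lock:
            if self._claimed_by is None:
                self._claimed_by = group
                return True
            return False


def run_groups(names: list[str]) -> list[str]:
    """Start one thread per group; return the groups that won the claim."""
    guard = LocalShutdownGuard()
    winners: list[str] = []

    def monitor(name: str) -> None:
        # Every monitor tries to claim; only the first one succeeds.
        if guard.claim(name):
            winners.append(name)

    threads = [threading.Thread(target=monitor, args=(n,)) for n in names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return winners
```

Whichever thread gets there first, the result always contains a single group, so local shutdown cannot run twice.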

Redundancy groups

Redundancy groups are separate from independent UPS groups. They watch multiple UPS snapshots and shut down shared resources only when quorum is lost.

 +----------------------+     +----------------------+
 | UPS-A monitor state  |     | UPS-B monitor state  |
 | health + advisory    |     | health + advisory    |
 +----------+-----------+     +----------+-----------+
            |                            |
            +-------------+--------------+
                          |
                          v
            +-------------+--------------+
            | RedundancyGroupEvaluator   |
            | classify each member as    |
            | HEALTHY, DEGRADED,         |
            | CRITICAL, or UNKNOWN       |
            +-------------+--------------+
                          |
                          v
                  quorum still healthy?
                     |              |
                    yes             no
                     |              |
                     v              v
             keep monitoring   run group shutdown

Per-UPS triggers still run for redundancy members, but they become advisory flags. The evaluator decides whether the group should act based on min_healthy, degraded_counts_as, and unknown_counts_as.

Read src/eneru/redundancy.py for the evaluator, executor, group flag files, and shutdown reuse.
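
The quorum decision can be sketched as a pure function over member states. The option names min_healthy, degraded_counts_as, and unknown_counts_as come from the text above; the Health enum, the quorum_lost function, and the default values used here are assumptions for illustration.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    CRITICAL = "critical"
    UNKNOWN = "unknown"


def quorum_lost(
    members: list[Health],
    min_healthy: int,
    degraded_counts_as: Health = Health.HEALTHY,   # assumed default
    unknown_counts_as: Health = Health.CRITICAL,   # assumed default
) -> bool:
    """Return True when fewer than min_healthy members count as healthy."""
    # Map the two ambiguous states onto whatever the config says they mean.
    remap = {
        Health.DEGRADED: degraded_counts_as,
        Health.UNKNOWN: unknown_counts_as,
    }
    effective = [remap.get(member, member) for member in members]
    healthy = sum(1 for member in effective if member is Health.HEALTHY)
    return healthy < min_healthy
```

With two members and min_healthy set to 1, a single CRITICAL UPS does not trigger the group shutdown; both members have to stop counting as healthy first.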

Persistent notifications

Notifications are deliberately asynchronous. The monitor inserts a row into SQLite and continues; a worker thread handles delivery, retry, coalescing, and lifecycle cleanup.

 +-------------------+      insert row       +------------------------+
 | monitor thread    +---------------------->| SQLite notifications   |
 | power/lifecycle   |                       | pending/sent/cancelled |
 | event occurs      |                       +-----------+------------+
 +---------+---------+                                   ^
           |                                             |
           | continue shutdown                           | retry/update rows
           v                                             |
 +---------+---------+                         +---------+--------------+
 | shutdown work     |                         | NotificationWorker     |
 | never waits on    |                         | Apprise delivery       |
 | network delivery  |                         | coalescing, pruning    |
 +-------------------+                         +------------------------+

When power is unstable, the network is usually unstable too. Discord, email, ntfy, and phone push services can all be unreachable while shutdown is in progress. Eneru should not block VM shutdown because a notification endpoint is down.

Key files:

File	What to read there
`src/eneru/notifications.py`	`NotificationWorker`, retry, backoff, coalescing, flush behavior
`src/eneru/stats.py`	SQLite `notifications` table and persistence helpers
`src/eneru/lifecycle.py`	Startup classification, recovery fold-in, lifecycle coalescing
`src/eneru/deferred_delivery.py`	systemd deferred stop delivery and restart suppression
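
The insert-then-continue pattern can be sketched with the standard sqlite3 module. The table layout, the function names, and the send callable are all hypothetical; the real schema lives in src/eneru/stats.py and real delivery goes through Apprise on a worker thread.

```python
import sqlite3
from typing import Callable


def open_queue() -> sqlite3.Connection:
    """Create an in-memory queue with a simplified, assumed schema."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE notifications ("
        "id INTEGER PRIMARY KEY, body TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending', "
        "attempts INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def enqueue(db: sqlite3.Connection, body: str) -> int:
    """Monitor side: record the event and return immediately."""
    cur = db.execute("INSERT INTO notifications (body) VALUES (?)", (body,))
    db.commit()
    return cur.lastrowid


def deliver_pending(db: sqlite3.Connection, send: Callable[[str], bool]) -> int:
    """Worker side: try each pending row; failures stay pending for retry."""
    rows = db.execute(
        "SELECT id, body FROM notifications WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    sent = 0
    for row_id, body in rows:
        try:
            ok = send(body)
        except Exception:
            ok = False  # an unreachable endpoint is just a failed attempt
        if ok:
            db.execute("UPDATE notifications SET status = 'sent' WHERE id = ?", (row_id,))
            sent += 1
        else:
            db.execute("UPDATE notifications SET attempts = attempts + 1 WHERE id = ?", (row_id,))
    db.commit()
    return sent
```

Because enqueue only touches the local database, a dead network can slow the worker but never the code path that shuts down VMs.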

SQLite stats and TUI separation

The monitoring loop writes samples to an in-memory buffer. StatsWriter flushes that buffer in the background. The TUI opens the database read-only and blends in live state-file samples so graphs do not lag by the full writer interval.

 +--------------+      +------------------+      +------------------+
 | monitor loop +----->| sample buffer    +----->| StatsWriter      |
 | one UPS poll |      | in memory        |      | flush every 10s  |
 +--------------+      +------------------+      +--------+---------+
                                                           |
                                                           v
                                                 +---------+----------+
                                                 | per-UPS SQLite     |
                                                 | samples, events    |
                                                 | aggregates, queue  |
                                                 +---------+----------+
                                                           ^
                                                           |
                                       read-only queries   |
                                                           |
                                                 +---------+----------+
                                                 | TUI dashboard      |
                                                 | status, graph      |
                                                 | recent events      |
                                                 +--------------------+

Read src/eneru/stats.py, src/eneru/tui.py, and src/eneru/graph.py for the data store, dashboard, and Braille graph renderer.
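
The split between a buffered writer and a read-only reader can be sketched as below. SampleBuffer, flush, read_only_count, and the two-column samples table are hypothetical simplifications; only the SQLite URI option mode=ro is standard.

```python
import sqlite3
import threading
from pathlib import Path


class SampleBuffer:
    """In-memory buffer the monitor loop appends to without touching disk."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: list[tuple[float, float]] = []

    def add(self, timestamp: float, charge: float) -> None:
        with self._lock:
            self._items.append((timestamp, charge))

    def drain(self) -> list[tuple[float, float]]:
        # Swap the list out under the lock so writers are blocked only briefly.
        with self._lock:
            items, self._items = self._items, []
        return items


def flush(buffer: SampleBuffer, path: str) -> int:
    """Writer side: persist everything buffered since the last flush."""
    rows = buffer.drain()
    db = sqlite3.connect(path)
    try:
        db.execute("CREATE TABLE IF NOT EXISTS samples (ts REAL, charge REAL)")
        db.executemany("INSERT INTO samples VALUES (?, ?)", rows)
        db.commit()
    finally:
        db.close()
    return len(rows)


def read_only_count(path: str) -> int:
    """Reader side: open the database read-only, as a dashboard might."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    db = sqlite3.connect(uri, uri=True)
    try:
        return db.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
    finally:
        db.close()
```

Opening with mode=ro means a bug in the reader cannot modify or corrupt the data the daemon depends on.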

Observability and the API

The API, Prometheus renderer, MQTT publisher, and TUI all read from the same status model in src/eneru/status.py. That model combines live monitor snapshots with sidecar JSON for remote health. It includes UPS battery/runtime/load, power-quality readings, grid-quality states, remote-health rows, redundancy-group rows, and event history.

monitor state + sidecars
        |
        v
  status read model
        |
        +-- HTTP API (/api/v1/ups, /api/v1/events, /api/v1/remote-health, /api/v1/config)
        +-- Prometheus /metrics
        +-- MQTT status payload
        +-- TUI one-shot and live views
        +-- browser dashboard (static SPA, see below)

Remote-health probes are the exception to the "read-only consumer" rule. The daemon's RemoteHealthManager runs the configured SSH probe_command (a harmless command; the default is true), then writes live state and sidecar JSON. API, MQTT, Prometheus, and the TUI only read that state.
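
The "one status model, many renderers" idea can be sketched as a single builder feeding two output formats. build_status, both renderers, the dictionary layout, and the metric name eneru_battery_charge_percent are hypothetical; the real model in src/eneru/status.py carries far more fields.

```python
import json


def build_status(ups: dict[str, dict], remote_health: dict[str, dict]) -> dict:
    """Combine live snapshots and sidecar data into one read model."""
    return {"ups": ups, "remote_health": remote_health}


def render_json(status: dict) -> str:
    """What an HTTP API endpoint might return."""
    return json.dumps(status, sort_keys=True)


def render_prometheus(status: dict) -> str:
    """What a /metrics endpoint might return (text exposition format)."""
    lines = []
    for name, ups in sorted(status["ups"].items()):
        # Doubled braces are literal braces inside an f-string.
        lines.append(
            f'eneru_battery_charge_percent{{ups="{name}"}} {ups["battery_charge"]}'
        )
    return "\n".join(lines) + "\n"
```

Both renderers take the same dictionary, so a field added to the model is available to every consumer without each one re-reading monitor state.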

v6.0: authenticated write-path + dashboard

The embedded http.server handler (src/eneru/api.py) gained an opt-in, tiered auth layer and a small set of write endpoints. With api.auth disabled it is byte-for-byte the v5.x read-only API; with it enabled, reads stay open (so Prometheus keeps scraping) while writes require a credential. Every request flows through one gate:

          HTTP request
               |
               v
     +---------+----------+      static asset?  -> serve eneru/web SPA (no auth)
     | EneruAPIHandler    |      health/ready?   -> always open
     | _auth_active()     |
     | _authorize(write=) |
     +----+----------+----+
          |          |
       read         write (POST/PUT/DELETE)
          |          |
   open unless    require credential (401), else act + _audit() to events table
 require_for_reads   |
          |          +-- POST /auth/login,/logout        (SessionManager: in-memory TTL tokens)
          v          +-- POST /ups/{n}/command, PUT .../variables   (nut_control: upscmd/upsrw)
   sanitized vs      +-- DELETE /ups/{n}/events           (StatsStore.delete_events)
   extended /config  +-- POST /config/reload              (live reload, see below)
               |
               v
        +------+--------+       +------------------------+
        | AuthStore     |       | dashboard (eneru/web)  |
        | users + keys  |<------+ thin SPA: bearer token |
        | bcrypt/sha256 |       | in sessionStorage      |
        +---------------+       +------------------------+

AuthStore (src/eneru/auth.py) is a dedicated global SQLite database (users + API keys), separate from the per-UPS stats databases. Auth is read live on every request, so creating a user takes effect without a restart. Credentials never appear in /api/v1/config: anonymous callers get a sanitized view, and authenticated callers get an extended view that is still secret-free. The dashboard under src/eneru/web/ is a dependency-free SPA served from the package via importlib.resources under a strict Content-Security-Policy.

Read src/eneru/api.py for the handler, gate, and routing; src/eneru/auth.py for the credential store; and src/eneru/nut_control.py for the upscmd/upsrw wrappers.
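
The read/write split in the gate can be sketched as a small decision function for the case where api.auth is enabled. The function name authorize and its parameters are hypothetical, and static assets, health checks, and the login endpoint are deliberately left out; the real logic is in _authorize() in src/eneru/api.py.

```python
from http import HTTPStatus

WRITE_METHODS = frozenset({"POST", "PUT", "DELETE"})


def authorize(method: str, *, require_for_reads: bool,
              has_credential: bool) -> HTTPStatus:
    """Decide the gate outcome for one request when auth is enabled."""
    is_write = method.upper() in WRITE_METHODS
    # Writes always need a credential; reads only when configured to.
    needs_credential = is_write or require_for_reads
    if needs_credential and not has_credential:
        return HTTPStatus.UNAUTHORIZED  # 401, as in the diagram above
    return HTTPStatus.OK
```

Keeping reads open by default is what lets an unauthenticated Prometheus scraper keep working after auth is switched on.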

Config hot-reload

SIGHUP (or POST /api/v1/config/reload) re-reads the config without restarting the daemon; the logic lives in src/eneru/reload.py. The new file is parsed and validated; on any error the running config is kept and the error is reported. Safe sections are swapped in place on the shared config object that the poll loop reads live. Sections that own process-level handlers (api bind/port and auth, logging, local_shutdown) are reported as restart-required rather than half-applied.

SIGHUP / POST /config/reload
        |
        v
  load_and_validate(path) --(invalid)--> keep old config, report errors
        | valid
        v
  apply_reload(): per-section diff
        +-- safe        -> swap in place (triggers, nut_control, prometheus, stats retention)
        +-- subsystem   -> hand to apply_reload() hook (notifications, MQTT, remote-health)
        +-- restart-req. -> report only (api, logging, local_shutdown, topology, db paths)
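
The per-section diff can be sketched as a function that compares two parsed configs and sorts each changed section into one of the three buckets. The section keys, the plan_reload name, and the choice to treat unlisted sections as restart-required are assumptions; the real classification is in src/eneru/reload.py.

```python
# Hypothetical section keys, loosely following the diagram above.
SAFE = frozenset({"triggers", "nut_control", "prometheus"})
SUBSYSTEM = frozenset({"notifications", "mqtt", "remote_health"})


def plan_reload(old: dict, new: dict) -> dict[str, list[str]]:
    """Classify every changed top-level section of a validated config."""
    plan: dict[str, list[str]] = {
        "safe": [],
        "subsystem": [],
        "restart_required": [],
    }
    for section in sorted(set(old) | set(new)):
        if old.get(section) == new.get(section):
            continue  # unchanged sections need no action
        if section in SAFE:
            plan["safe"].append(section)
        elif section in SUBSYSTEM:
            plan["subsystem"].append(section)
        else:
            # Assumption: anything not known to be safe is only reported.
            plan["restart_required"].append(section)
    return plan
```

Defaulting unknown sections to restart-required is the conservative choice: a section is never half-applied just because nobody classified it.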

Configuration as a safety boundary

Config loading enforces a set of safety rules at parse time:

- Raw voltage warning thresholds are not user-configurable; users choose bounded presets.
- Safety-critical notification events cannot be suppressed.
- Non-local UPS groups cannot own local resources.
- A remote server cannot be owned by both an independent UPS group and a redundancy group.
- shutdown_order and legacy parallel are mutually exclusive.
- Exactly one UPS or redundancy group may be local.

config.yaml
    |
    v
+---+----------------+
| ConfigLoader       |
| parse YAML         |
| apply defaults     |
| validate safety    |
+---+----------------+
    |
    v
typed dataclasses used by monitors, coordinator, TUI, and workers

Read src/eneru/config.py for the dataclasses, parser, compatibility aliases, and validator rules.
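
Two of the rules above can be sketched as a validator that returns a list of error strings. GroupConfig and validate_groups are hypothetical and much smaller than the real dataclasses; the sketch only rejects more than one local group and does not decide whether zero local groups is valid.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class GroupConfig:
    """Hypothetical, trimmed-down group definition."""
    name: str
    is_local: bool = False
    remote_servers: list[str] = field(default_factory=list)


def validate_groups(groups: list[GroupConfig]) -> list[str]:
    """Return human-readable errors; an empty list means the config passes."""
    errors: list[str] = []
    # Rule: local resources have a single owner.
    local = [group.name for group in groups if group.is_local]
    if len(local) > 1:
        errors.append(f"more than one local group: {', '.join(local)}")
    # Rule: a remote server cannot be owned by two groups.
    owners = Counter(
        server for group in groups for server in group.remote_servers
    )
    for server, count in sorted(owners.items()):
        if count > 1:
            errors.append(f"remote server {server!r} is owned by {count} groups")
    return errors
```

Collecting all errors instead of raising on the first one lets an operator fix a broken config in a single pass.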

Packaging matters

Eneru is a system daemon first. The native package path and the PyPI path intentionally differ:

Install path	Runtime shape
deb/rpm	`/opt/ups-monitor/eneru.py` wrapper, systemd service, config under `/etc/ups-monitor/`, command exposed as `eneru`
PyPI	Python package entry point, user-managed service or foreground process

Packaging tests check installed files and import layout because the deb/rpm build enumerates package contents explicitly in nfpm.yaml. Adding a new module under src/eneru/ requires a matching package entry.

Read packaging/eneru-wrapper.py, packaging/eneru.service, nfpm.yaml, and tests/test_packaging.py.
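
The "every module needs a package entry" rule can be sketched as a set comparison. unpackaged_modules is a hypothetical helper that takes plain lists of paths; the real checks in tests/test_packaging.py read nfpm.yaml and the installed layout and may work differently.

```python
from pathlib import PurePosixPath


def unpackaged_modules(source_files: list[str],
                       packaged_sources: list[str]) -> list[str]:
    """Return Python source files that have no matching package entry."""
    packaged = {PurePosixPath(path) for path in packaged_sources}
    missing = [
        str(PurePosixPath(path))
        for path in source_files
        if PurePosixPath(path).suffix == ".py"
        and PurePosixPath(path) not in packaged
    ]
    return sorted(missing)
```

A test built on this idea fails as soon as a module is added under src/eneru/ without the corresponding entry, instead of failing later on an installed system.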

Contributor reading order

If you want to dig deeper, read in this order:

1. src/eneru/AGENTS.md for the module map and mixin conventions.
2. src/eneru/config.py for the data model.
3. src/eneru/monitor.py for the single-UPS runtime.
4. src/eneru/multi_ups.py for group coordination.
5. src/eneru/redundancy.py if you are working on A+B power.
6. src/eneru/notifications.py and src/eneru/stats.py if you are working on observability.
7. The matching tests under tests/ before changing behavior.