turnstone/patterns/default.yaml
pyr0ball 683f54fd94 feat(patterns): add audio domain — PipeWire/ALSA xrun and quantum patterns
Six new patterns covering the PipeWire + ALSA audio failure modes that
surface as crackling/stuttering on Linux desktops:

- pipewire_overflow: protocol-pulse OVERFLOW channel messages (confirmed
  present in Muninn journal — dozens per incident)
- pipewire_underrun: pw.node/spa.alsa underrun messages
- alsa_xrun: ALSA-level xrun from kernel or ALSA lib (snd_pcm)
- pipewire_quantum_mismatch: sample-rate/quantum mismatch detection
- pipewire_node_error: PipeWire node failures (device unavailable)
- pipewire_jackdbus_missing: harmless JACK probe at INFO — suppresses
  false positives from daily PipeWire restarts

Also adds 'audio' as a valid domain value in the header comment.

Companion Robin knowledge doc:
  circuitforge-plans/robin/known-issues/pipewire-alsa-quantum-xrun.md
2026-06-10 11:33:19 -07:00


# Turnstone pattern library — named regex patterns for log tagging at ingest time.
# Each matched pattern name is stored on RetrievedEntry.matched_patterns and
# used to boost retrieval relevance for diagnostic queries.
#
# domain: groups patterns into service health domains for triage-level summaries.
# Valid domains: service_health | networking | auth | storage | memory |
# kernel | power | web_proxy | media | gpu | audio
#
# Patterns are applied in order; multiple can match a single entry.
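#
# Entry shape, as an illustrative sketch inferred from the entries below (not a
# formal schema). `example_pattern` and its regex are hypothetical placeholders:
#
#   - name: example_pattern            # name recorded in matched_patterns on a match
#     pattern: "(foo\\.bar.*failed)"   # regex; in a double-quoted YAML string a literal
#                                      # dot is written \\. (a bare \. is an invalid escape)
#     severity: WARN                   # values used in this file: INFO | WARN | ERROR | CRITICAL
#     domain: service_health           # one of the valid domains listed above
#     description: Short human-readable summary of what the match means
#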
patterns:
  - name: service_restart
    pattern: "(restarting|restart requested|service.*start)"
    severity: WARN
    domain: service_health
    description: Service restart detected
  - name: connection_lost
    pattern: "(connection (lost|dropped|refused|timed? out)|disconnect(ed)?)"
    severity: ERROR
    domain: networking
    description: Network or device connection failure
  - name: auth_failure
    pattern: "(auth(entication)? (failed?|error|denied)|permission denied|unauthorized)"
    severity: ERROR
    domain: auth
    description: Authentication or authorization failure
  - name: oom
    pattern: "(out of memory|OOM|killed process|cannot allocate)"
    severity: CRITICAL
    domain: memory
    description: Out-of-memory condition
  - name: segfault
    pattern: "(segmentation fault|segfault|SIGSEGV|core dump)"
    severity: CRITICAL
    domain: kernel
    description: Process crash or memory corruption
  - name: disk_full
    pattern: "(no space left|disk full|filesystem.*full|ENOSPC)"
    severity: ERROR
    domain: storage
    description: Storage capacity exhausted
  - name: timeout
    pattern: "(timed? out|deadline exceeded|operation timed?)"
    severity: WARN
    domain: networking
    description: Operation timeout
  - name: caddy_tls_error
    pattern: "(acme|certificate|tls).*(error|fail|invalid|expired|renew)"
    severity: ERROR
    domain: web_proxy
    description: Caddy TLS or certificate error
  - name: caddy_config_error
    pattern: "(config|caddyfile|directive).*(error|invalid|unknown|unrecognized)"
    severity: ERROR
    domain: web_proxy
    description: Caddy configuration error
  - name: caddy_auth_error
    pattern: "(forward_auth|basicauth|basic_auth).*(error|fail|denied|invalid|unreachable)"
    severity: ERROR
    domain: web_proxy
    description: Caddy authentication middleware failure
  - name: caddy_upstream_error
    pattern: "(upstream|backend|reverse.proxy).*(error|fail|unreachable|refused|timeout)"
    severity: ERROR
    domain: web_proxy
    description: Caddy upstream/backend failure
  - name: service_update
    pattern: "(upgraded?|updated?|installing|dpkg|apt|package).*(caddy|nginx|apache|proxy)"
    severity: INFO
    domain: web_proxy
    description: Web server package update detected
  - name: power_failure
    pattern: "(power (fail|loss|outage|cut)|ups|battery|shutdown.*power|lost power)"
    severity: CRITICAL
    domain: power
    description: Power failure or UPS event
  - name: network_interface
    pattern: "(eth[0-9]|ens[0-9]|enp[0-9]|wlan[0-9]).*(down|up|carrier|link)"
    severity: WARN
    domain: networking
    description: Network interface state change
  - name: ip_change
    pattern: "(new ip|ip.*(changed|assigned|address)|dhcp.*(ack|offer|bound|renew))"
    severity: INFO
    domain: networking
    description: IP address change or DHCP event
  # ── System / journald patterns ─────────────────────────────────────────────
  - name: systemd_fail
    pattern: "(Failed to start|failed with result|entered failed state|start request repeated too quickly|Main process exited)"
    severity: ERROR
    domain: service_health
    description: systemd service failed to start or crashed
  - name: oom_kill
    pattern: "(Killed process|oom.kill|oom_kill_process|Out of memory: Kill|memory cgroup out of memory)"
    severity: CRITICAL
    domain: memory
    description: Kernel OOM killer terminated a process
  - name: disk_hw_error
    pattern: "(ata[0-9]|sd[a-z]|nvme[0-9]).*(error|failed|reset|timeout|exception|EH|FAILED COMMAND)"
    severity: ERROR
    domain: storage
    description: Storage device hardware error or reset
  - name: fs_error
    pattern: "(EXT4-fs error|XFS.*error|BTRFS.*error|I/O error|blk_update_request.*error|buffer I/O error)"
    severity: ERROR
    domain: storage
    description: Filesystem or block I/O error
  - name: kernel_error
    pattern: "(kernel: BUG|kernel panic|Oops:|general protection fault|Call Trace|RIP:.*[0-9a-f]{16})"
    severity: CRITICAL
    domain: kernel
    description: Kernel bug, panic, or oops — system may be unstable
  - name: ssh_brute
    pattern: "(Failed password|Invalid user|authentication failure|Connection closed by authenticating user).*(sshd|ssh)"
    severity: WARN
    domain: auth
    description: SSH authentication failure — possible brute force
  - name: container_crash
    pattern: "(container.*exited|oci runtime.*error|podman.*error|docker.*error|container.*killed|OCI.*failed)"
    severity: ERROR
    domain: service_health
    description: Container runtime error or unexpected exit
  - name: smart_error
    pattern: "(smartd|SMART.*error|reallocated sector|pending sector|uncorrectable sector|Current_Pending_Sector)"
    severity: CRITICAL
    domain: storage
    description: SMART disk health warning — potential drive failure
  - name: nfs_error
    pattern: "(nfs.*error|nfs.*timeout|RPC.*timed out|nfs4.*server.*not responding|mount.*nfs.*failed)"
    severity: ERROR
    domain: networking
    description: NFS mount or RPC timeout
  # Add device/service-specific patterns below this line:
  - name: qbit_tracker_error
    pattern: "(tracker|announce).*(not working|error|fail|unreachable|timeout|refused|invalid)"
    severity: WARN
    domain: media
    description: qBittorrent tracker connection or announce failure
  - name: qbit_port_bind
    pattern: "(couldn't? listen|bind.*fail|port.*in use|listening.*fail)"
    severity: CRITICAL
    domain: media
    description: qBittorrent failed to bind listen port — firewall or port conflict
  - name: qbit_disk_error
    pattern: "(cannot (write|open|create)|disk.*error|i/o error|file.*fail|write.*fail)"
    severity: ERROR
    domain: media
    description: qBittorrent disk write or file access failure
  - name: qbit_hash_fail
    pattern: "(hash.*(check|fail|mismatch)|recheck|piece.*fail)"
    severity: WARN
    domain: media
    description: qBittorrent torrent hash verification failure — possible corrupt data
  - name: qbit_peer_ban
    pattern: "(peer.*ban|banned.*peer|blocked.*peer)"
    severity: INFO
    domain: media
    description: qBittorrent peer banned (encryption enforcement or bad actor)
  - name: qbit_download_complete
    pattern: "(download.*complet|torrent.*finish|has finished downloading)"
    severity: INFO
    domain: media
    description: qBittorrent torrent download completed
  - name: qbit_ratio_limit
    pattern: "(ratio.*reach|seeding.*limit|stop.*seeding|upload.*limit)"
    severity: INFO
    domain: media
    description: qBittorrent seeding ratio or time limit reached
  - name: qbit_session_error
    pattern: "(session.*error|couldn't? resume|resume.*fail|torrent.*error)"
    severity: ERROR
    domain: media
    description: qBittorrent session or resume data error
  - name: plex_eae_failure
    pattern: "(EAE timeout|EAE not running|eac3_eae.*error reading output|Error submitting packet to decoder.*I/O error)"
    severity: ERROR
    domain: media
    description: Plex EasyAudioEncoder (EAC3 Dolby audio transcoder) crashed — service restart required
  # - name: avcx_device_error
  #   pattern: "ERR-\\d{4}"
  #   severity: ERROR
  #   description: AVCX device error code
  # ── VPN / tunnel patterns ──────────────────────────────────────────────────
  - name: vpn_tunnel_fail
    pattern: "(wg-quick@|wireguard|spirit-city-tunnel|cf-orch-tunnel|cf-tunnel|openvpn|vpn).*(failed|error|exit.code|timeout|connection reset)"
    severity: ERROR
    domain: networking
    description: VPN or WireGuard tunnel service failed — remote node may be unreachable
  - name: vpn_handshake
    pattern: "(handshake|peer.*allowed|WireGuard|wg-quick).*(initiating|complete|timeout|fail|retrying)"
    severity: WARN
    domain: networking
    description: WireGuard peer handshake event — track for timeout/retry patterns
  - name: dns_degraded
    pattern: "(degraded feature set|DNS.*fall.?back|resolver.*fail|NXDOMAIN|DNS.*timeout|SERVFAIL)"
    severity: WARN
    domain: networking
    description: DNS resolver degradation or fallback — often precedes connectivity failures
  # ── GPU / NVIDIA driver patterns ───────────────────────────────────────────
  - name: nvidia_api_mismatch
    pattern: "(NVRM: API mismatch|nvidia.*version mismatch|driver.*mismatch|kernel module.*mismatch)"
    severity: ERROR
    domain: gpu
    description: NVIDIA kernel module version does not match userspace driver — GPU ops will fail until driver reinstalled
  - name: nvidia_xid
    pattern: "(NVRM: Xid|Xid.*(error|critical)|GPU.*Xid)"
    severity: CRITICAL
    domain: gpu
    description: NVIDIA Xid error — GPU hardware fault or driver crash (check nvidia-smi error code)
  - name: nvidia_gpu_reset
    pattern: "(nvidia.*reset|GPU.*reset|NVRM.*reset|nvml.*error|NVLink.*fail)"
    severity: ERROR
    domain: gpu
    description: NVIDIA GPU reset or NVLink fault — possible hardware instability
  # ── Power / thermal patterns ───────────────────────────────────────────────
  - name: acpi_error
    pattern: "(ACPI.*failed|ACPI.*error|ACPI.*_DSM|acpi.*_PPC|ACPI BIOS Error)"
    severity: WARN
    domain: kernel
    description: ACPI firmware evaluation failure — often harmless but can indicate BIOS/power management issues
  - name: thermal_throttle
    pattern: "(CPU.*throttl|thermal throttl|Package temp|TjMax|temperature.*critical|No RAPL|RAPL.*not available)"
    severity: WARN
    domain: power
    description: CPU/GPU thermal throttling or thermal management subsystem unavailable
  - name: undervoltage
    pattern: "(under.?voltage|brownout|voltage.*(low|critical)|power supply.*insufficient)"
    severity: ERROR
    domain: power
    description: Undervoltage event — instability risk, check PSU and cable connections
  # ── Audio / PipeWire / ALSA ──────────────────────────────────────────────────
  - name: pipewire_overflow
    pattern: "(OVERFLOW channel|stream.*OVERFLOW|protocol.pulse.*OVERFLOW)"
    severity: WARN
    domain: audio
    description: PipeWire-Pulse stream buffer overflow — client not draining audio fast enough; usually indicates a quantum/period-size mismatch or CPU scheduling issue
  - name: pipewire_underrun
    pattern: "(pw\\.node.*underrun|spa\\.alsa.*underrun|alsa.*underrun|UNDERRUN)"
    severity: WARN
    domain: audio
    description: PipeWire/ALSA buffer underrun (xrun) — audio thread missed its deadline; increase quantum or period-size for the affected device
  - name: alsa_xrun
    pattern: "(ALSA.*[Xx][Rr][Uu][Nn]|alsa.*xrun|snd_pcm.*xrun|pcm.*underrun|pcm.*overrun)"
    severity: WARN
    domain: audio
    description: ALSA xrun (hardware buffer overrun/underrun) — increase api.alsa.period-size via WirePlumber rule or raise clock.min-quantum
  - name: pipewire_quantum_mismatch
    pattern: "(quantum.*mismatch|rate.*mismatch|sample.rate.*mismatch|resampl.*fail|can.*t adapt quantum)"
    severity: WARN
    domain: audio
    description: PipeWire quantum or sample-rate mismatch between nodes — check for mixed 44100/48000 streams; may need per-device WirePlumber rules
  - name: pipewire_node_error
    pattern: "(pw\\.node.*error|node.*ERROR|pipewire.*failed to set|spa\\.alsa.*error|alsa_sink.*error|alsa_source.*error)"
    severity: ERROR
    domain: audio
    description: PipeWire node error — device may be unavailable or misconfigured
  - name: pipewire_jackdbus_missing
    pattern: "(jackdbus.*reply|jackaudio.*service.*not.*provided|org\\.jackaudio\\.service)"
    severity: INFO
    domain: audio
    description: PipeWire JACK D-Bus probe — JACK not running; benign on non-JACK systems, fires once per PipeWire restart