feat: cf_core.audio — shared PCM/audio utility module (cf-voice + Sparrow dedup) #50


Closed

opened 2026-04-17 15:01:35 -07:00 by pyr0ball · 0 comments

pyr0ball (Owner) commented 2026-04-17 15:01:35 -07:00

Context

Both cf-voice and Sparrow hand-roll the same audio utility functions independently:

PCM int16 bytes -> float32 numpy array conversion
16 kHz resampling
RMS energy gate (detect silence)
Chunk accumulation buffer (collect N x 100ms chunks into a classify window)

cf-voice has these in stt.py and context.py. Sparrow will need them for torchaudio stitching (export) and for any future acoustic analysis path.
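For reference, a minimal sketch of the conversion and RMS gate items above. It assumes little-endian int16 PCM and float samples normalized to [-1.0, 1.0); the function names follow the proposed module, but the bodies are illustrative and not copied from cf-voice or Sparrow.

```python
import numpy as np


def pcm_to_float32(pcm: bytes) -> np.ndarray:
    """Convert int16 PCM bytes to float32 samples in [-1.0, 1.0).

    Assumption: the PCM stream is little-endian, mono or interleaved.
    """
    samples = np.frombuffer(pcm, dtype="<i2")
    return samples.astype(np.float32) / 32768.0


def float32_to_pcm(audio: np.ndarray) -> bytes:
    """Convert float samples back to int16 PCM bytes, clipping out-of-range values."""
    # Largest positive value that still maps to a valid int16 (32767).
    clipped = np.clip(audio, -1.0, 32767.0 / 32768.0)
    return np.round(clipped * 32768.0).astype("<i2").tobytes()


def is_silent(audio: np.ndarray, rms_threshold: float = 0.005) -> bool:
    """Return True when the RMS energy of the chunk is below the threshold."""
    if audio.size == 0:
        # Assumption: an empty chunk counts as silence.
        return True
    rms = float(np.sqrt(np.mean(audio.astype(np.float64) ** 2)))
    return rms < rms_threshold
```

With this scaling, int16 values round-trip exactly through `pcm_to_float32()` and `float32_to_pcm()`, since every int16 divided by 32768 is representable in float32.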

Proposed module

circuitforge_core/
└── audio/
    ├── __init__.py
    ├── convert.py      pcm_to_float32(), float32_to_pcm(), bytes_to_np()
    ├── resample.py     resample(audio, from_hz, to_hz) -> np.ndarray
    ├── gate.py         is_silent(audio, rms_threshold=0.005) -> bool
    └── buffer.py       ChunkAccumulator (accumulate(), flush(), is_ready())
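A rough sketch of what `ChunkAccumulator` could look like, given the three methods named above. The constructor parameter `chunks_per_window` and the choice to return one concatenated float32 array from `flush()` are assumptions for illustration, not a settled API.

```python
import numpy as np


class ChunkAccumulator:
    """Collect fixed-size audio chunks until a full classify window is available."""

    def __init__(self, chunks_per_window: int = 10) -> None:
        # Hypothetical parameter: 10 x 100 ms chunks would give a 1 s window.
        if chunks_per_window < 1:
            raise ValueError("chunks_per_window must be >= 1")
        self._chunks_per_window = chunks_per_window
        self._chunks = []

    def accumulate(self, chunk: np.ndarray) -> None:
        """Append one chunk of samples to the pending window."""
        self._chunks.append(np.asarray(chunk, dtype=np.float32))

    def is_ready(self) -> bool:
        """True once at least chunks_per_window chunks have been collected."""
        return len(self._chunks) >= self._chunks_per_window

    def flush(self) -> np.ndarray:
        """Return all pending samples as one array and reset the buffer."""
        if not self._chunks:
            return np.zeros(0, dtype=np.float32)
        window = np.concatenate(self._chunks)
        self._chunks.clear()
        return window
```

A caller would typically `accumulate()` each incoming chunk, check `is_ready()`, and pass the result of `flush()` to the classifier.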

MIT licensed — no inference, no BSL

Pure signal processing utilities. No model weights. No HuggingFace. Required dependency: numpy only; scipy is optional, for high-quality resampling.
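One way the numpy/scipy split could work in `resample()`, sketched under assumptions: use `scipy.signal.resample_poly` when scipy is installed, otherwise fall back to linear interpolation with `np.interp`. The fallback strategy is a guess at the intent, not a decided design, and linear interpolation does no anti-alias filtering, so it is only a low-quality stand-in when downsampling.

```python
import math

import numpy as np

try:
    # Optional dependency: polyphase resampling with built-in anti-alias filter.
    from scipy.signal import resample_poly
except ImportError:
    resample_poly = None


def resample(audio: np.ndarray, from_hz: int, to_hz: int) -> np.ndarray:
    """Resample 1-D audio from from_hz to to_hz and return float32 samples."""
    if from_hz <= 0 or to_hz <= 0:
        raise ValueError("sample rates must be positive")
    audio = np.asarray(audio, dtype=np.float32)
    if from_hz == to_hz or audio.size == 0:
        return audio

    if resample_poly is not None:
        # Reduce the rate ratio to lowest terms, e.g. 48000 -> 16000 is 1/3.
        g = math.gcd(from_hz, to_hz)
        return resample_poly(audio, to_hz // g, from_hz // g).astype(np.float32)

    # numpy-only fallback: linear interpolation, no anti-alias filter.
    n_out = -(-audio.size * to_hz // from_hz)  # ceiling division
    src_t = np.arange(audio.size) / from_hz
    dst_t = np.arange(n_out) / to_hz
    return np.interp(dst_t, src_t, audio).astype(np.float32)
```

Both paths produce the same output length (the ceiling of `len(audio) * to_hz / from_hz`), so callers should not need to care which one ran.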

Consumers

cf-voice: replace hand-rolled versions in stt.py + context.py
Sparrow: use in services/musicgen.py and api/endpoints/export.py
Future: Avocet (audio preprocessing for training corpus), Linnet (chunk accumulation)

Priority

Low — not a current blocker. File now so it doesn't get lost. Implement after cf_core.musicgen (#49) is done.
