feat(classifier): add Hybrid-BERT label mapping shim for krishnas4415/log-anomaly-detection-models #41

Closed
opened 2026-05-25 20:32:33 -07:00 by pyr0ball · 0 comments
Owner

Context

The backup Stage 2 classifier (krishnas4415/log-anomaly-detection-models Hybrid-BERT, MIT) has been investigated, and its labels are confirmed from the config.

Label mapping (confirmed)

| Hybrid-BERT output | Turnstone SeverityLabel |
|---|---|
| normal | INFO |
| security_anomaly | ERROR |
| system_failure | CRITICAL |
| performance_issue | WARN |
| network_anomaly | WARN |
| config_error | ERROR |
| hardware_issue | CRITICAL |
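
A minimal sketch of the mapping shim, assuming the label table above is complete. The `SeverityLabel` enum here is a hypothetical stand-in for Turnstone's real type, and `map_hybrid_bert_label` is an illustrative name, not an existing function:

```python
from enum import Enum


class SeverityLabel(str, Enum):
    """Hypothetical stand-in for Turnstone's SeverityLabel type."""

    INFO = "INFO"
    WARN = "WARN"
    ERROR = "ERROR"
    CRITICAL = "CRITICAL"


# Mapping copied from the confirmed label table.
HYBRID_BERT_TO_SEVERITY = {
    "normal": SeverityLabel.INFO,
    "security_anomaly": SeverityLabel.ERROR,
    "system_failure": SeverityLabel.CRITICAL,
    "performance_issue": SeverityLabel.WARN,
    "network_anomaly": SeverityLabel.WARN,
    "config_error": SeverityLabel.ERROR,
    "hardware_issue": SeverityLabel.CRITICAL,
}


def map_hybrid_bert_label(label: str) -> SeverityLabel:
    """Translate a Hybrid-BERT class name into a SeverityLabel.

    Raises ValueError for labels outside the seven confirmed classes, so a
    changed checkpoint is not silently mis-mapped.
    """
    key = label.strip().lower()
    try:
        return HYBRID_BERT_TO_SEVERITY[key]
    except KeyError:
        raise ValueError(f"unknown Hybrid-BERT label: {label!r}") from None
```

Raising on unknown labels is a design choice for this sketch; falling back to a default severity would also be defensible if the caller prefers degraded output over an exception.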

Problem: non-standard checkpoint format

This model ships as a raw PyTorch checkpoint (pytorch_model.pt), not in the standard HF layout. AutoModelForSequenceClassification.from_pretrained() and pipeline() both fail because:

  1. The root config.json has no model_type, and the models are buried in subdirectories (models/Hybrid-BERT-Log-Anomaly-Detection/)
  2. pytorch_model.pt is a raw PyTorch save, not a pytorch_model.bin tied to a registered HF architecture
  3. The architecture is described as 'Hybrid BERT with Template Features'; the template feature extraction is undocumented and is likely a preprocessing step not included in the checkpoint
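
The first failure reason can be checked locally before attempting a load. The sketch below only tests whether a config.json body declares `model_type`, the field the Auto classes rely on to pick an architecture; `has_model_type` is a hypothetical helper name:

```python
import json


def has_model_type(config_text: str) -> bool:
    """Return True if the JSON text of a config.json declares a model_type."""
    config = json.loads(config_text)
    return isinstance(config, dict) and bool(config.get("model_type"))
```

To check a downloaded candidate, pass it the contents of the file, for example `has_model_type(Path("config.json").read_text())` with `pathlib.Path` imported.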

To use it: load bert-base-uncased with a custom 7-class head, then load the .pt weights manually via torch.load(). The template feature component remains an unknown risk.
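
The manual loading path could look roughly like the sketch below. It has not been run against the real checkpoint and rests on several assumptions: that the file holds a state dict (bare, or nested under `"model_state_dict"` or `"state_dict"`), that key names line up with `BertForSequenceClassification`, and that no template feature input is needed at inference. The helper names are hypothetical.

```python
from collections import OrderedDict


def unwrap_state_dict(checkpoint):
    """Return the weight mapping from a loaded checkpoint object.

    Assumption: the file holds either a bare state dict or a dict nesting one
    under "model_state_dict" or "state_dict". Neither layout is confirmed.
    """
    for key in ("model_state_dict", "state_dict"):
        if isinstance(checkpoint, dict) and key in checkpoint:
            checkpoint = checkpoint[key]
            break
    # Strip the "module." prefix that torch.nn.DataParallel adds to key names.
    return OrderedDict(
        (name[len("module."):] if name.startswith("module.") else name, tensor)
        for name, tensor in checkpoint.items()
    )


def load_hybrid_bert(checkpoint_path, num_labels=7):
    """Load bert-base-uncased with a 7-class head, then overlay the .pt weights."""
    # Imported lazily so unwrap_state_dict is usable without torch installed.
    import torch
    from transformers import BertForSequenceClassification

    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=num_labels
    )
    # weights_only=True refuses to unpickle arbitrary objects; it will fail if
    # the file is a whole pickled model rather than a state dict.
    checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=True)
    # strict=False reports mismatched keys instead of raising on them.
    result = model.load_state_dict(unwrap_state_dict(checkpoint), strict=False)
    return model, result.missing_keys, result.unexpected_keys
```

The returned key lists are the useful diagnostic here. Unexpected keys would suggest extra layers, possibly the template feature branch, that plain BERT cannot host; missing keys would mean parts of the model (for instance the classifier head) are still randomly initialized and its predictions should not be trusted.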

Recommendation

Search for a better-packaged log severity classifier on HF Hub before investing in custom loading code for this one. Candidates to evaluate:

  • malduwez/LogBERT-v1 — was mentioned in early spec drafts; verify it still exists
  • Any model with standard HF pipeline compatibility and ERROR/WARN/INFO/CRITICAL or anomaly-type labels on LogHub data
  • Fine-tuning byviz/bylastic_classification_logs further on Turnstone's own log corpus (using Avocet) may be more practical than loading this model

Current status

byviz/bylastic_classification_logs is active as Stage 2 primary. The krishnas4415 model is not usable without significant custom integration work.

pyr0ball added this to the beta milestone 2026-06-01 15:10:00 -07:00
Reference: Circuit-Forge/turnstone#41