feat: bundled skills suggestion list and content filter utility

- config/skills_suggestions.yaml: 168 curated tags across skills (77),
  domains (40), keywords (51) covering CS/TAM/ops and common tech roles;
  structured for future community aggregate (paid tier backlog)
- scripts/skills_utils.py: filter_tag() rejects blanks, URLs, profanity,
  too-short or overlong strings, disallowed chars, and repeated-char runs;
  load_suggestions() reads bundled YAML per category
pyr0ball 2026-02-26 13:09:32 -08:00
parent e982fa7a8b
commit 93bf6b3c6f
2 changed files with 260 additions and 0 deletions
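The profanity and repeated-character rules listed above are plain regular expressions inside `filter_tag()`. The standalone sketch below isolates those two checks to show why the word-boundary (`\b`) anchors matter: a bare substring test for "ass" would also reject legitimate tags such as "Risk Assessment" or "Class Design". As an assumption for illustration, it compiles the blocked terms into a single alternation instead of looping over `_BLOCKED` the way `filter_tag()` does. `BLOCKED`, `BLOCKED_RE`, `is_blocked` and `has_repeat_run` are hypothetical names, not part of this commit.

```python
import re

# Illustrative subset of the blocked terms (the commit's full set is _BLOCKED).
BLOCKED = {"ass", "damn"}

# One case-insensitive alternation wrapped in word boundaries, compiled once.
# filter_tag() instead loops over _BLOCKED and runs one \b-anchored search per
# term on the lowercased tag; for ASCII terms like these the outcome is the same.
BLOCKED_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in sorted(BLOCKED)) + r")\b",
    re.IGNORECASE,
)

# A character followed by 4+ copies of itself, i.e. a run of 5 or more.
# filter_tag() applies this to the lowercased tag, so "AAaaa" also counts there.
REPEAT_RE = re.compile(r"(.)\1{4,}")


def is_blocked(tag: str) -> bool:
    """True if any blocked term appears as a whole word in `tag`."""
    return BLOCKED_RE.search(tag) is not None


def has_repeat_run(tag: str) -> bool:
    """True if `tag` contains 5+ consecutive identical characters."""
    return REPEAT_RE.search(tag) is not None


if __name__ == "__main__":
    for tag in ["Risk Assessment", "Class Design", "Damn good", "aaaaa", "Coffee"]:
        print(f"{tag!r}: blocked={is_blocked(tag)} repeat={has_repeat_run(tag)}")
```

Because "Assessment" continues with a word character after "ass", no `\b` boundary exists there and the tag survives; "Coffee" has only double letters, well short of the five-character run threshold.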

config/skills_suggestions.yaml

@@ -0,0 +1,193 @@
# skills_suggestions.yaml — Bundled tag suggestions for the Skills & Keywords UI.
# Shown as searchable options in the multiselect. Users can add custom tags beyond these.
# Future: community aggregate (paid tier) will supplement this list from anonymised installs.
skills:
# ── Customer Success & Account Management ──
- Customer Success
- Technical Account Management
- Account Management
- Customer Onboarding
- Renewal Management
- Churn Prevention
- Expansion Revenue
- Executive Relationship Management
- Escalation Management
- QBR Facilitation
- Customer Advocacy
- Voice of the Customer
- Customer Health Scoring
- Success Planning
- Customer Education
- Implementation Management
# ── Revenue & Operations ──
- Revenue Operations
- Sales Operations
- Pipeline Management
- Forecasting
- Contract Negotiation
- Upsell & Cross-sell
- ARR / MRR Management
- NRR Optimization
- Quota Attainment
# ── Leadership & Management ──
- Team Leadership
- People Management
- Cross-functional Collaboration
- Change Management
- Stakeholder Management
- Executive Presentation
- Strategic Planning
- OKR Setting
- Hiring & Recruiting
- Coaching & Mentoring
- Performance Management
# ── Project & Program Management ──
- Project Management
- Program Management
- Agile / Scrum
- Kanban
- Risk Management
- Resource Planning
- Process Improvement
- SOP Development
# ── Technical Skills ──
- SQL
- Python
- Data Analysis
- Tableau
- Looker
- Power BI
- Excel / Google Sheets
- REST APIs
- Salesforce
- HubSpot
- Gainsight
- Totango
- ChurnZero
- Zendesk
- Intercom
- Jira
- Confluence
- Notion
- Slack
- Zoom
# ── Communications & Writing ──
- Executive Communication
- Technical Writing
- Proposal Writing
- Presentation Skills
- Public Speaking
- Stakeholder Communication
# ── Compliance & Security ──
- Compliance
- Risk Assessment
- SOC 2
- ISO 27001
- GDPR
- Security Awareness
- Vendor Management
domains:
# ── Software & Tech ──
- B2B SaaS
- Enterprise Software
- Cloud Infrastructure
- Developer Tools
- Cybersecurity
- Data & Analytics
- AI / ML Platform
- FinTech
- InsurTech
- LegalTech
- HR Tech
- MarTech
- AdTech
- DevOps / Platform Engineering
- Open Source
# ── Industry Verticals ──
- Healthcare / HealthTech
- Education / EdTech
- Non-profit / Social Impact
- Government / GovTech
- E-commerce / Retail
- Manufacturing
- Financial Services
- Media & Entertainment
- Music Industry
- Logistics & Supply Chain
- Real Estate / PropTech
- Energy / CleanTech
- Hospitality & Travel
# ── Market Segments ──
- Enterprise
- Mid-Market
- SMB / SME
- Startup
- Fortune 500
- Public Sector
- International / Global
# ── Business Models ──
- Subscription / SaaS
- Marketplace
- Usage-based Pricing
- Professional Services
- Self-serve / PLG
keywords:
# ── CS Metrics & Outcomes ──
- NPS
- CSAT
- CES
- Churn Rate
- Net Revenue Retention
- Gross Revenue Retention
- Logo Retention
- Time-to-Value
- Product Adoption
- Feature Utilisation
- Health Score
- Customer Lifetime Value
# ── Sales & Growth ──
- ARR
- MRR
- GRR
- NRR
- Expansion ARR
- Pipeline Coverage
- Win Rate
- Average Contract Value
- Land & Expand
- Multi-threading
# ── Process & Delivery ──
- Onboarding
- Implementation
- Knowledge Transfer
- Escalation
- SLA
- Root Cause Analysis
- Post-mortem
- Runbook
- Playbook Development
- Feedback Loop
- Product Roadmap Input
# ── Team & Culture ──
- Cross-functional
- Distributed Team
- Remote-first
- High-growth
- Fast-paced
- Autonomous
- Data-driven
- Customer-centric
- Empathetic Leadership
- Inclusive Culture
# ── Job-seeker Keywords ──
- Strategic
- Proactive
- Hands-on
- Scalable Processes
- Operational Excellence
- Business Impact
- Executive Visibility
- Player-Coach
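`load_suggestions()` swallows every exception and returns `[]`, so a malformed edit to this file would silently empty the multiselect rather than fail loudly. It also passes list entries through unchanged, and PyYAML's `safe_load` turns unquoted scalars such as `yes`, `on` or `2024` into booleans or integers rather than strings. A check like the hypothetical `validate_suggestions()` below could run in a test or pre-commit hook; it takes the already-parsed mapping so the sketch needs only the standard library. The duplicate rule (case-insensitive, within one category) is an assumption about what the list should enforce.

```python
from __future__ import annotations

from typing import Any

# The three categories load_suggestions() is documented to serve.
CATEGORIES = ("skills", "domains", "keywords")


def validate_suggestions(data: Any) -> list[str]:
    """Return human-readable problems found in a parsed suggestions mapping.

    `data` is whatever yaml.safe_load() returned for skills_suggestions.yaml.
    An empty list means every category is a list of distinct strings.
    """
    if not isinstance(data, dict):
        return [f"top level must be a mapping, got {type(data).__name__}"]
    problems: list[str] = []
    for category in CATEGORIES:
        items = data.get(category)
        if not isinstance(items, list):
            problems.append(f"{category}: expected a list, got {type(items).__name__}")
            continue
        seen: set[str] = set()
        for i, item in enumerate(items):
            # Unquoted YAML scalars like `yes` or `2024` load as bool/int, not str.
            if not isinstance(item, str):
                problems.append(f"{category}[{i}]: {item!r} is {type(item).__name__}, not str")
                continue
            key = item.casefold()
            if key in seen:
                problems.append(f"{category}[{i}]: duplicate {item!r}")
            seen.add(key)
    return problems
```

In a test, the input would come from something like `yaml.safe_load(path.read_text(encoding="utf-8"))`, asserting the result is an empty list.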

scripts/skills_utils.py

@@ -0,0 +1,67 @@
"""
skills_utils.py Content filter and suggestion loader for the skills tagging system.
load_suggestions(category) list[str] bundled suggestions for a category
filter_tag(tag) str | None cleaned tag, or None if rejected
"""
from __future__ import annotations
import re
from pathlib import Path
_SUGGESTIONS_FILE = Path(__file__).parent.parent / "config" / "skills_suggestions.yaml"
# ── Content filter ─────────────────────────────────────────────────────────────
# Tags must be short, human-readable skill/domain labels. No URLs, no abuse.
_BLOCKED = {
# profanity placeholder — extend as needed
"fuck", "shit", "ass", "bitch", "cunt", "dick", "bastard", "damn",
}
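# Scheme and "www." match case-insensitively; bare .com/.net/.org match lowercase
# only, so common tech tags such as ".NET" and "ASP.NET" are not rejected as URLs.
_URL_RE = re.compile(r"(?i:https?://|www\.)|\.(?:com|net|org)\b")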
_ALLOWED_CHARS = re.compile(r"^[\w\s\-\.\+\#\/\&\(\)]+$", re.UNICODE)


def filter_tag(raw: str) -> str | None:
    """Return a cleaned tag string, or None if the tag should be rejected.

    Rejection criteria:
      - Blank after stripping
      - Too short (< 2 chars) or too long (> 60 chars)
      - Contains a URL pattern
      - Contains disallowed characters
      - Matches a blocked term (case-insensitive, whole-word)
      - Run of 5+ identical characters (e.g. 'aaaaa')
    """
    tag = " ".join(raw.strip().split())  # normalise whitespace
    if not tag or len(tag) < 2:
        return None
    if len(tag) > 60:
        return None
    if _URL_RE.search(tag):
        return None
    if not _ALLOWED_CHARS.match(tag):
        return None
    lower = tag.lower()
    for blocked in _BLOCKED:
        if re.search(rf"\b{re.escape(blocked)}\b", lower):
            return None
    if re.search(r"(.)\1{4,}", lower):  # 5+ repeated chars
        return None
    return tag


# ── Suggestion loader ──────────────────────────────────────────────────────────
def load_suggestions(category: str) -> list[str]:
    """Return the bundled suggestion list for a category ('skills'|'domains'|'keywords').

    Returns an empty list if the file is missing, cannot be read or parsed
    (including when PyYAML is not installed), or the category is not found.
    """
    if not _SUGGESTIONS_FILE.exists():
        return []
    try:
        import yaml  # PyYAML, imported lazily so a missing install degrades to []
        # Decode explicitly as UTF-8 rather than relying on the platform default.
        data = yaml.safe_load(_SUGGESTIONS_FILE.read_text(encoding="utf-8")) or {}
        return list(data.get(category, []))
    except Exception:
        return []
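The YAML header notes that users can add custom tags beyond the bundled list. `filter_tag()` cleans or rejects a single string but does not check it against tags that are already selected, so a caller merging typed input presumably needs to do that itself. The sketch below assumes a hypothetical `merge_custom_tags()` helper; the filter is passed in as a parameter so the example runs on its own, whereas in the app it would be `skills_utils.filter_tag`. Treating "sql" and "SQL" as the same tag (via `casefold()`) is an assumed policy, not something this commit specifies.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional


def merge_custom_tags(
    current: list[str],
    raw_inputs: Iterable[str],
    tag_filter: Callable[[str], Optional[str]],
) -> tuple[list[str], list[str]]:
    """Append user-typed tags that pass `tag_filter`, skipping case-insensitive duplicates.

    Returns (merged tags, rejected raw inputs). The order of `current` is preserved
    and new tags are appended in input order.
    """
    merged = list(current)
    seen = {t.casefold() for t in merged}
    rejected: list[str] = []
    for raw in raw_inputs:
        cleaned = tag_filter(raw)
        if cleaned is None:
            # The filter refused this input outright (blank, URL, profanity, ...).
            rejected.append(raw)
            continue
        key = cleaned.casefold()
        if key not in seen:
            merged.append(cleaned)
            seen.add(key)
    return merged, rejected
```

Keeping rejected inputs separate from silently skipped duplicates lets a UI explain why a tag was refused without nagging the user about tags they already have.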