Local LLMs on CPU
Developers running small local models (such as phi4-mini) face a fundamental trade-off between privacy and reliability. Small models struggle with structural adherence (emitting valid JSON) and semantic nuance (category drift), and on CPU hardware every token is expensive, so long, descriptive system prompts become a performance bottleneck.
To build a production-grade system on constrained hardware, we must move from prompt engineering to system engineering.
1. Signatures as contracts
DSPy allows us to stop treating the LLM as a chatbot and start treating it as a software component. We define a signature: a typed interface that the model must satisfy. This reduces the surface area for hallucinations.
```python
class SorterSignature(dspy.Signature):
    """Analyze input and determine its type, subject, and topic."""

    input = dspy.InputField()
    reasoning = dspy.OutputField(desc="Chain of thought")
    type = dspy.OutputField(desc="MUST be one of: project, log, entity...")
    confidence = dspy.OutputField(desc="Score 0.0-1.0")
```

By replacing fragile prompt tweaking with this compiled signature, we moved from ~70% accuracy to a 100% pass rate on our internal BDD test suite.
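A signature only shrinks the hallucination surface if its outputs are actually checked. A minimal sketch of such a guard, using the field names from the signature above (the article elides the full type list, so the set below is an assumption):

```python
# Assumed subset of allowed types; the article's full list is elided ("entity...").
ALLOWED_TYPES = {"project", "log", "entity"}

def validate(output: dict) -> bool:
    """Check that raw model output satisfies the signature's contract:
    a known type and a parseable confidence score in [0.0, 1.0]."""
    if output.get("type") not in ALLOWED_TYPES:
        return False
    try:
        score = float(output.get("confidence", ""))
    except ValueError:
        return False
    return 0.0 <= score <= 1.0
```

Outputs that fail this check can be retried or escalated, rather than silently written to disk.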
2. The safety loop (confidence gates)
In a rigorous system, we acknowledge that probabilistic models will fail. The critical mechanism here is the confidence gate. Any classification with a score below 0.8 is routed to a manual review folder.
This turns a potential AI failure into a simple asynchronous triage task, ensuring that low-confidence guesses never pollute the permanent file structure.
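The gate itself is a few lines of routing logic. A sketch, assuming the classification dict carries the signature's output fields and that folder names like `review` are illustrative:

```python
from pathlib import Path

CONFIDENCE_THRESHOLD = 0.8  # below this, a human decides

def route(classification: dict, root: Path = Path("notes")) -> Path:
    """Return the destination folder for a classified note.
    Low-confidence (or unparseable) results go to manual review."""
    try:
        confidence = float(classification.get("confidence", 0.0))
    except ValueError:
        confidence = 0.0  # an unparseable score is treated as zero confidence
    if confidence < CONFIDENCE_THRESHOLD:
        return root / "review"
    return root / classification["type"]
```

Treating a malformed confidence score as zero means the gate fails safe: anything the model cannot express cleanly lands in triage, never in the permanent structure.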
3. The async pattern
Running on a CPU-bound machine results in ~9 seconds of latency per classification. In a synchronous UI, this is a failure. However, by running the system as a background watchdog process, the latency becomes irrelevant. The user captures a thought and returns to their work; the system handles the heavy lifting out of sight, proving that high-end GPUs are not a requirement for agentic workflows.
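The watchdog pattern can be sketched with a plain worker thread and a queue; the `classify` stub below stands in for the slow (~9 s) model call, and all names are illustrative:

```python
import queue
import threading

def classify(note: str) -> dict:
    # Placeholder for the slow, CPU-bound model call.
    return {"note": note, "type": "log", "confidence": 0.9}

class Watchdog:
    """Accepts captures instantly; classifies them in the background."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.results = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            note = self.inbox.get()
            if note is None:  # sentinel: shut down
                break
            self.results.append(classify(note))

    def capture(self, note: str):
        # Returns immediately; the user never waits on the model.
        self.inbox.put(note)

    def stop(self):
        self.inbox.put(None)
        self._worker.join()
```

Because `capture` only enqueues, the 9-second classification cost is invisible to the user; the queue also absorbs bursts, since notes are processed strictly in arrival order.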