Kai Meredith

Personal Project / AI Infrastructure

Conductor: Local Agentic Inference Harness

Python, Ollama, Qwen3, ReAct, Rich TUI
The Mission

A fully local agentic inference harness built to run on a 2021 MacBook Pro. Conductor gives a local LLM a ReAct loop, persistent memory, a sandboxed workspace, and a set of tools (file operations, URL fetching, shell commands), with no external API calls and no data leaving the machine.

System Architecture

graph TD
    U[User Input] --> TUI[Rich TUI]
    TUI --> RL[ReAct Loop]
    RL -->|Think| LLM[Qwen3 via Ollama]
    LLM -->|Act| TD2[Tool Dispatcher]
    TD2 --> FS[File Ops]
    TD2 --> WF[URL Fetch]
    TD2 --> SH[Shell Commands]
    TD2 --> MEM[Persistent Memory]
    FS & WF & SH & MEM -->|Observe| RL
    RL -->|Max 8 steps| OUT[Final Response]
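The Think, Act, Observe cycle in the diagram can be sketched roughly as follows. This is a minimal illustration, not Conductor's actual code: the function name `react_loop`, the injected `llm` callable, the `tools` dictionary, the `FINAL:` prefix, and the `tool_name: argument` reply format are all assumptions made for the example. Only the 8-step cap and the partial-answer fallback come from the project description.

```python
from typing import Callable, Dict

MAX_STEPS = 8  # step cap taken from the project description


def react_loop(
    question: str,
    llm: Callable[[str], str],               # hypothetical: prompt in, reply out
    tools: Dict[str, Callable[[str], str]],  # hypothetical tool registry
    max_steps: int = MAX_STEPS,
) -> str:
    """Run Think -> Act -> Observe until the model answers or the cap is hit."""
    transcript = [f"Question: {question}"]
    last_observation = None

    for _ in range(max_steps):
        # Think: ask the model what to do next, given everything so far.
        reply = llm("\n".join(transcript))
        transcript.append(reply)

        # Assumed convention: a reply starting with "FINAL:" ends the loop.
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()

        # Act: assumed reply format is "tool_name: argument".
        name, _, arg = reply.partition(":")
        tool = tools.get(name.strip())
        if tool is None:
            observation = f"unknown tool {name.strip()!r}"
        else:
            observation = tool(arg.strip())

        # Observe: feed the tool result back into the next prompt.
        last_observation = observation
        transcript.append(f"Observation: {observation}")

    # Cap reached: degrade gracefully with the latest partial result.
    return f"Step limit reached; best partial result: {last_observation}"
```

In a real harness the `llm` callable would wrap a request to the local Ollama server. Injecting it as a plain function keeps the loop testable without a model running.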

Key Engineering Challenges

  • Two-model architecture rejected: Benchmarked a dispatcher model (qwen2.5-coder:7b) against running Qwen3:9b directly for tool routing. Qwen3 outperformed the smaller model on coding tasks while staying within the 16 GB memory constraint, making the added complexity unnecessary.
  • Workspace sandboxing: All file and shell tools operate within a scoped workspace directory. Paths are validated before execution to prevent traversal outside the sandbox, following OWASP guidelines for tool safety.
  • ReAct loop cap: The loop is capped at 8 reasoning steps to prevent runaway inference chains, with graceful degradation that returns the best available partial answer if the cap is hit.
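The path validation described under workspace sandboxing might look something like the sketch below. It is an assumption-laden illustration rather than Conductor's implementation: `resolve_in_workspace` and `SandboxError` are hypothetical names, and the approach shown (resolve the path, then check that it is still under the workspace root) is one common way to block traversal, not necessarily the one the project uses.

```python
from pathlib import Path


class SandboxError(ValueError):
    """Raised when a requested path would escape the workspace (hypothetical)."""


def resolve_in_workspace(workspace: str, user_path: str) -> Path:
    """Return an absolute path inside `workspace`, or raise SandboxError.

    Resolving before checking collapses ".." segments and follows symlinks,
    so "notes/../../secret" and absolute paths are both caught.
    """
    root = Path(workspace).resolve()
    # Joining with an absolute user_path discards root, which the
    # containment check below then rejects.
    candidate = (root / user_path).resolve()

    # Path.is_relative_to requires Python 3.9+.
    if not candidate.is_relative_to(root):
        raise SandboxError(f"path escapes workspace: {user_path!r}")
    return candidate
```

A file or shell tool would call a helper like this on every path argument before touching the filesystem, for example `resolve_in_workspace("workspace", "notes/todo.txt")`, and surface `SandboxError` back to the model as an observation instead of executing the action.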