Ralph Universal

The RalphUniversal class tracks adapter effectiveness across task categories and generates data-driven recommendations for adapter selection.

Interface

interface RalphUniversal {
  recordUsage(taskCategory: string, adapter: string, success: boolean): void
  getRecommendation(taskCategory: string): AdapterRecommendation
  getEffectiveness(adapter: string): AdapterEffectiveness
  getReport(): RalphReport
}
 
interface AdapterRecommendation {
  adapter: string
  confidence: number    // 0–100
  reason: string
  alternatives: Array<{ adapter: string; confidence: number }>
}
 
interface AdapterEffectiveness {
  adapter: string
  totalUses: number
  successRate: number
  byCategory: Record<string, { uses: number; successRate: number }>
  trend: 'improving' | 'stable' | 'declining'
}
 
interface RalphReport {
  totalEpisodes: number
  adapters: AdapterEffectiveness[]
  categories: Record<string, { bestAdapter: string; sampleSize: number }>
  generatedAt: Date
}
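
As a rough illustration of how recordUsage and getEffectiveness relate, the following is a minimal in-memory sketch, not the actual implementation. The class name InMemoryRalph, the internal log, and the trend rule (comparing the older half of an adapter's episodes with the newer half) are assumptions made for this example; getRecommendation and getReport are left out.

```typescript
type Trend = 'improving' | 'stable' | 'declining'

interface AdapterEffectiveness {
  adapter: string
  totalUses: number
  successRate: number
  byCategory: Record<string, { uses: number; successRate: number }>
  trend: Trend
}

// Hypothetical tracker covering only recordUsage and getEffectiveness.
class InMemoryRalph {
  private log: Array<{ taskCategory: string; adapter: string; success: boolean }> = []

  recordUsage(taskCategory: string, adapter: string, success: boolean): void {
    this.log.push({ taskCategory, adapter, success })
  }

  getEffectiveness(adapter: string): AdapterEffectiveness {
    const uses = this.log.filter((e) => e.adapter === adapter)
    // Fraction of successful episodes; 0 when there are none.
    const rate = (xs: Array<{ success: boolean }>): number =>
      xs.length === 0 ? 0 : xs.filter((e) => e.success).length / xs.length

    const byCategory: Record<string, { uses: number; successRate: number }> = {}
    for (const e of uses) {
      if (byCategory[e.taskCategory]) continue // category already summarized
      const inCategory = uses.filter((u) => u.taskCategory === e.taskCategory)
      byCategory[e.taskCategory] = { uses: inCategory.length, successRate: rate(inCategory) }
    }

    // Assumed trend rule: compare the older half of the episodes with the newer half.
    const mid = Math.floor(uses.length / 2)
    const delta = rate(uses.slice(mid)) - rate(uses.slice(0, mid))
    const trend: Trend =
      uses.length < 4 || Math.abs(delta) < 0.1 ? 'stable' : delta > 0 ? 'improving' : 'declining'

    return { adapter, totalUses: uses.length, successRate: rate(uses), byCategory, trend }
  }
}
```

Storing raw episodes rather than running totals keeps the sketch short; a real tracker would more likely keep aggregated counters.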

How It Works

Ralph maintains a rolling effectiveness matrix: rows are task categories, columns are adapters, and each value is a success rate weighted by recency.

               claude-sonnet  claude-haiku  gpt-4o  local-7b
code              0.91          0.72        0.85     0.41
test              0.88          0.69        0.82     0.38
refactor          0.93          0.65        0.79     0.35
docs              0.85          0.81        0.87     0.52
research          0.89          0.74        0.91     0.44
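
One way such a matrix could be built is sketched below. The source does not say how recency weighting is computed, so the exponential decay, the 14-day half-life, and the Episode shape (including ageInDays) are all assumptions for illustration.

```typescript
// Hypothetical episode shape; field names are assumptions for this sketch.
interface Episode {
  taskCategory: string
  adapter: string
  success: boolean
  ageInDays: number // how long ago the episode was recorded
}

// Assumed weighting: an episode loses half its weight every `halfLifeDays`.
function recencyWeight(ageInDays: number, halfLifeDays = 14): number {
  return Math.pow(0.5, ageInDays / halfLifeDays)
}

// Builds matrix[category][adapter] = recency-weighted success rate.
function buildMatrix(
  episodes: Episode[],
  halfLifeDays = 14
): Record<string, Record<string, number>> {
  const sums: Record<string, Record<string, { weighted: number; total: number }>> = {}
  for (const ep of episodes) {
    const w = recencyWeight(ep.ageInDays, halfLifeDays)
    if (!sums[ep.taskCategory]) sums[ep.taskCategory] = {}
    const row = sums[ep.taskCategory]
    if (!row[ep.adapter]) row[ep.adapter] = { weighted: 0, total: 0 }
    row[ep.adapter].total += w
    if (ep.success) row[ep.adapter].weighted += w
  }

  const matrix: Record<string, Record<string, number>> = {}
  for (const category of Object.keys(sums)) {
    matrix[category] = {}
    for (const adapter of Object.keys(sums[category])) {
      const cell = sums[category][adapter]
      matrix[category][adapter] = cell.weighted / cell.total
    }
  }
  return matrix
}
```

Under this weighting, a recent success counts for more than an old failure, so an adapter that has improved lately drifts upward in its row.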

`adapter_recommend` Tool

The recommendation engine is exposed as a tool that Nity calls during strategy planning:

{
  "name": "adapter_recommend",
  "description": "Get adapter recommendation for a task category",
  "parameters": {
    "taskCategory": {
      "type": "string",
      "enum": ["code", "test", "refactor", "docs", "research"],
      "description": "The category of task to execute"
    }
  }
}

Response

{
  "adapter": "claude-sonnet",
  "confidence": 87,
  "reason": "Highest success rate for code tasks (91% across 47 episodes)",
  "alternatives": [
    { "adapter": "gpt-4o", "confidence": 72 },
    { "adapter": "claude-haiku", "confidence": 58 }
  ]
}

Ralph's recommendations improve over time. Early sessions rely on heuristics; as the episode count grows, recommendations become data-driven. The confidence score also reflects sample size: a category with fewer than 5 episodes yields lower confidence regardless of its success rate.
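
To make the response shape concrete, here is a hedged sketch of how a response like the one above might be assembled from already-scored candidates. The Candidate type and the toRecommendation helper are hypothetical, and ranking by confidence is an assumption; the source does not describe the actual selection logic.

```typescript
interface AdapterRecommendation {
  adapter: string
  confidence: number // 0–100
  reason: string
  alternatives: Array<{ adapter: string; confidence: number }>
}

// Hypothetical per-adapter input for one task category.
interface Candidate {
  adapter: string
  confidence: number
  successRate: number // 0 to 1
  episodes: number
}

// Assumed rule: the highest-confidence candidate wins, the rest become alternatives.
function toRecommendation(taskCategory: string, candidates: Candidate[]): AdapterRecommendation {
  const ranked = candidates.slice().sort((a, b) => b.confidence - a.confidence)
  const best = ranked[0]
  if (best === undefined) {
    throw new Error(`No candidates recorded for category "${taskCategory}"`)
  }
  const percent = Math.round(best.successRate * 100)
  return {
    adapter: best.adapter,
    confidence: best.confidence,
    reason: `Top-ranked for ${taskCategory} tasks (${percent}% across ${best.episodes} episodes)`,
    alternatives: ranked.slice(1).map((c) => ({ adapter: c.adapter, confidence: c.confidence })),
  }
}
```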

Confidence Scoring

Confidence is calculated from three factors:

Factor         Weight   Description
Sample size    40%      More episodes = higher confidence
Success rate   40%      Higher rate = higher confidence
Recency        20%      Recent episodes weighted more heavily

confidence = (sampleFactor × 0.4) + (successFactor × 0.4) + (recencyFactor × 0.2)

Each factor is normalized to 0–100. Because the weights sum to 1, the resulting confidence also falls in the 0–100 range.
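
The formula can be sketched in code as follows. The weights come from the table above; how each factor is normalized is not specified in the source, so the linear sampleFactor that saturates at 50 episodes is purely an assumption, as are the helper names.

```typescript
// Weights from the factor table: 40% sample size, 40% success rate, 20% recency.
const WEIGHTS = { sample: 0.4, success: 0.4, recency: 0.2 }

// Clamp a factor into the 0–100 range the formula expects.
function clamp100(x: number): number {
  return Math.max(0, Math.min(100, x))
}

// Assumption: the sample factor grows linearly and saturates at `saturationAt` episodes.
function sampleFactor(episodes: number, saturationAt = 50): number {
  return clamp100((episodes / saturationAt) * 100)
}

// confidence = (sampleFactor × 0.4) + (successFactor × 0.4) + (recencyFactor × 0.2)
function confidenceScore(sampleF: number, successF: number, recencyF: number): number {
  return (
    clamp100(sampleF) * WEIGHTS.sample +
    clamp100(successF) * WEIGHTS.success +
    clamp100(recencyF) * WEIGHTS.recency
  )
}

// Three perfect episodes versus 47 episodes at a 91% success rate.
const fewEpisodes = confidenceScore(sampleFactor(3), 100, 100)
const manyEpisodes = confidenceScore(sampleFactor(47), 91, 100)
```

With these assumed normalizations, fewEpisodes comes out lower than manyEpisodes even though its success rate is higher, which is the sample-size effect the weighting is meant to capture. The exact values depend on the chosen saturation point and are not expected to match the figures in the sample response.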

Cross-Adapter Learning

Ralph detects when a new adapter outperforms the current recommendation and updates the effectiveness matrix. It also identifies task categories where adapter choice doesn't matter (flat effectiveness across adapters) and reports these as "adapter-agnostic" categories to save compute.
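
A simple way to flag adapter-agnostic categories is to measure the spread between the best and worst adapter in each row of the matrix. The sketch below assumes that approach; the function name and the 0.05 threshold are hypothetical, since the source does not define what counts as "flat".

```typescript
type EffectivenessMatrix = Record<string, Record<string, number>>

// Returns categories whose best and worst adapters differ by at most `maxSpread`.
// The 0.05 default is an assumed threshold, not a documented one.
function findAdapterAgnostic(matrix: EffectivenessMatrix, maxSpread = 0.05): string[] {
  const agnostic: string[] = []
  for (const category of Object.keys(matrix)) {
    const row = matrix[category]
    const rates = Object.keys(row).map((adapter) => row[adapter])
    if (rates.length < 2) continue // a single adapter cannot be compared
    const spread = Math.max(...rates) - Math.min(...rates)
    if (spread <= maxSpread) agnostic.push(category)
  }
  return agnostic
}
```

For a category flagged this way, a planner could pick the cheapest adapter without an expected loss in success rate, which is where the compute saving would come from.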
