CIRCA is the product surface. The AI Risk Analyst is the reasoning engine underneath it.
Read together, they resolve into one system: a domain-configurable credit-risk co-analyst that turns portfolio data into cited analysis, explanations, and committee-ready outputs across chat, workspace, and canvas.
The Problem
Collections managers and CROs want to understand the health of a loan product (e.g., "how is the B2C book doing?"). Today, they have to open 5 different dashboards, scan dozens of charts, and manually reconcile numbers across delinquency, flow rates, bucket mix, and resolution trends to form their own judgment.
This system collapses that entire workflow into a single conversational question and returns a structured, analyst-quality written brief in seconds.
Three-Layer Design
1. Data Layer
A PostgreSQL database with 15 collections-specific schemas (portfolio, flows, movements, mix impact). Tables hold live monthly snapshots broken down by product, delinquency bucket, partner, and geography. It utilizes dynamic MAX() queries to handle relative timeframes without hardcoding.
2. Tool Layer
A registry of 42 tools wrapping CRUD functions. Each tool has a name, a description for the LLM, a Pydantic schema for parameters, and a markdown knowledge file explaining the data semantics. The tool-call pattern prevents hallucination: the LLM first calls getFilterOptions to discover valid values before querying.
3. Skills Layer
Sitting on top of the tools, "Skills" are named expertise packages. Triggered by keyword intent, a skill narrows the active tool whitelist (reducing LLM confusion/tokens) and injects highly specific reasoning instructions. For example, the product_overview skill enforces an exact 4-paragraph brief format (Verdict, Key Observations, Observation, Focus) and requires the LLM to run a strict analyst reconciliation checklist before generating text.
Prompt Assembly & Execution
When a query arrives, the intent router matches it to a skill. The Prompt Assembler builds the context in layers:
- The Soul: Business context, bucket lifecycles, metric thresholds, and rules.
- Schema Knowledge: Column definitions and interpretation guides.
- Skill Body: Expert reasoning instructions.
- Tool Knowledge: Blocks for specific active tools.
- Mode Rules:
risk-analystmode for compact tables vsdata-vizmode for rich charting.
The agentic loop streams via Azure OpenAI, executing tools in-process via FastAPI, and finally streaming structured markdown to the frontend.
Automated Evaluation Framework
To prevent agent drift and hallucination, I built a standalone LLM-judge evaluation studio. It tests the agent against curated lending queries, scoring responses based on four strict metrics:
| Metric | Weight | Description & Red Flags |
|---|---|---|
| Data Accuracy | 40% | Numbers cited must exactly match tool results. Hallucinated KPIs trigger an automatic failure (0 score). |
| Narrative Fidelity | 30% | Prose direction must match data. E.g., saying "delinquency is stable" when data shows a 15% rise fails immediately. |
| Format Adherence | 20% | Enforces skill templates exactly. CRO briefs must be 4 paragraphs with no raw tables in the prose. |
| Tool Sequence | 10% | Verifies correct tool usage and ensures clarification steps are not skipped. |
The Impact
Cuts ad-hoc analysis time from hours to seconds. Gives CROs a self-service layer over portfolio data. Lets engineering ship new analytics without touching core infrastructure.