System Architecture

CIRCA is the product surface. The AI Risk Analyst is the reasoning engine underneath it.

Read together, they resolve into one system: a domain-configurable credit-risk co-analyst that turns portfolio data into cited analysis, explanations, and committee-ready outputs across chat, workspace, and canvas.

The Problem

Collections managers and CROs want to understand the health of a loan product (e.g., "how is the B2C book doing?"). Today, they have to open 5 different dashboards, scan dozens of charts, and manually reconcile numbers across delinquency, flow rates, bucket mix, and resolution trends to form their own judgment.

This system collapses that entire workflow into a single conversational question and returns a structured, analyst-quality written brief in seconds.

Three-Layer Design

1. Data Layer

A PostgreSQL database with 15 collections-specific schemas (portfolio, flows, movements, mix impact). Tables hold live monthly snapshots broken down by product, delinquency bucket, partner, and geography. It utilizes dynamic MAX() queries to handle relative timeframes without hardcoding.

2. Tool Layer

A registry of 42 tools wrapping CRUD functions. Each tool has a name, a description for the LLM, a Pydantic schema for parameters, and a markdown knowledge file explaining the data semantics. The tool-call pattern prevents hallucination: the LLM first calls getFilterOptions to discover valid values before querying.

3. Skills Layer

Sitting on top of the tools, "Skills" are named expertise packages. Triggered by keyword intent, a skill narrows the active tool whitelist (reducing LLM confusion/tokens) and injects highly specific reasoning instructions. For example, the product_overview skill enforces an exact 4-paragraph brief format (Verdict, Key Observations, Observation, Focus) and requires the LLM to run a strict analyst reconciliation checklist before generating text.

Prompt Assembly & Execution

When a query arrives, the intent router matches it to a skill. The Prompt Assembler builds the context in layers:

  • The Soul: Business context, bucket lifecycles, metric thresholds, and rules.
  • Schema Knowledge: Column definitions and interpretation guides.
  • Skill Body: Expert reasoning instructions.
  • Tool Knowledge: Blocks for specific active tools.
  • Mode Rules: risk-analyst mode for compact tables vs data-viz mode for rich charting.

The agentic loop streams via Azure OpenAI, executing tools in-process via FastAPI, and finally streaming structured markdown to the frontend.

Automated Evaluation Framework

To prevent agent drift and hallucination, I built a standalone LLM-judge evaluation studio. It tests the agent against curated lending queries, scoring responses based on four strict metrics:

MetricWeightDescription & Red Flags
Data Accuracy40%Numbers cited must exactly match tool results. Hallucinated KPIs trigger an automatic failure (0 score).
Narrative Fidelity30%Prose direction must match data. E.g., saying "delinquency is stable" when data shows a 15% rise fails immediately.
Format Adherence20%Enforces skill templates exactly. CRO briefs must be 4 paragraphs with no raw tables in the prose.
Tool Sequence10%Verifies correct tool usage and ensures clarification steps are not skipped.

The Impact

Cuts ad-hoc analysis time from hours to seconds. Gives CROs a self-service layer over portfolio data. Lets engineering ship new analytics without touching core infrastructure.