What Building Agentic Workflows In-House Actually Means for RIAs
If your RIA is building AI tools in-house, you just became a product manager. A practical checklist covering scope, costs, integrations, security, compliance, and more.

Theo Katsoulis
Founder & CEO
Before Prodos, I ran operations at Playbook, a robo-advisor that scaled past 200K accounts while AUM grew 30% a month. I built our compliance program, survived an SEC exam, integrated 40+ vendors, and shipped internal tools as the ops lead. Most of what worked came from partnering closely with our lead engineer, now my co-founder at Prodos. I've made most of the mistakes below already.
If you've decided to build something in-house (an agent, a custom tool, an automation nobody else has bothered to solve), you just became a product manager. You probably didn't sign up for that; this post is about what that actually means.
The PM job has four parts. Most firms only think about the last one.
1/ Scope before you build
Firms that ship pick one problem, write it down in two sentences, and explicitly list what they are not building yet. Firms that don't ship start with "extract data from statements" and end up halfway through a reporting layer, a client portal, and a Slackbot. Pick the smallest useful thing. Ship it. Move on.
At Playbook, the smallest useful thing was Retool. The UI wasn't pretty, but as the only ops person at a startup growing AUM 30% a month, it let me build functional tools without pulling in an engineer for every change: enter a client email, run a few queries, pull account details, surface compliance flags, draft an outbound email. One input, one click. Not perfect, but it covered 80% of my day-to-day, and that's what mattered.
And before you touch code, model token costs. LLMs aren't priced like software. They're a meter. Cap your daily spend, set alerts, and cap agent loop turns (we use ten). An agent that can't reach an answer will keep burning your budget trying. Nobody else is going to catch that for you. That's the PM job.
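To make the spend cap, alert, and turn cap concrete, here is a minimal sketch in Python. Everything in it is illustrative: `BudgetGuard`, `run_agent`, the `step` callback, and the per-token prices are hypothetical names and placeholder numbers, not any provider's API or real rates. Only the ten-turn cap comes from our own setup.

```python
import logging
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when the daily spend cap or the turn cap is hit."""


@dataclass
class BudgetGuard:
    daily_cap_usd: float      # hard stop for the day (placeholder value)
    alert_at_usd: float       # soft threshold that triggers one alert
    usd_per_1k_input: float   # placeholder price, not a real rate
    usd_per_1k_output: float  # placeholder price, not a real rate
    max_turns: int = 10       # the loop cap mentioned above
    spent_usd: float = 0.0
    alerted: bool = False

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one model call to today's spend and enforce the limits."""
        cost = (
            input_tokens / 1000 * self.usd_per_1k_input
            + output_tokens / 1000 * self.usd_per_1k_output
        )
        self.spent_usd += cost
        if self.spent_usd >= self.alert_at_usd and not self.alerted:
            self.alerted = True
            logging.warning("LLM spend alert: $%.2f so far today", self.spent_usd)
        if self.spent_usd > self.daily_cap_usd:
            raise BudgetExceeded(f"daily cap of ${self.daily_cap_usd:.2f} exceeded")
        return cost


def run_agent(step, guard: BudgetGuard):
    """Call step(turn) until it returns an answer or a limit is hit.

    step is a hypothetical callback that makes one model call and returns
    (answer_or_None, input_tokens, output_tokens).
    """
    for turn in range(1, guard.max_turns + 1):
        answer, input_tokens, output_tokens = step(turn)
        guard.record(input_tokens, output_tokens)
        if answer is not None:
            return answer
    raise BudgetExceeded(f"no answer after {guard.max_turns} turns")
```

The point of the design is that the loop fails loudly. An agent that can't converge raises an error someone sees instead of quietly running up the meter.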
2/ The systems you're building on top of
Three things decide whether the build survives contact with reality: integrations, data quality, and security.
Integrations. Most fintech "integrations" don't move data. They move notifications. At Playbook we integrated 40+ vendors and learned it the hard way.
My personal favorite exchange: "Yes, we have an API." Okay, can we use it to write to a client profile? "No, that won't work. You either enter it manually or pay our services team $50 an hour to do it." Great.
That isn't an integration problem; it's a product problem. Get the API documentation before you sign the contract.
Data quality. This is where most builds quietly fail. AI doesn't fix bad data. It scales it. Inconsistent account numbers, three spellings of the same client's name, PDFs in ten folders nobody has touched in years. All of it gets amplified the moment you put an agent on top of it.
A concrete example: if your vendor API accepts any text value, the LLM will store a birthday as "December 1st, 1990" one day and "12-01-1990" the next. Now every downstream workflow (birthday emails, RMD timing, Social Security planning) is broken in ways that are hard to find and harder to clean.
Use a validation layer (Zod or Pydantic if you're coding, or Typeform's built-in rules) that rejects malformed values at the door.
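Here is roughly what "rejects malformed values at the door" looks like for the birthday example. This sketch uses only the Python standard library so it runs anywhere; Pydantic or Zod would give you the same behavior with less code. The `ClientProfileUpdate` class and its field names are hypothetical, and the choice of `YYYY-MM-DD` as the one accepted format is an assumption.

```python
import re
from dataclasses import dataclass
from datetime import date

# Accept exactly one date format: four-digit year, two-digit month and day.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


@dataclass(frozen=True)
class ClientProfileUpdate:
    client_id: str
    birth_date: date

    @classmethod
    def from_raw(cls, raw: dict) -> "ClientProfileUpdate":
        """Validate LLM output before it is written to the vendor system."""
        client_id = str(raw.get("client_id", "")).strip()
        if not client_id:
            raise ValueError("client_id is required")

        value = raw.get("birth_date")
        if not isinstance(value, str) or not ISO_DATE.fullmatch(value):
            raise ValueError(f"birth_date must be YYYY-MM-DD, got {value!r}")

        # fromisoformat also rejects impossible dates such as 1990-02-30.
        parsed = date.fromisoformat(value)
        if parsed > date.today():
            raise ValueError("birth_date cannot be in the future")
        return cls(client_id=client_id, birth_date=parsed)
```

With this in place, `"1990-12-01"` is stored as a real date, while `"December 1st, 1990"` and `"12-01-1990"` both raise `ValueError`. When a value is rejected, send it back to the model or to a human for correction. Don't write it.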
Security. You're not running models on-prem (almost no RIA is). You're using cloud providers, and that's fine as long as you're deliberate. Two things to nail down. First, when you chain tools together, especially MCP (Model Context Protocol) servers from newer providers, you're implicitly trusting everyone in that chain. For anything touching client PII, only use providers you can actually vet. Second, most major LLM providers offer enterprise agreements that restrict retention and don't train on your inputs. Consumer tiers often don't. That's the distinction that matters, not cloud vs. on-prem. Zero-data-retention settings go further, but you'll need to store conversation context somewhere yourself. Otherwise the agent restarts blind every turn.
The bigger threat at most firms isn't a vendor breach anyway. It's employees already sharing client data over email and Teams, or falling for phishing links. Your new system should make that less likely, not more.
3/ Running it like a firm, not a dev shop
Permissions. When your agent acts under your personal login, who did it? Six months in, you won't be able to tell a human action from an agent action. That's a problem when compliance comes asking. Give the agent its own credentials and its own logging, even if the tools make it ugly. Proof of concept can share access; production can't. This is where most internal builds fail an audit. Not because the tech was bad, but because nobody owned the accountability question early enough.
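One way to keep human and agent actions distinguishable is to stamp every log entry with who acted and what kind of actor it was. The sketch below is illustrative only: the `audit_record` function, the field names, and the rule that an agent entry must name the human who initiated it are assumptions, not a regulatory format.

```python
import json
from datetime import datetime, timezone

ACTOR_TYPES = {"human", "agent"}


def audit_record(actor_id, actor_type, action, target, on_behalf_of=None):
    """Build one JSON audit line that says who (or what) took an action."""
    if actor_type not in ACTOR_TYPES:
        raise ValueError(f"actor_type must be one of {sorted(ACTOR_TYPES)}")
    if actor_type == "agent" and not on_behalf_of:
        # Assumption: every agent action traces back to a responsible person.
        raise ValueError("agent actions must name the initiating human")
    return json.dumps(
        {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "actor_id": actor_id,        # the agent's own credential, not yours
            "actor_type": actor_type,
            "on_behalf_of": on_behalf_of,
            "action": action,
            "target": target,
        },
        sort_keys=True,
    )
```

A call such as `audit_record("svc-onboarding-agent", "agent", "update_address", "client:1042", on_behalf_of="jsmith")` yields a line compliance can filter on six months later. The identifiers in that call are made up.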
Tribal knowledge. If you can't explain the process, you can't automate it. Most of how your firm actually works isn't written down. It lives in someone's head, or a Word doc nobody's updated in three years, or the habits of the one ops person who's been there since the beginning. Document the steps, edge cases, and judgment calls before you build.
If writing feels hard, record yourself explaining the process for 15 minutes and have AI extract the steps. Faster than it sounds, and it surfaces details you'd otherwise forget.
This problem scales from one-person ops teams to firms with hundreds of engineers. At MassChallenge FinTech 2025, both Citizens and MassMutual issued challenges specifically around agentic AI in regulated workflows: autonomous workflows, customer onboarding, operations automation, governance. We made MassMutual's final round. The takeaway: nobody has this figured out yet.
4/ Proving it works
Testing. AI fails in non-obvious ways. A model upgrade, a prompt tweak, a vendor change. Any of those can quietly shift outputs.
One of our integrations changed how it stored addresses and didn't notify us. Our API calls kept returning "success," but the address field was silently dropped. No error, no warning.
That's the failure mode. The process doesn't crash; it just gets quieter. We caught it quickly, but without testing or guardrails, this kind of issue is hard to detect until it turns into a compliance problem.
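A cheap guardrail against this kind of silent drop is to read the record back after each write and compare it with what you sent. The sketch below assumes you can fetch the stored record as a dictionary; `verify_write` is a hypothetical helper, not part of any vendor SDK.

```python
def verify_write(sent: dict, stored: dict) -> list:
    """Return the fields that were sent but are missing or different in
    what the vendor actually stored. An empty list means the write held."""
    return sorted(
        field for field, value in sent.items() if stored.get(field) != value
    )


# Example: the API said "success" but the address never landed.
sent = {"name": "Ana Diaz", "address": "1 Main St"}
stored = {"name": "Ana Diaz"}
dropped = verify_write(sent, stored)  # ["address"]
```

Treat a non-empty result as a failure, even when the API response says "success".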
Define a small set of critical workflows that have to produce the same result every time (extract from a statement, fill a form, tier clients). Run them on every meaningful change. That's your regression suite.
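A regression suite can start as a list of known inputs and the outputs they must produce. The sketch below uses client tiering as the workflow. The `tier_client` thresholds are invented for illustration, and in a real build the workflow under test would be your agent or extraction pipeline, not a three-line function.

```python
def tier_client(aum_usd: float) -> str:
    """Hypothetical tiering rule; the thresholds are placeholders."""
    if aum_usd >= 1_000_000:
        return "A"
    if aum_usd >= 250_000:
        return "B"
    return "C"


# Each case: (name, input, expected output). Include the boundaries.
GOLDEN_CASES = [
    ("well above top tier", 5_000_000, "A"),
    ("exactly at top threshold", 1_000_000, "A"),
    ("just below top threshold", 999_999, "B"),
    ("small account", 10_000, "C"),
]


def run_regression(workflow, cases):
    """Run every case and return (name, expected, actual) for each mismatch."""
    failures = []
    for name, inputs, expected in cases:
        actual = workflow(inputs)
        if actual != expected:
            failures.append((name, expected, actual))
    return failures


failures = run_regression(tier_client, GOLDEN_CASES)
```

Run it after every model upgrade, prompt change, or vendor update, and block the release if `failures` is not empty.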
Monitoring. Most major LLM providers have dashboards that log inputs, outputs, reasoning, and actions. If compliance signs off on the provider holding that data, use it. Lowest-effort path, and enough for most firms starting out.
Compliance by design. Take a 401(k) rollover as an example. Before recommending one, you need to show you asked the right questions (DOL guidance), understood the existing plan, and weighed fees. That evidence lives in an email thread, a meeting transcript, and a signed disclosure. If AI is already in those places, pulling the audit trail becomes a byproduct, not a project.
At Playbook, when the SEC examined us, producing a trade blotter and account detail spreadsheet took me a few SQL queries and under 30 minutes. Our engineers had built logging from day one. That's what compliance by design actually looks like. When the infrastructure is right, the hard stuff is routine.
The ask
This is harder than getting ChatGPT to generate a table. Scope, data, integrations, permissions, testing, compliance. Somebody owns every one of those. If you're building in-house, that somebody is you.
That's the PM job you didn't sign up for. Now you know.
If you're about to greenlight a build, send me the scoping doc (founders@prodos.dev). I've made these mistakes already.

