What Building Agentic Workflows In-House Actually Means for RIAs
If your RIA is building AI tools in-house, you just became a product manager. A practical checklist covering scope, costs, integrations, security, compliance, and more.

Theo Katsoulis
Founder & CEO
If you lead an RIA, and you've decided to build something in-house (an agentic workflow, a custom tool, an automation that fills a gap no vendor has bothered to solve), you've just become a product manager. Whether you know it or not.
This post isn't here to talk you out of it. We've spent the last five years building software in the wealth tech space, and if you're going down this road, we want to give you a realistic map of what you're actually signing up for.
1/ Before You Write a Line of Code
Have you actually scoped what you're building and what it will cost?
Scope / MVP
What are you building first? Firms that try to build everything end up shipping nothing. Define the smallest useful version before anything else. It's easy to start with one clear problem and watch it become fifteen features before you've tested a single one. The bigger the scope, the harder it is to debug when something breaks… and something always breaks.
Pro tip: Write down in two sentences or less the smallest problem you're trying to solve. Then write what you are explicitly NOT building yet. If you're extracting data from client documents, don't add a reporting layer until the extraction works reliably. Scope creep is where most internal builds quietly die.
LLM & API Costs
This isn't a one-time purchase; it's a meter that's always running. Every query, every document processed, every API call adds up. Most firms don't think about token costs until they see the first bill. Model it out before you build, not after.
Pro tip: Create a separate API key for testing so you can track costs by use case. Set a hard cap on agent loop turns (we default to ten) because an agent that can't reach an answer will keep trying until you stop it. Cap your daily spend at something you're comfortable with and set up alerts. Small guardrails, but they'll save you from waking up to a surprise bill because your agent looped overnight.
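Both guardrails are a few lines of code. Here's a minimal sketch in Python, assuming your LLM call returns a (possibly unfinished) answer plus a per-call cost; the names, the $25 budget, and the `call_model` interface are illustrative, not any particular provider's API:

```python
# Sketch of the two guardrails above: a hard cap on agent loop turns
# and a daily spend budget that stops the loop before the bill does.

MAX_TURNS = 10            # the default mentioned above: stop a looping agent
DAILY_BUDGET_USD = 25.00  # pick a number you're comfortable with

class BudgetExceeded(Exception):
    pass

class SpendTracker:
    """Running total of today's API spend; trips when the cap is hit."""
    def __init__(self, daily_budget):
        self.daily_budget = daily_budget
        self.spent_today = 0.0

    def record(self, cost_usd):
        self.spent_today += cost_usd
        if self.spent_today > self.daily_budget:
            raise BudgetExceeded(f"Daily cap of ${self.daily_budget:.2f} hit")

def run_agent(task, call_model, tracker, max_turns=MAX_TURNS):
    """Run the agent loop, stopping at max_turns or when the budget trips.

    call_model is your LLM call: it returns (answer, cost_usd), with
    answer=None meaning the agent hasn't reached a final answer yet.
    """
    for _turn in range(max_turns):
        answer, cost = call_model(task)
        tracker.record(cost)
        if answer is not None:        # the model reached a final answer
            return answer
    return None                       # cap hit: surface this, don't retry
```

Alerts on the tracker (email, Slack, whatever you already watch) are the other half; the point is that the loop physically cannot run overnight unbounded.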
2/ The Technical Foundation
Can your systems actually support what you're trying to do?
Hosting & Implementation Approach
Are you using a self-hosted UI like Retool, building from scratch, hiring a dev shop, or stitching together no-code tools like Zapier or n8n? Each path has very different tradeoffs for control, cost, and long-term risk.
At Playbook, we used Retool. The UI wasn't pretty, but as the only ops person at a startup growing AUM 30% a month, it let me build functional tools without pulling in an engineer for every change: enter a client email, run a few queries, pull account details, surface compliance flags, draft an outbound email. One input, one click. Not perfect, but it covered 80% of my day-to-day, and that's what mattered.
Integrations
Does your portfolio management system, CRM, or custodian have a real API? A lot of fintech "integrations" are shallower than they look: many just pass along a notification, not actual data.
Side note: this is one of the most frustrating parts of building in fintech. The marketing says "we integrate with everything." The reality is that half those integrations share an email notification and nothing else. Incentives aren't aligned; the vendor gets to check a box, but you're the one who has to make it actually work. We learned this the hard way integrating over 40 external providers at Playbook. Just as importantly, can you actually access those APIs? Many firms have a backlog of requests, or sometimes don't provide access at all.
In some extreme cases, we even had vendors tell us they would no longer support the API. Or my personal favorite: "Yes, we have an API"… Okay, can we use it to input information into a client's profile? "No, that won't work. You either have to enter it manually or pay someone on our services team $50 an hour to do it." Great.
Security
You're not running models on-prem. Almost no RIA is, and that's not a realistic option unless you're investing heavily in infrastructure. You're using cloud providers. That's fine, provided you're deliberate about which ones and what you've agreed to.
Two things worth nailing down. When you chain tools together, especially MCPs from smaller or newer providers, you're implicitly trusting everyone in that chain. A major provider like Box or Salesforce has enterprise agreements, SOC 2 reports, and legal teams. A lesser-known MCP might have none of that, and its dependencies are harder to audit. If client data flows through it, you've made a compliance decision whether you meant to or not. For anything touching client PII, only use providers you can actually vet.
On the LLM side: most major providers offer enterprise agreements that restrict data retention and don't train on your inputs. Consumer and free tiers typically don't. That's the distinction that matters: not cloud vs. on-prem, but whether you're on an enterprise agreement you've actually read. Easiest option: turn off training and pay for a premium tier.
And the less talked-about risk: the bigger threat at most firms isn't a vendor breach. It's employees already sharing client data over email or Teams, leaving printouts on a desk, or (worse) falling for phishing links. Your new system should make that less likely, not more.
Data Quality
This is where most builds quietly fail. Your custodian exports have inconsistent account number formats. Your CRM has three spellings of the same client's name. PDFs are scattered across folder structures nobody has touched in years. AI doesn't fix bad data; it inherits it.
A concrete example: if your vendor API accepts any text value, the LLM might store a birthday as "December 1st, 1990" one time and "12-01-1990" the next. When you go to use those dates downstream (birthday emails, Social Security timing, RMD calculations) you're stuck cleaning up a mess with no clean answer.
Pro tip: if you're building something custom, use a validation library like Zod that rejects values that don't match the expected format before they're ever stored. If you're using a workflow tool or form builder like Typeform, there are data validation settings worth enabling. Takes extra time upfront. Saves you significantly more later.
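The reject-before-store idea is the same regardless of stack. Zod covers it for TypeScript builds; here's the equivalent sketched in Python with a plain format check, using the birthday example from above (the function name and canonical format are illustrative):

```python
# Reject values that don't match the expected format BEFORE they're stored,
# so "December 1st, 1990" and "12-01-1990" never land in the same field.
import re
from datetime import datetime

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def validate_birthday(value: str) -> str:
    """Accept only YYYY-MM-DD; raise on anything else."""
    if not ISO_DATE.match(value):
        raise ValueError(f"Birthday must be YYYY-MM-DD, got: {value!r}")
    datetime.strptime(value, "%Y-%m-%d")  # also reject impossible dates
    return value
```

Whatever the LLM hands you goes through a gate like this on the way into storage, so every downstream use (birthday emails, RMD math) sees one format.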
3/ Running It Like a Firm, Not a Dev Shop
How does this actually work inside your organization?
Permissions
When your agent takes an action, who did it? If it's operating under your personal login, the honest answer is: you can't tell. Six months in, you won't be able to distinguish what a human did from what the agent did... and that's a problem when compliance comes asking.
Treat the agent like a separate user as early as you can. Give it its own credentials, its own access scope, its own permission level. If your tools don't support multiple logins, build logging directly into the agent: every action, every approval, every decision point. Your proof of concept can probably get away with shared access. But by the time this is running in production, at scale, you want a clean answer to: what did the agent do, and who signed off on it?
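If your tools can't give the agent its own login, the fallback logging can be very simple. A minimal sketch, with illustrative field names (this is not a standard schema, just the shape of a clean answer to "what did the agent do, and who signed off?"):

```python
# Every agent action records its own identity, what it did, and who
# approved it, so agent actions are never mistaken for human ones.
import json
from datetime import datetime, timezone

AGENT_ID = "agent:ops-workflow-v1"   # the agent's own credential, not yours

def log_action(log, action, detail, approved_by=None):
    """Append one attributed action to an append-only log."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "actor": AGENT_ID,
        "action": action,
        "detail": detail,
        "approved_by": approved_by,  # None = autonomous, else a human login
    }
    log.append(json.dumps(entry))
    return entry
```

Six months from now, filtering on `actor` answers the compliance question in seconds instead of guesswork.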
Tribal Knowledge
Your firm has a way of doing things. Most of it isn't written down anywhere. It lives in someone's head, or in a Word doc or spreadsheet nobody's updated in three years, or in the habits of the one ops person who's been there since the beginning.
The problem: if you can't explain your process clearly to a colleague, you can't explain it to a machine. AI can only surface the right information at the right time if someone told it what "right" means for your firm. Before you build, document the process you're automating. Not at a high level, but the actual steps, the edge cases, and the judgment calls.
Pro tip: if writing it down feels hard, record yourself explaining it out loud for 10-15 minutes. Then use AI to extract the key steps. It's faster than it sounds and usually surfaces details you'd otherwise forget to document.
This problem isn't unique to small firms. At MassChallenge FinTech 2025, Citizens issued a challenge specifically around agentic AI (autonomous workflows, customer onboarding, operations automation, governance in regulated environments). MassMutual had a related challenge around onboarding as well. Two of the largest financial institutions in the country, publicly asking startups to help them figure it out. We made MassMutual's final round. The point isn't just that the problem is hard; it's that everyone, from a one-person ops team to a firm with hundreds of engineers, is still working out how to make this run inside their actual organization.
Collaboration
Can multiple people contribute to an AI-assisted workflow? Can a senior advisor override a decision? These are day-one questions, not edge cases for later.
Version control helps here. Think of it like everyone working from the same template: anyone can propose changes and push them back for review before they go live. If something breaks, you roll back to the last working version instead of trying to reconstruct what changed. That's a small bit of complexity, for a valuable insurance policy when a client-facing workflow breaks.
One thing worth thinking about earlier than you'd expect: how do people actually interact with your agent? Can they forward it an email? Tag it in Slack? Use it through Claude or ChatGPT? You don't need to solve this in version one. But by the time you finish your first build, there will be options worth exploring.
4/ Staying Accountable
How do you know it's working, and how do you prove it?
Testing & QA
AI fails in non-obvious ways. A model upgrade, a vendor switch, a small prompt change - any of these can quietly shift outputs in ways you won't catch until something goes wrong with a real client.
Define a small set of critical workflows that have to produce the same result every time: extracting data from a statement, filling out a form, tiering clients by some criteria. These are your regression tests. Run them any time you make a meaningful change. If something comes back different than expected, you've caught a problem before it became one.
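In code, this is a short list of pinned cases you re-run after any change. A sketch, where `extract_account_number` is a stand-in for one of your real workflow steps, not a real extraction implementation:

```python
# Regression harness: pin expected outputs for critical workflows and
# re-run them after any model, vendor, or prompt change.
import re

def extract_account_number(statement_text):
    """Toy workflow step: pull an 8-digit account number from text."""
    m = re.search(r"\b(\d{8})\b", statement_text)
    return m.group(1) if m else None

GOLDEN_CASES = [
    ("Account 12345678 - Q3 statement", "12345678"),
    ("No account number in this doc", None),
]

def run_regression():
    """Return every (input, expected, actual) triple that mismatched."""
    return [(inp, expected, extract_account_number(inp))
            for inp, expected in GOLDEN_CASES
            if extract_account_number(inp) != expected]
```

An empty result means the change is safe to ship; anything else is a problem you caught before a client did.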
Monitoring
Once it's running, can you see what it's doing? Observability is the goal - you want a clear record of what went into the agent, what came out, what the reasoning was, and what actions it took.
The easiest path: most major LLM providers have built-in dashboards that log all of this. The tradeoff is that the data lives with the provider. If compliance signs off on it, this is the lowest-effort option and more than enough for most firms starting out.
If you can't use the provider's built-in logging, it gets harder. Tools like Datadog or Sentry exist, along with a growing category of AI-specific observability platforms, but they require technical setup and they'll hold more of your client data. Factor that into the decision before you pick one.
Compliance & Audit Trail
Take a rollover as a concrete example. Before recommending one, you need to show that you asked the right questions (see Department of Labor's guidance), understood the client's existing plan, and weighed the fees and rules. That evidence typically lives in three different places: an email thread, a meeting transcript, and a signed disclosure.
When AI is present in your meetings, on your email threads, and tracking your workflows, pulling that together becomes a byproduct of the work you were already doing. You still have to make the right call. You just don't have to spend equal time afterward stitching together the paperwork to prove it. That's not what you're in the business of doing.
During an SEC examination at Playbook, one of the requirements was producing a trade blotter and account detail spreadsheet. It sounded painful. But because our engineering team had built a robust logging system from the start, I ran a few SQL queries and had everything I needed in under 30 minutes. That's what compliance by design looks like - when the infrastructure is right, the hard stuff becomes routine.
Ongoing Maintenance
The real test: could you leave for three weeks and come back to a system that's still running? If not, you've built a dependency, not a tool.
Nothing you build will be fully self-sufficient. But if the logic is documented, the agent's reasoning is captured somewhere retrievable, and there's a handoff process someone new can follow, then you're close. Document as you go. Don't wait until you're about to hand it off to someone else.
Closing
This is harder than getting ChatGPT to generate a table. Building something that works across multiple users, multiple systems, and a compliance framework takes real time. If someone is expecting a production-ready build in under three months, push back. That's not pessimism, it's just accurate.
If you're reading this and nodding along, good. If you're starting to feel the weight of it, that's also normal. Either way, reach out. We're happy to talk through what this actually looks like for your firm.
This is what it actually means to be the product manager of your own firm's infrastructure. Nobody told you that's what you signed up for. Now you know.

