May 26, 2026

AI for Allocators: Why Vertical AI Wins in Private Markets

Naunidh Singh Bhalla
Co-Founder & CTO
This is part two of the two-part series, "AI for Allocators: Field Notes from a CTO". You can find part one here.

This is the second piece in a short series on AI for capital allocators. In the first, I laid out the two problems I think deserve more scrutiny than they get today: how much work it actually takes to extract real value from AI, and how durable the current AI paradigm will be tomorrow. The conclusion I landed on was that the hard part for allocators is no longer the technology question – it's the vendor question. This piece picks up there. If the underlying technology is real but the narrative is overheated, how should allocators in private markets actually think about buying AI?

My short answer: horizontal AI products and vertical AI platforms are different categories of product. The frontier labs are building extraordinary general-purpose AI – foundation models, generalized chat applications, and domain-specific extensions like Claude for Financial Services. None of those are what an allocator needs to fully run their operations. What allocators need is an operating system built for the industry – the system of record, the workflow layer, the integrations, the permissioning, and the audit trail that together make AI usable inside an institutional environment. That is where the durable value in private markets AI resides.

One disclosure up front: I'm a founder building in vertical AI, so I have skin in this argument. The case below is one I'd make even if I didn't. Here's how I'd think about it if I were sitting on your side of the table.

Vertical AI is the layer horizontal labs can't reach

We are heavy users of horizontal AI ourselves. We use foundation models for our AI web agents, our extraction and normalization pipelines, our insights and analytics modules, and for internal efficiency. So this isn't an argument against the frontier labs or what they're building – it is an argument about what they're not building, and why they likely never will.

A horizontal AI product is optimized to be useful across every domain, for every user, on every task. That optimization function is incompatible with what an operating system has to do. An operating system has to make trade-offs that only make sense if you're willing to be useless to 99% of the market: encode hundreds of private-markets-specific validation rules that mean nothing to anyone outside the industry, build integration infrastructure for legacy GP portals, maintain entity-level permissioning structures that match how allocators actually organize themselves, and persist data in normalized schemas that reflect how a capital account statement is structured rather than how a generic document is structured. None of this work makes a horizontal product better; it makes it narrower. Which is exactly why horizontal labs avoided this work for years – and why their recent pivot to domain-specific offerings hits a structural ceiling. Claude for Financial Services and Perplexity for Finance are real products doing real work, but they sit one shallow layer above a general-purpose foundation: a system prompt, a curated dataset, and a handful of finance-flavored integrations. They are still optimized to be broadly useful to "finance" – a category that spans banks, hedge funds, public equity research, sell-side analysts, corporate finance, and private markets. Going deeper than that would mean making the product useless to most of that market, which the economics of a horizontal lab simply don't allow.

The pattern is not new. It's what happened in the cloud. AWS became enormous and built generalized application offerings on top of its infrastructure – and yet an entire generation of SaaS companies was built on top of AWS, not displaced by it. The infrastructure layer and the industry-specific application layer turned out to be different products solving different problems with incompatible optimization functions, and public-market investors have been pricing that distinction ever since: vertical SaaS consistently trades at a premium to horizontal SaaS.[1] As an allocator audience, you already know why: vertical companies tend to dominate their niches, embed deeper into customer workflows, operate more efficient GTM motions, and benefit from data and customer network effects. The underlying mechanic is even simpler at the company level. A vertical AI company spends 100% of its product, engineering, sales, and customer-success time on one type of customer's problem. A horizontal lab is, by definition, diffused across every problem at once. Focus compounds. Diffusion does not. The same dynamic is playing out in the AI era, arguably to a greater extent, because proprietary data and embedded workflows compound over time. Believing horizontal AI can do everything vertical AI can is the AI version of forcing a square peg into a round hole: it works for a demo or a scoped use case, but not for full-fledged production environments.

So what does an operating system actually do that a horizontal AI product doesn't? I get asked this often as CTO, usually framed as "aren't you afraid Claude for Financial Services or Perplexity for Finance will displace you?" The honest answer: some specific capabilities will be encroached upon, no doubt (e.g. ad-hoc analysis and tear sheet generation). But the broader operating system is a different category of product, and the gap shows up across two key pillars: Trust, and Operational and Institutional Reality.

Pillar 1: Trust

Ultra-high accuracy requirements – Private market workflows span a spectrum. Ad-hoc analysis, first-draft memo writing, and brainstorming have accuracy leeway, and are therefore acceptable territory for a horizontal reasoning engine. But the workflows that actually run an allocator's operations sit on the other end of the spectrum: NAV reconciliation, exposure reporting, fee accruals, capital account statements, and so on. In those workflows, 99% correct is still wrong, and a hallucinated number isn't bad UX – it's potentially a compliance event. These workflows are also the lion’s share of private market work, which is why the accuracy bar is essentially 100%. As a result, customers in private markets have deterministic expectations (i.e., they expect 99.9% accuracy with certainty). LLMs, however, are probabilistic by nature rather than deterministic, and they are poorly calibrated on the specific failure modes that matter in private markets, such as numerical precision, document provenance, and cross-document consistency. They don't reliably know when they are wrong on those dimensions; if they did, they wouldn't hallucinate in the first place. Now add the need to maintain accuracy across thousands of documents and millions of data points: even a small per-value error rate compounds, and the probability that at least one number is wrong somewhere in the portfolio climbs toward certainty as volume grows. This makes it impractical to rely on end-to-end LLM generation for complex industry workflows. An operating system closes that gap structurally. At Tetrix, ingestion, classification, extraction, and validation pipelines are purpose-built for private markets. Thousands of financial rules act as deterministic checks and balances on probabilistic outputs. Human-in-the-loop review is integrated directly into the workflow, and reviewer corrections feed into feedback loops that make our extraction pipelines better with each iteration.
This is the proprietary stack that a Vertical AI operating system ships with; the model is the easy part.
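
To make the scaling arithmetic and the idea of a deterministic check concrete, here is a minimal Python sketch. It is illustrative only and is not a description of Tetrix's pipeline: the function names, the simplified `CapitalAccountStatement` fields, and the one-cent tolerance are all assumptions made for the example, and the error model assumes each data point fails independently. Under that assumption, 99.9% per-value accuracy across 1,000 values still leaves roughly a 63% chance that at least one value is wrong.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


def prob_at_least_one_error(per_point_accuracy: float, n_points: int) -> float:
    """Chance that at least one of n data points is wrong.

    Assumes errors are independent, which is a simplification.
    """
    return 1.0 - per_point_accuracy ** n_points


@dataclass(frozen=True)
class CapitalAccountStatement:
    # Hypothetical, simplified field set; real statements carry more lines.
    beginning_nav: Decimal
    contributions: Decimal
    distributions: Decimal
    net_gain_loss: Decimal  # treated as net of fees in this simplified model
    ending_nav: Decimal


def rollforward_exception(
    stmt: CapitalAccountStatement,
    tolerance: Decimal = Decimal("0.01"),
) -> Optional[str]:
    """Deterministic rule: the extracted balances must roll forward.

    Returns a human-readable exception if they do not, otherwise None.
    """
    expected = (
        stmt.beginning_nav
        + stmt.contributions
        - stmt.distributions
        + stmt.net_gain_loss
    )
    diff = stmt.ending_nav - expected
    if abs(diff) > tolerance:
        return f"ending NAV differs from roll-forward by {diff}"
    return None


# An extracted statement whose ending NAV is off by 100.00 gets flagged,
# regardless of how confident the model that extracted it was.
extracted = CapitalAccountStatement(
    beginning_nav=Decimal("1000000.00"),
    contributions=Decimal("250000.00"),
    distributions=Decimal("100000.00"),
    net_gain_loss=Decimal("35000.00"),
    ending_nav=Decimal("1185100.00"),
)
print(rollforward_exception(extracted))
print(prob_at_least_one_error(0.999, 1000))
```

The point of the sketch is the division of labor: the probabilistic step proposes values, and a rule that is either satisfied or not decides whether a reviewer needs to look.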

Verification of outputs – Tetrix handles the heavy lifting of verifying your data for you. That is critical because in the world of LLMs, as leading AI minds like Andrej Karpathy have often noted, generation is cheap and verification is expensive.[2] To expand on that, generating content has become far easier, but the methods of verifying those outputs have not. This means that if you rely solely on offerings like Claude for Financial Services, tasks are just as mentally involved, since you still need to do the verification; they are only mechanically faster. In fact, in some cases you are better off doing things manually, because the effort needed to comb through low-quality AI outputs (“AI slop”) and execute multi-turn LLM passes outweighs the effort of doing the task yourself in the first place. This expensive verification is a compounding liability, not a fixed one: if it is not done right, bad outputs become bad templates for future work, and you quickly devolve into a “garbage in, garbage out” scenario where your data can no longer be trusted. I'll borrow an example from engineering to underscore this point: Faros AI's AI Productivity Paradox report analyzed telemetry from over 10,000 developers across 1,255 teams and found that high-AI-adoption teams completed 21% more tasks and merged 98% more pull requests ("generation is cheap"), but PR review time increased 91%, creating a critical bottleneck at human approval. AI-generated PRs were also 154% larger and bug rates rose 9% per developer, erasing the gains at the company level entirely and in some cases producing an overall decrease in productivity ("verification is expensive").[3] An operating system inverts that math by making verification cheap. Provenance is tracked automatically, so any number traces back to its source document in one click. Reconciliation runs in the background, so cross-document inconsistencies surface to a reviewer rather than requiring a manual hunt.
Rules flag the specific exceptions that need attention rather than asking a human to re-check everything. Verification stops being a tax on every output and becomes a structured workflow that gets cheaper as the system accumulates context.
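
As a rough illustration of what "provenance plus background reconciliation" can look like in code, here is a small Python sketch. The `DataPoint` shape, the field names, and the `reconcile` helper are hypothetical and chosen for the example; they are not a real product schema. Each value carries a pointer to the document and page it came from, and only groups of values that disagree are surfaced for review.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class DataPoint:
    fund: str
    period: str       # e.g. "2025-Q4"
    metric: str       # e.g. "ending_nav"
    value: Decimal
    source_doc: str   # provenance: document the value was extracted from
    page: int         # provenance: location inside that document


def reconcile(points, tolerance=Decimal("0.01")):
    """Return only the (fund, period, metric) groups whose sources disagree."""
    groups = defaultdict(list)
    for point in points:
        groups[(point.fund, point.period, point.metric)].append(point)

    exceptions = []
    for key, members in groups.items():
        values = [member.value for member in members]
        if max(values) - min(values) > tolerance:
            exceptions.append((key, members))
    return exceptions


points = [
    DataPoint("Fund A", "2025-Q4", "ending_nav", Decimal("1185000.00"),
              "fund_a_capital_account_q4.pdf", 2),
    DataPoint("Fund A", "2025-Q4", "ending_nav", Decimal("1185100.00"),
              "fund_a_quarterly_report_q4.pdf", 14),
    DataPoint("Fund B", "2025-Q4", "ending_nav", Decimal("730000.00"),
              "fund_b_capital_account_q4.pdf", 1),
]

# Only the Fund A conflict reaches a reviewer, with both sources attached.
for key, members in reconcile(points):
    print(key, [(m.source_doc, m.page, str(m.value)) for m in members])
```

The reviewer sees one exception with two source locations instead of re-checking all three values, which is the sense in which verification becomes a targeted workflow.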

Persistence of data – LLMs and even agent frameworks treat data as session-scoped: ephemeral context loaded for a task and discarded after. There are hacky ways to persist information, whether it is expanding context windows or writing files to your local account, but every tool an agent has access to costs context-window tokens and eats into the model's working memory. Anyone running multi-tool agents at production scale has hit the same walls: bloated system prompts full of tool schemas, context windows exhausted before the actual work begins, latency that compounds with every tool call. A natural rebuttal is that MCP connectors and integrations with existing systems of record could close this gap. That works for some use cases at the edges, but it misses the substrate problem: MCP solves the protocol of access, not the underlying data layer. Someone has to build, version, normalize, and maintain the underlying data regardless, and that someone is either you or your vendor. An operating system is built around that data layer from day one. It persistently stores, relates, versions, and maintains normalized portfolio data quarter over quarter, year over year. It survives across sessions, users, and reporting periods. It accumulates institutional memory as a feature, not as a hack on top of a session-based architecture. MCP and other emerging protocols make the operating system more interoperable with reasoning engines over time, which is good for everyone – but they don't change the underlying product category distinction. A system of record is a system of record. A session-based reasoning engine is a session-based reasoning engine. They are different products, and although a lot of interesting work is happening in AI memory, as of today, it is still not a “solved problem”.
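
The difference between session-scoped context and a persistent, versioned data layer can be sketched in a few lines of Python using the standard-library `sqlite3` module. This is a toy under stated assumptions, not a real system of record: the `nav_history` table, its columns, and the helper names are invented for the example. The idea it shows is append-only versioning, where a restated value becomes a new version rather than overwriting the old one.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS nav_history (
    fund        TEXT    NOT NULL,
    period      TEXT    NOT NULL,
    version     INTEGER NOT NULL,
    ending_nav  TEXT    NOT NULL,  -- stored as text to avoid float rounding
    source_doc  TEXT    NOT NULL,
    PRIMARY KEY (fund, period, version)
)
"""


def open_store(path=":memory:"):
    """Open the store; pass a file path to persist across sessions."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_nav(conn, fund, period, ending_nav, source_doc):
    """Append a new version for (fund, period); earlier versions are kept."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM nav_history "
        "WHERE fund = ? AND period = ?",
        (fund, period),
    ).fetchone()
    version = latest + 1
    conn.execute(
        "INSERT INTO nav_history VALUES (?, ?, ?, ?, ?)",
        (fund, period, version, ending_nav, source_doc),
    )
    conn.commit()
    return version


def latest_nav(conn, fund, period):
    """Return (version, ending_nav, source_doc) for the newest version."""
    return conn.execute(
        "SELECT version, ending_nav, source_doc FROM nav_history "
        "WHERE fund = ? AND period = ? ORDER BY version DESC LIMIT 1",
        (fund, period),
    ).fetchone()


conn = open_store()
record_nav(conn, "Fund A", "2025-Q4", "1185000.00", "q4_statement.pdf")
record_nav(conn, "Fund A", "2025-Q4", "1185100.00", "q4_restated.pdf")
print(latest_nav(conn, "Fund A", "2025-Q4"))
```

Whatever reasoning engine or protocol sits on top, the history lives in the store and is there the next time anyone, or any agent, asks.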

Pillar 2: Operational and Institutional Reality

Integration hell – LLMs reach external systems primarily through MCP integrations and APIs. So, what happens when the GP portal you're trying to access has neither? That is the structural reality of private markets reporting. Many fund portals are decades-old systems, with no public APIs and no MCP servers. Plenty of GPs still drop quarterly reports into shared data rooms with no programmatic access at all. Some require multi-factor authentication. Others use a series of emails to send attachments and credentials. This is not a problem frontier labs can patch with a better model. It is the fragmented, legacy-software reality of the industry, and bridging it is a vendor problem, not a model problem. We've spent years building, refining, and maintaining those links, and we continue to invest heavily in this area.

Cross-organization workflows and permissioning – Real institutional workflows demand granular, cross-organizational permissioning on AI-generated artifacts: who can see which legal entity, fund, manager, asset, document, or underlying number; which derived metrics they can compute; which slices of data they can query; what they're allowed to export; and what gets logged for audit. Foundation models don't ship with any of that. Vertical platforms do, because we have to, given the compliance and security posture of institutional allocators. There's a second dimension that gets overlooked in this category as well: investing and operations at large capital allocators are a team sport, not an individual one. The ability to share analysis, hand off workflows, and collaborate across a tool is just as critical as the analysis itself. Passing around chat exports or one-off AI artifacts between five people on an email or Slack thread is not a forward-looking workflow.
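
A minimal Python sketch of entity-level permissioning with an audit trail follows. The `AccessPolicy` class, the user and entity names, and the log format are hypothetical and deliberately simplified (read access only, one level of granularity); a real institutional model would cover funds, documents, derived metrics, and exports as well. It shows the two properties described above: results are filtered by what the user is entitled to see, and every access decision is logged.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessPolicy:
    # Hypothetical model: user -> set of legal entities that user may read.
    grants: dict
    audit_log: list = field(default_factory=list)

    def can_read(self, user, entity):
        """Decide access and record the decision for later audit."""
        allowed = entity in self.grants.get(user, set())
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "entity": entity,
            "allowed": allowed,
        })
        return allowed

    def visible(self, user, rows):
        """Return only the rows whose legal entity the user may read."""
        return [row for row in rows if self.can_read(user, row["entity"])]


policy = AccessPolicy(grants={
    "analyst": {"Endowment LP"},
    "cio": {"Endowment LP", "Pension Trust"},
})
rows = [
    {"entity": "Endowment LP", "fund": "Fund A", "ending_nav": "1185000.00"},
    {"entity": "Pension Trust", "fund": "Fund B", "ending_nav": "730000.00"},
]

print(policy.visible("analyst", rows))   # only the Endowment LP row
print(len(policy.audit_log))             # one logged decision per row checked
```

Because the filter sits in the data layer rather than in a prompt, the same rule applies whether the request comes from a person, a report, or an AI agent acting on that person's behalf.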

Domain-specific edge cases and regulatory defensibility – In private markets, every value in a portfolio database should be traceable to its source document, with full version history showing how it changed, who reviewed it, what rule flagged it, and what override (if any) was applied. The regulatory bar for this is rising fast. In Europe, the Alternative Investment Fund Managers Directive (AIFMD II), with an April 16, 2026 transposition deadline, requires comprehensive disclosure across all instruments, exposures, and delegation arrangements; non-compliance carries the loss of authorization to market to EU LPs.[4] In the US, the SEC's Form PF amendments, now compliance-dated October 1, 2026, require materially more granular and timely private-fund reporting,[5] and the SEC's 2026 exam priorities target AI governance and the accuracy of AI disclosures, asking whether automated tools produce outcomes consistent with stated strategies and whether firms have written policies governing AI use.[6] Regulatory requirements will only intensify as private markets continue to grow and retail investor participation accelerates. A horizontal LLM tracks none of this – "the model said so" is not a defensible answer to an examiner asking how a NAV moved 40bps last Tuesday, or to an LP asking which document a fee accrual came from. The edge-case problem compounds it. There are fund-specific, asset-specific, company-specific, and customer-specific nuances, all of which have to be encoded and tracked as terms change, deals roll, and policies evolve. For allocators operating under fiduciary duty, the choice is between AI that can survive examination and AI that can't.

Real support and partnership – Frontier labs are extraordinary research organizations. They are not, and should not be expected to be, deep domain experts in private markets workflows. Again, this isn't a knock on them; it’s the reality of where their comparative advantage actually resides, which is in advancing generalized reasoning capabilities at scale. Vertical AI platforms, on the other hand, live and die by domain depth. We sit with LPs of all shapes and sizes multiple times a day. We mould our product around their demands. When an LP comes to us mid-reporting-cycle and says "we just took on a new sub-strategy and need to track it differently across three reporting periods," that's a 24-hour turnaround for us. That partnership model – proximity, accountability, and accumulated context – is something a horizontal lab cannot offer at scale, and it's what institutional allocators actually need from an operating system that powers their workflows and insights.

Of course, foundation models are getting better every day. That's good for everyone, including us, because we are power users and huge fans of frontier labs like Anthropic. But "smarter model" does not equal "system of record." It does not equal "data infrastructure." It does not equal "the institutional-grade workflow platform that runs your entire operational and investing stack." Those are products. They are built by teams who have lived inside the workflow long enough to know where the value resides, where the edge cases are, and how to encode them. The honest test of all of this is straightforward: a side-by-side comparison. Try asking Claude for Financial Services, Perplexity for Finance, or any of the other domain-specific horizontal offerings to do what an institutional operating system is supposed to do across the two pillars above. They will give you a great answer for one document. Maybe ten. But at scale, they will choke hard. And are you really going to entrust decisions worth millions, if not billions, of dollars to offerings that have these fundamental architectural gaps? Probably not. Run the same workflow through a platform like Tetrix and you'll get exactly what you'd expect: a system that holds up. I may come across as biased, so don't just take my word for it – the whole point of the side-by-side is that you don't have to. This is the predictable outcome of two product categories doing two different jobs – and someone has to build the operating system that institutional allocators actually need. That structural reality is why I continue to spend my time, alongside my team, building Tetrix.

Choosing a vertical AI partner

I have made the case for vertical AI, but "bet on vertical" isn't enough on its own – not every vertical AI vendor is built the same way, and the market is crowded with thin wrappers calling themselves full operating systems. If you're an allocator and every vendor in your inbox is "AI-powered," three questions tend to separate substance from marketing, and they map directly to the two pillars above:

  1. Interrogate the verification layer, not just the model layer. A headline accuracy percentage is meaningless without the methodology behind it. Ask how accuracy is measured, on what dataset, and how often it's re-tested; how edge cases are handled; how provenance and audit trails are maintained; and what human review infrastructure exists when models fail. The answers reveal whether AI is integrated thoughtfully or bolted on, and whether the verification scales as the system grows. For what it's worth, we’re happy to walk any LP through our own methodology in detail.
  2. Validate workflows, not tasks. A polished demo of a single ingestion, extraction, or analysis tells you very little about whether the end-to-end operational process holds together. Pressure-test the full path your team actually walks: raw documents arriving through fragmented reporting channels, ingestion into structured systems, reconciliation across entities and reporting periods, auditability, cross-team collaboration, and the production of outputs and insights that investment committees can ultimately trust and act on.
  3. Test the speed of the vendor. In the AI era, the gap between vendors compounds via execution speed, and this shows up in two places that matter: how fast they ship, and how fast they respond when something breaks or you ask for something new. Ask what they shipped most recently and what their typical turnaround time is – then verify both with a reference customer. The vendor who ships in days and answers in hours will lap the one with twelve-month roadmaps and week-long queues – and the gap only widens from there.
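
On the first question, it can help to know what a defensible accuracy number is made of. The Python sketch below is one simple way to compute it, offered as an assumption-laden illustration rather than any vendor's methodology: `field_accuracy`, the document IDs, and the one-cent tolerance are invented for the example. It scores extracted values against a hand-labelled "gold" set, field by field, and counts a missing value as an error.

```python
from decimal import Decimal


def field_accuracy(extracted, gold, tolerance=Decimal("0.01")):
    """Per-field accuracy of extracted values against a labelled gold set.

    Both arguments map document id -> {field name -> Decimal value}.
    A field that is missing from the extraction counts as wrong.
    """
    totals, correct = {}, {}
    for doc_id, gold_fields in gold.items():
        got = extracted.get(doc_id, {})
        for name, truth in gold_fields.items():
            totals[name] = totals.get(name, 0) + 1
            value = got.get(name)
            if value is not None and abs(value - truth) <= tolerance:
                correct[name] = correct.get(name, 0) + 1
    return {name: correct.get(name, 0) / totals[name] for name in totals}


gold = {
    "doc1": {"ending_nav": Decimal("100.00"), "contributions": Decimal("10.00")},
    "doc2": {"ending_nav": Decimal("200.00"), "contributions": Decimal("20.00")},
}
extracted = {
    "doc1": {"ending_nav": Decimal("100.00"), "contributions": Decimal("10.00")},
    "doc2": {"ending_nav": Decimal("205.00")},  # one wrong value, one missing
}

print(field_accuracy(extracted, gold))
```

A vendor's answer should be recognizable in these terms: which fields, scored against which labelled documents, with what tolerance, and how often the gold set is refreshed.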

The throughline is simple: the operating environment for alternatives is shifting from best-effort reporting to documented, defensible processes. Institutional allocators aren't purchasing reasoning capability – they are selecting the system that runs highly consequential financial workflows for years and decades. The platforms worth partnering with are the ones that take that responsibility seriously: deliberate on accuracy, accountable on workflows, defensible to a regulator, and fast on iteration when the workflow inevitably evolves. That's the bar we hold ourselves to at Tetrix, and it's the bar I'd encourage you to hold every AI vendor to.

Endnotes

[1] Euclid Ventures, "The Vertical SaaS Profit Premium," May 2024. https://insights.euclid.vc/p/the-vertical-saas-profit-premium

[2] Andrej Karpathy, post on the generation–verification asymmetry, X (formerly Twitter), June 2025. https://x.com/karpathy/status/1930305209747812559

[3] "The AI Productivity Paradox Research Report," Faros AI, July 2025. https://www.faros.ai/blog/ai-software-engineering

[4] "The AIFM II Directive," Autorité des marchés financiers (AMF), March 26, 2026. https://www.amf-france.org/en/news-publications/depth/aifm-ii

[5] "SEC Again Extends Form PF Compliance Deadline to October 1, 2026," Proskauer Rose LLP, September 18, 2025. https://www.proskauer.com/alert/sec-again-extends-form-pf-compliance-deadline-to-october-1-2026

[6] "Division of Examinations 2026 Examination Priorities," U.S. Securities and Exchange Commission, November 2025. https://www.sec.gov/files/2026-exam-priorities.pdf
