Lean Informatics feasibility wiki
v1.0 · 2026-05-21 · Houston

Why this space, why this time

Texas first — the maximum addressable market right outside our window

Before talking about anything bigger, look at what’s addressable inside Texas alone:

Texas crude oil
5.7M bopd
February 2026. >42% of all US crude. Permian Basin (TX + NM) at 6.6M bopd is nearly half of US production.
Texas natural gas
36.2 Bcf/d
~30% of US marketed gas. Plus 4.4M bopd of natural gas liquids.
Producing wells in Texas
~238,000
154,393 oil + 83,679 gas (Feb 2026 RRC). Every well files monthly W-10 or G-10 reports to the RRC. That’s ~2.8M filings per year — today, mostly by hand.
Inactive / shut-in wells
~157,000
March 2026. 46% of active operators have >25% of their wells inactive. Plugging compliance, P-5 organization reports, and orphan-well exposure are all AI-accelerable workflows.
Texas O&G export revenue
$18.5B/month
December 2025 alone. Annual upstream economic activity in Texas runs $150–220B depending on the price deck.
Texas operators by size
~9,000 active
Roughly 1,500–3,000 mid-size (5–200 wells) — the Track 1 wedge target. ~100 enterprise operators on top. Long tail below.
Texas land services firms
50–80 firms
Houston / Midland / Dallas / San Antonio hubs. The distribution channel. Texas AAPL chapter is the largest in the country.
Texas-only MAM for LI
$150–500M/yr
Blended: $50K/yr × ~2K mid-size ops + ~$300K/yr × ~100 enterprise + ~$200K/yr × ~60 land firms + RRC compliance throughput. Before LI ever crosses a state line.
scale
Texas alone is roughly 25–30% of the US-wide SAM (the $2–5B/yr serviceable surface in §01). It is also home, drive-distance to most customers, single-regulator (Texas RRC), and the densest AAPL membership in the country. Houston-first is a feature, not a constraint. The same playbook ports to Oklahoma (OCC), New Mexico (OCD), North Dakota (NDIC), Louisiana (LDNR), and Colorado (COGCC) once the Texas reference customers are live — but we don’t need to leave Texas to hit the 24-month target.

The workflow gap, in plain terms

Upstream oil & gas is one of the largest US industries still running day-to-day work on PDFs, spreadsheets, vendor-specific schemas, and tribal knowledge. The sector generates roughly $400B in annual US revenue, employs ~600K people directly, reports to ~30 state agencies plus a half-dozen federal bodies, and pulls information from a long tail of land services firms, completion contractors, midstream counterparties, and trading desks. Almost every operator workflow — from the monthly RRC filing, to an AFE evaluation, to a JOA negotiation, to a completion-design review — still depends on people in chairs reading PDFs and re-keying values into different systems. That gap is the entire opportunity.

Why now

Three forces converged in 2025–2026 to open the window:

  1. Foundation models crossed the operator-fluency line. Frontier models (Claude Opus 4.6, GPT-5.1, Grok 4, Hermes 4 405B, Llama 4 405B) have ingested the public petroleum corpus — SPE OnePetro’s 300K+ papers, AAPG/SEG journals, state commission filings, courthouse title records, Schlumberger/Halliburton manuals, the Craft-Hawkins/Slider/Lake/Economides textbook canon. Out of the box they score 50–70% on the SPE certification subset and converse with field hands using the right vocabulary. The "you need a roughneck on staff to compete" barrier is empirically dead. §04 develops this in depth.
  2. Security and infrastructure standards have commoditized. SOC 2 Type II, ISO 27001, KMS-managed encryption, audit-grade logging, and FIPS-validated crypto are commodity table stakes in 2026. Whatever substrate the operator chooses — on-prem appliance, customer colo, AWS, Azure, GCP, or hybrid — the security perimeter is portable. The "we need our own datacenter" or "we need a private cloud" objection is now a preference question, not a technical one.
  3. The labor cycle is forcing the issue. The petroleum engineering workforce is aging out. The COVID-era exodus cost a generation of mid-career operators. Operators are running the same workflows with smaller teams against rising regulatory complexity. AI is the only path to absorb throughput demand without headcount the operators don’t want to add.

What AI brings operators, concretely

A short list of upstream workflows where AI is shipping real economic value today — not in marketing decks:

Regulatory compliance
Hours → minutes
W-10 / G-10 / H-10 monthly filings, drilling permits, plugging reports. Audit-grade traceability built in.
AFE evaluation
Weeks → hours
Authorization-for-expenditure on offsets, non-consent decisions, working-interest marketing. Ownership-validated, offset-based economics.
Completion design review
Pattern recognition
Cross-reference offset completions, sand/fluid loading, stage spacing, frac-fleet selection across thousands of reports no engineer reads end-to-end.
Geosteering & reservoir analytics
Real-time picks
MWD/LWD interpretation, formation-top picks, target-zone optimization. Catches what tired-eye geosteerers miss at 2am.
JOA & contract review
Dangerous clauses
AFE non-consent windows, marketing provisions, default cure periods, surviving obligations. Surfaces the traps before signature.
M&A diligence
Weeks → days
Lease term extraction across thousands of leases, title-chain assembly, ROW gap analysis, encumbrance review.
JSA / safety briefing
Per-job, not generic
Job-specific safety analyses keyed to live weather, site conditions, crew assignment. The work operators skip when busy now happens every time.
Well-failure pattern recognition
Cross-history
Reading the operator’s full history of post-mortems to surface recurring failure modes a field engineer can’t catch in one cycle.

Every one of these is an operator-facing workflow. Not "a platform." Not "data analytics." Specific work that engineers do today, that AI can do faster, cheaper, and with audit-grade traceability. The rest of this document is what Lean Informatics is doing about it.

trade
The core proposition. Every operator pays for efficiency. None of them will learn or perform all the things we will — and they shouldn’t have to. Let our expertise live with us: AI workflow ownership, FDE delivery, regulatory automation, audit-grade compliance posture, foundation-model fluency. Let the operator focus on what they’re best at: finding the rock, drilling the well, completing the zone, producing the curve. That is the trade Lean Informatics offers, and it is the trade every services-led vertical that ever beat a software-only incumbent has offered before us.

The role at the center — FDE, the blue-collar knowledge worker

One term recurs throughout this document and deserves a definition up front: FDE — Forward-Deployed Engineer. Origin: Palantir, mid-2000s. Adopted as a model by OpenAI’s DeployCo, Anthropic’s enterprise practice, Google, EY, and a long list of vertical-AI startups in 2024–2026. Job postings up ~800% year over year, average compensation ~$238K, hiring still outrunning supply.

An FDE is not a salesperson, not a consultant, not a customer-success rep, and not an architect on slides. The FDE is the engineer who shows up at the customer site, reads the actual workflows, writes the actual integrations, configures the actual system, owns the actual outcomes, and stays through the deployment until the customer’s operating problem is solved. The FDE’s work is the product. Headquarters ships the platform; the FDE ships the customer.

Why this role matters right now

Three structural shifts converge on the FDE in 2026:

  1. White-collar knowledge work is being commoditized by AI. The MBA-grade analyst who reads documents and produces synthesis is no longer the scarce resource — the foundation model does that in seconds for fractions of a cent. What remains scarce is the human who can sit with a customer, understand a workflow in its real operating context, and ship the implementation that actually works under the customer’s regulatory, political, and infrastructural constraints. The analyst layer compresses; the FDE layer becomes the durable layer of the org chart.
  2. An organic brain drain is in progress in upstream. The petroleum engineering workforce is aging out. The COVID-era exodus cost a generation of mid-career operators. The junior pipeline thinned through the 2014–2020 down cycles. Tribal knowledge that lived in those people is walking out the door faster than it can be transferred. Customers do not have spare engineering capacity to absorb a complex SaaS implementation. The FDE is the substitute for the institutional engineer the customer used to have on staff.
  3. Service implementation is taking over from software shipping. Palantir built a public ~$80B+ market cap on this model. OpenAI’s DeployCo, Anthropic’s enterprise JV, EY’s AI delivery practice, and every serious vertical-AI startup in 2026 is running an FDE motion because the product alone does not produce the customer outcome — only the product plus the implementation engineer does. The market is rerating “software vendor” toward “services-led product company,” and the FDE is the unit of that.

The blue-collar knowledge worker

The cleanest framing of what an FDE actually is, in 2026, is the blue-collar knowledge worker: hands-on, on-site, technical, outcome-accountable, with the operating discipline of a journeyman tradesperson applied to knowledge work. Not the analyst in the conference room. Not the architect on the slide deck. Not the consultant with the recommendations memo. The engineer who shows up with the toolkit, reads the actual job, fixes what’s broken, ships what works, and owns the result when it ships.

That shape of work used to be a lineman, a millwright, a field instrument technician. In an industrial economy, those were the people who walked into the plant, kept the machinery running, and answered the 2am phone call. In a 2026 economy where AI is eating the analyst layer of white-collar work, that same shape of work — hands-on, on-site, outcome-accountable, paid for what gets shipped not what gets recommended — is reasserting itself one altitude up the stack. It is now the FDE who walks into the operator’s office, reads the actual filing workflow, ships the actual integration, and answers the 2am call when the RRC submission breaks. The trade is the same. The toolkit changed.

durable
Why Lean Informatics is built around this role. The founder has shipped the FDE shape of work before, under regulator scrutiny, at industrial scale — details in the Founder section. Epic Systems’ ~12,000-person army of implementers is the same shape at industrial scale. Palantir’s FDE bench is the same shape at higher altitude. LI is built as an FDE-first company because that is where the durable economic value sits once foundation models commoditize the analyst layer of knowledge work. Track 1 (RRC compliance) is the wedge; Track 3 (FDE-as-service retainers) is the long-arc revenue line.

The exciting opportunity

Upstream oil & gas is one of the last under-digitized industries of its size. Texas alone produces 5.7M barrels of crude per day, files ~2.8M monthly well reports with the RRC, runs through ~9,000 active operators, and most of that workflow surface still moves on PDFs and spreadsheets. The window is open right now because three forces converged in 2025–2026: foundation models speak petroleum fluently out of the box, security and infrastructure standards have commoditized into table stakes, and the petroleum engineering workforce is aging out faster than it can be replaced. That is the opportunity. An FDE-led services company can ship audit-grade workflows to operators where they want their data to live, undercut the cloud-locked incumbent on price and deploy time, and compound the customer relationship into a multi-decade moat.

Lean Informatics is built for that window. The competitive ladder, in one line: Enverus is the old guard — cloud-locked and PE-flipped three times in seven years; Collide is the newcomer that proved the gate is open; Lean Informatics is the new approach — services-first, infrastructure-agnostic, foundation-model leverage compounding monthly. The operating discipline we lean on is the one Epic Systems built in healthcare over 45 bootstrap years — discipline, not topology (full primer in §01, full crosswalk in §C.0). The 24-month $10–15M/month target is a milestone on a multi-decade arc, not the endgame.

The sector research below (Sections 1–15 + Addenda A–B) documents the landscape we’re entering. Section C is the operating plan.

01Executive summary

core
The deal, in one line. Every operator pays for efficiency. No operator will ever learn or perform all the things Lean Informatics will — AI workflow ownership, regulatory filing automation, FDE delivery discipline, audit-grade compliance posture, foundation-model fluency — and they shouldn’t have to. We carry that expertise. The operator carries the wells, the leases, the field crews, and the production curve. Our expertise lives with us. Their expertise stays where it belongs. Both sides get to focus on what they’re actually best at. That is the trade.

Lean Informatics is the FDE-led vertical AI services company for upstream oil & gas. We meet operators where they want their data to live (on-prem appliance, customer-owned colo, customer hyperscaler tenancy, or hybrid) and own the workflows the incumbents don’t. Data location is a customer dial, not our differentiator. Security and infrastructure standards (SOC 2 Type II, ISO 27001, KMS, audit logging, FIPS-validated crypto) are commodity table stakes — we meet them everywhere we deploy. The moat is the FDE relationship, the workflow ownership, and the foundation-model fluency we bring to the customer’s problem. Houston is the starting wedge. The serviceable surface is national upstream, then international. The distribution moat is the existing infrastructure of survey and land services companies whose embedded delivery model already maps onto our FDE motion. The operating discipline we lean on is the Epic Systems pattern from healthcare — primer in the callout below, full crosswalk in §C.0.

primer
Quick primer on Epic Systems (for the unacquainted). Epic Systems is the dominant electronic health records (EHR) vendor in US healthcare. Founded 1979 in Madison, Wisconsin by Judy Faulkner with $70,000 of personal capital. Bootstrap throughout — refused venture capital and refused to IPO for 45 years. Today: roughly $5B in annual revenue, ~40% of US hospital EHR market, runs the clinical and billing systems at Mayo Clinic, Kaiser, Cleveland Clinic, Johns Hopkins, and most of the academic medical centers. Operating model: long, deep implementations (12–24 months) executed by an army of Epic-employed implementers; workflow lock-in across every clinical and billing system; engineering organization concentrated at one Verona, Wisconsin campus; annual user group meeting that draws ~25,000 attendees and acts as a moat. Recently moved to a hybrid cloud posture (Hyperdrive on Microsoft Azure / Azure Virtual Desktop / Azure Large Instances) — the deployment topology evolved; the operating discipline didn't. That set of operating disciplines, applied to upstream oil & gas in 2026 instead of healthcare in 1979, is what "the Epic Systems play" means in this document. Section C.0 develops the full playbook with the Epic-to-LI crosswalk.
Strategic posture
Epic play
Bootstrap. FDE-led services. Infrastructure-agnostic, customer-centric on data location. Houston-concentrated. Long cycles as moat.
Max addressable
$15–30B/yr
AI in O&G ($7.6B → $25B by '34) + US land services ($3–5B) + regulatory automation + agency emergency tech.
Serviceable (5-yr)
$2–5B/yr
~1,500–2,000 US upstream operators, ~150–250 land/survey firms, ~100–300 agency buyers in scope.
Obtainable (5-yr)
$200–500M ARR
Direct + channel + agency. 10-yr Epic-style: $500M–1B+ ARR.
Distribution path
Land & survey channel
10–25 established firms (BETA-class) = pull-through to 200–800 operators. White-label model.
Geographic arc
Houston → world
Houston → Gulf Coast → Permian/Eagle Ford/Bakken → US → Canada/Australia/Middle East.
Founder edge
Vertical-stack ops
Cross-vertical operator with vertical-stack engineering experience and audit-grade ops discipline transferred from prior agency-scale deployments. Full profile in Founder section.
24-mo milestone
$10–30M ARR
Base case with 1 leverage path firing. Floor for the long arc, not the endgame.
distribution
The distribution thesis — survey and land companies as the channel. Roughly 150–250 US land services firms already operate the embedded-delivery model: their landmen and abstractors sit on customer sites, handle confidential operator IP daily, manage title chains and ROW projects, and have multi-decade trust with mid-size operators. They are undermanned digitally and structurally unable to build AI in-house. They are the Lean Informatics distribution layer. White-label the appliance + workflow stack into 10–25 of them; each pulls 5–50 operator customers into our orbit without LI ever cold-calling. The 14,000-member AAPL network is the addressable surface for the channel itself. Path A (BETA Land Services) in Section C.3 is the templated first instance.

The competitive ladder — old guard, newcomer, new approach

Old guard
Enverus
Cloud-locked, PE-flipped 3× in 7 yrs (Genstar→H&F $4.25B→Blackstone $6.5B Aug 2025). Bureaucratic, pricing-opaque, structurally slow. §03.
Newcomer proof
Collide
$5M seed (Mercury Fund, Apr 2025), RIGGS LLM, FDE motion. Proves the gate is open and the category is fundable. §05.
The leveling event
Foundation models
100 yrs of SPE/AAPG/courthouse/DOE/USGS/textbook corpus in the weights. Vernacular, metrics, IP definitions — all leveled. §04.
New approach
Lean Informatics
FDE-led services. Customer chooses where the data lives. Land/survey distribution. Industry-adjacent founder with vertical-stack engineering and audit-grade ops discipline (see Founder). Bootstrap.
leveling
The freight train. Frontier models in 2026 (Claude Opus 4.6, GPT-5.1, Grok 4, Hermes 4 405B, Llama 4 405B) were trained on the public, academic, and institutional knowledge corpus of 100 years of upstream oil & gas — SPE OnePetro (300K+ papers), AAPG journals, DOE/USGS/EIA technical reports, state RRC/RRC-equivalent public filings, courthouse title records, Schlumberger/Halliburton/Baker Hughes field manuals, decades of trade press, conference proceedings, academic petroleum engineering coursework, expert-witness transcripts. The vocabulary, the decline-curve math, AFE structure, joint-operating-agreement boilerplate, lease conventions, fluid models, completion design vocabulary — all in the weights. The "you can't compete without a roughneck on staff" defense is empirically dead. Favoritism, old-buddy networks, and insider knowledge cannot stop a foundation model that already speaks the language fluently and a 30-day-deployable filing system that costs one-tenth of the incumbent. Firms that recognize this before their competitors do bend the cost curve and compress decision cycles by 10×. Firms that wait pay the late-mover tax. §04 develops this argument with citations.

Lean Informatics in one screen

  1. The Epic Systems posture is the operating frame — the disciplines, not the deployment topology. Bootstrap as long as possible. FDEs before salespeople. Long sales cycles as the moat. Annual user summit by Y3. Refuse easy SaaS-ification, easy money, and early acquisition. Houston as the geographic concentration. Epic itself moved to a hybrid cloud posture (Hyperdrive on Azure / AVD / Azure Large Instances) without abandoning the disciplines — the disciplines are what we copy. 10-year sector dominance is the goal; the 24-month $10–15M/mo target is a milestone on that arc.
  2. The competitive ladder has a top rung that's structurally slow. Enverus is the old guard — great data, $500M+ ARR, 8,000 customers, 25 years of moat-building — but three PE flips in seven years (most recently Blackstone, Aug 2025), employees publicly describing "large project management bureaucracy" and "not really 'agile' in any sense," and the Enverus ONE platform launch itself framed as defensive against latecomers. That's a target, not a fortress. §03 develops the gap.
  3. The market exists at meaningful scale and is structurally large. $15–30B/yr maximum addressable across upstream AI, land services, compliance automation, and adjacent agency emergency tech. Empirically proven by competitors, customers, and investor capital flowing in — §05 catalogs the Collide evidence.
  4. Foundation models have leveled industry expertise. The vernacular, definitions, metrics, and institutional IP that used to gate vertical-AI entry are now compressed into a $20/month API call or a free open-weight checkpoint. What still gates entry is distribution, sovereignty, and FDE reliability under regulator scrutiny — not knowledge of the sector. §04.
  5. Distribution wins, not features. The land/survey channel — 150–250 US firms with embedded customer relationships — is the unfair distribution path no cloud-first competitor can reproduce. Lean Informatics becomes the AI engine inside their existing service motion.
  6. Two-track architecture compounds. Track 1 (cloud RRC compliance, $1.5–4.5K/mo) funds the company while Track 2 (on-prem appliance, $5.5–13.5K/mo) builds the defensibility. Tracks 3 (FDE services) and 4 (agency cross-sell) extend reach as scale permits.
  7. Founder profile is industry-adjacent, vertical-stack engineering background, audit-grade operating discipline. Cross-vertical operator with prior experience shipping under regulator scrutiny. Full bio, transferable skills, and operating discipline detailed in the Founder section.
  8. Infrastructure is not the moat. The services relationship is. Security and infrastructure standards are industry-commoditized: SOC 2 Type II, ISO 27001, KMS-managed encryption, audit-grade logging, FIPS crypto, signed DPAs. We meet those standards everywhere we deploy. Data location becomes a customer preference dial — on-prem appliance for sovereignty-sensitive workloads, customer-owned colo for the middle path, hyperscaler of choice for the cost-optimized path. The on-prem reference architecture in Addendum B is one option, not our identity. The compounding moat is the FDE relationship, the workflow ownership, and the customer outcomes we own quarter over quarter. That is what makes Lean Informatics a services company first and a software vendor second — and why it's more competitive in 2026 than the cloud-or-bust playbook the incumbent and the newcomer both run.

02Core thesis

Lean Informatics' strategic question: if foundation models do most of the cognitive lifting, if open weights are free, if MLX runs them on a laptop, and if Claude Code can write the glue — what's left to charge for in 2026?

Three things, in order of defensibility:

  1. Embedded trust — the FDE services relationship. Whose engineers sit in the customer's morning meeting? Whose phone does the VP Operations call when filings break at 11pm on a Thursday? Whose post-incident review does the regulator read first? This is the compounding moat. It cannot be cloned by feature work and it cannot be acquired in a $4.25B PE transaction.
  2. Workflow-specific data accumulation, wherever the customer wants it. Every filing the system completes, every JSA it generates, every well-failure post-mortem it reads compounds into pattern recognition no foundation model has. The accumulation happens on the customer’s chosen substrate — on-prem appliance, customer-owned colo, hyperscaler of choice, or hybrid. The location of the data is the customer’s preference; the workflow ownership is ours.
  3. Regulator-grade audit posture as a portable standard. SOC 2 Type II, ISO 27001, signed DPAs, named-human signoff workflows, FIPS-validated crypto, agency procurement readiness. The boring stuff that takes 12–18 months of paperwork. The same standards travel with us across every deployment topology, because in 2026 security and infrastructure standards are industry-commoditized table stakes, not competitive differentiators. The founder’s prior procurement track record (see Founder section) means this isn’t learned from scratch — it’s transferred discipline.

Lean Informatics is structured to own all three from day one. #1 is the FDE motion. #2 is the workflow ownership pattern with location-flexible deployment (Addendum B documents the on-prem reference architecture as one option; the same workflow runs on the customer’s hyperscaler if that’s what they prefer). #3 is sequenced into the plan (SOC 2 Type I by Y1, Type II by Y2, agency-prime partnerships in parallel). This is what an Epic-style competitor looks like in vertical AI — and notice that even Epic itself has moved to a hybrid cloud posture without abandoning the operating disciplines that built the moat.

One more thing the thesis depends on, taken up in §04 in detail: foundation models have leveled the industry-expertise barrier, and security/infrastructure standards have commoditized the perimeter. A century of public, academic, and institutional O&G knowledge is in the weights of every frontier model. SOC 2 / ISO 27001 / KMS / audit logging are table-stakes everywhere. The "you need a roughneck on staff and a private datacenter" defense is empirically dead on both axes. What's left to charge for is what the models and the infrastructure cannot supply on their own — trust, distribution, FDE reliability, and ownership of the customer's workflow outcomes. That's exactly what Lean Informatics is built around.

moat
Infrastructure is not the moat. The services relationship is. Where the data lives is a customer dial in 2026, not a competitive differentiator. Security standards are industry-standard. Hosting is interchangeable across hyperscalers, customer colos, and on-prem appliances. That makes Lean Informatics a services company first. Our compounding asset is the FDE relationship and the workflow ownership it produces — the same disciplines Epic built on, applied to upstream O&G in 2026 instead of healthcare in 1979.

03Enverus — the old guard

Enverus is the structural incumbent of upstream data and analytics. Founded 1999 as DrillingInfo by Allen Gilmer, rebranded to Enverus in 2019 after multiple acquisitions, $500M+ ARR, 8,000+ customers across 50 countries, 2.7 PB of data, 350M+ courthouse records, $500B+ in annual energy transactions through its platform. Enverus is also where the wedge opens. Three private-equity flips in seven years, public employee commentary about bureaucracy and slow product cycles, two-star customer reviews on the public review sites that exist, and a product launch posture (Enverus ONE, April 7, 2026) that reads as defensive rather than confident. The data moat is real. The execution-organization is not what it was when DrillingInfo was the scrappy challenger.

Ownership history
3 PE flips, 7 yrs
Genstar (2018) → Hellman & Friedman ($4.25B, Jun 2021) → Blackstone ($6.5B, Aug 2025). Each transition: cost discipline, customer price increases, roadmap re-rationalization.
Public starting price
$275/user/mo
Floor for low-tier feature subscriptions. Enterprise tiers are RFQ-only. Reviewer-documented annual increases without negotiation transparency.
Enverus ONE launch
Apr 7, 2026
Governed AI platform. SOC 2 Type II. Isolated tenancies. Still cloud-only. Four launch "Flows" (AFE, Production Valuation, Project Siting, QuickStart).
Scale
8,000 / 50 countries
$500M+ ARR. 25-yr customer base. Real footprint — not a paper tiger. But also not a startup that can pivot on a quarter.

Where Enverus is structurally exposed

  1. PE-driven price compression on customers. Public review-site evidence: "the subscription is not worth the price and they hide the price of the annual subscription till they send you the invoice"; "constant price increases and no added value for my uses"; 90-day cancellation policies buried until the customer tries to leave; threats of "action" for non-payment after attempted cancellation. The pattern is consistent with three PE sponsors in seven years compounding subscription revenue at the customer's expense. This is the unhappy customer pool.
  2. Internal bureaucracy at PE-mature scale. Public Glassdoor commentary describes "large project management bureaucracy for a company of their size", "not really 'agile' in any sense of the word", "too much hierarchy in their management structure", "frequent internal reorganizations", and senior management protecting "pet projects" that should have been killed. A 598-review aggregate of 4.1/5 doesn't change the fact that the engineering organization is not a fast-moving target.
  3. No deployment-topology choice for the customer. Enverus ONE is cloud-only on their tenancy. The launch language — "proprietary customer data remains isolated within a private tenancy" — is the strongest sovereignty pitch they can credibly make, but it's still their cloud. The customer ships data out and has no say in the substrate. For operators who want on-prem for completion designs, M&A diligence, AFE pricing models, or JOA-sensitive negotiation data — or for operators who simply prefer to run on their existing Microsoft/AWS contract and consumption-discount stack — Enverus has one answer. Lean Informatics meets the customer where the customer wants to be. Even Epic itself moved to a hybrid cloud posture in healthcare (Hyperdrive on Azure / Azure Large Instances) once customers asked for the option. Enverus has not extended that courtesy to upstream.
  4. Acquisition-driven product sprawl. Spatial Business Systems (April 2026, utility design/engineering), Tracts.co partnership (April 2026, title automation), Xpansiv partnership expansion (May 2026, price discovery), plus legacy PRISM, MarketView, Foundations, Sphere, and now ONE. Each bolt-on carries integration debt and uneven UX. Customers reach for fewer tools, not more.
  5. The Enverus ONE pitch contains its own admission. CEO Manuj Nikhanj framed the April 7 launch with the line "the gap between the companies that move now and the companies that wait is going to be significant and it is going to compound", and CPO Jimmy Fortuna pitched ONE as "the only AI platform that can" reason with O&G operating context. That language is defensive against the very thing Lean Informatics' thesis predicts: foundation models compressing the 25-year data moat into a deployable product. When the incumbent's CEO has to publicly insist on the gap, the gap is shrinking.
  6. The long tail of mid-size operators is undersold. Enverus's center of gravity is enterprise: supermajors, capital markets, large independents, midstream majors. The 5–200-well operator who needs RRC compliance and AFE evaluation but can't justify a $50K+/yr Enverus contract is structurally underserved. That is exactly Track 1's wedge customer.
opening
The opening, stated plainly. Enverus is great at being Enverus. It is not built to deliver an FDE-led, customer-premises-deployed, sovereignty-first product to a 50-well operator in Midland in 30 days. The data moat doesn't translate into delivery agility. Three PE flips have made the company organizationally and pricing-wise exactly the kind of incumbent that bootstrap, founder-led, premises-deployed challengers historically displace at the edges. The mid-size operator and the land/survey channel are the edges.

04The foundation-model leveling event

The most important fact in this entire document, and the one that makes Lean Informatics possible at all, is this: frontier foundation models trained between 2023 and 2026 absorbed roughly a century of public, academic, and institutional oil & gas knowledge into their weights. The "you can't build vertical AI for upstream without 25 years of proprietary data and a roster of petroleum engineers" defense, repeated by every incumbent for the last decade, is no longer empirically true. The vernacular, the definitions, the metrics, the workflows, and the institutional IP that used to gate entry are now a $20/month API call or a free open-weight checkpoint.

What's in the weights

By 2026, the major closed-weight frontiers (Claude Opus 4.6, GPT-5.1, Grok 4) and the major open-weight families (Hermes 4 405B, Llama 4 405B, Qwen 3.5) have ingested, in some combination:

  • SPE OnePetro corpus — ~300,000 peer-reviewed petroleum engineering papers and conference proceedings.
  • AAPG and SEG journals — the geology and geophysics literature stretching back to the 1920s.
  • DOE, USGS, EIA technical reports — reservoir studies, basin assessments, methodology papers, regulatory technical bases.
  • State commission filings — Texas RRC, Oklahoma OCC, New Mexico OCD, North Dakota Industrial Commission, Louisiana Office of Conservation. Decades of public W-10/G-10/H-10 equivalents, drilling permits, completion reports, plugging records.
  • Courthouse public records — title abstracts, lease assignments, ROW grants, unit designations, pooling orders. Much of this is web-indexable.
  • Industry textbooks and field manuals — Schlumberger Oilfield Glossary, Halliburton/Baker Hughes operations manuals (where public), classic Craft-Hawkins, Slider, Lake, Ahmed, Economides petroleum engineering texts.
  • Trade press and conference proceedings — Hart Energy, JPT, World Oil, E&P Monthly, ARC Group, Wood Mackenzie public reports, IHS Markit precursors.
  • Academic petroleum engineering coursework — MIT OpenCourseWare, Stanford SCRF, Texas A&M, UT Austin, Colorado School of Mines, Tulsa graduate-level course materials where publicly posted.
  • Expert witness transcripts and litigation discovery — PACER and state court systems carry decades of technical expert depositions on well control, fluid mechanics, completion failure, royalty disputes.
  • YouTube engineering channels — everything from Practical Engineering and Real Engineering down to ChevronTexaco operator training videos and Oilfield Joe explainers. Petroleum vocabulary is in the audio transcripts.

The result, validated repeatedly in 2025-2026 benchmarks: frontier models hold their own on petroleum engineering coursework, score 50–70% on the SPE certification subset out of the box (Claude Sonnet 4.5: 52.5%; Grok 4: 62.5%; Collide's RIGGS at 67.5% is a 5-point fine-tune lift on top of a model that already knew the material), and can fluently produce AFE narratives, JOA boilerplate, geosteering interpretations, completion-design rationale, and regulator-grade filing language. The domain-tuned embeddings (PetroVec-style) that incumbents marketed as defensible IP can be reproduced by a competent ML engineer in a weekend.

What's been leveled

Vocabulary barrier
Gone
The model knows the difference between a swab cup and a packer, a frac plug and a bridge plug, an AFE and a JIB. It can converse with the field hand.
Vendor data lock-in
Gone
Foundation models ingest any format: PDFs, scanned permits, EDI feeds, LAS files. The "our data schema is the moat" pitch is over.
Roughneck-on-staff defense
Gone
The model has been trained on more SPE papers than any single petroleum engineer has read. It is not a replacement for an SME, but it is a replacement for the gatekeeping function of "you can't enter our industry."
Institutional IP
Gone (mostly)
Definitions, metrics, taxonomies, standard formulas, decline curves, fluid models, AFE templates, JOA boilerplate — all in the public corpus and therefore in the weights.

What still gates entry

  1. Customer-specific proprietary data — which is exactly what stays on the customer's premises. The foundation model can speak the language fluently; it cannot tell you what's in this operator's 2024 completion report or AFE history. That gap is the buying trigger for the on-prem appliance.
  2. Distribution — relationships, channel, trust. The 150–250 US land services firms have multi-decade trust with mid-size operators that no foundation model can manufacture in 20 quarters. That is why Lean Informatics goes through that channel, not around it.
  3. FDE reliability under regulator scrutiny. SOC 2, signed DPAs, auditable signoff chains, named human accountability, agency procurement readiness. Foundation models don't sign DPAs. People do.
  4. Workflow ownership and process design. Knowing what the AFE evaluation should look like is one thing. Designing the workflow so a 50-well operator can run it in 30 days without firing their landman is another. The model helps; the human still owns the process.
freight train
The call to action, stated for the firms that will read this. If you are a mid-size operator, a land services firm, or a vertically-integrated independent and you are waiting until "the AI thing settles down," you are paying the late-mover tax already. Frontier-model leverage is compounding monthly. Your competitors who deploy now bend their cost curve and compress decision cycles by 5–10×. Old-buddy networks, favoritism in vendor selection, and tribal industry knowledge cannot stop a system that already knows the language and costs one-tenth of what the incumbent is charging. The window in which "we'll wait and see" was a defensible posture closed somewhere between Claude Opus 4 and Enverus ONE. Lean Informatics is built to be the first call you make when you decide the wait is over.

Why this is good news for Lean Informatics specifically

If industry expertise had still been the moat, an industry-adjacent founder like Jonathan would be at a structural disadvantage. The foundation-model leveling event turns that on its head. The wedge becomes:

  • Cross-vertical operating discipline transfers into upstream’s regulator-scrutiny posture better than 20 years of petroleum-only career.
  • Industry-adjacent positioning means familiarity with the operating cadence and audit posture without inheriting industry orthodoxy on how things "have always been done."
  • Vertical-stack engineering background — the discipline of owning the full stack from substrate to operator portal — maps directly onto the on-prem appliance pattern in Addendum B. Cloud-native incumbents cannot retrofit this.
  • Foundation-model fluency as the operator’s translator. The founder doesn’t have to be a 20-year reservoir engineer; the model is. The founder has to be the operator-engineer who knows how to ship the system to the customer’s environment and keep it running.

The full founder profile, including the prior agency-scale deployments that prove these patterns, lives in the Founder section.

The cross-vertical thesis (Addendum B.9) catalogues the eight SaaS winners who succeeded as outsiders in their target industries (Veeva, Toast, Procore, Carta, Stripe, Snowflake, Datadog, Persefoni). Every one of them won by treating institutional jargon and tribal knowledge as a layer to be learned, not a wall to be respected. Foundation models have flattened that layer further. The McLelland framing on X that "outsiders can't compete in O&G" is — in 2026 — self-serving marketing for a previous era.

05Market evidence — Collide as proof of fit

Collide sits one rung below Enverus on the competitive ladder. If §03 is the old guard, Collide is the newcomer whose mere existence and trajectory proves the gate has opened: a single founder team, a $5M Mercury Fund seed (April 2025), a public FDE motion, a domain-tuned LLM, and credible reference customers (Winn Resources, ConocoPhillips-affiliated deployment chatter) inside 12 months. The relevant signal is not "Collide will win the category" — it is that the category is now fundable, sellable, and executable in 2026 by teams that didn't exist in 2024. Lean Informatics' positioning, distribution, architecture, and sequencing are deliberately different (see Addendum B and Section C). The takeaway: the newcomer has proved the wedge, and the foundation-model leveling event (§04) means the wedge is wider than Collide's own positioning admits.

From the public stack diagram, Collide is a three-layer architecture flanked by Forward-Deployed Engineers (FDEs) who handle deployment, configuration, and change management. The "proprietary" labels in the diagram mark what they consider defensible.

LAYER 01 · INGESTION & DATA

Document classification → Domain pipelines → Petroleum embeddings → Security

Reads drilling reports, well logs, completion procedures, scanned land leases, SCADA exports, third-party plant statements. Their "Petroleum Embeddings Model" is marketed at +34% accuracy vs. OpenAI on petroleum terminology — a domain-tuned contrastive embedding (sentence-transformers or proprietary). proprietary proprietary proprietary off-the-shelf security

LAYER 02 · DOMAIN INTELLIGENCE

Agentic orchestration → RIGGS LLM → Knowledge base → Basket of LLMs

RIGGS is the petroleum-tuned LLM, trained on "Spindletop hardware" (their internal training rig). 67.5% on SPE exam subset. Validation/reasoning layer wraps it. Agentic orchestration handles regulatory, production, well-failure flows. A "basket of LLMs" (GPT, Claude, others) is used for general reasoning when domain isn't needed. proprietary proprietary proprietary off-the-shelf LLMs

LAYER 03 · OUTCOMES & APPLICATIONS

Automated workflows → GIS & mapping → Continuous improvement

The user-facing surface. Texas RRC G-10/W-10/H-10 filings, production reconciliation, lease term extraction, dynamic JSAs (job safety analyses) keyed to live weather, ESP failure root-cause. GIS layer lets users chat with maps. Continuous improvement = RLHF on FDE refinement and SME ranking. proprietary proprietary proprietary

FDE
Running alongside all three layers: Forward-Deployed Engineers (geophysicists, completions engineers, landmen) who locate data sources, configure workflows, and own change management. This is the Palantir playbook. It is also where 30–50% of Collide's gross margin lives or dies.

06Claims vs. reality

Treat the marketing surface as untrusted input. Verify before you assume.

ClaimVerdictHonest read
RIGGS beats GPT-5.1 / Grok 4 / Claude Sonnet 4.5 on SPE exam PARTIALLY TRUE 67.5% > 62.5% (Grok) > 52.5% (Sonnet) is plausible. GPT-5.1 at 4% is implausible as a knowledge claim — almost certainly a refusal/format failure on a particular prompt mode. Cite cautiously. The subset is only 40 questions.
Petroleum Embeddings: +34% vs OpenAI DIRECTIONALLY TRUE Domain embeddings beat general embeddings on domain tasks — well-established in the literature (PetroVec, FinBERT, BioBERT pattern). The specific 34% number on what benchmark? Unstated. A solo can replicate the direction of this result on a weekend.
Texas RRC filing: 99.4% time reduction CREDIBLE Winn Resources case study (50 wells / 20 min vs hours). Forms are structured, the RRC publishes EDI specs, the failure mode is tedium not intelligence. Reproducible by a competent script.
"AI-native platform purpose-built for the oilfield" MOSTLY MARKETING The architecture is RAG + agents + fine-tuned model + workflow UI. The positioning is what's purpose-built — FDEs, founder pedigree, vocabulary. Not a fundamentally novel architecture.
"First GenAI platform for energy" CONTESTED Enverus, C3.ai, and others were there first at different sizes. "First" is a positioning claim, not a fact. Enverus ONE (Apr 2026) is the actual category-defining incumbent.
RIGGS as "the intelligence layer underneath every operator's workflow" ASPIRATIONAL Unproven at scale. Single named customer (Winn Resources) so far in public materials. Distribution is the open question, not the model.

07Where the moat actually lives

Strip the tech and look at what's hard to copy:

Five moat candidates, scored

MoatStrengthWhy it matters
Founder distribution — McLelland is ex-roughneck with 111K-view tweets, Chuck Yates is a known oilman. Digital Wildcatters has 8,000+ professional members in 122 countries. UNREPLICATABLE This is the actual moat. Cold outreach as a stranger to an upstream operator is a closed door. Coming in as McLelland is a phone call.
FDE service motion — embed engineers onsite, deliver custom config, learn the patterns, push back into product. HARD Palantir invented this. Real moat — if you have the product spine. A solo can do this for ~2 accounts max.
Workflow-specific data flywheel — every filing they handle, every well-failure pattern, makes the next one easier. EMERGENT Compounds only with multiple customers and clear permission to learn cross-customer. Not yet realized at $5M seed scale.
Regulator audit posture — SOC 2 Type II, signed BAAs, audit trails, named human signoff. EXPENSIVE ~$80–150K and 12–18 months for SOC 2 Type II. Doable for funded company, painful for solo.
RIGGS & petroleum embeddings — their proprietary model + embedding stack. COMMODITIZABLE Open weights + good corpus + MLX = ~80% of the gap closed in 60 days. Not a real moat; marketing-led defensibility.
read
If you internalize one thing: Collide is a distribution company with a model on top, not a model company with distribution on top. That's why the Digital Wildcatters community pre-existed the AI pivot. Knowing this changes how you compete.

08Your toolstack, mapped

The stack you described — Claude, data ingestion, open models, MLX, Hermes, report structure, Houston — is unusually well-suited to this problem. Concretely, here's what it gives you:

Claude
Brain
Frontier reasoning for orchestration, code generation, document understanding. Use Sonnet for production agents, Opus for hard reasoning.
MLX (M5/M4)
Edge
Local 70B inference at 153 GB/s. Fine-tune Qwen-14B or Llama-8B for petroleum on a laptop. Zero per-token cost in the field.
Hermes 4
Steerable
Open-weight, tool-use-ready, ChatML, no refusal tax on operational content. Good base for a "RIGGS-equivalent" fine-tune.
Data ingestion
Pipeline
PDF/CSV/SCADA → chunk → embed → vector store. Standard ware; the value is in the petroleum-specific schema and entity extraction.
Report structure
Output
Templated outputs (filings, JSAs, decision memos) with human-in-the-loop signoff. Closes the trust gap fast.
Houston
Geo
Proximity to operators, RRC, the conferences, the wildcatters. This is non-trivial. You can drive to a wellsite.
tip
The Houston piece is doing more work than you think. The AI-in-O&G conference (500+ execs) is in Houston. CERAWeek is in Houston. The RRC is in Austin (3hr drive). The mid-size E&P operators have HQs in the Galleria, downtown, and The Woodlands. Distribution is a network problem and you're standing inside the network.

09Layer-by-layer solo build

Mapping the Collide architecture to your toolstack, in order of build sequence.

Layer 01 · Ingestion & petroleum embeddings

Solo equivalent:

# corpus: SPE papers, OnePetro abstracts, RRC filing PDFs, lease templates,
# completion procedures, Daily Drilling Reports (scrub PII)
# ~50-200M tokens of petroleum-domain text is the sweet spot

# embedding stack
- BAAI/bge-large-en-v1.5  (start here)
- contrastive fine-tune on petroleum SME-labeled triplets (~3-5K pairs)
- evaluate on a holdout of well-name / formation / equipment queries
- target: +20-30% recall@10 over base on petroleum queries

# pipeline
unstructured.io  → tika fallback for OCR-heavy PDFs
LlamaIndex or LangChain (whichever you tolerate)
Qdrant or pgvector (start with pgvector, scale later)

Verdict: SOLO ACHIEVABLE. Two to four weeks of focused work matches Collide's claim direction. The hard part is corpus curation, not the embedding training.

Layer 02 · RIGGS-equivalent domain model

Solo equivalent:

# base model choices (ranked)
1. Qwen 3.5 14B-Instruct       — best fine-tune ROI, MLX-ready
2. Hermes 4 14B                 — steerable, no refusal tax
3. Llama 3.3 70B                — heaviest, best ceiling, slower iter

# training data (this is the real work)
- SPE papers (you'll need licensing for some)
- Public RRC filings (millions of records)
- Drilling reports (anonymized via customer #1)
- Synthetic Q/A pairs generated from petroleum textbooks → graded by an SME

# infrastructure
- LoRA + MLX-LM on M5 Max for first 2-3 rounds
- Lambda Labs / RunPod for the final full SFT pass
- Eval against the SPE PE exam (you can buy the practice exams)

# realistic target
- 55-60% on the SPE PE exam in 60 days of focused work
- 65-70% in 6 months with domain SME feedback loop

Verdict: SOLO ACHIEVABLE WITH FOCUS. You probably can't match RIGGS's 67.5% in 60 days, but you can get within striking distance — and for the actual customer workflow (filling out a W-10), you don't need to. The eval ≠ the product.

Layer 03 · Agentic orchestration

Solo equivalent:

# the agent loop, kept boring on purpose
- Claude Agent SDK / LangGraph for the high-stakes flow
- Hermes 4 (local) for cheap intra-tool reasoning
- Tool registry: rrc_lookup, well_master_query, scada_pull, pdf_extract,
                  jsa_template_fill, ssa_compute, gis_proximity_check
- Guardrails: every output that mutates state has a "human sign here" step

# observability (do this from day 1, not day 100)
- Langfuse or Helicone for trace replay
- Every prompt versioned in git
- Every customer interaction snapshotted for the eval set

Verdict: SOLO ACHIEVABLE. This is exactly where Claude Code + the Agent SDK shines. One person can ship this faster than a team because there's no coordination tax.

Layer 04 · Outcome surface (the actual product)

Solo equivalent:

  • Pick ONE workflow. Not all of Collide's. Just one. The W-10/G-10 filing is the canonical wedge — it has clear ROI ($300/well/yr in filing labor), public APIs, and Collide has already proven the demand.
  • Build it as a CLI first, then a web app. The first customer doesn't need a polished UI — they need a working pipeline.
  • Templated outputs with human signoff. The user reviews and approves each filing before submission. This is both compliance posture and trust-building.
  • GIS only if customer asks. Leaflet + RRC shapefiles get you 80% of the way. Don't pre-build it.

Verdict: SOLO ACHIEVABLE IN 30 DAYS. The W-10/G-10 use case specifically.

Layer 05 · The FDE motion (you, in person)

Solo equivalent: You are the FDE. You drive to the customer's office in Midland or The Woodlands. You sit with their operations director. You watch them file W-10s by hand. You build the integration in front of them. This is how you win against Collide on a single account: you ship faster because you don't have to schedule a meeting with yourself.

Verdict: SOLO ACHIEVABLE, BUT IT CAPS YOU AT 2–3 ACCOUNTS. This is the fundamental scaling constraint.

10Solo vs. company verdict matrix

The honest gap, capability by capability.

CapabilityCollide (company)You (solo)Gap
Document classification + ingestionCustom pipelines, scaledunstructured.io + Claude + pgvectorclose to zero
Petroleum embeddingsRIGGS embeddings, claimed +34%bge fine-tune on petroleum corpusclosable in weeks
Petroleum LLM (RIGGS)67.5% SPE, trained on SpindletopQwen/Hermes LoRA, 55-65% SPEclosable in months
Agentic orchestrationInternal frameworkClaude Agent SDK + LangGraphsolo may ship faster
GIS mapping over O&G dataProprietary, chat-with-mapLeaflet + RRC shapefiles + Claudeslower polish
W-10/G-10 filing automationLive with Winn Resources30-day build to first customerparity achievable
JSA generation (weather-aware)LiveBuildable in 2 weeksclose to zero
Dynamic well-failure RCAClaimed, low public detailHarder — needs OEM specs + sensor historydepends on data access
SOC 2 Type II postureImplied, presumed in progress~$80–150K + 12–18 mostructural disadvantage
Forward-Deployed Engineers (scaled)Hiring multiple FDEsYou, one human, 2–3 accountscaps your growth
Brand & community distributionDigital Wildcatters, 8K+ members, 111K-view tweetsBuild from zeromonths/years to close
Capital cushion$5M seed runwayPersonal runway + first invoicestructural
Speed of single-customer iterationInternal coordination costYou ship same daySOLO ADVANTAGE
Burn rate~$150–300K/mo blended~$8–15K/mo all-inSOLO ADVANTAGE
Pricing flexibilityEnterprise floor (~$50K/yr+)$2K–15K/mo, flexibleSOLO ADVANTAGE

11What a solo can't do (honest)

This section exists so you don't fool yourself.

  1. You can't scale FDEs. Past 2–3 accounts, you're either turning customers away or becoming a consultancy. Plan the bottleneck.
  2. You can't sign with a supermajor. No procurement department will sign with a Texas LLC of one without a SOC 2 letter, named executives, and references. Aim middle: 5–50 well operators.
  3. You can't outrun a brand-distribution flywheel. McLelland's tweets do free pipeline generation. You'll need either a personal brand strategy or a partnership.
  4. You can't easily defend against open-source. Everything you build, someone could open-source 6 months later. The defense is customer entrenchment and workflow ownership, not novel IP.
  5. You can't compete on "platform" framing. Don't try. Compete on "outcome": a number on a contract, signed in dollars saved.
  6. You can't ignore Enverus. Enverus ONE (April 2026, with Astra model + SOC 2 Type II + 25 years of data + Continental/BPX/Chord partnerships) is the real giant. Position around them, not Collide.
honest
If you still find this attractive after reading the can't-do list — that's the signal you should do it. Most people won't sit with this list.

12The Houston wedge

The right way to do this is to pick the narrowest defensible wedge and own it before anyone notices.

The wedge: Texas RRC compliance for mid-size operators

  • Why this customer: Mid-size Texas operators (5–200 wells) file W-10, G-10, PR forms monthly. Most do it by hand or with Excel. The pain is real, recurring, and quantifiable.
  • Why this workflow: Public forms, public data, public APIs (RRC EDI), low political risk if it breaks (filing is reviewed before submission). Easy to demo, easy to price.
  • Why this geography: The RRC is in Austin. Texas operators are in Houston, Midland, Tyler, Fort Worth. You can be on a wellsite in 5 hours.
  • Why this moment: Collide has proven the demand with Winn Resources. The market is now educated. You don't need to evangelize; you need to be cheaper, faster, and closer for operators below their floor.

Positioning against Collide

We're the RRC compliance pipeline for operators too small for Collide and too tired of spreadsheets to keep doing it by hand. White-glove, fixed-price, deployed in your office in two weeks. We don't sell a platform — we sell completed filings. — draft positioning statement

Three customer profiles to target

  • Family-owned Permian operator, 10–40 wells. Owner runs the filings themselves. Hates it. Will pay $1500/mo to make it go away.
  • Midstream gathering company, 50–150 wells. Has a controller doing this 3 days/month. Math is obvious.
  • Mineral rights manager / landman service company. Filings for multiple clients. They'd resell your tool as part of their service.

1390 / 180 / 540 day plan

Days 1–30 · Build the wedge

  • Set up the project structure: monorepo, pgvector, Claude Agent SDK, unstructured.io pipeline.
  • Get 100 sample W-10 and G-10 filings from the public RRC archive.
  • Build the end-to-end pipeline against your own well data (synthetic if needed): read SCADA exports → reconcile production → generate filing → human signoff → submit via EDI.
  • Write the eval set: 25 wells, known correct answers, runs in 5 minutes.
  • Buy domain. Build a 1-page landing page. Start writing on LinkedIn / X about RRC pain.

Days 31–90 · First customer, in person

  • Get one paying customer at $1.5–5K/month. Houston-area, friend-of-friend, or community connection.
  • Drive to their office. Sit with their ops person. Watch the manual process for half a day.
  • Ship the integration in 2 weeks. Use the next 2 to harden against their actual edge cases.
  • Document everything: a runbook, a one-pager, a case study with hours-saved math.
  • Start the SOC 2 readiness conversation with Vanta or Drata. Don't start the audit yet.

Days 91–180 · Second customer + the eval moat

  • Use case study #1 to land customer #2 and #3 (target: 3 customers at $36K–120K ARR by day 180).
  • Start the petroleum embedding fine-tune in earnest — you now have real corpus from customer data (with permission).
  • Begin the petroleum domain LLM fine-tune. Target: 55% on SPE practice exam.
  • Apply to TX Railroad Commission as an authorized filer / EDI participant if not already.
  • Hire a fractional ops person to handle customer onboarding so you stay on engineering.

Days 181–540 · Decide what you are

  • Path A — Lifestyle consultancy. 5–10 customers, $400K–1.2M ARR, you run forever. No outside capital. Houston gold.
  • Path B — Productize and raise. The same workflow, repackaged as self-serve. Raise a small angel round to hire 2 FDEs. Compete with Collide directly.
  • Path C — Acquihire / partner. Sell to Collide, Enverus, or Quorum as a workflow module. Your code + your customers = their next-month roadmap.
  • Decide which based on the market signal: how fast customers came, what they're asking for next.

14Costs, pricing, margins

Cost stack (monthly, year one)

Line itemMonthlyNotes
Claude API (Sonnet + Opus mix)$400–1500Scales with customer count and document volume
Embedding API / self-hosted$50–200Mostly free after MLX local serve
Vector DB (Qdrant Cloud / pgvector on Hetzner)$50–200Self-hosted dirt cheap
Observability (Langfuse, Sentry)$100Start free tier
Cloud compute (1 small VPS, occasional GPU rent)$200–500Hetzner + Lambda Labs on demand
Vanta / Drata (SOC 2 readiness)$1000Start month 4
LLC, accounting, insurance$300E&O insurance becomes essential customer 2+
You (founder draw)$6000–10000Houston cost of living
Total$8–14K/movs. Collide's est. $150–300K/mo

Pricing menu

TierPriceIncludes
Pilot (30 days)$5,000 one-timeOnsite setup, one workflow, 50 filings
Operator$1,500/moUp to 50 wells, monthly filings, JSAs, support SLA
Operator+$4,500/moUp to 200 wells, custom workflows, GIS, dedicated Slack
FDE engagement$15K/mo flatHalf your time on one account, custom build

Path to $400K ARR

5 Operator+ ($270K) + 4 Operator ($72K) + one $5K pilot/month ($60K). Achievable in 12–18 months from a standing start with disciplined sales discipline. Gross margin: ~85%. Net margin (incl. your salary): ~30–40%.

15Risk register

RiskSeverityMitigation
Collide drops price to floor or open-sources commoditized layersmediumCompete on white-glove + smaller account fit, not platform
Enverus ONE crushes the small-operator segment with a sub-$1K tiermediumMove fastest in the small-operator gap and entrench before they do
RRC changes filing format / EDI specmediumSubscribe to RRC bulletins, build adapter pattern, charge for the migration
You burn out being a one-person FDEhighCap at 3 active customers, hire fractional ops by month 4
Procurement rejects you for lack of SOC 2highStart Vanta month 1, get Type I within 6 months
Hallucinated filing causes customer regulatory issueexistentialHuman-in-loop on every submission, E&O insurance, no auto-submit ever
Foundation model price war pulls floor outlowYou benefit — lower inference cost. Open weights insulate.
Collide acqui-offers and you say yesgood problemNegotiate from the position of running cash-flow positive

16Sources

Every claim in this document traces back to one of the following. When in doubt, prefer the primary source over the secondary.

Collide.io primary

Competitive landscape — Enverus (the old guard)

Foundation-model leveling — petroleum corpus & domain LLMs

Forward-Deployed Engineer model & vertical AI

Solo toolstack & open models

Persona profiles

Texas RRC, filings, regulatory

APersona deep dive — Digital Wildcatters & the operator class

The §07 verdict — Collide's real moat is distribution, not architecture — only matters if you can name the people who make distribution real. This addendum profiles four reference points: the Digital Wildcatters flywheel that birthed Collide, plus three named operators / executives whose patterns are worth modeling. For each, the question is the same: where does the FDE model fit when you're standing where they're standing?

The four-way fit framework

For each persona, evaluate four roles:

  1. As a buyer — would they pay for your FDE engagement?
  2. As an advisor — what would they teach you?
  3. As a partner / distribution — would they help you reach customers?
  4. As an acquirer — would they (or someone like them) eventually buy you?

A.1Digital Wildcatters — anatomy of the flywheel

You can't understand Collide without understanding the machine that made it. Digital Wildcatters was founded by Collin McLelland (ex-roughneck, "Fracslap" on X) and Jake Corley in 2019 as a podcast. Chuck Yates — ex-$8B energy fund manager, "fired in April 2020" — joined and lent industry credibility. Collide was spun out of the community, not built into it.

Members
9,000+
Hand-vetted petroleum engineers, operators, founders across 122 countries.
Seed funding
$2.5M
For Digital Wildcatters (separate from Collide's later $5M seed).
Flagship event
Fuze
Annual Houston conference. Past attendees: Devon, Halliburton, Accenture, FTI, AWS, McKinsey.
Surface area
Podcasts + events + community + software
Oil and Gas Startups, Chuck Yates Got A Job, Energy Tech Night, DW Power Hour, Collide product.

How the flywheel actually spins

  1. Podcast as top-of-funnel. Every episode is a free industry interview. Operators come for the stories, stay because they like the hosts.
  2. Community as middle. The 9,000-member network is where the actual relationships form — deal flow, hiring, vendor referrals.
  3. Events as conversion. Fuze and Energy Tech Night convert relationships into pipeline. Sponsors pay; vendors prospect; founders pitch.
  4. Collide as monetization. The AI product sells back into the community that already trusts the brand. Pre-built warm market.
key
The order matters. Audience first, product second. Most vertical AI startups try to do this in reverse and fail. Collide had a 9,000-person warm market on day one of the AI pivot. That's why a $5M seed felt like enough.

What this means for your build

  • You can't replicate the flywheel in 12 months. Don't try. A podcast takes 3 years to build an audience, and McLelland/Yates have a 7-year head start.
  • You can plug into the flywheel. Be on Energy Tech Night as a startup pitcher. Sponsor a single Fuze track. Get on a podcast.
  • You can build a counter-flywheel in a sub-niche they don't cover. Compliance officers? Landmen? Lease analysts? Pick a verb the Wildcatters don't own.
  • You can use their members as customer discovery. The community is searchable, the conversations are public, the pain points are documented.

A.2Allen Gilmer — the DrillingInfo / Enverus playbook

If Collide has a north star, it's Allen Gilmer. He's done a version of this exact arc, only with structured data instead of LLMs, two decades earlier.

The pattern

  • 1999: Co-founded DrillingInfo in Austin with Mark Nibbelink. Started by physically collecting Texas drilling permits daily and turning them into a searchable database.
  • 2010s: Layered analytics, GIS, and well-economics onto the permit core. Became the de-facto E&P data platform.
  • 2018: Acquired by Genstar Capital (San Francisco PE).
  • 2019: Rebranded to Enverus on the 20-year anniversary. Now the energy industry's data & AI exec layer.
  • 2021: Gilmer retired from the Enverus board.
  • 2025: Joined Oil & Gas Asset Clearinghouse (OGAC) as Principal Partner. Active at MaScience. Also runs Tiki Tāne Pictures (film production, with industry vets) — he's deliberately diversified out of pure energy.

What he'd teach you (and what Collide already learned)

Data acquisition is the moat. Everything else — the UI, the model, the analytics — can be replaced. But if you're the only one who systematically captures the daily permit, the daily filing, the daily completion report — you become indispensable in seven years, not seven months. — Gilmer playbook, paraphrased from his public commentary

The four-way fit

RoleFitReasoning
Buyerunlikely directHe's an investor/advisor now, not an operator buying tooling. But he'd be the kind of person who endorses you to operators.
AdvisoridealHe has lived the data-to-platform arc. A 30-minute call with Gilmer about "should I build the corpus first or the model first" would be worth more than 100 hours of YouTube.
Partner / distributionvia OGACOGAC's clearinghouse customer base is upstream sellers/buyers — an adjacent audience to whoever buys your filing automation.
Acquirernot him; his portfolio peersEnverus itself is the obvious strategic acquirer of a workflow-AI tuck-in. Genstar's playbook is to consolidate. Build with that exit logic in mind.
action
If you can get an introduction to Gilmer through Digital Wildcatters or the OGAC/MaScience network, do it before you build. One conversation reshapes your roadmap.

A.3Jim Flores @ Sable Offshore — the single-asset political operator

Flores is a different category entirely. He runs Sable Offshore Corp (NYSE: SOC), HQ Houston, with a single concentrated asset: the Santa Ynez Unit (SYU) in federal waters offshore California. Platform Harmony was producing ~22,000 gross bopd as of March 2026 after restart.

Why Sable is unlike a Texas E&P

  • Federal waters, not state. Regulator is BOEM (Bureau of Ocean Energy Management) + BSEE for safety, not the Texas RRC.
  • Political asset. Restart was contested by California Coastal Commission; required Defense Production Act invocation by the Trump Administration in March 2026. DOJ hearing on Consent Decree modification set for June 1, 2026 in C.D. Cal.
  • Pipeline blocked. The Las Flores Pipeline System remains under dispute. Sable submitted a revised plan to BOEM in October 2025 proposing an offshore storage & treating (OS&T) strategy with shuttle tankers as a workaround.
  • Single-asset risk profile. One platform's compliance posture is the entire company's compliance posture.

What an FDE engagement at Sable would look like

Sable's compliance burden is high-stakes, low-volume, and bespoke. This is the opposite end of the spectrum from the W-10/G-10 high-volume Texas RRC wedge. The right product for Sable isn't a filing automation pipeline — it's:

  • A regulatory document understanding system that ingests the Consent Decree, NEPA filings, BOEM correspondence, CCC briefs, and surfaces obligations + deadlines.
  • A stakeholder mapping tool that tracks every party (DOJ, BOEM, CCC, Santa Barbara County, NGOs, plaintiff coalitions) and what they've said publicly.
  • A scenario modeling engine for "if the pipeline restarts vs. if we go shuttle tanker, what's the production curve and the compliance trail."

The four-way fit

RoleFitReasoning
Buyernarrow, high-valueSable could pay $250K–1M annually for a high-quality regulatory-intelligence FDE engagement. But this is not your wedge product — it's a sidecar consulting line at best.
AdvisorlimitedFlores's deal-making is uniquely his. The lessons don't generalize to a solo software founder. Useful as a case study, not a mentor pattern.
Partner / distributionnoSable is single-asset. Their network isn't a fit for a Texas RRC compliance product.
AcquirernoOperators don't buy software companies. Wrong logic chain.
guard
Don't chase a Sable-shaped logo as your first customer. It's seductive (high contract value, name recognition) but the workflow doesn't generalize, the sales cycle is 9–18 months, and the political surface is dangerous. Stay in the mid-market upstream wedge until you have product-market fit, then maybe open a regulatory-intelligence sidecar.

A.4Bryan Hanks @ BETA Land Services — the existing FDE business

This is the most strategically interesting profile of the four. BETA Land Services is already a Forward-Deployed Engineer business — they just use landmen instead of software engineers. The model isn't theoretical for Hanks. He's been running it for 30+ years.

Founded
Lafayette LA
With Texas + Gulf Coast operations.
Acres developed
4.3M+
Across thousands of wells.
M&A managed
$40B+
120+ corporate transactions since 2010.
Hanks experience
41 years
CPL (Certified Professional Landman).

What BETA actually does

Land & lease acquisition, due diligence, abstracting, title research, title curative work, right-of-way / pipeline projects, and increasingly: solar, wind, transmission line, carbon sequestration, and battery storage. They sell completed landwork. Customer hands them an acreage block; BETA returns a clean title, an executed lease bundle, a ROW package.

This is exactly the value proposition you'd want for your AI-powered version: completed filings, not a platform. The shape of the offering is identical. Only the technology stack changes.

The strategic question

The question for you is whether BETA is a competitor, a customer, a partner, or a template. The honest answer is: all four, depending on framing.

  • Competitor framing: If you sell title-curative AI to operators, BETA's landmen lose hours. They have inertia, relationships, and brand on their side. You have margin and speed.
  • Customer framing: Sell to BETA, not around them. Your AI becomes their internal force multiplier. Same landmen, 3x throughput. They keep the customer relationship, you take a per-acre fee.
  • Partner framing: White-label your filing automation as "BETA Compliance" or "BETA Digital." They distribute, you build. Their existing customers get an instant upgrade.
  • Template framing: Even if you never talk to Hanks, BETA's service motion — embedded specialists, fixed-scope deliverables, charge by the project — is the right model to copy.

The four-way fit

RoleFitReasoning
Buyervery strongBETA already pays for tooling that makes their landmen more productive. The pitch writes itself: "30% throughput on title work, same headcount." Test this in 2 months.
AdvisorunderratedHanks has lived the FDE motion at scale. He's seen what wins and loses in the field for 41 years. Worth 10x more than any VC partner on this specific question.
Partner / distributionstructural fitBETA's customers are precisely the mid-size operators you want. They sit between you and them today. Be the AI engine inside their service.
Acquirercredible at scaleA profitable land services company expanding into adjacencies (solar, carbon) needs digital capability. Strategic acquisition logic is real if you reach $1M ARR.
play
The BETA move: Once you have one mid-size operator running on your RRC pipeline, build a "land package" extension — lease term extraction, title chain assembly, ROW document generation — and pitch BETA as your second customer. They have the workflow, the volume, and the budget. They are the missing-link distribution.

A.5Cross-walk: who matters for what

One table, the operative summary of the four profiles.

PersonaBest framingWhat they unlockHow to engage
Digital Wildcatters
community + Collide
Distribution flywheel 9,000-member warm market, Fuze access, podcast reach Sponsor Energy Tech Night, pitch as a startup, get on a podcast, lurk in the community
Allen Gilmer
DrillingInfo / Enverus / OGAC
Advisor + acquisition logic Strategic playbook from the founder who did this 20 years ago in structured data Warm intro via Wildcatters, OGAC, or Austin energy scene; one 30-min advisory call
Jim Flores @ Sable
Sable Offshore Corp
High-CV sidecar customer (eventually) Regulatory-intelligence niche if you ever want to expand off the Texas wedge Don't chase. Park as a future opportunity once you have a SOC 2 letter and 3 case studies
Bryan Hanks @ BETA
BETA Land Services
Partner / distribution / acquirer Existing FDE business with customer base and operations; structural fit for AI tooling Cold email after customer #1 lands. Pitch as "internal throughput multiplier." Houston/Lafayette is a 4-hour drive.

The deeper takeaway

These four profiles span the full operator class:

  • Digital Wildcatters is the distribution archetype — how you reach the customer.
  • Allen Gilmer is the founder archetype — how you build the long game.
  • Jim Flores is the edge-case customer archetype — how you don't get distracted.
  • Bryan Hanks is the scaled FDE archetype — how you avoid hiring twenty engineers by partnering with someone who already has them.
order
Engagement sequence, ranked: (1) Hanks — he unlocks the most leverage per conversation. (2) Wildcatters community — cheapest distribution. (3) Gilmer — one call, lifetime payoff. (4) Flores — never until you're ready, then carefully.

BLean Informatics — on-prem architecture & founder fit

This addendum specifies the reference architecture for the deployment topology where customer data never leaves the customer’s perimeter. It is one option in a customer-centric menu — alongside customer-owned colo, customer-managed hyperscaler tenancy, and hybrid — not Lean Informatics’ identity. We meet the same security standards (SOC 2 Type II, ISO 27001, KMS, audit logging) across every topology because those standards are industry-commoditized in 2026. We document this option in depth because (a) the founder’s prior art makes it unusually well-matched to LI’s delivery DNA, and (b) for the sovereignty-sensitive subset of upstream workloads — completion designs, M&A diligence, AFE pricing models, JOA-sensitive negotiation data — on-prem remains the buying trigger.

posture
This section is intentionally not promotional. The architecture works for specific wedges and breaks down for others. Read §B.8 before committing to it as the company default.

B.1Architectural prior art — vertical-stack pattern that translates

The architectural pattern that underwrites the on-prem appliance is not invented for this project. It is an adaptation of a vertical-stack design pattern Lean Informatics’ founder has shipped before at industrial scale. The founder profile that backs this is detailed in the Founder section; this addendum addresses the architecture only.

The pattern, abridged

  • Vertically integrated data plane. Origination → transport → addressed endpoint. End-to-end ownership of the substrate; central infrastructure never needs to be implicitly trusted.
  • Opaque, addressed payloads. Structured data on the wire, meaningless without the receiver-side codebook. The codebook lives at the endpoint, not in the transport layer.
  • Edge-resident logic. Decode, routing, and failover all run on the endpoint. Resilience is a property of the endpoint, not of central infrastructure.
  • Multi-source failover. Primary path plus independent secondary paths. No single point of failure between origination and endpoint.

What that translates to in the AI architecture

Vertical-stack patternOn-prem AI analogWhy it matters here
End-to-end vertical-stack ownershipVertical stack ownership (appliance → model → output portal)The whole vertical ships as one product. Most software competitors only own the top of the stack.
Opaque addressed payloads on the wireToken/codebook layer at the customer edgeMeaning lives at the endpoint, not in the transport. External observers see structured noise.
Receiver-side logic + auto fail-overAppliance-side routing logic + paired failover unitThe smarts live where the data lives. Central infrastructure does not need to be trusted.
Primary + independent secondary pathsLocal inference primary + optional cloud burst for non-sensitive secondaryTwo independent paths, customer-controlled which is used per workload.
Audit-grade procurement postureSame procurement DNA: audit trails, chain-of-custody, named operator accountabilityThe compliance posture is transferred discipline, not a learning curve.
edge
The competitive read: cloud-first vertical-AI competitors are retrofitting enterprise security from a flat starting point. Lean Informatics can credibly offer customer-premises and customer-substrate options because the vertical-stack pattern is direct prior art, not a learning curve.

B.2The architecture

Customer-Owned Inference Appliance + Lean Informatics FDE Operation. Each customer gets a dedicated unit. No cross-customer data pooling. Lean Informatics holds no raw customer data on its own infrastructure at any time.

PLANE 01 · CUSTOMER PREMISES (TRUST DOMAIN)

The appliance — single GPU node + tokenization layer

2U or 4U server lives in customer's server room, on-site closet, or a customer-paid cage at a Tier-II colo of the customer's choosing. Holds: raw documents, tokenization codebook, embeddings, vector index, knowledge graph, agent runtime, audit log. This box is the trust boundary. No raw data leaves it.

PLANE 02 · CONTROL CHANNEL (AUDITED, NARROW)

Outbound: telemetry, attestations, signed updates only

From appliance to Lean Informatics: signed health telemetry, model attestation hashes, log integrity proofs, and signed software-update receipts. No customer data, no embeddings, no document content. Wireguard tunnel with mutual TLS, customer-controlled kill switch.

PLANE 03 · LEAN INFORMATICS HQ (NO CUSTOMER DATA)

Model lab + signed-update bus + FDE workstation

Houston-side: model fine-tuning against synthetic + customer-anonymized eval sets, signed model+config builds, FDE remote-ops workstation. Lean Informatics never holds a customer's raw documents. If subpoenaed, LI has nothing to produce. That is a feature, not an inconvenience.

Operating modes per customer

  • Air-gapped: Appliance has no network path to LI. FDE flies in monthly for updates via signed offline media. Highest sovereignty, slowest iteration.
  • DMZ / kill-switch: Outbound-only control channel, customer-controlled physical disconnect. Default mode for most customers.
  • Hybrid burst: Local inference primary, optional cloud burst (LLM API) for non-sensitive secondary workloads, gated by customer policy per workflow. Used only when customer explicitly opts in.

B.3Reference appliance BOM

One appliance generation per customer cohort. Designed to be unremarkable hardware that any enterprise IT department recognizes and can maintain.

ComponentReference partNotes
ChassisSupermicro 4U GPU server (4124GS-TNR or similar)Standard rack, dual PSU, IPMI
CPU2× AMD EPYC 9354 (32C)Headroom for tokenizer + retrieval + agent pipelines
GPU (option A)1× NVIDIA L40S (48 GB)Runs 30–70B quantized; ~$8–10K street
GPU (option B)2× NVIDIA H100 PCIe (80 GB)Heavy inference + light fine-tune; ~$50–60K street
RAM512 GB DDR5 ECCKnowledge-graph + vector index resident
Storage (data)2× 7.68 TB NVMe (LUKS, mirror)Self-encrypting, FIPS 140-3 SED preferred
Storage (OS)2× 480 GB NVMe (mirror, encrypted)Read-only mount post-boot
NetworkDual 10/25 GbE + IPMIOut-of-band on isolated VLAN
SecurityTPM 2.0, Secure Boot, IPMI-disabled-by-defaultMeasured boot, attestation chain
Key mgmtYubiHSM 2 or external HSM (customer choice)Document encryption keys never leave HSM
Failover unitIdentical spare in cold standbyManual cutover; 30-min RTO. Optional.

Per-unit cost (Lean Informatics side)

  • BOM (Option A, L40S): ~$28–38K street, ~$22–30K negotiated direct.
  • BOM (Option B, dual H100): ~$95–120K street, ~$75–95K negotiated.
  • Burn-in + provisioning + customer-specific imaging: ~$2–4K of LI time per unit.
  • Shipping & installation logistics: ~$1–2K (white-glove freight, on-site time).

B.4Data plane & opacity layer

The opacity discipline carried over from the founder’s prior architectural work (see B.1 and Founder section), applied to AI:

Tokenization at ingest

  • Customer-identifying entities — well names, API numbers, lease IDs, operator names, person names — pass through a tokenization layer on the appliance before any model sees them.
  • The codebook lives on the appliance HSM. Lean Informatics never possesses it.
  • Even if a model output or log were exfiltrated, identifying entities are opaque without the local codebook. Same principle as the prior-art transport layer: structured data, meaningless without the receiver-side codebook.

What crosses to Lean Informatics (and what does not)

DataCrosses?Why
Customer documents (PDF, CSV, SCADA exports)noStay on appliance
Embeddings of customer documentsnoReversible — treated as sensitive
Knowledge-graph nodes/edgesnoStay on appliance
Model weights (LI-provided)yes, signedPushed from LI to appliance, attested
Health metrics (CPU, GPU, disk %)yesFor SLA + remote diagnosis
Tokenized eval-set statistics (counts, accuracy on tokenized fixtures)yes, with consentOpt-in. Used to improve next model rev.
Raw error traces with contentnoLogs are redacted on appliance before any export

B.5Operational model (FDE-as-service)

The Forward-Deployed Engineer is no longer optional — it's the unit of value delivery.

Engagement phases

  1. Scoping (Week 0–2): NDA, SOW, single workflow defined. Tabletop walkthrough with customer's IT, ops, and (when applicable) compliance.
  2. Provisioning (Week 2–4): Appliance imaged at LI Houston bench. Customer-specific tokenization codebook generated and burned into HSM. Burn-in + integration tests against synthetic data.
  3. Onsite deployment (Week 4–5): FDE flies in. Rack-and-stack, network handoff, customer-witnessed key ceremony, smoke tests.
  4. Workflow onboarding (Week 5–8): FDE resident or daily-onsite. Real customer documents ingested onto appliance, never leaving. First production outputs delivered with customer sign-off on each.
  5. Steady state (Month 3+): Remote operation via audited tunnel. Monthly onsite cadence. Quarterly key rotation. Annual physical security audit jointly with customer IT.

Audit posture from day one

  • Every action by LI's FDE on the appliance is logged with cryptographic chain-of-custody.
  • Customer's SIEM (if they have one) receives a live audit feed. If they don't, the appliance generates a signed weekly audit summary.
  • Customer holds the kill switch. Severing the control channel does not impair the appliance — it just freezes updates and remote support.
  • LI carries E&O + cyber liability insurance from day one (~$15–25K/yr at the appropriate coverage).
posture
The discipline above is roughly the same posture used in cross-domain-solution (CDS) and other agency-grade deployments. The founder has shipped to this bar (see Founder). Re-using the muscle memory is the actual moat, not the hardware.

B.6Pricing & unit economics

Customer-facing menu

TierOne-timeRecurringWhat's included
Appliance Lite (L40S)$25K onboarding$5,500/moSingle L40S unit, one workflow, monthly onsite, business-hours support
Appliance Pro (dual H100)$45K onboarding$11,000/moDual H100, up to 3 workflows, bi-weekly onsite, 24/7 critical-issue line
Air-gap Compliance$60K onboarding$13,500/moPro tier + paired failover unit + quarterly physical security audit + customer-witnessed key ceremonies
Custom (agency-style high-assurance)quotedquotedStatement of work, SCIF-compatible options, classification handling

Lean Informatics unit economics (per Appliance Pro customer)

  • Year-1 revenue: $45K onboarding + $132K recurring = $177K.
  • Year-1 cost of delivery:
    • Hardware (amortized over 3 years): ~$28–32K/yr
    • FDE time (founder, ~30% allocation): equivalent ~$45–60K/yr
    • Travel (onsite cadence): ~$8–12K/yr
    • Cyber insurance allocation: ~$3K/yr
    • Tooling, monitoring, software allocation: ~$4K/yr
  • Y1 gross margin: ~50–55%. Improves to ~65–70% in year 2 (no onboarding cost, hardware partially amortized, FDE allocation drops as ops becomes routine).
  • Three customers at Pro tier = ~$530K Y1 ARR, ~$1.0–1.2M run-rate by Y2 if onboarding spreads.
limit
FDE cadence caps a solo at 3–4 active Pro accounts. If the on-prem play wins, you must hire a second FDE by customer 4. Plan for that hire to be a cross-vertical field engineer with audit-grade ops background, not a generalist software engineer.

B.7Wedge realignment under the on-prem model

The on-prem architecture is more expensive and slower to ship than the cloud wedge in §12. It only makes sense for workflows where data sovereignty is the buying trigger. That is not the Texas RRC filing wedge.

Two-track strategy

TrackWedgeArchitecturePurpose
Track 1 — cash flowTexas RRC W-10 / G-10 compliance for mid-size operators (§12)Cloud-native (Claude + pgvector + Hetzner)Public data, fast deploy, 30-day sales cycle. Funds the company.
Track 2 — defensibilityConfidential workflows: completion design optimization, geosteering, M&A diligence, lease portfolio strategy, subsurface modelingLean Informatics appliance (this addendum)High-CV, sticky, true moat. Competes against Collide/Enverus on sovereignty, not features.

Recommended Track-2 first wedge candidates, ranked

  1. Confidential M&A diligence for upstream transactions. Buyer-side data room ingestion, well-by-well economics, lease overlap, environmental liability surface. Tied to BETA Land Services partnership in §A.4 — they handle 120+ deals annually. Sovereignty is non-negotiable because every party in a deal room is a competitor of the others.
  2. Completion design optimization for Permian/Eagle Ford independents. Operator's frac design + offset performance + lateral spacing — the actual secret sauce. No operator will put this in Collide or Enverus.
  3. Geosteering interpretation + subsurface modeling. Live interpretation during drilling operations. Latency-sensitive (favoring edge inference) and IP-sensitive (favoring on-prem).
  4. Public-safety adjacencies. Wildland-urban-interface (WUI) fire risk to operator assets, hurricane evacuation logistics for offshore crews, emergency-management integration. The founder’s prior network opens this category (see Founder section).

B.8Verdict — when this architecture wins and when it loses

Wins

  • Where the customer's well/completion/lease data is treated as competitive IP. Mid-size Permian and Eagle Ford operators competing with majors. Yes.
  • Where the buyer's compliance/risk officer signs off, not just the engineer. The on-prem story wins that signoff.
  • Where the founder’s prior agency-grade procurement experience is the differentiator (see Founder). No vertical-AI competitor in O&G credibly has it.
  • Where the architectural prior art (see B.1) demonstrates “this team has shipped sovereignty-first systems before.” That story is unfakeable.

Loses

  • For Texas RRC filings, where the data is public anyway. Cloud wedge wins.
  • For any wedge where time-to-first-value < 30 days is the buying criterion. On-prem can't beat SaaS on speed.
  • For customers below ~$300K revenue from this single workflow. Onboarding cost amortizes wrong.
  • If Lean Informatics tries to run this and the cloud wedge at solo headcount without sequencing. Sequence Track-1 first; Track-2 starts after first paying Track-1 customer.

Decision criteria for committing to Track 2

  • One Track-1 customer live and reference-able (≈Month 4).
  • Two qualified Track-2 prospects with budget authority identified (target: one BETA-channel, one direct operator).
  • First appliance BOM purchased only after a signed LOI on the first Track-2 customer.
read
The honest call: Track 2 is the actual long-term business. Track 1 is the cash-flow bridge that buys the credibility to sell Track 2. Treat them as two products with two architectures, sold to overlapping customers, with the founder’s prior agency-grade track record (see Founder) as the wedge that makes Track 2 credible in year one rather than year three.

B.9The cross-vertical founder thesis

Collide's marketing leans on a specific implicit claim: that oil & gas software fails when built by outsiders, because outsiders automate workflows without understanding why those workflows exist. McLelland says this plainly on the company blog. It is a self-serving framing. It also doesn't survive contact with the venture data.

What the evidence actually says

The best vertical SaaS companies of the last fifteen years were largely built by cross-vertical founders:

CompanyVerticalFounder background
Veeva SystemsPharma CRMPeter Gassner — Salesforce, not pharma
ToastRestaurant POSThree founders — none were restaurant operators
ProcoreConstruction mgmtTooey Courtemanche — real estate, not construction
CartaCap tables / equityHenry Ward — finance generalist, not equity admin
StripePayments infraCollison brothers — outsiders to payments
SnowflakeData warehouseMuglia (Microsoft), Dageville (Oracle) — outside the incumbent OLAP world
DatadogCloud monitoringPomel, Lê-Quôc — ex-Wireless Generation, an education company
PersefoniCarbon accountingFounders from finance, not climate science

What is — and isn't — truly vertical-specific

  • Compliance frameworks are universal. SOC 2 is SOC 2. GAAP is GAAP. SOX, audit posture, E&O insurance, evidence handling, chain-of-custody — portable across industries. A founder who has shipped under agency-grade audit requirements doesn’t need to relearn that discipline because the customer industry changes.
  • Cyclic economies share structure. Capex-intensive, M&A-driven, regulated industries on commodity cycles all behave the same way at the business-model layer. The shape of the cycle in upstream O&G is more dramatic than the FM/notification cycle, but the dynamics — cost-cutting in down years, capex unlocked in up years, M&A peaks at cycle bottoms — are recognizable.
  • Vernacular and culture are learnable. Six to twelve months of intentional customer discovery, time at industry events, and FDE onsite hours gets a startup veteran functionally fluent. This is empirically how most successful vertical SaaS founders learned their domain.
  • What insiders genuinely have: a head start on the first 5–10 customer conversations, faster intuition for "this won't work in the field" failure modes, and existing brand. None of these are insurmountable. All three can be addressed by partnering with an industry-veteran advisor and through the FDE motion.

The Lean Informatics reframe

Stated honestly, Lean Informatics' founder profile is:

  • Cross-vertical pattern recognition. Having shipped to multiple distinct procurement contexts — each with its own vernacular, vendor approval process, and political surface — is exactly the muscle that lets a founder enter a new vertical and avoid the rookie traps faster than someone who only knows one vertical. The specific track record is documented in the Founder section.
  • Startup operational discipline. Going from zero to one is its own profession. Founders who have done it before do it faster on the second and third attempt regardless of vertical.
  • Vertical-integration experience. Building a full stack from satellite through transmitter to receiver firmware is a rare skill set in the AI-for-O&G field. Most competitors stop at the application layer.
  • Autodidactic learning posture. The single trait that most reliably predicts cross-vertical success. Learning a new vocabulary, reading the trade publications, attending the conferences, sitting on the wellsite — this is a 6–12 month effort, not a 6–12 year one.

The honest read on McLelland's framing: it's true that outsiders who skip customer discovery build operationally useless tools. It's not true that outsiders who do the work can't compete. The history of vertical SaaS is mostly outsiders who did the work.

posture
Operative implication: Lean Informatics doesn’t compete by pretending to be an oil-patch native. It competes by showing up with deep operating discipline, vertical-stack engineering experience, and a willingness to sit with the operator until the workflow works. That’s a different mode than Collide is running, and it’s defensible precisely because most competitors can’t do the first two. The founder profile that backs this posture is detailed in the Founder section.

Founder & Lead Service Expert: Jonathan Adams

Houston-based startup veteran. Founder of Lean Informatics. Lead service architect on customer engagements until the FDE bench is staffed to take over — per the Epic Systems and Palantir patterns, the founder does the first deployments personally to set the bar for the implementers who follow.

Background

Prior company: founder and operator of a mass-notification platform built on FM RBDS Group 7A (Radio Broadcasting Data System). The platform integrated satellite-fed message origination with a network of terrestrial FM transmitters and addressed receivers; PIN + service-code addressing rode on the 57 kHz subcarrier; receivers held the decode logic and auto-failed over to the next FM tower carrying the customer’s PIN if the current carrier dropped.

Customer base spanned four procurement contexts, each with distinct vernacular, audit posture, and political surface: US Department of Defense, state homeland security offices, county emergency management and sheriffs’ offices, and fire-warning networks. Many of the deployments concentrated in Texas, Louisiana, and Gulf Coast jurisdictions co-located with active upstream oil & gas operations.

What that prior work proves

Vertical-stack ownership
Sat → tower → receiver
Shipped the full vertical end-to-end: satellite origination, transmitter network, receiver firmware, operator portal. Most software founders only own the top of the stack.
Audit-grade operating discipline
Agency-grade ops
Chain-of-custody logs, named-human accountability, multi-source failover, signed customer agreements, FedRAMP-adjacent procurement readiness. Transfers directly to SOC 2 Type II, ISO 27001, and Texas RRC audit requirements.
Sovereignty-first architecture
Edge-resident logic
Receiver-side codebooks. Addressed opaque payloads. The smarts live where the data lives. The same pattern underwrites the on-prem appliance architecture in Addendum B.
Cross-vertical operator
4 procurement contexts
Federal, state, county/local, and fire/EMS. Has learned and shipped to four sets of agency vernacular, vendor approval processes, and political surfaces. Vertical skill is portable; the FM RBDS deployments prove it.

Industry-adjacent on upstream O&G

Not a petroleum engineer. Not a roughneck. The mass-notification deployments concentrated in O&G-dense jurisdictions — Permian counties, Eagle Ford parishes, Houston-area emergency management, fire-warning networks downwind of refining corridors. That positioning provides operational familiarity with upstream’s worst-day shape: blowout coordination, well-pad evacuations, spill comms, pipeline-incident sheriff coordination, weather-emergency response in oil-services towns. Enough exposure to be credible with operator IT and field-ops leadership without inheriting the industry orthodoxy that says outsiders can’t compete. Combined with the foundation-model leveling in §04, that profile is well-fit for the role of services-led builder rather than industry insider.

Why this matters for Lean Informatics

  1. The FDE motion is something the founder has actually done. Show up, integrate the system, train the operator, own the outcome, take the 2am call. That is not a skill that can be acquired by reading about Palantir — it is muscle memory built through years of agency deployments. The FDE primer in §Why-now describes the role; the founder profile is the evidence that LI can deliver it from day one.
  2. Compliance, sovereignty, and audit posture are transferred discipline, not a learning curve. SOC 2 Type II, ISO 27001, signed DPAs, named-human accountability, chain-of-custody — the same shape of work that wins agency contracts. Most cloud-first AI startups in 2026 are learning this for the first time; LI is not.
  3. Vertical-stack engineering background means the on-prem appliance is a natural product, not an aspiration. The Addendum B architecture — customer-premises hardware with edge-resident codebooks — is the same architectural pattern translated one altitude up. The discipline is direct, not analogical.
  4. Cross-vertical operator pattern matches the cross-vertical founder thesis (B.9). Veeva, Toast, Procore, Carta, Stripe, Snowflake, Datadog, Persefoni — all built by outsiders who learned the target vertical. The pattern is empirically validated.

Plan beyond the founder

The Epic Systems and Palantir playbooks both require an army of implementers. LI scales by hiring an FDE bench into the discipline the founder is setting:

  • Months 0–6: founder-led delivery on Track 1 customers. Sets the bar for what an LI FDE engagement looks like.
  • Months 6–12: first FDE hire. Profile: cross-vertical field engineer, agency or industrial-controls background preferred, comfortable on customer premises, audit posture familiar.
  • Months 12–24: 2–4 FDEs. Houston-concentrated. Customers selected for proximity until the bench can travel.
  • Years 2–5: 10–25 FDEs, the operating shape of an Epic-style implementer army at vertical-AI scale.
contact
For investor or customer follow-up: reach out directly. The founder profile, the deployment patterns, the procurement track record, and the architectural prior art are all on the record and available for diligence. The FDE bench plan and the Houston operating model are documented in Section C.4.

CLean Informatics — vision & business plan

Sections 1–13 plus Addenda A–B were the sector research. Section C is the operating plan that follows from it. Lean Informatics is the protagonist. Collide is a peer in the sector. Enverus is the cloud incumbent. The plan is bootstrap-first, founder-led, and aims at sector dominance on a multi-decade timeline.

The Epic Systems play — the disciplines, not the deployment topology

The strategic posture is Epic Systems for upstream oil & gas. Judy Faulkner founded Epic in 1979 with $70K in Madison Wisconsin. Refused venture capital. Refused to go public. Ran 45 years of bootstrap growth to roughly $5B in annual revenue. Owns ~40% of US hospital EHR. Mayo Clinic runs on Epic. Kaiser runs on Epic. Cleveland Clinic runs on Epic. Once an institution deploys Epic, the lock-in is decades. That is the model.

What we copy from Epic is the operating discipline, not the deployment topology. Epic itself has evolved to a hybrid posture — Hyperdrive on Microsoft Azure, Azure Virtual Desktop, Azure Large Instances — without abandoning what made Epic Epic: bootstrap, implementers before salespeople, customer-deep workflows, refuse easy SaaS-ification, refuse early acquisition, annual user summit, geographic concentration, decades-long customer lock-in. That set of disciplines is what made Epic dominant. The on-prem-only piece was an artifact of the 1979 starting point, not the source of the moat. Lean Informatics is built around the disciplines; the deployment topology is whatever the customer prefers.

Lean Informatics will not match Epic's $5B revenue in 24 months. No one does. The 24-month $120–180M target is a milestone on a multi-decade arc. What we copy from Epic is the structural posture, not the financial trajectory.

What Epic owns and how (the playbook)

Epic patternHow it worksLean Informatics analog
Customer-centric on data locationEpic originally ran in hospital data centers. As of 2025 Epic also runs on Microsoft Azure (Hyperdrive on AVD, Azure Large Instances) — customer chooses. ~15% of Epic sessions are Hyperdrive on Azure as of May 2025.Customer dial: on-prem appliance (Addendum B), customer-owned colo, hyperscaler tenancy, or hybrid. Same SOC 2 / ISO 27001 standards every topology. LI’s identity is the FDE relationship, not the box.
Long, deep implementations12–24 month deployments by Epic-employed implementers ("Implementation Services"). Customers pay millions over years.The FDE motion. Founder-led at first, then a small army of cross-vertical field engineers. The implementation is the product.
Workflow depth, not breadthEpic touches every clinical and billing workflow. Once it owns the workflow, the data, and the user training, displacement requires re-running the entire workflow.Track 1 owns the RRC filing workflow. Track 2 owns the confidential-IP workflows. Together, they own the operator's day.
Institutional anchor customersMayo, Kaiser, Johns Hopkins, Cleveland Clinic. Their reputations carry the brand.Land Diamondback, Pioneer, EOG, or a supermajor as a reference account. One halo customer per year.
Bootstrap, private, employee-ownedFaulkner refused VC and IPO. Profits stay in the company. Decisions stay with the engineers.Bootstrap as long as possible. Take outside capital only when the alternative is losing a strategic window. Retain control.
Geographic concentrationMadison/Verona Wisconsin. All Epic developers in one place. Culture is the asset.Houston. All Lean Informatics FDEs and engineers based here. Drive distance to the customer base. Culture matters.
Annual user group meetingEpic UGM at the Verona campus — 25,000+ attendees, legendary in healthcare. Builds the community as a moat.Launch the Lean Informatics annual summit by year 3. Operator IT directors, compliance leads, FDE alumni. Houston venue.
Refuse the easy SaaS-ificationEpic stayed enterprise, deep, expensive, and slow to release — even after going hybrid on infrastructure. The discipline didn't change; the substrate did. Held the line against SaaS competitors who looked faster but couldn't go deep.Don't pivot to a self-serve / freemium model when pressure comes. The deep FDE-embedded services relationship is the moat. The underlying infrastructure can move with the customer.
Long sales cycles are the moatEpic deployments take 18 months. That barrier keeps faster competitors out and locks customers in for decades.Same logic applies. A 6-month FDE-led deployment is a feature, not a bug. It selects for customers who will stay for 10 years.

What this changes about the 24-month plan

  • Customer selection becomes prestige-aware. Every Track-2 customer in years 1–2 should be a name that future customers will recognize. One Pioneer or Diamondback is worth ten anonymous mid-size operators in brand terms.
  • Don't apologize for being expensive. Epic doesn't. The pricing menu is correct. Hold the floor.
  • Hire the implementers first, not the salespeople. Epic's army of implementers (Forward-Deployed Engineers, in modern parlance) is what makes the deployments stick. The same hire ranking applies.
  • Geographic concentration is a feature, not a constraint. Houston-based FDEs serving Houston-clustered operators is faster, cheaper, and more resilient than a distributed workforce.
  • Long-term thinking over short-term ARR. The 24-month target is a milestone. If hitting it requires sacrificing depth, customer fit, or post-sale trust — don't.
  • Refuse the wrong investors. If raising, take capital that's patient with the Epic-style timeline. Founder-friendly seeds. Strategic capital from energy LPs. Not growth equity that demands SaaS multiples and short payback.

Mission & positioning vs. the field

Mission: build the FDE-led services company for confidential industrial AI workflows, starting in upstream oil & gas, on the Epic Systems operating pattern — customer-centric on data location, deep on workflow ownership.

CompetitorTheir angleLean Informatics' counter-angle
Enverus ONECloud-only on their tenancy. Governed AI, SOC 2 Type II, 25-year data heritage. No customer choice on substrate.Meet the customer where the customer wants to be: their colo, their hyperscaler, our recommended on-prem appliance, or hybrid. Same security standards either way. Win on FDE depth and pricing discipline.
CollideVertical-AI startup, RIGGS LLM, FDE motion, Digital Wildcatters distribution. Cloud-first.Compete in adjacent positioning — operator IT directors, sovereignty-sensitive customers, agency-adjacent customers. Offer deployment-topology choice they don't. Not a head-on Wildcatters-community fight.
Palantir FoundryEnterprise FDE incumbent, government heritage, full-stack platform.Smaller, faster, operator-shaped pricing. Their floor is $1M; ours starts at $5.5K/mo. Same FDE DNA, lower altitude.
Big Consulting (Accenture, McKinsey)Strategic advisory + delivery. No proprietary AI stack.We ship a product, not slides. Foundation-model + workflow + FDE bundled. Customer can host wherever they want.
Novi Labs / Quorum / C3.aiSpecialized point solutions or large legacy platforms.Workflow-specific, services-led, infrastructure-agnostic. Deployment-topology choice is a feature none of them offer.

What we sell

  • Track 1 — cloud SaaS: Texas RRC compliance + production reconciliation + JSAs for mid-size operators. $1.5–4.5K/mo.
  • Track 2 — on-prem appliance: confidential workflow AI (M&A diligence, completion design, geosteering, reservoir analytics). $5.5–13.5K/mo + onboarding.
  • Track 3 — services: Forward-Deployed Engineer engagements for high-stakes accounts. $15–50K/mo retainers.
  • Track 4 — agency cross-sell: AI for adjacent public-safety customers, leveraging the founder’s prior network (see Founder section). Multi-year contracts $1–10M.

C.1The 24-month target & honest math

$10–15M/month total revenue, 24 months, bootstrap. That equals $120–180M annualized. The math has to work somehow. The table below shows it doesn't work via direct sales alone.

Avg deal size (ARR)Deals needed for $150MFeasibility at solo + bootstrap
$50K3,000impossible
$200K750impossible
$500K300requires 100+ FTE sales org
$2M75requires channel partner doing volume
$5M anchor + $500K small10 anchors + 60 smallpossible with right anchor + channel + ~5 FTE
$20M agency contract + small2–3 agency + 40 smallpossible if gov network closes

The conclusion the math forces: $120–180M in 24 months bootstrap requires one or more of — an anchor enterprise/gov deal ($10–30M ARR each), a high-velocity partner channel (BETA-style, 100+ small deals through their existing customer base), or a productized SKU sold through resellers. Section C.3 maps the four leverage paths.

read
A target this aggressive should be either lowered, lengthened, or matched with a willingness to raise capital in months 6–12 if leverage paths require it. The plan below optimizes for optionality — we build to be ready for the target without burning if it doesn't materialize. And, critically — the Epic Systems posture means we'd rather hit $50M ARR with a 10-year compounding lock-in than $150M ARR with shallow customers.

C.2Base case — solo bootstrap without leverage paths

What happens if Lean Informatics ships Track 1 and Track 2 cleanly but none of the leverage paths fire. This is the floor.

MonthTrack 1 customersTrack 2 customersTrack 3 retainersRun-rate ARR
31 (pilot)00$30K
6201$240K
9401$400K
12611$700K
15912$1.1M
181222$1.8M
211633$2.8M
242043$3.8M

Solo bootstrap floor: $3–5M ARR by month 24. Roughly $300K–420K/month. That's 2–4% of the stated $120–180M target. The gap is real and the rest of the plan is about closing it.

Notable: this floor is still a credible high-growth bootstrap result and a real business. From an Epic Systems lens, this is the equivalent of Faulkner's first 3–5 years — small revenue, deep customers, building the foundation. Don't mistake it for failure.

C.3The four leverage paths

Each path is independent. Each is leveraged by something the founder already has. The plan opens all four in parallel and gates further investment on leading indicators.

Path A — BETA Land Services partner channel

MechanismWhite-label or revenue-share into BETA's 4.3M-acre customer base. LI's appliance becomes the digital throughput multiplier inside BETA's existing FDE motion.
Revenue model20–30% rev share on pull-through transactions, or per-acre / per-deal fee.
UpsideIf 10% of BETA's annual transactions pull LI tools, that's plausibly $20–50M/yr to LI.
Timeline3–9 months to first pilot, 12–18 months to material rev share.
Pre-conditionsWorking Track-2 appliance demo, BETA pilot agreement, channel contract with exclusivity language.
RiskBETA builds internally instead of partnering; takes longer than expected.
Leverage sourceBETA's appetite for AI tooling; Hanks's 41 years of FDE-business operating muscle.

Path B — public-safety / agency contract

MechanismReposition the on-prem appliance + audit-grade operating discipline for agency emergency management. WUI fire risk, agency-side AI for emergency notification, sheriff incident triage, county hazard modeling. Architectural prior art: see B.1; founder track record: see Founder.
Revenue modelMulti-year contracts at $1–10M ACV. Annual recurring + services.
Upside1–2 agency contracts in 24 months = $5–30M revenue.
Timeline12–18 months to first contract dollar (gov procurement is slow).
Pre-conditionsSmall Business contracting registration (SAM.gov, UEI), partner with existing GSA-schedule prime, optional SBIR/STTR grant for non-dilutive bridge.
RiskGov procurement timing; scope shifts during contracting.
Leverage sourceFounder’s prior agency-grade customer base (federal, state, county/local, fire/EMS). See Founder section for the specific procurement track record. This door is open only because the prior company existed.

Path C — Anchor enterprise (T2 operator or supermajor)

MechanismOne operator at $5–15M ARR via Track-2 multi-workflow deployment plus heavy Track-3 services. Target list: Diamondback, Pioneer, Devon, EOG, Continental, Plains, Targa, Enbridge. This is the "Mayo Clinic of operators" play.
Revenue model3-year enterprise contract, $500K–1.5M onboarding, $5–15M annual.
UpsideSingle anchor moves the needle materially AND becomes the brand-halo reference.
Timeline9–18 months to close, 6 months to deploy.
Pre-conditionsSOC 2 Type II (required), 1–2 reference customers, exec-level intro.
RiskSingle point of failure; one customer dependence; slow procurement.
Leverage sourceAllen Gilmer's network if a warm intro materializes; Houston operator network broadly; Wildcatters community indirectly.

Path D — Productized appliance + reseller program

MechanismPackage the Track-2 appliance as a SKU. Landmen, O&G consultants, regional IT integrators, and BETA-like service firms resell it under a margin split.
Revenue model30–50% margin to reseller, 50–70% to LI. Per-unit revenue $50–100K + recurring.
Upside50–200 units/yr at $200K blended = $10–40M revenue.
Timeline12–18 months to launch the program; 18–24 months to material volume.
Pre-conditionsStable Track-2 product, reseller training program, channel agreements, support tier infrastructure.
RiskChannel conflict with direct sales; support burden scales with units.
Leverage sourceThe appliance product itself + LI’s vertical-stack hardware/firmware experience (see B.1 and Founder).
read
Most plausible combination for the stated target: A (BETA) + B (one gov contract) + C (one anchor enterprise). A alone could plausibly deliver $30–50M. B alone $5–30M. C alone $5–15M plus halo. The three combined with steady direct sales put $80–130M ARR in reach by month 24. Adding D in year 2 closes the rest. The Epic-style discipline: don't chase D until A and B and C are stable.

C.4Org & hiring schedule

Bootstrap discipline: hire only when revenue justifies it. Hire implementers (FDEs) before salespeople, on the Epic pattern. Cap at <15 FTE until $10M ARR.

TriggerHireProfileComp range
Customer #1 live (~Month 3)Fractional ops admin (10–15 hr/wk)Houston-based, contracts/AP/customer onboarding$2–4K/mo
Customer #4 or BETA pilot signedSecond FDE (full-time)Cross-vertical field engineer or O&G ops engineer; audit-grade ops background preferred$180–220K base + equity
$2M ARR or Track-2 customer #2Third FDE + 1 full-stack engineerEngineer profile: pipelines/data/agents$150–200K each
$5M ARR or Path A activationChannel/sales leadIndustry vet from BETA, Enverus, Wildcatters network; closer not BDR$180–250K base + commission + equity
$10M ARR or Path B contractCompliance/contracts/SOC officer + 2 more engineers + 1 more FDESOC 2 / FedRAMP / agency-procurement-literate$200K+ each

By month 24 in the stretch scenario: ~12–18 FTE. By month 24 in the base case: ~3–5 FTE. Compare to Epic's first 5 years: Faulkner kept the company under 10 people for the first half-decade. Discipline is the asset.

C.5Sales motion

Year 1 (founder-led, Epic-style)

  • First 5–10 customers: founder closes 100%. Mostly Houston / Texas mid-size operators via Wildcatters community, Houston energy network, and friend-of-friend warm intros. Treat each customer like a future Mayo Clinic reference — depth, not speed.
  • Conference presence: NAPE (Houston, Feb), CERAWeek (Houston, March), Fuze (Houston, Oct), AAPL Houston meetings. Speaking slots wherever offered.
  • Inbound content: LinkedIn writing on (a) substrate-flexible vs. cloud-only trade-offs for operator IP, (b) FDE economics, (c) architectural prior art applied to AI sovereignty. ~2 substantive posts/week. Goal: become recognizable to operator CIOs and Wildcatters community by month 9.
  • Outbound: Warm intros via Digital Wildcatters, OGAC, Houston energy meetups, BETA-channel referrals.

Year 2 (channels open)

  • BETA channel motion: co-sell, joint case studies, white-labeled materials. Hanks's name on the joint pitch deck. Quarterly business reviews.
  • Gov procurement track: separate sales cycle. Founder-led but with prime contractor partner. SBIR application as a non-dilutive bridge.
  • Direct enterprise: sales lead hired Month 15+ owns supermajor and T2 conversations. Founder remains the executive sponsor on top accounts.
  • Reseller program: soft launch Month 18 with 3–5 hand-picked reseller partners (one of which is BETA).

Year 3+ (the Epic UGM equivalent)

  • Annual Lean Informatics Summit in Houston. Operator IT directors, compliance leads, FDE alumni. Single track. Invitation-only first 2 years. This becomes the cultural moat — the "Epic UGM" for upstream.
  • Customer Advisory Board: 6–8 anchor customers meeting quarterly. Influence the roadmap. Loyalty through inclusion.

C.6Product roadmap (24 months)

MonthsBuildWhy now
1–3Track 1 v1 — RRC W-10 / G-10 automationThe wedge. Public data, fast deploy, $5K pilots within 30 days.
3–6Track 1 v2 — production reconciliation, JSAs, lease term extractionPer-customer ARR expansion. Same customers, more workflows. Epic-style: own the day.
6–9Track 1 v3 — workover reports, ESP RCA via OEM data integrationEngineering credibility. Moves LI from filing automation to operational decision support.
6–12Track 2 v1 — M&A diligence appliance (BETA pilot)Opens Path A. M&A diligence has the cleanest sovereignty case.
9–12Petroleum-domain LLM fine-tune (Qwen 14B / Hermes 14B base)RIGGS-equivalent. Target: 55–65% on SPE PE exam subset.
12–18Track 2 v2 — completion design optimization for Permian operatorsHighest customer-IP-sensitivity workflow. Pure on-prem play.
12–18Track 4 v1 — public-safety / agency cross-sell (WUI fire risk, county hazard modeling)Opens Path B. Leverages founder’s prior network (see Founder).
18–24Track 2 v3 — geosteering + reservoir analyticsLive-during-drilling latency play. Edge inference advantage.
18–24Reseller program launchOpens Path D after Track 2 stabilized.

C.7Financial model

Per-customer unit economics, by operator tier

The unit economics are extremely sensitive to operator size and workflow mix. The headline numbers below are blended Y1 estimates — full sensitivity analysis lives in memory/projects/unit-economics-expanded.md.

TierWellsRecommended architecturePlausible priceFDE allocationY1 contribution / customerCustomers per FDE
Small (Track 1 wedge)5–50Cloud or commodity hybrid$1.5–5K/mo + $5–18K onboarding0.03–0.10 FTE$1–3K/mo10–25
Medium (bread-and-butter)50–500Hybrid (Mac Studio M5 / single L40S + frontier handoff)$5–12K/mo + $12–28K onboarding0.15–0.30 FTE$3–6K/mo3–6
Large (anchor / enterprise)500–5,000+Hybrid or pure on-prem (paired failover at the top)$20–80K/mo + $50–350K onboarding0.5–2.0 FTE$5–30K/mo0.5–2.0

Architecture verdict: hybrid wins on margin for medium and large tiers because 70–85% of agentic O&G workflow (retrieval, classification, OCR, structured extraction, drafting) runs fine on commodity local hardware (Apple M5 or single L40S). Frontier-model spend is reserved for the hard reasoning passes (offset economics, JOA clause risk, regulator-grade language polish). The ~88–92% gross margin shows up there.

labor
LI is a labor company augmented by machine intelligence — this is honest, not a problem. Customers-per-FDE plateaus around 4–5 blended, not infinity. The AI absorbs the repetitive cognitive work (compliance filing, structured extraction, first-draft generation, pattern recognition) but it does not absorb morning meetings, 2am calls, customer relationship maintenance, auditor handoffs, or operator-staff training. Those activities are why the FDE-per-customer ratio has a floor. Target revenue per FDE at maturity: $400–800K/year — Palantir-shape, not pure-SaaS-shape (Palantir runs ~$750K rev/employee, Epic Systems ~$385K rev/employee, pure SaaS like Snowflake/Datadog $400–650K rev/employee). The unit economics work; they just work at services-company shape, not software-company shape. The headline $10–15M/month target requires 15–25 FDEs by month 24 plus a customer mix skewed toward large anchors and government — which is exactly what C.1 and C.3 already specify.

Aggregated portfolio math (24-month target sanity check)

Customer mix scenarioMonthly revenueFDE headcountVerdict
1,500 small at $3K/mo (Track 1 only)$4.5M100–125infeasible for bootstrap
50 mid + 500 small (mid-heavy)$2.2M25–40stretches the bootstrap
50 mid + 5 large anchors$800K18–30healthy bootstrap shape
5 mid + 15 large anchors @ $50K (Path C firing)$1M18–25strong path to $30M ARR
10 large @ $50K + 1 gov contract @ $1M (Paths B + C)$1.5M15–25$15–18M ARR — the headline target shape

Implication: the headline $10–15M/month target is achievable only with disciplined customer mix and at least one of (Path B government, Path C anchor enterprise) firing. Chasing small-operator logo count breaks the FDE economics. The plan is correct — the unit economics just sharpen what has to be true for it to land.

Three scenarios, ARR run-rate

MonthConservative bootstrapBase + 1 leverage pathStretch + 2–3 paths
6$240K$500K$1.2M
12$700K$3M$12M
18$1.8M$12M$45M
24$3.8M$25–35M$100–180M

Cost structure

LineYear 1 ($)Year 2 ($)Notes
Founder draw$100–150K$180–250KHouston cost of living, family-first scheduling
FDE / engineering hires$200–400K$700K–1.8M1–2 in Y1, 3–7 in Y2 depending on scenario
Hardware (appliances on inventory)$50–200K$300K–1.2MTied to Track-2 customer pipeline; lease-back option
Cloud / API costs$15–40K$60–200KAnthropic + Hetzner + observability
SOC 2 readiness + audit$50–80K$30–50KVanta + auditor. Type I Y1, Type II Y2.
E&O + cyber insurance$15–25K$30–50KMaterial from day one
Legal, accounting, tooling$30–50K$60–100KTexas LLC or Delaware C-corp depending on raise posture
Travel + conferences$30–50K$60–100KCritical for customer-facing FDE motion
Year total burn$490K–995K$1.4–3.5MY2 scales with scenario

Break-even (monthly): Conservative ~Month 18–20. Base + 1 path ~Month 9–12. Stretch ~Month 4–6.

Cash on hand requirement: ~$200–400K personal / savings to cover first 6–9 months before revenue covers burn (conservative case).

C.8Risk register — plan-specific

RiskSeverityMitigation
Gov procurement timing > 18 monthshighStart SBIR/STTR pipeline Month 1. Partner with GSA-schedule prime by Month 6.
BETA builds internally instead of partneringmedium-highLock channel agreement with exclusivity clauses. Bring working demo to first conversation.
Anchor enterprise deal slips or kills momentumhighNever depend on single anchor. Diversify by Month 12.
Founder burnouthighCap weekly hours, mandatory PTO, family-first scheduling. Second FDE hire is for resilience, not just capacity.
Hallucinated filing causes regulatory issueexistentialHuman-in-loop on every submission. E&O insurance Day 1. No auto-submit ever.
Cyber incident on customer applianceexistentialInsurance, audit posture, hot spare, customer SIEM integration. Tabletop exercise quarterly.
Pressure to SaaS-ify / commoditizehigh — Epic-relevantHold the line on the appliance / FDE / sovereignty model. Don't follow Enverus into cloud-only.
Enverus drops sub-$1K tiermediumDon't compete on price. Compete on sovereignty + FDE motion.
Hardware supply chain disruptionmediumMaintain 3–6 month BOM inventory. Dual-source GPUs (L40S + H100 paths).
Channel conflict (reseller vs. direct)mediumStrict territory and account rules in channel agreements.
Acquisition offer mid-arcgood problemThe Epic answer is "no." Decline unless valuation reflects 10-year lock-in moat.

C.9Trigger conditions — when to raise, slow, or pivot

When to raise capital (against the Epic instinct)

  • $1–3M angel — if Path A or B requires capital to capture a closing window (a BETA exclusive deal, a specific gov RFP).
  • $3–5M seed — if anchor enterprise deal requires SOC 2 Type II completion within 6 months, or if FDE hiring needs to outpace customer growth to win an account.
  • $10–15M Series A — if two leverage paths are firing simultaneously and the constraint is execution capacity. Target investors: Mercury Fund (Collide's lead, ironically the most relevant), Energy Innovation Capital, S2G Ventures, EIV Capital, plus a strategic from the Allen Gilmer / OGAC network.
  • Default position: don't raise. Faulkner didn't. Bootstrap discipline pays decade dividends.

When to slow down (hold hiring, focus on retention)

  • Track 1 customer count < 4 by Month 9.
  • Track 2 first appliance customer not deployed by Month 12.
  • Burn-to-ARR ratio above 3.0.
  • Customer churn above 10% annualized in the first 12 customers.

When to pivot or revise the goal

  • BETA pilot fails or BETA passes: rebuild distribution via direct + Wildcatters and reset target to $15–25M ARR in 36 months.
  • Gov procurement track produces no LOI / contract pipeline by Month 18: deprioritize Track 4, focus on commercial.
  • Sustained burn-to-ARR > 5.0 by Month 18: open acquihire conversation (Enverus, Quorum, Collide, or BETA itself) only as a last resort — against the Epic principle.
  • Hallucinated filing or appliance incident: do not pivot — address it as an incident-response operation. Trust loss is harder to recover than revenue.

Leading indicators to monitor monthly

IndicatorHealthyWarningStop
Customer count growth+1/mo by M6, +2/mo by M12<0.5/mo for 2 months0 net adds for 90 days
Cash runway>9 months6–9 months<4 months
BETA pilot progressNDA→SOW→deploy on trackSlippage >30 daysBETA stops returning calls
Gov pipeline1+ active RFI/RFP0 active, 1 in conversation0 conversations for 90 days
Net Revenue Retention>110%95–110%<95%
Hallucination / incident rate0Any reportable incident2nd reportable incident in 90 days

C.10Honest verdict on the target & the long game

$10–15M/month total revenue in 24 months from a solo bootstrap start is at the right tail of the distribution of what has actually happened in B2B vertical AI plays. Not impossible — Glean, Hebbia, and a handful of vertical AI plays have hit comparable numbers — but those almost universally involved either significant funding, an unusual viral channel, or a category-creating product positioning.

The thesis, restated for the verdict. Every operator pays for efficiency. None of them will learn or perform all the things Lean Informatics will. Their expertise stays in the production curve where it belongs; our expertise stays in the AI workflow, the FDE delivery, and the audit posture. The trade is simple, and it’s the deal a $400B industry has been waiting to make since the analyst layer of knowledge work started getting eaten by foundation models.

The wind at our back, in plain terms. Two structural tailwinds compound on each other. First, the foundation-model leveling event (§04) compresses 25 years of institutional industry knowledge into the weights of every frontier model, killing the "you need a roughneck to compete" defense. Second, security and infrastructure standards (SOC 2 Type II, ISO 27001, KMS, audit logging) are industry-commoditized in 2026 — the perimeter is no longer a moat, the hyperscalers and on-prem stacks meet the same bar. Both of those moats are gone for the incumbent. What remains as defensible is the FDE services relationship and the workflow ownership it produces — which is exactly what Lean Informatics is built around. Three private-equity flips have made the incumbent (Enverus) pricing-opaque and organizationally slow at exactly the moment those two moats evaporated. The newcomer (Collide) has proven a single founder team can raise, build, and sell in this category in twelve months. Old-buddy networks, favoritism in vendor selection, and tribal industry knowledge cannot stop a services-led delivery that already speaks the language fluently, meets the customer where the customer wants to host, deploys in 30 days, and costs one-tenth of what the incumbent is charging. That is not a slogan. That is the structural read.

Realistic Y2 endpoint without leverage paths firing: $3–5M ARR ($300K–420K/month). That is 24–36× below the stated $10–15M/month target. To hit the target, at least two of (A) BETA channel + (B) gov contract + (C) anchor enterprise + (D) reseller program must fire in 24 months.

The 12-month checkpoint

If by month 12 we have:

  • 5+ Track-1 customers ($500K+ ARR)
  • 1 Track-2 customer live
  • BETA pilot signed
  • A gov contract in the procurement pipeline (RFI or RFP stage)

...then $30–80M ARR by month 24 is on the table. Consider raising at this point if execution capacity is the bottleneck.

If by month 12 we have:

  • 2–3 Track-1 customers
  • No Track-2 customer
  • BETA passed or stalled
  • No gov pipeline

...then $5–10M ARR by month 24 is the realistic ceiling. The right move is to revise the goal to $25M ARR in 36 months, or raise capital to accelerate — but only if the leverage paths require it.

The Epic Systems lens on the verdict

Through the Epic lens, the 24-month target is the wrong unit of measurement. Faulkner's 5-year revenue was probably in the low millions. By year 10 Epic had maybe ~$50M. By year 20, several hundred million. The compounding kicked in once the institutional anchors and the workflow lock-in were established. Lean Informatics should optimize for the same shape: deep customers, lock-in workflows, geographic concentration, refused easy money, refused easy SaaS-ification, refused acquisition before the moat is built.

Under this lens:

  • 24-month goal: 1–2 anchor customers + BETA partnership locked + first gov contract in pipeline + 15–20 Track-1 customers. Revenue $10–30M ARR. Foundation set.
  • 60-month goal: 8–15 anchor customers, BETA channel running, 3–5 gov contracts, 100+ Track-1 customers, reseller program live. Revenue $80–200M ARR. Recognized as the sovereignty-first vertical AI vendor.
  • 10-year goal: The Epic of upstream. Default vendor at most US E&P CIOs. $500M–1B+ ARR. Still private. Still bootstrap-ratio-disciplined. Still in Houston.
commit
What we commit to: build to optionality, optimize for depth. Ship Track 1 fast. Land BETA conversation by Month 4. Open SBIR pipeline by Month 6. First Track-2 customer by Month 12. Revisit the $10–15M/mo target at Month 12 with hard data. The Epic Systems posture is the long-term frame: own the sector, not the quarter.

17Methodology & epistemic posture

This report was produced through a structured first-principles analysis on May 21, 2026, by an analyst working in the Knuth–Ousterhout–Karpathy mode: rigor, complexity reduction, verifiability.

  • Primary sources (Collide.io, Enverus, RRC, McLelland's X presence) were preferred over secondary commentary where they conflict.
  • Quantitative claims were spot-checked against multiple sources where possible. Where a single source is cited, treat the number as indicative.
  • The "GPT-5.1 scored 4%" benchmark number is flagged as suspect because it deviates implausibly from public model performance; the surrounding numbers (Grok 4, Sonnet 4.5, RIGGS) are internally consistent.
  • The solo-feasibility verdicts are based on the 2026 tooling landscape (MLX 0.31+, Hermes 4, Qwen 3.5, Claude Opus/Sonnet 4.6). They will look different in 2027.
  • "Solo achievable" never means "trivial." It means: one disciplined operator with the listed toolstack can reach equivalent customer outcomes for a narrow workflow, within the timeframes given.
  • This is not legal, financial, or regulatory advice. RRC filing is a regulated activity. Get a Texas-licensed compliance professional before automating submissions.