Lean Informatics — sector strategy & business plan

Vertical AI for upstream oil & gas, built FDE-first. A $400B US industry still runs most of its daily workflow surface on PDFs, spreadsheets, and tribal knowledge — and in 2026, foundation models can finally read those PDFs, speak the vocabulary, and ship audit-grade outputs. Lean Informatics is the services company that meets operators where they want their data to live and owns the workflows the incumbents don’t. Sections 1–15 plus Addenda A–B are the sector research that backs the plan; Section C is the plan itself. Audience: founder, analyst, advisor.

Why this space, why this time

Texas first — the maximum addressable market right outside our window

Before talking about anything bigger, look at what’s addressable inside Texas alone:

Texas crude oil

5.7M bopd

February 2026. >42% of all US crude. Permian Basin (TX + NM) at 6.6M bopd is nearly half of US production.

Texas natural gas

36.2 Bcf/d

~30% of US marketed gas. Plus 4.4M bpd of natural gas liquids.

Producing wells in Texas

~238,000

154,393 oil + 83,679 gas (Feb 2026 RRC). Every well files monthly W-10 or G-10 reports to the RRC. That’s ~2.8M filings per year — today, mostly by hand.

Inactive / shut-in wells

~157,000

March 2026. 46% of active operators have >25% of their wells inactive. Plugging compliance, P-5 organization reports, and orphan-well exposure are all AI-accelerable workflows.

Texas O&G export revenue

$18.5B/month

December 2025 alone. Annual upstream economic activity in Texas runs $150–220B depending on the price deck.

Texas operators by size

~9,000 active

Roughly 1,500–3,000 mid-size (5–200 wells) — the Track 1 wedge target. ~100 enterprise operators on top. Long tail below.

Texas land services firms

50–80 firms

Houston / Midland / Dallas / San Antonio hubs. The distribution channel. Texas AAPL chapter is the largest in the country.

Texas-only MAM for LI

$150–500M/yr

Blended: $50K/yr × ~2K mid-size ops + ~$300K/yr × ~100 enterprise + ~$200K/yr × ~60 land firms + RRC compliance throughput. Before LI ever crosses a state line.
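The arithmetic behind the well-count and MAM cards can be checked in a few lines. This is a minimal sketch that uses only the figures quoted in the cards above; the variable names are illustrative, and the one-report-per-well-per-month cadence is the assumption the filings card itself makes.

```python
# Figures copied from the stat cards above; names are illustrative only.
OIL_WELLS = 154_393
GAS_WELLS = 83_679

producing_wells = OIL_WELLS + GAS_WELLS
# Assumption taken from the card: one monthly report per producing well.
filings_per_year = producing_wells * 12

# segment -> (annual contract value in USD, approximate customer count)
segments = {
    "mid_size_operators": (50_000, 2_000),
    "enterprise_operators": (300_000, 100),
    "land_services_firms": (200_000, 60),
}

segment_revenue = {
    name: price * count for name, (price, count) in segments.items()
}
itemized_total = sum(segment_revenue.values())

print(f"producing wells:  {producing_wells:,}")
print(f"filings per year: {filings_per_year:,}")
for name, revenue in segment_revenue.items():
    print(f"{name}: ${revenue / 1e6:.0f}M/yr")
print(f"itemized total: ${itemized_total / 1e6:.0f}M/yr")
```

The three itemized segments sum to $142M/yr, so the $150M low end of the range relies on the RRC compliance throughput line, which the card does not quantify.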

scale

Texas alone is roughly 25–30% of the US-wide SAM (the $2–5B/yr serviceable surface in §01). It is also home turf: within driving distance of most customers, governed by a single regulator (Texas RRC), and host to the densest AAPL membership in the country. Houston-first is a feature, not a constraint. The same playbook ports to Oklahoma (OCC), New Mexico (OCD), North Dakota (NDIC), Louisiana (LDNR), and Colorado (ECMC, formerly COGCC) once the Texas reference customers are live, but we don’t need to leave Texas to hit the 24-month target.

The workflow gap, in plain terms

Upstream oil & gas is one of the largest US industries still running day-to-day work on PDFs, spreadsheets, vendor-specific schemas, and tribal knowledge. The sector generates roughly $400B in annual US revenue, employs ~600K people directly, reports to ~30 state agencies plus a half-dozen federal bodies, and pulls information from a long tail of land services firms, completion contractors, midstream counterparties, and trading desks. Almost every operator workflow — from the monthly RRC filing, to an AFE evaluation, to a JOA negotiation, to a completion-design review — still depends on people in chairs reading PDFs and re-keying values into different systems. That gap is the entire opportunity.

Why now

Four forces converged in 2025–2026 to open the window:

Foundation models crossed the operator-fluency line. Frontier models (Claude Opus 4.6, GPT-5.1, Grok 4, Hermes 4 405B, Llama 4 405B) have ingested the public petroleum corpus — SPE OnePetro’s 300K+ papers, AAPG/SEG journals, state commission filings, courthouse title records, Schlumberger/Halliburton manuals, the Craft-Hawkins/Slider/Lake/Economides textbook canon. Out of the box they score 50–70% on the SPE certification subset and converse with field hands using the right vocabulary. The "you need a roughneck on staff to compete" barrier is empirically dead. §04 develops this in depth.
Security and infrastructure standards have commoditized. SOC 2 Type II, ISO 27001, KMS-managed encryption, audit-grade logging, and FIPS-validated crypto are commodity table stakes in 2026. Whatever substrate the operator chooses — on-prem appliance, customer colo, AWS, Azure, GCP, or hybrid — the security perimeter is portable. The "we need our own datacenter" or "we need a private cloud" objection is now a preference question, not a technical one.
The labor cycle is forcing the issue. The petroleum engineering workforce is aging out. The COVID-era exodus cost a generation of mid-career operators. Operators are running the same workflows with smaller teams against rising regulatory complexity. AI is the only path to absorb throughput demand without headcount the operators don’t want to add.
The agentic engineering vocabulary caught up. Andrej Karpathy named “agentic engineering” at the Sequoia AI Ascent in early 2026. The operating model we describe here — agent harnesses, software factories, tokenomics, AFK agents, agentic access — is the same set of disciplines the most advanced AI teams in the industry have rallied around. The window where this operating model is a differentiator closes by end of 2026; by 2027 it is the default. Upstream is two cycles behind on this and will reward whoever brings the discipline first.

What AI brings operators, concretely

A short list of upstream workflows where AI is shipping real economic value today — not in marketing decks:

Regulatory compliance

Hours → minutes

W-10 / G-10 / H-10 monthly filings, drilling permits, plugging reports. Audit-grade traceability built in.

AFE evaluation

Weeks → hours

Authorization-for-expenditure on offsets, non-consent decisions, working-interest marketing. Ownership-validated, offset-based economics.

Completion design review

Pattern recognition

Cross-reference offset completions, sand/fluid loading, stage spacing, frac-fleet selection across thousands of reports no engineer reads end-to-end.

Geosteering & reservoir analytics

Real-time picks

MWD/LWD interpretation, formation-top picks, target-zone optimization. Catches what tired-eye geosteerers miss at 2am.

JOA & contract review

Dangerous clauses

AFE non-consent windows, marketing provisions, default cure periods, surviving obligations. Surfaces the traps before signature.

M&A diligence

Weeks → days

Lease term extraction across thousands of leases, title-chain assembly, ROW gap analysis, encumbrance review.

JSA / safety briefing

Per-job, not generic

Job-specific safety analyses keyed to live weather, site conditions, crew assignment. The work operators skip when busy now happens every time.

Well-failure pattern recognition

Cross-history

Reading the operator’s full history of post-mortems to surface recurring failure modes a field engineer can’t catch in one cycle.

Every one of these is an operator-facing workflow. Not "a platform." Not "data analytics." Specific work that engineers do today, that AI can do faster, cheaper, and with audit-grade traceability. The rest of this document is what Lean Informatics is doing about it.
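To make "audit-grade traceability" concrete, here is a minimal sketch of what a traceable extraction record could look like. Nothing here is taken from an actual Lean Informatics system: the `ExtractedField` structure, its field names, and the hash-chain approach are all hypothetical illustrations of the idea that every value keyed into a filing should point back to a source document, a page, and a named human approver.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExtractedField:
    """Hypothetical record for one value pulled from a source document."""
    form: str            # filing the value feeds, e.g. "W-10"
    field_name: str      # illustrative label, not a real RRC field name
    value: str
    source_sha256: str   # hash of the source PDF bytes
    source_page: int
    approved_by: str     # named human who signed off on the value


def record_hash(record: ExtractedField, previous_hash: str) -> str:
    """Hash a record together with the hash of the record before it."""
    payload = json.dumps(asdict(record), sort_keys=True) + previous_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Return one hash per record; each hash depends on all earlier records."""
    chain, previous = [], "0" * 64
    for record in records:
        previous = record_hash(record, previous)
        chain.append(previous)
    return chain


def verify_chain(records, chain):
    """True only if no record was altered, reordered, added, or dropped."""
    return build_chain(records) == chain


if __name__ == "__main__":
    pdf_hash = hashlib.sha256(b"example source document bytes").hexdigest()
    records = [
        ExtractedField("W-10", "example_field_a", "42", pdf_hash, 1, "J. Doe"),
        ExtractedField("W-10", "example_field_b", "17", pdf_hash, 2, "J. Doe"),
    ]
    chain = build_chain(records)
    print("chain verifies:", verify_chain(records, chain))
```

The design choice worth noting is the chaining: because each hash folds in the previous one, editing any earlier record invalidates every later hash, which is the property an auditor would want to check.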

The exciting opportunity

Upstream oil & gas is one of the last under-digitized industries of its size. Texas alone produces 5.7M barrels of crude per day, files ~2.8M well reports a year with the RRC, and supports ~9,000 active operators, yet most of that workflow surface still moves on PDFs and spreadsheets. The window is open right now because four forces converged in 2025–2026: foundation models speak petroleum fluently out of the box, security and infrastructure standards have commoditized into table stakes, the petroleum engineering workforce is aging out faster than it can be replaced, and the agentic engineering operating model has matured into a shared discipline. That is the opportunity. An FDE-led services company can ship audit-grade workflows to operators where they want their data to live, undercut the cloud-locked incumbent on price and deploy time, and compound the customer relationship into a multi-decade moat.

Lean Informatics is built for that window. The competitive ladder, in one line: Enverus is the old guard — cloud-locked and PE-flipped three times in seven years; Collide is the newcomer that proved the gate is open; Lean Informatics is the new approach — services-first, infrastructure-agnostic, foundation-model leverage compounding monthly. The operating discipline we lean on is the one Epic Systems built in healthcare over 45 bootstrap years — discipline, not topology (full primer in §01, full crosswalk in §C.0). The 24-month $10–15M/month target is a milestone on a multi-decade arc, not the endgame.

The sector research below (Sections 1–15 + Addenda A–B) documents the landscape we’re entering. Section C is the operating plan.

01 Executive summary

Lean Informatics is the FDE-led vertical AI services company for upstream oil & gas. We meet operators where they want their data to live (on-prem appliance, customer-owned colo, customer hyperscaler tenancy, or hybrid) and own the workflows the incumbents don’t. Data location is a customer dial, not our differentiator. Security and infrastructure standards (SOC 2 Type II, ISO 27001, KMS, audit logging, FIPS-validated crypto) are commodity table stakes — we meet them everywhere we deploy. The moat is the FDE relationship, the workflow ownership, and the foundation-model fluency we bring to the customer’s problem. Houston is the starting wedge. The serviceable surface is national upstream, then international. The distribution moat is the existing infrastructure of survey and land services companies whose embedded delivery model already maps onto our FDE motion. The operating discipline we lean on is the Epic Systems pattern from healthcare — primer in the callout below, full crosswalk in §C.0.

primer

Quick primer on Epic Systems (for the unacquainted). Epic Systems is the dominant electronic health records (EHR) vendor in US healthcare. Founded 1979 in Madison, Wisconsin by Judy Faulkner with $70,000 of personal capital. Bootstrap throughout — refused venture capital and refused to IPO for 45 years. Today: roughly $5B in annual revenue, ~40% of US hospital EHR market, runs the clinical and billing systems at Mayo Clinic, Kaiser, Cleveland Clinic, Johns Hopkins, and most of the academic medical centers. Operating model: long, deep implementations (12–24 months) executed by an army of Epic-employed implementers; workflow lock-in across every clinical and billing system; engineering organization concentrated at one Verona, Wisconsin campus; annual user group meeting that draws ~25,000 attendees and acts as a moat. Recently moved to a hybrid cloud posture (Hyperdrive on Microsoft Azure / Azure Virtual Desktop / Azure Large Instances) — the deployment topology evolved; the operating discipline didn't. That set of operating disciplines, applied to upstream oil & gas in 2026 instead of healthcare in 1979, is what "the Epic Systems play" means in this document. Section C.0 develops the full playbook with the Epic-to-LI crosswalk.

Strategic posture

Epic play

Bootstrap. FDE-led services. Infrastructure-agnostic, customer-centric on data location. Houston-concentrated. Long cycles as moat.

Max addressable

$15–30B/yr

AI in O&G ($7.6B → $25B by '34) + US land services ($3–5B) + regulatory automation + agency emergency tech.

Serviceable (5-yr)

$2–5B/yr

~1,500–2,000 US upstream operators, ~150–250 land/survey firms, ~100–300 agency buyers in scope.

Obtainable (5-yr)

$200–500M ARR

Direct + channel + agency. 10-yr Epic-style: $500M–1B+ ARR.

Distribution path

Land & survey channel

10–25 established firms (BETA-class) = pull-through to 200–800 operators. White-label model.

Geographic arc

Houston → world

Houston → Gulf Coast → Permian/Eagle Ford/Bakken → US → Canada/Australia/Middle East.

Founder edge

FM RBDS + DoD/HS

Vertical-stack engineering, opacity-first design, agency procurement experience. Hard to fake.

24-mo milestone

$10–30M ARR

Base case with 1 leverage path firing. Floor for the long arc, not the endgame.

distribution

The distribution thesis — survey and land companies as the channel. Roughly 150–250 US land services firms already operate the embedded-delivery model: their landmen and abstractors sit on customer sites, handle confidential operator IP daily, manage title chains and ROW projects, and have multi-decade trust with mid-size operators. They are undermanned digitally and structurally unable to build AI in-house. They are the Lean Informatics distribution layer. White-label the appliance + workflow stack into 10–25 of them; each pulls 5–50 operator customers into our orbit without LI ever cold-calling. The 14,000-member AAPL network is the addressable surface for the channel itself. Path A (BETA Land Services) in Section C.3 is the templated first instance.

The competitive ladder — old guard, newcomer, new approach

Old guard

Enverus

Cloud-locked, PE-flipped 3× in 7 yrs (Genstar→H&F $4.25B→Blackstone $6.5B Aug 2025). Bureaucratic, pricing-opaque, structurally slow. §03.

Newcomer proof

Collide

$5M seed (Mercury Fund, Apr 2025), RIGGS LLM, FDE motion. Proves the gate is open and the category is fundable. §05.

The leveling event

Foundation models

100 yrs of SPE/AAPG/courthouse/DOE/USGS/textbook corpus in the weights. Vernacular, metrics, IP definitions — all leveled. §04.

New approach

Lean Informatics

FDE-led services. Customer chooses where the data lives. Land/survey distribution. Industry-adjacent founder, sidecar-exposed via safety-systems deployment. Bootstrap.

leveling

The freight train. Frontier models in 2026 (Claude Opus 4.6, GPT-5.1, Grok 4, Hermes 4 405B, Llama 4 405B) were trained on the public, academic, and institutional knowledge corpus of 100 years of upstream oil & gas: SPE OnePetro (300K+ papers), AAPG journals, DOE/USGS/EIA technical reports, public filings from the Texas RRC and its counterparts in other states, courthouse title records, Schlumberger/Halliburton/Baker Hughes field manuals, decades of trade press, conference proceedings, academic petroleum engineering coursework, and expert-witness transcripts. The vocabulary, the decline-curve math, AFE structure, joint-operating-agreement boilerplate, lease conventions, fluid models, and completion-design terminology are all in the weights. The "you can't compete without a roughneck on staff" defense is empirically dead. Favoritism, old-buddy networks, and insider knowledge cannot stop a foundation model that already speaks the language fluently and a 30-day-deployable filing system that costs one-tenth of what the incumbent charges. Firms that recognize this before their competitors do bend the cost curve and compress decision cycles by 5–10×. Firms that wait pay the late-mover tax. §04 develops this argument with citations.

Lean Informatics in one screen

The Epic Systems posture is the operating frame — the disciplines, not the deployment topology. Bootstrap as long as possible. FDEs before salespeople. Long sales cycles as the moat. Annual user summit by Y3. Refuse easy SaaS-ification, easy money, and early acquisition. Houston as the geographic concentration. Epic itself moved to a hybrid cloud posture (Hyperdrive on Azure / AVD / Azure Large Instances) without abandoning the disciplines — the disciplines are what we copy. 10-year sector dominance is the goal; the 24-month $10–15M/mo target is a milestone on that arc.
The competitive ladder has a top rung that's structurally slow. Enverus is the old guard — great data, $500M+ ARR, 8,000 customers, 25 years of moat-building — but three PE flips in seven years (most recently Blackstone, Aug 2025), employees publicly describing "large project management bureaucracy" and "not really 'agile' in any sense," and the Enverus ONE platform launch itself framed as defensive against latecomers. That's a target, not a fortress. §03 develops the gap.
The market exists at meaningful scale and is structurally large. $15–30B/yr maximum addressable across upstream AI, land services, compliance automation, and adjacent agency emergency tech. Empirically proven by competitors, customers, and investor capital flowing in — §05 catalogs the Collide evidence.
Foundation models have leveled industry expertise. The vernacular, definitions, metrics, and institutional IP that used to gate vertical-AI entry are now compressed into a $20/month API call or a free open-weight checkpoint. What still gates entry is distribution, sovereignty, and FDE reliability under regulator scrutiny — not knowledge of the sector. §04.
Distribution wins, not features. The land/survey channel — 150–250 US firms with embedded customer relationships — is the unfair distribution path no cloud-first competitor can reproduce. Lean Informatics becomes the AI engine inside their existing service motion.
Two-track architecture compounds. Track 1 (cloud RRC compliance, $1.5–4.5K/mo) funds the company while Track 2 (on-prem appliance, $5.5–13.5K/mo) builds the defensibility. Tracks 3 (FDE services) and 4 (agency cross-sell) extend reach as scale permits.
Founder profile is industry-adjacent with sidecar exposure. Jonathan deployed FM RBDS mass-notification systems to DoD, state HS offices, counties, sheriffs, and fire-warning networks — many of them in O&G-dense Texas jurisdictions. He saw upstream from the customer's worst-day angle: incident response, blowout coordination, spill comms, well-pad evacuations. Not a roughneck — the engineer who showed up when the roughneck's day went sideways. Knows enough to be dangerous, not enough to be captive to industry orthodoxy.
Infrastructure is not the moat. The services relationship is. Security and infrastructure standards are industry-commoditized: SOC 2 Type II, ISO 27001, KMS-managed encryption, audit-grade logging, FIPS crypto, signed DPAs. We meet those standards everywhere we deploy. Data location becomes a customer preference dial — on-prem appliance for sovereignty-sensitive workloads, customer-owned colo for the middle path, hyperscaler of choice for the cost-optimized path. The on-prem reference architecture in Addendum B is one option, not our identity. The compounding moat is the FDE relationship, the workflow ownership, and the customer outcomes we own quarter over quarter. That is what makes Lean Informatics a services company first and a software vendor second — and why it's more competitive in 2026 than the cloud-or-bust playbook the incumbent and the newcomer both run.
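As a rough check on how the track pricing above relates to the 24-month milestone, the monthly price bands can be converted to annual contract values and customer counts. This is a sketch under stated assumptions, not a forecast: it uses the Track 1 and Track 2 price bands and the low end of the $10–30M ARR milestone from the executive summary card, and it assumes (purely for illustration) that every customer pays the midpoint of a single track's band.

```python
import math

# Monthly price bands in USD, from the two-track bullet above.
TRACKS = {
    "track_1_cloud_rrc_compliance": (1_500, 4_500),
    "track_2_on_prem_appliance": (5_500, 13_500),
}

# Low end of the 24-month ARR milestone from the executive summary card.
MILESTONE_ARR_LOW = 10_000_000


def annual_band(monthly_band):
    """Convert a (low, high) monthly price band to an annual band."""
    low, high = monthly_band
    return low * 12, high * 12


def customers_needed(target_arr, annual_price):
    """Customers required if every customer paid the same annual price."""
    return math.ceil(target_arr / annual_price)


for name, band in TRACKS.items():
    low, high = annual_band(band)
    midpoint = (low + high) / 2  # illustrative assumption: midpoint pricing
    needed = customers_needed(MILESTONE_ARR_LOW, midpoint)
    print(f"{name}: ${low:,}-${high:,}/yr, "
          f"{needed} customers at midpoint for $10M ARR")
```

On these assumptions, $10M ARR takes roughly 278 Track 1 customers or 88 Track 2 customers; any real mix would fall between the two and would also include Track 3 and Track 4 revenue, which this sketch ignores.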

02 Core thesis

Lean Informatics' strategic question: if foundation models do most of the cognitive lifting, if open weights are free, if MLX runs them on a laptop, and if Claude Code can write the glue — what's left to charge for in 2026?

Three things, in order of defensibility:

Embedded trust — the FDE services relationship. Whose engineers sit in the customer's morning meeting? Whose phone does the VP Operations call when filings break at 11pm on a Thursday? Whose post-incident review does the regulator read first? This is the compounding moat. It cannot be cloned by feature work and it cannot be acquired in a $4.25B PE transaction.
Workflow-specific data accumulation, wherever the customer wants it. Every filing the system completes, every JSA it generates, every well-failure post-mortem it reads compounds into pattern recognition no foundation model has. The accumulation happens on the customer’s chosen substrate — on-prem appliance, customer-owned colo, hyperscaler of choice, or hybrid. The location of the data is the customer’s preference; the workflow ownership is ours.
Regulator-grade audit posture as a portable standard. SOC 2 Type II, ISO 27001, signed DPAs, named-human signoff workflows, FIPS-validated crypto, agency procurement readiness. The boring stuff that takes 12–18 months of paperwork. The same standards travel with us across every deployment topology, because in 2026 security and infrastructure standards are industry-commoditized table stakes, not competitive differentiators. Our founder's DoD/HS background means this isn't learned from scratch — it's transferred discipline.

Lean Informatics is structured to own all three from day one. #1 is the FDE motion. #2 is the workflow ownership pattern with location-flexible deployment (Addendum B documents the on-prem reference architecture as one option; the same workflow runs on the customer’s hyperscaler if that’s what they prefer). #3 is sequenced into the plan (SOC 2 Type I by Y1, Type II by Y2, agency-prime partnerships in parallel). This is what an Epic-style competitor looks like in vertical AI — and notice that even Epic itself has moved to a hybrid cloud posture without abandoning the operating disciplines that built the moat.

One more thing the thesis depends on, taken up in §04 in detail: foundation models have leveled the industry-expertise barrier, and security/infrastructure standards have commoditized the perimeter. A century of public, academic, and institutional O&G knowledge is in the weights of every frontier model. SOC 2 / ISO 27001 / KMS / audit logging are table-stakes everywhere. The "you need a roughneck on staff and a private datacenter" defense is empirically dead on both axes. What's left to charge for is what the models and the infrastructure cannot supply on their own — trust, distribution, FDE reliability, and ownership of the customer's workflow outcomes. That's exactly what Lean Informatics is built around.

moat

Infrastructure is not the moat. The services relationship is. Where the data lives is a customer dial in 2026, not a competitive differentiator. Security standards are industry-standard. Hosting is interchangeable across hyperscalers, customer colos, and on-prem appliances. That makes Lean Informatics a services company first. Our compounding asset is the FDE relationship and the workflow ownership it produces — the same disciplines Epic built on, applied to upstream O&G in 2026 instead of healthcare in 1979.

03 Enverus: the old guard

Enverus is the structural incumbent of upstream data and analytics. Founded in 1999 as DrillingInfo by Allen Gilmer and rebranded as Enverus in 2019 after multiple acquisitions, it now has $500M+ ARR, 8,000+ customers across 50 countries, 2.7 PB of data, 350M+ courthouse records, and $500B+ in annual energy transactions running through its platform. Enverus is also where the wedge opens. The evidence: three private-equity flips in seven years, public employee commentary about bureaucracy and slow product cycles, two-star customer reviews on the public review sites that cover it, and a product launch posture (Enverus ONE, April 7, 2026) that reads as defensive rather than confident. The data moat is real. The execution organization is not what it was when DrillingInfo was the scrappy challenger.

Ownership history

3 PE flips, 7 yrs

Genstar (2018) → Hellman & Friedman ($4.25B, Jun 2021) → Blackstone ($6.5B, Aug 2025). Each transition: cost discipline, customer price increases, roadmap re-rationalization.

Public starting price

$275/user/mo

Floor for low-tier feature subscriptions. Enterprise tiers are RFQ-only. Reviewer-documented annual increases without negotiation transparency.

Enverus ONE launch

Apr 7, 2026

Governed AI platform. SOC 2 Type II. Isolated tenancies. Still cloud-only. Four launch "Flows" (AFE, Production Valuation, Project Siting, QuickStart).

Scale

8,000 / 50 countries

$500M+ ARR. 25-yr customer base. Real footprint — not a paper tiger. But also not a startup that can pivot on a quarter.

Where Enverus is structurally exposed

PE-driven price escalation for customers. Public review-site evidence: "the subscription is not worth the price and they hide the price of the annual subscription till they send you the invoice"; "constant price increases and no added value for my uses"; 90-day cancellation policies buried until the customer tries to leave; threats of "action" for non-payment after attempted cancellation. The pattern is consistent with three PE sponsors in seven years compounding subscription revenue at the customer's expense. This is the unhappy customer pool.
Internal bureaucracy at PE-mature scale. Public Glassdoor commentary describes "large project management bureaucracy for a company of their size", "not really 'agile' in any sense of the word", "too much hierarchy in their management structure", "frequent internal reorganizations", and senior management protecting "pet projects" that should have been killed. A 598-review aggregate of 4.1/5 doesn't change the fact that the engineering organization is not a fast-moving target.
No deployment-topology choice for the customer. Enverus ONE is cloud-only on their tenancy. The launch language — "proprietary customer data remains isolated within a private tenancy" — is the strongest sovereignty pitch they can credibly make, but it's still their cloud. The customer ships data out and has no say in the substrate. For operators who want on-prem for completion designs, M&A diligence, AFE pricing models, or JOA-sensitive negotiation data — or for operators who simply prefer to run on their existing Microsoft/AWS contract and consumption-discount stack — Enverus has one answer. Lean Informatics meets the customer where the customer wants to be. Even Epic itself moved to a hybrid cloud posture in healthcare (Hyperdrive on Azure / Azure Large Instances) once customers asked for the option. Enverus has not extended that courtesy to upstream.
Acquisition-driven product sprawl. Spatial Business Systems (April 2026, utility design/engineering), Tracts.co partnership (April 2026, title automation), Xpansiv partnership expansion (May 2026, price discovery), plus legacy PRISM, MarketView, Foundations, Sphere, and now ONE. Each bolt-on carries integration debt and uneven UX. Customers reach for fewer tools, not more.
The Enverus ONE pitch contains its own admission. CEO Manuj Nikhanj framed the April 7 launch with the line "the gap between the companies that move now and the companies that wait is going to be significant and it is going to compound", and CPO Jimmy Fortuna pitched ONE as "the only AI platform that can" reason with O&G operating context. That language is defensive against the very thing Lean Informatics' thesis predicts: foundation models compressing the 25-year data moat into a deployable product. When the incumbent's CEO has to publicly insist on the gap, the gap is shrinking.
The long tail of mid-size operators is undersold. Enverus's center of gravity is enterprise: supermajors, capital markets, large independents, midstream majors. The 5–200-well operator who needs RRC compliance and AFE evaluation but can't justify a $50K+/yr Enverus contract is structurally underserved. That is exactly Track 1's wedge customer.

opening

The opening, stated plainly. Enverus is great at being Enverus. It is not built to deliver an FDE-led, customer-premises-deployed, sovereignty-first product to a 50-well operator in Midland in 30 days. The data moat doesn't translate into delivery agility. Three PE flips have made the company organizationally and pricing-wise exactly the kind of incumbent that bootstrap, founder-led, premises-deployed challengers historically displace at the edges. The mid-size operator and the land/survey channel are the edges.

04 The foundation-model leveling event

The most important fact in this entire document, and the one that makes Lean Informatics possible at all, is this: frontier foundation models trained between 2023 and 2026 absorbed roughly a century of public, academic, and institutional oil & gas knowledge into their weights. The "you can't build vertical AI for upstream without 25 years of proprietary data and a roster of petroleum engineers" defense, repeated by every incumbent for the last decade, is no longer empirically true. The vernacular, the definitions, the metrics, the workflows, and the institutional IP that used to gate entry are now a $20/month API call or a free open-weight checkpoint.

What's in the weights

By 2026, the major closed-weight frontiers (Claude Opus 4.6, GPT-5.1, Grok 4) and the major open-weight families (Hermes 4 405B, Llama 4 405B, Qwen 3.5) have ingested, in some combination:

SPE OnePetro corpus — ~300,000 peer-reviewed petroleum engineering papers and conference proceedings.
AAPG and SEG journals — the geology and geophysics literature stretching back to the 1920s.
DOE, USGS, EIA technical reports — reservoir studies, basin assessments, methodology papers, regulatory technical bases.
State commission filings — Texas RRC, Oklahoma OCC, New Mexico OCD, North Dakota Industrial Commission, Louisiana Office of Conservation. Decades of public W-10/G-10/H-10 equivalents, drilling permits, completion reports, plugging records.
Courthouse public records — title abstracts, lease assignments, ROW grants, unit designations, pooling orders. Much of this is web-indexable.
Industry textbooks and field manuals — Schlumberger Oilfield Glossary, Halliburton/Baker Hughes operations manuals (where public), classic Craft-Hawkins, Slider, Lake, Ahmed, Economides petroleum engineering texts.
Trade press and conference proceedings — Hart Energy, JPT, World Oil, E&P Monthly, ARC Group, Wood Mackenzie public reports, IHS Markit precursors.
Academic petroleum engineering coursework — MIT OpenCourseWare, Stanford SCRF, Texas A&M, UT Austin, Colorado School of Mines, Tulsa graduate-level course materials where publicly posted.
Expert witness transcripts and litigation discovery — PACER and state court systems carry decades of technical expert depositions on well control, fluid mechanics, completion failure, royalty disputes.
YouTube engineering channels — everything from Practical Engineering and Real Engineering down to ChevronTexaco operator training videos and Oilfield Joe explainers. Petroleum vocabulary is in the audio transcripts.

The result, validated repeatedly in 2025–2026 benchmarks: frontier models hold their own on petroleum engineering coursework, score 50–70% on the SPE certification subset out of the box (Claude Sonnet 4.5: 52.5%; Grok 4: 62.5%; Collide's RIGGS at 67.5% is a 5-point fine-tune lift on top of a model that already knew the material), and can fluently produce AFE narratives, JOA boilerplate, geosteering interpretations, completion-design rationale, and regulator-grade filing language. The domain-tuned embeddings (PetroVec-style) that incumbents marketed as defensible IP can be reproduced by a competent ML engineer in a weekend.

What's been leveled

Vocabulary barrier

Gone

The model knows the difference between a swab cup and a packer, a frac plug and a bridge plug, an AFE and a JIB. It can converse with the field hand.

Vendor data lock-in

Gone

Foundation models ingest any format: PDFs, scanned permits, EDI feeds, LAS files. The "our data schema is the moat" pitch is over.

Roughneck-on-staff defense

Gone

The model has been trained on more SPE papers than any single petroleum engineer has read. It is not a replacement for an SME, but it is a replacement for the gatekeeping function of "you can't enter our industry."

Institutional IP

Gone (mostly)

Definitions, metrics, taxonomies, standard formulas, decline curves, fluid models, AFE templates, JOA boilerplate — all in the public corpus and therefore in the weights.

What still gates entry

Customer-specific proprietary data — which is exactly what stays on the customer's premises. The foundation model can speak the language fluently; it cannot tell us what is in this operator's 2024 completion report or AFE history. That gap is the buying trigger for the on-prem appliance.
Distribution — relationships, channel, trust. The 150–250 US land services firms have multi-decade trust with mid-size operators that no foundation model can manufacture in 20 quarters. That is why Lean Informatics goes through that channel, not around it.
FDE reliability under regulator scrutiny. SOC 2, signed DPAs, auditable signoff chains, named human accountability, agency procurement readiness. Foundation models don't sign DPAs. People do.
Workflow ownership and process design. Knowing what the AFE evaluation should look like is one thing. Designing the workflow so a 50-well operator can run it in 30 days without firing their landman is another. The model helps; the human still owns the process.

freight train

The call to action, stated for the firms that will read this. If you are a mid-size operator, a land services firm, or a vertically-integrated independent and you are waiting until "the AI thing settles down," you are paying the late-mover tax already. Frontier-model leverage is compounding monthly. Your competitors who deploy now bend their cost curve and compress decision cycles by 5–10×. Old-buddy networks, favoritism in vendor selection, and tribal industry knowledge cannot stop a system that already knows the language and costs one-tenth of what the incumbent is charging. The window in which "we'll wait and see" was a defensible posture closed somewhere between Claude Opus 4 and Enverus ONE. Lean Informatics is built to be the first call you make when you decide the wait is over.

Why this is good news for Lean Informatics specifically

If industry expertise had still been the moat, an industry-adjacent founder like Jonathan would be at a structural disadvantage. The foundation-model leveling event turns that on its head. The wedge becomes:

Cross-vertical operating discipline: FM RBDS, DoD, state HS, county, sheriff, and fire-warning deployment experience transfers into upstream's regulator-scrutiny posture better than a 20-year petroleum-only career.
Industry-adjacent sidecar exposure — deploying safety systems to Texas counties, sheriffs, and fire-warning networks meant standing next to O&G operations during their worst days: blowouts, spills, well-pad incidents, evacuations. Knows enough to be dangerous; not enough to be captive to industry orthodoxy.
Sovereignty-first architecture from RBDS prior art — the opacity layer, addressed payloads, vertical-stack ownership pattern — maps directly onto on-prem AI delivery. The incumbents grew up cloud-native; they cannot retrofit this.
Foundation-model fluency as the operator's translator. Our founder doesn't have to be a 20-year reservoir engineer; the model is. Our founder has to be the operator-engineer who knows how to ship the system to the customer's server room and keep it running.

The cross-vertical thesis (Addendum B.9) catalogues the eight SaaS winners who succeeded as outsiders in their target industries (Veeva, Toast, Procore, Carta, Stripe, Snowflake, Datadog, Persefoni). Every one of them won by treating institutional jargon and tribal knowledge as a layer to be learned, not a wall to be respected. Foundation models have flattened that layer further. The McLelland framing on X that "outsiders can't compete in O&G" is — in 2026 — self-serving marketing for a previous era.

05 Market evidence: Collide as proof of fit

Collide sits one rung below Enverus on the competitive ladder. If §03 is the old guard, Collide is the newcomer whose mere existence and trajectory proves the gate has opened: a single founder team, a $5M Mercury Fund seed (April 2025), a public FDE motion, a domain-tuned LLM, and credible reference customers (Winn Resources, ConocoPhillips-affiliated deployment chatter) inside 12 months. The relevant signal is not "Collide will win the category" — it is that the category is now fundable, sellable, and executable in 2026 by teams that didn't exist in 2024. Lean Informatics' positioning, distribution, architecture, and sequencing are deliberately different (see Addendum B and Section C). The takeaway: the newcomer has proved the wedge, and the foundation-model leveling event (§04) means the wedge is wider than Collide's own positioning admits.

From the public stack diagram, Collide is a three-layer architecture flanked by Forward-Deployed Engineers (FDEs) who handle deployment, configuration, and change management. The "proprietary" labels in the diagram mark what they consider defensible.

LAYER 01 · INGESTION & DATA

Document classification → Domain pipelines → Petroleum embeddings → Security

Reads drilling reports, well logs, completion procedures, scanned land leases, SCADA exports, third-party plant statements. Their "Petroleum Embeddings Model" is marketed at +34% accuracy vs. OpenAI on petroleum terminology: a domain-tuned contrastive embedding (sentence-transformers or proprietary). Diagram labels: document classification, domain pipelines, and petroleum embeddings are marked proprietary; security is off-the-shelf.

↓

LAYER 02 · DOMAIN INTELLIGENCE

Agentic orchestration → RIGGS LLM → Knowledge base → Basket of LLMs

RIGGS is the petroleum-tuned LLM, trained on "Spindletop hardware" (their internal training rig). 67.5% on SPE exam subset. Validation/reasoning layer wraps it. Agentic orchestration handles regulatory, production, well-failure flows. A "basket of LLMs" (GPT, Claude, others) is used for general reasoning when domain isn't needed. Diagram labels: agentic orchestration, RIGGS LLM, and knowledge base are marked proprietary; the basket of LLMs is off-the-shelf.

↓

LAYER 03 · OUTCOMES & APPLICATIONS

Automated workflows → GIS & mapping → Continuous improvement

The user-facing surface. Texas RRC G-10/W-10/H-10 filings, production reconciliation, lease term extraction, dynamic JSAs (job safety analyses) keyed to live weather, ESP failure root-cause. GIS layer lets users chat with maps. Continuous improvement = RLHF on FDE refinement and SME ranking. Diagram labels: all three components are marked proprietary.

FDE

Running alongside all three layers: Forward-Deployed Engineers (geophysicists, completions engineers, landmen) who locate data sources, configure workflows, and own change management. This is the Palantir playbook. It is also where 30–50% of Collide's gross margin lives or dies.

06 Claims vs. reality

We treat the marketing surface as untrusted input and verify before assuming.

Claim	Verdict	Honest read
RIGGS beats GPT-5.1 / Grok 4 / Claude Sonnet 4.5 on SPE exam	PARTIALLY TRUE	67.5% > 62.5% (Grok) > 52.5% (Sonnet) is plausible. GPT-5.1 at 4% is implausible as a knowledge claim — almost certainly a refusal/format failure on a particular prompt mode. Cite cautiously. The subset is only 40 questions.
Petroleum Embeddings: +34% vs OpenAI	DIRECTIONALLY TRUE	Domain embeddings beat general embeddings on domain tasks, a result well established in the literature (PetroVec, FinBERT, BioBERT pattern). The benchmark behind the specific 34% figure is unstated. A solo can replicate the direction of this result in a weekend.
Texas RRC filing: 99.4% time reduction	CREDIBLE	Winn Resources case study (50 wells / 20 min vs hours). Forms are structured, the RRC publishes EDI specs, the failure mode is tedium not intelligence. Reproducible by a competent script.
"AI-native platform purpose-built for the oilfield"	MOSTLY MARKETING	The architecture is RAG + agents + fine-tuned model + workflow UI. The positioning is what's purpose-built — FDEs, founder pedigree, vocabulary. Not a fundamentally novel architecture.
"First GenAI platform for energy"	CONTESTED	Enverus, C3.ai, and others were there first at different sizes. "First" is a positioning claim, not a fact. Enverus ONE (Apr 2026) is the actual category-defining incumbent.
RIGGS as "the intelligence layer underneath every operator's workflow"	ASPIRATIONAL	Unproven at scale. Single named customer (Winn Resources) so far in public materials. Distribution is the open question, not the model.
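Two rows in the table above can be sanity-checked with arithmetic alone. The sketch below uses only figures quoted in the table (a 40-question subset, the three reported scores, and the 50 wells / 20 minutes / 99.4% case study); the function names are illustrative, and the binomial standard error is a rough yardstick that assumes independent questions.

```python
import math

SUBSET_SIZE = 40  # questions in the SPE subset, per the table above


def correct_answers(score_pct: float, n: int = SUBSET_SIZE) -> float:
    """Number of correct answers implied by a percentage score."""
    return score_pct / 100 * n


def standard_error_pts(score_pct: float, n: int = SUBSET_SIZE) -> float:
    """Binomial standard error of a score, in percentage points."""
    p = score_pct / 100
    return 100 * math.sqrt(p * (1 - p) / n)


def implied_baseline_minutes(after_minutes: float, reduction_pct: float) -> float:
    """Manual time implied by a claimed percentage time reduction."""
    return after_minutes / (1 - reduction_pct / 100)


for name, score in [("RIGGS", 67.5), ("Grok 4", 62.5), ("Claude Sonnet 4.5", 52.5)]:
    print(f"{name}: {correct_answers(score):.0f}/{SUBSET_SIZE} correct, "
          f"+/- {standard_error_pts(score):.1f} pts")

baseline = implied_baseline_minutes(20, 99.4)
print(f"Implied manual baseline: {baseline / 60:.1f} hours for 50 wells")
```

On a 40-question subset each question is worth 2.5 points, so the 5-point gap between RIGGS and Grok 4 is two questions, inside one standard error of roughly 7 to 8 points. A 4% score does not correspond to a whole number of questions out of 40, which is one more reason to cite the GPT-5.1 figure cautiously. The 99.4% claim implies about 55 hours of manual work for 50 wells, a little over an hour per well.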

07 Where the moat actually lives

Strip the tech and look at what's hard to copy:

Five moat candidates, scored

Moat	Strength	Why it matters
Founder distribution — McLelland is ex-roughneck with 111K-view tweets, Chuck Yates is a known oilman. Digital Wildcatters has 8,000+ professional members in 122 countries.	UNREPLICATABLE	This is the actual moat. Cold outreach as a stranger to an upstream operator is a closed door. Coming in as McLelland is a phone call.
FDE service motion — embed engineers onsite, deliver custom config, learn the patterns, push back into product.	HARD	Palantir invented this. Real moat — if a firm has the product spine. A solo can do this for ~2 accounts max.
Workflow-specific data flywheel — every filing they handle, every well-failure pattern, makes the next one easier.	EMERGENT	Compounds only with multiple customers and clear permission to learn cross-customer. Not yet realized at $5M seed scale.
Regulator audit posture: SOC 2 Type II, signed DPAs, audit trails, named human signoff.	EXPENSIVE	~$80–150K and 12–18 months for SOC 2 Type II. Doable for a funded company, painful for a solo.
RIGGS & petroleum embeddings — their proprietary model + embedding stack.	COMMODITIZABLE	Open weights + good corpus + MLX = ~80% of the gap closed in 60 days. Not a real moat; marketing-led defensibility.

read

One thing to internalize: Collide is a distribution company with a model on top, not a model company with distribution on top. That's why the Digital Wildcatters community pre-existed the AI pivot. Knowing this changes how we compete.

08 The Lean Informatics toolstack, mapped

Our stack — Claude, data ingestion, open models, MLX, Hermes, report structure, Houston field presence — is unusually well-suited to this problem. Concretely, here is what it gives us:

Claude

Brain

Frontier reasoning for orchestration, code generation, document understanding. Use Sonnet for production agents, Opus for hard reasoning.

MLX (M5/M4)

Edge

Local 70B inference on Apple Silicon with 153 GB/s of unified memory bandwidth. Fine-tune Qwen-14B or Llama-8B for petroleum on a laptop. Zero per-token cost in the field.

Hermes 4

Steerable

Open-weight, tool-use-ready, ChatML, no refusal tax on operational content. Good base for a "RIGGS-equivalent" fine-tune.

Data ingestion

Pipeline

PDF/CSV/SCADA → chunk → embed → vector store. Standard fare; the value is in the petroleum-specific schema and entity extraction.

Report structure

Output

Templated outputs (filings, JSAs, decision memos) with human-in-the-loop signoff. Closes the trust gap fast.

Houston

Geo

Proximity to operators, RRC, the conferences, the wildcatters. This is non-trivial. We can drive to a wellsite.

tip

The Houston piece is doing more work than it sounds. The AI-in-O&G conference (500+ execs) is in Houston. CERAWeek is in Houston. The RRC is in Austin (3hr drive). The mid-size E&P operators have HQs in the Galleria, downtown, and The Woodlands. Distribution is a network problem and we are standing inside the network.

09 Layer-by-layer solo build

Mapping the Collide architecture to our toolstack, in order of build sequence.

Layer 01 · Ingestion & petroleum embeddings

Solo equivalent:

# corpus: SPE papers, OnePetro abstracts, RRC filing PDFs, lease templates,
# completion procedures, Daily Drilling Reports (scrub PII)
# ~50-200M tokens of petroleum-domain text is the sweet spot

# embedding stack
- BAAI/bge-large-en-v1.5  (start here)
- contrastive fine-tune on petroleum SME-labeled triplets (~3-5K triplets)
- evaluate on a holdout of well-name / formation / equipment queries
- target: +20-30% recall@10 over base on petroleum queries

# pipeline
unstructured.io  → tika fallback for OCR-heavy PDFs
LlamaIndex or LangChain (chosen pragmatically)
Qdrant or pgvector (start with pgvector, scale later)

Verdict: SOLO ACHIEVABLE. Two to four weeks of focused work matches Collide's claim direction. The hard part is corpus curation, not the embedding training.
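The recall@10 target in the embedding stack above needs a holdout evaluation before any fine-tuning starts. A minimal sketch of that metric follows, in plain Python with no embedding library; the queries and document IDs are invented for illustration, and reading "+20-30%" as a relative lift over the base model is an assumption.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        raise ValueError("query has no relevant documents")
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(runs: Dict[str, List[str]],
                     labels: Dict[str, Set[str]], k: int = 10) -> float:
    """Average recall@k over every labeled query in the holdout."""
    scores = [recall_at_k(runs[q], labels[q], k) for q in labels]
    return sum(scores) / len(scores)


def relative_lift(base: float, tuned: float) -> float:
    """Relative improvement of the tuned model over the base model."""
    return (tuned - base) / base


# Hypothetical two-query holdout; queries and doc IDs are made up.
labels = {"wolfcamp a landing zone": {"d1", "d2"}, "esp gas lock": {"d7"}}
base_runs = {"wolfcamp a landing zone": ["d9", "d1", "d4"],
             "esp gas lock": ["d3", "d7"]}
tuned_runs = {"wolfcamp a landing zone": ["d1", "d2", "d9"],
              "esp gas lock": ["d7", "d3"]}

base = mean_recall_at_k(base_runs, labels)
tuned = mean_recall_at_k(tuned_runs, labels)
print(f"base={base:.2f} tuned={tuned:.2f} lift={relative_lift(base, tuned):+.0%}")
```

Fixing the holdout and this scoring function first keeps the base and fine-tuned models comparable across rounds, whichever retrieval library produces the ranked lists.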

Layer 02 · RIGGS-equivalent domain model

Solo equivalent:

# base model choices (ranked)
1. Qwen 3.5 14B-Instruct       — best fine-tune ROI, MLX-ready
2. Hermes 4 14B                 — steerable, no refusal tax
3. Llama 3.3 70B                — heaviest, best ceiling, slower iter

# training data (this is the real work)
- SPE papers (some require licensing)
- Public RRC filings (millions of records)
- Drilling reports (anonymized via customer #1)
- Synthetic Q/A pairs generated from petroleum textbooks → graded by an SME

# infrastructure
- LoRA + MLX-LM on M5 Max for first 2-3 rounds
- Lambda Labs / RunPod for the final full SFT pass
- Eval against the SPE PE exam (practice exams are publicly purchasable)

# realistic target
- 55-60% on the SPE PE exam in 60 days of focused work
- 65-70% in 6 months with domain SME feedback loop

Verdict: SOLO ACHIEVABLE WITH FOCUS. Matching RIGGS's 67.5% in 60 days is unlikely, but striking distance is reachable — and for the actual customer workflow (filling out a W-10), we do not need to. The eval ≠ the product.
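Because the exam eval drives the targets above, it helps to score it so that refusals and format failures are counted separately from wrong answers. The sketch below assumes a multiple-choice format with options A to D and a prompt that asks the model to end with "Answer: <letter>"; both are assumptions, as are the sample questions and responses.

```python
import re
from collections import Counter
from typing import Dict, Optional

# Matches "Answer: B" or "answer: (C)"; the letter itself must be uppercase.
ANSWER_RE = re.compile(r"(?i:answer)\s*:\s*\(?([A-D])\b")


def extract_choice(response: str) -> Optional[str]:
    """Return the letter following 'Answer:' in a response, or None."""
    match = ANSWER_RE.search(response)
    return match.group(1) if match else None


def score_exam(responses: Dict[str, str], key: Dict[str, str]) -> Counter:
    """Tally correct, wrong, and unparseable responses against an answer key."""
    tally = Counter()
    for qid, expected in key.items():
        choice = extract_choice(responses.get(qid, ""))
        if choice is None:
            tally["unparseable"] += 1
        elif choice == expected:
            tally["correct"] += 1
        else:
            tally["wrong"] += 1
    return tally


# Hypothetical answer key and model responses.
key = {"q1": "B", "q2": "D", "q3": "A"}
responses = {
    "q1": "Answer: B",
    "q2": "Answer: (C)",
    "q3": "I can't help with exam questions.",
}
tally = score_exam(responses, key)
accuracy = tally["correct"] / len(key)
print(dict(tally), f"accuracy={accuracy:.1%}")
```

Reporting the unparseable count next to accuracy makes it visible when a low score reflects refusals or formatting rather than missing knowledge.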

Layer 03 · Agentic orchestration

Solo equivalent:

# the agent loop, kept boring on purpose
- Claude Agent SDK / LangGraph for the high-stakes flow
- Hermes 4 (local) for cheap intra-tool reasoning
- Tool registry: rrc_lookup, well_master_query, scada_pull, pdf_extract,
                  jsa_template_fill, ssa_compute, gis_proximity_check
- Guardrails: every output that mutates state has a "human sign here" step

# observability (do this from day 1, not day 100)
- Langfuse or Helicone for trace replay
- Every prompt versioned in git
- Every customer interaction snapshotted for the eval set

Verdict: SOLO ACHIEVABLE. This is exactly where Claude Code + the Agent SDK shines. One person can ship this faster than a team because there's no coordination tax.
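The guardrail in the agent loop above ("every output that mutates state has a human sign here step") can be enforced in the tool registry itself rather than left to the prompt. A minimal sketch in plain Python, independent of any agent framework: `rrc_lookup` comes from the tool list above, while `submit_filing`, the approver callback, and the placeholder bodies are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class Tool:
    fn: Callable[..., Any]
    mutates_state: bool


REGISTRY: Dict[str, Tool] = {}


def tool(name: str, mutates_state: bool = False):
    """Register a function as an agent tool."""
    def decorator(fn):
        REGISTRY[name] = Tool(fn, mutates_state)
        return fn
    return decorator


class SignoffRequired(Exception):
    """Raised when a state-mutating tool is called without human approval."""


def call_tool(name: str,
              approver: Optional[Callable[[str, dict], bool]] = None,
              **kwargs):
    """Run a tool; state-mutating tools need an approver that returns True."""
    entry = REGISTRY[name]
    if entry.mutates_state:
        if approver is None or not approver(name, kwargs):
            raise SignoffRequired(f"{name} needs a human signoff")
    return entry.fn(**kwargs)


@tool("rrc_lookup")
def rrc_lookup(api_number: str) -> dict:
    # Placeholder body: a real version would query RRC public data.
    return {"api_number": api_number, "status": "unknown"}


@tool("submit_filing", mutates_state=True)  # hypothetical tool name
def submit_filing(filing_id: str) -> str:
    return f"submitted {filing_id}"


print(call_tool("rrc_lookup", api_number="42-000-00000"))  # made-up API number
try:
    call_tool("submit_filing", filing_id="F-1")
except SignoffRequired as err:
    print(err)
print(call_tool("submit_filing", approver=lambda name, args: True, filing_id="F-1"))
```

Read-only tools run freely; anything flagged `mutates_state` fails closed unless a human-backed approver says yes, so a model error cannot trigger a submission by itself.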

Layer 04 · Outcome surface (the actual product)

Solo equivalent:

Pick ONE workflow. Not all of Collide's. Just one. The W-10/G-10 filing is the canonical wedge — it has clear ROI ($300/well/yr in filing labor), public APIs, and Collide has already proven the demand.
Build it as a CLI first, then a web app. The first customer doesn't need a polished UI — they need a working pipeline.
Templated outputs with human signoff. The user reviews and approves each filing before submission. This is both compliance posture and trust-building.
GIS only if customer asks. Leaflet + RRC shapefiles cover 80% of the need. Do not pre-build it.

Verdict: SOLO ACHIEVABLE IN 30 DAYS. The W-10/G-10 use case specifically.

Layer 05 · The FDE motion (founder-led, in person)

Solo equivalent: Our founder is the FDE. Our founder drives to the customer's office in Midland or The Woodlands, sits with their operations director, watches them file W-10s by hand, and builds the integration in front of them. This is how Lean Informatics wins against Collide on a single account: we ship faster because there is no meeting to schedule.

Verdict: SOLO ACHIEVABLE, BUT CAPS THE BUSINESS AT 2–3 ACCOUNTS. This is the fundamental scaling constraint that defines our hiring schedule.

10 Solo vs. company verdict matrix

The honest gap, capability by capability.

Capability	Collide (company)	Lean Informatics (solo)	Gap
Document classification + ingestion	Custom pipelines, scaled	unstructured.io + Claude + pgvector	close to zero
Petroleum embeddings	RIGGS embeddings, claimed +34%	bge fine-tune on petroleum corpus	closable in weeks
Petroleum LLM (RIGGS)	67.5% SPE, trained on Spindletop	Qwen/Hermes LoRA, 55-65% SPE	closable in months
Agentic orchestration	Internal framework	Claude Agent SDK + LangGraph	solo may ship faster
GIS mapping over O&G data	Proprietary, chat-with-map	Leaflet + RRC shapefiles + Claude	slower polish
W-10/G-10 filing automation	Live with Winn Resources	30-day build to first customer	parity achievable
JSA generation (weather-aware)	Live	Buildable in 2 weeks	close to zero
Dynamic well-failure RCA	Claimed, low public detail	Harder — needs OEM specs + sensor history	depends on data access
SOC 2 Type II posture	Implied, presumed in progress	~$80–150K + 12–18 mo	structural disadvantage
Forward-Deployed Engineers (scaled)	Hiring multiple FDEs	Founder, one human, 2–3 accounts	caps our growth
Brand & community distribution	Digital Wildcatters, 8K+ members, 111K-view tweets	Build from zero	months/years to close
Capital cushion	$5M seed runway	Personal runway + first invoice	structural
Speed of single-customer iteration	Internal coordination cost	We ship same day	SOLO ADVANTAGE
Burn rate	~$150–300K/mo blended	~$8–14K/mo all-in	SOLO ADVANTAGE
Pricing flexibility	Enterprise floor (~$50K/yr+)	$2K–15K/mo, flexible	SOLO ADVANTAGE

11 What a solo can't do (honest)

This section exists to keep us honest with ourselves and with investors.

We cannot scale FDEs past the founder. Past 2–3 accounts the firm is either turning customers away or becoming a consultancy. The hiring plan addresses this bottleneck explicitly.
We cannot sign with a supermajor in Year 1. No procurement department will sign with a Texas LLC of one without a SOC 2 letter, named executives, and references. We aim middle: 5–50 well operators.
We cannot outrun a brand-distribution flywheel. McLelland's tweets do free pipeline generation. We will need either a personal brand strategy or a partnership.
We cannot easily defend against open-source. Anything we build, someone could open-source 6 months later. The defense is customer entrenchment and workflow ownership, not novel IP.
We do not compete on "platform" framing. We compete on outcome: a number on a contract, signed in dollars saved.
We cannot ignore Enverus. Enverus ONE (April 2026, with Astra model + SOC 2 Type II + 25 years of data + Continental/BPX/Chord partnerships) is the real giant. We position around them, not Collide.

honest

A reader who still finds this attractive after the can't-do list is the right reader. Most people will not sit with this list.

12 The Houston wedge

The right way to do this is to pick the narrowest defensible wedge and own it before anyone notices.

The wedge: Texas RRC compliance for mid-size operators

Why this customer: Mid-size Texas operators (5–200 wells) file W-10, G-10, PR forms monthly. Most do it by hand or with Excel. The pain is real, recurring, and quantifiable.
Why this workflow: Public forms, public data, public APIs (RRC EDI), low political risk if it breaks (filing is reviewed before submission). Easy to demo, easy to price.
Why this geography: The RRC is in Austin. Texas operators are in Houston, Midland, Tyler, Fort Worth. We can be on a wellsite in 5 hours.
Why this moment: Collide has proven the demand with Winn Resources. The market is now educated. We do not need to evangelize; we need to be cheaper, faster, and closer for operators below their floor.

Positioning against Collide

We are the RRC compliance pipeline for operators too small for Collide and too tired of spreadsheets to keep doing it by hand. White-glove, fixed-price, deployed in the operator's office in two weeks. We don't sell a platform — we sell completed filings. — draft positioning statement

Three customer profiles to target

Family-owned Permian operator, 10–40 wells. Owner runs the filings themselves. Hates it. Will pay $1500/mo to make it go away.
Midstream gathering company, 50–150 wells. Has a controller doing this 3 days/month. Math is obvious.
Mineral rights manager / landman service company. Filings for multiple clients. They would resell our tool as part of their service.

13 90 / 180 / 540 day plan

Days 1–30 · Build the wedge

Set up the project structure: monorepo, pgvector, Claude Agent SDK, unstructured.io pipeline.
Get 100 sample W-10 and G-10 filings from the public RRC archive.
Build the end-to-end pipeline against synthetic well data: read SCADA exports → reconcile production → generate filing → human signoff → submit via EDI.
Write the eval set: 25 wells, known correct answers, runs in 5 minutes.
Buy domain. Build a 1-page landing page. Start writing on LinkedIn / X about RRC pain.
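The eval set in the list above (25 wells with known correct answers) can start as a field-by-field comparison between what the pipeline generates and what a human filed. A minimal sketch, assuming each case stores an input and an expected filing as dictionaries; the field names are illustrative rather than the actual W-10 layout, and `toy_pipeline` stands in for the real pipeline.

```python
from typing import Callable, Dict, List

Filing = Dict[str, object]


def fields_match(expected: Filing, actual: Filing, tol: float = 0.5) -> List[str]:
    """Return the names of fields where actual differs from expected."""
    mismatches = []
    for field, want in expected.items():
        got = actual.get(field)
        if isinstance(want, (int, float)) and isinstance(got, (int, float)):
            if abs(want - got) > tol:  # numeric fields compare within a tolerance
                mismatches.append(field)
        elif want != got:
            mismatches.append(field)
    return mismatches


def run_eval(cases: Dict[str, dict],
             pipeline: Callable[[dict], Filing]) -> Dict[str, List[str]]:
    """Run the pipeline on every case; map well ID to its mismatched fields."""
    failures = {}
    for well_id, case in cases.items():
        bad = fields_match(case["expected"], pipeline(case["input"]))
        if bad:
            failures[well_id] = bad
    return failures


# Hypothetical single case; a real set would hold 25 wells.
cases = {
    "well-001": {
        "input": {"daily_oil_bbl": [10.0, 12.0, 11.0]},
        "expected": {"oil_bbl": 33.0, "well_status": "producing"},
    },
}


def toy_pipeline(raw: dict) -> Filing:
    return {"oil_bbl": sum(raw["daily_oil_bbl"]), "well_status": "producing"}


failures = run_eval(cases, toy_pipeline)
print(f"{len(cases) - len(failures)}/{len(cases)} wells pass", failures)
```

Returning the mismatched field names, not just a pass/fail count, shows which part of the filing regressed when a prompt or parser changes.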

Days 31–90 · First customer, in person

Get one paying customer at $1.5–5K/month. Houston-area, friend-of-friend, or community connection.
Drive to their office. Sit with their ops person. Watch the manual process for half a day.
Ship the integration in 2 weeks. Use the next 2 to harden against their actual edge cases.
Document everything: a runbook, a one-pager, a case study with hours-saved math.
Start the SOC 2 readiness conversation with Vanta or Drata. Don't start the audit yet.

Days 91–180 · Second customer + the eval moat

Use case study #1 to land customers #2 and #3 (target: 3 customers at $36K–120K ARR by day 180).
Start the petroleum embedding fine-tune in earnest — real corpus from customer data is now in hand (with permission).
Begin the petroleum domain LLM fine-tune. Target: 55% on SPE practice exam.
Apply to TX Railroad Commission as an authorized filer / EDI participant if not already.
Hire a fractional ops person to handle customer onboarding so our founder stays on engineering.

Days 181–540 · Decide what Lean Informatics becomes

Path A — Lifestyle consultancy. 5–10 customers, $400K–1.2M ARR, founder-run indefinitely. No outside capital. Houston gold.
Path B — Productize and raise. The same workflow, repackaged as self-serve. Raise a small angel round to hire 2 FDEs. Compete with Collide directly.
Path C — Acquihire / partner. Sell to Collide, Enverus, or Quorum as a workflow module. Our code + our customers = their next-month roadmap.
Decide which based on the market signal: how fast customers came, what they're asking for next.

14 Costs, pricing, margins

Cost stack (monthly, year one)

Line item	Monthly	Notes
Claude API (Sonnet + Opus mix)	$400–1500	Scales with customer count and document volume
Embedding API / self-hosted	$50–200	Mostly free after MLX local serve
Vector DB (Qdrant Cloud / pgvector on Hetzner)	$50–200	Self-hosted dirt cheap
Observability (Langfuse, Sentry)	$100	Start free tier
Cloud compute (1 small VPS, occasional GPU rent)	$200–500	Hetzner + Lambda Labs on demand
Vanta / Drata (SOC 2 readiness)	$1000	Start month 4
LLC, accounting, insurance	$300	E&O insurance becomes essential customer 2+
Founder draw	$6000–10000	Houston cost of living
Total	$8–14K/mo	vs. Collide's est. $150–300K/mo
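The total row can be checked by summing the low and high ends of each line item. The figures below are copied from the table; single-value rows are treated as a range with equal ends, which is an assumption about how the total was built.

```python
# (low, high) monthly cost in USD, copied from the table above.
COST_STACK = {
    "Claude API": (400, 1500),
    "Embedding API / self-hosted": (50, 200),
    "Vector DB": (50, 200),
    "Observability": (100, 100),
    "Cloud compute": (200, 500),
    "Vanta / Drata": (1000, 1000),
    "LLC, accounting, insurance": (300, 300),
    "Founder draw": (6000, 10000),
}


def monthly_total(stack: dict) -> tuple:
    """Sum the low ends and the high ends of every line item."""
    low = sum(lo for lo, _ in stack.values())
    high = sum(hi for _, hi in stack.values())
    return low, high


low, high = monthly_total(COST_STACK)
print(f"Total: ${low:,} to ${high:,} per month")

founder_share = COST_STACK["Founder draw"][0] / low
print(f"Founder draw share at the low end: {founder_share:.0%}")
```

The sum comes to $8,100 to $13,800 per month, matching the $8–14K total. Since Vanta / Drata starts in month 4, the first three months run about $1,000 lower.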

Pricing menu

Tier	Price	Includes
Pilot (30 days)	$5,000 one-time	Onsite setup, one workflow, 50 filings
Operator	$1,500/mo	Up to 50 wells, monthly filings, JSAs, support SLA
Operator+	$4,500/mo	Up to 200 wells, custom workflows, GIS, dedicated Slack
FDE engagement	$15K/mo flat	Half the founder's time on one account, custom build

Path to $400K ARR

5 Operator+ ($270K) + 4 Operator ($72K) + one $5K pilot/month ($60K) = ~$402K. Achievable in 12–18 months from a standing start with disciplined sales execution. Gross margin: ~85%. Net margin (incl. founder salary): ~30–40%.
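The customer mix behind the $400K figure is easy to recompute and to vary. A small sketch using the prices from the pricing menu; like the text, it counts one pilot fee per month toward the annual figure, even though pilot fees are one-time.

```python
PRICING = {"Operator": 1_500, "Operator+": 4_500}  # USD per month, from the pricing menu
PILOT_FEE = 5_000  # USD, one-time per pilot


def arr(customers: dict, pilots_per_month: int = 0) -> int:
    """Annualized revenue from subscriptions plus a steady rate of pilot sales."""
    subscriptions = sum(PRICING[tier] * count * 12
                        for tier, count in customers.items())
    return subscriptions + PILOT_FEE * pilots_per_month * 12


target = arr({"Operator+": 5, "Operator": 4}, pilots_per_month=1)
print(f"${target:,}")

without_pilots = arr({"Operator+": 5, "Operator": 4})
print(f"${without_pilots:,} from subscriptions alone")
```

Subscriptions alone account for $342K of the total, so the $400K mark depends on selling a pilot every month.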

15 Risk register

Risk	Severity	Mitigation
Collide drops price to floor or open-sources commoditized layers	medium	Compete on white-glove + smaller account fit, not platform
Enverus ONE crushes the small-operator segment with a sub-$1K tier	medium	Move fastest in the small-operator gap and entrench before they do
RRC changes filing format / EDI spec	medium	Subscribe to RRC bulletins, build adapter pattern, charge for the migration
Founder burns out as one-person FDE	high	Cap at 3 active customers, hire fractional ops by month 4
Procurement rejects us for lack of SOC 2	high	Start Vanta by month 4, get Type I within 6 months
Hallucinated filing causes customer regulatory issue	existential	Human-in-loop on every submission, E&O insurance, no auto-submit ever
Foundation model price war pulls floor out	low	We benefit — lower inference cost. Open weights insulate.
Collide acqui-offers and we say yes	good problem	Negotiate from the position of running cash-flow positive
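The "adapter pattern" mitigation for a filing-format change can be sketched as one internal record type with a registered serializer per spec version. Everything concrete below is hypothetical: the record fields, the `v1`/`v2` version names, and the delimited layouts are placeholders, not the RRC's actual EDI format.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FilingRecord:
    """Internal, format-independent view of one filing (fields are illustrative)."""
    api_number: str
    period: str  # e.g. "2026-02"
    oil_bbl: float


Adapter = Callable[[FilingRecord], str]
ADAPTERS: Dict[str, Adapter] = {}


def adapter(spec_version: str):
    """Register a serializer for one version of the external format."""
    def decorator(fn: Adapter) -> Adapter:
        ADAPTERS[spec_version] = fn
        return fn
    return decorator


@adapter("v1")  # hypothetical spec version, not a real RRC layout
def to_v1(rec: FilingRecord) -> str:
    return f"{rec.api_number}|{rec.period}|{rec.oil_bbl:.0f}"


@adapter("v2")  # hypothetical revision: new field order and precision
def to_v2(rec: FilingRecord) -> str:
    return f"{rec.period},{rec.api_number},{rec.oil_bbl:.2f}"


def serialize(rec: FilingRecord, spec_version: str) -> str:
    """Render a record for whichever spec version is currently in force."""
    try:
        return ADAPTERS[spec_version](rec)
    except KeyError:
        raise ValueError(f"no adapter for spec version {spec_version!r}") from None


rec = FilingRecord("42-000-00000", "2026-02", 310.0)  # made-up values
print(serialize(rec, "v1"))
print(serialize(rec, "v2"))
```

With this shape a spec change means adding one adapter function; reconciliation and signoff keep working against `FilingRecord` unchanged, which is what makes the migration a bounded, billable job.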

16 Sources

Every claim in this document traces back to one of the following. When in doubt, prefer the primary source over the secondary.

Collide.io primary

Collide.io homepage and schema.org metadata — product positioning, FAQs, 99.4% filing time reduction claim.
EnergyCapital: Collide rolls out RIGGS AI platform (May 2026) — RIGGS SPE benchmark numbers, Spindletop training rig, McLelland quotes.
Collin McLelland (@FracSlap) on X — BoA hedge-fund presentation — 111K views, founder reach signal.
Collide $5M seed announcement (Apr 2025) — Mercury Fund, Sheffield/Quinn/Albin participation.
Houston InnovationMap on Collide seed.
EnergyCapital: Mercury Fund leads Collide round.

Competitive landscape — Enverus (the old guard)

Enverus ONE launch (April 7, 2026) — Astra model, SOC 2 Type II, Flows, "the gap between the companies that move now and the companies that wait is going to be significant" quote.
Enverus ONE product page.
Hellman & Friedman completes acquisition of Enverus (June 2021, $4.25B) — PE flip #2.
Blackstone to acquire Enverus (Aug 2025, $6.5B) — PE flip #3 in seven years.
PE Insights: H&F launches Enverus sale — deal mechanics and crossed-$500M-ARR threshold.
Enverus on Software Advice — 2.0/5 rating, 1.0/5 value-for-money and customer-support, public review evidence of pricing opacity and 90-day cancellation surprise.
Enverus on Slashdot — published $275/user/month starting price.
Enverus on Glassdoor (598 reviews) — employee commentary on bureaucracy, hierarchy, reorganizations, non-agile development culture.
Spatial Business Systems joins Enverus (April 2026) — acquisition-driven product expansion.
Enverus × Tracts.co partnership (April 22, 2026).
Novi Labs — competing on drilling economics.
Quorum Software market share.

Foundation-model leveling — petroleum corpus & domain LLMs

i2k Connect EnRG-LLM — SPE OnePetro corpus of 300,000+ papers/journals/articles used to fine-tune oil & gas LLMs.
AWS: Customize LLMs with O&G terminology via Bedrock — cloud-vendor evidence that domain customization is now commodity.
Toward EnergyGPT (arxiv 2509.07177) — academic survey of energy-domain LLM specialization.
Building domain-specific LLMs in 2026 — methodology overview.
LLMs and foundation models in petroleum engineering & geoscience (preprint).

Forward-Deployed Engineer model & vertical AI

Solo toolstack & open models

Hermes 4 by Nous Research — 14B / 70B / 405B open weights.
Hermes 4 release coverage.
llama.cpp vs MLX vs Ollama vs vLLM (Apple Silicon, 2026).
Apple M5 MLX — 70B LLMs go portable.
vllm-mlx — OpenAI/Anthropic-compatible server for Apple Silicon.
MLX Apple Silicon AI Dev Stack — fine-tune LLMs on Mac.
Portuguese petroleum word embeddings (PetroVec) — precedent for domain-tuned embeddings.
Petrovec GitHub repository.

Persona profiles

Texas RRC, filings, regulatory

Addendum A: Persona deep dive, Digital Wildcatters & the operator class

The §07 verdict — Collide's real moat is distribution, not architecture — only matters if a competitor can name the people who make distribution real. This addendum profiles four reference points: the Digital Wildcatters flywheel that birthed Collide, plus three named operators / executives whose patterns are worth modeling. For each, the question is the same: where does the FDE model fit when we are standing where they are standing?

The four-way fit framework

For each persona, evaluate four roles:

As a buyer — would they pay for a Lean Informatics FDE engagement?
As an advisor — what would they teach us?
As a partner / distribution — would they help us reach customers?
As an acquirer — would they (or someone like them) eventually acquire us?

A.1 Digital Wildcatters: anatomy of the flywheel

Collide cannot be understood without understanding the machine that made it. Digital Wildcatters was founded by Collin McLelland (ex-roughneck, "Fracslap" on X) and Jake Corley in 2019 as a podcast. Chuck Yates — ex-$8B energy fund manager, "fired in April 2020" — joined and lent industry credibility. Collide was spun out of the community, not built into it.

Members

9,000+

Hand-vetted petroleum engineers, operators, founders across 122 countries.

Seed funding

$2.5M

For Digital Wildcatters (separate from Collide's later $5M seed).

Flagship event

Fuze

Annual Houston conference. Past attendees: Devon, Halliburton, Accenture, FTI, AWS, McKinsey.

Surface area

Podcasts + events + community + software

Oil and Gas Startups, Chuck Yates Got A Job, Energy Tech Night, DW Power Hour, Collide product.

How the flywheel actually spins

Podcast as top-of-funnel. Every episode is a free industry interview. Operators come for the stories, stay because they like the hosts.
Community as middle. The 9,000-member network is where the actual relationships form — deal flow, hiring, vendor referrals.
Events as conversion. Fuze and Energy Tech Night convert relationships into pipeline. Sponsors pay; vendors prospect; founders pitch.
Collide as monetization. The AI product sells back into the community that already trusts the brand. Pre-built warm market.

key

The order matters. Audience first, product second. Most vertical AI startups try to do this in reverse and fail. Collide had a 9,000-person warm market on day one of the AI pivot. That's why a $5M seed felt like enough.

What this means for the Lean Informatics build

We will not replicate the flywheel in 12 months, and we will not try. A podcast takes 3 years to build an audience, and McLelland/Yates have a 7-year head start.
We plug into the flywheel. Show up to Energy Tech Night as a startup pitcher. Sponsor a single Fuze track. Get on a podcast.
We build a counter-flywheel in a sub-niche they don't cover. Compliance officers. Landmen. Lease analysts. Pick a verb the Wildcatters don't own.
We use their members as customer discovery. The community is searchable, the conversations are public, the pain points are documented.

A.2 Allen Gilmer: the DrillingInfo / Enverus playbook

If Collide has a north star, it's Allen Gilmer. He's done a version of this exact arc, only with structured data instead of LLMs, two decades earlier.

The pattern

1999: Co-founded DrillingInfo in Austin with Mark Nibbelink. Started by physically collecting Texas drilling permits daily and turning them into a searchable database.
2010s: Layered analytics, GIS, and well-economics onto the permit core. Became the de-facto E&P data platform.
2018: Acquired by Genstar Capital (San Francisco PE).
2019: Rebranded to Enverus on the 20-year anniversary. Now the energy industry's data & AI exec layer.
2021: Gilmer retired from the Enverus board.
2025: Joined Oil & Gas Asset Clearinghouse (OGAC) as Principal Partner. Active at MaScience. Also runs Tiki Tāne Pictures (film production, with industry vets) — he's deliberately diversified out of pure energy.

What Gilmer's playbook teaches us (and what Collide already learned)

Data acquisition is the moat. Everything else — the UI, the model, the analytics — can be replaced. But the firm that systematically captures the daily permit, the daily filing, the daily completion report becomes indispensable in seven years, not seven months. — Gilmer playbook, paraphrased from his public commentary

The four-way fit

Role	Fit	Reasoning
Buyer	unlikely direct	He's an investor/advisor now, not an operator buying tooling. But he is the kind of person who endorses operators to firms like ours.
Advisor	ideal	He has lived the data-to-platform arc. A 30-minute call with Gilmer about "should I build the corpus first or the model first" would be worth more than 100 hours of YouTube.
Partner / distribution	via OGAC	OGAC's clearinghouse customer base is upstream sellers/buyers — an adjacent audience to whoever buys our filing automation.
Acquirer	not him; his portfolio peers	Enverus itself is the obvious strategic acquirer of a workflow-AI tuck-in. Genstar's playbook is to consolidate. Build with that exit logic in mind.

action

An introduction to Gilmer through Digital Wildcatters or the OGAC/MaScience network is worth pursuing before more is built. One conversation reshapes the roadmap.

A.3 Jim Flores @ Sable Offshore: the single-asset political operator

Flores is a different category entirely. He runs Sable Offshore Corp (NYSE: SOC), HQ Houston, with a single concentrated asset: the Santa Ynez Unit (SYU) in federal waters offshore California. Platform Harmony was producing ~22,000 gross bopd as of March 2026 after restart.

Why Sable is unlike a Texas E&P

Federal waters, not state. Regulator is BOEM (Bureau of Ocean Energy Management) + BSEE for safety, not the Texas RRC.
Political asset. Restart was contested by California Coastal Commission; required Defense Production Act invocation by the Trump Administration in March 2026. DOJ hearing on Consent Decree modification set for June 1, 2026 in C.D. Cal.
Pipeline blocked. The Las Flores Pipeline System remains under dispute. Sable submitted a revised plan to BOEM in October 2025 proposing an offshore storage & treating (OS&T) strategy with shuttle tankers as a workaround.
Single-asset risk profile. One platform's compliance posture is the entire company's compliance posture.

What an FDE engagement at Sable would look like

Sable's compliance burden is high-stakes, low-volume, and bespoke. This is the opposite end of the spectrum from the W-10/G-10 high-volume Texas RRC wedge. The right product for Sable isn't a filing automation pipeline — it's:

A regulatory document understanding system that ingests the Consent Decree, NEPA filings, BOEM correspondence, CCC briefs, and surfaces obligations + deadlines.
A stakeholder mapping tool that tracks every party (DOJ, BOEM, CCC, Santa Barbara County, NGOs, plaintiff coalitions) and what they've said publicly.
A scenario modeling engine for "if the pipeline restarts vs. if we go shuttle tanker, what's the production curve and the compliance trail."

The four-way fit

Role	Fit	Reasoning
Buyer	narrow, high-value	Sable could pay $250K–1M annually for a high-quality regulatory-intelligence FDE engagement. But this is not our wedge product — it's a sidecar consulting line at best.
Advisor	limited	Flores's deal-making is uniquely his. The lessons don't generalize to a solo software founder. Useful as a case study, not a mentor pattern.
Partner / distribution	no	Sable is single-asset. Their network isn't a fit for a Texas RRC compliance product.
Acquirer	no	Operators don't buy software companies. Wrong logic chain.

guard

We do not chase a Sable-shaped logo as our first customer. It's seductive (high contract value, name recognition) but the workflow doesn't generalize, the sales cycle is 9–18 months, and the political surface is dangerous. Stay in the mid-market upstream wedge until product-market fit is in hand, then maybe open a regulatory-intelligence sidecar.

A.4 Bryan Hanks @ BETA Land Services — the existing FDE business

This is the most strategically interesting profile of the four. BETA Land Services is already a Forward-Deployed Engineer business — they just use landmen instead of software engineers. The model isn't theoretical for Hanks. He's been running it for 30+ years.

Founded

Lafayette LA

With Texas + Gulf Coast operations.

Acres developed

4.3M+

Across thousands of wells.

M&A managed

$40B+

120+ corporate transactions since 2010.

Hanks experience

41 years

CPL (Certified Professional Landman).

What BETA actually does

Land & lease acquisition, due diligence, abstracting, title research, title curative work, right-of-way / pipeline projects, and increasingly: solar, wind, transmission line, carbon sequestration, and battery storage. They sell completed landwork. Customer hands them an acreage block; BETA returns a clean title, an executed lease bundle, a ROW package.

This is exactly the value proposition Lean Informatics wants for the AI-powered version: completed filings, not a platform. The shape of the offering is identical. Only the technology stack changes.

The strategic question

The question for us is whether BETA is a competitor, a customer, a partner, or a template. The honest answer is: all four, depending on framing.

Competitor framing: If we sell title-curative AI to operators, BETA's landmen lose hours. They have inertia, relationships, and brand on their side. We have margin and speed.
Customer framing: Sell to BETA, not around them. Our AI becomes their internal force multiplier. Same landmen, 3x throughput. They keep the customer relationship, we take a per-acre fee.
Partner framing: White-label our filing automation as "BETA Compliance" or "BETA Digital." They distribute, we build. Their existing customers get an instant upgrade.
Template framing: Even without a Hanks conversation, BETA's service motion — embedded specialists, fixed-scope deliverables, charge by the project — is the right model to copy.

The four-way fit

Role	Fit	Reasoning
Buyer	very strong	BETA already pays for tooling that makes their landmen more productive. The pitch writes itself: "30% more throughput on title work, same headcount." Test this in 2 months.
Advisor	underrated	Hanks has lived the FDE motion at scale. He's seen what wins and loses in the field for 41 years. Worth 10x more than any VC partner on this specific question.
Partner / distribution	structural fit	BETA's customers are precisely the mid-size operators we want. They sit between us and them today. Be the AI engine inside their service.
Acquirer	credible at scale	A profitable land services company expanding into adjacencies (solar, carbon) needs digital capability. Strategic acquisition logic is real if we reach $1M ARR.

play

The BETA move: Once one mid-size operator is running on our RRC pipeline, the next step is a "land package" extension — lease term extraction, title chain assembly, ROW document generation — and pitch BETA as our second customer. They have the workflow, the volume, and the budget. They are the missing-link distribution.

A.5 Cross-walk: who matters for what

One table, the operative summary of the four profiles.

Persona	Best framing	What they unlock	How to engage
Digital Wildcatters community + Collide	Distribution flywheel	9,000-member warm market, Fuze access, podcast reach	Sponsor Energy Tech Night, pitch as a startup, get on a podcast, lurk in the community
Allen Gilmer (DrillingInfo / Enverus / OGAC)	Advisor + acquisition logic	Strategic playbook from a founder who did this 20 years ago in structured data	Warm intro via Wildcatters, OGAC, or Austin energy scene; one 30-min advisory call
Jim Flores @ Sable (Sable Offshore Corp)	High-CV sidecar customer (eventually)	Regulatory-intelligence niche if we ever want to expand off the Texas wedge	Do not chase. Park as a future opportunity once a SOC 2 letter and 3 case studies are in hand
Bryan Hanks @ BETA (BETA Land Services)	Partner / distribution / acquirer	Existing FDE business with customer base and operations; structural fit for AI tooling	Cold email after customer #1 lands. Pitch as "internal throughput multiplier." Houston/Lafayette is a 4-hour drive.

The deeper takeaway

These four profiles span the full operator class:

Digital Wildcatters is the distribution archetype — how we reach the customer.
Allen Gilmer is the founder archetype — how we build the long game.
Jim Flores is the edge-case customer archetype — how we stay focused.
Bryan Hanks is the scaled FDE archetype — how we avoid hiring twenty engineers by partnering with someone who already has them.

order

Engagement sequence, ranked: (1) Hanks — he unlocks the most leverage per conversation. (2) Wildcatters community — cheapest distribution. (3) Gilmer — one call, lifetime payoff. (4) Flores — not until we are ready, then carefully.

B. Lean Informatics — on-prem architecture & founder fit

This addendum specifies the reference architecture for the deployment topology where customer data never leaves the customer’s perimeter. It is one option in a customer-centric menu — alongside customer-owned colo, customer-managed hyperscaler tenancy, and hybrid — not Lean Informatics’ identity. We meet the same security standards (SOC 2 Type II, ISO 27001, KMS, audit logging) across every topology because those standards are industry-commoditized in 2026. We document this option in depth because (a) our founder’s prior art makes it unusually well-matched to LI’s delivery DNA, and (b) for the sovereignty-sensitive subset of upstream workloads — completion designs, M&A diligence, AFE pricing models, JOA-sensitive negotiation data — on-prem remains the buying trigger.

posture

This section is intentionally not promotional. The architecture works for specific wedges and breaks down for others. Read §B.8 before committing to it as the company default.

B.1 Founder prior art — what translates

Lean Informatics' founder previously built and deployed a mass-notification platform on top of FM RBDS (Radio Broadcast Data System), used by DoD, state homeland security offices, counties, sheriff's offices, and fire warning networks. Customers in that domain do not tolerate optimistic engineering. The technical and operational habits that come out of those deployments map directly onto the on-prem AI play described here.

sidecar

Industry-adjacent, sidecar-exposed. The mass-notification deployments concentrated heavily in O&G-dense Texas jurisdictions: Permian counties, Eagle Ford counties, Barnett-overlay sheriff's offices, Houston-area emergency management offices, and fire-warning networks downwind of refining corridors. Our founder is not a roughneck and not a petroleum engineer. He is the engineer who showed up when the roughneck's day went sideways: blowout coordination, well-pad evacuations, spill comms, sheriff coordination during pipeline incidents, fire-warning ops during weather emergencies in oil-services towns. That sidecar exposure is the right kind of "knows enough to be dangerous": fluent in the operating cadence, audit posture, regulator-scrutiny pressure, and the customer's worst-day shape, without being captive to industry orthodoxy on how things "have always been done." Combined with the foundation-model leveling of §04, this is exactly the founder profile that the cross-vertical thesis (B.9) predicts wins vertical SaaS at this point in history.

The prior architecture, abridged

Data plane: Satellite-fed message origination → broadcast via terrestrial FM transmitters → received at distributed endpoints. End-to-end vertical integration.
Protocol layer: RBDS Group 7A used to carry addressed, opaque payloads on the 57 kHz subcarrier. To an outside sniffer the bitstream is structured but meaningless — the codebook for PIN + service-code addressing lives in the receiver.
Edge logic: Receivers held the decode logic and automatically re-locked onto the next FM tower carrying their PIN if the current carrier dropped. Resilience was a property of the endpoint, not of central infrastructure.
Fail-over: Multi-source (satellite primary, multiple terrestrial transmitters secondary). No single point of failure between origination and endpoint.

What that translates to in the AI architecture

RBDS pattern	On-prem AI analog	Why it matters here
Vertical stack ownership (sat → transmitter → receiver)	Vertical stack ownership (appliance → model → output portal)	Our founder already knows how to ship the whole vertical. Most software founders only know the top of the stack.
Group 7A opaque addressing	Token/codebook layer at the customer edge	Same idea: meaning lives at the endpoint, not in the transport. Sniffers see structured noise.
Receiver-side logic + auto fail-over	Appliance-side routing logic + paired failover unit	The smarts live where the data lives. Central infrastructure does not need to be trusted.
Satellite + terrestrial redundancy	Local inference primary + optional cloud burst for non-sensitive secondary	Two independent paths, customer-controlled which is used per workload.
DoD/HS/county/fire procurement experience	Same procurement DNA: audit trails, chain-of-custody, named operator accountability	We do not have to learn this from scratch. Most of the AI-for-O&G field is not staffed for this conversation.

edge

The competitive read: Collide and Enverus are cloud-first companies trying to retrofit enterprise security. Lean Informatics can credibly start sovereignty-first because our founder has shipped it before, to harder customers, in a harder protocol environment. This is not a story a firm can fake.

B.2 The architecture

Customer-Owned Inference Appliance + Lean Informatics FDE Operation. Each customer gets a dedicated unit. No cross-customer data pooling. Lean Informatics holds no raw customer data on its own infrastructure at any time.

PLANE 01 · CUSTOMER PREMISES (TRUST DOMAIN)

The appliance — single GPU node + tokenization layer

2U or 4U server lives in customer's server room, on-site closet, or a customer-paid cage at a Tier-II colo of the customer's choosing. Holds: raw documents, tokenization codebook, embeddings, vector index, knowledge graph, agent runtime, audit log. This box is the trust boundary. No raw data leaves it.

↓

PLANE 02 · CONTROL CHANNEL (AUDITED, NARROW)

Outbound: telemetry, attestations, signed updates only

From appliance to Lean Informatics: signed health telemetry, model attestation hashes, log integrity proofs, and signed software-update receipts. No customer data, no embeddings, no document content. WireGuard tunnel with mutual TLS, customer-controlled kill switch.

↑

PLANE 03 · LEAN INFORMATICS HQ (NO CUSTOMER DATA)

Model lab + signed-update bus + FDE workstation

Houston-side: model fine-tuning against synthetic + customer-anonymized eval sets, signed model+config builds, FDE remote-ops workstation. Lean Informatics never holds a customer's raw documents. If subpoenaed, LI has nothing to produce. That is a feature, not an inconvenience.

Operating modes per customer

Air-gapped: Appliance has no network path to LI. FDE flies in monthly for updates via signed offline media. Highest sovereignty, slowest iteration.
DMZ / kill-switch: Outbound-only control channel, customer-controlled physical disconnect. Default mode for most customers.
Hybrid burst: Local inference primary, optional cloud burst (LLM API) for non-sensitive secondary workloads, gated by customer policy per workflow. Used only when customer explicitly opts in.
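The three modes reduce to a small routing rule on the appliance. The sketch below is one way to express it, not the shipped implementation: the names `Mode`, `Workflow`, and `route`, the `sensitive` and `cloud_opt_in` flags, and the choice to burst only when local capacity is saturated are all illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    AIR_GAPPED = "air_gapped"
    DMZ = "dmz"
    HYBRID_BURST = "hybrid_burst"


class Target(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


@dataclass(frozen=True)
class Workflow:
    name: str
    sensitive: bool      # customer-assigned classification (hypothetical flag)
    cloud_opt_in: bool   # explicit per-workflow customer consent (hypothetical flag)


def route(mode: Mode, wf: Workflow, local_busy: bool = False) -> Target:
    """Decide where inference for one workflow is allowed to run."""
    # Cloud burst requires all of: hybrid mode, a non-sensitive workflow,
    # explicit opt-in, and saturated local capacity (assumption of this sketch).
    if (mode is Mode.HYBRID_BURST and not wf.sensitive
            and wf.cloud_opt_in and local_busy):
        return Target.CLOUD
    # Every other combination stays on the appliance.
    return Target.LOCAL
```

The default branch is local inference, so a misconfigured or unknown workflow fails closed rather than leaking to the cloud path.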

B.3 Reference appliance BOM

One appliance generation per customer cohort. Designed to be unremarkable hardware that an enterprise IT department recognizes and a county sheriff's IT could maintain.

Component	Reference part	Notes
Chassis	Supermicro 4U GPU server (4124GS-TNR or similar)	Standard rack, dual PSU, IPMI
CPU	2× AMD EPYC 9354 (32C)	Headroom for tokenizer + retrieval + agent pipelines
GPU (option A)	1× NVIDIA L40S (48 GB)	Runs 30–70B quantized; ~$8–10K street
GPU (option B)	2× NVIDIA H100 PCIe (80 GB)	Heavy inference + light fine-tune; ~$50–60K street
RAM	512 GB DDR5 ECC	Knowledge-graph + vector index resident
Storage (data)	2× 7.68 TB NVMe (LUKS, mirror)	Self-encrypting, FIPS 140-3 SED preferred
Storage (OS)	2× 480 GB NVMe (mirror, encrypted)	Read-only mount post-boot
Network	Dual 10/25 GbE + IPMI	Out-of-band on isolated VLAN
Security	TPM 2.0, Secure Boot, IPMI-disabled-by-default	Measured boot, attestation chain
Key mgmt	YubiHSM 2 or external HSM (customer choice)	Document encryption keys never leave HSM
Failover unit	Identical spare in cold standby	Manual cutover; 30-min RTO. Optional.

Per-unit cost (Lean Informatics side)

BOM (Option A, L40S): ~$28–38K street, ~$22–30K negotiated direct.
BOM (Option B, dual H100): ~$95–120K street, ~$75–95K negotiated.
Burn-in + provisioning + customer-specific imaging: ~$2–4K of LI time per unit.
Shipping & installation logistics: ~$1–2K (white-glove freight, on-site time).
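To make the landed cost per unit explicit, the ranges above can simply be added. This is a rough sketch that assumes the three cost lines are independent and additive; the dictionary keys and the function name `landed_cost` are illustrative.

```python
# (low, high) ranges in thousands of USD, copied from the figures above.
BOM = {
    ("A", "street"): (28, 38),
    ("A", "negotiated"): (22, 30),
    ("B", "street"): (95, 120),
    ("B", "negotiated"): (75, 95),
}
PROVISIONING = (2, 4)   # burn-in, provisioning, customer-specific imaging
LOGISTICS = (1, 2)      # shipping and installation


def landed_cost(option: str, pricing: str) -> tuple:
    """Return the (low, high) all-in cost of one appliance in $K."""
    low, high = BOM[(option, pricing)]
    return (low + PROVISIONING[0] + LOGISTICS[0],
            high + PROVISIONING[1] + LOGISTICS[1])
```

On these inputs an Option A unit lands at roughly $25–36K at negotiated pricing, and an Option B unit at roughly $78–101K.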

B.4 Data plane & opacity layer

The opacity discipline from the RBDS days applied to AI:

Tokenization at ingest

Customer-identifying entities — well names, API numbers, lease IDs, operator names, person names — pass through a tokenization layer on the appliance before any model sees them.
The codebook lives on the appliance HSM. Lean Informatics never possesses it.
Even if a model output or log were exfiltrated, identifying entities are opaque without the local codebook. This is the same principle as RBDS Group 7A: structured data, meaningless without the receiver-side codebook.
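A minimal sketch of the tokenization idea, under stated assumptions: tokens are derived with HMAC-SHA256 from a secret that would live in the appliance HSM, and only API well numbers are detected, using a simplified 10-digit pattern. The class name `Codebook`, the token format, and the regex are illustrative, not the production design.

```python
import hashlib
import hmac
import re

# Simplified pattern for a 10-digit API well number (state-county-well).
API_NUMBER = re.compile(r"\b\d{2}-\d{3}-\d{5}\b")


class Codebook:
    """Maps identifying entities to opaque tokens. Stays on the appliance."""

    def __init__(self, secret: bytes):
        self._secret = secret      # in practice held by the HSM (assumption)
        self._reverse = {}         # token -> original entity

    def tokenize_entity(self, kind: str, value: str) -> str:
        digest = hmac.new(self._secret, f"{kind}:{value}".encode(),
                          hashlib.sha256).hexdigest()
        token = f"<{kind}_{digest[:12]}>"
        self._reverse[token] = value
        return token

    def tokenize_text(self, text: str) -> str:
        # Replace every detected entity before any model sees the text.
        return API_NUMBER.sub(
            lambda m: self.tokenize_entity("API", m.group()), text)

    def detokenize_text(self, text: str) -> str:
        # Only possible where the codebook lives, i.e. on the appliance.
        for token, value in self._reverse.items():
            text = text.replace(token, value)
        return text
```

Because the token is a keyed hash, the same well always maps to the same token within one appliance, so retrieval and joins still work on tokenized text, while a different appliance (different secret) produces unrelated tokens.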

What crosses to Lean Informatics (and what does not)

Data	Crosses?	Why
Customer documents (PDF, CSV, SCADA exports)	no	Stay on appliance
Embeddings of customer documents	no	Reversible — treated as sensitive
Knowledge-graph nodes/edges	no	Stay on appliance
Model weights (LI-provided)	yes, signed	Pushed from LI to appliance, attested
Health metrics (CPU, GPU, disk %)	yes	For SLA + remote diagnosis
Tokenized eval-set statistics (counts, accuracy on tokenized fixtures)	yes, with consent	Opt-in. Used to improve next model rev.
Raw error traces with content	no	Logs are redacted on appliance before any export
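The outbound rows of the table can be enforced as a default-deny allowlist at the control-channel boundary. The sketch below assumes hypothetical data-kind labels and a single boolean consent flag; model weights are omitted because they flow inbound, from LI to the appliance.

```python
from enum import Enum


class Rule(Enum):
    NEVER = "never"
    ALWAYS = "always"
    WITH_CONSENT = "with_consent"


# Outbound policy mirroring the table above (labels are illustrative).
EXPORT_POLICY = {
    "customer_document": Rule.NEVER,
    "embedding": Rule.NEVER,
    "knowledge_graph": Rule.NEVER,
    "raw_error_trace": Rule.NEVER,
    "health_metric": Rule.ALWAYS,
    "tokenized_eval_stats": Rule.WITH_CONSENT,
}


def may_export(kind: str, customer_consent: bool = False) -> bool:
    """Return True only if this kind of data may leave the appliance."""
    rule = EXPORT_POLICY.get(kind, Rule.NEVER)  # unknown kinds are denied
    if rule is Rule.ALWAYS:
        return True
    if rule is Rule.WITH_CONSENT:
        return customer_consent
    return False
```

Denying unknown kinds by default means a new data type added on the appliance cannot cross until someone deliberately adds it to the policy.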

B.5 Operational model (FDE-as-service)

The Forward-Deployed Engineer is no longer optional — it's the unit of value delivery.

Engagement phases

Scoping (Week 0–2): NDA, SOW, single workflow defined. Tabletop walkthrough with customer's IT, ops, and (when applicable) compliance.
Provisioning (Week 2–4): Appliance imaged at LI Houston bench. Customer-specific tokenization codebook generated and burned into HSM. Burn-in + integration tests against synthetic data.
Onsite deployment (Week 4–5): FDE flies in. Rack-and-stack, network handoff, customer-witnessed key ceremony, smoke tests.
Workflow onboarding (Week 5–8): FDE resident or daily-onsite. Real customer documents ingested onto appliance, never leaving. First production outputs delivered with customer sign-off on each.
Steady state (Month 3+): Remote operation via audited tunnel. Monthly onsite cadence. Quarterly key rotation. Annual physical security audit jointly with customer IT.

Audit posture from day one

Every action by LI's FDE on the appliance is logged with cryptographic chain-of-custody.
Customer's SIEM (if they have one) receives a live audit feed. If they don't, the appliance generates a signed weekly audit summary.
Customer holds the kill switch. Severing the control channel does not impair the appliance — it just freezes updates and remote support.
LI carries E&O + cyber liability insurance from day one (~$15–25K/yr at the appropriate coverage).
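One common way to get a tamper-evident chain of custody is a hash-chained log, where each entry commits to the one before it. The sketch below shows that technique only; it assumes hypothetical field names, omits timestamps, and does not sign entries, which a real deployment would likely add using an HSM-held key.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON so the same body always hashes to the same value.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: list, actor: str, action: str) -> None:
    """Append an entry that commits to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"seq": len(log), "actor": actor, "action": action, "prev": prev}
    log.append({**body, "hash": _digest(body)})


def verify(log: list) -> bool:
    """Recompute the chain; any edit, deletion, or reorder breaks it."""
    prev = GENESIS
    for i, entry in enumerate(log):
        body = {k: entry[k] for k in ("seq", "actor", "action", "prev")}
        if entry["seq"] != i or entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True
```

A customer auditor can run `verify` independently on the exported log, which is what makes the weekly audit summary checkable rather than merely asserted.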

DoD

The discipline above is roughly the same posture used in DoD CDS (cross-domain solutions) and state HS deployments. Our founder has shipped to this bar. Re-using the muscle memory is the actual moat, not the hardware.

B.6 Pricing & unit economics

Customer-facing menu

Tier	One-time	Recurring	What's included
Appliance Lite (L40S)	$25K onboarding	$5,500/mo	Single L40S unit, one workflow, monthly onsite, business-hours support
Appliance Pro (dual H100)	$45K onboarding	$11,000/mo	Dual H100, up to 3 workflows, bi-weekly onsite, 24/7 critical-issue line
Air-gap Compliance	$60K onboarding	$13,500/mo	Pro tier + paired failover unit + quarterly physical security audit + customer-witnessed key ceremonies
Custom (DoD/HS/agency style)	quoted	quoted	Statement of work, SCIF-compatible options, classification handling

Lean Informatics unit economics (per Appliance Pro customer)

Year-1 revenue: $45K onboarding + $132K recurring = $177K.
Year-1 cost of delivery:
- Hardware (amortized over 3 years): ~$28–32K/yr
- FDE time (founder, ~30% allocation): equivalent ~$45–60K/yr
- Travel (onsite cadence): ~$8–12K/yr
- Cyber insurance allocation: ~$3K/yr
- Tooling, monitoring, software allocation: ~$4K/yr
Y1 gross margin: ~37–50% on the line items above ($177K revenue less $88–111K cost of delivery). Projected to improve to ~65–70% in year 2 (no onboarding cost, hardware partially amortized, FDE allocation drops as ops becomes routine).
Three customers at Pro tier = ~$530K Y1 revenue (~$396K of it recurring), ~$1.0–1.2M run-rate by Y2 if onboarding spreads.
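The per-customer arithmetic can be reproduced directly from the itemized lines. This sketch uses only the five cost lines listed above and treats founder FDE time as a cost of delivery; the constant and function names are illustrative. Using only those lines, the Year-1 margin comes out between roughly 37% and 50%, and the FDE-time line is the dominant sensitivity.

```python
# All figures in thousands of USD, copied from the Appliance Pro lines above.
ONBOARDING_FEE = 45
MONTHLY_FEE = 11
COST_RANGES = {
    "hardware_amortized": (28, 32),
    "fde_time": (45, 60),
    "travel": (8, 12),
    "cyber_insurance": (3, 3),
    "tooling": (4, 4),
}


def year1_summary(customers: int = 1) -> dict:
    """Year-1 revenue, itemized cost range, and margin range."""
    recurring = 12 * MONTHLY_FEE * customers
    revenue = ONBOARDING_FEE * customers + recurring
    cost_low = sum(low for low, _ in COST_RANGES.values()) * customers
    cost_high = sum(high for _, high in COST_RANGES.values()) * customers
    return {
        "revenue": revenue,
        "recurring": recurring,
        "cost_range": (cost_low, cost_high),
        # Worst case uses the high cost estimate, best case the low one.
        "margin_range": (
            round((revenue - cost_high) / revenue, 3),
            round((revenue - cost_low) / revenue, 3),
        ),
    }
```

For three Pro customers, `year1_summary(3)` gives $531K of Year-1 revenue, of which $396K is recurring.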

limit

FDE cadence caps a solo founder at 3–4 active Pro accounts: at ~30% of founder time per account, three accounts already consume ~90% of capacity. If the on-prem play wins, we must hire a second FDE by customer 4. The plan is for that hire to be a former DoD/HS field engineer, not a generalist software engineer.

B.7 Wedge realignment under the on-prem model

The on-prem architecture is more expensive and slower to ship than the cloud wedge in §12. It only makes sense for workflows where data sovereignty is the buying trigger. That is not the Texas RRC filing wedge.

Two-track strategy

Track	Wedge	Architecture	Purpose
Track 1 — cash flow	Texas RRC W-10 / G-10 compliance for mid-size operators (§12)	Cloud-native (Claude + pgvector + Hetzner)	Public data, fast deploy, 30-day sales cycle. Funds the company.
Track 2 — defensibility	Confidential workflows: completion design optimization, geosteering, M&A diligence, lease portfolio strategy, subsurface modeling	Lean Informatics appliance (this addendum)	High-CV, sticky, true moat. Competes against Collide/Enverus on sovereignty, not features.

Recommended Track-2 first wedge candidates, ranked

Confidential M&A diligence for upstream transactions. Buyer-side data room ingestion, well-by-well economics, lease overlap, environmental liability surface. Tied to the BETA Land Services partnership in §A.4: they have managed 120+ corporate transactions since 2010. Sovereignty is non-negotiable because every party in a deal room is a competitor of the others.
Completion design optimization for Permian/Eagle Ford independents. Operator's frac design + offset performance + lateral spacing — the actual secret sauce. No operator will put this in Collide or Enverus.
Geosteering interpretation + subsurface modeling. Live interpretation during drilling operations. Latency-sensitive (favoring edge inference) and IP-sensitive (favoring on-prem).
Public-safety adjacencies. Wildland-urban-interface (WUI) fire risk to operator assets, hurricane evacuation logistics for offshore crews, emergency-management integration. Founder's prior network actually opens this category — not a stretch.

B.8 Verdict — when this architecture wins and when it loses

Wins

Where the customer's well/completion/lease data is treated as competitive IP. Mid-size Permian and Eagle Ford operators competing with majors. Yes.
Where the buyer's compliance/risk officer signs off, not just the engineer. The on-prem story wins that signoff.
Where our founder's DoD/HS/agency procurement experience is the differentiator. No vertical-AI competitor in O&G credibly has it.
Where the FM RBDS prior art demonstrates "this team has shipped sovereignty-first systems before." That story is unfakeable.

Loses

For Texas RRC filings, where the data is public anyway. Cloud wedge wins.
For any wedge where time-to-first-value < 30 days is the buying criterion. On-prem can't beat SaaS on speed.
For customers below ~$300K revenue from this single workflow. Onboarding cost amortizes wrong.
If Lean Informatics tries to run this and the cloud wedge at solo headcount without sequencing. Sequence Track-1 first; Track-2 starts after first paying Track-1 customer.

Decision criteria for committing to Track 2

One Track-1 customer live and reference-able (≈Month 4).
Two qualified Track-2 prospects with budget authority identified (target: one BETA-channel, one direct operator).
First appliance BOM purchased only after a signed LOI on the first Track-2 customer.

read

The honest call: Track 2 is the actual long-term business. Track 1 is the cash-flow bridge that buys the credibility to sell Track 2. Treat them as two products with two architectures, sold to overlapping customers, with our founder's DoD/HS background as the wedge that makes Track 2 credible in year one rather than year three.

B.9 The cross-vertical founder thesis

Collide's marketing leans on a specific implicit claim: that oil & gas software fails when built by outsiders, because outsiders automate workflows without understanding why those workflows exist. McLelland says this plainly on the company blog. It is a self-serving framing. It also doesn't survive contact with the venture data.

What the evidence actually says

The best vertical SaaS companies of the last fifteen years were largely built by cross-vertical founders:

Company	Vertical	Founder background
Veeva Systems	Pharma CRM	Peter Gassner — Salesforce, not pharma
Toast	Restaurant POS	Three founders — none were restaurant operators
Procore	Construction mgmt	Tooey Courtemanche — real estate, not construction
Carta	Cap tables / equity	Henry Ward — finance generalist, not equity admin
Stripe	Payments infra	Collison brothers — outsiders to payments
Snowflake	Data warehouse	Dageville, Cruanes (Oracle), Żukowski (Vectorwise) — outside the incumbent OLAP world
Datadog	Cloud monitoring	Pomel, Lê-Quôc — ex-Wireless Generation, an education company
Persefoni	Carbon accounting	Founders from finance, not climate science

What is — and isn't — truly vertical-specific

Compliance frameworks are universal. SOC 2 is SOC 2. GAAP is GAAP. SOX, audit posture, E&O insurance, evidence handling, chain-of-custody — portable across industries. A founder who has shipped to DoD doesn't need to relearn audit discipline because the customer says "petroleum" instead of "homeland security."
Cyclic economies share structure. Capex-intensive, M&A-driven, regulated industries on commodity cycles all behave the same way at the business-model layer. The shape of the cycle in upstream O&G is more dramatic than the FM/notification cycle, but the dynamics — cost-cutting in down years, capex unlocked in up years, M&A peaks at cycle bottoms — are recognizable.
Vernacular and culture are learnable. Six to twelve months of intentional customer discovery, time at industry events, and FDE onsite hours gets a startup veteran functionally fluent. This is empirically how most successful vertical SaaS founders learned their domain.
What insiders genuinely have: a head start on the first 5–10 customer conversations, faster intuition for "this won't work in the field" failure modes, and existing brand. None of these are insurmountable. All three can be addressed by partnering with an industry-veteran advisor and through the FDE motion.

The Lean Informatics reframe

Stated honestly, Lean Informatics' founder profile is:

Cross-vertical pattern recognition. Having shipped to DoD, state homeland security, county and sheriff agencies, and fire-warning networks — each with its own vernacular, procurement quirks, and political surface — is exactly the muscle that lets a founder enter a new vertical and avoid the rookie traps faster than someone who only knows one vertical.
Startup operational discipline. Going from zero to one is its own profession. Founders who have done it before do it faster on the second and third attempt regardless of vertical.
Vertical-integration experience. Building a full stack from satellite through transmitter to receiver firmware is a rare skill set in the AI-for-O&G field. Most competitors stop at the application layer.
Autodidactic learning posture. The single trait that most reliably predicts cross-vertical success. Learning a new vocabulary, reading the trade publications, attending the conferences, sitting on the wellsite — this is a 6–12 month effort, not a 6–12 year one.

The honest read on McLelland's framing: it's true that outsiders who skip customer discovery build operationally useless tools. It's not true that outsiders who do the work can't compete. The history of vertical SaaS is mostly outsiders who did the work.

posture

Operative implication: Lean Informatics doesn't compete by pretending to be an oil-patch native. It competes by showing up with DoD-grade ops discipline, vertical-stack engineering experience, and a willingness to sit with the operator until the workflow works. That's a different mode than Collide is running, and it's defensible precisely because most competitors can't do the first two.

C. Lean Informatics — vision & business plan

Sections 1–15 plus Addenda A–B were the sector research. Section C is the operating plan that follows from it. Lean Informatics is the protagonist. Collide is a peer in the sector. Enverus is the cloud incumbent. The plan is bootstrap-first, founder-led, and aims at sector dominance on a multi-decade timeline.

The Epic Systems play — the disciplines, not the deployment topology

The strategic posture is Epic Systems for upstream oil & gas. Judy Faulkner founded Epic in 1979 with $70K in Madison, Wisconsin. Refused venture capital. Refused to go public. Ran 45 years of bootstrap growth to roughly $5B in annual revenue. Owns ~40% of US hospital EHR. Mayo Clinic runs on Epic. Kaiser runs on Epic. Cleveland Clinic runs on Epic. Once an institution deploys Epic, the lock-in is decades. That is the model.

What we copy from Epic is the operating discipline, not the deployment topology. Epic itself has evolved to a hybrid posture — Hyperdrive on Microsoft Azure, Azure Virtual Desktop, Azure Large Instances — without abandoning what made Epic Epic: bootstrap, implementers before salespeople, customer-deep workflows, refuse easy SaaS-ification, refuse early acquisition, annual user summit, geographic concentration, decades-long customer lock-in. That set of disciplines is what made Epic dominant. The on-prem-only piece was an artifact of the 1979 starting point, not the source of the moat. Lean Informatics is built around the disciplines; the deployment topology is whatever the customer prefers.

Lean Informatics will not match Epic's $5B revenue in 24 months. No one does. The 24-month $120–180M target is a milestone on a multi-decade arc. What we copy from Epic is the structural posture, not the financial trajectory.

What Epic owns and how (the playbook)

Epic pattern	How it works	Lean Informatics analog
Customer-centric on data location	Epic originally ran in hospital data centers. As of 2025 Epic also runs on Microsoft Azure (Hyperdrive on AVD, Azure Large Instances) — customer chooses. ~15% of Epic sessions are Hyperdrive on Azure as of May 2025.	Customer dial: on-prem appliance (Addendum B), customer-owned colo, hyperscaler tenancy, or hybrid. Same SOC 2 / ISO 27001 standards every topology. LI’s identity is the FDE relationship, not the box.
Long, deep implementations	12–24 month deployments by Epic-employed implementers ("Implementation Services"). Customers pay millions over years.	The FDE motion. Founder-led at first, then a small army of former DoD/HS field engineers. The implementation is the product.
Workflow depth, not breadth	Epic touches every clinical and billing workflow. Once it owns the workflow, the data, and the user training, displacement requires re-running the entire workflow.	Track 1 owns the RRC filing workflow. Track 2 owns the confidential-IP workflows. Together, they own the operator's day.
Institutional anchor customers	Mayo, Kaiser, Johns Hopkins, Cleveland Clinic. Their reputations carry the brand.	Land Diamondback, EOG, or a supermajor as a reference account (Pioneer now sits inside ExxonMobil). One halo customer per year.
Bootstrap, private, employee-owned	Faulkner refused VC and IPO. Profits stay in the company. Decisions stay with the engineers.	Bootstrap as long as possible. Take outside capital only when the alternative is losing a strategic window. Retain control.
Geographic concentration	Madison/Verona, Wisconsin. All Epic developers in one place. Culture is the asset.	Houston. All Lean Informatics FDEs and engineers based here. Drive distance to the customer base. Culture matters.
Annual user group meeting	Epic UGM at the Verona campus — 25,000+ attendees, legendary in healthcare. Builds the community as a moat.	Launch the Lean Informatics annual summit by year 3. Operator IT directors, compliance leads, FDE alumni. Houston venue.
Refuse the easy SaaS-ification	Epic stayed enterprise, deep, expensive, and slow to release — even after going hybrid on infrastructure. The discipline didn't change; the substrate did. Held the line against SaaS competitors who looked faster but couldn't go deep.	Don't pivot to a self-serve / freemium model when pressure comes. The deep FDE-embedded services relationship is the moat. The underlying infrastructure can move with the customer.
Long sales cycles are the moat	Epic deployments take 18 months. That barrier keeps faster competitors out and locks customers in for decades.	Same logic applies. A 6-month FDE-led deployment is a feature, not a bug. It selects for customers who will stay for 10 years.

What this changes about the 24-month plan

Customer selection becomes prestige-aware. Every Track-2 customer in years 1–2 should be a name that future customers will recognize. One Diamondback or EOG is worth ten anonymous mid-size operators in brand terms.
Don't apologize for being expensive. Epic doesn't. The pricing menu is correct. Hold the floor.
Hire the implementers first, not the salespeople. Epic's army of implementers (Forward-Deployed Engineers, in modern parlance) is what makes the deployments stick. The same hire ranking applies.
Geographic concentration is a feature, not a constraint. Houston-based FDEs serving Houston-clustered operators is faster, cheaper, and more resilient than a distributed workforce.
Long-term thinking over short-term ARR. The 24-month target is a milestone. If hitting it requires sacrificing depth, customer fit, or post-sale trust — don't.
Refuse the wrong investors. If raising, take capital that's patient with the Epic-style timeline. Founder-friendly seeds. Strategic capital from energy LPs. Not growth equity that demands SaaS multiples and short payback.

Mission & positioning vs. the field

Mission: build the FDE-led services company for confidential industrial AI workflows, starting in upstream oil & gas, on the Epic Systems operating pattern — customer-centric on data location, deep on workflow ownership.

Competitor	Their angle	Lean Informatics' counter-angle
Enverus ONE	Cloud-only on their tenancy. Governed AI, SOC 2 Type II, 25-year data heritage. No customer choice on substrate.	Meet the customer where the customer wants to be: their colo, their hyperscaler, our recommended on-prem appliance, or hybrid. Same security standards either way. Win on FDE depth and pricing discipline.
Collide	Vertical-AI startup, RIGGS LLM, FDE motion, Digital Wildcatters distribution. Cloud-first.	Compete in adjacent positioning — operator IT directors, sovereignty-sensitive customers, agency-adjacent customers. Offer deployment-topology choice they don't. Not a head-on Wildcatters-community fight.
Palantir Foundry	Enterprise FDE incumbent, government heritage, full-stack platform.	Smaller, faster, operator-shaped pricing. Their floor is $1M; ours starts at $5.5K/mo. Same FDE DNA, lower altitude.
Big Consulting (Accenture, McKinsey)	Strategic advisory + delivery. No proprietary AI stack.	We ship a product, not slides. Foundation-model + workflow + FDE bundled. Customer can host wherever they want.
Novi Labs / Quorum / C3.ai	Specialized point solutions or large legacy platforms.	Workflow-specific, services-led, infrastructure-agnostic. Deployment-topology choice is a feature none of them offer.

What we sell

Track 1 — cloud SaaS: Texas RRC compliance + production reconciliation + JSAs for mid-size operators. $1.5–4.5K/mo.
Track 2 — on-prem appliance: confidential workflow AI (M&A diligence, completion design, geosteering, reservoir analytics). $5.5–13.5K/mo + onboarding.
Track 3 — services: Forward-Deployed Engineer engagements for high-stakes accounts. $15–50K/mo retainers.
Track 4 — agency cross-sell: emergency-management AI for DoD / state HS / county / fire-warning customers, leveraging our founder's prior network. Multi-year contracts $1–10M.
Track 5 — Always-On Upstream Intelligence (AFK): continuous monitoring agents watching state portals, BLM/BOEM feeds, and acreage-overlap signals against the client's portfolio. Alerts into Slack/Teams within minutes of source. $3–8K/mo on top of an existing engagement or as a standalone wedge. Ships after an underlying workflow proves Level 3 tokenomics (see Operating Model below), not before.

The agentic engineering operating model — the “how” beneath the “what”

The five tracks above describe what we sell. What follows describes how we deliver it: an operating model built on the five disciplines the industry now calls agentic engineering. We have been running this way since the first engagement was scoped; the language is new, the practice is not.

Pillar 01 · Agent Harness

“Whoever controls the harness controls your results.”

One base factory, many per-client harnesses. The Permian operator's harness is not the Bakken A&D shop's harness. Specialization is the moat. We own the harness end-to-end — model stack, adapters, doc pipeline, integrations — rather than renting it from a portal vendor.

Pillar 02 · Software Factory

“Build the system that builds the system.”

Our FDEs don't deliver engagements one at a time. They instantiate engagements on a shared factory: synthetic data factory, reusable state adapters, pre-trained extractors, integration templates. One engineer multiplexes parallel workstreams — the throughput indicator is outputs-per-FDE-hour, not hours-on-keyboard.

Pillar 03 · Extensible Software

“Open to extension, closed to modification.”

Models change quarterly. Tools change monthly. Workflows change with every M&A event. Every layer of our stack is pluggable and swappable. The cost of absorbing a new state portal, doc type, model class, or client workflow is bounded by design — which is what makes the unit economics hold at engagement five and ten, not just engagement one.

Pillar 04 · Always-On Agents (AFK)

“Useful tokens first. Then turn them on.”

Track 5 above. Continuous monitors against state portals, acreage overlap, M&A signals. Recurring revenue on the same factory once the underlying workflow proves Level 3 tokenomics. The KPI is alert-to-decision ratio, not API spend.

Pillar 05 · Agentic Access

“Agents only command what they can reach.”

Our integrations are built for human-facing tools today — Petrel, Spotfire, ArcGIS, Excel, well master DBs. The same surface is callable by the operator's own agents tomorrow over MCP, REST, direct DB. When clients run their own agents (by end of 2026), our layer is already agent-ready. Portal incumbents will not be.

Knowledge compounding & workflow capture — the moat in one sentence

The longer we operate inside a client, the more of their workflow we capture; the more we capture, the more necessary we become; the more necessary we become, the more durably the engagement-grade margin holds against the commoditization pressure that hits every services firm in years three through five.

Three things compound simultaneously every month an engagement is live:

Knowledge compounds. Corrections corpus grows. Per-client LoRA sharpens on this operator's document idiosyncrasies. FDE institutional understanding of this well master, these geologists, these lease quirks — none of it replicable by a successor vendor in under twelve months.
Workflow capture compounds. Engagement 1 is the wedge (typically RRC filings or a targeted A&D pull). Engagement 2 is an adjacency (AFE, monthly PR, completion review). Engagement 3 is the always-on monitor. Engagement 4 is the well-master sync. By engagement 5, we operate the upstream data layer end-to-end. Each capture deepens the integration surface and shortens the next sale cycle.
Necessity compounds. A vendor who delivers a report can be replaced at renewal with a competing report. A services partner whose AFK monitor, well-master sync, Petrel pipeline, and weekly-filing watch are all running against production decision-making cannot. Removing us becomes a project, not a procurement decision.

Tokenomics — how the unit economics actually work

Most AI services firms operate at what the industry calls Level 1 tokenomics: buy more tokens, run them through agents, hope value emerges. The bill goes up, the value doesn't. Lean Informatics is engineered to operate at Level 3 from day one.

Level 1 — use more tokens. The floor. Anyone can do this with a cron job. Generates spend, not value. Where most enterprise AI pilots quietly die in year two.
Level 2 — make the tokens useful. Spend tokens on problems an operator currently pays a human to solve. Every token we plan against has a known dollar-cost-of-the-human-alternative (a tracer pass at $40 of analyst time, a filing classification at a landman's morning).
Level 3 — capture the revenue. Buy commodity inference cheap on local hardware (75–85% of token volume on a single L40S or M5 Ultra), buy frontier inference sparingly via API for genuinely hard judgment (15–25%), sell the integrated output as an engagement-grade deliverable at services pricing. The spread is the 88–92% gross margin number that anchors the per-customer math.

The lock-in described above is what protects the Level 3 arbitrage from competitive price compression. Without lock-in, the spread closes the moment a credible competitor lands. With it, the spread holds across renewal cycles.
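
As a rough illustration of the Level 3 spread, the sketch below blends a local-inference cost and a frontier-API cost by token share, then compares the delivery cost to the engagement price. Only the 75–85% / 15–25% token split and the $5.5K/mo entry price come from this plan. Every other dollar figure, the token volume, and the function names are hypothetical placeholders chosen for illustration, so the printed margins say nothing about whether the 88–92% figure holds.

```python
def blended_cost_per_mtok(local_share, local_cost, api_cost):
    """Token-weighted inference cost per million tokens (USD)."""
    if not 0.0 <= local_share <= 1.0:
        raise ValueError("local_share must be between 0 and 1")
    return local_share * local_cost + (1.0 - local_share) * api_cost


def gross_margin(price, cost_of_delivery):
    """Gross margin as a fraction of price."""
    if price <= 0:
        raise ValueError("price must be positive")
    return (price - cost_of_delivery) / price


# Hypothetical placeholders, NOT figures from the plan.
LOCAL_COST_PER_MTOK = 0.50   # amortized local hardware and power
API_COST_PER_MTOK = 15.00    # frontier API inference
MTOK_PER_MONTH = 100         # monthly token volume for one engagement
OTHER_MONTHLY_COGS = 200.00  # hosting, support, and similar delivery costs

# From the pricing menu: Track 2 entry price per month.
MONTHLY_PRICE = 5_500

if __name__ == "__main__":
    for local_share in (0.75, 0.85):  # token split stated for Level 3
        per_mtok = blended_cost_per_mtok(
            local_share, LOCAL_COST_PER_MTOK, API_COST_PER_MTOK
        )
        cost = per_mtok * MTOK_PER_MONTH + OTHER_MONTHLY_COGS
        margin = gross_margin(MONTHLY_PRICE, cost)
        print(f"local share {local_share:.0%}: ${per_mtok:.2f}/Mtok, margin {margin:.1%}")
```

The useful part is the shape of the calculation: the margin is most sensitive to the API share and to any delivery cost (FDE time in particular) that is left out of the cost side.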

C.1 The 24-month target & honest math

$10–15M/month total revenue, 24 months, bootstrap. That equals $120–180M annualized. The math has to work somehow. The table below shows it doesn't work via direct sales alone.

Avg deal size (ARR)	Deals needed for $150M	Feasibility at solo + bootstrap
$50K	3,000	impossible
$200K	750	impossible
$500K	300	requires 100+ FTE sales org
$2M	75	requires channel partner doing volume
$5M anchor + $500K small	10 anchors + 60 small	possible with right anchor + channel + ~5 FTE
$20M agency contract + small	2–3 agency + 40 small	possible if gov network closes

The conclusion the math forces: $120–180M in 24 months bootstrap requires one or more of — an anchor enterprise/gov deal ($10–30M ARR each), a high-velocity partner channel (BETA-style, 100+ small deals through their existing customer base), or a productized SKU sold through resellers. Section C.3 maps the four leverage paths.
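
The deal-count arithmetic behind the table can be reproduced with a few lines. This is a minimal sketch that takes $150M, the midpoint of the $120–180M annualized target, as the goal. The helper names are illustrative, not part of any existing tooling.

```python
import math

TARGET_ARR = 150_000_000  # midpoint of the $120–180M annualized target


def deals_needed(target_arr, avg_deal_arr):
    """Number of deals of a given average ARR needed to reach the target."""
    return math.ceil(target_arr / avg_deal_arr)


def mix_arr(deals):
    """Total ARR for a blended mix given as (count, arr_per_deal) pairs."""
    return sum(count * arr for count, arr in deals)


if __name__ == "__main__":
    for avg in (50_000, 200_000, 500_000, 2_000_000):
        print(f"${avg:,} average deal: {deals_needed(TARGET_ARR, avg):,} deals")
```

The same pattern extends to the blended rows: mix_arr sums count times deal size for each tier, so any anchor-plus-small combination can be compared against the target before it goes into a plan.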

A target this aggressive should be lowered, lengthened, or matched with a willingness to raise capital in months 6–12 if the leverage paths require it. The plan below optimizes for optionality: we build to be ready for the target without burning cash if it doesn't materialize. Critically, the Epic Systems posture means we would rather hit $50M ARR with a 10-year compounding lock-in than $150M ARR with shallow customers.

C.2 Base case — solo bootstrap without leverage paths

What happens if Lean Informatics ships Track 1 and Track 2 cleanly but none of the leverage paths fire. This is the floor.

Month	Track 1 customers	Track 2 customers	Track 3 retainers	Run-rate ARR
3	1 (pilot)	0	0	$30K
6	2	0	1	$240K
9	4	0	1	$400K
12	6	1	1	$700K
15	9	1	2	$1.1M
18	12	2	2	$1.8M
21	16	3	3	$2.8M
24	20	4	3	$3.8M
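
One way to sanity-check a row of this table is to bound its run-rate ARR with the monthly price bands from the pricing menu. The sketch below assumes recurring fees only: onboarding fees, Track 4 contracts, and Track 5 add-ons are left out, which is a simplification. The function and dictionary names are illustrative.

```python
# Monthly price bands (USD) taken from the "What we sell" menu.
PRICE_BANDS = {
    "track1": (1_500, 4_500),
    "track2": (5_500, 13_500),
    "track3": (15_000, 50_000),
}


def arr_envelope(track1, track2, track3):
    """Return (low, high) annual recurring revenue for a set of customer counts."""
    counts = {"track1": track1, "track2": track2, "track3": track3}
    low = 12 * sum(n * PRICE_BANDS[t][0] for t, n in counts.items())
    high = 12 * sum(n * PRICE_BANDS[t][1] for t, n in counts.items())
    return low, high


if __name__ == "__main__":
    # (month, Track 1 customers, Track 2 customers, Track 3 retainers)
    rows = [(3, 1, 0, 0), (12, 6, 1, 1), (24, 20, 4, 3)]
    for month, t1, t2, t3 in rows:
        low, high = arr_envelope(t1, t2, t3)
        print(f"month {month}: ${low:,} to ${high:,}")
```

A run-rate that sits near or above the upper bound implies top-of-menu pricing or revenue from the items excluded here.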

Solo bootstrap floor: $3–5M ARR by month 24. Roughly $250K–420K/month. That's 2–4% of the stated $120–180M target. The gap is real and the rest of the plan is about closing it.

Notable: this floor is still a credible high-growth bootstrap result and a real business. From an Epic Systems lens, this is the equivalent of Faulkner's first 3–5 years — small revenue, deep customers, building the foundation. Don't mistake it for failure.

C.3 The four leverage paths

Each path is independent. Each draws on something our founder already has. The plan opens all four in parallel and gates further investment on leading indicators.

Path A — BETA Land Services partner channel

Mechanism	White-label or revenue-share into BETA's 4.3M-acre customer base. LI's appliance becomes the digital throughput multiplier inside BETA's existing FDE motion.
Revenue model	20–30% rev share on pull-through transactions, or per-acre / per-deal fee.
Upside	If 10% of BETA's annual transactions pull LI tools, that's plausibly $20–50M/yr to LI.
Timeline	3–9 months to first pilot, 12–18 months to material rev share.
Pre-conditions	Working Track-2 appliance demo, BETA pilot agreement, channel contract with exclusivity language.
Risk	BETA builds internally instead of partnering; takes longer than expected.
Leverage source	BETA's appetite for AI tooling; Hanks's 41 years of FDE-business operating muscle.

Path B — DoD / state HS / county agency contract

Mechanism	Reposition the on-prem appliance + RBDS-era operating discipline for agency emergency management. WUI fire risk, agency-side AI for emergency notification, sheriff incident triage, county hazard modeling.
Revenue model	Multi-year contracts at $1–10M ACV. Annual recurring + services.
Upside	1–2 agency contracts in 24 months = $5–30M revenue.
Timeline	12–18 months to first contract dollar (gov procurement is slow).
Pre-conditions	Small Business contracting registration (SAM.gov, UEI), partner with existing GSA-schedule prime, optional SBIR/STTR grant for non-dilutive bridge.
Risk	Gov procurement timing; scope shifts during contracting.
Leverage source	Founder's DoD / state HS / county / sheriff / fire-warning customer base. This door is open only because the prior company existed.

Path C — Anchor enterprise (T2 operator or supermajor)

Mechanism	One operator at $5–15M ARR via Track-2 multi-workflow deployment plus heavy Track-3 services. Target list: Diamondback, Pioneer, Devon, EOG, Continental, Plains, Targa, Enbridge. This is the "Mayo Clinic of operators" play.
Revenue model	3-year enterprise contract, $500K–1.5M onboarding, $5–15M annual.
Upside	Single anchor moves the needle materially AND becomes the brand-halo reference.
Timeline	9–18 months to close, 6 months to deploy.
Pre-conditions	SOC 2 Type II (required), 1–2 reference customers, exec-level intro.
Risk	Single point of failure; one customer dependence; slow procurement.
Leverage source	Allen Gilmer's network if a warm intro materializes; Houston operator network broadly; Wildcatters community indirectly.

Path D — Productized appliance + reseller program

Mechanism	Package the Track-2 appliance as a SKU. Landmen, O&G consultants, regional IT integrators, and BETA-like service firms resell it under a margin split.
Revenue model	30–50% margin to reseller, 50–70% to LI. Per-unit revenue $50–100K + recurring.
Upside	50–200 units/yr at $200K blended = $10–40M revenue.
Timeline	12–18 months to launch the program; 18–24 months to material volume.
Pre-conditions	Stable Track-2 product, reseller training program, channel agreements, support tier infrastructure.
Risk	Channel conflict with direct sales; support burden scales with units.
Leverage source	The appliance product itself + LI's hardware/firmware experience from the RBDS era.

Most plausible combination for the stated target: A (BETA) + B (one gov contract) + C (one anchor enterprise). Per the path tables, A alone could plausibly deliver $20–50M, B alone $5–30M, and C alone $5–15M plus halo. Summed with the $3–5M direct-sales floor, that puts roughly $33–100M ARR in reach by month 24 if all three fire. Adding D in year 2 ($10–40M) is what brings the low end of the $120–180M target into view. The Epic-style discipline: don't chase D until A, B, and C are stable.
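
The combination arithmetic can be checked by summing the upside rows of the four path tables. The sketch below treats the upside ranges as additive and simultaneous, and treats Path B's 24-month revenue as if it were annual. Both are optimistic simplifications, and the names are illustrative.

```python
# Upside ranges in $M, copied from the "Upside" row of each path table.
PATH_UPSIDE = {
    "A": (20, 50),  # BETA partner channel
    "B": (5, 30),   # agency contracts
    "C": (5, 15),   # anchor enterprise
    "D": (10, 40),  # reseller program
}
DIRECT_FLOOR = (3, 5)  # base-case direct sales, $M ARR by month 24


def combined_range(path_keys, include_direct=True):
    """Sum the (low, high) upside of the chosen paths, in $M."""
    low = sum(PATH_UPSIDE[k][0] for k in path_keys)
    high = sum(PATH_UPSIDE[k][1] for k in path_keys)
    if include_direct:
        low += DIRECT_FLOOR[0]
        high += DIRECT_FLOOR[1]
    return low, high


if __name__ == "__main__":
    for combo in ("A", "AB", "ABC", "ABCD"):
        low, high = combined_range(combo)
        print(f"paths {combo}: ${low}M to ${high}M")
```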

C.4 Org & hiring schedule

Bootstrap discipline: hire only when revenue justifies it. Hire implementers (FDEs) before salespeople, on the Epic pattern. Cap at <15 FTE until $10M ARR.

Trigger	Hire	Profile	Comp range
Customer #1 live (~Month 3)	Fractional ops admin (10–15 hr/wk)	Houston-based, contracts/AP/customer onboarding	$2–4K/mo
Customer #4 or BETA pilot signed	Second FDE (full-time)	Former DoD/HS field engineer or O&G ops engineer	$180–220K base + equity
$2M ARR or Track-2 customer #2	Third FDE + 1 full-stack engineer	Engineer profile: pipelines/data/agents	$150–200K each
$5M ARR or Path A activation	Channel/sales lead	Industry vet from BETA, Enverus, Wildcatters network; closer not BDR	$180–250K base + commission + equity
$10M ARR or Path B contract	Compliance/contracts/SOC officer + 2 more engineers + 1 more FDE	SOC 2 / FedRAMP / agency-procurement-literate	$200K+ each

By month 24 in the stretch scenario: ~12–18 FTE. By month 24 in the base case: ~3–5 FTE. Compare to Epic's first 5 years: Faulkner kept the company under 10 people for the first half-decade. Discipline is the asset.

C.5 Sales motion

Year 1 (founder-led, Epic-style)

First 5–10 customers: founder closes 100%. Mostly Houston / Texas mid-size operators via Wildcatters community, Houston energy network, and friend-of-friend warm intros. Treat each customer like a future Mayo Clinic reference — depth, not speed.
Conference presence: NAPE (Houston, Feb), CERAWeek (Houston, March), Fuze (Houston, Oct), AAPL Houston meetings. Speaking slots wherever offered. The RBDS-to-AI architecture story is a real talk title.
Inbound content: LinkedIn writing on (a) on-prem vs. cloud trade-offs for operator IP, (b) FDE economics, (c) RBDS-era prior art applied to AI sovereignty. ~2 substantive posts/week. Goal: become recognizable to operator CIOs and Wildcatters community by month 9.
Outbound: Warm intros via Digital Wildcatters, OGAC, Houston energy meetups, BETA-channel referrals.

Year 2 (channels open)

BETA channel motion: co-sell, joint case studies, white-labeled materials. Hanks's name on the joint pitch deck. Quarterly business reviews.
Gov procurement track: separate sales cycle. Founder-led but with prime contractor partner. SBIR application as a non-dilutive bridge.
Direct enterprise: sales lead hired Month 15+ owns supermajor and T2 conversations. Founder remains the executive sponsor on top accounts.
Reseller program: soft launch Month 18 with 3–5 hand-picked reseller partners (one of which is BETA).

Year 3+ (the Epic UGM equivalent)

Annual Lean Informatics Summit in Houston. Operator IT directors, compliance leads, FDE alumni. Single track. Invitation-only first 2 years. This becomes the cultural moat — the "Epic UGM" for upstream.
Customer Advisory Board: 6–8 anchor customers meeting quarterly. Influence the roadmap. Loyalty through inclusion.

C.6 Product roadmap (24 months)

Months	Build	Why now
1–3	Track 1 v1 — RRC W-10 / G-10 automation	The wedge. Public data, fast deploy, $5K pilots within 30 days.
3–6	Track 1 v2 — production reconciliation, JSAs, lease term extraction	Per-customer ARR expansion. Same customers, more workflows. Epic-style: own the day.
6–9	Track 1 v3 — workover reports, ESP RCA via OEM data integration	Engineering credibility. Moves LI from filing automation to operational decision support.
6–12	Track 2 v1 — M&A diligence appliance (BETA pilot)	Opens Path A. M&A diligence has the cleanest sovereignty case.
9–12	Petroleum-domain LLM fine-tune (Qwen 14B / Hermes 14B base)	RIGGS-equivalent. Target: 55–65% on SPE PE exam subset.
12–18	Track 2 v2 — completion design optimization for Permian operators	Highest customer-IP-sensitivity workflow. Pure on-prem play.
12–18	Track 4 v1 — agency emergency management cross-sell (WUI fire risk, county hazard modeling)	Opens Path B. Leverages founder's prior network.
18–24	Track 2 v3 — geosteering + reservoir analytics	Live-during-drilling latency play. Edge inference advantage.
18–24	Reseller program launch	Opens Path D after Track 2 stabilized.

C.7 Financial model

Three scenarios, ARR run-rate

Month	Conservative bootstrap	Base + 1 leverage path	Stretch + 2–3 paths
6	$240K	$500K	$1.2M
12	$700K	$3M	$12M
18	$1.8M	$12M	$45M
24	$3.8M	$25–35M	$100–180M

Cost structure

Line	Year 1 ($)	Year 2 ($)	Notes
Founder draw	$100–150K	$180–250K	Houston cost of living, family-first scheduling
FDE / engineering hires	$200–400K	$700K–1.8M	1–2 in Y1, 3–7 in Y2 depending on scenario
Hardware (appliances on inventory)	$50–200K	$300K–1.2M	Tied to Track-2 customer pipeline; lease-back option
Cloud / API costs	$15–40K	$60–200K	Anthropic + Hetzner + observability
SOC 2 readiness + audit	$50–80K	$30–50K	Vanta + auditor. Type I Y1, Type II Y2.
E&O + cyber insurance	$15–25K	$30–50K	Material from day one
Legal, accounting, tooling	$30–50K	$60–100K	Texas LLC or Delaware C-corp depending on raise posture
Travel + conferences	$30–50K	$60–100K	Critical for customer-facing FDE motion
Year total burn	$490K–995K	$1.4–3.75M	Y2 scales with scenario
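
The totals row can be reproduced by summing the low and high ends of each cost line. A minimal sketch, with amounts in $K copied from the table and a dictionary layout chosen only for illustration:

```python
# Each line: ((year1_low, year1_high), (year2_low, year2_high)), in $K.
COST_LINES_K = {
    "Founder draw": ((100, 150), (180, 250)),
    "FDE / engineering hires": ((200, 400), (700, 1_800)),
    "Hardware": ((50, 200), (300, 1_200)),
    "Cloud / API costs": ((15, 40), (60, 200)),
    "SOC 2 readiness + audit": ((50, 80), (30, 50)),
    "E&O + cyber insurance": ((15, 25), (30, 50)),
    "Legal, accounting, tooling": ((30, 50), (60, 100)),
    "Travel + conferences": ((30, 50), (60, 100)),
}


def total_burn_k(year):
    """Return (low, high) total burn in $K for year 1 or year 2."""
    if year not in (1, 2):
        raise ValueError("year must be 1 or 2")
    idx = year - 1
    low = sum(line[idx][0] for line in COST_LINES_K.values())
    high = sum(line[idx][1] for line in COST_LINES_K.values())
    return low, high


if __name__ == "__main__":
    for year in (1, 2):
        low, high = total_burn_k(year)
        print(f"Year {year}: ${low:,}K to ${high:,}K")
```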

Break-even (monthly): Conservative ~Month 18–20. Base + 1 path ~Month 9–12. Stretch ~Month 4–6.

Cash on hand requirement: ~$200–400K personal / savings to cover first 6–9 months before revenue covers burn (conservative case).

C.8 Risk register — plan-specific

Risk	Severity	Mitigation
Gov procurement timing > 18 months	high	Start SBIR/STTR pipeline Month 1. Partner with GSA-schedule prime by Month 6.
BETA builds internally instead of partnering	medium-high	Lock channel agreement with exclusivity clauses. Bring working demo to first conversation.
Anchor enterprise deal slips or kills momentum	high	Never depend on a single anchor. Diversify by Month 12.
Founder burnout	high	Cap weekly hours, mandatory PTO, family-first scheduling. Second FDE hire is for resilience, not just capacity.
Hallucinated filing causes regulatory issue	existential	Human-in-loop on every submission. E&O insurance Day 1. No auto-submit ever.
Cyber incident on customer appliance	existential	Insurance, audit posture, hot spare, customer SIEM integration. Tabletop exercise quarterly.
Pressure to SaaS-ify / commoditize	high — Epic-relevant	Hold the line on the appliance / FDE / sovereignty model. Don't follow Enverus into cloud-only.
Enverus drops sub-$1K tier	medium	Don't compete on price. Compete on sovereignty + FDE motion.
Hardware supply chain disruption	medium	Maintain 3–6 month BOM inventory. Dual-source GPUs (L40S + H100 paths).
Channel conflict (reseller vs. direct)	medium	Strict territory and account rules in channel agreements.
Acquisition offer mid-arc	good problem	The Epic answer is "no." Decline unless valuation reflects 10-year lock-in moat.

C.9 Trigger conditions — when to raise, slow, or pivot

When to raise capital (against the Epic instinct)

$1–3M angel — if Path A or B requires capital to capture a closing window (a BETA exclusive deal, a specific gov RFP).
$3–5M seed — if anchor enterprise deal requires SOC 2 Type II completion within 6 months, or if FDE hiring needs to outpace customer growth to win an account.
$10–15M Series A — if two leverage paths are firing simultaneously and the constraint is execution capacity. Target investors: Mercury Fund (Collide's lead, ironically the most relevant), Energy Innovation Capital, S2G Ventures, EIV Capital, plus a strategic from the Allen Gilmer / OGAC network.
Default position: don't raise. Faulkner didn't. Bootstrap discipline pays decade dividends.

When to slow down (hold hiring, focus on retention)

Track 1 customer count < 4 by Month 9.
Track 2 first appliance customer not deployed by Month 12.
Burn-to-ARR ratio above 3.0.
Customer churn above 10% annualized in the first 12 customers.
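
The four slow-down conditions are mechanical enough to check in a monthly review script. The sketch below assumes burn-to-ARR means trailing-twelve-month burn divided by current run-rate ARR, which the plan does not define, and it applies the churn test to whatever churn figure is passed in. All names are illustrative.

```python
BURN_TO_ARR_LIMIT = 3.0
CHURN_LIMIT = 0.10  # 10% annualized


def slow_down_triggers(month, track1_customers, track2_deployed,
                       trailing_annual_burn, arr, annualized_churn):
    """Return the list of slow-down conditions that are currently tripped."""
    triggers = []
    if month >= 9 and track1_customers < 4:
        triggers.append("track1_customers_below_4_by_month_9")
    if month >= 12 and not track2_deployed:
        triggers.append("no_track2_deployment_by_month_12")
    # Assumption: zero ARR with any burn counts as an unbounded ratio.
    if arr > 0:
        ratio = trailing_annual_burn / arr
    else:
        ratio = float("inf") if trailing_annual_burn > 0 else 0.0
    if ratio > BURN_TO_ARR_LIMIT:
        triggers.append("burn_to_arr_above_3")
    if annualized_churn > CHURN_LIMIT:
        triggers.append("churn_above_10_percent")
    return triggers


if __name__ == "__main__":
    print(slow_down_triggers(month=9, track1_customers=3, track2_deployed=False,
                             trailing_annual_burn=600_000, arr=400_000,
                             annualized_churn=0.0))
```

An empty list means no slow-down condition is active. Any non-empty result is a prompt to hold hiring and focus on retention, as described above.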

When to pivot or revise the goal

BETA pilot fails or BETA passes: rebuild distribution via direct + Wildcatters and reset target to $15–25M ARR in 36 months.
Gov procurement track produces no LOI / contract pipeline by Month 18: deprioritize Track 4, focus on commercial.
Sustained burn-to-ARR > 5.0 by Month 18: open acquihire conversation (Enverus, Quorum, Collide, or BETA itself) only as a last resort — against the Epic principle.
Hallucinated filing or appliance incident: do not pivot — address it as an incident-response operation. Trust loss is harder to recover than revenue.

Leading indicators to monitor monthly

Indicator	Healthy	Warning	Stop
Customer count growth	+1/mo by M6, +2/mo by M12	<0.5/mo for 2 months	0 net adds for 90 days
Cash runway	>9 months	4–9 months	<4 months
BETA pilot progress	NDA→SOW→deploy on track	Slippage >30 days	BETA stops returning calls
Gov pipeline	1+ active RFI/RFP	0 active, 1 in conversation	0 conversations for 90 days
Net Revenue Retention	>110%	95–110%	<95%
Hallucination / incident rate	0	Any reportable incident	2nd reportable incident in 90 days
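
The numeric rows of this table reduce to a three-band threshold check. The sketch below covers cash runway and net revenue retention. Any value between the healthy and stop thresholds is treated as a warning, which is an assumption, and the names are illustrative.

```python
def classify(value, healthy_above, stop_below):
    """Map a reading to 'healthy', 'warning', or 'stop'."""
    if value > healthy_above:
        return "healthy"
    if value < stop_below:
        return "stop"
    return "warning"


# (healthy_above, stop_below) per indicator, taken from the table.
THRESHOLDS = {
    "cash_runway_months": (9, 4),
    "net_revenue_retention_pct": (110, 95),
}


def monthly_review(readings):
    """Classify each reading in a {indicator_name: value} mapping."""
    return {
        name: classify(value, *THRESHOLDS[name])
        for name, value in readings.items()
    }


if __name__ == "__main__":
    print(monthly_review({"cash_runway_months": 7,
                          "net_revenue_retention_pct": 112}))
```

The qualitative rows (BETA pilot progress, gov pipeline) still need a human judgment call. Only the numeric indicators fit this pattern.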

C.10 Honest verdict on the target & the long game

$10–15M/month total revenue in 24 months from a solo bootstrap start sits in the far right tail of what has actually happened in B2B vertical AI. It is not impossible: Glean, Hebbia, and a handful of vertical AI plays have hit comparable numbers. But those cases almost universally involved significant funding, an unusual viral channel, or category-creating product positioning.

The wind at our back, in plain terms. Two structural tailwinds compound on each other. First, the foundation-model leveling event (§04) compresses 25 years of institutional industry knowledge into the weights of every frontier model, killing the "you need a roughneck to compete" defense. Second, security and infrastructure standards (SOC 2 Type II, ISO 27001, KMS, audit logging) are industry-commoditized in 2026 — the perimeter is no longer a moat, the hyperscalers and on-prem stacks meet the same bar. Both of those moats are gone for the incumbent. What remains as defensible is the FDE services relationship and the workflow ownership it produces — which is exactly what Lean Informatics is built around. Three private-equity flips have made the incumbent (Enverus) pricing-opaque and organizationally slow at exactly the moment those two moats evaporated. The newcomer (Collide) has proven a single founder team can raise, build, and sell in this category in twelve months. Old-buddy networks, favoritism in vendor selection, and tribal industry knowledge cannot stop a services-led delivery that already speaks the language fluently, meets the customer where the customer wants to host, deploys in 30 days, and costs one-tenth of what the incumbent is charging. That is not a slogan. That is the structural read.

Realistic Y2 endpoint without leverage paths firing: $3–5M ARR ($250K–420K/month). That is roughly 24–60× below the stated $10–15M/month target. To hit the target, at least two of (A) BETA channel, (B) gov contract, (C) anchor enterprise, and (D) reseller program must fire in 24 months.
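
The size of the gap follows directly from converting the ARR floor to a monthly figure and dividing the monthly target by it. A minimal sketch, with illustrative names:

```python
FLOOR_ARR = (3_000_000, 5_000_000)         # base-case ARR range at month 24
TARGET_MONTHLY = (10_000_000, 15_000_000)  # stated monthly revenue target


def monthly_range(arr_range):
    """Convert a (low, high) ARR range to monthly revenue."""
    return arr_range[0] / 12, arr_range[1] / 12


def gap_multiple(floor_arr, target_monthly):
    """Smallest and largest ratio of the monthly target to the monthly floor."""
    floor_low, floor_high = monthly_range(floor_arr)
    return target_monthly[0] / floor_high, target_monthly[1] / floor_low


if __name__ == "__main__":
    low_m, high_m = monthly_range(FLOOR_ARR)
    best, worst = gap_multiple(FLOOR_ARR, TARGET_MONTHLY)
    print(f"floor: ${low_m:,.0f} to ${high_m:,.0f} per month")
    print(f"gap: {best:.0f}x to {worst:.0f}x")
```

The best case pairs the top of the floor with the bottom of the target. The worst case pairs the bottom of the floor with the top of the target.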

The 12-month checkpoint

If by month 12 we have:

5+ Track-1 customers ($500K+ ARR)
1 Track-2 customer live
BETA pilot signed
A gov contract in the procurement pipeline (RFI or RFP stage)

...then $30–80M ARR by month 24 is on the table. Consider raising at this point if execution capacity is the bottleneck.

If by month 12 we have:

2–3 Track-1 customers
No Track-2 customer
BETA passed or stalled
No gov pipeline

...then $5–10M ARR by month 24 is the realistic ceiling. The right move is to revise the goal to $25M ARR in 36 months, or raise capital to accelerate — but only if the leverage paths require it.

The Epic Systems lens on the verdict

Through the Epic lens, the 24-month target is the wrong unit of measurement. Faulkner's 5-year revenue was probably in the low millions. By year 10 Epic had perhaps $50M. By year 20, several hundred million. The compounding kicked in once the institutional anchors and the workflow lock-in were established. Lean Informatics should optimize for the same shape: deep customers, lock-in workflows, geographic concentration, and the discipline to refuse easy money, easy SaaS-ification, and acquisition before the moat is built.

Under this lens:

24-month goal: 1–2 anchor customers + BETA partnership locked + first gov contract in pipeline + 15–20 Track-1 customers. Revenue $10–30M ARR. Foundation set.
60-month goal: 8–15 anchor customers, BETA channel running, 3–5 gov contracts, 100+ Track-1 customers, reseller program live. Revenue $80–200M ARR. Recognized as the sovereignty-first vertical AI vendor.
10-year goal: The Epic of upstream. Default vendor at most US E&P CIOs. $500M–1B+ ARR. Still private. Still bootstrap-ratio-disciplined. Still in Houston.

What we commit to: build to optionality, optimize for depth. Ship Track 1 fast. Land BETA conversation by Month 4. Open SBIR pipeline by Month 6. First Track-2 customer by Month 12. Revisit the $10–15M/mo target at Month 12 with hard data. The Epic Systems posture is the long-term frame: own the sector, not the quarter.

17 Methodology & epistemic posture

This report was produced through a structured first-principles analysis on May 21, 2026, by an analyst working in the Knuth–Ousterhout–Karpathy mode: rigor, complexity reduction, verifiability.

Primary sources (Collide.io, Enverus, RRC, McLelland's X presence) were preferred over secondary commentary where they conflict.
Quantitative claims were spot-checked against multiple sources where possible. Where a single source is cited, treat the number as indicative.
The "GPT-5.1 scored 4%" benchmark number is flagged as suspect because it deviates implausibly from public model performance; the surrounding numbers (Grok 4, Sonnet 4.5, RIGGS) are internally consistent.
The solo-feasibility verdicts are based on the 2026 tooling landscape (MLX 0.31+, Hermes 4, Qwen 3.5, Claude Opus/Sonnet 4.6). They will look different in 2027.
"Solo achievable" never means "trivial." It means: one disciplined operator with the listed toolstack can reach equivalent customer outcomes for a narrow workflow, within the timeframes given.
This is not legal, financial, or regulatory advice. RRC filing is a regulated activity. Get a Texas-licensed compliance professional before automating submissions.