Lean Informatics — sector strategy & business plan
Vertical AI for upstream oil & gas, built FDE-first. A $400B US industry still runs most of its daily workflow surface on PDFs, spreadsheets, and tribal knowledge — and in 2026, foundation models can finally read those PDFs, speak the vocabulary, and ship audit-grade outputs. Lean Informatics is the services company that meets operators where they want their data to live and owns the workflows the incumbents don’t. Sections 1–15 plus Addenda A–B are the sector research that backs the plan; Section C is the plan itself. Audience: founder, analyst, advisor.
Why this space, why this time
Texas first — the maximum addressable market right outside our window
Before talking about anything bigger, look at what’s addressable inside Texas alone:
The workflow gap, in plain terms
Upstream oil & gas is one of the largest US industries still running day-to-day work on PDFs, spreadsheets, vendor-specific schemas, and tribal knowledge. The sector generates roughly $400B in annual US revenue, employs ~600K people directly, reports to ~30 state agencies plus a half-dozen federal bodies, and pulls information from a long tail of land services firms, completion contractors, midstream counterparties, and trading desks. Almost every operator workflow — from the monthly RRC filing, to an AFE evaluation, to a JOA negotiation, to a completion-design review — still depends on people in chairs reading PDFs and re-keying values into different systems. That gap is the entire opportunity.
Why now
Three forces converged in 2025–2026 to open the window:
- Foundation models crossed the operator-fluency line. Frontier models (Claude Opus 4.6, GPT-5.1, Grok 4, Hermes 4 405B, Llama 4 405B) have ingested the public petroleum corpus — SPE OnePetro’s 300K+ papers, AAPG/SEG journals, state commission filings, courthouse title records, Schlumberger/Halliburton manuals, the Craft-Hawkins/Slider/Lake/Economides textbook canon. Out of the box they score 50–70% on the SPE certification subset and converse with field hands using the right vocabulary. The "you need a roughneck on staff to compete" barrier is empirically dead. §04 develops this in depth.
- Security and infrastructure standards have commoditized. SOC 2 Type II, ISO 27001, KMS-managed encryption, audit-grade logging, and FIPS-validated crypto are commodity table stakes in 2026. Whatever substrate the operator chooses — on-prem appliance, customer colo, AWS, Azure, GCP, or hybrid — the security perimeter is portable. The "we need our own datacenter" or "we need a private cloud" objection is now a preference question, not a technical one.
- The labor cycle is forcing the issue. The petroleum engineering workforce is aging out. The COVID-era exodus cost a generation of mid-career operators. Operators are running the same workflows with smaller teams against rising regulatory complexity. AI is the only path to absorb throughput demand without headcount the operators don’t want to add.
What AI brings operators, concretely
A short list of upstream workflows where AI is shipping real economic value today — not in marketing decks:
Every one of these is an operator-facing workflow. Not "a platform." Not "data analytics." Specific work that engineers do today, that AI can do faster, cheaper, and with audit-grade traceability. The rest of this document is what Lean Informatics is doing about it.
The role at the center — FDE, the blue-collar knowledge worker
One term recurs throughout this document and deserves a definition up front: FDE — Forward-Deployed Engineer. Origin: Palantir, mid-2000s. Adopted as a model by OpenAI’s DeployCo, Anthropic’s enterprise practice, Google, EY, and a long list of vertical-AI startups in 2024–2026. Job postings up ~800% year over year, average compensation ~$238K, hiring still outrunning supply.
An FDE is not a salesperson, not a consultant, not a customer-success rep, and not an architect on slides. The FDE is the engineer who shows up at the customer site, reads the actual workflows, writes the actual integrations, configures the actual system, owns the actual outcomes, and stays through the deployment until the customer’s operating problem is solved. The FDE’s work is the product. Headquarters ships the platform; the FDE ships the customer.
Why this role matters right now
Three structural shifts converge on the FDE in 2026:
- White-collar knowledge work is being commoditized by AI. The MBA-grade analyst who reads documents and produces synthesis is no longer the scarce resource — the foundation model does that in seconds for fractions of a cent. What remains scarce is the human who can sit with a customer, understand a workflow in its real operating context, and ship the implementation that actually works under the customer’s regulatory, political, and infrastructural constraints. The analyst layer compresses; the FDE layer becomes the durable layer of the org chart.
- An organic brain drain is in progress in upstream. The petroleum engineering workforce is aging out. The COVID-era exodus cost a generation of mid-career operators. The junior pipeline thinned through the 2014–2020 down cycles. Tribal knowledge that lived in those people is walking out the door faster than it can be transferred. Customers do not have spare engineering capacity to absorb a complex SaaS implementation. The FDE is the substitute for the institutional engineer the customer used to have on staff.
- Service implementation is taking over from software shipping. Palantir built a public ~$80B+ market cap on this model. OpenAI’s DeployCo, Anthropic’s enterprise JV, EY’s AI delivery practice, and every serious vertical-AI startup in 2026 are running an FDE motion because the product alone does not produce the customer outcome; only the product plus the implementation engineer does. The market is rerating “software vendor” toward “services-led product company,” and the FDE is the unit of that.
The blue-collar knowledge worker
The cleanest framing of what an FDE actually is, in 2026, is the blue-collar knowledge worker: hands-on, on-site, technical, outcome-accountable, with the operating discipline of a journeyman tradesperson applied to knowledge work. Not the analyst in the conference room. Not the architect on the slide deck. Not the consultant with the recommendations memo. The engineer who shows up with the toolkit, reads the actual job, fixes what’s broken, ships what works, and owns the result when it ships.
That shape of work used to be a lineman, a millwright, a field instrument technician. In an industrial economy, those were the people who walked into the plant, kept the machinery running, and answered the 2am phone call. In a 2026 economy where AI is eating the analyst layer of white-collar work, that same shape of work — hands-on, on-site, outcome-accountable, paid for what gets shipped not what gets recommended — is reasserting itself one altitude up the stack. It is now the FDE who walks into the operator’s office, reads the actual filing workflow, ships the actual integration, and answers the 2am call when the RRC submission breaks. The trade is the same. The toolkit changed.
The exciting opportunity
Upstream oil & gas is one of the last under-digitized industries of its size. Texas alone produces 5.7M barrels of crude per day, files ~2.8M monthly well reports with the RRC, runs through ~9,000 active operators, and most of that workflow surface still moves on PDFs and spreadsheets. The window is open right now because three forces converged in 2025–2026: foundation models speak petroleum fluently out of the box, security and infrastructure standards have commoditized into table stakes, and the petroleum engineering workforce is aging out faster than it can be replaced. That is the opportunity. An FDE-led services company can ship audit-grade workflows to operators where they want their data to live, undercut the cloud-locked incumbent on price and deploy time, and compound the customer relationship into a multi-decade moat.
Lean Informatics is built for that window. The competitive ladder, in one line: Enverus is the old guard — cloud-locked and PE-flipped three times in seven years; Collide is the newcomer that proved the gate is open; Lean Informatics is the new approach — services-first, infrastructure-agnostic, foundation-model leverage compounding monthly. The operating discipline we lean on is the one Epic Systems built in healthcare over 45 bootstrap years — discipline, not topology (full primer in §01, full crosswalk in §C.0). The 24-month $10–15M/month target is a milestone on a multi-decade arc, not the endgame.
The sector research below (Sections 1–15 + Addenda A–B) documents the landscape we’re entering. Section C is the operating plan.
01 Executive summary
Lean Informatics is the FDE-led vertical AI services company for upstream oil & gas. We meet operators where they want their data to live (on-prem appliance, customer-owned colo, customer hyperscaler tenancy, or hybrid) and own the workflows the incumbents don’t. Data location is a customer dial, not our differentiator. Security and infrastructure standards (SOC 2 Type II, ISO 27001, KMS, audit logging, FIPS-validated crypto) are commodity table stakes — we meet them everywhere we deploy. The moat is the FDE relationship, the workflow ownership, and the foundation-model fluency we bring to the customer’s problem. Houston is the starting wedge. The serviceable surface is national upstream, then international. The distribution moat is the existing infrastructure of survey and land services companies whose embedded delivery model already maps onto our FDE motion. The operating discipline we lean on is the Epic Systems pattern from healthcare — primer in the callout below, full crosswalk in §C.0.
The competitive ladder — old guard, newcomer, new approach
Lean Informatics in one screen
- The Epic Systems posture is the operating frame — the disciplines, not the deployment topology. Bootstrap as long as possible. FDEs before salespeople. Long sales cycles as the moat. Annual user summit by Y3. Refuse easy SaaS-ification, easy money, and early acquisition. Houston as the geographic concentration. Epic itself moved to a hybrid cloud posture (Hyperdrive on Azure / AVD / Azure Large Instances) without abandoning the disciplines — the disciplines are what we copy. 10-year sector dominance is the goal; the 24-month $10–15M/mo target is a milestone on that arc.
- The competitive ladder has a top rung that's structurally slow. Enverus is the old guard — great data, $500M+ ARR, 8,000 customers, 25 years of moat-building — but three PE flips in seven years (most recently Blackstone, Aug 2025), employees publicly describing "large project management bureaucracy" and "not really 'agile' in any sense," and the Enverus ONE platform launch itself framed as defensive against latecomers. That's a target, not a fortress. §03 develops the gap.
- The market exists at meaningful scale and is structurally large. $15–30B/yr maximum addressable across upstream AI, land services, compliance automation, and adjacent agency emergency tech. Empirically proven by competitors, customers, and investor capital flowing in — §05 catalogs the Collide evidence.
- Foundation models have leveled industry expertise. The vernacular, definitions, metrics, and institutional IP that used to gate vertical-AI entry are now compressed into a $20/month API call or a free open-weight checkpoint. What still gates entry is distribution, sovereignty, and FDE reliability under regulator scrutiny — not knowledge of the sector. §04.
- Distribution wins, not features. The land/survey channel — 150–250 US firms with embedded customer relationships — is the unfair distribution path no cloud-first competitor can reproduce. Lean Informatics becomes the AI engine inside their existing service motion.
- Two-track architecture compounds. Track 1 (cloud RRC compliance, $1.5–4.5K/mo) funds the company while Track 2 (on-prem appliance, $5.5–13.5K/mo) builds the defensibility. Tracks 3 (FDE services) and 4 (agency cross-sell) extend reach as scale permits.
- Founder profile is industry-adjacent with sidecar exposure. Jonathan deployed FM RBDS mass-notification systems to DoD, state HS offices, counties, sheriffs, and fire-warning networks — many of them in O&G-dense Texas jurisdictions. He saw upstream from the customer's worst-day angle: incident response, blowout coordination, spill comms, well-pad evacuations. Not a roughneck — the engineer who showed up when the roughneck's day went sideways. Knows enough to be dangerous, not enough to be captive to industry orthodoxy.
- Infrastructure is not the moat. The services relationship is. Security and infrastructure standards are industry-commoditized: SOC 2 Type II, ISO 27001, KMS-managed encryption, audit-grade logging, FIPS crypto, signed DPAs. We meet those standards everywhere we deploy. Data location becomes a customer preference dial — on-prem appliance for sovereignty-sensitive workloads, customer-owned colo for the middle path, hyperscaler of choice for the cost-optimized path. The on-prem reference architecture in Addendum B is one option, not our identity. The compounding moat is the FDE relationship, the workflow ownership, and the customer outcomes we own quarter over quarter. That is what makes Lean Informatics a services company first and a software vendor second — and why it's more competitive in 2026 than the cloud-or-bust playbook the incumbent and the newcomer both run.
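The track price bands and the 24-month target above imply a customer count that is worth computing once. The sketch below assumes the $10–15M/mo figure is monthly revenue earned entirely from one track at a time, which the plan does not state; real revenue would mix tracks and FDE services, so these are per-track upper bounds rather than a forecast. The function names are illustrative.

```python
# Price bands (USD per customer per month) come from the track descriptions.
# ASSUMPTION: the target is monthly revenue and is met by a single track alone.
TRACKS = {
    "Track 1 (cloud RRC compliance)": (1_500, 4_500),
    "Track 2 (on-prem appliance)": (5_500, 13_500),
}
TARGETS = (10_000_000, 15_000_000)  # 24-month target, USD per month


def customers_needed(target: int, price: int) -> int:
    """Smallest whole number of customers whose monthly fees reach the target."""
    return -(-target // price)  # ceiling division on integers


def required_range(target: int, band: tuple[int, int]) -> tuple[int, int]:
    """(fewest, most) customers: fewest at the top price, most at the bottom."""
    low_price, high_price = band
    return customers_needed(target, high_price), customers_needed(target, low_price)


if __name__ == "__main__":
    for name, band in TRACKS.items():
        for target in TARGETS:
            fewest, most = required_range(target, band)
            print(f"{name} @ ${target / 1e6:.0f}M/mo: {fewest:,} to {most:,} customers")
```

The output is a range per track, not a plan; mixing tracks and adding FDE services revenue would lower every count.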
02 Core thesis
Lean Informatics' strategic question: if foundation models do most of the cognitive lifting, if open weights are free, if MLX runs them on a laptop, and if Claude Code can write the glue — what's left to charge for in 2026?
Three things, in order of defensibility:
- Embedded trust — the FDE services relationship. Whose engineers sit in the customer's morning meeting? Whose phone does the VP Operations call when filings break at 11pm on a Thursday? Whose post-incident review does the regulator read first? This is the compounding moat. It cannot be cloned by feature work and it cannot be acquired in a $4.25B PE transaction.
- Workflow-specific data accumulation, wherever the customer wants it. Every filing the system completes, every JSA it generates, every well-failure post-mortem it reads compounds into pattern recognition no foundation model has. The accumulation happens on the customer’s chosen substrate — on-prem appliance, customer-owned colo, hyperscaler of choice, or hybrid. The location of the data is the customer’s preference; the workflow ownership is ours.
- Regulator-grade audit posture as a portable standard. SOC 2 Type II, ISO 27001, signed DPAs, named-human signoff workflows, FIPS-validated crypto, agency procurement readiness. The boring stuff that takes 12–18 months of paperwork. The same standards travel with us across every deployment topology, because in 2026 security and infrastructure standards are industry-commoditized table stakes, not competitive differentiators. The founder's DoD/HS background means this isn't learned from scratch — it's transferred discipline.
Lean Informatics is structured to own all three from day one. #1 is the FDE motion. #2 is the workflow ownership pattern with location-flexible deployment (Addendum B documents the on-prem reference architecture as one option; the same workflow runs on the customer’s hyperscaler if that’s what they prefer). #3 is sequenced into the plan (SOC 2 Type I by Y1, Type II by Y2, agency-prime partnerships in parallel). This is what an Epic-style competitor looks like in vertical AI — and notice that even Epic itself has moved to a hybrid cloud posture without abandoning the operating disciplines that built the moat.
One more thing the thesis depends on, taken up in §04 in detail: foundation models have leveled the industry-expertise barrier, and security/infrastructure standards have commoditized the perimeter. A century of public, academic, and institutional O&G knowledge is in the weights of every frontier model. SOC 2 / ISO 27001 / KMS / audit logging are table-stakes everywhere. The "you need a roughneck on staff and a private datacenter" defense is empirically dead on both axes. What's left to charge for is what the models and the infrastructure cannot supply on their own — trust, distribution, FDE reliability, and ownership of the customer's workflow outcomes. That's exactly what Lean Informatics is built around.
03 Enverus — the old guard
Enverus is the structural incumbent of upstream data and analytics. Founded in 1999 as DrillingInfo by Allen Gilmer and rebranded to Enverus in 2019 after multiple acquisitions, it has $500M+ ARR, 8,000+ customers across 50 countries, 2.7 PB of data, 350M+ courthouse records, and $500B+ in annual energy transactions through its platform. Enverus is also where the wedge opens. The signs: three private-equity flips in seven years, public employee commentary about bureaucracy and slow product cycles, two-star customer reviews on public review sites, and a product launch posture (Enverus ONE, April 7, 2026) that reads as defensive rather than confident. The data moat is real. The execution organization is not what it was when DrillingInfo was the scrappy challenger.
Where Enverus is structurally exposed
- PE-driven price compression on customers. Public review-site evidence: "the subscription is not worth the price and they hide the price of the annual subscription till they send you the invoice"; "constant price increases and no added value for my uses"; 90-day cancellation policies buried until the customer tries to leave; threats of "action" for non-payment after attempted cancellation. The pattern is consistent with three PE sponsors in seven years compounding subscription revenue at the customer's expense. This is the unhappy customer pool.
- Internal bureaucracy at PE-mature scale. Public Glassdoor commentary describes "large project management bureaucracy for a company of their size", "not really 'agile' in any sense of the word", "too much hierarchy in their management structure", "frequent internal reorganizations", and senior management protecting "pet projects" that should have been killed. A 598-review aggregate of 4.1/5 doesn't change the fact that the engineering organization is not a fast-moving target.
- No deployment-topology choice for the customer. Enverus ONE is cloud-only on their tenancy. The launch language — "proprietary customer data remains isolated within a private tenancy" — is the strongest sovereignty pitch they can credibly make, but it's still their cloud. The customer ships data out and has no say in the substrate. For operators who want on-prem for completion designs, M&A diligence, AFE pricing models, or JOA-sensitive negotiation data — or for operators who simply prefer to run on their existing Microsoft/AWS contract and consumption-discount stack — Enverus has one answer. Lean Informatics meets the customer where the customer wants to be. Even Epic itself moved to a hybrid cloud posture in healthcare (Hyperdrive on Azure / Azure Large Instances) once customers asked for the option. Enverus has not extended that courtesy to upstream.
- Acquisition-driven product sprawl. Spatial Business Systems (April 2026, utility design/engineering), Tracts.co partnership (April 2026, title automation), Xpansiv partnership expansion (May 2026, price discovery), plus legacy PRISM, MarketView, Foundations, Sphere, and now ONE. Each bolt-on carries integration debt and uneven UX. Customers reach for fewer tools, not more.
- The Enverus ONE pitch contains its own admission. CEO Manuj Nikhanj framed the April 7 launch with the line "the gap between the companies that move now and the companies that wait is going to be significant and it is going to compound", and CPO Jimmy Fortuna pitched ONE as "the only AI platform that can" reason with O&G operating context. That language is defensive against the very thing Lean Informatics' thesis predicts: foundation models compressing the 25-year data moat into a deployable product. When the incumbent's CEO has to publicly insist on the gap, the gap is shrinking.
- The long tail of mid-size operators is undersold. Enverus's center of gravity is enterprise: supermajors, capital markets, large independents, midstream majors. The 5–200-well operator who needs RRC compliance and AFE evaluation but can't justify a $50K+/yr Enverus contract is structurally underserved. That is exactly Track 1's wedge customer.
04 The foundation-model leveling event
The most important fact in this entire document, and the one that makes Lean Informatics possible at all, is this: frontier foundation models trained between 2023 and 2026 absorbed roughly a century of public, academic, and institutional oil & gas knowledge into their weights. The "you can't build vertical AI for upstream without 25 years of proprietary data and a roster of petroleum engineers" defense, repeated by every incumbent for the last decade, is no longer empirically true. The vernacular, the definitions, the metrics, the workflows, and the institutional IP that used to gate entry are now a $20/month API call or a free open-weight checkpoint.
What's in the weights
By 2026, the major closed-weight frontiers (Claude Opus 4.6, GPT-5.1, Grok 4) and the major open-weight families (Hermes 4 405B, Llama 4 405B, Qwen 3.5) have ingested, in some combination:
- SPE OnePetro corpus — ~300,000 peer-reviewed petroleum engineering papers and conference proceedings.
- AAPG and SEG journals — the geology and geophysics literature stretching back to the 1920s.
- DOE, USGS, EIA technical reports — reservoir studies, basin assessments, methodology papers, regulatory technical bases.
- State commission filings — Texas RRC, Oklahoma OCC, New Mexico OCD, North Dakota Industrial Commission, Louisiana Office of Conservation. Decades of public W-10/G-10/H-10 equivalents, drilling permits, completion reports, plugging records.
- Courthouse public records — title abstracts, lease assignments, ROW grants, unit designations, pooling orders. Much of this is web-indexable.
- Industry textbooks and field manuals — Schlumberger Oilfield Glossary, Halliburton/Baker Hughes operations manuals (where public), classic Craft-Hawkins, Slider, Lake, Ahmed, Economides petroleum engineering texts.
- Trade press and conference proceedings — Hart Energy, JPT, World Oil, E&P Monthly, ARC Group, Wood Mackenzie public reports, IHS Markit precursors.
- Academic petroleum engineering coursework — MIT OpenCourseWare, Stanford SCRF, Texas A&M, UT Austin, Colorado School of Mines, Tulsa graduate-level course materials where publicly posted.
- Expert witness transcripts and litigation discovery — PACER and state court systems carry decades of technical expert depositions on well control, fluid mechanics, completion failure, royalty disputes.
- YouTube engineering channels — everything from Practical Engineering and Real Engineering down to ChevronTexaco operator training videos and Oilfield Joe explainers. Petroleum vocabulary is in the audio transcripts.
The result, validated repeatedly in 2025-2026 benchmarks: frontier models hold their own on petroleum engineering coursework, score 50–70% on the SPE certification subset out of the box (Claude Sonnet 4.5: 52.5%; Grok 4: 62.5%; Collide's RIGGS at 67.5% is a 5-point fine-tune lift on top of a model that already knew the material), and can fluently produce AFE narratives, JOA boilerplate, geosteering interpretations, completion-design rationale, and regulator-grade filing language. The domain-tuned embeddings (PetroVec-style) that incumbents marketed as defensible IP can be reproduced by a competent ML engineer in a weekend.
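How much weight a 5-point gap can bear depends on the size of the question set. The sketch below assumes the 40-question subset size noted in the §06 claims table and uses a Wilson score interval, a standard binomial interval chosen here for illustration and not a method any vendor reports, to show the uncertainty around each published score.

```python
import math

N_QUESTIONS = 40  # subset size as stated in the §06 claims table
SCORES = {"Claude Sonnet 4.5": 0.525, "Grok 4": 0.625, "RIGGS": 0.675}


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    for model, score in SCORES.items():
        correct = round(score * N_QUESTIONS)  # whole questions answered correctly
        low, high = wilson_interval(score, N_QUESTIONS)
        print(f"{model}: {correct}/{N_QUESTIONS} correct, ~95% interval {low:.1%} to {high:.1%}")
```

On 40 questions the gap between RIGGS and Grok 4 is two questions, and the two intervals overlap widely, so the ranking is suggestive rather than settled.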
What's been leveled
What still gates entry
- Customer-specific proprietary data — which is exactly what stays on the customer's premises. The foundation model can speak the language fluently; it cannot tell you what's in this operator's 2024 completion report or AFE history. That gap is the buying trigger for the on-prem appliance.
- Distribution — relationships, channel, trust. The 150–250 US land services firms have multi-decade trust with mid-size operators that no foundation model can manufacture in 20 quarters. That is why Lean Informatics goes through that channel, not around it.
- FDE reliability under regulator scrutiny. SOC 2, signed DPAs, auditable signoff chains, named human accountability, agency procurement readiness. Foundation models don't sign DPAs. People do.
- Workflow ownership and process design. Knowing what the AFE evaluation should look like is one thing. Designing the workflow so a 50-well operator can run it in 30 days without firing their landman is another. The model helps; the human still owns the process.
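One way to picture an auditable signoff chain with named human accountability is an append-only log in which each entry names its signer and commits to the hash of the entry before it. The sketch below is an illustration using only the Python standard library, not a description of any shipped system; the field names, the `append_entry` and `verify_chain` helpers, and the sample filing ID are hypothetical, and a production version would add real signatures and durable storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list, action: str, signer: str, timestamp: str) -> dict:
    """Append an entry that names its human signer and links to the prior entry."""
    body = {
        "action": action,
        "signer": signer,
        "timestamp": timestamp,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Return True only if every stored hash and every back-link is intact."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, "draft filing FILING-001 generated", "system", "2026-01-05T09:00:00Z")
    append_entry(log, "filing FILING-001 approved", "J. Doe (named reviewer)", "2026-01-05T09:20:00Z")
    print(verify_chain(log))   # chain as written
    log[0]["action"] = "edited after the fact"
    print(verify_chain(log))   # chain after tampering with an earlier entry
```

The design choice is that the model can draft the filing, but only an entry with a named signer closes it, and any later edit to an earlier entry invalidates the chain.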
Why this is good news for Lean Informatics specifically
If industry expertise had still been the moat, an industry-adjacent founder like Jonathan would be at a structural disadvantage. The foundation-model leveling event turns that on its head. The wedge becomes:
- Cross-vertical operating discipline — FM RBDS, DoD, state HS, county, sheriff, fire-warning deployment experience — transfers into upstream's regulator-scrutiny posture better than 20 years of petroleum-only career.
- Industry-adjacent sidecar exposure — deploying safety systems to Texas counties, sheriffs, and fire-warning networks meant standing next to O&G operations during their worst days: blowouts, spills, well-pad incidents, evacuations. Knows enough to be dangerous; not enough to be captive to industry orthodoxy.
- Sovereignty-first architecture from RBDS prior art — the opacity layer, addressed payloads, vertical-stack ownership pattern — maps directly onto on-prem AI delivery. The incumbents grew up cloud-native; they cannot retrofit this.
- Foundation-model fluency as the operator's translator. The founder doesn't have to be a 20-year reservoir engineer; the model is. The founder has to be the operator-engineer who knows how to ship the system to the customer's server room and keep it running.
The cross-vertical thesis (Addendum B.9) catalogues the eight SaaS winners who succeeded as outsiders in their target industries (Veeva, Toast, Procore, Carta, Stripe, Snowflake, Datadog, Persefoni). Every one of them won by treating institutional jargon and tribal knowledge as a layer to be learned, not a wall to be respected. Foundation models have flattened that layer further. The McLelland framing on X that "outsiders can't compete in O&G" is — in 2026 — self-serving marketing for a previous era.
05 Market evidence — Collide as proof of fit
Collide sits one rung below Enverus on the competitive ladder. If §03 is the old guard, Collide is the newcomer whose mere existence and trajectory proves the gate has opened: a single founder team, a $5M Mercury Fund seed (April 2025), a public FDE motion, a domain-tuned LLM, and credible reference customers (Winn Resources, ConocoPhillips-affiliated deployment chatter) inside 12 months. The relevant signal is not "Collide will win the category" — it is that the category is now fundable, sellable, and executable in 2026 by teams that didn't exist in 2024. Lean Informatics' positioning, distribution, architecture, and sequencing are deliberately different (see Addendum B and Section C). The takeaway: the newcomer has proved the wedge, and the foundation-model leveling event (§04) means the wedge is wider than Collide's own positioning admits.
From the public stack diagram, Collide is a three-layer architecture flanked by Forward-Deployed Engineers (FDEs) who handle deployment, configuration, and change management. The "proprietary" labels in the diagram mark what they consider defensible.
Document classification (proprietary) → Domain pipelines (proprietary) → Petroleum embeddings (proprietary) → Security (off-the-shelf)
Reads drilling reports, well logs, completion procedures, scanned land leases, SCADA exports, third-party plant statements. Their "Petroleum Embeddings Model" is marketed at +34% accuracy vs. OpenAI on petroleum terminology, a domain-tuned contrastive embedding (sentence-transformers or proprietary).
Agentic orchestration (proprietary) → RIGGS LLM (proprietary) → Knowledge base (proprietary) → Basket of LLMs (off-the-shelf)
RIGGS is the petroleum-tuned LLM, trained on "Spindletop hardware" (their internal training rig). 67.5% on SPE exam subset. Validation/reasoning layer wraps it. Agentic orchestration handles regulatory, production, well-failure flows. A "basket of LLMs" (GPT, Claude, others) is used for general reasoning when domain isn't needed.
Automated workflows (proprietary) → GIS & mapping (proprietary) → Continuous improvement (proprietary)
The user-facing surface. Texas RRC G-10/W-10/H-10 filings, production reconciliation, lease term extraction, dynamic JSAs (job safety analyses) keyed to live weather, ESP failure root-cause. GIS layer lets users chat with maps. Continuous improvement = RLHF on FDE refinement and SME ranking.
06 Claims vs. reality
Treat the marketing surface as untrusted input. Verify before you assume.
| Claim | Verdict | Honest read |
|---|---|---|
| RIGGS beats GPT-5.1 / Grok 4 / Claude Sonnet 4.5 on SPE exam | PARTIALLY TRUE | 67.5% > 62.5% (Grok) > 52.5% (Sonnet) is plausible. GPT-5.1 at 4% is implausible as a knowledge claim — almost certainly a refusal/format failure on a particular prompt mode. Cite cautiously. The subset is only 40 questions. |
| Petroleum Embeddings: +34% vs OpenAI | DIRECTIONALLY TRUE | Domain embeddings beat general embeddings on domain tasks, a result well established in the literature (PetroVec, FinBERT, BioBERT pattern). The benchmark behind the specific 34% figure is unstated. A solo builder can replicate the direction of this result in a weekend. |
| Texas RRC filing: 99.4% time reduction | CREDIBLE | Winn Resources case study (50 wells / 20 min vs hours). Forms are structured, the RRC publishes EDI specs, the failure mode is tedium not intelligence. Reproducible by a competent script. |
| "AI-native platform purpose-built for the oilfield" | MOSTLY MARKETING | The architecture is RAG + agents + fine-tuned model + workflow UI. The positioning is what's purpose-built — FDEs, founder pedigree, vocabulary. Not a fundamentally novel architecture. |
| "First GenAI platform for energy" | CONTESTED | Enverus, C3.ai, and others were there first at different sizes. "First" is a positioning claim, not a fact. Enverus ONE (Apr 2026) is the actual category-defining incumbent. |
| RIGGS as "the intelligence layer underneath every operator's workflow" | ASPIRATIONAL | Unproven at scale. Single named customer (Winn Resources) so far in public materials. Distribution is the open question, not the model. |
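The 99.4% figure in the table can be cross-checked against the case-study numbers in the same row. A minimal sketch, assuming the 20 minutes is the post-automation time for the whole 50-well batch, which the table does not say explicitly:

```python
def implied_baseline_minutes(after_minutes: float, reduction: float) -> float:
    """Baseline time implied by an 'after' time and a fractional time reduction."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return after_minutes / (1 - reduction)


if __name__ == "__main__":
    wells = 50          # batch size from the Winn Resources case study
    after = 20.0        # minutes after automation (ASSUMED to cover the batch)
    reduction = 0.994   # claimed time reduction

    baseline = implied_baseline_minutes(after, reduction)
    print(f"implied baseline: {baseline / 60:.1f} hours for {wells} wells")
    print(f"about {baseline / wells:.0f} minutes per well before automation")
```

The implied baseline is a little under 56 hours for the batch, which makes the claim arithmetically coherent if manual filing takes about an hour per well.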
07 Where the moat actually lives
Strip the tech and look at what's hard to copy:
Five moat candidates, scored
| Moat | Strength | Why it matters |
|---|---|---|
| Founder distribution — McLelland is ex-roughneck with 111K-view tweets, Chuck Yates is a known oilman. Digital Wildcatters has 8,000+ professional members in 122 countries. | UNREPLICATABLE | This is the actual moat. Cold outreach as a stranger to an upstream operator is a closed door. Coming in as McLelland is a phone call. |
| FDE service motion — embed engineers onsite, deliver custom config, learn the patterns, push back into product. | HARD | Palantir invented this. Real moat — if you have the product spine. A solo can do this for ~2 accounts max. |
| Workflow-specific data flywheel — every filing they handle, every well-failure pattern, makes the next one easier. | EMERGENT | Compounds only with multiple customers and clear permission to learn cross-customer. Not yet realized at $5M seed scale. |
| Regulator audit posture — SOC 2 Type II, signed DPAs, audit trails, named human signoff. | EXPENSIVE | ~$80–150K and 12–18 months for SOC 2 Type II. Doable for a funded company, painful for a solo builder. |
| RIGGS & petroleum embeddings — their proprietary model + embedding stack. | COMMODITIZABLE | Open weights + good corpus + MLX = ~80% of the gap closed in 60 days. Not a real moat; marketing-led defensibility. |
08 · Your toolstack, mapped
The stack you described — Claude, data ingestion, open models, MLX, Hermes, report structure, Houston — is unusually well-suited to this problem. Concretely, here's what it gives you:
09 · Layer-by-layer solo build
Mapping the Collide architecture to your toolstack, in order of build sequence.
Layer 01 · Ingestion & petroleum embeddings
Solo equivalent:
# corpus: SPE papers, OnePetro abstracts, RRC filing PDFs, lease templates,
# completion procedures, Daily Drilling Reports (scrub PII)
# ~50-200M tokens of petroleum-domain text is the sweet spot
# embedding stack
- BAAI/bge-large-en-v1.5 (start here)
- contrastive fine-tune on ~3-5K SME-labeled petroleum triplets
- evaluate on a holdout of well-name / formation / equipment queries
- target: +20-30% recall@10 over base on petroleum queries
# pipeline
unstructured.io → tika fallback for OCR-heavy PDFs
LlamaIndex or LangChain (whichever you tolerate)
Qdrant or pgvector (start with pgvector, scale later)
Verdict: SOLO ACHIEVABLE. Two to four weeks of focused work should yield a recall gain in the same direction as Collide's claimed +34%, if not the same size. The hard part is corpus curation, not the embedding training.
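The recall@10 target in the embedding stack can be checked with a very small harness. The following is a minimal sketch, assuming each holdout query has a known set of relevant document IDs and that the retriever returns a ranked list of IDs. The function names, query strings, and document IDs are hypothetical, and the sketch reads "+20-30% over base" as a relative lift, which is an assumption.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(runs: Dict[str, List[str]],
                     gold: Dict[str, Set[str]],
                     k: int = 10) -> float:
    """Average recall@k over every query in the holdout set."""
    scores = [recall_at_k(runs.get(q, []), rel, k) for q, rel in gold.items()]
    return sum(scores) / len(scores) if scores else 0.0


def relative_lift(base: float, tuned: float) -> float:
    """Relative improvement of the fine-tuned model over the base model."""
    return (tuned - base) / base if base else float("inf")


# Toy holdout with hypothetical queries and document IDs.
gold = {"wolfcamp a landing zone": {"d1", "d2"}, "esp gas lock": {"d7"}}
base_runs = {"wolfcamp a landing zone": ["d1", "d9"], "esp gas lock": ["d3", "d4"]}
tuned_runs = {"wolfcamp a landing zone": ["d1", "d2"], "esp gas lock": ["d7", "d4"]}

base = mean_recall_at_k(base_runs, gold)
tuned = mean_recall_at_k(tuned_runs, gold)
print(f"base={base:.2f} tuned={tuned:.2f} lift={relative_lift(base, tuned):+.0%}")
```

Run the same harness against the base model and every fine-tune checkpoint so the comparison always uses an identical holdout.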
Layer 02 · RIGGS-equivalent domain model
Solo equivalent:
# base model choices (ranked)
1. Qwen 3.5 14B-Instruct — best fine-tune ROI, MLX-ready
2. Hermes 4 14B — steerable, no refusal tax
3. Llama 3.3 70B — heaviest, best ceiling, slower iter
# training data (this is the real work)
- SPE papers (you'll need licensing for some)
- Public RRC filings (millions of records)
- Drilling reports (anonymized via customer #1)
- Synthetic Q/A pairs generated from petroleum textbooks → graded by an SME
# infrastructure
- LoRA + MLX-LM on M5 Max for first 2-3 rounds
- Lambda Labs / RunPod for the final full SFT pass
- Eval against the SPE PE exam (you can buy the practice exams)
# realistic target
- 55-60% on the SPE PE exam in 60 days of focused work
- 65-70% in 6 months with domain SME feedback loop
Verdict: SOLO ACHIEVABLE WITH FOCUS. You probably can't match RIGGS's 67.5% in 60 days, but you can get within striking distance — and for the actual customer workflow (filling out a W-10), you don't need to. The eval ≠ the product.
Layer 03 · Agentic orchestration
Solo equivalent:
# the agent loop, kept boring on purpose
- Claude Agent SDK / LangGraph for the high-stakes flow
- Hermes 4 (local) for cheap intra-tool reasoning
- Tool registry: rrc_lookup, well_master_query, scada_pull, pdf_extract,
jsa_template_fill, ssa_compute, gis_proximity_check
- Guardrails: every output that mutates state has a "human sign here" step
# observability (do this from day 1, not day 100)
- Langfuse or Helicone for trace replay
- Every prompt versioned in git
- Every customer interaction snapshotted for the eval set
Verdict: SOLO ACHIEVABLE. This is exactly where Claude Code + the Agent SDK shines. One person can ship this faster than a team because there's no coordination tax.
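The guardrail described above (every state-mutating output gets a "human sign here" step) can be enforced at the tool-registry level rather than left to prompt wording. A minimal sketch in plain Python, not tied to any agent framework: the tool names `rrc_lookup` and `jsa_template_fill` come from the registry list above, while the decorator, the `SignoffRequired` exception, and the stub bodies are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Tool:
    fn: Callable[..., Any]
    mutates_state: bool  # True means a named human must approve the call


REGISTRY: Dict[str, Tool] = {}


def register(name: str, mutates_state: bool = False):
    """Decorator that adds a function to the tool registry."""
    def wrap(fn):
        REGISTRY[name] = Tool(fn, mutates_state)
        return fn
    return wrap


class SignoffRequired(Exception):
    """Raised when a state-mutating tool is called without an approver."""


def call_tool(name: str, approved_by: str = "", **kwargs) -> Any:
    """Single entry point the agent loop uses to invoke any tool."""
    tool = REGISTRY[name]
    if tool.mutates_state and not approved_by:
        raise SignoffRequired(f"{name} needs a named human approver")
    return tool.fn(**kwargs)


@register("rrc_lookup")  # read-only, so no signoff is needed
def rrc_lookup(api_number: str) -> dict:
    # Stub body: a real version would query RRC public data.
    return {"api_number": api_number, "status": "stub"}


@register("jsa_template_fill", mutates_state=True)  # produces a document
def jsa_template_fill(site: str) -> str:
    # Stub body: a real version would fill a JSA template.
    return f"JSA draft for {site}"
```

Because the check lives in `call_tool`, a new tool is gated the moment it is registered with `mutates_state=True`, regardless of which model drives the loop.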
Layer 04 · Outcome surface (the actual product)
Solo equivalent:
- Pick ONE workflow. Not all of Collide's. Just one. The W-10/G-10 filing is the canonical wedge — it has clear ROI ($300/well/yr in filing labor), public APIs, and Collide has already proven the demand.
- Build it as a CLI first, then a web app. The first customer doesn't need a polished UI — they need a working pipeline.
- Templated outputs with human signoff. The user reviews and approves each filing before submission. This is both compliance posture and trust-building.
- GIS only if customer asks. Leaflet + RRC shapefiles get you 80% of the way. Don't pre-build it.
Verdict: SOLO ACHIEVABLE IN 30 DAYS. The W-10/G-10 use case specifically.
Layer 05 · The FDE motion (you, in person)
Solo equivalent: You are the FDE. You drive to the customer's office in Midland or The Woodlands. You sit with their operations director. You watch them file W-10s by hand. You build the integration in front of them. This is how you win against Collide on a single account: you ship faster because you don't have to schedule a meeting with yourself.
Verdict: SOLO ACHIEVABLE, BUT IT CAPS YOU AT 2–3 ACCOUNTS. This is the fundamental scaling constraint.
10 · Solo vs. company verdict matrix
The honest gap, capability by capability.
| Capability | Collide (company) | You (solo) | Gap |
|---|---|---|---|
| Document classification + ingestion | Custom pipelines, scaled | unstructured.io + Claude + pgvector | close to zero |
| Petroleum embeddings | RIGGS embeddings, claimed +34% | bge fine-tune on petroleum corpus | closable in weeks |
| Petroleum LLM (RIGGS) | 67.5% SPE, trained on Spindletop | Qwen/Hermes LoRA, 55-65% SPE | closable in months |
| Agentic orchestration | Internal framework | Claude Agent SDK + LangGraph | solo may ship faster |
| GIS mapping over O&G data | Proprietary, chat-with-map | Leaflet + RRC shapefiles + Claude | slower polish |
| W-10/G-10 filing automation | Live with Winn Resources | 30-day build to first customer | parity achievable |
| JSA generation (weather-aware) | Live | Buildable in 2 weeks | close to zero |
| Dynamic well-failure RCA | Claimed, low public detail | Harder — needs OEM specs + sensor history | depends on data access |
| SOC 2 Type II posture | Implied, presumed in progress | ~$80–150K + 12–18 mo | structural disadvantage |
| Forward-Deployed Engineers (scaled) | Hiring multiple FDEs | You, one human, 2–3 accounts | caps your growth |
| Brand & community distribution | Digital Wildcatters, 8K+ members, 111K-view tweets | Build from zero | months/years to close |
| Capital cushion | $5M seed runway | Personal runway + first invoice | structural |
| Speed of single-customer iteration | Internal coordination cost | You ship same day | SOLO ADVANTAGE |
| Burn rate | ~$150–300K/mo blended | ~$8–14K/mo all-in | SOLO ADVANTAGE |
| Pricing flexibility | Enterprise floor (~$50K/yr+) | $2K–15K/mo, flexible | SOLO ADVANTAGE |
11 · What a solo can't do (honest)
This section exists so you don't fool yourself.
- You can't scale FDEs. Past 2–3 accounts, you're either turning customers away or becoming a consultancy. Plan the bottleneck.
- You can't sign with a supermajor. No procurement department will sign with a Texas LLC of one without a SOC 2 letter, named executives, and references. Aim middle: operators with 5–50 wells.
- You can't outrun a brand-distribution flywheel. McLelland's tweets do free pipeline generation. You'll need either a personal brand strategy or a partnership.
- You can't easily defend against open-source. Everything you build, someone could open-source 6 months later. The defense is customer entrenchment and workflow ownership, not novel IP.
- You can't compete on "platform" framing. Don't try. Compete on "outcome": a number on a contract, signed in dollars saved.
- You can't ignore Enverus. Enverus ONE (April 2026, with Astra model + SOC 2 Type II + 25 years of data + Continental/BPX/Chord partnerships) is the real giant. Position around them, not Collide.
12 · The Houston wedge
The right way to do this is to pick the narrowest defensible wedge and own it before anyone notices.
The wedge: Texas RRC compliance for mid-size operators
- Why this customer: Mid-size Texas operators (5–200 wells) file W-10, G-10, PR forms monthly. Most do it by hand or with Excel. The pain is real, recurring, and quantifiable.
- Why this workflow: Public forms, public data, public APIs (RRC EDI), low political risk if it breaks (filing is reviewed before submission). Easy to demo, easy to price.
- Why this geography: The RRC is in Austin. Texas operators are in Houston, Midland, Tyler, Fort Worth. You can be on a wellsite in 5 hours.
- Why this moment: Collide has proven the demand with Winn Resources. The market is now educated. You don't need to evangelize; you need to be cheaper, faster, and closer for operators below their floor.
Positioning against Collide
We're the RRC compliance pipeline for operators too small for Collide and too tired of spreadsheets to keep doing it by hand. White-glove, fixed-price, deployed in your office in two weeks. We don't sell a platform — we sell completed filings. — draft positioning statement
Three customer profiles to target
- Family-owned Permian operator, 10–40 wells. Owner runs the filings themselves. Hates it. Will pay $1500/mo to make it go away.
- Midstream gathering company, 50–150 wells. Has a controller doing this 3 days/month. Math is obvious.
- Mineral rights manager / landman service company. Filings for multiple clients. They'd resell your tool as part of their service.
13 · 90 / 180 / 540 day plan
Days 1–30 · Build the wedge
- Set up the project structure: monorepo, pgvector, Claude Agent SDK, unstructured.io pipeline.
- Get 100 sample W-10 and G-10 filings from the public RRC archive.
- Build the end-to-end pipeline against your own well data (synthetic if needed): read SCADA exports → reconcile production → generate filing → human signoff → submit via EDI.
- Write the eval set: 25 wells, known correct answers, runs in 5 minutes.
- Buy domain. Build a 1-page landing page. Start writing on LinkedIn / X about RRC pain.
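The eval set in the list above (known correct answers, runs in minutes) can start as a plain script before any framework is involved. A minimal sketch, assuming the pipeline's reconciliation step reduces daily volumes from a SCADA export to one monthly figure per well. The well IDs, volumes, and tolerance are hypothetical, and a real filing carries more fields than a single total.

```python
from typing import Dict, List, Tuple


def reconcile_monthly(daily_bbl: List[float]) -> float:
    """Sum daily oil volumes into the monthly figure a filing would report."""
    return round(sum(daily_bbl), 1)


def run_eval(cases: Dict[str, Tuple[List[float], float]],
             tol: float = 0.5) -> Dict[str, bool]:
    """Compare pipeline output to the known-correct answer for each well."""
    results = {}
    for well_id, (daily, expected) in cases.items():
        results[well_id] = abs(reconcile_monthly(daily) - expected) <= tol
    return results


# Hypothetical eval cases: well id -> (daily volumes, known monthly total).
CASES = {
    "well-001": ([10.0, 12.5, 11.0], 33.5),
    "well-002": ([0.0, 0.0, 4.2], 4.2),
    "well-003": ([8.0, 8.0, 8.0], 25.0),  # mismatch on purpose: shows a failure
}

results = run_eval(CASES)
passed = sum(results.values())
print(f"{passed}/{len(results)} wells passed")
for well_id, ok in results.items():
    if not ok:
        print(f"FAIL {well_id}")
```

Each customer edge case found during hardening becomes another entry in `CASES`, so the set grows with the account.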
Days 31–90 · First customer, in person
- Get one paying customer at $1.5–5K/month. Houston-area, friend-of-friend, or community connection.
- Drive to their office. Sit with their ops person. Watch the manual process for half a day.
- Ship the integration in 2 weeks. Use the next 2 to harden against their actual edge cases.
- Document everything: a runbook, a one-pager, a case study with hours-saved math.
- Start the SOC 2 readiness conversation with Vanta or Drata. Don't start the audit yet.
Days 91–180 · Second customer + the eval moat
- Use case study #1 to land customers #2 and #3 (target: 3 customers at $1.5–5K/month each, or $54K–180K ARR, by day 180).
- Start the petroleum embedding fine-tune in earnest — you now have real corpus from customer data (with permission).
- Begin the petroleum domain LLM fine-tune. Target: 55% on SPE practice exam.
- Apply to TX Railroad Commission as an authorized filer / EDI participant if not already.
- Hire a fractional ops person to handle customer onboarding so you stay on engineering.
Days 181–540 · Decide what you are
- Path A — Lifestyle consultancy. 5–10 customers, $400K–1.2M ARR, you run forever. No outside capital. Houston gold.
- Path B — Productize and raise. The same workflow, repackaged as self-serve. Raise a small angel round to hire 2 FDEs. Compete with Collide directly.
- Path C — Acquihire / partner. Sell to Collide, Enverus, or Quorum as a workflow module. Your code + your customers = their next-month roadmap.
- Decide which based on the market signal: how fast customers came, what they're asking for next.
14 · Costs, pricing, margins
Cost stack (monthly, year one)
| Line item | Monthly | Notes |
|---|---|---|
| Claude API (Sonnet + Opus mix) | $400–1500 | Scales with customer count and document volume |
| Embedding API / self-hosted | $50–200 | Mostly free after MLX local serve |
| Vector DB (Qdrant Cloud / pgvector on Hetzner) | $50–200 | Self-hosted dirt cheap |
| Observability (Langfuse, Sentry) | $100 | Start free tier |
| Cloud compute (1 small VPS, occasional GPU rent) | $200–500 | Hetzner + Lambda Labs on demand |
| Vanta / Drata (SOC 2 readiness) | $1000 | Start month 4 |
| LLC, accounting, insurance | $300 | E&O insurance becomes essential customer 2+ |
| You (founder draw) | $6000–10000 | Houston cost of living |
| Total | $8–14K/mo | vs. Collide's est. $150–300K/mo |
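The total row can be reproduced directly from the line items. A short sketch that sums the low and high ends of each range; the figures are copied from the table, and the dictionary layout is only for illustration.

```python
# Monthly cost ranges in USD, copied from the cost stack table as (low, high).
COSTS = {
    "Claude API": (400, 1500),
    "Embedding API / self-hosted": (50, 200),
    "Vector DB": (50, 200),
    "Observability": (100, 100),
    "Cloud compute": (200, 500),
    "Vanta / Drata": (1000, 1000),
    "LLC, accounting, insurance": (300, 300),
    "Founder draw": (6000, 10000),
}

low = sum(lo for lo, _ in COSTS.values())
high = sum(hi for _, hi in COSTS.values())
print(f"total: ${low:,}-${high:,} per month")
```

The sums come to $8,100 and $13,800, which round to the $8–14K range in the table. Months before the Vanta or Drata line starts, or while observability is still on a free tier, land below the low end.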
Pricing menu
| Tier | Price | Includes |
|---|---|---|
| Pilot (30 days) | $5,000 one-time | Onsite setup, one workflow, 50 filings |
| Operator | $1,500/mo | Up to 50 wells, monthly filings, JSAs, support SLA |
| Operator+ | $4,500/mo | Up to 200 wells, custom workflows, GIS, dedicated Slack |
| FDE engagement | $15K/mo flat | Half your time on one account, custom build |
Path to $400K ARR
5 Operator+ ($270K) + 4 Operator ($72K) + one $5K pilot/month ($60K). Achievable in 12–18 months from a standing start with disciplined sales execution. Gross margin: ~85%. Net margin (incl. your salary): ~30–40%.
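The arithmetic behind that mix is easy to re-run for other customer counts. A small sketch using the prices from the pricing menu; the function name and the split between recurring and pilot revenue are illustrative choices, not part of the plan.

```python
PRICES = {"operator": 1_500, "operator_plus": 4_500}  # USD per month, from the menu
PILOT_FEE = 5_000  # USD, one-time per pilot


def annual_revenue(operator: int, operator_plus: int, pilots_per_month: int) -> dict:
    """Annualize subscription revenue and add twelve months of pilot fees."""
    monthly = (operator * PRICES["operator"]
               + operator_plus * PRICES["operator_plus"])
    recurring = 12 * monthly
    pilots = 12 * pilots_per_month * PILOT_FEE
    return {"recurring": recurring, "pilots": pilots, "total": recurring + pilots}


mix = annual_revenue(operator=4, operator_plus=5, pilots_per_month=1)
print(mix)
```

For the stated mix this gives $342K of subscription revenue plus $60K of pilot fees, $402K in total. Since the pilot is priced as one-time, the strictly recurring part is the $342K; the rest depends on closing a new pilot every month.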
15 · Risk register
| Risk | Severity | Mitigation |
|---|---|---|
| Collide drops price to floor or open-sources commoditized layers | medium | Compete on white-glove + smaller account fit, not platform |
| Enverus ONE crushes the small-operator segment with a sub-$1K tier | medium | Move fastest in the small-operator gap and entrench before they do |
| RRC changes filing format / EDI spec | medium | Subscribe to RRC bulletins, build adapter pattern, charge for the migration |
| You burn out being a one-person FDE | high | Cap at 3 active customers, hire fractional ops by month 4 |
| Procurement rejects you for lack of SOC 2 | high | Start the Vanta or Drata readiness work early (budgeted from month 4), get Type I within 6 months of starting |
| Hallucinated filing causes customer regulatory issue | existential | Human-in-loop on every submission, E&O insurance, no auto-submit ever |
| Foundation model price war pulls floor out | low | You benefit — lower inference cost. Open weights insulate. |
| Collide acqui-offers and you say yes | good problem | Negotiate from the position of running cash-flow positive |
16 · Sources
Every claim in this document traces back to one of the following. When in doubt, prefer the primary source over the secondary.
Collide.io primary
- Collide.io homepage and schema.org metadata — product positioning, FAQs, 99.4% filing time reduction claim.
- EnergyCapital: Collide rolls out RIGGS AI platform (May 2026) — RIGGS SPE benchmark numbers, Spindletop training rig, McLelland quotes.
- Collin McLelland (@FracSlap) on X — BoA hedge-fund presentation — 111K views, founder reach signal.
- Collide $5M seed announcement (Apr 2025) — Mercury Fund, Sheffield/Quinn/Albin participation.
- Houston InnovationMap on Collide seed.
- EnergyCapital: Mercury Fund leads Collide round.
Competitive landscape — Enverus (the old guard)
- Enverus ONE launch (April 7, 2026) — Astra model, SOC 2 Type II, Flows, "the gap between the companies that move now and the companies that wait is going to be significant" quote.
- Enverus ONE product page.
- Hellman & Friedman completes acquisition of Enverus (June 2021, $4.25B) — PE flip #2.
- Blackstone to acquire Enverus (Aug 2025, $6.5B) — PE flip #3 in seven years.
- PE Insights: H&F launches Enverus sale — deal mechanics and crossed-$500M-ARR threshold.
- Enverus on Software Advice — 2.0/5 rating, 1.0/5 value-for-money and customer-support, public review evidence of pricing opacity and 90-day cancellation surprise.
- Enverus on Slashdot — published $275/user/month starting price.
- Enverus on Glassdoor (598 reviews) — employee commentary on bureaucracy, hierarchy, reorganizations, non-agile development culture.
- Spatial Business Systems joins Enverus (April 2026) — acquisition-driven product expansion.
- Enverus × Tracts.co partnership (April 22, 2026).
- Novi Labs — competing on drilling economics.
- Quorum Software market share.
Foundation-model leveling — petroleum corpus & domain LLMs
- i2k Connect EnRG-LLM — SPE OnePetro corpus of 300,000+ papers/journals/articles used to fine-tune oil & gas LLMs.
- AWS: Customize LLMs with O&G terminology via Bedrock — cloud-vendor evidence that domain customization is now commodity.
- Toward EnergyGPT (arxiv 2509.07177) — academic survey of energy-domain LLM specialization.
- Building domain-specific LLMs in 2026 — methodology overview.
- LLMs and foundation models in petroleum engineering & geoscience (preprint).
Forward-Deployed Engineer model & vertical AI
- Why Anthropic and OpenAI Are Copying Palantir's FDE Playbook.
- What is a Forward Deployed Engineer (MarkTechPost, May 2026) — 800% job-posting growth, $238K avg comp.
- OpenAI's DeployCo — the moat from workflows no lab can simulate.
- a16z: The Palantirization of everything.
- Comprehensive analysis of Palantir FDE model.
- Indie Hacker FDE Playbook (Superframeworks).
- Menlo Ventures: The opportunity in vertical AI.
- VC Cafe: Vertical AI in 2026 — the good, the bad, the ugly.
- Bessemer: Building Vertical AI playbook.
Solo toolstack & open models
- Hermes 4 by Nous Research — 14B / 70B / 405B open weights.
- Hermes 4 release coverage.
- llama.cpp vs MLX vs Ollama vs vLLM (Apple Silicon, 2026).
- Apple M5 MLX — 70B LLMs go portable.
- vllm-mlx — OpenAI/Anthropic-compatible server for Apple Silicon.
- MLX Apple Silicon AI Dev Stack — fine-tune LLMs on Mac.
- Portuguese petroleum word embeddings (PetroVec) — precedent for domain-tuned embeddings.
- Petrovec GitHub repository.
Persona profiles
- Digital Wildcatters — community, podcasts, Fuze conference.
- Fuze conference page.
- Digital Wildcatters $2.5M seed announcement.
- Co-founders interview — Jake Corley & Collin McLelland.
- Allen Gilmer — Crunchbase profile.
- Allen Gilmer at MaScience.
- OGAC names Gilmer Principal Partner (Apr 2025).
- Shale Magazine profile of Allen Gilmer.
- Bloomberg — DrillingInfo rebrand to Enverus.
- Sable Offshore corporate update.
- Sable Offshore Form 8-K (FY2026).
- Sable Offshore & Las Flores Pipeline timeline.
- Sable Offshore restart under Defense Production Act.
- BOEM Pacific OCS revised development plans.
- BETA Land Services — homepage.
- Bryan Hanks LinkedIn profile.
- Bryan J. Hanks, CPL — BETA team page.
- BETA Land Services on LinkedIn.
Texas RRC, filings, regulatory
A · Persona deep dive: Digital Wildcatters & the operator class
The §07 verdict — Collide's real moat is distribution, not architecture — only matters if you can name the people who make distribution real. This addendum profiles four reference points: the Digital Wildcatters flywheel that birthed Collide, plus three named operators / executives whose patterns are worth modeling. For each, the question is the same: where does the FDE model fit when you're standing where they're standing?
The four-way fit framework
For each persona, evaluate four roles:
- As a buyer — would they pay for your FDE engagement?
- As an advisor — what would they teach you?
- As a partner / distribution — would they help you reach customers?
- As an acquirer — would they (or someone like them) eventually buy you?
A.1 · Digital Wildcatters: anatomy of the flywheel
You can't understand Collide without understanding the machine that made it. Digital Wildcatters was founded as a podcast in 2019 by Collin McLelland (ex-roughneck, "@FracSlap" on X) and Jake Corley. Chuck Yates, an ex-$8B energy fund manager ("fired in April 2020"), joined and lent industry credibility. Collide was spun out of the community, not built into it.
How the flywheel actually spins
- Podcast as top-of-funnel. Every episode is a free industry interview. Operators come for the stories, stay because they like the hosts.
- Community as middle. The 9,000-member network is where the actual relationships form — deal flow, hiring, vendor referrals.
- Events as conversion. Fuze and Energy Tech Night convert relationships into pipeline. Sponsors pay; vendors prospect; founders pitch.
- Collide as monetization. The AI product sells back into the community that already trusts the brand. Pre-built warm market.
What this means for your build
- You can't replicate the flywheel in 12 months. Don't try. A podcast takes 3 years to build an audience, and McLelland/Yates have a 7-year head start.
- You can plug into the flywheel. Be on Energy Tech Night as a startup pitcher. Sponsor a single Fuze track. Get on a podcast.
- You can build a counter-flywheel in a sub-niche they don't cover. Compliance officers? Landmen? Lease analysts? Pick a verb the Wildcatters don't own.
- You can use their members as customer discovery. The community is searchable, the conversations are public, the pain points are documented.
A.2 · Allen Gilmer: the DrillingInfo / Enverus playbook
If Collide has a north star, it's Allen Gilmer. He's done a version of this exact arc, only with structured data instead of LLMs, two decades earlier.
The pattern
- 1999: Co-founded DrillingInfo in Austin with Mark Nibbelink. Started by physically collecting Texas drilling permits daily and turning them into a searchable database.
- 2010s: Layered analytics, GIS, and well-economics onto the permit core. Became the de-facto E&P data platform.
- 2018: Acquired by Genstar Capital (San Francisco PE).
- 2019: Rebranded to Enverus on the 20-year anniversary. Now the energy industry's data & AI exec layer.
- 2021: Gilmer retired from the Enverus board.
- 2025: Joined Oil & Gas Asset Clearinghouse (OGAC) as Principal Partner. Active at MaScience. Also runs Tiki Tāne Pictures (film production, with industry vets) — he's deliberately diversified out of pure energy.
What he'd teach you (and what Collide already learned)
Data acquisition is the moat. Everything else — the UI, the model, the analytics — can be replaced. But if you're the only one who systematically captures the daily permit, the daily filing, the daily completion report — you become indispensable in seven years, not seven months. — Gilmer playbook, paraphrased from his public commentary
The four-way fit
| Role | Fit | Reasoning |
|---|---|---|
| Buyer | unlikely direct | He's an investor/advisor now, not an operator buying tooling. But he'd be the kind of person who endorses you to operators. |
| Advisor | ideal | He has lived the data-to-platform arc. A 30-minute call with Gilmer about "should I build the corpus first or the model first" would be worth more than 100 hours of YouTube. |
| Partner / distribution | via OGAC | OGAC's clearinghouse customer base is upstream sellers/buyers — an adjacent audience to whoever buys your filing automation. |
| Acquirer | not him; his portfolio peers | Enverus itself is the obvious strategic acquirer of a workflow-AI tuck-in. Genstar's playbook is to consolidate. Build with that exit logic in mind. |
A.3 · Jim Flores @ Sable Offshore: the single-asset political operator
Flores is a different category entirely. He runs Sable Offshore Corp (NYSE: SOC), HQ Houston, with a single concentrated asset: the Santa Ynez Unit (SYU) in federal waters offshore California. Platform Harmony was producing ~22,000 gross bopd as of March 2026 after restart.
Why Sable is unlike a Texas E&P
- Federal waters, not state. Regulator is BOEM (Bureau of Ocean Energy Management) + BSEE for safety, not the Texas RRC.
- Political asset. Restart was contested by California Coastal Commission; required Defense Production Act invocation by the Trump Administration in March 2026. DOJ hearing on Consent Decree modification set for June 1, 2026 in C.D. Cal.
- Pipeline blocked. The Las Flores Pipeline System remains under dispute. Sable submitted a revised plan to BOEM in October 2025 proposing an offshore storage & treating (OS&T) strategy with shuttle tankers as a workaround.
- Single-asset risk profile. One platform's compliance posture is the entire company's compliance posture.
What an FDE engagement at Sable would look like
Sable's compliance burden is high-stakes, low-volume, and bespoke. This is the opposite end of the spectrum from the W-10/G-10 high-volume Texas RRC wedge. The right product for Sable isn't a filing automation pipeline — it's:
- A regulatory document understanding system that ingests the Consent Decree, NEPA filings, BOEM correspondence, CCC briefs, and surfaces obligations + deadlines.
- A stakeholder mapping tool that tracks every party (DOJ, BOEM, CCC, Santa Barbara County, NGOs, plaintiff coalitions) and what they've said publicly.
- A scenario modeling engine for "if the pipeline restarts vs. if we go shuttle tanker, what's the production curve and the compliance trail."
The four-way fit
| Role | Fit | Reasoning |
|---|---|---|
| Buyer | narrow, high-value | Sable could pay $250K–1M annually for a high-quality regulatory-intelligence FDE engagement. But this is not your wedge product — it's a sidecar consulting line at best. |
| Advisor | limited | Flores's deal-making is uniquely his. The lessons don't generalize to a solo software founder. Useful as a case study, not a mentor pattern. |
| Partner / distribution | no | Sable is single-asset. Their network isn't a fit for a Texas RRC compliance product. |
| Acquirer | no | Operators don't buy software companies. Wrong logic chain. |
A.4 · Bryan Hanks @ BETA Land Services: the existing FDE business
This is the most strategically interesting profile of the four. BETA Land Services is already a Forward-Deployed Engineer business — they just use landmen instead of software engineers. The model isn't theoretical for Hanks. He's been running it for 30+ years.
What BETA actually does
Land & lease acquisition, due diligence, abstracting, title research, title curative work, right-of-way / pipeline projects, and increasingly: solar, wind, transmission line, carbon sequestration, and battery storage. They sell completed landwork. Customer hands them an acreage block; BETA returns a clean title, an executed lease bundle, a ROW package.
This is exactly the value proposition you'd want for your AI-powered version: completed filings, not a platform. The shape of the offering is identical. Only the technology stack changes.
The strategic question
The question for you is whether BETA is a competitor, a customer, a partner, or a template. The honest answer is: all four, depending on framing.
- Competitor framing: If you sell title-curative AI to operators, BETA's landmen lose hours. They have inertia, relationships, and brand on their side. You have margin and speed.
- Customer framing: Sell to BETA, not around them. Your AI becomes their internal force multiplier. Same landmen, 3x throughput. They keep the customer relationship, you take a per-acre fee.
- Partner framing: White-label your filing automation as "BETA Compliance" or "BETA Digital." They distribute, you build. Their existing customers get an instant upgrade.
- Template framing: Even if you never talk to Hanks, BETA's service motion — embedded specialists, fixed-scope deliverables, charge by the project — is the right model to copy.
The four-way fit
| Role | Fit | Reasoning |
|---|---|---|
| Buyer | very strong | BETA already pays for tooling that makes their landmen more productive. The pitch writes itself: "30% more throughput on title work, same headcount." Test this in 2 months. |
| Advisor | underrated | Hanks has lived the FDE motion at scale. He's seen what wins and loses in the field for 41 years. Worth 10x more than any VC partner on this specific question. |
| Partner / distribution | structural fit | BETA's customers are precisely the mid-size operators you want. They sit between you and them today. Be the AI engine inside their service. |
| Acquirer | credible at scale | A profitable land services company expanding into adjacencies (solar, carbon) needs digital capability. Strategic acquisition logic is real if you reach $1M ARR. |
A.5 · Cross-walk: who matters for what
One table, the operative summary of the four profiles.
| Persona | Best framing | What they unlock | How to engage |
|---|---|---|---|
| Digital Wildcatters (community + Collide) | Distribution flywheel | 9,000-member warm market, Fuze access, podcast reach | Sponsor Energy Tech Night, pitch as a startup, get on a podcast, lurk in the community |
| Allen Gilmer (DrillingInfo / Enverus / OGAC) | Advisor + acquisition logic | Strategic playbook from the founder who did this 20 years ago in structured data | Warm intro via Wildcatters, OGAC, or Austin energy scene; one 30-min advisory call |
| Jim Flores @ Sable (Sable Offshore Corp) | High-CV sidecar customer (eventually) | Regulatory-intelligence niche if you ever want to expand off the Texas wedge | Don't chase. Park as a future opportunity once you have a SOC 2 letter and 3 case studies |
| Bryan Hanks @ BETA (BETA Land Services) | Partner / distribution / acquirer | Existing FDE business with customer base and operations; structural fit for AI tooling | Cold email after customer #1 lands. Pitch as "internal throughput multiplier." Houston/Lafayette is a 4-hour drive. |
The deeper takeaway
These four profiles span the full operator class:
- Digital Wildcatters is the distribution archetype — how you reach the customer.
- Allen Gilmer is the founder archetype — how you build the long game.
- Jim Flores is the edge-case customer archetype — how you don't get distracted.
- Bryan Hanks is the scaled FDE archetype — how you avoid hiring twenty engineers by partnering with someone who already has them.
B · Lean Informatics: on-prem architecture & founder fit
This addendum specifies the reference architecture for the deployment topology where customer data never leaves the customer’s perimeter. It is one option in a customer-centric menu — alongside customer-owned colo, customer-managed hyperscaler tenancy, and hybrid — not Lean Informatics’ identity. We meet the same security standards (SOC 2 Type II, ISO 27001, KMS, audit logging) across every topology because those standards are industry-commoditized in 2026. We document this option in depth because (a) the founder’s prior art makes it unusually well-matched to LI’s delivery DNA, and (b) for the sovereignty-sensitive subset of upstream workloads — completion designs, M&A diligence, AFE pricing models, JOA-sensitive negotiation data — on-prem remains the buying trigger.
B.1 · Founder prior art: what translates
Lean Informatics' founder previously built and deployed a mass-notification platform on top of FM RBDS (Radio Broadcast Data System), used by DoD, state homeland security offices, counties, sheriff's offices, and fire warning networks. Customers in that domain do not tolerate optimistic engineering. The technical and operational habits that come out of those deployments map directly onto the on-prem AI play described here.
The prior architecture, abridged
- Data plane: Satellite-fed message origination → broadcast via terrestrial FM transmitters → received at distributed endpoints. End-to-end vertical integration.
- Protocol layer: RBDS Group 7A used to carry addressed, opaque payloads on the 57 kHz subcarrier. To an outside sniffer the bitstream is structured but meaningless — the codebook for PIN + service-code addressing lives in the receiver.
- Edge logic: Receivers held the decode logic and automatically re-locked onto the next FM tower carrying their PIN if the current carrier dropped. Resilience was a property of the endpoint, not of central infrastructure.
- Fail-over: Multi-source (satellite primary, multiple terrestrial transmitters secondary). No single point of failure between origination and endpoint.
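The edge-logic and fail-over bullets describe a selection rule that lives entirely in the receiver. The following is an illustrative sketch of that rule only, not the original receiver firmware: the `Carrier` type, its fields, and the PIN strings are hypothetical stand-ins for whatever the endpoint actually tracked.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Carrier:
    frequency_mhz: float
    pins: Set[str]       # PINs this transmitter is currently carrying
    locked: bool = True  # False once the carrier has dropped


def next_carrier(pin: str,
                 carriers: List[Carrier],
                 current: Optional[Carrier]) -> Optional[Carrier]:
    """Stay on the current carrier while it is usable; otherwise re-lock."""
    if current is not None and current.locked and pin in current.pins:
        return current
    for candidate in carriers:
        if candidate.locked and pin in candidate.pins:
            return candidate
    return None  # nothing reachable: the endpoint keeps scanning


towers = [
    Carrier(98.1, {"1234"}),
    Carrier(101.5, {"1234"}),
    Carrier(104.3, {"9999"}),
]
active = next_carrier("1234", towers, None)
towers[0].locked = False  # simulate the first tower dropping
active = next_carrier("1234", towers, active)
print(active.frequency_mhz if active else "scanning")
```

Nothing in the rule consults a central service, which is the property the on-prem appliance design carries over: the decision is made where the data is received.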
What that translates to in the AI architecture
| RBDS pattern | On-prem AI analog | Why it matters here |
|---|---|---|
| Vertical stack ownership (sat → transmitter → receiver) | Vertical stack ownership (appliance → model → output portal) | You already know how to ship the whole vertical. Most software founders only know the top of the stack. |
| Group 7A opaque addressing | Token/codebook layer at the customer edge | Same idea: meaning lives at the endpoint, not in the transport. Sniffers see structured noise. |
| Receiver-side logic + auto fail-over | Appliance-side routing logic + paired failover unit | The smarts live where the data lives. Central infrastructure does not need to be trusted. |
| Satellite + terrestrial redundancy | Local inference primary + optional cloud burst for non-sensitive secondary | Two independent paths, customer-controlled which is used per workload. |
| DoD/HS/county/fire procurement experience | Same procurement DNA: audit trails, chain-of-custody, named operator accountability | You don't have to learn this from scratch. Most of the AI-for-O&G field is not staffed for this conversation. |
B.2 · The architecture
Customer-Owned Inference Appliance + Lean Informatics FDE Operation. Each customer gets a dedicated unit. No cross-customer data pooling. Lean Informatics holds no raw customer data on its own infrastructure at any time.
The appliance — single GPU node + tokenization layer
A 2U or 4U server lives in the customer's server room, an on-site closet, or a customer-paid cage at a Tier-II colo of the customer's choosing. It holds: raw documents, tokenization codebook, embeddings, vector index, knowledge graph, agent runtime, audit log. This box is the trust boundary. No raw data leaves it.
Outbound: telemetry, attestations, signed updates only
From appliance to Lean Informatics: signed health telemetry, model attestation hashes, log integrity proofs, and signed software-update receipts. No customer data, no embeddings, no document content. The channel is a WireGuard tunnel with mutual TLS on the traffic inside it, plus a customer-controlled kill switch.
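As a rough illustration of what a "model attestation hash" report could look like, the sketch below hashes each deployed artifact and compares the result with a release manifest. The function names and the manifest shape are hypothetical, and a production system would also sign the report with an asymmetric key, which is omitted here to stay within the standard library.

```python
import hashlib
import hmac


def artifact_digest(data: bytes) -> str:
    """SHA-256 of one artifact (model weights, config, tokenizer files)."""
    return hashlib.sha256(data).hexdigest()


def attestation_report(artifacts: dict[str, bytes]) -> dict[str, str]:
    """Build a report of hashes only: no artifact bytes, no customer content."""
    return {name: artifact_digest(blob) for name, blob in sorted(artifacts.items())}


def matches_release(report: dict[str, str], release_manifest: dict[str, str]) -> bool:
    """True only if the appliance runs exactly the artifacts in the manifest."""
    if report.keys() != release_manifest.keys():
        return False
    # compare_digest avoids leaking where two hex strings first differ.
    return all(hmac.compare_digest(report[n], release_manifest[n]) for n in report)
```

Only the digests would cross the tunnel, which is consistent with the "no customer data, no embeddings, no document content" rule for the outbound channel.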
Model lab + signed-update bus + FDE workstation
Houston-side: model fine-tuning against synthetic + customer-anonymized eval sets, signed model+config builds, FDE remote-ops workstation. Lean Informatics never holds a customer's raw documents. If subpoenaed, LI has nothing to produce. That is a feature, not an inconvenience.
Operating modes per customer
- Air-gapped: Appliance has no network path to LI. FDE flies in monthly for updates via signed offline media. Highest sovereignty, slowest iteration.
- DMZ / kill-switch: Outbound-only control channel, customer-controlled physical disconnect. Default mode for most customers.
- Hybrid burst: Local inference primary, optional cloud burst (LLM API) for non-sensitive secondary workloads, gated by customer policy per workflow. Used only when customer explicitly opts in.
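The three operating modes reduce to a small routing rule. The sketch below is one way to express it, assuming hypothetical names (`Mode`, `Workflow`, `route`) and a per-workflow opt-in flag set by the customer; it is not a description of shipped code.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    AIR_GAPPED = "air_gapped"
    DMZ = "dmz"
    HYBRID_BURST = "hybrid_burst"


@dataclass(frozen=True)
class Workflow:
    name: str
    sensitive: bool
    cloud_opt_in: bool = False  # set by customer policy, per workflow


def route(mode: Mode, wf: Workflow) -> str:
    """Return 'local' or 'cloud' for one workflow.

    Cloud is reachable only in hybrid-burst mode, only for non-sensitive
    workloads, and only when the customer has explicitly opted in.
    """
    if mode is Mode.HYBRID_BURST and not wf.sensitive and wf.cloud_opt_in:
        return "cloud"
    return "local"
```

Defaulting to local inference means a misconfigured or missing policy fails closed rather than sending a workload off the appliance.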
B.3 Reference appliance BOM
One appliance generation per customer cohort. Designed to be unremarkable hardware that an enterprise IT department recognizes and a county sheriff's IT could maintain.
| Component | Reference part | Notes |
|---|---|---|
| Chassis | Supermicro 4U GPU server (AS-4125GS-TNRT or similar SP5 platform) | Standard rack, dual PSU, IPMI |
| CPU | 2× AMD EPYC 9354 (32C) | Headroom for tokenizer + retrieval + agent pipelines |
| GPU (option A) | 1× NVIDIA L40S (48 GB) | Runs 30–70B quantized; ~$8–10K street |
| GPU (option B) | 2× NVIDIA H100 PCIe (80 GB) | Heavy inference + light fine-tune; ~$50–60K street |
| RAM | 512 GB DDR5 ECC | Knowledge-graph + vector index resident |
| Storage (data) | 2× 7.68 TB NVMe (LUKS, mirror) | Self-encrypting, FIPS 140-3 SED preferred |
| Storage (OS) | 2× 480 GB NVMe (mirror, encrypted) | Read-only mount post-boot |
| Network | Dual 10/25 GbE + IPMI | Out-of-band on isolated VLAN |
| Security | TPM 2.0, Secure Boot, IPMI-disabled-by-default | Measured boot, attestation chain |
| Key mgmt | YubiHSM 2 or external HSM (customer choice) | Document encryption keys never leave HSM |
| Failover unit | Identical spare in cold standby | Manual cutover; 30-min RTO. Optional. |
Per-unit cost (Lean Informatics side)
- BOM (Option A, L40S): ~$28–38K street, ~$22–30K negotiated direct.
- BOM (Option B, dual H100): ~$95–120K street, ~$75–95K negotiated.
- Burn-in + provisioning + customer-specific imaging: ~$2–4K of LI time per unit.
- Shipping & installation logistics: ~$1–2K (white-glove freight, on-site time).
B.4 Data plane & opacity layer
The opacity discipline from the RBDS days applied to AI:
Tokenization at ingest
- Customer-identifying entities — well names, API numbers, lease IDs, operator names, person names — pass through a tokenization layer on the appliance before any model sees them.
- The codebook lives on the appliance HSM. Lean Informatics never possesses it.
- Even if a model output or log were exfiltrated, identifying entities are opaque without the local codebook. This is the same principle as RBDS Group 7A: structured data, meaningless without the receiver-side codebook.
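A minimal sketch of the tokenization idea follows, assuming a keyed hash (HMAC-SHA256) as the token generator and a regex that matches API well numbers written with dashes. The `Codebook` class, the `ENT_` prefix, and the pattern are illustrative assumptions; a real ingest layer would need entity recognition for names, leases, and operators, and would keep the key inside the HSM.

```python
import hashlib
import hmac
import re

# Assumed format: API well numbers written as 42-123-45678, optionally
# followed by -00 and -00 sidetrack/event suffixes.
API_NUMBER = re.compile(r"\b\d{2}-\d{3}-\d{5}(?:-\d{2}){0,2}\b")


class Codebook:
    """Maps sensitive entities to opaque tokens; the reverse map stays local."""

    def __init__(self, secret: bytes):
        self._secret = secret  # in this design the key would never leave the HSM
        self._reverse: dict[str, str] = {}

    def token_for(self, entity: str) -> str:
        # Keyed hash: deterministic per customer, meaningless without the key.
        digest = hmac.new(self._secret, entity.encode("utf-8"), hashlib.sha256)
        token = "ENT_" + digest.hexdigest()[:12]
        self._reverse[token] = entity
        return token

    def tokenize(self, text: str) -> str:
        return API_NUMBER.sub(lambda m: self.token_for(m.group(0)), text)

    def detokenize(self, text: str) -> str:
        for token, entity in self._reverse.items():
            text = text.replace(token, entity)
        return text
```

Because the same entity always maps to the same token under one key, the model can still reason about "the same well" across documents, while an exfiltrated output shows only opaque identifiers.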
What crosses to Lean Informatics (and what does not)
| Data | Crosses? | Why |
|---|---|---|
| Customer documents (PDF, CSV, SCADA exports) | no | Stay on appliance |
| Embeddings of customer documents | no | Reversible — treated as sensitive |
| Knowledge-graph nodes/edges | no | Stay on appliance |
| Model weights (LI-provided) | yes, signed | Pushed from LI to appliance, attested |
| Health metrics (CPU, GPU, disk %) | yes | For SLA + remote diagnosis |
| Tokenized eval-set statistics (counts, accuracy on tokenized fixtures) | yes, with consent | Opt-in. Used to improve next model rev. |
| Raw error traces with content | no | Logs are redacted on appliance before any export |
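The table above can be read as a deny-by-default export policy. The sketch below encodes the appliance-to-LI direction only (model weights flow the other way, from LI to the appliance, so they are not listed). The enum and function names are hypothetical.

```python
from enum import Enum


class Kind(Enum):
    DOCUMENT = "document"
    EMBEDDING = "embedding"
    GRAPH = "graph"
    HEALTH_METRIC = "health_metric"
    EVAL_STATS = "eval_stats"
    ERROR_TRACE = "error_trace"


# Deny by default: only kinds listed here may ever leave the appliance.
ALWAYS_ALLOWED = {Kind.HEALTH_METRIC}
CONSENT_REQUIRED = {Kind.EVAL_STATS}


def may_export(kind: Kind, customer_opted_in: bool = False) -> bool:
    """Decide whether one payload kind may cross to Lean Informatics."""
    if kind in ALWAYS_ALLOWED:
        return True
    if kind in CONSENT_REQUIRED:
        return customer_opted_in
    return False
```

Any new payload kind added later is blocked until someone deliberately places it in one of the two sets, which matches the posture described for embeddings and raw error traces.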
B.5 Operational model (FDE-as-service)
The Forward-Deployed Engineer is no longer optional — it's the unit of value delivery.
Engagement phases
- Scoping (Week 0–2): NDA, SOW, single workflow defined. Tabletop walkthrough with customer's IT, ops, and (when applicable) compliance.
- Provisioning (Week 2–4): Appliance imaged at LI Houston bench. Customer-specific tokenization codebook generated and burned into HSM. Burn-in + integration tests against synthetic data.
- Onsite deployment (Week 4–5): FDE flies in. Rack-and-stack, network handoff, customer-witnessed key ceremony, smoke tests.
- Workflow onboarding (Week 5–8): FDE resident or daily-onsite. Real customer documents ingested onto appliance, never leaving. First production outputs delivered with customer sign-off on each.
- Steady state (Month 3+): Remote operation via audited tunnel. Monthly onsite cadence. Quarterly key rotation. Annual physical security audit jointly with customer IT.
Audit posture from day one
- Every action by LI's FDE on the appliance is logged with cryptographic chain-of-custody.
- Customer's SIEM (if they have one) receives a live audit feed. If they don't, the appliance generates a signed weekly audit summary.
- Customer holds the kill switch. Severing the control channel does not impair the appliance — it just freezes updates and remote support.
- LI carries E&O + cyber liability insurance from day one (~$15–25K/yr at the appropriate coverage).
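One common way to get a tamper-evident action log is a hash chain, where each entry commits to the one before it. The sketch below shows that pattern under assumed field names (`seq`, `actor`, `action`, `prev`, `hash`); it illustrates the idea of cryptographic chain-of-custody and does not describe a specific LI implementation. A signed weekly summary would additionally sign the latest hash.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], actor: str, action: str) -> dict:
    """Append one FDE action, chained to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"seq": len(log), "actor": actor, "action": action, "prev": prev_hash}
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edit, reorder, or deletion breaks the chain."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        body = {k: entry[k] for k in ("seq", "actor", "action", "prev")}
        if entry["seq"] != i or entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True
```

The customer's SIEM can run `verify_chain` on its own copy of the feed, so the integrity check does not depend on trusting LI.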
B.6 Pricing & unit economics
Customer-facing menu
| Tier | One-time | Recurring | What's included |
|---|---|---|---|
| Appliance Lite (L40S) | $25K onboarding | $5,500/mo | Single L40S unit, one workflow, monthly onsite, business-hours support |
| Appliance Pro (dual H100) | $45K onboarding | $11,000/mo | Dual H100, up to 3 workflows, onsite every two weeks, 24/7 critical-issue line |
| Air-gap Compliance | $60K onboarding | $13,500/mo | Pro tier + paired failover unit + quarterly physical security audit + customer-witnessed key ceremonies |
| Custom (DoD/HS/agency style) | quoted | quoted | Statement of work, SCIF-compatible options, classification handling |
Lean Informatics unit economics (per Appliance Pro customer)
- Year-1 revenue: $45K onboarding + $132K recurring = $177K.
- Year-1 cost of delivery:
  - Hardware (amortized over 3 years): ~$28–32K/yr
  - FDE time (founder, ~30% allocation): equivalent ~$45–60K/yr
  - Travel (onsite cadence): ~$8–12K/yr
  - Cyber insurance allocation: ~$3K/yr
  - Tooling, monitoring, software allocation: ~$4K/yr
- Y1 gross margin: ~37–50% on the line items above (delivery cost of ~$88–111K against $177K revenue). Improves to ~65–70% in year 2 (no onboarding cost, hardware partially amortized, FDE allocation drops as ops becomes routine).
- Three customers at Pro tier = ~$530K Y1 revenue (~$396K of it recurring), ~$1.0–1.2M run-rate by Y2 if onboarding spreads.
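The Appliance Pro arithmetic can be rechecked directly from the listed line items. The sketch below uses only the figures stated in this section; the variable names are illustrative. On these inputs, Year-1 delivery cost sums to $88–111K and gross margin lands at roughly 37% to 50%.

```python
# Appliance Pro, Year 1: onboarding fee plus twelve months of recurring revenue.
ONBOARDING = 45_000
MONTHLY = 11_000
REVENUE_Y1 = ONBOARDING + 12 * MONTHLY

# (low, high) annual cost per line item, as listed above.
COSTS_Y1 = {
    "hardware_amortized": (28_000, 32_000),
    "fde_time": (45_000, 60_000),
    "travel": (8_000, 12_000),
    "cyber_insurance": (3_000, 3_000),
    "tooling": (4_000, 4_000),
}


def gross_margin(revenue: float, cost: float) -> float:
    return (revenue - cost) / revenue


low_cost = sum(lo for lo, _ in COSTS_Y1.values())
high_cost = sum(hi for _, hi in COSTS_Y1.values())

print(f"Y1 revenue: ${REVENUE_Y1:,}")
print(f"Y1 cost of delivery: ${low_cost:,} to ${high_cost:,}")
print(f"Y1 gross margin: {gross_margin(REVENUE_Y1, high_cost):.0%} to "
      f"{gross_margin(REVENUE_Y1, low_cost):.0%}")
print(f"Three Pro customers, Y1 revenue: ${3 * REVENUE_Y1:,}")
print(f"Three Pro customers, recurring only: ${3 * 12 * MONTHLY:,}")
```

Separating total Year-1 revenue from the recurring portion matters because the onboarding fee does not repeat in Year 2.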
B.7 Wedge realignment under the on-prem model
The on-prem architecture is more expensive and slower to ship than the cloud wedge in §12. It only makes sense for workflows where data sovereignty is the buying trigger. That is not the Texas RRC filing wedge.
Two-track strategy
| Track | Wedge | Architecture | Purpose |
|---|---|---|---|
| Track 1 — cash flow | Texas RRC W-10 / G-10 compliance for mid-size operators (§12) | Cloud-native (Claude + pgvector + Hetzner) | Public data, fast deploy, 30-day sales cycle. Funds the company. |
| Track 2 — defensibility | Confidential workflows: completion design optimization, geosteering, M&A diligence, lease portfolio strategy, subsurface modeling | Lean Informatics appliance (this addendum) | High contract value, sticky, true moat. Competes against Collide/Enverus on sovereignty, not features. |
Recommended Track-2 first wedge candidates, ranked
- Confidential M&A diligence for upstream transactions. Buyer-side data room ingestion, well-by-well economics, lease overlap, environmental liability surface. Tied to BETA Land Services partnership in §A.4 — they handle 120+ deals annually. Sovereignty is non-negotiable because every party in a deal room is a competitor of the others.
- Completion design optimization for Permian/Eagle Ford independents. Operator's frac design + offset performance + lateral spacing — the actual secret sauce. No operator will put this in Collide or Enverus.
- Geosteering interpretation + subsurface modeling. Live interpretation during drilling operations. Latency-sensitive (favoring edge inference) and IP-sensitive (favoring on-prem).
- Public-safety adjacencies. Wildland-urban-interface (WUI) fire risk to operator assets, hurricane evacuation logistics for offshore crews, emergency-management integration. Founder's prior network actually opens this category — not a stretch.
B.8 Verdict — when this architecture wins and when it loses
Wins
- Where the customer's well/completion/lease data is treated as competitive IP. Mid-size Permian and Eagle Ford operators competing with majors. Yes.
- Where the buyer's compliance/risk officer signs off, not just the engineer. The on-prem story wins that signoff.
- Where the founder's DoD/HS/agency procurement experience is the differentiator. No vertical-AI competitor in O&G credibly has it.
- Where the FM RBDS prior art demonstrates "this team has shipped sovereignty-first systems before." That story is unfakeable.
Loses
- For Texas RRC filings, where the data is public anyway. Cloud wedge wins.
- For any wedge where time-to-first-value < 30 days is the buying criterion. On-prem can't beat SaaS on speed.
- For customers below ~$300K revenue from this single workflow. Onboarding cost amortizes wrong.
- If Lean Informatics tries to run this and the cloud wedge at solo headcount without sequencing. Sequence Track-1 first; Track-2 starts after first paying Track-1 customer.
Decision criteria for committing to Track 2
- One Track-1 customer live and reference-able (≈Month 4).
- Two qualified Track-2 prospects with budget authority identified (target: one BETA-channel, one direct operator).
- First appliance BOM purchased only after a signed LOI on the first Track-2 customer.
B.9 The cross-vertical founder thesis
Collide's marketing leans on a specific implicit claim: that oil & gas software fails when built by outsiders, because outsiders automate workflows without understanding why those workflows exist. McLelland says this plainly on the company blog. It is a self-serving framing. It also doesn't survive contact with the venture data.
What the evidence actually says
The best vertical SaaS companies of the last fifteen years were largely built by cross-vertical founders:
| Company | Vertical | Founder background |
|---|---|---|
| Veeva Systems | Pharma CRM | Peter Gassner — Salesforce, not pharma |
| Toast | Restaurant POS | Three founders — none were restaurant operators |
| Procore | Construction mgmt | Tooey Courtemanche — real estate, not construction |
| Carta | Cap tables / equity | Henry Ward — finance generalist, not equity admin |
| Stripe | Payments infra | Collison brothers — outsiders to payments |
| Snowflake | Data warehouse | Dageville, Cruanes (both ex-Oracle); early CEO Muglia (ex-Microsoft) — outside the incumbent OLAP world |
| Datadog | Cloud monitoring | Pomel, Lê-Quôc — ex-Wireless Generation, an education company |
| Persefoni | Carbon accounting | Founders from finance, not climate science |
What is — and isn't — truly vertical-specific
- Compliance frameworks are universal. SOC 2 is SOC 2. GAAP is GAAP. SOX, audit posture, E&O insurance, evidence handling, chain-of-custody — portable across industries. A founder who has shipped to DoD doesn't need to relearn audit discipline because the customer says "petroleum" instead of "homeland security."
- Cyclic economies share structure. Capex-intensive, M&A-driven, regulated industries on commodity cycles all behave the same way at the business-model layer. The shape of the cycle in upstream O&G is more dramatic than the FM/notification cycle, but the dynamics — cost-cutting in down years, capex unlocked in up years, M&A peaks at cycle bottoms — are recognizable.
- Vernacular and culture are learnable. Six to twelve months of intentional customer discovery, time at industry events, and FDE onsite hours gets a startup veteran functionally fluent. This is empirically how most successful vertical SaaS founders learned their domain.
- What insiders genuinely have: a head start on the first 5–10 customer conversations, faster intuition for "this won't work in the field" failure modes, and existing brand. None of these are insurmountable. All three can be addressed by partnering with an industry-veteran advisor and through the FDE motion.
The Lean Informatics reframe
Stated honestly, Lean Informatics' founder profile is:
- Cross-vertical pattern recognition. Having shipped to DoD, state homeland security, county and sheriff agencies, and fire-warning networks — each with its own vernacular, procurement quirks, and political surface — is exactly the muscle that lets a founder enter a new vertical and avoid the rookie traps faster than someone who only knows one vertical.
- Startup operational discipline. Going from zero to one is its own profession. Founders who have done it before do it faster on the second and third attempt regardless of vertical.
- Vertical-integration experience. Building a full stack from satellite through transmitter to receiver firmware is a rare skill set in the AI-for-O&G field. Most competitors stop at the application layer.
- Autodidactic learning posture. The single trait that most reliably predicts cross-vertical success. Learning a new vocabulary, reading the trade publications, attending the conferences, sitting on the wellsite — this is a 6–12 month effort, not a 6–12 year one.
The honest read on McLelland's framing: it's true that outsiders who skip customer discovery build operationally useless tools. It's not true that outsiders who do the work can't compete. The history of vertical SaaS is mostly outsiders who did the work.
C Lean Informatics — vision & business plan
Sections 1–15 plus Addenda A–B were the sector research. Section C is the operating plan that follows from it. Lean Informatics is the protagonist. Collide is a peer in the sector. Enverus is the cloud incumbent. The plan is bootstrap-first, founder-led, and aims at sector dominance on a multi-decade timeline.
The Epic Systems play — the disciplines, not the deployment topology
The strategic posture is Epic Systems for upstream oil & gas. Judy Faulkner founded Epic in 1979 with $70K in Madison, Wisconsin. Refused venture capital. Refused to go public. Ran 45 years of bootstrap growth to roughly $5B in annual revenue. Owns ~40% of US hospital EHR. Mayo Clinic runs on Epic. Kaiser runs on Epic. Cleveland Clinic runs on Epic. Once an institution deploys Epic, the lock-in is decades. That is the model.
What we copy from Epic is the operating discipline, not the deployment topology. Epic itself has evolved to a hybrid posture — Hyperdrive on Microsoft Azure, Azure Virtual Desktop, Azure Large Instances — without abandoning what made Epic Epic: bootstrap, implementers before salespeople, customer-deep workflows, refuse easy SaaS-ification, refuse early acquisition, annual user summit, geographic concentration, decades-long customer lock-in. That set of disciplines is what made Epic dominant. The on-prem-only piece was an artifact of the 1979 starting point, not the source of the moat. Lean Informatics is built around the disciplines; the deployment topology is whatever the customer prefers.
Lean Informatics will not match Epic's $5B revenue in 24 months. No one does. The 24-month $120–180M target is a milestone on a multi-decade arc. What we copy from Epic is the structural posture, not the financial trajectory.
What Epic owns and how (the playbook)
| Epic pattern | How it works | Lean Informatics analog |
|---|---|---|
| Customer-centric on data location | Epic originally ran in hospital data centers. As of 2025 Epic also runs on Microsoft Azure (Hyperdrive on AVD, Azure Large Instances), and the customer chooses. ~15% of Epic sessions are Hyperdrive on Azure as of May 2025. | Customer dial: on-prem appliance (Addendum B), customer-owned colo, hyperscaler tenancy, or hybrid. Same SOC 2 / ISO 27001 standards in every topology. LI’s identity is the FDE relationship, not the box. |
| Long, deep implementations | 12–24 month deployments by Epic-employed implementers ("Implementation Services"). Customers pay millions over years. | The FDE motion. Founder-led at first, then a small army of former DoD/HS field engineers. The implementation is the product. |
| Workflow depth, not breadth | Epic touches every clinical and billing workflow. Once it owns the workflow, the data, and the user training, displacement requires re-running the entire workflow. | Track 1 owns the RRC filing workflow. Track 2 owns the confidential-IP workflows. Together, they own the operator's day. |
| Institutional anchor customers | Mayo, Kaiser, Johns Hopkins, Cleveland Clinic. Their reputations carry the brand. | Land Diamondback, Pioneer, EOG, or a supermajor as a reference account. One halo customer per year. |
| Bootstrap, private, employee-owned | Faulkner refused VC and IPO. Profits stay in the company. Decisions stay with the engineers. | Bootstrap as long as possible. Take outside capital only when the alternative is losing a strategic window. Retain control. |
| Geographic concentration | Madison/Verona, Wisconsin. All Epic developers in one place. Culture is the asset. | Houston. All Lean Informatics FDEs and engineers based here. Drive distance to the customer base. Culture matters. |
| Annual user group meeting | Epic UGM at the Verona campus — 25,000+ attendees, legendary in healthcare. Builds the community as a moat. | Launch the Lean Informatics annual summit by year 3. Operator IT directors, compliance leads, FDE alumni. Houston venue. |
| Refuse the easy SaaS-ification | Epic stayed enterprise, deep, expensive, and slow to release — even after going hybrid on infrastructure. The discipline didn't change; the substrate did. Held the line against SaaS competitors who looked faster but couldn't go deep. | Don't pivot to a self-serve / freemium model when pressure comes. The deep FDE-embedded services relationship is the moat. The underlying infrastructure can move with the customer. |
| Long sales cycles are the moat | Epic deployments take 18 months. That barrier keeps faster competitors out and locks customers in for decades. | Same logic applies. A 6-month FDE-led deployment is a feature, not a bug. It selects for customers who will stay for 10 years. |
What this changes about the 24-month plan
- Customer selection becomes prestige-aware. Every Track-2 customer in years 1–2 should be a name that future customers will recognize. One Pioneer or Diamondback is worth ten anonymous mid-size operators in brand terms.
- Don't apologize for being expensive. Epic doesn't. The pricing menu is correct. Hold the floor.
- Hire the implementers first, not the salespeople. Epic's army of implementers (Forward-Deployed Engineers, in modern parlance) is what makes the deployments stick. The same hiring order applies.
- Geographic concentration is a feature, not a constraint. Houston-based FDEs serving Houston-clustered operators is faster, cheaper, and more resilient than a distributed workforce.
- Long-term thinking over short-term ARR. The 24-month target is a milestone. If hitting it requires sacrificing depth, customer fit, or post-sale trust — don't.
- Refuse the wrong investors. If raising, take capital that's patient with the Epic-style timeline. Founder-friendly seeds. Strategic capital from energy LPs. Not growth equity that demands SaaS multiples and short payback.
Mission & positioning vs. the field
Mission: build the FDE-led services company for confidential industrial AI workflows, starting in upstream oil & gas, on the Epic Systems operating pattern — customer-centric on data location, deep on workflow ownership.
| Competitor | Their angle | Lean Informatics' counter-angle |
|---|---|---|
| Enverus ONE | Cloud-only on their tenancy. Governed AI, SOC 2 Type II, 25-year data heritage. No customer choice on substrate. | Meet the customer where the customer wants to be: their colo, their hyperscaler, our recommended on-prem appliance, or hybrid. Same security standards either way. Win on FDE depth and pricing discipline. |
| Collide | Vertical-AI startup, RIGGS LLM, FDE motion, Digital Wildcatters distribution. Cloud-first. | Compete in adjacent positioning — operator IT directors, sovereignty-sensitive customers, agency-adjacent customers. Offer deployment-topology choice they don't. Not a head-on Wildcatters-community fight. |
| Palantir Foundry | Enterprise FDE incumbent, government heritage, full-stack platform. | Smaller, faster, operator-shaped pricing. Their floor is $1M; ours starts at $5.5K/mo. Same FDE DNA, lower altitude. |
| Big Consulting (Accenture, McKinsey) | Strategic advisory + delivery. No proprietary AI stack. | We ship a product, not slides. Foundation-model + workflow + FDE bundled. Customer can host wherever they want. |
| Novi Labs / Quorum / C3.ai | Specialized point solutions or large legacy platforms. | Workflow-specific, services-led, infrastructure-agnostic. Deployment-topology choice is a feature none of them offer. |
What we sell
- Track 1 — cloud SaaS: Texas RRC compliance + production reconciliation + JSAs for mid-size operators. $1.5–4.5K/mo.
- Track 2 — on-prem appliance: confidential workflow AI (M&A diligence, completion design, geosteering, reservoir analytics). $5.5–13.5K/mo + onboarding.
- Track 3 — services: Forward-Deployed Engineer engagements for high-stakes accounts. $15–50K/mo retainers.
- Track 4 — agency cross-sell: emergency-management AI for DoD / state HS / county / fire-warning customers, leveraging the founder's prior network. Multi-year contracts $1–10M.
C.1 The 24-month target & honest math
$10–15M/month total revenue, 24 months, bootstrap. That equals $120–180M annualized. The math has to work somehow. The table below shows it doesn't work via direct sales alone.
| Avg deal size (ARR) | Deals needed for $150M | Feasibility at solo + bootstrap |
|---|---|---|
| $50K | 3,000 | impossible |
| $200K | 750 | impossible |
| $500K | 300 | requires 100+ FTE sales org |
| $2M | 75 | requires channel partner doing volume |
| $5M anchor + $500K small | 10 anchors + 60 small (≈$80M of the $150M) | possible with right anchor + channel + ~5 FTE |
| $20M agency contract + small | 2–3 agency + 40 small (≈$60–80M of the $150M, at $500K per small deal) | possible if gov network closes |
The conclusion the math forces: $120–180M in 24 months bootstrap requires one or more of — an anchor enterprise/gov deal ($10–30M ARR each), a high-velocity partner channel (BETA-style, 100+ small deals through their existing customer base), or a productized SKU sold through resellers. Section C.3 maps the four leverage paths.
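The deal-count column is simple division, and the mixed rows can be checked the same way. The sketch below assumes a $150M ARR target (the midpoint used in the table) and hypothetical helper names. On these inputs, the "10 anchors + 60 small" mix totals $80M, which is why the channel and agency paths carry the rest.

```python
TARGET_ARR = 150_000_000


def deals_needed(avg_deal_arr: int, target: int = TARGET_ARR) -> int:
    """Number of deals of one size needed to reach the target (rounded up)."""
    return -(-target // avg_deal_arr)  # ceiling division on integers


def mix_arr(mix: dict[int, int]) -> int:
    """Total ARR for a mix given as {deal_size: deal_count}."""
    return sum(size * count for size, count in mix.items())


for size in (50_000, 200_000, 500_000, 2_000_000):
    print(f"${size:>9,} avg deal -> {deals_needed(size):,} deals")

anchor_mix = {5_000_000: 10, 500_000: 60}
total = mix_arr(anchor_mix)
print(f"10 anchors + 60 small -> ${total:,} ({total / TARGET_ARR:.0%} of target)")
```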
C.2 Base case — solo bootstrap without leverage paths
What happens if Lean Informatics ships Track 1 and Track 2 cleanly but none of the leverage paths fire. This is the floor.
| Month | Track 1 customers | Track 2 customers | Track 3 retainers | Run-rate ARR |
|---|---|---|---|---|
| 3 | 1 (pilot) | 0 | 0 | $30K |
| 6 | 2 | 0 | 1 | $240K |
| 9 | 4 | 0 | 1 | $400K |
| 12 | 6 | 1 | 1 | $700K |
| 15 | 9 | 1 | 2 | $1.1M |
| 18 | 12 | 2 | 2 | $1.8M |
| 21 | 16 | 3 | 3 | $2.8M |
| 24 | 20 | 4 | 3 | $3.8M |
Solo bootstrap floor: $3–5M ARR by month 24. Roughly $250K–420K/month. That's 2–4% of the stated $120–180M target. The gap is real and the rest of the plan is about closing it.
Notable: this floor is still a credible high-growth bootstrap result and a real business. From an Epic Systems lens, this is the equivalent of Faulkner's first 3–5 years — small revenue, deep customers, building the foundation. Don't mistake it for failure.
C.3 The four leverage paths
Each path is independent. Each is leveraged by something the founder already has. The plan opens all four in parallel and gates further investment on leading indicators.
Path A — BETA Land Services partner channel
| Item | Detail |
|---|---|
| Mechanism | White-label or revenue-share into BETA's 4.3M-acre customer base. LI's appliance becomes the digital throughput multiplier inside BETA's existing FDE motion. |
| Revenue model | 20–30% rev share on pull-through transactions, or per-acre / per-deal fee. |
| Upside | If 10% of BETA's annual transactions pull LI tools, that's plausibly $20–50M/yr to LI. |
| Timeline | 3–9 months to first pilot, 12–18 months to material rev share. |
| Pre-conditions | Working Track-2 appliance demo, BETA pilot agreement, channel contract with exclusivity language. |
| Risk | BETA builds internally instead of partnering; takes longer than expected. |
| Leverage source | BETA's appetite for AI tooling; Hanks's 41 years of FDE-business operating muscle. |
Path B — DoD / state HS / county agency contract
| Item | Detail |
|---|---|
| Mechanism | Reposition the on-prem appliance + RBDS-era operating discipline for agency emergency management. WUI fire risk, agency-side AI for emergency notification, sheriff incident triage, county hazard modeling. |
| Revenue model | Multi-year contracts at $1–10M ACV. Annual recurring + services. |
| Upside | 1–2 agency contracts in 24 months = $5–30M revenue. |
| Timeline | 12–18 months to first contract dollar (gov procurement is slow). |
| Pre-conditions | Small Business contracting registration (SAM.gov, UEI), partner with existing GSA-schedule prime, optional SBIR/STTR grant for non-dilutive bridge. |
| Risk | Gov procurement timing; scope shifts during contracting. |
| Leverage source | Founder's DoD / state HS / county / sheriff / fire-warning customer base. This door is open only because the prior company existed. |
Path C — Anchor enterprise (T2 operator or supermajor)
| Item | Detail |
|---|---|
| Mechanism | One operator at $5–15M ARR via Track-2 multi-workflow deployment plus heavy Track-3 services. Target list: Diamondback, Pioneer (now part of ExxonMobil), Devon, EOG, Continental, Plains, Targa, Enbridge. This is the "Mayo Clinic of operators" play. |
| Revenue model | 3-year enterprise contract, $500K–1.5M onboarding, $5–15M annual. |
| Upside | Single anchor moves the needle materially AND becomes the brand-halo reference. |
| Timeline | 9–18 months to close, 6 months to deploy. |
| Pre-conditions | SOC 2 Type II (required), 1–2 reference customers, exec-level intro. |
| Risk | Single point of failure; one customer dependence; slow procurement. |
| Leverage source | Allen Gilmer's network if a warm intro materializes; Houston operator network broadly; Wildcatters community indirectly. |
Path D — Productized appliance + reseller program
| Item | Detail |
|---|---|
| Mechanism | Package the Track-2 appliance as a SKU. Landmen, O&G consultants, regional IT integrators, and BETA-like service firms resell it under a margin split. |
| Revenue model | 30–50% margin to reseller, 50–70% to LI. Per-unit revenue $50–100K + recurring. |
| Upside | 50–200 units/yr at $200K blended = $10–40M revenue. |
| Timeline | 12–18 months to launch the program; 18–24 months to material volume. |
| Pre-conditions | Stable Track-2 product, reseller training program, channel agreements, support tier infrastructure. |
| Risk | Channel conflict with direct sales; support burden scales with units. |
| Leverage source | The appliance product itself + LI's hardware/firmware experience from the RBDS era. |
C.4 Org & hiring schedule
Bootstrap discipline: hire only when revenue justifies it. Hire implementers (FDEs) before salespeople, on the Epic pattern. Cap at <15 FTE until $10M ARR.
| Trigger | Hire | Profile | Comp range |
|---|---|---|---|
| Customer #1 live (~Month 3) | Fractional ops admin (10–15 hr/wk) | Houston-based, contracts/AP/customer onboarding | $2–4K/mo |
| Customer #4 or BETA pilot signed | Second FDE (full-time) | Former DoD/HS field engineer or O&G ops engineer | $180–220K base + equity |
| $2M ARR or Track-2 customer #2 | Third FDE + 1 full-stack engineer | Engineer profile: pipelines/data/agents | $150–200K each |
| $5M ARR or Path A activation | Channel/sales lead | Industry vet from BETA, Enverus, Wildcatters network; closer not BDR | $180–250K base + commission + equity |
| $10M ARR or Path B contract | Compliance/contracts/SOC officer + 2 more engineers + 1 more FDE | SOC 2 / FedRAMP / agency-procurement-literate | $200K+ each |
By month 24 in the stretch scenario: ~12–18 FTE. By month 24 in the base case: ~3–5 FTE. Compare to Epic's first 5 years: Faulkner kept the company under 10 people for the first half-decade. Discipline is the asset.
C.5 Sales motion
Year 1 (founder-led, Epic-style)
- First 5–10 customers: founder closes 100%. Mostly Houston / Texas mid-size operators via Wildcatters community, Houston energy network, and friend-of-friend warm intros. Treat each customer like a future Mayo Clinic reference — depth, not speed.
- Conference presence: NAPE (Houston, Feb), CERAWeek (Houston, March), Fuze (Houston, Oct), AAPL Houston meetings. Speaking slots wherever offered. The RBDS-to-AI architecture story is a real talk title.
- Inbound content: LinkedIn writing on (a) on-prem vs. cloud trade-offs for operator IP, (b) FDE economics, (c) RBDS-era prior art applied to AI sovereignty. ~2 substantive posts/week. Goal: become recognizable to operator CIOs and Wildcatters community by month 9.
- Outbound: Warm intros via Digital Wildcatters, OGAC, Houston energy meetups, BETA-channel referrals.
Year 2 (channels open)
- BETA channel motion: co-sell, joint case studies, white-labeled materials. Hanks's name on the joint pitch deck. Quarterly business reviews.
- Gov procurement track: separate sales cycle. Founder-led but with prime contractor partner. SBIR application as a non-dilutive bridge.
- Direct enterprise: sales lead hired Month 15+ owns supermajor and T2 conversations. Founder remains the executive sponsor on top accounts.
- Reseller program: soft launch Month 18 with 3–5 hand-picked reseller partners (one of which is BETA).
Year 3+ (the Epic UGM equivalent)
- Annual Lean Informatics Summit in Houston. Operator IT directors, compliance leads, FDE alumni. Single track. Invitation-only first 2 years. This becomes the cultural moat — the "Epic UGM" for upstream.
- Customer Advisory Board: 6–8 anchor customers meeting quarterly. Influence the roadmap. Loyalty through inclusion.
C.6 Product roadmap (24 months)
| Months | Build | Why now |
|---|---|---|
| 1–3 | Track 1 v1 — RRC W-10 / G-10 automation | The wedge. Public data, fast deploy, $5K pilots within 30 days. |
| 3–6 | Track 1 v2 — production reconciliation, JSAs, lease term extraction | Per-customer ARR expansion. Same customers, more workflows. Epic-style: own the day. |
| 6–9 | Track 1 v3 — workover reports, ESP RCA via OEM data integration | Engineering credibility. Moves LI from filing automation to operational decision support. |
| 6–12 | Track 2 v1 — M&A diligence appliance (BETA pilot) | Opens Path A. M&A diligence has the cleanest sovereignty case. |
| 9–12 | Petroleum-domain LLM fine-tune (Qwen 14B / Hermes 14B base) | RIGGS-equivalent. Target: 55–65% on SPE PE exam subset. |
| 12–18 | Track 2 v2 — completion design optimization for Permian operators | Highest customer-IP-sensitivity workflow. Pure on-prem play. |
| 12–18 | Track 4 v1 — agency emergency management cross-sell (WUI fire risk, county hazard modeling) | Opens Path B. Leverages founder's prior network. |
| 18–24 | Track 2 v3 — geosteering + reservoir analytics | Live-during-drilling latency play. Edge inference advantage. |
| 18–24 | Reseller program launch | Opens Path D after Track 2 stabilized. |
C.7 Financial model
Three scenarios, ARR run-rate
| Month | Conservative bootstrap | Base + 1 leverage path | Stretch + 2–3 paths |
|---|---|---|---|
| 6 | $240K | $500K | $1.2M |
| 12 | $700K | $3M | $12M |
| 18 | $1.8M | $12M | $45M |
| 24 | $3.8M | $25–35M | $100–180M |
Cost structure
| Line | Year 1 ($) | Year 2 ($) | Notes |
|---|---|---|---|
| Founder draw | $100–150K | $180–250K | Houston cost of living, family-first scheduling |
| FDE / engineering hires | $200–400K | $700K–1.8M | 1–2 in Y1, 3–7 in Y2 depending on scenario |
| Hardware (appliances on inventory) | $50–200K | $300K–1.2M | Tied to Track-2 customer pipeline; lease-back option |
| Cloud / API costs | $15–40K | $60–200K | Anthropic + Hetzner + observability |
| SOC 2 readiness + audit | $50–80K | $30–50K | Vanta + auditor. Type I Y1, Type II Y2. |
| E&O + cyber insurance | $15–25K | $30–50K | Material from day one |
| Legal, accounting, tooling | $30–50K | $60–100K | Texas LLC or Delaware C-corp depending on raise posture |
| Travel + conferences | $30–50K | $60–100K | Critical for customer-facing FDE motion |
| Year total burn | $490K–995K | $1.4–3.75M | Y2 scales with scenario |
Break-even (monthly): Conservative ~Month 18–20. Base + 1 path ~Month 9–12. Stretch ~Month 4–6.
Cash on hand requirement: ~$200–400K personal / savings to cover first 6–9 months before revenue covers burn (conservative case).
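The totals row can be re-derived from the line items. The sketch below is a minimal check, not a financial model: it copies the low and high ends of each cost line (in $K) from the table above and sums them per year. The names `COSTS_K`, `total_burn_k`, and `covers_burn` are hypothetical, and the break-even test assumes revenue is collected evenly at the ARR run-rate with no margin adjustment, which the plan does not state.

```python
# Cost lines from the table above, in $K: ((Y1 low, Y1 high), (Y2 low, Y2 high)).
COSTS_K = {
    "Founder draw": ((100, 150), (180, 250)),
    "FDE / engineering hires": ((200, 400), (700, 1800)),
    "Hardware": ((50, 200), (300, 1200)),
    "Cloud / API costs": ((15, 40), (60, 200)),
    "SOC 2 readiness + audit": ((50, 80), (30, 50)),
    "E&O + cyber insurance": ((15, 25), (30, 50)),
    "Legal, accounting, tooling": ((30, 50), (60, 100)),
    "Travel + conferences": ((30, 50), (60, 100)),
}


def total_burn_k(year_index):
    """Sum the low and high ends of every cost line for one year (0 = Y1, 1 = Y2)."""
    low = sum(line[year_index][0] for line in COSTS_K.values())
    high = sum(line[year_index][1] for line in COSTS_K.values())
    return low, high


def covers_burn(arr_k, annual_burn_k):
    """Assumption: monthly break-even means ARR/12 >= annual burn/12."""
    return arr_k / 12 >= annual_burn_k / 12


y1_low, y1_high = total_burn_k(0)
y2_low, y2_high = total_burn_k(1)
print(f"Y1 burn: ${y1_low}K to ${y1_high}K")
print(f"Y2 burn: ${y2_low}K to ${y2_high}K")

# Conservative scenario at Month 18 is $1.8M ARR (from the scenario table).
print(covers_burn(1800, y2_low))   # against the low end of Y2 burn
print(covers_burn(1800, y2_high))  # against the high end of Y2 burn
```

Under these assumptions the conservative case clears break-even at Month 18 only if Year 2 spending stays near the low end of the range, which is why hiring pace is the main lever on that date.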
C.8 Risk register — plan-specific
| Risk | Severity | Mitigation |
|---|---|---|
| Gov procurement timing > 18 months | high | Start SBIR/STTR pipeline Month 1. Partner with GSA-schedule prime by Month 6. |
| BETA builds internally instead of partnering | medium-high | Lock channel agreement with exclusivity clauses. Bring working demo to first conversation. |
| Anchor enterprise deal slips or kills momentum | high | Never depend on single anchor. Diversify by Month 12. |
| Founder burnout | high | Cap weekly hours, mandatory PTO, family-first scheduling. Second FDE hire is for resilience, not just capacity. |
| Hallucinated filing causes regulatory issue | existential | Human-in-loop on every submission. E&O insurance Day 1. No auto-submit ever. |
| Cyber incident on customer appliance | existential | Insurance, audit posture, hot spare, customer SIEM integration. Tabletop exercise quarterly. |
| Pressure to SaaS-ify / commoditize | high — Epic-relevant | Hold the line on the appliance / FDE / sovereignty model. Don't follow Enverus into cloud-only. |
| Enverus drops sub-$1K tier | medium | Don't compete on price. Compete on sovereignty + FDE motion. |
| Hardware supply chain disruption | medium | Maintain 3–6 month BOM inventory. Dual-source GPUs (L40S + H100 paths). |
| Channel conflict (reseller vs. direct) | medium | Strict territory and account rules in channel agreements. |
| Acquisition offer mid-arc | good problem | The Epic answer is "no." Decline unless valuation reflects 10-year lock-in moat. |
C.9 Trigger conditions — when to raise, slow, or pivot
When to raise capital (against the Epic instinct)
- $1–3M angel — if Path A or B requires capital to capture a closing window (a BETA exclusive deal, a specific gov RFP).
- $3–5M seed — if anchor enterprise deal requires SOC 2 Type II completion within 6 months, or if FDE hiring needs to outpace customer growth to win an account.
- $10–15M Series A — if two leverage paths are firing simultaneously and the constraint is execution capacity. Target investors: Mercury Fund (Collide's lead, ironically the most relevant), Energy Innovation Capital, S2G Ventures, EIV Capital, plus a strategic from the Allen Gilmer / OGAC network.
- Default position: don't raise. Faulkner didn't. Bootstrap discipline pays dividends over decades.
When to slow down (hold hiring, focus on retention)
- Track 1 customer count < 4 by Month 9.
- Track 2 first appliance customer not deployed by Month 12.
- Burn-to-ARR ratio above 3.0.
- Customer churn above 10% annualized in the first 12 customers.
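The four slow-down conditions are mechanical enough to check from a monthly snapshot. The sketch below is one possible encoding, not a prescribed tool: `Snapshot` and `slow_down_triggers` are hypothetical names, "by Month N" is read as "at or after Month N", and burn-to-ARR is assumed to mean annualized burn divided by ARR run-rate, a definition the plan does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    month: int                # months since launch
    track1_customers: int     # paying Track 1 customers
    track2_deployed: bool     # first appliance customer live
    annual_burn: float        # trailing burn, annualized, in dollars
    arr: float                # ARR run-rate, in dollars
    annualized_churn: float   # fraction, e.g. 0.08 for 8%


def slow_down_triggers(s: Snapshot) -> list[str]:
    """Return the slow-down conditions that the snapshot trips."""
    hits = []
    if s.month >= 9 and s.track1_customers < 4:
        hits.append("Track 1 customer count < 4 by Month 9")
    if s.month >= 12 and not s.track2_deployed:
        hits.append("Track 2 first appliance customer not deployed by Month 12")
    # With zero ARR the ratio is undefined; this sketch treats it as tripped.
    ratio = s.annual_burn / s.arr if s.arr > 0 else float("inf")
    if ratio > 3.0:
        hits.append("Burn-to-ARR ratio above 3.0")
    if s.annualized_churn > 0.10:
        hits.append("Customer churn above 10% annualized")
    return hits


snap = Snapshot(month=10, track1_customers=3, track2_deployed=False,
                annual_burn=900_000, arr=250_000, annualized_churn=0.05)
for hit in slow_down_triggers(snap):
    print(hit)
```

Any non-empty result maps to the action in the heading above: hold hiring and focus on retention.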
When to pivot or revise the goal
- BETA pilot fails or BETA declines to partner: rebuild distribution via direct + Wildcatters and reset target to $15–25M ARR in 36 months.
- Gov procurement track produces no LOI / contract pipeline by Month 18: deprioritize Track 4, focus on commercial.
- Sustained burn-to-ARR > 5.0 by Month 18: open acquihire conversation (Enverus, Quorum, Collide, or BETA itself) only as a last resort — against the Epic principle.
- Hallucinated filing or appliance incident: do not pivot — address it as an incident-response operation. Trust loss is harder to recover than revenue.
Leading indicators to monitor monthly
| Indicator | Healthy | Warning | Stop |
|---|---|---|---|
| Customer count growth | +1/mo by M6, +2/mo by M12 | <0.5/mo for 2 months | 0 net adds for 90 days |
| Cash runway | >9 months | 4–9 months | <4 months |
| BETA pilot progress | NDA→SOW→deploy on track | Slippage >30 days | BETA stops returning calls |
| Gov pipeline | 1+ active RFI/RFP | 0 active, 1 in conversation | 0 conversations for 90 days |
| Net Revenue Retention | >110% | 95–110% | <95% |
| Hallucination / incident rate | 0 | Any reportable incident | 2nd reportable incident in 90 days |
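The numeric rows of this table can be turned into a simple traffic-light check for the monthly review. The following is a rough sketch under stated assumptions: the function names are hypothetical, only the three rows with clear numeric thresholds are covered, and any cash runway between 4 and 9 months is treated as a warning.

```python
SEVERITY = {"healthy": 0, "warning": 1, "stop": 2}


def nrr_status(nrr_pct: float) -> str:
    """Net Revenue Retention: >110% healthy, 95-110% warning, <95% stop."""
    if nrr_pct > 110:
        return "healthy"
    if nrr_pct >= 95:
        return "warning"
    return "stop"


def runway_status(months: float) -> str:
    """Cash runway: >9 months healthy, <4 months stop.
    Assumption of this sketch: everything in between is a warning."""
    if months > 9:
        return "healthy"
    if months < 4:
        return "stop"
    return "warning"


def incident_status(reportable_last_90_days: int) -> str:
    """0 incidents healthy, 1 warning, a 2nd within 90 days stop."""
    if reportable_last_90_days == 0:
        return "healthy"
    if reportable_last_90_days == 1:
        return "warning"
    return "stop"


def overall(statuses: list[str]) -> str:
    """The monthly read is the worst status of any single indicator."""
    return max(statuses, key=lambda s: SEVERITY[s])


month_read = [nrr_status(104.0), runway_status(11), incident_status(0)]
print(month_read, "->", overall(month_read))
```

Taking the worst status rather than an average reflects the table's intent that a single "Stop" reading is enough to halt, regardless of how healthy the other indicators look.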
C.10 Honest verdict on the target & the long game
$10–15M/month total revenue in 24 months from a solo bootstrap start sits at the right tail of what has actually happened in B2B AI. Not impossible: Glean, Hebbia, and a handful of vertical AI plays have hit comparable numbers, but those almost universally involved significant funding, an unusual viral channel, or category-creating product positioning.
The wind at our back, in plain terms. Two structural tailwinds compound on each other. First, the foundation-model leveling event (§04) compresses 25 years of institutional industry knowledge into the weights of every frontier model, killing the "you need a roughneck to compete" defense. Second, security and infrastructure standards (SOC 2 Type II, ISO 27001, KMS, audit logging) are industry-commoditized in 2026 — the perimeter is no longer a moat, the hyperscalers and on-prem stacks meet the same bar. Both of those moats are gone for the incumbent. What remains as defensible is the FDE services relationship and the workflow ownership it produces — which is exactly what Lean Informatics is built around. Three private-equity flips have made the incumbent (Enverus) pricing-opaque and organizationally slow at exactly the moment those two moats evaporated. The newcomer (Collide) has proven a single founder team can raise, build, and sell in this category in twelve months. Old-buddy networks, favoritism in vendor selection, and tribal industry knowledge cannot stop a services-led delivery that already speaks the language fluently, meets the customer where the customer wants to host, deploys in 30 days, and costs one-tenth of what the incumbent is charging. That is not a slogan. That is the structural read.
Realistic Y2 endpoint without leverage paths firing: $3–5M ARR ($250K–420K/month). Even at the top of that range, that is 24–36× below the stated $10–15M/month target. To hit the target, at least two of (A) BETA channel + (B) gov contract + (C) anchor enterprise + (D) reseller program must fire in 24 months.
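The gap multiple is plain arithmetic and can be re-derived in a few lines. This is an illustrative check only; `gap_multiple` is a hypothetical helper that annualizes the monthly target and divides it by an ARR figure, both in $M.

```python
def gap_multiple(target_monthly_musd: float, arr_musd: float) -> float:
    """How many times larger the annualized target is than a given ARR."""
    return target_monthly_musd * 12 / arr_musd


for arr in (3, 5):                  # realistic Y2 endpoint, $M ARR
    monthly_k = arr * 1000 / 12     # same figure as monthly revenue, $K
    low = gap_multiple(10, arr)     # against the $10M/month target
    high = gap_multiple(15, arr)    # against the $15M/month target
    print(f"${arr}M ARR = ${monthly_k:.0f}K/month, gap {low:.0f}x to {high:.0f}x")
```

The 24–36× figure corresponds to the $5M end of the range; at the $3M end the same calculation gives 40–60×.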
The 12-month checkpoint
If by month 12 we have:
- 5+ Track-1 customers ($500K+ ARR)
- 1 Track-2 customer live
- BETA pilot signed
- A gov contract in the procurement pipeline (RFI or RFP stage)
...then $30–80M ARR by month 24 is on the table. Consider raising at this point if execution capacity is the bottleneck.
If by month 12 we have:
- 2–3 Track-1 customers
- No Track-2 customer
- BETA declined or stalled
- No gov pipeline
...then $5–10M ARR by month 24 is the realistic ceiling. The right move is to revise the goal to $25M ARR in 36 months, or raise capital to accelerate — but only if the leverage paths require it.
The Epic Systems lens on the verdict
Through the Epic lens, the 24-month target is the wrong unit of measurement. Faulkner's 5-year revenue was probably in the low millions. By year 10 Epic had perhaps $50M. By year 20, several hundred million. The compounding kicked in once the institutional anchors and the workflow lock-in were established. Lean Informatics should optimize for the same shape: deep customers, lock-in workflows, geographic concentration, and a refusal of easy money, easy SaaS-ification, and acquisition before the moat is built.
Under this lens:
- 24-month goal: 1–2 anchor customers + BETA partnership locked + first gov contract in pipeline + 15–20 Track-1 customers. Revenue $10–30M ARR. Foundation set.
- 60-month goal: 8–15 anchor customers, BETA channel running, 3–5 gov contracts, 100+ Track-1 customers, reseller program live. Revenue $80–200M ARR. Recognized as the sovereignty-first vertical AI vendor.
- 10-year goal: The Epic of upstream. Default vendor for most US E&P CIOs. $500M–1B+ ARR. Still private. Still bootstrap-ratio-disciplined. Still in Houston.
17 Methodology & epistemic posture
This report was produced through a structured first-principles analysis on May 21, 2026, by an analyst working in the Knuth–Ousterhout–Karpathy mode: rigor, complexity reduction, verifiability.
- Primary sources (Collide.io, Enverus, RRC, McLelland's X presence) were preferred over secondary commentary where the two conflicted.
- Quantitative claims were spot-checked against multiple sources where possible. Where a single source is cited, treat the number as indicative.
- The "GPT-5.1 scored 4%" benchmark number is flagged as suspect because it deviates implausibly from public model performance; the surrounding numbers (Grok 4, Sonnet 4.5, RIGGS) are internally consistent.
- The solo-feasibility verdicts are based on the 2026 tooling landscape (MLX 0.31+, Hermes 4, Qwen 3.5, Claude Opus/Sonnet 4.6). They will look different in 2027.
- "Solo achievable" never means "trivial." It means: one disciplined operator with the listed toolstack can reach equivalent customer outcomes for a narrow workflow, within the timeframes given.
- This is not legal, financial, or regulatory advice. RRC filing is a regulated activity. Get a Texas-licensed compliance professional before automating submissions.