Why we built on the open web

When you start a B2B data company, one of the first architectural choices you make is also one of the most consequential. You have to decide what the substrate of your database is going to be — which sources you are going to commit to, where the data is going to come from, and what world you are betting on.

The dominant choice in our industry has been to treat one particular professional network as the substrate. Most of the established vendors are, directly or indirectly, derived from it. Their coverage tracks its adoption. Their freshness depends on access to it. Their economics assume continued availability of it.

We made the opposite bet. Fullinfo is built on the open web — on company websites, on regulatory filings, on news, on the structured signal of millions of independent sources that we crawl ourselves or consume through commercial search APIs. We chose this not because it was easier (it wasn't), but because the bet on a walled garden looked structurally fragile to us, and that fragility is becoming visible faster than we expected.

This is a short piece about why we made the choice we made, and what it buys our customers in practice.

The fragility of single-source intelligence

A B2B data company that depends on a single platform has, in effect, outsourced its product roadmap to that platform's enforcement team. When the platform tightens its API, sues its scrapers, updates its terms of service, or bans an extension, the change propagates directly into the data company's product. The vendor is along for the ride.

This dependency is not theoretical. The legal pressure on scraping has increased significantly in the last few years. The case law is moving in one direction. Platform owners are getting more sophisticated about detecting and stopping data extraction, and more willing to litigate when they catch it. Whatever your view of who is right in those disputes, the practical effect for the data buyer is the same: the supply chain you bought into is being made narrower and more expensive in real time.

The open web does not have this problem. There is no single party that can change the terms under which you access company websites. There is no single legal action that can collapse your sourcing. The substrate is distributed by design, which means it is robust by design.

What “open web” actually means for us

We are deliberately not going to describe our infrastructure in detail in this post. The general shape is fine to share: we collect company-level data primarily by observing what companies publish about themselves on the open web; we consume professional and individual information through commercial search APIs that we pay for as a licensed customer; and we verify contact information through a multi-step process that records where each detail came from.

What this means in practice for customers:

Coverage doesn't collapse in markets where one network has low adoption. Large parts of the global business landscape — SMBs across most of Europe, family businesses across Asia, the long tail of regulated industries that prefer their own websites to social platforms — are visible on the open web in ways they are not visible inside any single walled garden. Vendors who derive from one source see these markets as gaps. We see them as the largest part of the database.

The data does not have a single point of failure. When a website changes, we re-crawl. When a public filing updates, we re-index. When a commercial search source rolls out a new feature, we use it. No single upstream decision can break our coverage; the architecture is plural by construction.

Source links are real. Every record carries the URL or document where it was observed. You can click through and check. This is something a customer can do today, not something that requires a legal escalation.

The data is defensible. Because every record is sourced from material that the subject (the company, the person, the employer) chose to publish on the open web, the lawfulness of our processing rests on a much firmer footing than if we were ingesting data of unclear provenance.
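To make the source-link and no-single-point-of-failure points concrete, here is a minimal sketch of what a source-linked record could look like. It is an illustration only, not a description of Fullinfo's schema: the names `Observation`, `CompanyRecord`, `sources_for` and `distinct_hosts`, and all example URLs, are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from datetime import date
from urllib.parse import urlparse


@dataclass(frozen=True)
class Observation:
    """One field value together with where and when it was observed (hypothetical shape)."""
    field_name: str
    value: str
    source_url: str   # the page or filing the value was read from
    observed_on: date


@dataclass
class CompanyRecord:
    name: str
    observations: list[Observation] = field(default_factory=list)

    def add(self, obs: Observation) -> None:
        # Refuse values that cannot be traced back to a checkable source link.
        parsed = urlparse(obs.source_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"no usable source link: {obs.source_url!r}")
        self.observations.append(obs)

    def sources_for(self, field_name: str) -> list[str]:
        """Return every source URL backing the given field, in insertion order."""
        return [o.source_url for o in self.observations if o.field_name == field_name]

    def distinct_hosts(self) -> set[str]:
        """Hosts contributing to this record; more than one means no single upstream."""
        return {urlparse(o.source_url).netloc for o in self.observations}


# Usage with made-up values: two independent sources back the same field.
record = CompanyRecord("Example GmbH")
record.add(Observation("headcount", "120", "https://example.com/about", date(2024, 1, 15)))
record.add(Observation("headcount", "118", "https://registry.example.org/filing/1", date(2024, 2, 1)))
```

The design choice worth noticing is that provenance lives on each observed value rather than on the record as a whole, so a reader can check any individual field against the page it came from.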

What we give up to get this

It would be misleading to suggest that the open web bet is strictly better than the single-network bet. There are real trade-offs, and a customer evaluating us deserves to know them.

The open web is not as uniform as a single platform. A single platform offers a clean schema — everyone fills in the same fields the same way. The open web is messier. Job titles are described differently across cultures. Headcount is reported on some company sites and not others. Industries are categorized inconsistently. Working with this kind of heterogeneity is more engineering work than working with a normalized feed, and the results are sometimes less crisp.
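As a small illustration of that extra engineering work, the sketch below normalizes free-text headcount statements into a numeric range. It is a deliberately naive example, not our pipeline: the function name `parse_headcount`, the sample strings, and the rule that "." and "," are thousands separators are all assumptions made for this example.

```python
import re
from typing import Optional, Tuple

# A run of digits that may contain "." or "," (assumed to be thousands separators).
_NUMBER = re.compile(r"\d[\d.,]*")


def parse_headcount(text: str) -> Optional[Tuple[int, int]]:
    """Turn a free-text headcount into an inclusive (low, high) range, or None."""
    numbers = []
    for match in _NUMBER.findall(text):
        digits = re.sub(r"[.,]", "", match)  # "1,200" and "1.200" both become "1200"
        numbers.append(int(digits))
    if not numbers:
        return None  # the site does not report a headcount in a form we recognize
    if len(numbers) == 1:
        return (numbers[0], numbers[0])
    # Treat the first two numbers as the bounds of a range such as "11-50".
    low, high = sorted(numbers[:2])
    return (low, high)


print(parse_headcount("11-50 employees"))      # (11, 50)
print(parse_headcount("approx. 1,200 staff"))  # (1200, 1200)
print(parse_headcount("a small family team"))  # None
```

Even this toy version shows where the crispness is lost: a phrase like "family-run since 1952" would be misread as a headcount, which is why real normalization needs context that a single platform's fixed fields provide for free.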

The open web is sparser on certain kinds of personal information that people only volunteer inside a closed network. If your use case depends entirely on what a person chose to put on a social platform, you may legitimately need that platform. We try to be honest about this when we talk to prospects.

And the open web does not give us the same growth curve as a fast-rising platform. We don't get a free ride on someone else's network effects; our data improves through our own work. This is slower, and we are okay with that.

Why we think it's still the right bet

Five years ago, the case for the single-network bet was compelling. The network was growing, scrapers were tolerated, the data quality was high, and the upstream legal pressure was minimal. A vendor that built on top of it could ride that growth efficiently.

The situation today is different. The network has matured. The legal pressure on extraction is high and rising. The enforcement infrastructure is sophisticated and aggressive. The data quality is degrading in places (people stop updating their profiles when the cost of doing so exceeds the benefit). And the vendors who built on this substrate are starting to look like they're holding a depreciating asset they didn't pay to build.

We are skeptical that this trend will reverse. Our bet is that the structural advantages of the open web (legal defensibility, source diversity, global breadth, no single point of failure) will compound over the next several years as the limitations of single-source approaches become more apparent.

This is not a prediction we want to be loud about. We'd rather just keep building and let the work speak for itself. But because customers and prospects ask, we wanted to put the reasoning on the record.

See the open-web approach in practice

Curious what 100M+ source-linked organizations look like?

Request early access and we'll show you. Bring a market you want to explore — we'll run a live search on it.

Request Early Access →

This is the fourth piece in a Fullinfo blog series on what serious B2B data ownership looks like. Earlier pieces: “What it means to actually own your data”; “The four questions every B2B data buyer should ask their vendor”; and “Data, AI, and the question of accountability”.
