Protege
Footwork leads the Series A of the leading data platform for AI training.
The key substrates underpinning the current revolution in AI are energy, compute, data, and models. There are massive companies being built in each of these areas — NVIDIA in compute, for example, and of course OpenAI, Anthropic, et al. in models. In data, we’ve seen the rise of large companies in human labeling of data, such as Scale and Surge AI. In real-world, non-synthetic data, companies such as Google and X are leveraging their own data to build models such as Gemini and Grok. And companies that sit on compelling proprietary datasets have found ways to monetize their data for AI training, such as Reddit through its deals with OpenAI and other model companies. But, surprisingly, no central platform existed to enable data providers and those wanting to license data for AI training to transact. That is, until Protege launched last year.

Protege has quickly become the leading data exchange for AI training. It works with over 100 data providers across four verticals: healthcare, media, audio and speech, and motion capture. By making this data quickly accessible and usable for model development, Protege has landed most of the major foundation model companies as customers, as well as many application-layer AI companies that need access to proprietary datasets to deliver customer value.
We were fortunate to connect with Protege co-founder and CEO Bobby Samuels in January, shortly after the company launched, thanks to an intro from Colin Evans at OpenAI. We were immediately struck by the expansiveness of Bobby’s vision to unlock proprietary data for AI progress. As we watched the game film over the subsequent months, we were blown away by the team’s ability to execute and make that vision a reality. Protege’s GMV in 2025 is already more than 20X its 2024 total, more than double the aggressive forecast the company set at the start of the year. Customers are transacting across multiple verticals, and most deals are either recurring or multi-year commitments.
While the company wasn’t running a fundraising process, we did our work in the background and gained a tremendous amount of conviction in the Protege opportunity and team. We are delighted to announce today the company’s $25M Series A, led by Footwork, with participation from existing investors including CRV, Flex Capital, Shaper Capital, Bloomberg Beta, and Liquid 2. More here from Natasha Mascarenhas in The Information, and from Protege directly. Protege represents Footwork’s largest initial investment to date, and it is yet another company that we built a thesis around and then invested in pre-emptively (just as with our other announced investment this week, Confido).
What gets us most excited about working with Protege is the team. Bobby and co-founder Travis May worked together at Datavant and LiveRamp, both data companies with several parallels to Protege’s business model and strategy. During a customer reference call, we heard co-founder and Chief Scientific Officer Engy Ziedan referred to as Protege’s “secret weapon”: a health economist by training, she has a deep understanding of data in healthcare and of what matters to customers. Co-founder and CTO Richard Ho has kept the team moving very quickly to iterate on product and build up technical defensibility. The pace at which this team moves (for example, launching two new verticals just last week) is perhaps what has best defined it in the just over one year since Protege launched. If building the next generational data company to power the AI revolution appeals to you, come join us. We’re hiring across engineering, go-to-market, operations, product success, and research.



