“The web is a collection of data, but it’s a mess,” says Will Bryk, co-founder and CEO of Exa. “There’s a Joe Rogan video here and an Atlantic article there. There’s no organization. But the dream is that the web will start to feel like a database.”
Websets is aimed at power users who need to run the kinds of searches, for people, companies, and so on, that other search engines aren’t good at. If you ask for “startups that make futuristic hardware,” you’ll get a list of hundreds of specific companies, rather than random links to web pages that merely mention those terms. Google can’t do that, Bryk says. “There are a lot of valuable use cases for investors, recruiters, or anyone who needs all kinds of data sets from the web.”
Things have moved quickly since MIT Technology Review broke the news in 2021 that Google researchers were considering using large language models in a new kind of search engine. The idea quickly attracted fierce critics. But the tech companies took little notice of them. Three years later, giants like Google and Microsoft are jockeying for a piece of this hot new trend alongside a slew of newer players like Perplexity and OpenAI, which launched ChatGPT Search in October.
Exa is not trying to surpass these companies (yet). Instead, it is proposing something new. Most other search companies wrap large language models around their existing search engines, using those models to analyze a user’s query and summarize the results. But the search engine itself hasn’t changed much. Perplexity, for instance, still sends queries to Google Search and Bing. Think of today’s AI search engines as a sandwich with fresh bread but a stale filling.
More than keywords
Exa provides users with a familiar list of links, but uses the technology behind large language models to reinvent the way searches themselves are performed. The basic idea is this: Google works by crawling the web and building a huge index of keywords, then matching users’ queries against that index. Exa crawls the web too, but instead of building a keyword index, it encodes the contents of web pages into a format called embeddings that large language models can process.
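To see why that difference matters, here is a minimal, hypothetical sketch of the keyword approach the article attributes to Google: an inverted index maps each word to the pages containing it, so a query can only ever match pages that share its literal terms. The pages, URLs, and helper names below are invented for illustration.

```python
# Toy inverted index of the kind a keyword search engine builds.
# Pages and URLs are made up for illustration.
pages = {
    "https://example.com/a": "robotics startup building humanoid hardware",
    "https://example.com/b": "recipe blog about sandwiches and fresh bread",
}

# Map every word to the set of pages that contain it.
index = {}
for url, text in pages.items():
    for word in text.split():
        index.setdefault(word, set()).add(url)

def keyword_search(query):
    # A page matches only if it contains every literal query term.
    results = None
    for word in query.split():
        hits = index.get(word, set())
        results = hits if results is None else results & hits
    return results or set()

print(keyword_search("robotics hardware"))    # finds the robotics page
print(keyword_search("futuristic hardware"))  # empty: "futuristic" never appears
```

The second query comes back empty even though the robotics page is clearly relevant, because the word “futuristic” never appears on it. That is exactly the gap embeddings are meant to close.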
Embeddings convert words into numbers in such a way that words with similar meanings become numbers with similar values. In effect, this allows Exa to capture not just the keywords but the meaning of the text on a web page.
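A minimal sketch of that idea, with tiny hand-made vectors standing in for the hundreds or thousands of dimensions a real embedding model would produce. Closeness in meaning shows up as closeness in direction, measured here by cosine similarity; the specific numbers are invented.

```python
import numpy as np

# Hand-made 3-dimensional "embeddings"; a real model would produce
# much larger vectors learned from text, not values picked by hand.
embeddings = {
    "startup":  np.array([0.9, 0.1, 0.0]),
    "company":  np.array([0.8, 0.2, 0.1]),  # similar meaning, similar values
    "sandwich": np.array([0.0, 0.1, 0.9]),  # unrelated meaning, distant values
}

def cosine_similarity(a, b):
    # Near 1.0 means the vectors point the same way (similar meaning);
    # near 0.0 means they are unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["startup"], embeddings["company"]))   # high
print(cosine_similarity(embeddings["startup"], embeddings["sandwich"]))  # low
```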
Large language models use embeddings to predict the next words in a sentence. Exa’s search engine uses them to predict the next links: type “startups that make futuristic hardware,” and the model will come up with (real) links that could follow that phrase.
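Exa hasn’t published its internals, but one common way to get that behavior is nearest-neighbor search over page embeddings: embed the query, then return the links whose page embeddings point in the most similar direction. The vectors, URLs, and function names below are assumptions made for illustration, not Exa’s actual pipeline.

```python
import numpy as np

# Invented page embeddings; in practice these would come from encoding
# each crawled page with an embedding model at index time.
page_embeddings = {
    "https://example.com/humanoid-robots": np.array([0.9, 0.2, 0.1]),
    "https://example.com/ar-glasses":      np.array([0.8, 0.3, 0.2]),
    "https://example.com/bread-recipes":   np.array([0.1, 0.1, 0.9]),
}

def search(query_embedding, k=2):
    # Rank pages by cosine similarity to the query embedding
    # and return the k closest links.
    def score(vec):
        return float(query_embedding @ vec /
                     (np.linalg.norm(query_embedding) * np.linalg.norm(vec)))
    ranked = sorted(page_embeddings,
                    key=lambda url: score(page_embeddings[url]),
                    reverse=True)
    return ranked[:k]

# Pretend this is the embedding of "startups that make futuristic hardware".
query = np.array([0.85, 0.25, 0.1])
print(search(query))  # the two hardware pages, not the recipe blog
```

Note that the recipe page is never considered a close match, even though no keyword filtering happens anywhere: relevance falls out of the geometry of the vectors alone.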
However, the Exa approach comes at a cost. Encoding pages rather than indexing keywords is slow and expensive. Exa has encoded billions of web pages, Bryk says. That’s a lot, but still far fewer than Google, which has indexed around a trillion. Bryk doesn’t see this as a problem, though: “You don’t have to embed the entire web to be useful,” he says. (Fun fact: “exa” denotes a 1 followed by 18 zeros; a “googol” is a 1 followed by 100 zeros.)