Tuesday, December 24, 2024

A startup trying to turn the web into a database


“The web is a collection of data, but it’s a mess,” says Will Bryk, co-founder and CEO of Exa. “There’s a Joe Rogan video here and an Atlantic article there. There’s no organization. But the dream is that the Web will start to feel like a database.”

Websets, Exa’s search tool, is aimed at power users who need to find things, such as people or companies, that other search engines handle poorly. Ask it for “startups that make futuristic hardware” and you get a list of hundreds of specific companies rather than hit-or-miss links to web pages that mention those terms. Google can’t do that, Bryk says. “There are a lot of valuable use cases for investors, recruiters, or anyone who needs all kinds of data sets from the web.”

Things have moved quickly since MIT Technology Review broke the news in 2021 that Google researchers were considering using large language models in a new kind of search engine. The idea quickly attracted fierce critics, but technology companies paid them little attention. Three years later, giants like Google and Microsoft are jockeying with a slew of new players, such as Perplexity and OpenAI, which launched ChatGPT Search in October, for a piece of this hot new trend.

Exa is not trying to surpass these companies (yet). Instead, it is proposing something new. Most other search companies wrap large language models around their existing search engines and use those models to analyze user queries and summarize results. The search engine itself, however, hasn’t changed much. Perplexity still sends its queries to Google Search and Bing. Think of today’s AI search engines as a sandwich with fresh bread but a stale filling.

More than keywords

Exa provides users with a familiar list of links, but it uses the technology behind large language models to reinvent the way the search itself is performed. The basic idea is as follows. Google works by crawling the web and building a huge index of keywords that are then matched against your queries. Exa crawls the web and encodes the content of web pages in a format known as embeddings, which large language models can process.
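To make the keyword side of that contrast concrete, here is a minimal sketch of a keyword index in Python. It is a deliberate simplification and not how Google’s index is actually built; the page names, page text, and function names are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical pages, invented for this example.
PAGES = {
    "page_a": "startups that make futuristic hardware",
    "page_b": "a young company building next-generation robots",
}

def build_keyword_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def keyword_search(index, query):
    """Return only the pages that contain every word in the query."""
    word_sets = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*word_sets) if word_sets else set()

index = build_keyword_index(PAGES)
matches = keyword_search(index, "futuristic hardware")  # finds page_a only
misses = keyword_search(index, "robots startups")       # no single page has both words
```

In this toy index, page_b is about the same kind of company as page_a, but a search for “futuristic hardware” never finds it because none of the words match. That gap is what an approach based on meaning rather than keywords is meant to close.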

Embeddings turn words into numbers in such a way that words with similar meanings become numbers with similar values. In effect, this lets Exa capture the meaning of the text on a web page, not just its keywords.
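A toy illustration of “similar meanings, similar numbers” is sketched below. The three-number vectors are made up by hand for this example; real embeddings come from a trained model and have hundreds or thousands of dimensions. Cosine similarity is used here as one common way to compare such vectors, and nothing in this sketch is taken from Exa’s system.

```python
import math

# Hypothetical hand-written embeddings, invented for illustration only.
TOY_EMBEDDINGS = {
    "startup": [0.9, 0.1, 0.2],
    "company": [0.8, 0.2, 0.3],
    "banana": [0.1, 0.9, 0.1],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Words with related meanings score close to 1; unrelated words score much lower.
similar = cosine_similarity(TOY_EMBEDDINGS["startup"], TOY_EMBEDDINGS["company"])
different = cosine_similarity(TOY_EMBEDDINGS["startup"], TOY_EMBEDDINGS["banana"])
```

With these invented vectors, “startup” and “company” score far higher than “startup” and “banana,” even though none of the words share any letters or keywords.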

Screenshot of Websets showing search results: “Corporate, Startup, US Based, Healthcare Focus, Technology Co-Founder”

Large language models use embeddings to predict the next word in a sentence. Exa’s search engine predicts the next link. Type “startups that make futuristic hardware” and the model will come up with (real) links that might follow that phrase.
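The article does not describe how Exa’s model produces those links, so the following is only a rough stand-in: it ranks a handful of pages by how close their embeddings are to the embedding of the query. The URLs, the vectors, and the `rank_links` function are all hypothetical, and nearest-neighbor ranking is swapped in here for whatever Exa actually does.

```python
import math

# Hypothetical page embeddings; the URLs and numbers are invented.
PAGE_EMBEDDINGS = {
    "https://example.com/robot-startup": [0.9, 0.2, 0.1],
    "https://example.com/chip-maker": [0.8, 0.3, 0.2],
    "https://example.com/banana-bread-recipe": [0.1, 0.1, 0.9],
}

# Hypothetical embedding of the query "startups that make futuristic hardware".
QUERY_EMBEDDING = [0.85, 0.25, 0.1]

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_links(query_embedding, page_embeddings, top_k=2):
    """Return the top_k URLs whose embeddings are closest to the query."""
    scored = sorted(
        page_embeddings.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [url for url, _ in scored[:top_k]]

top_links = rank_links(QUERY_EMBEDDING, PAGE_EMBEDDINGS)
```

In this sketch the two hardware pages come back first and the recipe page is left out, although none of the URLs needs to contain the words in the query.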

However, Exa’s approach comes at a cost. Encoding pages rather than indexing keywords is slow and expensive. Exa has encoded billions of web pages, Bryk says. That is far fewer than Google, which has indexed roughly a trillion. But Bryk doesn’t see this as a problem. “You don’t have to embed the entire web to be useful,” he says. (Fun fact: “exa” means a 1 followed by 18 zeros, and “googol” means a 1 followed by 100 zeros.)


Copyright ©️ 2024 The Leader Report | All rights reserved.