How Yelp reviewed competing LLMs for correctness, relevance and tone to develop its user-friendly AI assistant




The review app Yelp has provided helpful information to diners and other consumers for decades, and it has experimented with machine learning since its early years. Yet during the recent explosion in AI technology, it still hit stumbling blocks as it worked to employ modern large language models to power some features.

Yelp realized that customers, especially those who only occasionally used the app, had trouble connecting with its AI features, such as its AI-powered assistant. 

“One of the obvious lessons that we saw is that it’s very easy to build something that looks cool, but very hard to build something that looks cool and is very useful,” Craig Saldanha, chief product officer at Yelp, told VentureBeat in an interview.

It certainly wasn’t all easy. After Yelp launched Yelp Assistant, its AI-powered service search assistant, to a broader swathe of customers in April 2024, usage figures for its AI tools actually began to decline.

“The one that took us by surprise was when we launched this as a beta to consumers — a few users and folks who are very familiar with the app — [and they] loved it. We got such a strong signal that this would be successful, and then we rolled it out to everyone, [and] the performance just fell off,” Saldanha said. “It took us a long time to figure out why.”

It turned out that Yelp’s more casual users, those who occasionally visited the site or app to find a new tailor or plumber, did not expect to be immediately talking with an AI representative.

From simple to more involved AI features

Most people know Yelp as a website and app to look up restaurant reviews and menu photos. I use Yelp to find pictures of food in new eateries and to see if others share my feelings about a particularly bland dish. It’s also a place that tells me if a coffee shop I plan to use as a workspace for the day has WiFi, plugs and seating, a rarity in Manhattan.

Saldanha recalled that Yelp had been investing in AI “for the better part of a decade.”

“Way back when, I’d say in the 2013-2014 timeline, we were in a very different generation of AI, so our focus was on building our own models to do things like query understanding. Part of the job of making a meaningful connection is helping people refine their own search intent,” he said.

But as AI continued to evolve, so did Yelp’s needs. It invested in AI to recognize food in pictures submitted by users to identify popular dishes, and then it launched new ways to connect to tradespeople and services and help guide users’ searches on the platform. 

Yelp Assistant helps Yelp users find the right “Pro” to work with. People can tap the chatbox and either use the prompts or type out the task they need done. The assistant then asks follow-up questions to narrow down potential service providers before drafting a message to Pros who might want to bid for the job.
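Yelp has not published the assistant’s implementation, but the flow Saldanha describes, a chat loop that asks clarifying questions and then drafts a message to Pros, can be sketched in a few lines. The model name, prompts and "DRAFT:" convention below are illustrative assumptions, not Yelp’s actual design.

```python
# Hypothetical sketch of a service-request assistant loop (not Yelp's actual code).
# Assumes the OpenAI Python SDK; model name and prompt wording are illustrative.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You help a user describe a home-service job. Ask one clarifying question "
    "at a time. When you have enough detail, reply with 'DRAFT:' followed by a "
    "short message the user could send to local service professionals."
)

def run_assistant() -> None:
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    user_input = input("What do you need done? ")
    while True:
        messages.append({"role": "user", "content": user_input})
        reply = client.chat.completions.create(
            model="gpt-4o",  # assumption: any capable chat model could sit here
            messages=messages,
        ).choices[0].message.content
        if reply.startswith("DRAFT:"):
            # Enough detail gathered: show the message that would go to Pros.
            print("Message to send to Pros:\n", reply.removeprefix("DRAFT:").strip())
            return
        print("Assistant:", reply)  # a follow-up question to narrow the job down
        messages.append({"role": "assistant", "content": reply})
        user_input = input("> ")

if __name__ == "__main__":
    run_assistant()
```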

Saldanha said Pros are encouraged to respond to users themselves, though he acknowledges that larger brands often have call centers that handle messages generated by Yelp’s AI Assistant. 

In addition to Yelp Assistant, Yelp launched Review Insights and Highlights. LLMs analyze user and reviewer sentiment, which Yelp aggregates into sentiment scores. Yelp uses a detailed GPT-4o prompt to generate a dataset covering a list of topics, then fine-tunes a GPT-4o-mini model on that dataset.
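That is a distillation-style pattern: a large model labels the data, and a smaller model is fine-tuned on the result. A minimal sketch of that pattern with the OpenAI Python SDK might look like the following; the topic list, prompt wording and file names are assumptions for illustration, not Yelp’s pipeline.

```python
# Hypothetical "label with a large model, fine-tune a small one" pipeline sketch.
import json
from openai import OpenAI

client = OpenAI()
TOPICS = ["food quality", "service", "wait time"]  # assumed example topics

def label_review(review_text: str) -> str:
    """Ask GPT-4o to rate sentiment per topic; returns the model's JSON reply."""
    prompt = (
        f"Rate the sentiment (positive/neutral/negative) of this review for "
        f"each of these topics {TOPICS}. Reply as JSON.\n\nReview: {review_text}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def build_training_file(reviews: list[str], path: str = "sentiment.jsonl") -> None:
    """Write chat-format training examples for fine-tuning the smaller model."""
    with open(path, "w") as f:
        for review in reviews:
            f.write(json.dumps({
                "messages": [
                    {"role": "user", "content": review},
                    {"role": "assistant", "content": label_review(review)},
                ]
            }) + "\n")

def start_finetune(path: str = "sentiment.jsonl"):
    """Upload the dataset and start a GPT-4o-mini fine-tuning job."""
    uploaded = client.files.create(file=open(path, "rb"), purpose="fine-tune")
    return client.fine_tuning.jobs.create(
        training_file=uploaded.id,
        model="gpt-4o-mini-2024-07-18",  # assumption: a fine-tunable snapshot
    )
```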

The review highlights feature, which surfaces information from reviews, also uses an LLM prompt to generate a dataset. However, it relies on GPT-4 to generate that data and fine-tunes GPT-3.5 Turbo on it. Yelp said it will update the feature to GPT-4o and o1.

Yelp joined many other companies using LLMs to improve the usefulness of reviews by adding better search functions based on customer comments. For example, Amazon launched Rufus, an AI-powered assistant that helps people find recommended items.

Big models and performance needs

For many of its new AI features, including the AI assistant, Yelp turned to OpenAI’s GPT-4o and other models, but Saldanha noted that no matter the model, Yelp’s data is the secret sauce for its assistants. Yelp did not want to lock itself into one model and kept an open mind about which LLMs would provide the best service for its customers. 

“We use models from OpenAI, Anthropic and other models on AWS Bedrock,” Saldanha said. 

Saldanha explained that Yelp created a rubric to test the performance of models on correctness, relevance, conciseness, customer safety and compliance. He said that “it’s really the top end models” that performed best. The company runs a small pilot with each model, then factors in iteration cost and response latency.
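A pilot like the one Saldanha describes can be framed as a small evaluation harness that scores each candidate model against the rubric and records latency. The sketch below assumes the OpenAI SDK and placeholder scoring; the model IDs and prompts are illustrative, and in practice the criterion scores would come from human raters or a judge model rather than the stub shown here.

```python
# Hypothetical model-comparison pilot against a fixed rubric (illustrative only).
import time
from statistics import mean
from openai import OpenAI

client = OpenAI()
CRITERIA = ["correctness", "relevance", "conciseness", "customer safety", "compliance"]
CANDIDATE_MODELS = ["gpt-4o", "gpt-4o-mini"]               # assumed candidate IDs
PILOT_PROMPTS = ["Find me a plumber for a leaking sink."]  # a real pilot uses many

def score_response(prompt: str, response: str) -> dict[str, float]:
    """Placeholder: human raters or a judge model would fill in these scores."""
    return {criterion: 0.0 for criterion in CRITERIA}

def run_pilot(model_id: str) -> dict:
    """Collect rubric scores and response latency for one candidate model."""
    latencies, scores = [], []
    for prompt in PILOT_PROMPTS:
        start = time.perf_counter()
        reply = client.chat.completions.create(
            model=model_id,
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
        latencies.append(time.perf_counter() - start)
        scores.append(score_response(prompt, reply))
    return {
        "model": model_id,
        "avg_latency_s": mean(latencies),
        "avg_scores": {c: mean(s[c] for s in scores) for c in CRITERIA},
    }

if __name__ == "__main__":
    for result in sorted((run_pilot(m) for m in CANDIDATE_MODELS),
                         key=lambda r: r["avg_latency_s"]):
        print(result)
```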

Teaching users

Yelp also embarked on a concerted effort to help both casual and power users get comfortable with the new AI features. Saldanha said one of the first things the team realized, especially with the AI assistant, is that the tone had to feel human. It couldn’t respond too quickly or too slowly; it couldn’t be overly encouraging or too brusque.

“We put a bunch of effort into helping people feel comfortable, especially with that first response. It took us almost four months to get this second piece right. And as soon as we did, it was very obvious and you could see that hockey stick in engagement,” Saldanha said. 

Part of that process involved training the Yelp Assistant to use certain words and to sound positive. After all that fine-tuning, Saldanha said they’re finally seeing higher usage numbers for Yelp’s AI features. 


