Last week, Cerulean attended The AI Conference in San Francisco, where engineers and researchers shared insights on AI’s rapid progress. Key topics included the increasing sophistication of AI agents, retrieval-augmented generation (RAG) techniques, and how companies are harnessing AI in their businesses.
In a conversation with an executive from a leading grocery delivery app, we learned two specific ways they are using AI:
1. AI-generated relevance labels: Search results are typically guided by human annotations for relevance. For example, a human knows ‘blueberry yogurt’ isn’t relevant when searching for ‘blueberries.’ An LLM can now make this distinction as well, so the company is using LLMs to classify how relevant each grocery item is to a given search term. This has reduced operational costs and sped up improvements to their recommendation models.
2. Identifying inaccuracies: The app provides product descriptions, but occasionally these don’t match the actual products, leading to customer confusion and returns. Finding these mismatches manually is labor-intensive. The company is using LLMs to identify inconsistencies and categorize them as urgent, medium, or minor issues to address, dramatically reducing the prevalence of inaccuracies.
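To make the two use cases above concrete, here is a minimal sketch of how an LLM can be used as a labeler. It is our own illustration, not the company's implementation: the prompts, the label names, the `needs_review` fallback, and every function name are assumptions, and `fake_llm` is a keyword-based stand-in for a real model call so the example runs offline.

```python
from typing import Callable, Sequence

# Hypothetical interface: a function that takes a prompt and returns raw model text.
LLM = Callable[[str], str]


def classify(prompt: str, allowed: Sequence[str], llm: LLM) -> str:
    """Return the model's label if it is one of `allowed`, else 'needs_review'."""
    answer = llm(prompt).strip().lower()
    return answer if answer in allowed else "needs_review"


def label_relevance(query: str, product: str, llm: LLM) -> str:
    """Use case 1: is this product relevant to the search query?"""
    prompt = (
        "Task: relevance\n"
        f"Search query: {query}\n"
        f"Product: {product}\n"
        "Answer with one word: relevant or irrelevant."
    )
    return classify(prompt, ("relevant", "irrelevant"), llm)


def triage_mismatch(description: str, actual: str, llm: LLM) -> str:
    """Use case 2: how serious is the gap between listing and product?"""
    prompt = (
        "Task: mismatch\n"
        f"Listing description: {description}\n"
        f"Actual product: {actual}\n"
        "Answer with one word: urgent, medium, or minor."
    )
    return classify(prompt, ("urgent", "medium", "minor"), llm)


def fake_llm(prompt: str) -> str:
    """Keyword-based stand-in for a real model (assumption, for offline runs)."""
    text = prompt.lower()
    if text.startswith("task: relevance"):
        return "irrelevant" if "yogurt" in text else "relevant"
    return "urgent" if "peanut" in text else "minor"


if __name__ == "__main__":
    for item in ["Fresh blueberries, 1 pint", "Blueberry yogurt, 6 oz"]:
        print(item, "->", label_relevance("blueberries", item, fake_llm))
    print(triage_mismatch("Almond butter", "Peanut butter", fake_llm))
```

Restricting the model to a fixed set of labels, and routing anything else to human review, is what lets output like this feed a downstream pipeline without someone reading every response.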
Other recent applications of AI in the news:
1. Best Buy: Launched an AI-powered live delivery tracking system that gives customers real-time order updates, improving transparency and the overall delivery experience.
2. Accenture: Implementing Salesforce’s Agentforce to build autonomous sales agents capable of acting as sales reps, coaches, or service agents.
3. Amazon: Released Project Amelia, an AI assistant that helps third-party sellers resolve issues and manage their businesses more effectively.
Also last week, OpenAI introduced its new o1 model, the first in a series designed to ‘think’ before responding—enabling it to tackle more complex tasks.
Why This Matters
It’s worth taking a moment to understand the rationale behind this shift.
Since ChatGPT launched in November 2022 on top of GPT-3.5 and became, at the time, the fastest-growing consumer app in history, each new model has built on the same idea: predicting the next word. But these models are path-dependent. Because every word is conditioned on the words generated before it, one early mistake can compound into larger errors.
To improve accuracy, techniques like “Chain of Thought” prompting have emerged, encouraging LLMs to break tasks down into steps, much like guiding a junior employee through a large project. OpenAI’s o1 takes this further: it is trained to work through a long internal chain of thought before answering, which can be pictured as drafting several candidate responses, evaluating them, and delivering the one it judges to be the best.
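The generate, evaluate, and select loop described above can be sketched as best-of-n sampling. This is a minimal illustration, not how OpenAI implements o1 (those details are not public). `best_of_n`, `toy_generate`, and `toy_score` are hypothetical names, and the two toy functions stand in for real model calls, using the 12-hour clock reading of the 8 + 8 puzzle as the check.

```python
from typing import Callable


def best_of_n(
    question: str,
    generate: Callable[[str, int], str],
    score: Callable[[str, str], float],
    n: int = 5,
) -> str:
    """Draft n candidate answers and return the one with the highest score."""
    candidates = [generate(question, attempt) for attempt in range(n)]
    return max(candidates, key=lambda answer: score(question, answer))


def toy_generate(question: str, attempt: int) -> str:
    """Stand-in for sampling a model: cycles through canned guesses."""
    guesses = ["16", "4", "64"]
    return guesses[attempt % len(guesses)]


def toy_score(question: str, answer: str) -> float:
    """Stand-in evaluator: on a 12-hour clock, 8 o'clock plus 8 hours is 4."""
    return 1.0 if answer == str((8 + 8) % 12) else 0.0


if __name__ == "__main__":
    print(best_of_n("How can 8 + 8 = 4?", toy_generate, toy_score, n=3))
```

With a single draft (`n=1`) the toy returns its first guess, "16". With three drafts the evaluator can pick out "4". Spending more compute per question in exchange for a better answer is the same trade-off that makes o1 slower than earlier models.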
For example, here is o1 taking 16 seconds to consider the puzzle “How can 8 + 8 = 4?”:

Ben Thompson from Stratechery illustrated the power of o1 by asking several models to solve a 7x7 crossword from the New York Times. Here are the results of his test, showing how long each model took and whether it solved the crossword correctly:

Should You Be Using o1?
It’s definitely worth experimenting with o1 to see how it differs from existing models. However, for most everyday tasks, GPT-4o remains faster, more efficient, and better suited for internet-connected tasks. As OpenAI’s Head of Applied Research, Boris Power, put it, “It’s not a mass product that unlocks new value for everyone effortlessly.”
Looking Ahead
As companies continue to enhance their operations with existing AI tools, the arrival of a new generation of reasoning models suggests even greater potential for streamlining business processes. Take customer support as an example: while AI currently handles basic Tier 1 issues, more advanced models could manage Tier 2 inquiries as well, limiting human intervention to only the most complex Tier 3 escalations.
AI progress is unfolding the way Hemingway’s character described going bankrupt: “Two ways. Gradually and then suddenly.” Each improvement is incremental, but together they add up to transformative change.
Finally, our CEO had the chance to take his first Waymo ride in San Francisco. Robotics is no longer science fiction—if you haven’t seen a Waymo ride, check this out. Seeing truly is believing!