Pongo reduces incorrect RAG outputs by 80%
Plus: Co-founder Caleb on RAG, multimodal and vector search...
Published 21 May 2024
CV Deep Dive
Today, we're talking with Caleb John, Co-Founder and CEO of Pongo.
Pongo reduces incorrect RAG outputs by 80%. They do this using a semantic filter, which developers can incorporate into their RAG applications with one line of code. Founded in 2023 by Caleb and his co-founder Jamari Morrison, Pongo's goal is to help developers build better AI experiences.
How it works: Pongo's semantic filter uses multiple models to analyze the query and document together, minimizing information loss and hallucinations in your outputs. As a result, developers get much more accurate results than vector and hybrid search approaches deliver alone. Pongo's target audience is AI developers, and hundreds of developers currently incorporate the tool into their RAG workflows.
In this conversation, Caleb walks us through the founding premise of Pongo, why simplicity is the future of the AI stack, and Pongo's goals for the next 12 months.
Let's dive in ⚡️
Read time: 8 mins
Our Chat with Caleb 💬
Caleb - welcome to Cerebral Valley! First off, give us a bit about your background and what led you to co-found Pongo.
My name is Caleb, and I'm from Seattle. I've been an entrepreneur my whole life. I taught myself to code when I was eleven years old and ran a freelance business for a while after that. My first job, while I was in high school, was at a company called Cedar Robotics, where I was building robots for restaurants. That startup actually got featured in TechCrunch when I was 16 years old - I ended up winning engineering awards for the localization system I built using computer vision and sensor technology.
I ended up going to college at the University of Washington for a year, then dropped out to join Anduril, where I worked on their AI platform and focused on a lot of interesting projects. After that, I started Pongo, where my co-founder Jamari and I explored various ideas. Over time, we realized that semantic search was something we could add a lot of value to - it was very broken and a real bottleneck for applications. We went really deep into it and developed a new technology that we call a semantic filter.
Give us an overview of Pongo for those who haven't heard of you before. How would a new user get started?
Think about it this way: Pongo, with one line of code, can reduce incorrect or partially correct RAG outputs by 80%.
Integrating Pongo is literally one line of code. After pulling the results from your vector store, and right before sending them to your LLM for the actual RAG output, just add "pongo.filter()" and you'll see a boost right away. You could be up and running in 5 to 10 minutes and have this in production within half an hour.
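To make the "one line" concrete, here's a minimal sketch of where a semantic filter slots into a RAG pipeline. The retrieval function, the word-overlap scoring, and the document texts below are all illustrative stand-ins, not Pongo's actual SDK or models - the real call is simply "pongo.filter()" on your retrieved results.

```python
# Sketch: where a semantic filter sits between retrieval and the LLM.
# Everything here is a stand-in; only the pipeline shape is the point.

def retrieve_from_vector_store(query: str) -> list[dict]:
    """Stand-in for a vector-store query (e.g. top-k cosine similarity)."""
    return [
        {"text": "Gentle cleanser for sensitive skin", "score": 0.81},
        {"text": "Retinol serum for fine lines", "score": 0.79},
        {"text": "Non-comedogenic SPF 50 sunscreen", "score": 0.77},
    ]

def semantic_filter(query: str, docs: list[dict]) -> list[dict]:
    """Stand-in for pongo.filter(): score each (query, document) pair
    together and reorder, instead of trusting vector distance alone."""
    def rescore(doc: dict) -> int:
        # Toy scoring by word overlap; a real semantic filter would run
        # the query and document through cross-encoder-style models.
        return len(set(query.lower().split()) & set(doc["text"].lower().split()))
    return sorted(docs, key=rescore, reverse=True)

query = "sunscreen that won't clog pores"
candidates = retrieve_from_vector_store(query)
reranked = semantic_filter(query, candidates)  # the one extra line in the pipeline
# reranked[0] is the document you'd hand to the LLM first
```

Note how the filter promotes the sunscreen result that plain vector scores had ranked last - that reordering before the LLM call is the whole idea.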
Who are your users today? Who's finding the most value in what you're building with Pongo?
Today we're working with early-stage companies that have a critical need to get the right answer, or a non-trivial data set to search. For example, we have a company that is like Perplexity for skincare products. Their customers ask long, complex queries, and they have to find the exact product for each customer's specific case. They were one of our first customers, and we've been able to help them build a better experience for their users.
Broadly speaking, we didn't expect e-commerce to be a use case, but Pongo ended up being useful for AI e-commerce startups. We also have companies in the healthcare and insurance space using our products - these are high-stakes environments where you must provide the right answer to users. We've found our audience in the application layer - we really enable applications to take the next step and provide a better experience to their end customers.
Lastly, I think one of the most exciting use cases for us is actually agents. So far, we haven't seen many companies put agents into production because, if you look at the success rates, a three-step agent with a 90% success rate at each step means the system only works about 73% of the time. If you move each step to 98%, it works about 94% of the time. We can help a lot of these agent companies take that next step and make retrieval no longer a bottleneck. What excites me is the potential of what people will build in the future with Pongo. Agents are just the tip of the iceberg.
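The compounding math here is easy to verify: per-step success multiplies across steps, so a small per-step gain is amplified end-to-end.

```python
# Success compounds multiplicatively across an agent's steps.
steps = 3
baseline = round(0.90 ** steps, 3)  # 90% success at each step
improved = round(0.98 ** steps, 3)  # 98% success at each step
print(baseline)  # 0.729 -> the agent works about 73% of the time
print(improved)  # 0.941 -> the agent works about 94% of the time
```

The same effect gets sharper with longer chains: at ten steps, 90% per-step accuracy drops end-to-end success to roughly 35%, while 98% still yields about 82%.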
There are a number of teams exploring building with RAG. Why should developers incorporate Pongo into their stack?
For us, vector search is a great starting point, but it falls flat when a query has more complex meaning or multiple subjects - it's tough to capture the entire semantic meaning in a single vector. The correct answer often ranks second or third, or doesn't appear in the top ten at all. This is due to information loss from compressing everything into a single vector without the context of the query.
After you retrieve results from your vector database or search index, you send them to us and we use our proprietary ranking algorithm and a multitude of models to ensure your LLM has the optimal documents in the optimal order.
Could you share some metrics around performance and latency? What impact does Pongo have on these?
We do add a little bit of latency over vector search alone because, well, doing nothing will always be faster than doing something.
Pricing-wise, we have a couple of tiers. Our Lightning tier is a bit more expensive than our standard Deploy tier, which starts at $60 a month for 60,000 queries; we also have a $250-a-month plan for 350,000 queries. Latency-wise, for 100 documents and around 500 tokens, you'll see around 550 to 600 milliseconds on the standard tier. On the Lightning tier, you'll see around 350 to 400 milliseconds for the same workload.
If you're sending this to GPT right after, your users probably won't even notice the added step. That's the workflow we recommend. We had to put a lot of work into making it performant.
What's been the biggest technical challenge around building Pongo to where it is today?
The biggest challenge was putting together infrastructure that's fast and reliable. We had to build a bunch of stuff around load balancing and serving requests from scratch. Luckily, with our background in AI infrastructure, we knew how to handle a lot of this. Serving these models efficiently was key because if you were to self-host the same setup, like spinning up a GPU on AWS, you'd see latencies of five to ten seconds if you actually chained everything together.
Figuring out when to use which model and how to combine them was also a challenge. That's a whole internal process we have. The combination of these factors made it incredibly difficult to get this into production. It's a big reason why we're confident in our moat - we know this isn't something you can just copy over a weekend.
Multimodal and agents are two very hot spaces in AI - how does either factor into your roadmap over the next few months? And, aside from those, what else do you have coming up?
Multimodal is 100% on our roadmap. It should be out in the next six to eight months. This will allow us to facilitate those workloads more effectively. With agents, I see us as a foundational element for a lot of them. If you need a knowledge graph or something similar, our product will be extremely useful.
Additionally, if you have an agent that relies on finding the right piece of content or document, we can help with that too. Both of these applications are very exciting, and we look forward to helping companies work with them.
Those aside, our product is pretty simple, so we haven't seen a bunch of feature requests. The biggest thing customers asked for when we initially launched was to cut latency. About three or four weeks after launch, we pushed an update that cut our latency by about 80% and increased performance significantly - that was our first major update.
The biggest thing customers love is that we solve one of their RAG issues. They often have cases where a query doesn't work correctly, but with Pongo, it starts working. It's very much like a magic wand that erases all your RAG problems so you can focus on other stuff.
As a small team, how do you stay up to date with the latest happenings in AI research?
I mean, I just have to say this: I'm so chronically online and addicted to information. I try almost every single model I see, whether it's an LLM or embeddings - whatever gets dropped, I'm glued to it. Anytime there's something new, I try it. That's actually a big reason to build on Pongo.
If you're an application company out there, instead of re-vectorizing or having to keep up with the newest and greatest stuff, we'll keep updating it behind the scenes. We handle RAG so you don't have to. We're constantly updating and looking for the newest advancements, but our number one priority is always stable infrastructure and a great experience that companies can rely on.
Lastly, tell us about the team you're trying to build. What makes you and your co-founder special, and what are you looking for in new team members?
So my co-founder and I actually met on a Discord server - funnily enough, a bit later during COVID. We ended up becoming really good friends and raised our first round of funding before we had even met in person. He has an incredible story: I'm from Seattle, and he's from the Midwest, but he grew up in Japan and Jamaica - a very fascinating guy. It was a random connection, but he turned out to be absolutely committed and dedicated to building a startup.
Working with him has been incredible. We're both technical, so we built a lot of stuff on our own. We did a lot of experimentation until we were confident we had a product that customers really love, that has a strong position in the market, and that is in a market we want to be in long-term. This product checked all three boxes, and now we're looking to expand our team.
If you're an AI researcher in NLP or a software engineer, we'd love to chat with you. Feel free to send me an email, and hopefully, we can connect soon.
Anything else you want to let people know about Pongo?
Don't just look at the benchmarks. Try it out for yourself. We have benchmarks on our site, but really there's nothing more powerful than trying it out and seeing it work with your own eyes.
Conclusion
To stay up to date on the latest with Pongo, follow them on X and learn more about them at Pongo.
Read our past few Deep Dives below:
If you would like us to "Deep Dive" a founder, team or product launch, please reply to this email (newsletter@cerebralvalley.ai) or DM us on Twitter or LinkedIn.
Join Slack | All Events | Jobs