Langfuse (YC W23), the open-source LLM engineering platform 💻

Plus: Co-founders Clemens and Marc on evals, multi-modal and YC...

Published 14 Jun 2024

CV Deep Dive

Today, we’re talking with Marc Klingen and Clemens Rawert, Co-Founders of Langfuse.

Langfuse is an open-source platform designed to help developers iterate and improve their LLM applications. Co-founded by Marc, Clemens, and Max during their time at YC in 2023, the platform started with application tracing and has since expanded to include evaluations, manual annotations, testing and experimentation, as well as collaborative prompt management. Today, Langfuse supports a wide range of integrations by being framework and cloud provider agnostic. Their mission is to empower engineering teams with tools to efficiently build and manage complex LLM applications.

Currently, Langfuse is used by a diverse set of customers, from YC startups to large enterprises such as Khan Academy, which use it to keep tabs on the quality, cost and latency of their apps. The startup has gained traction for its robust open-source approach and production-ready capabilities, making it a valuable tool for anyone involved in LLM development.

In this conversation, Marc and Clemens take us through the founding story of Langfuse, what makes Langfuse unique, and their roadmap for the next 12 months.

Let’s dive in ⚡️

Read time: 8 mins

Our Chat with Marc and Clemens 💬

Marc and Clemens - welcome to Cerebral Valley! First off, just give us a bit about your backgrounds and what led you to co-found Langfuse?

Together with Max, our third co-founder, we joined Y Combinator (YC) last winter for the W23 batch. During the batch, we pivoted through a ton of ideas involving LLMs. We eventually spent the most time working on code generation agents to go from GitHub Issue to Pull Request. We realized our demos were cool, but back then it was hard to make this useful in practice. Working with LLMs introduced completely new workflows and it required new tooling. Existing MLOps and DevOps tools didn’t adapt too well to it because of the experimentation and evaluation-driven nature of the workflow. This and the multi-step logic of building and prompting LLM applications are really new problems in software engineering.

At the same time, we noticed that everyone around us had similar problems. This gave us the confidence to just go ahead and build something we found useful.

One big unlock of YC for us was actually finding that first set of users. With so many companies in the batch, we found a few people working on interesting applications with LLMs that were game to work closely with us. We chatted about potential workflows and started solving their problems. The initial companies we started working with are still important customers today (e.g. Alphawatch.ai, BerryApp or juicebox.ai).

Give us a top-level overview of Langfuse - how would you describe it to the uninitiated AI/ML developer or enterprise team?

Our one-liner is: “Langfuse - the open-source LLM engineering platform”. We started out with application tracing, which records, step by step, the most important operations (LLM calls, retrieval, non-LLM functions) that happen within an LLM application. From there, we expanded to include other features around evaluations, manual annotation, testing, and collaborative prompt management. There are many aspects of developing an application that our platform connects to, but tracing and building a rich set of data is really at the core. Everything is open-source and well integrated both within Langfuse and across other open source projects (such as Llama Index, LangChain, LiteLLM, Posthog and more). That's what we do.
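To make the idea of a trace concrete, here is a minimal sketch of how the steps of one request could be represented as a tree and summarized by step type. The `Observation` class, its fields, and the step names are hypothetical and chosen for illustration; this is not Langfuse's data model or SDK.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Observation:
    """One step inside a trace (hypothetical schema, for illustration only)."""

    name: str
    kind: str        # "llm", "retrieval", or "function"
    self_ms: float   # time spent in this step, excluding its children
    children: list[Observation] = field(default_factory=list)

    def total_by_kind(self) -> dict[str, float]:
        """Sum the time spent per step type across the whole subtree."""
        totals = {self.kind: self.self_ms}
        for child in self.children:
            for kind, ms in child.total_by_kind().items():
                totals[kind] = totals.get(kind, 0.0) + ms
        return totals

    def render(self, depth: int = 0) -> list[str]:
        """Return an indented, step-by-step view of the trace."""
        lines = [f"{'  ' * depth}{self.name} [{self.kind}] {self.self_ms} ms"]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


# A made-up trace of a single question-answering request.
trace = Observation(
    "answer-question", "function", 5.0,
    children=[
        Observation("vector-search", "retrieval", 120.0),
        Observation("generate-answer", "llm", 900.0),
        Observation("format-output", "function", 3.0),
    ],
)

print("\n".join(trace.render()))
print(trace.total_by_kind())
```

Grouping time by step type is one simple way such a record can show where latency or cost concentrates in an application.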

Who are your users today? Who’s finding the most value in what you’re building with Langfuse?

We focus on engineering teams that build complex LLM applications, agents, and chains: anything more complex than just passing through LLM responses. We started with 100% startups and individual developers, and now we have about 50% startups, including many YC companies, and 50% larger companies and enterprises. Going open source has really helped us to work with platform teams in large corporations that now deploy our software as part of their LLM stack. For example, we recently announced that Khan Academy is running on our platform.

From day zero, we offered both a cloud version and a self-hostable version. The cloud version is free to sign up for, completely product-led, and attracts a lot of individual developers and startups who want to get started quickly. However, making self-hosting super easy has been important for us personally and that’s made Langfuse an easy choice for teams in larger organizations to try without having to go through procurement or review processes.

There are a number of teams focussed on tracing and evals in the generative AI space - what sets Langfuse apart? What are you doing differently from the others?

We wrote a bit about this on our website, explaining our approach. Since we are open source, we offer the flexibility to self-host, but it’s not a requirement. We are agnostic to the framework, model, or cloud provider you use. We are targeting teams of professional developers that need a highly performant and flexible tool. We believe this piece of the application stack needs to be open because it's the intellectual property of our users - you would not want to be locked in to a vendor and their strategy here. That’s why we put Langfuse out there in open source, with an open and powerful API, and invest heavily in our core tracing technology. One big differentiator for us is that we focused on production readiness from day one, ensuring our system is asynchronous and not in the critical path.

You can use our platform in development, but when you scale out to production, you don’t need to worry about us breaking your application. That was our number one goal. We are now on version 2.54.0, or whatever iteration we’re at. We take semantic versioning and maintaining backward compatibility very seriously. Serious people depend on us, and that has become a key value proposition for us.

There’s been an explosion of interest in agentic workflows. Any compelling use-cases that you’d like to highlight, that Langfuse is having a high impact with?

Currently, we have a good split between various types of agents on Langfuse, like RAG agents and more state machine-based agents. In the end, it’s all about tracking them and having full observability into quality/evals, cost and latency across iterations and experiments.

I was recently very excited about how juicebox uses Langfuse to reduce the latency of their AI recruiting co-pilot. Creating a great customer experience often comes down to giving the user a simple interface and managing 20 parallel calls on the backend that may use smaller models instead of just one big prompt. Having tracing at the core of our observability solution makes analyzing and optimizing for latency really easy.
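The fan-out pattern described above can be sketched with Python's `asyncio`. This is an illustrative sketch only: `call_small_model` is a hypothetical stand-in that sleeps instead of calling a real model, and the 50 ms delay is an arbitrary assumption.

```python
from __future__ import annotations

import asyncio
import time


async def call_small_model(prompt: str, delay_s: float = 0.05) -> str:
    """Hypothetical stand-in for a model call; sleeps instead of hitting an API."""
    await asyncio.sleep(delay_s)
    return f"result for {prompt!r}"


async def fan_out(prompts: list[str]) -> tuple[list[str], float]:
    """Run all calls concurrently and report the wall-clock time taken."""
    start = time.perf_counter()
    # gather() preserves input order, so results line up with prompts.
    results = await asyncio.gather(*(call_small_model(p) for p in prompts))
    return list(results), time.perf_counter() - start


prompts = [f"sub-task {i}" for i in range(20)]
results, elapsed = asyncio.run(fan_out(prompts))
print(f"{len(results)} calls finished in {elapsed:.2f}s")
```

Because the calls overlap, total latency is close to that of the slowest single call rather than the sum of all twenty, which is why per-call timing is useful when tuning this kind of design.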

Many of our other customers are very focused on quality and building a loop of user feedback, model-based evaluations, and manual annotation in production and development. It’s the usual process of getting an application from development to production by learning what does not work in production and continuously building datasets in development to benchmark and improve an application with good evals.

How do you measure success with Langfuse’s product? Any specific customer stories you’d like to share with our audience?

We spoke with a large European fintech yesterday for a case study we will publish soon. They told us about being able to reduce the latency of their application by half by tracing the impact of individual calls. By analyzing and benchmarking different solutions to the individual steps of their application, they found out they could swap out an LLM call for a “traditional” algorithm and - bang - they made a core part of their application twice as fast.

What’s the hardest technical challenge around building Langfuse?

The most difficult part has been the production tracing aspect, really getting all of our SDKs and integrations right. We integrate with libraries like Llama Index, LangChain, Haystack, and the OpenAI SDK, and we have built native SDKs/decorators for Python and JS/TS. Maintaining a great developer experience while common workflows and model/framework interfaces change rapidly has been a challenge. We want to make it easy to get started with Langfuse and to only then subsequently adopt other features. Each library and LLM produces different events, and we have to standardize them all to make them useful within Langfuse and ensure that it works not just in a notebook, but also in production environments.

All of our integrations are fully async, add minimal latency, and do not affect uptime of your application. That has been a core design objective and consistent effort given various execution environments. It’s been 1 year since launching the core integrations - trust us, these are now very well-tested and performant.
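One common way to keep instrumentation off the critical path is to hand events to a background worker through a bounded queue. The sketch below illustrates that general pattern under stated assumptions; `BackgroundTracer`, its `sink` callable, and the drop-when-full policy are hypothetical and are not taken from the Langfuse SDKs.

```python
import queue
import threading


class BackgroundTracer:
    """Hypothetical non-blocking event recorder (illustration only)."""

    def __init__(self, sink, max_buffer: int = 1000):
        self._sink = sink  # callable that ships one event to a backend
        self._queue = queue.Queue(maxsize=max_buffer)
        self.dropped = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, event: dict) -> None:
        """Called on the request path: returns immediately and never raises."""
        try:
            self._queue.put_nowait(event)
        except queue.Full:
            self.dropped += 1  # shed events rather than slow the application

    def _run(self) -> None:
        while True:
            event = self._queue.get()
            try:
                self._sink(event)
            except Exception:
                pass  # a failing backend must not take the application down
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until every event queued so far has been passed to the sink."""
        self._queue.join()


received = []
tracer = BackgroundTracer(sink=received.append)
tracer.record({"step": "generate-answer", "ms": 900})
tracer.flush()  # e.g. at shutdown, so buffered events are not lost
print(received)
```

The trade-off in this sketch is that events can be dropped under extreme load; it favors the application's latency and uptime over complete telemetry.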

Another challenge is that more and more teams are pushing for more no-code or iterative capabilities within Langfuse. We need to nail the UI/UX for this because, right now, Langfuse is a very technical product designed for people who don’t mind reading through the documentation and tinkering themselves. You’ll see us adding more opinionated features that teams can pick up and use right away in addition to the more abstract and flexible core of Langfuse.

Could you give us the top few items on your roadmap for the next 6-12 months? What are you currently focussed on most?

Our Roadmap is open-source. We’re in a fast-moving space and we very actively communicate with and take feedback from our users on what we will build next.

Beyond that, on a macro level, the challenge is how to nail supporting the whole developer lifecycle from development to handling millions of traces at scale. Especially for the local development piece, we want to double down on CI testing. For example, once you have a dataset and want to test on it, we aim to provide an experience with all green checks for testing for regressions. There are a couple of good point solutions to achieve this. We feel that we are in a great position to create something meaningful on top of our datasets product.
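As a rough illustration of what a dataset-based regression check in CI could look like, here is a minimal sketch. The dataset, the `toy_app` function, the exact-match scorer and the baseline value are all made up for the example; none of it reflects Langfuse's datasets product or API.

```python
from typing import Callable, Dict, List

# Hypothetical benchmark dataset of inputs and expected outputs.
DATASET: List[Dict[str, str]] = [
    {"input": "capital of France", "expected": "Paris"},
    {"input": "2 + 2", "expected": "4"},
    {"input": "largest planet", "expected": "Jupiter"},
]


def toy_app(question: str) -> str:
    """Stand-in for the LLM application under test."""
    answers = {"capital of France": "Paris", "2 + 2": "4"}
    return answers.get(question, "unknown")


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 for an exact match after trimming whitespace, else 0.0."""
    return 1.0 if output.strip() == expected.strip() else 0.0


def evaluate(app: Callable[[str], str], dataset: List[Dict[str, str]]) -> float:
    """Run the app on every item and return the mean score."""
    scores = [exact_match(app(item["input"]), item["expected"]) for item in dataset]
    return sum(scores) / len(scores)


def passes_regression(score: float, baseline: float, tolerance: float = 0.0) -> bool:
    """A green check means the score has not fallen below the stored baseline."""
    return score >= baseline - tolerance


score = evaluate(toy_app, DATASET)
print(f"score={score:.2f}", "PASS" if passes_regression(score, baseline=0.6) else "FAIL")
```

In a CI job, a failing check would typically exit with a non-zero status so the pipeline blocks the change.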

Beyond that, the analytics part is crucial. How can you dissect 100 million traces in a useful way and help users zoom in on where they can improve their app? This means classifying by intent and understanding, for example, why something broke in a very complex agent. This would involve more of a diagnostic kind of experience. And, of course, multimodal support is a big one we're looking into.

There’s a huge debate around the importance of open-source in AI today - could you talk to us about the impact of open-source on Langfuse’s product development?

There's a big debate on the model layer. On the application layer and the infra layer where we build, we think this is less pronounced or controversial. The space is moving so quickly, so being open-source just helps us work with our users, customers, and partners much more effectively. We maintain tons of integrations with other teams that are also open-source. We believe there's an open ecosystem for building great applications, and we want to focus on that. We try to embrace the open source spirit with open roadmaps and open discussions with customers about what we should build. If we were application developers, we would not have liked to rely on a closed source solution for such a critical piece of our stack.

Fun fact, being OSS helped onboard our first set of users when we initially launched on Hacker News and ProductHunt.

Most of our strategy, and even our enterprise sales documents, are out there publicly because it helps us scale quickly by making everything accessible. If someone asks a question, we just add it to the document so we don't have to answer it again, which has been really helpful. We're inspired by a few open-source companies. For example, within the YC context, PostHog is doing this really well. We now heavily lean on GitHub discussions to publicly discuss the project’s roadmap, which we like a lot and which helps us see patterns of feedback from our users.

So, it's about getting feedback and, as Clemens mentioned earlier, by being open-source, we now have a big footprint in the enterprise. I think most startups don't go for the enterprise early, especially not with a small bottom-up team. We didn't expect it to work this well, but we’re really happy about it.

How do you both keep up with all of the ongoing developments in AI?

We feel like we’re sitting at the firehose because most of our customers build really exciting applications with AI. When a new model is launched, all of our users jump on it, and we can listen to their feedback. What we are most interested in is learning about new techniques that teams use to build useful applications.

Our users try so many things and come up with requests that make us think, “Oh, interesting, we never expected or saw that before.” Whenever we hear something like this, we drop it into a Slack channel within our team. Everyone is seeing so many things, we love to share these and say, “This is cool, have a look at it.”

Personally, I like the “Last Week in AI” podcast, which provides some paper and news summaries that I might otherwise miss. I can listen to it on my bike. And then obviously Twitter.

Tell us a bit about your team - are you hiring, and what do you look for in prospective members joining the team?

We’re a team of five - three co-founders, Marc, Max, and me, and two engineers, Hassieb and Marlies. We all work in person in Berlin, Germany. We’re hiring two or three more people over the next couple of months. As we focus on our OSS product, our entire team is technical, and we'll only hire engineers for the foreseeable future.

In terms of hiring, we’re looking for a backend engineer who will help with scaling this product to tens and hundreds of thousands of users, each with high workloads. We’re also hiring a product engineer, which is basically someone who works holistically across the entire product, takes features from inception to launch, and continues to iterate on them personally with high ownership. This could be someone who wants to start a company themselves soon.

Lastly, we want to add a developer advocate to the team. We want someone technical who is excited about open source and working closely with devs. We want someone to advocate for them and really focus on how we can communicate better: through our docs, through content and in our product.

We’re a small team. We're looking for people with high ownership who are interested in OSS, devtools, and building with LLMs. We all chat with users daily, take their feedback, improve the product, and then go back to users. Everyone is entrepreneurial and enjoys owning whole aspects of the overall product.

Talk to us about YC - what are some of the most important benefits you received personally from going through the program this year?

Maybe I’ll just throw out a few bullet-points where it really helped us. It gave us a great community of people facing similar challenges. This is huge because starting a company can be pretty lonely and just straight tough some days. Then, it definitely pushed us to move a lot faster. We thought we were already moving quickly, but in the batch, it was all about bouncing ideas off each other, like, "Hey, why not try building it in two days instead of a week?"

The space was so new. I think if we hadn’t been working on our coding agent alongside the rest of the batch, we wouldn’t have made it this far. Everyone was trying to figure out how to make something useful with LLMs, and it was just an awesome community to be part of.

Last question - as native Berliners building in AI, what are the strengths and weaknesses of the European ecosystem compared to Silicon Valley?

I think we’re lucky to know people here in Berlin who are very SF-minded. But overall, the number of such people is obviously lower.

Europe does have a lot going on. You mentioned you just spent time in Paris with Cerebral Valley. There’s also London and Zurich, we’re in Berlin, and there are other places with exciting things happening and really, really good people. The big but is, they’re still spread across different countries whereas the Bay Area is so much more concentrated. It’s the global center for AI and startups.

It’s funny, because we spend a lot of time in SF and end up meeting a lot of Europeans in San Francisco that we don’t meet ‘back home’. For instance, we have an integration with Posthog, and the founders are based in the UK. We met in San Francisco, discussed it and hatched the plan here instead of meeting in Europe. So, that anecdote might tell you a bit about where things stand.

Conclusion

To stay up to date on the latest with Langfuse, follow them on X and learn more about them at Langfuse.


If you would like us to ‘Deep Dive’ a founder, team or product launch, please reply to this email (newsletter@cerebralvalley.ai) or DM us on Twitter or LinkedIn.
