Vana is championing 'user-owned AI' for the masses 🌏

PLUS: CEO Anna on decentralization, AI alignment and Vana's mission...

Published 05 Apr 2024

CV Deep Dive

Today, we’re talking with Anna Kazlauskas, Co-founder and CEO of Vana.

Vana is a decentralized platform that lets people reclaim their data and make it portable across applications. Founded by Anna and co-founder Art in 2021, Vana is built on the idea of “user-owned AI”, where users run a node and host their own data on Vana, bringing it with them to different apps, and can even combine their data with others to create collectively owned models. Short for ‘Nirvana’ - a play on freeing data - Vana aims to be the foremost digital playground for your AI to build relationships, do economic work and live freely from Big Tech’s data silos.

Today, Vana has millions of users on its platform, using it for a range of activities from self-exploration to participating in data collectives for building user-owned AI. The startup also went viral last year with people wanting to turn their data into AI clones for use across social media. The startup has raised funding from Paradigm Capital, Polychain Capital, Packy McCormick and more to build the digital future for your AI clones.

In this conversation, Anna walks us through the founding premise of Vana, why decentralization is a critical component of the AI revolution, and her goals for the next 6-12 months.

Let’s dive in ⚡️

Read time: 12 mins

Our Chat with Anna 💬

Anna - welcome to Cerebral Valley. First off, give us a bit of background on yourself and what led you to start Vana?

Hey! I’m Anna, Founder of Vana. A little bit about myself: I’ve always been super interested in the world of programming and modeling the world with data. I learned to program on a graphing calculator in middle-school, and in high-school I got really into economics and how central banks work. I spent time interning at the Fed before going to MIT, where I got really into decentralization and how it could be used to impact currencies and markets. I later ended up at the World Bank, where I automated a bunch of their document processes with ML - and before I knew it, I was selling this document-sorting software I’d created to government agencies. So, I ended up dropping out of MIT and going through YC. This was in 2017, which is when the “Attention Is All You Need” paper had just come out.

While at MIT, I was taking classes with Regina Barzilay, who was a leader in the NLP space. It was still early days for the generative models we’re seeing today, but I saw that the only thing that matters for these models is data. If we have better data to train them, we get much better models - and so ultimately, the thing that's important is having very high quality data to create models from. If you look at how data is owned today, it mostly sits in the siloes of Big Tech, and so they’re really positioned to build super-powerful AI. But, there are ways where you can have the same decentralized approach that people took with currencies and apply it to data.

This is what Vana is doing - we’re focused on the question of: how do we use the tools that have worked really well for decentralizing finance and apply them to data in AI? How can we make user-owned models, where people are still in control of their own AI rather than at the whims of a big tech platform that could cut them off if it doesn’t like what they’re saying?

Fast-forward, I met my co-founder Art while doing my undergrad at MIT. He comes from a legal background and was doing grad school at Harvard, and was previously selling data to large companies like Facebook and figuring out how to get people to directly sell their data. So we’re both pretty deep in this world of data ownership!

Describe Vana to someone who hasn’t heard of it before. What does having personalized AI mean in its fully realized form?

Vana is a decentralized platform that lets people reclaim their data so that it is portable across applications and can fuel the creation of AI built on collective data. We think of it as ‘user-owned AI’ - comparable to Urbit or Solid Project, which are personal server architectures where you can run a node on your own and host your data, and then bring it with you to different apps. Vana is similar, plus a permissions and incentives layer that makes it so that even if most users don't want to self-host, they still have a way to interact and maintain that same agency.

One term that you hear a lot in AI is the ‘alignment problem’ - meaning, how do we align AI with human values? But, the reality is that every single human being has different values - so the idea of making one AI that represents all of our values seems literally technically impossible.

From my perspective, everyone should solve their own alignment problem of having their AI exist in a certain way, maybe by giving it a whole bunch of context on yourself and your past via your notes or messages, for example. The overall premise is around customizing AI to have a really deep understanding of yourself, your values, your preferences and your experiences. With this, you could unlock completely new applications over the next 1-2 years - for example, something as fun as watching a Netflix show about you and your friends, all the way to having a very intense debate with an AI version of yourself and simulating alternate realities.

With AI, it really matters how these models are trained, and you want to avoid having it be censored in a way that you disagree with or that feels biased. One example of this going wrong is Google Bard, which attempted to rewrite history on top of being blatantly offensive. When AI becomes our source of truth, whoever has that AI shouldn’t be the one who decides what is going to be true for everybody. We think that everyone should choose their own truth and have that control and agency over their own model.

Who are Vana’s users today? Who’s finding the most value in what you’re building?

Two years ago, when we were building these tools to help you bring your data across applications, people were like “What would I use this for? Why would I care about portability or bringing my data together?” Now, as we’ve started to see generative AI models come to life, that's unlocked a lot of user interest in us. A lot of our users harness their data to create models of themselves - for example, image models are really popular right now, and we also support voice, text and personality.

One unexpected emergent use-case is people from our community pooling their data in collectives for the creation of user-owned foundation models. Just this week, the world’s first Data DAO, r/datadao (www.rdatadao.org) launched on the Vana network. It was incredible to see how Vana users are so mission-aligned. The DAO was built to protest against Reddit selling user data to Google for $60 million a year and has already hit over 20k members. Experiencing waves of virality has been a huge learning for me in building consumer products. How can you build a tech stack and a team that can scale and rise to these special moments? If you unlock that, then you can capitalize on these unexpected wins.

We want to offer users as much control over their data as they want, so we also have an option for users to self-host Vana and run their data and models on their MacBook. The crowd that does this is definitely more tech-forward and hobbyist - an experimental LocalLLaMA crew, for example, who use it for self-understanding, searching their data, and memory-type stuff. This feature is still quite early; we just released it a few weeks ago.

Lastly, we have some really awesome AI consumer devs building on our Vana API, which lets you onboard a user’s data and model much more easily than having to onboard them within your own application. One app that’s taken off here is Chirper, which is a social network only for AIs - no human beings allowed. Chirper lets people create fictional characters that live autonomously in their digital world, which fits in nicely with Vana, where we’re focused specifically on exploring the boundaries of agency and freedom with AI.

I'm excited to see new use cases emerge for Vana as everyday people become more familiar with the possibilities in decentralized AI and devs become more familiar with the image API, the text API, and the underlying data too. There is a great deal of interest right now in spinning up other data collectives to build user-owned foundation models.

How do you see Vana’s position in the AI space evolving over the next 6-12 months, given the insane pace of AI breakthroughs happening on a weekly basis? What are you most excited about?

The first thing I’d highlight is around cost, which is such a bottleneck in AI today. If you’re a consumer app that goes viral, congrats - you now have a $50k bill you owe AWS. Of course, the local, self-hosted models enable much more interesting applications because you’re no longer cost-constrained; today, though, so much energy is going towards cost reduction, which is limiting innovation. For example, agent products that require 100 OpenAI API calls for every interaction are going to be way too expensive for individuals to work on themselves. So, I’m definitely excited about the cost of inference coming down to unlock cool applications on Vana.
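To make the cost point concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it is a hypothetical placeholder chosen for illustration (tokens per call, price per 1,000 tokens, interaction volume); none of them comes from Anna, Vana or any provider's actual pricing. The function names are also invented for this example.

```python
def cost_per_interaction(calls: int, tokens_per_call: int, price_per_1k_tokens: float) -> float:
    """Estimated spend for one user interaction that fans out into many API calls."""
    total_tokens = calls * tokens_per_call
    return total_tokens / 1000 * price_per_1k_tokens


def monthly_bill(interactions: int, per_interaction_cost: float) -> float:
    """Estimated monthly spend for a given number of user interactions."""
    return interactions * per_interaction_cost


# Hypothetical inputs: an agent that makes 100 calls per interaction,
# ~1,000 tokens per call, at an assumed $0.01 per 1,000 tokens.
per_interaction = cost_per_interaction(calls=100, tokens_per_call=1000, price_per_1k_tokens=0.01)

# Assumed volume for an app that goes viral: 50,000 interactions in a month.
bill = monthly_bill(interactions=50_000, per_interaction_cost=per_interaction)

print(f"Per interaction: ${per_interaction:.2f}")
print(f"Per month: ${bill:,.2f}")
```

Under these assumed inputs, the per-interaction cost scales linearly with the number of calls, which is why agent-style products that chain many calls together are the ones hit hardest.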

The other set of applications that I'm excited about are around AI creating economic value - what does it mean when your AI can earn money for you and actually do ‘work’? I think an early successful version of this was CarynAI - influencers scaling themselves in NSFW industries, which is often where emerging technologies will find their first use-cases. That said, I think there’s a huge opportunity for people to scale themselves in an interesting way, and have their AI go join the workforce and earn money.

It is great to see data collectives emerge on Vana, signaling a future of user-owned foundation models. These collectives allow you to own a piece of the core technology that powers your AI. It makes it possible to build AI like open source software in a way that benefits everyone who contributes. The technical architecture of data DAOs can be applied to model DAOs, where users and developers contribute data, compute, and research in exchange for ownership and use of the model. I’m excited about collectively owned models, especially as model merging techniques advance to allow a distributed group of users to train large, capable models.

What’s the hardest technical challenge around what you’re building with Vana?

Some of our biggest questions are: how do you personalize models and make data portable in a way that works across applications? How do you modify the model so that it's usable, but also be able to store some fraction of the amount of information we have in our human brains? What does it mean for my model to be a personal model? Broadly, our biggest challenge is around model personalization.

The other question we’re thinking about is: how can you use model personalization to create better models across many users? The furthest out example of this is making a better user-owned foundation model where you have 100 million people contributing their personal piece of the model, and stacking them together. For example, how do you get 100 of the best psychologists to train an amazing model on their notes and their process and combine it all together? This is hard to do even if you have all the data in the same place, but it’s even harder if you want to do it in a distributed way where you're training part of the model on every single person's device, such that you can have a really strong privacy guarantee. That's just a really hard technical problem.
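One well-known way to combine models trained separately on many devices is federated averaging, where each device trains locally and only the resulting weights are merged. The sketch below is a minimal illustration of that idea in plain Python, not a description of how Vana does it: the `federated_average` function, the weight layout and the example numbers are all assumptions made for this example.

```python
from typing import Dict, List

# A toy "model": named parameter vectors stored as plain lists of floats.
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Merge locally trained weights into one model.

    Each client's parameters are weighted by how many training examples it
    used, so clients with more data have more influence on the result.
    Only weights are combined here; raw data never leaves a client.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    merged: Weights = {}
    for name, reference in client_weights[0].items():
        merged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return merged


# Two hypothetical contributors with different amounts of local data.
alice = {"layer": [1.0, 2.0]}
bob = {"layer": [3.0, 4.0]}
merged = federated_average([alice, bob], client_sizes=[1, 3])
print(merged)
```

Real systems add a lot on top of this (secure aggregation, repeated training rounds, handling devices that drop out), which is part of why Anna describes the distributed version as much harder than training with all the data in one place.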

Broadly speaking, our biggest question is how do we make data portable and easy to use? We’re working with a super-personal dataset that we also have to keep secure, and we do a lot of client-side encryption to achieve that. How do you get that portability and keep things convenient, all while having strong security guarantees?

Tell us how you navigate the choppy waters of data privacy and controls, when users are porting over their personal data in order to make the AI clones of themselves realistic?

One thing to note is that we want to be very careful around how we communicate the data security aspects of what we do to the user - because questions around data privacy and security do tend to spook a lot of people. As a product person, you actually want your product to work super seamlessly and then have strong security guarantees.

If you want the strongest guarantee, you can actually run your model locally and then none of your data will ever leave your machine. Your personal model stays secure, as you're running inference locally and then returning the output, and you can use your personal model across the different applications. So, if you want to use an AI dating app that simulates your future with another person, but you also don't want to upload all of your messages and journal entries and the history of all your breakups into a random application, that gives you a strong guarantee.

For the hosted version of the application, we encrypt all of the data and then put the users in control. The other thing I would mention is actually having strong terms of service and a privacy policy that both state that the user owns their data and models - all of it is theirs. I think that's a piece that people often don't think about because they just gloss over it, but you can actually give people very strong guarantees from a legal perspective too, in addition to the technical side of things.

Describe the culture of your team today. Are you hiring, and what do you look for in prospective team members?

We’re currently a team of 14 people - we like keeping a very small team, and we’re very build-heavy. Culture-wise, I’d say that everyone is very mission-driven - we all believe that users should own their data, and that's somewhat of an ideological foundation for us. We’re also really excited by the challenge of what we’re working on - a user-owned foundation model trained by 100 million people is not like a YC SaaS company that you sell in six months. This is going to take a while, and I think everyone's aligned towards that.

Lastly, I’d say we try to foster a culture of kindness and love throughout the team, which I’m very grateful for - like, how can you give a little more love to everyone throughout their day? This is super important to us.

Conclusion

To stay up to date on the latest with Vana, follow them on X (@withvana) and learn more about them at Vana.

Read our past few Deep Dives below:

3/15: Baseten is pushing the boundaries of AI Inference ⚡️
3/11: Martian's interpretable alternative to the Transformer 🔌
3/8: Our chat with Groq's Chief Evangelist, Mark Heaps
2/26: MultiOn is building software with a brain 🧠
2/23: Galileo AI's groundbreaking prompt-to-UI tool ✨
2/19: Our chat with OpenAI’s Logan Kilpatrick

If you would like us to ‘Deep Dive’ a founder, team or product launch, please reply to this email (newsletter@cerebralvalley.ai) or DM us on Twitter or LinkedIn.
