Zilliz Cloud is powering vector search for the enterprise 🔋

Plus: CEO Charles on unstructured data, RAG and Oracle...

Published 31 May 2024

CV Deep Dive

Today, we’re talking with Charles Xie, CEO and Founder of Zilliz.

Zilliz is an enterprise-grade vector database solution that enables large companies to store, retrieve and interact with their proprietary data. Born out of Milvus, the hugely popular open-source vector database founded in late 2019, Zilliz is now at the forefront of building next-generation databases and search technologies for AI and LLM applications as the generative AI revolution takes hold. Zilliz Cloud, the company’s fully-managed version of Milvus, also places a huge emphasis on speed, security and scalability in the enterprise context.

Today, Zilliz Cloud is in use by some of the largest organizations in the world - including eBay, PayPal, AT&T, Walmart, IKEA and more - to simplify deploying and scaling their vector search applications without having to construct and maintain complex infrastructure. The company last raised a $60m round led by Prosperity7 Ventures, which included previous backers such as Temasek’s Pavilion Capital, Hillhouse Capital and 5Y Capital.

In this conversation, Charles walks us through the founding premise of Zilliz, why unstructured data is critical to the AI stack, and his background in databases at Oracle.

Let’s dive in ⚡️

Read time: 8 mins


Our Chat with Charles 💬

Charles - welcome to Cerebral Valley. First off, introduce yourself and give us some background on your work prior to co-founding Zilliz.

Hey there! My name is Charles and I’m the CEO and Founder of Zilliz. My journey in database systems began in 2002. Prior to founding Zilliz, I was one of the founding engineers of Oracle's 12c cloud database project. At Zilliz, I spearheaded the development of Milvus, a premier open-source vector database that now serves over 5,000 enterprises globally.

I've been a database engineer my whole life. I started doing database research and building various kinds of database systems in 2002 as a graduate student. I did my PhD at the University of Wisconsin-Madison, a top research university specializing in building database systems. After that, I joined Oracle and worked at their global headquarters in Redwood Shores. I was one of the founding engineers of Oracle's cloud database system, which was called Oracle 12c at the time, with the 'c' standing for 'cloud.' This project was a stealth-mode initiative led by Oracle's CEO Larry Ellison.

Back in 2009, Oracle was transitioning from traditional on-prem business to cloud-related business, which was still in its early stages. Larry believed in cloud computing, so he assembled a small team of five people, and I was fortunate to be one of them. I worked there for six years to build several initial versions of Oracle's cloud database system, which turned out to be a huge success. Now, every release of Oracle's database comes with a 'c.' They even decided to keep the letter 'c' forever, transitioning from earlier versions like 10g and 11g.

What insights did you have at Oracle that led you to think about leaving to co-found Zilliz?

After working for Oracle for almost six years, around 2016, I saw the rise of modern AI, driven by innovations in deep learning. I saw AI's great potential from a data perspective, especially considering that more than 80% of the world's data is unstructured. This includes images, videos, human voices, natural languages, and a lot of information collected by online companies about user behavior, interests, and profiles. In pharmaceuticals, there are 3D structures of proteins and molecules. By analyzing these structures, we can do virtual drug discovery, among other applications.

In the past 60 years, database systems have primarily focused on structured data processing. We were dealing with numbers and strings, but not images, videos, or natural languages. I was excited about the possibilities modern AI offered. In 2016, I saw the opportunity to manage and make sense of unstructured data, and I felt it was my calling. It took me just one month to leave Oracle and start my journey.

Building a vector database company six years ago was not easy. The category has gained a lot of hype in the past year, but back then, no one wanted to invest in it. Investors found it to be a weird idea. I ended up funding my company with my own savings, which came from my work at Oracle and investments in companies like Tesla. I faced a critical decision: use my $300K savings as a down payment on a house or invest in my own startup. I chose to invest in Zilliz, and it took us a year and a half to build a prototype and attract initial investors.

Over the years, I've seen the adoption of vector databases and unstructured data processing accelerate significantly. We've seen applications in natural language processing, computer vision (from image retrieval to video analysis), recommendation systems used by companies like Walmart and Salesforce, fraud detection, cybersecurity, and pharmaceuticals, including the analysis of 3D protein structures and virtual drug discovery.

The technology behind cybersecurity and fraud detection is similar to recommendation systems, but instead of finding the good guys, you are identifying outliers—the bad guys. As a result, we've seen a wide range of applications and a growing interest in our product.
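To make the parallel concrete, here is a minimal sketch (not Zilliz or Milvus code) of how the same nearest-neighbor lookup can serve both purposes. The 2-D embeddings, the names `known`, `recommend` and `is_outlier`, and the 0.8 threshold are all assumptions chosen for illustration; a real system would use high-dimensional embeddings and an approximate index rather than a brute-force scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query, vectors, k=1):
    """Return the k (similarity, key) pairs closest to query, best first."""
    scored = [(cosine(query, v), key) for key, v in vectors.items()]
    return sorted(scored, reverse=True)[:k]

# Hypothetical 2-D embeddings of known, legitimate behavior.
known = {
    "user_a": [1.0, 0.1],
    "user_b": [0.9, 0.2],
    "user_c": [0.8, 0.3],
}

def recommend(query, k=2):
    """Recommendation: return the most similar known items."""
    return [key for _, key in nearest(query, known, k)]

def is_outlier(query, threshold=0.8):
    """Fraud-style check: flag a query that is far from everything known."""
    best_similarity, _ = nearest(query, known, k=1)[0]
    return best_similarity < threshold
```

Both functions run the same similarity search; the only difference is whether a close match is treated as the answer or a missing close match is treated as the alarm.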

Today, Zilliz is best known as an enterprise-grade vector database solution with a very heavy focus on open-source. Could you walk us through the evolution of Zilliz from Milvus to now?

If you look at the history of Zilliz, it's very interesting. In the first four years of building Zilliz, we put a lot of resources into building the open-source community, technology, and project. Initially, we never thought about commercializing our technology. Our long-term vision was to democratize AI data infrastructure. Vector databases are not a new technology. Hyperscalers like Google, Meta, and Microsoft have been using similar technology for over a decade. For instance, Meta has been building its image recognition engine using a vector similarity search library called FAISS for more than ten years.

However, individual developers and many other companies don't have the resources to build their own technology. That's why, from day one, we decided to build an open-source company and open-source every technology we developed. Open source has given us a huge return. Over the years, we've gained more than 5,000 enterprise users worldwide without any sales or business development personnel. It's been purely organic. We've also collected a lot of feedback from the community and open-source users, which has helped us improve the product.

Building a database system is very tough and complex. It usually requires tens or even hundreds of engineers to optimize it. As a startup with limited resources, we prioritize tasks and improvements by listening to the community. If you look at our GitHub repo, you'll see that we've resolved over 10,000 issues over the years, and the number keeps growing. This feedback has helped us iterate our product faster, roll out new features, and fix critical bugs more efficiently. Open source is definitely something that makes us unique.

We aim to avoid premature commercialization. We only started our commercialization journey last year, in our fifth year as a company. Our goal is to put our technology into the hands of everyone who needs it. It's all about innovation and value creation. We want to create value for our users and customers. As long as we can create value, we will become a valuable company and achieve our own success.

How do you see Zilliz progressing in the next 6-12 months? How do you see the product evolving?

Zilliz is dedicated to significantly improving the accuracy of unstructured data retrieval while also working to lower the costs involved in vector search operations.

At present, RAG systems often experience limitations, achieving retrieval accuracies in the range of 60% to 70%. This level of accuracy is not sufficient for applications where high precision is critical. To address this, we are on the brink of launching Milvus 3.0 and Zilliz Cloud 3.0. These forthcoming releases will integrate the capabilities of semantic search with Approximate Nearest Neighbor Search (ANNS), enhance text search capabilities through sparse indexing, and enable dynamic property filtering with an advanced data analytics engine. This forward-thinking strategy aims to push search accuracy rates above 95%.
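As a rough illustration of the idea of combining dense vector similarity, a sparse text signal and property filtering, the sketch below blends the three in plain Python. It is not the Milvus or Zilliz Cloud API: the `docs` records, the `hybrid_search` function, the keyword-overlap score standing in for a sparse index, and the `alpha` weighting are all assumptions, and the scan is brute force rather than ANNS.

```python
import math

# Hypothetical documents: a metadata field, a dense embedding and raw text.
docs = [
    {"id": 1, "year": 2024, "dense": [0.9, 0.1], "text": "vector database pricing"},
    {"id": 2, "year": 2024, "dense": [0.2, 0.9], "text": "vector search accuracy for rag"},
    {"id": 3, "year": 2019, "dense": [0.1, 1.0], "text": "rag retrieval accuracy"},
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def keyword_score(query_terms, text):
    """Fraction of query terms present in the text (stand-in for a sparse score)."""
    words = set(text.split())
    return sum(term in words for term in query_terms) / len(query_terms)

def hybrid_search(query_vec, query_terms, min_year, alpha=0.5, k=2):
    """Filter on metadata, then rank by a weighted blend of dense and sparse scores."""
    candidates = [d for d in docs if d["year"] >= min_year]
    scored = []
    for d in candidates:
        dense = cosine(query_vec, d["dense"])
        sparse = keyword_score(query_terms, d["text"])
        scored.append((alpha * dense + (1 - alpha) * sparse, d["id"]))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]
```

Calling `hybrid_search([0.1, 1.0], ["rag", "accuracy"], min_year=2020)` drops the 2019 document before ranking, so a semantically perfect but out-of-scope match never reaches the results.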

On another note, it's recognized that vector database systems are generally more costly than traditional relational database systems, especially when considering that the world contains 5 to 10 times more unstructured than structured data. With this in mind, we are setting an ambitious goal to reduce the cost of vector database services by 100 times over the next 12 months.

In the short term, many people think that vector databases are a huge hype. However, from my experience building database systems over the past six years, I believe vector databases are still highly underestimated in the long term. Vector databases are all about making sense of unstructured data, which is five to ten times more abundant than structured data in the world.

If you look at the market for structured databases, you see companies like Oracle, Databricks, and Snowflake, which operate in a very large market. But the unstructured data market is ten times bigger than the structured data market, presenting even greater potential for vector databases. This makes the future of vector databases incredibly promising, especially for companies like ours working in this space.

How does Zilliz view the debate between open-source and closed-source, given Milvus made such an impact in OSS? Any perspective here?

Our goal is to make unstructured data processing widely accessible, and with that in mind, we firmly believe that open-source extends beyond a mere licensing format or development approach; it embodies a philosophy that propels innovation, nurtures collaboration, and fast-tracks the adoption of new technologies. The level of innovation and growth we've realized through open-source collaboration and community engagement cannot be overstated. Milvus is now embraced by over 5,000 enterprises worldwide, with its downloads and installations exceeding 20 million. Reaching such milestones without the open-source approach is unthinkable to me. My firm belief is that open-source is particularly crucial in the realm of AI technology, forming the bedrock for developing AI technologies that are not only secure, trustworthy, reliable, and manageable but also ethical. This stands as a crucial determinant in shaping humanity's future.

Zilliz is the pioneering force behind the Milvus project. Initially created by Zilliz and subsequently contributed to the Linux Foundation, Milvus stands as an open-source vector database system renowned for its high performance, scalability, and readiness for enterprise use, capable of delivering search results on a billion-scale vector dataset within milliseconds. To complement this, Zilliz has introduced Zilliz Cloud, a fully-managed cloud service that hosts the Milvus vector database, offering users seamless and cost-effective database solutions.

What’s Zilliz’s internal approach to choosing what to build? How does the team prioritize?

As an open-source company, Zilliz has developed three main engines to guide its decisions on what to build and how to prioritize: community feedback, customer input and usage data, and technical vision and innovation.

The importance of these engines varies at different stages of the company's development. When we decided to develop Milvus five years ago, our technical vision and the drive for innovation were the primary motivators. Between 2019 and 2022, we devoted a significant amount of effort to building the open-source community, with developers serving as our main source of information. Over the past year, as we embarked on our commercial journey, we have gradually introduced a data-driven approach that incorporates customer feedback into our internal decision-making process.

Looking ahead, it's clear that our approach will continue to evolve, balancing these three critical dimensions to guide our future direction.

How does your team ingest and apply the latest in AI when research is evolving at such a fast pace every week? And how do you think about balancing research vs. productization?

Continual learning, maintaining curiosity, and embracing a spirit of exploration are essential. At Zilliz, we dedicate time and resources to cultivate a dynamic environment of learning. As an example, we conduct weekly paper-reading sessions every Friday, where we explore and exchange knowledge on the latest advancements in AI research.

I hold an optimistic view on the evolution of AI technology and its myriad applications. Instead of passively awaiting its maturity, we've taken an active role in shaping its future. Five years ago, we established our research division at Zilliz, marking the beginning of a journey filled with notable achievements. Over the past three years, our team has made significant contributions to the field, presenting five papers at prestigious database conferences such as SIGMOD, VLDB, and ICDE, underscoring our dedication to research and academic exchange. Furthermore, we're integrating AI technologies into our operational workflows to boost the core functionalities of our vector database.

By focusing on query optimization and system parameter tuning, we aim to pioneer the development of a fully autonomous database for unstructured data processing.

Lastly, how would you describe the culture at Zilliz? Are you hiring? What do you look for in prospective team members?

Our culture at Zilliz is fundamentally mission-driven, with a clear focus on democratizing the processing of unstructured data.

In the realm of AI, I believe there are three pillars: algorithms and models, computational hardware, and the processing of unstructured data. Our dedication lies in creating a robust platform for the latter. We are on the lookout for exceptional individuals who share our vision and are eager to contribute to our mission. Indeed, we are in the process of expanding our team, seeking talents ranging from engineers to go-to-market specialists.


Conclusion

To stay up to date on the latest with Zilliz, follow them on X and learn more about them at Zilliz.

If you would like us to ‘Deep Dive’ a founder, team or product launch, please reply to this email (newsletter@cerebralvalley.ai) or DM us on Twitter or LinkedIn.
