Data Ownership in the Age of AI

You are swimming in data. You create new data every day. Your health app counting your steps? That’s new data. The Oura ring that tracks your biometrics? Valuable data. Your posts on social media, even the stupid jokes that got zero likes? More data.

This is all data that AI companies would love to harvest. You cannot build good AI without good data, which is why many consider data the “new oil” of the AI era. The problem, though, is that even if your data is valuable in theory, in reality it is difficult to make money from your own personal data, because as an individual you have no leverage. (OpenAI is not knocking on your door to buy your old tweets.)

Enter Vana. “I think data is this basic resource that drives the next generation of AI, and really the next generation of our digital economy,” says Anna Kazlauskas, co-founder of Vana and CEO of Open Data Labs. “Many people are honestly not aware that they actually own their data.”

But you do own your data. And it is valuable … if you can somehow join forces with millions of others who also own their data. This would give you negotiating power. And that is Vana’s mission: to create an ecosystem for user-owned data, which in turn fuels user-owned AI.

This ecosystem involves a mix of data DAOs (a “labor union” for data), decentralized data marketplaces, the recently launched VRC-20 token and a new collaboration with Flower Labs to build the world’s first user-owned foundation model. (Exhibit A that decentralized AI is creeping into the mainstream: the Vana/Flower collaboration was covered by Wired.)

Kazlauskas will give a keynote at the AI Summit at Consensus 2025, outlining this vision, and she gives a glimpse here. And she sees the momentum shifting. “We are already starting to see this shift where more people are aware that ‘my data is really important to AI’ and ‘I am actually the owner of it.’” She predicts that over 100 million users will be on board. In 10 years? “The world’s population. Over 10 billion.”

This interview has been condensed and lightly edited for clarity.

Why is user-owned data so important to you?

Anna Kazlauskas: Most people assume that data is owned by the platforms it sits on, but that is not the case. It’s just like when you put your car in a parking lot: the parking lot doesn’t own your car. You can always take it back. You have full ownership of it.

And there is a huge amount of money being made today off of this data, mostly by large tech companies, but users are the legal owners. So I think it’s important that we restore that ownership, both from a user perspective and from a developer’s perspective.

Can you connect the dots for how this helps developers?

As a developer, especially in an AI world, it is really important to have access to the right data. And it’s super hard to do right now because most of the data is locked inside the walled gardens of Big Tech. So many of my really smart friends doing things in AI are working at the big labs, because that’s where the data is and that’s where the compute is. But that doesn’t have to be the case.

How exactly do data DAOs fit into this vision?

So a data DAO is a bit like a labor union for data. You basically have a large group of people pooling their data together, who can then make collective decisions about what happens to this data.

The reason why it is important is that your data alone isn’t that useful, right? It is much more useful when there is a large pool of it, when there is enough of it to train an AI model.

What are some of the data DAOs you are most excited about?

There are a few in the health space that are really interesting. There is an early one that actually does full exports of patient medical records, which I think can really help advance a lot of research in the space. There are some related to biometrics, sleep and health. There is one with DLP [Driver Loyalty Program] Labs; they are building up car data. And within their data sets, the Tesla data is really interesting, because most people think of Tesla as valuable because they have a data moat, right? In fact, users can get a lot of that data set.

You are moving from theory to practice with the new collaboration with Flower Labs to build Collective-1. What is the goal there?

Collective-1 is the first user-owned foundation model. Usually when people think of a foundation model, they think of a company that runs a very large training job in a single data center, right? Like OpenAI. And the reason why it is typically done in a centralized way is because it requires, one, a whole lot of computing power and, two, a whole lot of data.

Flower AI is kind of the leader in federated [decentralized] training. They have done a really good job of building these large open source libraries. They come at it from the training and algorithm side. And with Vana, we really focus on that data piece, right? So we basically have all this data that people can train on. Then you give users ownership of the end model, and users can decide what the model is allowed to do. So this is the first foundation model of its kind.

And the theory is that with better data you can eventually build AI that is not only competitive with the centralized players but better. Is that right? So it’s not just about ideology, but also performance.

Exactly, yes, that is 100% right. In the decentralized context, I often find that people agree in principle that “yes, we should have AI owned by the people. We should have decentralized AI.” But what is it that we can actually do better in a decentralized context? Data is the answer. Each company only has its individual slice of a dataset. Apple has its data. Google has its data. But if you go through the user, you can cut across platforms and actually build better data sets than any single company. Data is the secret sauce that makes it all work.

Love it. Thanks Anna, see you at the AI summit in Toronto.

Jeff Wilser will host the AI Summit at Consensus 2025 and hosts People’s AI: The Decentralized AI Podcast.
