Vana plans to let users rent out their Reddit data to train AI

Image credit: TechCrunch

In the generative AI boom, data is the new oil. So why shouldn’t you be able to sell yours?

From big tech firms to startups, AI makers are licensing e-books, images, videos, audio and more from data brokers, all in pursuit of training more capable (and more legally defensible) AI-powered products. Shutterstock has signed deals with Meta, Google, Amazon and Apple to supply millions of images for model training, while OpenAI has signed deals with several news organizations to train its models on news archives.

In many cases, the individual creators and owners of this data have never seen a dime of the cash changing hands. A startup called Vana wants to change that.

Anna Kazlauskas and Art Abal, who met in a class at the MIT Media Lab focused on building technology for emerging markets, co-founded Vana in 2021. Kazlauskas previously launched an automation startup, Iambiq, out of Y Combinator. Abal, a corporate attorney by training and education, was an associate at Cadmus Group, a Boston-based consulting firm, before heading impact sourcing at data annotation company Appen.

With Vana, Kazlauskas and Abal set out to build a platform that lets users “pool” their data (including chats, speech recordings and photos) into datasets that can then be used to train generative AI models. They also want to create more personalized experiences, such as a daily motivational voicemail based on your wellness goals or an art-generating app that understands your style preferences, by fine-tuning public models on that data.

“Vana’s infrastructure essentially creates a treasure trove of user-owned data,” Kazlauskas told TechCrunch. “It does this by allowing users to collect their personal data in a non-custodial way … Vana allows users to own AI models and use their data in AI applications.”

Here’s how Vana presents its platform and API to developers:

The Vana API combines cross-platform user personal data … to allow you to personalize your application. Your app gets instant access to the user’s personalized AI model or underlying data, simplifying onboarding and eliminating compute cost concerns … We think users should be able to bring their personal data from walled gardens, e.g. Instagram, Facebook and Google, to your application, so you can create an amazingly personalized experience the first time a user interacts with your AI application.

Creating an account with Vana is quite easy. After verifying your email, you can attach data to a digital avatar (such as selfies, a self-description and voice recordings) and explore apps built using Vana’s platform and datasets. The app selection ranges from ChatGPT-style chatbots and interactive storybooks to a Hinge profile generator.

Image credit: Vana

Now why, you might ask, in this age of data privacy awareness and rising ransomware attacks, would anyone ever volunteer their personal information to an anonymous startup, much less a venture-backed one? (Vana has raised $20 million to date from Paradigm, Polychain Capital and other backers.) Can any for-profit company really be trusted not to abuse or mishandle any monetizable data it gets its hands on?

Image credit: Vana

In response to this question, Kazlauskas stressed that the whole point of Vana is for users to “take back control of their data,” noting that Vana users have the option to self-host their data rather than store it on Vana’s servers, and to control how that data is shared with apps and developers. She also argued that, since Vana makes money by charging users a monthly subscription (starting at $3.99) and charging developers “data transaction” fees (e.g. for transferring datasets for AI model training), the company is disincentivized from exploiting users and the wealth of personal data they bring with them.

“We want to build models that are owned and governed by users who all contribute their own data, and allow users to bring their data and models with them into any application,” Kazlauskas said.

Now, while Vana isn’t selling user data to companies to train generative AI models (or so it claims), it wants to allow users to do so themselves if they choose, starting with their Reddit posts.

This month, Vana launched what it calls the Reddit Data DAO (decentralized autonomous organization), a program that pools multiple users’ Reddit data (including their karma and post history) and lets them decide together how that combined data is used. After joining with a Reddit account, submitting a request to Reddit for their data and uploading that data to the DAO, users gain the right to vote alongside other DAO members on decisions such as licensing the combined data to generative AI companies for a shared profit.

We’ve crunched the numbers and r/datadao is now the largest data DAO in history: Phase 1 welcomed 141,000 reddit users with 21,000 complete data uploads.

— r/datadao (@rdatadao) 11 April 2024

It is a response of sorts to Reddit’s recent moves to commercialize data on its platform.

Reddit previously did not grant access to posts and communities for AI training purposes. But it changed course late last year ahead of its IPO. Since the policy change, Reddit has earned more than $203 million in licensing fees from companies including Google.

“The broad idea [with the DAO is] to free user data from the big platforms that try to store and monetize it,” Kazlauskas said. “This is a first, and it’s part of our effort to help people pool their data into user-owned datasets for training AI models.”

Unsurprisingly, Reddit, which isn’t working with Vana in any official capacity, isn’t happy about the DAO.

Reddit banned Vana’s subreddit dedicated to discussion of the DAO. And a Reddit spokesperson accused Vana of “exploiting” its data export system, which is designed to comply with data privacy regulations like GDPR and the California Consumer Privacy Act.

“Our data arrangements allow us to police such entities, even on public information,” a spokesperson told TechCrunch. “Reddit does not share non-public, personal data with commercial entities, and when Redditors request the export of their data from us, they receive non-public personal data back from us in accordance with applicable laws. Direct partnerships between Reddit and vetted organizations, with clear terms and accountability, matter, and these partnerships and agreements prevent misuse and abuse of people’s data.”

But does Reddit have any real reason to be concerned?

Kazlauskas envisions the DAO growing to the point where it affects how much Reddit can charge customers for its data. That’s a long way off, assuming it ever happens. The DAO has only 141,000 members, a tiny fraction of Reddit’s 73-million-strong user base. And some of these members may be bots or duplicate accounts.

Then there is the matter of how to fairly distribute any payments the DAO might receive from data buyers.

Currently, the DAO awards “tokens” (cryptocurrency) to users in line with their Reddit karma. But karma may not be the best measure of quality contributions to a dataset, especially in smaller Reddit communities where there are fewer opportunities to earn it.
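
To make the fairness concern concrete, here is a minimal sketch of a strictly proportional split by karma. It is not the DAO’s actual formula, which is not spelled out here; the function name `split_by_karma`, the usernames, the karma figures and the payout amount are all hypothetical.

```python
def split_by_karma(karma_by_user, payout):
    """Split `payout` among users in proportion to karma.

    Hypothetical rule for illustration only; not the DAO's real mechanism.
    """
    total = sum(karma_by_user.values())
    if total <= 0:
        raise ValueError("total karma must be positive")
    return {user: payout * karma / total for user, karma in karma_by_user.items()}


# Made-up numbers: one user posts in a huge subreddit, the other in a small niche one.
shares = split_by_karma({"big_sub_poster": 9000, "niche_sub_poster": 1000}, 100.0)
print(shares)
```

Under a rule like this, the niche poster’s share is set by where they happened to post, not by how useful their posts are as training data.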

Kazlauskas floats the idea that DAO members could choose to share their cross-platform and demographic data, potentially making the DAO more valuable and incentivizing signups. But that would require users to trust Vana even more to handle their sensitive data responsibly.

Personally, I don’t see Vana’s DAO taking off in a big way. There are many obstacles in the way. However, I think this won’t be the last grassroots effort to assert control over the data that is increasingly being used to train generative AI models.

Startups like Spawning are working on ways to allow creators to set rules for how their data is used for training, while vendors like Getty Images, Shutterstock and Adobe continue to experiment with compensation schemes. But no one has cracked the code yet. Can it even be cracked? Given the cutthroat nature of the generative AI industry, it is certainly a tall order. But maybe someone will find a way, or policymakers will force it.
