FTC’s Take on Data Clean Rooms: The Hidden Risks and the Path Forward

The US Federal Trade Commission (FTC) recently posted an article on its Technology Blog that demystifies some nuanced but important aspects of Data Clean Rooms (DCRs). In a nutshell: DCRs, while a useful tool in the modern data platform landscape, always involve some form of data sharing. They are therefore not inherently safe, and they come with hefty risks that must be carefully managed.

The Rise of Data Clean Rooms in an AI World

Taking a step back, those in the enterprise data platform world may recognize that DCRs are not a new concept, but they have been gaining broader attention with the rising interest in collaboration on sensitive data in this age of enterprise AI. Historically, DCRs have mainly been used for evaluating and exchanging consumer data across collaborating organizations for advertising and audience-building purposes. But with rapidly increasing AI-driven data needs, enterprises are on the hunt for tools and platforms that enable safe data usage, especially when it involves sensitive data and collaboration across multiple organizations. For example, if an organization wishes to purchase third-party data to improve the performance of one of its models, it will want to test the predictive value of that data before purchasing it, while keeping the data from both parties safe.

What’s the Catch? The Risks of DCRs

On the surface, DCRs hit a lot of the right notes: data safety controls; collaboration; configurability. But as the FTC article points out, the devil’s in the details. The practical reality of using DCRs comes with the following risks:

  • Privacy: By default, DCRs are not privacy preserving and don’t automatically prevent impermissible disclosure or use of data.
  • Increased Exposure: DCRs can increase the risk of data leaks and breaches by adding more points of data access.
  • Misconfiguration: Default configurations allow both parties full access to all the data, making mistakes and misconfigurations remarkably easy and costly.

The root cause of these risks is the underlying mechanism of how DCRs work: they enable workloads (e.g., examination of records, joins, analytics, model development) on the combined data via some form of data movement and sharing. And since data must move to enable the workloads, the precise configuration of the controls (e.g., what data is visible to which parties, what data can be exported, what workloads can be performed) is both challenging and risky.

Where organizations are looking for a platform to enable repeatable, scalable, safe-by-default ways to collaborate on data with external parties, DCRs are not fit for purpose since each collaboration and workload requires close scrutiny to ensure appropriate configuration. At scale, this becomes an untenable overhead. Further, where there are strict data residency requirements on the data as a result of regulations or contracts, it can be hard or impossible to use DCRs while ensuring compliance. 

It’s worth noting that some DCRs advertise the non-movement of data as a feature, but this refers to convenience-oriented capabilities that do not require the explicit export of data outside of the customer’s environment. In reality, these DCRs are just abstracting away what still requires some form of data movement and co-mingling in order to execute the workloads. For example, Delta Sharing enables convenient cross-environment workloads, but fundamentally involves data movement.

A New Paradigm: Federated Learning

This fundamental problem with data movement is what has motivated integrate.ai to take an entirely different approach to data collaboration. Using Federated Learning as a foundational technology, it’s possible to execute a range of workloads in a distributed manner, without sharing or centralizing raw data. There are of course trade-offs (e.g., Federated Learning explicitly does not support direct examination of raw records) and where the ultimate goal is to create and export joined data sets, DCRs may be the most appropriate tool. But for an increasingly common range of modern data science workloads, from analytics to AI model development at scale, using a Federated Learning platform provides both the safest and most flexible approach. 
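To make the distinction concrete, here is a toy sketch of the federated idea: each party computes an aggregate locally, and only those aggregates are combined. This is a hypothetical illustration of the general principle, not the integrate.ai API or any specific platform's protocol.

```python
# Toy federated aggregation sketch (hypothetical; party names and
# values are invented for illustration).
party_a = [102.0, 98.5, 110.2]   # raw records stay with party A
party_b = [95.1, 101.7]          # raw records stay with party B

def local_summary(records):
    # Each party shares only a count and a sum --
    # never the raw records themselves.
    return len(records), sum(records)

n_a, s_a = local_summary(party_a)
n_b, s_b = local_summary(party_b)

# A coordinator combines the aggregates to compute a global mean
# without ever seeing a single raw record from either party.
global_mean = (s_a + s_b) / (n_a + n_b)
print(global_mean)  # 101.5
```

Contrast this with a DCR, where computing the same statistic typically requires moving or co-mingling the two parties' records in a shared environment first.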

Given the complexity of these systems, the FTC has done an important job of highlighting the implications of how DCRs work. Determining fitness for purpose is challenging, and marketing terminology often doesn’t help, especially in the context of the risks involved.

To learn more about Data Clean Rooms and how they compare to Federated Learning, you can download our whitepaper.
