DNAstack, a World Economic Forum Technology Pioneer, is on a mission to accelerate collaborative discoveries in genomics and health for infectious diseases, neuroscience, oncology, rare diseases, and more.
Although a software company at heart, DNAstack is also a connector of a global community of precision medicine initiatives, research consortiums, patient advocacy groups, hospitals, startups, funders, pharma companies, and governments. Traditionally, health research is facilitated by the use of data sharing agreements, but given the number of stakeholders and volumes of sensitive data needed to drive meaningful insights, DNAstack knew that the traditional approach could not match its ambitions.
“There's a timely opportunity to advance precision medicine if we can only harness the collective power of the world's siloed genomics and health data. To date, the only alternative has been to centralize data, which is not scalable or secure. We believe that the future of discoveries will be made by connecting and analyzing across globally federated data networks.”
Dr. Marc Fiume, CEO, DNAstack
DNAstack developed a federated data sharing platform to make it easy for researchers to share, explore, and access distributed genomic and health data while protecting data privacy. They turned to integrate.ai’s private machine learning platform to enable researchers to ask questions and generate insights across their federated networks.
DNAstack understood that its success in helping researchers uncover the mechanisms underlying complex diseases was a function of the quantity of individual patient data available for training machine learning models on its platform.
Outside of healthcare, the primary approach to doing machine learning with distributed data is to centralize the data in a data lake or data warehouse. In healthcare, however, the privacy risks and costs of sharing identifiable patient health information make data centralization impractical.
DNAstack chose to leverage federated learning, a privacy-preserving machine learning method, to empower researchers to collaboratively generate insights across patient data sets.
“We're exploring federated learning as a means to enable joint analyses of data across different data silos while avoiding data movement — both for cost and privacy reasons.”
Jim Vlasblom, CTO, DNAstack
Unlike traditional machine learning, federated learning enables researchers to train models without bringing the data together. The central federated learning server transmits training instructions to each private data server, where a local model is trained. Local model parameters are sent back to the federated learning server, where they are aggregated into one global model.
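The round trip described above can be illustrated with a small, self-contained Python sketch. It is not integrate.ai's SDK or DNAstack's implementation; it assumes a toy linear model, synthetic data standing in for three private sites, and a simple sample-weighted average as the aggregation rule (commonly called federated averaging). All function names here are hypothetical.

```python
import random


def local_train(weights, data, lr=0.05, epochs=5):
    """Train on one site's private data; only parameters leave the site."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of mean squared error for the model y = w * x + b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b), n


def aggregate(updates):
    """Combine local parameters into one global model, weighted by sample count."""
    total = sum(n for _, n in updates)
    w = sum(params[0] * n for params, n in updates) / total
    b = sum(params[1] * n for params, n in updates) / total
    return (w, b)


def federated_round(global_weights, sites):
    """One round: send the global model out, train locally, aggregate the results."""
    updates = [local_train(global_weights, data) for data in sites]
    return aggregate(updates)


def make_site(rng, n):
    """Synthetic stand-in for a private data set: y = 3x + 1 plus small noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 0.1)) for x in xs]


rng = random.Random(0)
sites = [make_site(rng, n) for n in (50, 80, 120)]  # three hypothetical data silos

weights = (0.0, 0.0)
for _ in range(200):
    weights = federated_round(weights, sites)

print("global model (w, b):", weights)
```

In this sketch the raw records in `sites` are only ever read inside `local_train`; the coordinating code sees parameters and sample counts, which mirrors the separation between the federated learning server and the private data servers. A production system would add secure transport, authentication, and privacy protections such as differential privacy, none of which are shown here.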
DNAstack initially explored open source frameworks to build a federated learning solution, but open source came with hidden costs and was not easily scalable. It required significant engineering work to set up the federated learning system, including deploying and managing a persistent controller, or a new controller for every workflow run. Furthermore, machine learning models with differential privacy capabilities would need to be coded manually.
Working with integrate.ai allowed DNAstack to accelerate its own mission without needing to reinvent the wheel.
As a fully hosted and managed federated machine learning service, the integrate.ai platform facilitates the secure execution of multiple parallel federated machine learning sessions that operate on data served through the DNAstack platform. The easy-to-use integrate.ai SDK embeds seamlessly into DNAstack’s software, providing a scalable solution. integrate.ai serves as the federated learning controller and manages all session security through API tokens. Built-in support for most machine learning model types and default differential privacy make the solution applicable across disease areas and data types.
“When dealing with genomics and health data at scale, we need a robust commercial solution. We need to be able to scale to every cloud and hundreds of thousands of nodes, if needed. integrate.ai can stand up servers and communicate to clients much more smoothly, and with much less manual work, than other solutions we explored.”
Dr. Marc Fiume, CEO, DNAstack
Beyond the technology, the two companies share a common vision and cultural alignment. The integrate.ai team’s responsiveness and helpfulness throughout the partnership have enabled DNAstack to focus on its core mission of bringing researchers together to accelerate collaborative discoveries.
With federated learning technology embedded into its federated data sharing platform, DNAstack is helping researchers access more data to drive faster discoveries and crack the code on complex disorders.
DNAstack leads the Autism Sharing Initiative, an international collaboration to create the largest federated network of autism data, and is using integrate.ai’s platform to empower better genetic insights and accelerate precision healthcare approaches to the disease area.
“Autism is complex and research has shown the value of connecting massive datasets to drive critical insights. Federated learning will empower us to ask new questions about autism across global networks while preserving privacy of research participants.”
Dr. Marc Fiume, CEO, DNAstack
DNAstack plans to continue to leverage integrate.ai’s federated learning platform as it unlocks collaborative research opportunities in oncology, rare diseases, and other disease areas.
integrate.ai provides a set of developer-friendly APIs for health platforms and applications to extend their products’ capabilities for machine learning and analytics on distributed health data. Connect with us to learn more about opportunities to partner.