A Client-centric Approach to Transactional Datastores
Modern applications must collect and store massive amounts of data. Cloud storage offers these applications simplicity: the abstraction of a failure-free, perfectly scalable black box. While appealing, offloading data to the cloud is not without challenges.…
Froid
Froid is an extensible, language-agnostic framework for optimizing imperative functions in databases. The purpose of Froid is to enable developers to use the abstraction of UDFs without compromising on performance.
MLlib*: Fast Training of GLMs using Spark MLlib
Visualization for People + Systems
Making sense of large and complex data requires methods that integrate human judgment and domain expertise with modern data processing systems. To meet this challenge, my work combines methods from visualization, data management, human-computer interaction,…
Microsoft and UW demonstrate first fully automated DNA data storage
Researchers from Microsoft and the University of Washington have demonstrated the first fully automated system to store and retrieve data in manufactured DNA — a key step in moving the technology out of the research…
Calling all aspiring women in Data Science
Women in Data Science (WiDS), which started in 2015 as a one-day conference organized by Stanford University, has blossomed into a movement bringing together women data scientists and aspiring data scientists via a series of…
Get Your Data Together! Algorithms for Managing Data Lakes
Data lakes (e.g., enterprise data catalogs and Open Data portals) are data dumps if users cannot find and utilize the data in them. In this talk, I present two problems in massive, dynamic data lakes:…
Cloud computing aids researchers in solving the unsolvable in medical data labeling
It’s not uncommon for physicians to disagree about a diagnosis. That’s why people often seek a second or third opinion when faced with a serious or complex health concern. What if instead of a second…