Ibrahim Sabek, postdoc at MIT and an NSF/CRA Computing Innovation Fellow, will present "Building Better Data-Intensive Systems Using Machine Learning."
Abstract: Database systems have traditionally relied on handcrafted approaches and rules to store large-scale data and process user queries over it. These well-tuned approaches and rules work well for the general-purpose case, but are seldom optimal for any actual application because they are not tailored to the specific application's properties (e.g., user workload patterns). One possible solution is to build a specialized system from scratch, tailored to each use case. Although such a specialized system can achieve orders-of-magnitude better performance, building it is time-consuming and requires substantial manual effort. This motivates automated solutions that abstract away system-building complexities while getting as close as possible to the performance of specialized systems.
In this talk, I will show how we leverage machine learning to instance-optimize the performance of query scheduling and execution operations in database systems. In particular, I will show how deep reinforcement learning can fully replace a traditional query scheduler. I will also show that, in certain situations, even simpler learned models, such as piecewise linear models approximating the cumulative distribution function (CDF) of the data, can help improve the performance of fundamental data structures and execution operations, such as hash tables and in-memory join algorithms.
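As a rough illustration of the CDF idea mentioned in the abstract (not the speaker's implementation), the sketch below fits a piecewise linear approximation of the empirical CDF from a sample of keys and uses it to pick a hash-table bucket. The function names `fit_piecewise_cdf` and `learned_bucket`, the choice of equally spaced quantiles as knots, and all parameters are assumptions made for this example only.

```python
import bisect


def fit_piecewise_cdf(sample, num_segments):
    """Fit a piecewise linear approximation of the empirical CDF.

    Knots are placed at (roughly) equally spaced quantiles of the sample.
    Returns two ascending lists: knot keys and their CDF values in [0, 1].
    Assumes the sample holds at least two keys.
    """
    keys = sorted(sample)
    n = len(keys)
    xs, ys = [], []
    for i in range(num_segments + 1):
        idx = i * (n - 1) // num_segments  # position of the i-th quantile
        x = keys[idx]
        if xs and x == xs[-1]:
            continue  # skip duplicate knots so segments have non-zero width
        xs.append(x)
        ys.append(idx / (n - 1))
    return xs, ys


def learned_bucket(key, xs, ys, num_buckets):
    """Map a key to a bucket by scaling its approximate CDF value."""
    if key <= xs[0]:
        return 0
    if key >= xs[-1]:
        return num_buckets - 1
    # Find the segment j with xs[j] <= key < xs[j + 1].
    j = bisect.bisect_right(xs, key) - 1
    t = (key - xs[j]) / (xs[j + 1] - xs[j])
    cdf = ys[j] + t * (ys[j + 1] - ys[j])  # linear interpolation
    return min(num_buckets - 1, int(cdf * num_buckets))
```

Because the approximate CDF is monotone in the key, this mapping preserves key order across buckets, and on skewed data it tends to spread keys more evenly than scaling the key range linearly. Whether that beats a conventional hash function in practice depends on the data and workload, which is the kind of question the talk addresses.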
Bio: Ibrahim Sabek is a postdoc at MIT and an NSF/CRA Computing Innovation Fellow. He is interested in building the next generation of machine learning-empowered data management, processing, and analysis systems. Before MIT, he received his PhD from the University of Minnesota, Twin Cities, where he studied machine learning techniques for spatial data management and analysis. His PhD work received the University-wide Best Doctoral Dissertation Honorable Mention from the University of Minnesota in 2021. He also won first place in the graduate student research competition (SRC) at ACM SIGSPATIAL 2019 and was the best paper runner-up at ACM SIGSPATIAL 2018.
To attend virtually, email Christine Hopper for the Zoom link.