Building a Data Platform with Apache Spark on Kubernetes - Jihwan Chun & Gyutak Kim
Kubernetes has become one of the dominant platforms for container-based infrastructure. Many platforms are starting to treat Kubernetes as a first-class target, and Apache Spark, an analytics engine for large-scale data processing, is one of them: since version 2.3, Spark can run on clusters managed by Kubernetes. PUBG Corporation, which serves an online video game to tens of millions of users, decided to migrate its on-demand data analytics platform to Spark on Kubernetes. In this talk, Jihwan Chun and Gyutak Kim will describe the challenges they faced and the solutions they found while building a brand-new data platform powered by Spark on Kubernetes. Sphynx, the project discussed in the talk, is a platform for managing on-demand Spark clusters and their connected Jupyter Notebooks as containerized applications on Kubernetes.