Book Review: Learning Spark

Better than the official documentation for getting started, but not much help if you truly want to learn Spark.

Written by core developers, Learning Spark is aimed at data scientists and developers who want to tackle big datasets in an easy way. The book succeeds in presenting Spark's capabilities. After a well-written introduction to the subject and the indispensable chapter on installing Spark, the authors explain Spark's core abstraction for working with data, the resilient distributed dataset (RDD).

The following chapters address important topics (key/value pairs, loading and saving data, and advanced features), but the content clearly lacks real-world examples. The authors list basic examples (fewer than 10 lines) for each supported language (Python, Scala, Java), which is insufficient to grasp the full potential of Spark.

The book ends with the main built-in libraries: Spark SQL, Spark Streaming, and MLlib. These chapters are interesting, but the examples are again too basic and the content stays too close to the official documentation.

In short, if you want to learn Spark, there are not many resources available, and Learning Spark is probably the best choice until a new edition of the book comes out.

About the author

Julien Sobczak works as a software developer for Scaleway, a French cloud provider. He is a passionate reader who likes to see the world differently to measure the extent of his ignorance. His main areas of interest are productivity (doing less and better), human potential, and everything that contributes to being a better person (including a better dad and a better developer).
