Better than the official documentation for getting started, but not much help for truly learning Spark.


Written by core developers, Learning Spark is targeted at data scientists and developers who want to tackle big datasets in an easy way. The book succeeds in presenting Spark’s capabilities. After a well-written introduction to the subject and the indispensable chapter on installing Spark, the authors explain Spark’s core abstraction for working with data, the resilient distributed dataset (RDD).

The following chapters address important topics (key/value pairs, loading and saving data, and advanced features), but the content clearly lacks real-world examples. The authors list basic examples (fewer than 10 lines) for each supported language (Python, Scala, Java), which is not enough to grasp the full potential of Spark.

The book ends with the main built-in libraries: Spark SQL, Spark Streaming, and MLlib. These chapters are interesting, but here again the examples are too basic and the content stays too close to the official documentation.

In the end, if you want to learn Spark, there are not many resources available out there, and Learning Spark is probably your best choice until a new edition of this book comes out.

About the author

Julien Sobczak works as a software developer for Scaleway, a French cloud provider. An avid reader, his main areas of focus are developer productivity, mental literacy, and everything that revolves around personal development.
