Purchasing the book

You can purchase the book on Amazon and Packt.

This book covers Apache Spark and the Spark 2.0 architecture. You will learn how to build and interact with Spark DataFrames using Spark SQL; solve graph and deep learning problems using GraphFrames and TensorFrames, respectively; and read, transform, and understand data and use it to train machine learning models with MLlib and ML.

[Figure: Spark SQL Engine / Catalyst Optimizer]

Table of contents:

  1. Understanding Spark
  2. Resilient Distributed Dataset
  3. DataFrames
  4. Preparing Data for Modeling
  5. Introducing MLlib
  6. Introducing the ML Package
  7. GraphFrames
  8. TensorFrames
  9. Polyglot Persistence with Blaze
  10. Structured Streaming
  11. Packaging Spark Applications

The code samples for this book are available at https://github.com/drabastomek/learningPySpark