Learning PySpark videos are up!

In this tutorial, we provide a brief overview of Spark and its stack. The tutorial presents effective, time-saving techniques for harnessing the power of Python in the Spark ecosystem. You will start by getting a firm understanding of the Apache Spark architecture and learning how to set up a Python environment for Spark.

First, you'll learn about different techniques for collecting data, and you'll learn to understand and distinguish between techniques for processing it. Next, we provide an in-depth review of RDDs and contrast them with DataFrames. We show how to read data from files and from HDFS, and how to specify schemas either by reflection or programmatically (in the case of DataFrames). We also describe the concept of lazy execution and outline various transformations and actions specific to RDDs and DataFrames.

Finally, we show you how to use SQL to interact with DataFrames. By the end of this tutorial, you will have learned how to process data using Spark DataFrames and mastered data collection techniques for distributed data processing.

Style and Approach

Filled with hands-on examples, this course will help you understand RDDs and how to work with them. You will learn about RDD actions and Spark DataFrame transformations, and how to use Spark DataFrames for big data processing.

Table of Contents

  • A BRIEF PRIMER ON PYSPARK
  • RESILIENT DISTRIBUTED DATASETS
  • RESILIENT DISTRIBUTED DATASETS AND ACTIONS
  • DATAFRAMES AND TRANSFORMATIONS
  • DATA PROCESSING WITH SPARK DATAFRAMES

What You Will Learn

  • Learn about Apache Spark and the Spark 2.0 architecture.
  • Understand schemas for RDDs, lazy execution, and transformations.
  • Explore how to sort and save the elements of an RDD.
  • Build and interact with Spark DataFrames using Spark SQL.
  • Explore the various APIs for creating and working with Spark DataFrames.
  • Learn how to change the schema of a DataFrame programmatically.
  • Explore how to aggregate, transform, and sort data with DataFrames.

The course can be purchased here: https://www.packtpub.com/big-data-and-business-intelligence/learning-pyspark-video

PySpark and TensorFrames!

PySpark and TensorFrames (a bridge between Spark and TensorFlow) were the topics of a workshop by Denny Lee and Tom Drabas at PyData Seattle on July 5, 2017.

Things covered:

  • Neural networks and deep learning
  • Feature learning
  • Feature engineering
  • TensorFlow introduction
  • Building a multinomial logistic regression and a Convolutional Neural Network to recognize handwritten digits (MNIST)
  • TensorFrames introduction

In case you missed it, here are the materials we prepared for this event: http://aka.ms/pydatatfs

Denny and Tom on stage presenting PySpark and TensorFrames

Learning PySpark is available on Amazon!

It's been a long time coming, but it's finally here!

You can now order the paperback version of our book (should you prefer that format) from Amazon and Packt!

Follow either of these links to get your copy: http://bit.ly/learnPySparkAmazon or http://bit.ly/learnPySparkPackt!

While checking out our book, don't forget to pre-order the awesome book by Holden Karau, High Performance Spark: http://bit.ly/holden_HPS!

Learning PySpark is getting published!

We are super excited to inform you that our Learning PySpark book will be released next week!

Head first into the big and fast data world with PySpark!

Over the past 8 months, Denny and I have both been working tirelessly to get all the material done for this book. The countless hours we spent playing with PySpark, devising the code, and writing it up are finally coming to fruition, and we hope you will like what you read.

Hopefully, the material we came up with will help you find your way around PySpark and will be generic enough that you can adapt it to your needs!

Big thanks!

Words of appreciation go to Packt Publishing for their support along the way.

Also, the fabulous Holden Karau has been instrumental in improving the content of this book! Without her reviews, the book would not have been as good as it is right now! Big thank you!

Don't forget to get your copy at Amazon and Packt Publishing!