TRAININGS

Getting started with Spark (2 days)

Apache Spark has emerged as the next-generation big data processing engine and is being adopted rapidly across the industry. Compared to Hadoop’s MapReduce, it is much faster, easier to use thanks to its rich APIs, and goes far beyond batch applications to support a variety of workloads, including interactive queries, streaming, machine learning, and graph processing. This course provides the essentials to get you started with Spark quickly: using the Spark API interactively, learning the details of the available operations and distributed execution, and understanding the advantages of the higher-level libraries for SQL, stream processing, and machine learning.

Course Outline

  • Introduction to Spark
  • Programming with RDDs
  • Working with Key/Value pairs
  • Loading and saving your data
  • Advanced Spark programming
  • Spark SQL
  • Introduction to Spark Streaming
  • Machine Learning with MLlib

Who should attend

Data scientists, statisticians, and information technology engineers who want to get started with Spark and make better use of their data.

Didn't find the training you are looking for? Feel free to ask us about any other Advanced Analytics training.

Contact us