Getting started with Spark (2 days)

Apache Spark has emerged as the next-generation big data processing engine and is being adopted across the industry at a rapid pace. Compared to Hadoop’s MapReduce, it is much faster, much easier to use thanks to its rich APIs, and goes far beyond batch applications to support a variety of workloads, including interactive queries, streaming, machine learning, and graph processing. This course provides all the essentials to get you started with Spark quickly: using the Spark API interactively, learning the details of the available operations and distributed execution, and understanding the advantages of the higher-level libraries for SQL, stream processing, and machine learning.

Course Outline

Introduction to Spark
Programming with RDDs
Working with Key/Value pairs
Loading and saving your data
Advanced Spark programming
Spark SQL
Introduction to Spark Streaming
Machine Learning with MLlib

Who should attend

Data scientists, statisticians, and IT engineers who want to get started with Spark and make better use of their data.

