Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ⚡

Apache Spark Standalone Cluster on Docker

The project was featured in an article on the official MongoDB tech blog! 😱

The project also got its own article on the Towards Data Science Medium blog! ✨

Introduction

This project gives you an Apache Spark cluster in standalone mode with a JupyterLab interface built on top of Docker. Learn Apache Spark through its Scala, Python (PySpark) and R (SparkR) API by running the Jupyter notebooks with examples on how to read, process and write data.


TL;DR

curl -LO https://raw.githubusercontent.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker/master/docker-compose.yml
docker-compose up

Quick Start

Cluster overview

| Application | URL | Description |
| --- | --- | --- |
| JupyterLab | localhost:8888 | Cluster interface with built-in Jupyter notebooks |
| Spark Driver | localhost:4040 | Spark Driver web UI |
| Spark Master | localhost:8080 | Spark Master node |
| Spark Worker I | localhost:8081 | Spark Worker node with 1 core and 512m of memory (default) |
| Spark Worker II | localhost:8082 | Spark Worker node with 1 core and 512m of memory (default) |
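
For reference, a minimal docker-compose.yml matching the layout above might look like the sketch below. The image names and tags are illustrative assumptions, not the project's exact file — use the docker-compose.yml downloaded in the TL;DR as the source of truth:

```yaml
version: "3.6"
volumes:
  shared-workspace:          # workspace shared between JupyterLab and the workers
services:
  jupyterlab:
    image: example/jupyterlab:3.0.0-spark-3.0.0   # illustrative <jupyterlab-version>-spark-<spark-version> tag
    ports:
      - 8888:8888
    volumes:
      - shared-workspace:/opt/workspace
  spark-master:
    image: example/spark-master:3.0.0             # illustrative <spark-version> tag
    ports:
      - 8080:8080
      - 7077:7077            # cluster-manager port the workers connect to
  spark-worker-1:
    image: example/spark-worker:3.0.0
    environment:
      - SPARK_WORKER_CORES=1
      - SPARK_WORKER_MEMORY=512m
    ports:
      - 8081:8081
    depends_on:
      - spark-master
```

A second worker service on port 8082 follows the same pattern as spark-worker-1.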

Prerequisites

  • Docker Engine 1.13.0+ installed;
  • Docker Compose 1.10.0+ installed.
Download from Docker Hub (easier)

  1. Download the docker compose file;

         curl -LO https://raw.githubusercontent.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker/master/docker-compose.yml

  2. Edit the docker compose file with your favorite tech stack version; check the supported app versions;
  3. Start the cluster;

         docker-compose up

  4. Run Apache Spark code using the provided Jupyter notebooks with Scala, PySpark and SparkR examples;
  5. Stop the cluster by typing ctrl+c on the terminal;
  6. Run step 3 to restart the cluster.

Build from your local machine

Note: local builds are currently supported only on Linux distributions.

  1. Download the source code or clone the repository;
  2. Move to the build directory;

         cd build

  3. Edit the build.yml file with your favorite tech stack version;
  4. Match those versions in the docker compose file;
  5. Build the images;

         chmod +x build.sh ; ./build.sh

  6. Start the cluster;

         docker-compose up

  7. Run Apache Spark code using the provided Jupyter notebooks with Scala, PySpark and SparkR examples;
  8. Stop the cluster by typing ctrl+c on the terminal;
  9. Run step 6 to restart the cluster.

Tech Stack

  • Infra

| Component | Version |
| --- | --- |
| Docker Engine | 1.13.0+ |
| Docker Compose | 1.10.0+ |

  • Languages and Kernels

| Spark | Hadoop | Scala | Scala Kernel | Python | Python Kernel | R | R Kernel |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 3.0 | 3.2 | 2.12.10 | 0.10.9 | 3.7.3 | 7.19.0 | 3.5.2 | 1.1.1 |
| 3.1 | 3.2 | 2.12.10 | 0.10.9 | 3.7.3 | 7.19.0 | 3.5.2 | 1.1.1 |
| 3.2 | 3.2 | 2.12.10 | 0.10.9 | 3.7.3 | 7.19.0 | 3.5.2 | 1.1.1 |
| 2.x | 2.7 | 2.11.12 | 0.6.0 | 3.7.3 | 7.19.0 | 3.5.2 | 1.1.1 |

  • Apps

| Component | Version | Docker Tag |
| --- | --- | --- |
| Apache Spark | 2.4.0 \| 2.4.4 \| 3.0.0 \| 3.3.1 | &lt;spark-version&gt; |
| JupyterLab | 2.1.4 \| 3.0.0 \| 3.5.0 | &lt;jupyterlab-version&gt;-spark-&lt;spark-version&gt; |


Contributing

We'd love some help. To contribute, please read this file.

Contributors

A list of the amazing people who have contributed to the project can be found in this file. This project is maintained by:

  • André Perez - dekoperez - [email protected]
  • Zhiwei Zhang - [email protected]

Support

Support us on GitHub by starring this project ⭐

Support us on Patreon. 💖
