Andy, aka noootsab, is a mathematician turned distributed computing engineer, mainly in the Geospatial world. When the Big Data era arrived, he decided to make the most of it and founded NextLab, a Big/Smart Data-oriented company. Since then, he has had fun working on IoT, Genomics, Automotive and Smart City projects: building Spark jobs, feeding Cassandra rings and shooting data with machine learning guns. He is also a certified Scala/Spark trainer and wrote the Learning Play! Framework 2 book for Packt Publishing.
So you've been wanting to try out some big data, or more precisely, distributed computing tools?
Then Apache Spark is the one to go for. However, it can be tedious to set up the very first time, and sometimes you just want to launch it for a quick experiment (like a simple REPL session).
For all these cases, the Spark Notebook is probably what you're looking for.
Using Apache Spark, the Spark Notebook and Docker, we'll see how to set up a simple environment, execute some analyses and share your work.