Here we will take you through setting up your development environment with IntelliJ, Scala and Apache Spark. You will be writing your own data processing applications in no time!

Read More on Learn Scala Spark: 5 Books Every Spark & Scala Developer Should Own

IntelliJ, Scala and sbt

Prerequisites

  • Some technical knowledge, e.g. you know how to download and install software and start a terminal session
  • Java 8 installed

Install IntelliJ

IntelliJ is an IDE that integrates well with Scala and Spark. IntelliJ is available in two editions: Ultimate and Community. The Community edition is free, so we recommend you get started with that.

  1. Download IntelliJ IDEA community edition for your operating system.
  2. Install IntelliJ on your system
    • Windows
      1. Run the ideaIC.exe or the ideaIU.exe file you have downloaded.
      2. Follow the instructions in the install wizard.
    • macOS
      1. Double-click the ideaIC.dmg or ideaIU.dmg file you have downloaded to mount the macOS disk image.
      2. Copy IntelliJ IDEA to the Applications folder.

Starting IntelliJ for the First Time

When starting IntelliJ for the first time you will be greeted with a setup window. You don’t need to change much here until you get to the “Install additional plugins” window.

Click the button to install the Scala plugin for IntelliJ. You can click through the rest of the windows.

Scala Hello World

It’s time to create your first Scala project using IntelliJ and sbt! This will also verify that the installation in the previous steps completed successfully.

  • In IntelliJ click New Project
  • The default project type is Java. Make sure to select Scala and sbt. If you already have a Scala SDK installed, select it; otherwise click Download to install a new one.
  • Give the new project a name, for example learnscalaspark
  • Choose the right version of Scala: 2.11.12
  • Choose the right version of sbt: 0.13

The new project could take a few minutes to set up as it downloads jar files and dependencies from the internet. Once done, you will see a new project called learnscalaspark in the Project window of IntelliJ.

It contains the following structure:

  1. A src directory with the structure src/main/scala. src/main/scala is the root directory that holds your Scala code.
  2. A .idea directory. This contains IntelliJ project files.
  3. A project directory. This contains additional plugins and settings for sbt.
  4. A target directory. This contains generated files such as the .jar file.
  5. A build.sbt file containing the following:
    • name – the name of the project
    • version – the project version (0.1)
    • scalaVersion – the Scala version
name := "learnscalaspark"

version := "0.1"

scalaVersion := "2.11.12"

Now that you have set up the project, follow these steps to create your hello world Scala application.

  • Right click on the src/main/scala directory
  • Click New -> Scala Class
  • Change the type from Class to Object and name the object HelloWorld
  • Replace the generated code with the snippet below
object HelloWorld {

  // Entry point: prints a greeting to the console
  def main(args: Array[String]): Unit = {
    println("Hello World")
  }

}
  • Right click the HelloWorld object and click “Run”

After running the application you should see “Hello World” printed to the console. If you have any issues with the above steps, don’t hesitate to ask in the comment section below!

Congratulations! You have just run your first Scala program using IntelliJ. Next we will demonstrate how you can use sbt to build your project.

sbt

sbt is an open source build tool for Scala and Java projects. It has native language support for compiling Scala code and integrating with common Scala test frameworks.
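
Because sbt build definitions are themselves written in Scala, extending a project usually just means adding a line or two to build.sbt. For example, pulling in a test framework such as ScalaTest is a single extra dependency line. The coordinates and version below are an illustrative assumption, not something this tutorial depends on:

// Hypothetical example: adding ScalaTest as a test-only dependency in build.sbt
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.8" % Test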

We are going to be using sbt to build the HelloWorld Scala application we wrote in the previous step. This will package your Scala code into a .jar file which can then be migrated to other environments to be run. Building a .jar file is especially useful for big data applications, where you might be developing your data processing code on your local machine using a small subset of data, then executing it on a large cluster with the power to process the full dataset.

You can run sbt from within IntelliJ; however, IntelliJ comes with its own version of the Scala compiler. In some cases (if you have very complex Scala structures) the IntelliJ Scala compiler can throw errors even when your code is correct. Therefore it is safer to download sbt and run sbt commands from the command line, which uses the standard Scala compiler.

  • Download the relevant version of sbt for your system from here. For Windows you will need the .msi file.
  • Open a command prompt/terminal session and cd to your project directory. You can easily get the path by right clicking on the project folder in IntelliJ and selecting Copy Path.
  • Check you are in the right place: if you run ls in your console you should see your project files, including build.sbt
  • Now run sbt package
  • This will build the .jar file. You should see the output in the target/scala-2.11/ directory (the directory name matches your Scala binary version)
  • Run the program using the sbt run command. You should see “Hello World” printed to the console

You have now written, built and run your first Scala application! Next we will add Apache Spark to your project.

Apache Spark

Adding Spark Dependencies

Now we will demonstrate how to add Spark dependencies to our project and start developing Scala applications using the Spark APIs. Firstly, we need to modify our .sbt file to download the relevant Spark dependencies.

  • Add the following line to the .sbt file

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"

  • Click enable auto-import, or click the refresh button within the sbt window in the top right corner of IntelliJ.
  • This will download the core Spark dependencies. It might take some time to download all the files. You should be able to see the .jar files being downloaded within the External Libraries directory. Once the import completes, your full build.sbt should look something like the sketch below.
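
For reference, after adding the dependency your complete build.sbt should look roughly like this (the name, version and Scala version come from the earlier steps):

name := "learnscalaspark"

version := "0.1"

scalaVersion := "2.11.12"

// Core Spark dependency added in this step
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"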

Set up winutils

winutils is required by the HDFS APIs on Windows machines. It allows Spark to read from several different file systems, including HDFS, S3, the local file system and many others.

Follow these steps to install winutils:

  • Click here to download the 64-bit winutils.exe file
  • Create a new directory structure as follows: C:\hadoop\bin
  • Copy the downloaded winutils.exe into C:\hadoop\bin
  • Set up a new environment variable called HADOOP_HOME and point it at the new Hadoop directory
    • Open the Windows search bar, type “environment variables” and open the Environment Variables dialog
    • There are two types of environment variables: user variables and system variables
    • Click “New” under system variables to add a new system variable
    • Name: HADOOP_HOME
    • Value: C:\hadoop
  • Now choose “Path” and append a new entry separated by a semicolon (do not delete the value already there)
    • Click “Path” then click “Edit”
    • Add a new entry: %HADOOP_HOME%\bin

That’s it, all finished.
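
As a side note, if you would rather not edit system-wide environment variables, a commonly used alternative is to point the Hadoop libraries at the winutils directory from inside your code, before any Spark classes are initialised. The path below simply mirrors the directory created above:

// Hypothetical alternative to setting the HADOOP_HOME environment variable.
// Must run before the SparkContext is created, e.g. as the first line of main().
System.setProperty("hadoop.home.dir", "C:\\hadoop")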

Spark “Hello World”

Finally, we are going to modify the HelloWorld object we wrote earlier to run using Spark. This will verify the Spark packages downloaded correctly and demonstrate how to initialise a SparkContext on our machine.

Replace the HelloWorld object with the following code:

import org.apache.spark.{SparkConf, SparkContext}

object HelloWorld {

  def main(args: Array[String]): Unit = {

    // Configure Spark to run locally with a recognisable application name
    val conf = new SparkConf().
      setMaster("local").
      setAppName("LearnScalaSpark")

    // Initialise the Spark context and reduce the logging noise
    val sc = new SparkContext(conf)
    sc.setLogLevel("ERROR")

    val helloWorldString = "Hello World!"
    println(helloWorldString)

    sc.stop()
  }
}

Notice there are a few extra lines compared to our previous version. Firstly, we import the Apache Spark packages required to initialise the Spark context. We then create a rudimentary Spark configuration consisting of a local master and an app name (click here to find out more about Spark configuration options). Next the SparkContext is initialised using the configuration, and we also set the logging level to “ERROR”.

When you click “Run” you will see a lot of messages printed to the console. Spark normally makes a lot of noise when starting up, so don’t worry too much about these. Finally, you should see “Hello World!” output to the console.
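
If you want a slightly stronger check that the context really works, one option is a small, separate sanity-check object that distributes a collection as an RDD and runs a couple of actions on it. This is purely illustrative and not part of the original walkthrough:

import org.apache.spark.{SparkConf, SparkContext}

object SparkSanityCheck {

  def main(args: Array[String]): Unit = {
    // Same local configuration as HelloWorld, just a different app name
    val conf = new SparkConf().
      setMaster("local").
      setAppName("SparkSanityCheck")
    val sc = new SparkContext(conf)
    sc.setLogLevel("ERROR")

    // Distribute a small collection and run two simple actions on it
    val numbers = sc.parallelize(1 to 100)
    println(s"Count: ${numbers.count()}") // expect 100
    println(s"Sum: ${numbers.sum()}")     // expect 5050.0

    sc.stop()
  }
}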

Summary

Congratulations! Getting to this point means you now have a complete Spark and Scala development environment set up and tested on your machine. 

You have:

  • Installed IntelliJ
  • Installed and tested Scala using the sbt build tool
  • Set up and tested Spark using the sbt build tool

Next, we will show you how to use Spark to process some real NASA asteroid data because, well, who doesn’t love space…

