jizhang/spark-sandbox

Latest commit: fedfcd4 · Dec 8, 2018


Spark Sandbox

Install sbt

  • Download sbt-launch.jar and put it in $HOME/bin.
  • Create $HOME/bin/sbt with the content below, then make it executable (chmod 755 $HOME/bin/sbt):
#!/bin/sh
SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M -Dsbt.override.build.repos=true"
java $SBT_OPTS -jar "$(dirname "$0")/sbt-launch.jar" "$@"
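  • Create $HOME/.sbt/repositories with the content below. The proxy URLs point to a private network address; replace them with your own repository proxy: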
[repositories]
  local
  my-ivy-proxy-releases: http://10.20.8.31:8081/nexus/content/groups/ivy-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
  my-maven-proxy-releases: http://10.20.8.31:8081/nexus/content/groups/public/
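
The wrapper-script step above can also be scripted. This is a minimal sketch, not part of the original instructions: the function name install_sbt_wrapper and its directory argument are hypothetical, and it assumes sbt-launch.jar is (or will be) placed in the same directory. It does not create the repositories file.

```shell
# Hypothetical helper: writes the sbt wrapper script described above into
# the directory given as $1 (for example "$HOME/bin") and makes it executable.
install_sbt_wrapper() {
  bin_dir="$1"
  mkdir -p "$bin_dir" || return 1
  # The quoted 'EOF' delimiter keeps $SBT_OPTS, $0 and $@ literal in the generated file.
  cat > "$bin_dir/sbt" <<'EOF'
#!/bin/sh
SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M -Dsbt.override.build.repos=true"
java $SBT_OPTS -jar "$(dirname "$0")/sbt-launch.jar" "$@"
EOF
  chmod 755 "$bin_dir/sbt"
}
```

Calling install_sbt_wrapper "$HOME/bin" would reproduce the manual step. The script only works once sbt-launch.jar sits next to it.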

Install Spark

  • Download Spark, choosing the build that matches your Hadoop/HDFS version.
  • Extract the tarball, e.g. to /path/to/spark.
  • Set SPARK_HOME=/path/to/spark.
  • Add $SPARK_HOME/bin to PATH.
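
The last two steps amount to two environment variables. A minimal sketch, assuming a POSIX-style shell; the function name use_spark is hypothetical, and /path/to/spark stands for wherever the tarball was extracted:

```shell
# Hypothetical helper: point the current shell at an extracted Spark directory.
# $1 is the directory that contains Spark's bin/ folder.
use_spark() {
  SPARK_HOME="$1"
  PATH="$SPARK_HOME/bin:$PATH"
  export SPARK_HOME PATH
}
```

Running use_spark /path/to/spark affects only the current shell session. To make the setting permanent, the equivalent export lines would go into your shell's startup file.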

Import Project

$ git clone git@github.com:jizhang/spark-sandbox
$ cd spark-sandbox
$ sbt eclipse

Then import the project into Eclipse. This requires the Scala IDE for Eclipse to be installed.

Wordcount

  • Run locally:
$ cd spark-sandbox
$ sbt "run-main Wordcount data/wordcount.txt"
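  • Package and submit with spark-submit (--master local runs the job on the local machine; pass your cluster's master URL instead to run on a cluster):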
$ sbt package
$ spark-submit --class Wordcount --master local target/scala-2.10/spark-sandbox_2.10-0.1.0.jar data/wordcount.txt
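
To sanity-check the job's output on a small input, the same counts can be approximated with standard tools. This is only a rough cross-check under an assumption: it splits on whitespace, while the Wordcount class may tokenize differently (case, punctuation). The function name count_words is hypothetical.

```shell
# Hypothetical cross-check: count whitespace-separated words in the file $1.
# Prints "count word" lines, most frequent first.
count_words() {
  tr -s '[:space:]' '\n' < "$1" | grep -v '^$' | sort | uniq -c | sort -rn
}
```

For example, count_words data/wordcount.txt prints one line per distinct word, which can be compared by eye against what the Spark job reports.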

Logistic Regression

$ spark-submit --class LogisticRegression --master local target/scala-2.10/spark-sandbox_2.10-0.1.0.jar data/lr_data.txt 10 10

In-Memory Log Mining

$ spark-submit --class LogMining --master local target/scala-2.10/spark-sandbox_2.10-0.1.0.jar data/logs.txt

Streaming Wordcount

Start a netcat server on port 9999 in one terminal:
$ nc -lk 9999
Then submit the job from a second terminal:
$ spark-submit --class StreamingWordcount --master "local[2]" target/scala-2.10/spark-sandbox_2.10-0.1.0.jar

KMeans

$ spark-submit --class KMeans --master local target/scala-2.10/spark-sandbox_2.10-0.1.0.jar data/kmeans_data.txt 2 0.01
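
The spark-submit commands above all repeat the same jar path and master. A small wrapper can cut the repetition. This is a sketch, not something the repository ships: run_job, JAR, MASTER and DRY_RUN are hypothetical names, with defaults copied from the commands above.

```shell
# Assumed defaults, copied from the commands above; override them via the environment.
JAR="${JAR:-target/scala-2.10/spark-sandbox_2.10-0.1.0.jar}"
MASTER="${MASTER:-local}"

# Hypothetical helper: run_job CLASS [ARGS...] submits CLASS from the sandbox jar.
# With DRY_RUN=1 it only prints the command it would run.
run_job() {
  class="$1"
  shift
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo spark-submit --class "$class" --master "$MASTER" "$JAR" "$@"
  else
    spark-submit --class "$class" --master "$MASTER" "$JAR" "$@"
  fi
}
```

With this in place, run_job KMeans data/kmeans_data.txt 2 0.01 is equivalent to the KMeans command above. The streaming example needs MASTER set to local[2] first.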

Recommendation

$ sbt "run-main recommendation.MainClass als"

About

A playground for Spark jobs.
