Spark tutorial – Understanding and exploring Spark's core element: RDDs
Some Spark terminology is difficult for Spark developers to understand. I am going to explain these terms in my own words.
Spark RDDs
RDD stands for Resilient Distributed Dataset. An RDD is an immutable dataset that can be created from an internal source (i.e. by parallelizing an in-memory collection), from an external source (e.g. Spark Streaming, a text file, or another input file format), or by applying transformations to existing RDDs. The elements of an RDD are distributed across the Spark cluster for processing.
For example, suppose that as a Spark developer you write a Spark TCP streaming job with a batch interval of 1 second. Spark continuously receives data and forms an RDD from the data received in each one-second interval.
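The batching idea above can be modeled in plain Java without a cluster. This is only a rough sketch, not how Spark implements it: the `Event` class, the `batchByInterval` helper, and the sample arrival times are all hypothetical, invented for illustration. It groups timestamped lines into fixed one-second windows, and each window stands in for the RDD that Spark would form for that interval.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MicroBatchSketch {

    /** A received line of text plus its (hypothetical) arrival time in milliseconds. */
    static final class Event {
        final long arrivalMs;
        final String line;

        Event(long arrivalMs, String line) {
            this.arrivalMs = arrivalMs;
            this.line = line;
        }
    }

    /**
     * Groups events into fixed windows of intervalMs milliseconds.
     * Key = window index; value = the lines that arrived in that window,
     * standing in for the contents of one RDD.
     */
    static Map<Long, List<String>> batchByInterval(List<Event> events, long intervalMs) {
        Map<Long, List<String>> batches = new TreeMap<>();
        for (Event e : events) {
            long window = e.arrivalMs / intervalMs;
            batches.computeIfAbsent(window, k -> new ArrayList<>()).add(e.line);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Event> events = new ArrayList<>();
        events.add(new Event(100, "E1"));
        events.add(new Event(950, "E2"));
        events.add(new Event(1200, "E3"));
        events.add(new Event(2500, "E4"));

        // With a 1000 ms interval: window 0 holds E1 and E2, window 1 holds E3, window 2 holds E4.
        Map<Long, List<String>> batches = batchByInterval(events, 1000);
        for (Map.Entry<Long, List<String>> batch : batches.entrySet()) {
            System.out.println("batch " + batch.getKey() + " -> " + batch.getValue());
        }
    }
}
```

Changing the interval passed to `batchByInterval` changes how many elements land in each batch, which mirrors the effect of the batch interval in a streaming job.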
Look at the image below. An RDD consists of elements E1, E2, …, En, each of which can be JSON, a line of text, or any other string. This RDD can be transformed into a new RDD, but I am not covering that here.
The elements of an RDD can be processed with a foreach loop over the elements. The elements are distributed across the executors in the Spark cluster, and each executor processes its share.
[Image: Spark RDDs]
Point (1) – This shows how an RDD is created from the data received on the Spark TCP socket in one second.
Point (2) – This shows that the elements of the RDD are assigned to executors for processing.
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.StorageLevels;
    import org.apache.spark.api.java.function.VoidFunction;
    import org.apache.spark.streaming.Duration;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    SparkConf sparkConf = new SparkConf().setMaster("spark://10.1.0.5:8088").setAppName("App name");
    // Batch interval in milliseconds (1 second): one RDD is formed per interval.
    JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, new Duration(1000));
    JavaDStream<String> stream = ssc.socketTextStream("socket IP", 9000, StorageLevels.MEMORY_AND_DISK_SER);
    stream.foreachRDD(new VoidFunction<JavaRDD<String>>() {
        private static final long serialVersionUID = 1L;
        public void call(JavaRDD<String> rdd) throws Exception {
            /** Point (1) **/
            rdd.foreach(new VoidFunction<String>() {
                private static final long serialVersionUID = 1L;
                public void call(String s) throws Exception {
                    /** Point (2) **/
                    // Runs on the executors, so the output appears in the executor logs.
                    System.out.println(s);
                }
            });
        }
    });
    ssc.start();
    ssc.awaitTermination();
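To make Point (2) more concrete, here is a simplified plain-Java model of splitting the elements of one RDD among executors. It is a sketch under assumptions, not Spark's actual scheduling: the `assignRoundRobin` helper, the executor count of 2, and the round-robin rule are hypothetical choices made for illustration. Real Spark divides an RDD into partitions and runs tasks for those partitions on the executors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExecutorAssignmentSketch {

    /**
     * Splits elements into executorCount groups in round-robin order.
     * Group i stands in for the elements processed by executor i.
     */
    static List<List<String>> assignRoundRobin(List<String> elements, int executorCount) {
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < executorCount; i++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < elements.size(); i++) {
            groups.get(i % executorCount).add(elements.get(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> rdd = Arrays.asList("E1", "E2", "E3", "E4", "E5");
        List<List<String>> groups = assignRoundRobin(rdd, 2);
        for (int i = 0; i < groups.size(); i++) {
            // Each group is handled independently, like the inner foreach at Point (2).
            for (String s : groups.get(i)) {
                System.out.println("executor " + i + " processes " + s);
            }
        }
    }
}
```

In this model, executor 0 receives E1, E3, and E5, and executor 1 receives E2 and E4. The point is only that no single process sees every element.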