Apache Spark Tutorials - Apache Spark runs up to 100 times faster in memory than MapReduce
Apache Spark is the next innovation in Big Data cluster computing, running up to 100 times faster in memory than MapReduce. The framework is open source and was originally developed in the AMPLab at the University of California, Berkeley. The codebase was later donated to the Apache Software Foundation, which continues to maintain and develop it.
Spark is designed to let an application program load data into the memory of the cluster's machines and then query or process it very quickly. Because the data can be cached and queried repeatedly, results come back fast, which also makes the framework well suited to iterative machine learning algorithms.
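As a minimal sketch of this in-memory model, the following PySpark snippet caches a text file in cluster memory and then runs two queries over the cached data; the file name `events.txt` is a placeholder:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("CachingExample").getOrCreate()

# Load the data and keep it cached in cluster memory.
# "events.txt" is a placeholder path for this sketch.
lines = spark.sparkContext.textFile("events.txt").cache()

# Repeated queries reuse the cached data instead of re-reading from disk.
total = lines.count()
errors = lines.filter(lambda line: "ERROR" in line).count()
print(f"{errors} errors out of {total} lines")

spark.stop()
```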
Spark cannot run on its own; it requires a cluster manager along with a distributed storage system.
The following cluster managers are supported by Spark (a configuration sketch follows this list):
- Standalone (native Spark cluster)
- Hadoop YARN
- Apache Mesos
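The cluster manager is selected through the master URL given when the application is configured. The sketch below shows the URL form for each manager; the host names and ports are placeholders, not real addresses:

```python
from pyspark.sql import SparkSession

# The master URL selects the cluster manager. Host names and ports
# here are placeholders; substitute your own cluster addresses.
spark = (
    SparkSession.builder
    .appName("ClusterManagerExample")
    # Standalone (native Spark cluster):
    .master("spark://master-host:7077")
    # Hadoop YARN would be:   .master("yarn")
    # Apache Mesos would be:  .master("mesos://mesos-host:5050")
    # Local testing without a cluster: .master("local[*]")
    .getOrCreate()
)
```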
The following distributed storage systems are supported by Apache Spark (see the read sketch after this list):
- Hadoop Distributed File System (HDFS)
- Apache Cassandra
- OpenStack Swift
- Amazon S3
- Kudu
- Custom storage systems are also supported
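Whichever backend is used, Spark addresses it through a URI scheme on the path. The paths below are placeholders; reading from Amazon S3 additionally assumes the hadoop-aws connector and credentials are configured on the cluster:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("StorageExample").getOrCreate()

# The URI scheme selects the storage system; all paths here are placeholders.
hdfs_rdd = spark.sparkContext.textFile("hdfs://namenode:9000/data/input.txt")

# Amazon S3 (assumes the hadoop-aws connector and credentials are set up):
# s3_rdd = spark.sparkContext.textFile("s3a://my-bucket/data/input.txt")

print(hdfs_rdd.count())
spark.stop()
```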
Here are the Apache Spark Framework tutorials:
Apache Spark 2.4.0
- Installing Apache Spark on Ubuntu 18.04
- Introduction to Apache Spark Scala Shell
- Apache Spark data structures
- Resilient Distributed Dataset (RDD) in Apache Spark
- Starting with Apache Spark in Ubuntu 22.04
Spark Scala
PySpark
- Creating SparkSession - Learn to create a SparkSession in your PySpark program
- sc.parallelize pyspark - Using sc.parallelize in a PySpark program (a combined sketch of both topics follows this list)
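The two PySpark topics above fit in one short sketch: create a SparkSession, then use sc.parallelize to turn a local Python list into an RDD. The application name and the numbers are illustrative only:

```python
from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession, the entry point of a PySpark program.
spark = (
    SparkSession.builder
    .appName("ParallelizeExample")
    .master("local[*]")  # local mode for testing; use a cluster URL in production
    .getOrCreate()
)

# sc.parallelize distributes a local collection across the cluster as an RDD.
sc = spark.sparkContext
rdd = sc.parallelize([1, 2, 3, 4, 5])
print(rdd.map(lambda x: x * x).collect())  # [1, 4, 9, 16, 25]

spark.stop()
```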
Apache Spark Articles/Tutorials