Apache Spark Framework programming tutorial

Apache Spark is a very fast framework for processing data in Big Data environments. In this section we provide tutorials, articles, and examples of using the framework in your programs.


Apache Spark Tutorials - Apache Spark is up to 100 times faster than MapReduce

Apache Spark is the next innovation in Big Data cluster computing, running up to 100 times faster in memory than MapReduce. The framework is released under an open source license and was originally developed at the AMPLab at the University of California, Berkeley. It was later donated to the Apache Software Foundation, which now maintains and develops it.

Spark is designed to let an application load data into the memory of the cluster's machines and then query or process it very quickly. The same data can be queried repeatedly and the results are returned very fast, which also makes the framework well suited to running machine learning algorithms.
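A minimal sketch in Scala of this in-memory workflow, assuming a local SparkSession and a hypothetical CSV file at data/events.csv with an eventType column:

    import org.apache.spark.sql.SparkSession

    object InMemoryQueryExample {
      def main(args: Array[String]): Unit = {
        // Local SparkSession for illustration; on a real cluster the master is set by the cluster manager
        val spark = SparkSession.builder()
          .appName("InMemoryQueryExample")
          .master("local[*]")
          .getOrCreate()

        // Hypothetical input file; replace the path and columns with your own data
        val events = spark.read.option("header", "true").csv("data/events.csv")

        // cache() keeps the data in memory after the first action,
        // so repeated queries over the same data are served from RAM
        events.cache()

        // The first action materializes the cache
        println(events.count())

        // Later queries on the cached data avoid re-reading the file
        events.groupBy("eventType").count().show()

        spark.stop()
      }
    }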

Spark cannot run on its own; it requires a cluster manager along with a distributed storage system.

The following cluster managers are supported by Spark (a short configuration sketch follows the list):

  • Standalone (native Spark cluster)
  • Hadoop YARN
  • Apache Mesos
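
As a rough sketch, the cluster manager is selected through the master URL given to the Spark application; the host names below are placeholders:

    import org.apache.spark.sql.SparkSession

    // Master URLs select the cluster manager (placeholder hosts):
    //   Standalone   : spark://master-host:7077
    //   Hadoop YARN  : yarn
    //   Apache Mesos : mesos://mesos-master:5050
    val spark = SparkSession.builder()
      .appName("ClusterManagerExample")
      .master("spark://master-host:7077")  // swap in "yarn" or "mesos://..." as needed
      .getOrCreate()

In practice the master URL is usually passed with the --master option of spark-submit rather than hard-coded in the application.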

The following distributed storage systems are supported by Apache Spark (see the read/write sketch after the list):

  • Hadoop Distributed File System (HDFS)
  • Cassandra
  • OpenStack Swift
  • Amazon S3
  • Kudu
  • Custom storage systems are also supported
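
A minimal sketch, with placeholder host names, bucket names, and paths, showing how Spark picks the storage system from the URI scheme of the path it is given:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("StorageExample")
      .getOrCreate()

    // Placeholder locations; the URI scheme (hdfs://, s3a://, ...) selects the storage backend
    val fromHdfs = spark.read.text("hdfs://namenode:8020/data/input.txt")
    val fromS3   = spark.read.parquet("s3a://my-bucket/events/")  // needs the hadoop-aws connector on the classpath

    fromHdfs.write.parquet("hdfs://namenode:8020/data/output/")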

Here are the tutorials of the Apache Spark Framework:

  1. Introduction to Apache Spark Framework
  2. What is Spark Framework?

Apache Spark 2.4.0

Spark Scala

PySpark

Apache Spark Articles/Tutorials

  1. How to setup Apache Spark Development Environment?
  2. Features and Benefits of Apache Spark
  3. 5 Invincible Reasons to use Spark for Big Data Analytics?
  4. IBM Commits to Advance Apache Spark
  5. Why is Apache Spark So Hot?
  6. Spark 2.2 Features