Hadoop 3.1 Features

Hadoop 3.1 is a major release in the Hadoop 3.x line.

Hadoop 3.1 is a major release with many significant changes and improvements over the previous release, Hadoop 3.0. In this article we discuss the features of the Apache Hadoop 3.1 Big Data platform.

Hadoop 3.1.0 comes with new features, bug fixes, improvements and many other changes.

List of major features of Hadoop 3.1

Hadoop 3.1.0 is another major release of the Big Data platform. It targets a high-performance compute system that meets the needs of today's machine learning workloads, and it supports running deep learning frameworks for training and deployment. Data scientists can run jobs in parallel to achieve high performance in near real time.

1. YARN supports long-running services natively

This is a major feature: Hadoop 3.1.0 can run long-running applications natively. YARN provides an API and a framework for hosting long-running services, which means developers can deploy and manage long-running services directly on YARN.
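In Hadoop 3.1 these long-running applications are called YARN services. As a minimal sketch, assuming services will be submitted through the ResourceManager's REST API, that API can be switched on in yarn-site.xml with the property below. The property name comes from the YARN service quick-start documentation; check it against the docs for your exact release.

```xml
<configuration>
   <!-- Enable the YARN services REST API on the ResourceManager,
        used to deploy and manage long-running services -->
   <property>
      <name>yarn.webapp.api-service.enable</name>
      <value>true</value>
   </property>
</configuration>
```

A service itself is described in a JSON specification (component names, number of containers, launch command and resources) and can also be submitted from the command line with yarn app -launch.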

2. YARN as a container orchestration platform

In this version YARN gains container orchestration features and can manage containerized services. YARN supports both Docker containers and process-based containers.
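The following yarn-site.xml fragment is a minimal sketch, assuming Linux NodeManagers, of how the Docker runtime is typically allowed alongside the default process-based runtime. It is not a complete Docker setup.

```xml
<configuration>
   <!-- Docker containers require the LinuxContainerExecutor -->
   <property>
      <name>yarn.nodemanager.container-executor.class</name>
      <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
   </property>
   <!-- Allow both process-based ("default") and Docker containers -->
   <property>
      <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
      <value>default,docker</value>
   </property>
</configuration>
```

Docker also has to be enabled in container-executor.cfg on every NodeManager. An application then requests the Docker runtime per container, for example by setting the environment variable YARN_CONTAINER_RUNTIME_TYPE=docker.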

3. GPU support

YARN provides first-class support for GPU scheduling and isolation for both Docker and non-Docker containers. Currently only Nvidia GPUs are supported on YARN.

To enable GPUs, add the following property to resource-types.xml:

<configuration>
   <property>
      <name>yarn.resource-types</name>
      <value>yarn.io/gpu</value>
   </property>
</configuration>
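Registering the resource type is only part of the setup. As a sketch based on the Hadoop 3.1 "Using GPU On YARN" guide, each NodeManager also needs the GPU plugin enabled in yarn-site.xml. The value auto below assumes the GPUs should be discovered automatically (via nvidia-smi) rather than listed by hand.

```xml
<configuration>
   <!-- Enable the GPU resource plugin on each NodeManager -->
   <property>
      <name>yarn.nodemanager.resource-plugins</name>
      <value>yarn.io/gpu</value>
   </property>
   <!-- Let the NodeManager discover the available GPUs automatically -->
   <property>
      <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
      <value>auto</value>
   </property>
</configuration>
```

With the Capacity Scheduler, yarn.scheduler.capacity.resource-calculator in capacity-scheduler.xml should also be set to org.apache.hadoop.yarn.util.resource.DominantResourceCalculator, because the default calculator only takes memory into account.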

GPU support is a very exciting feature: machine learning programs can now use the GPUs on a Hadoop cluster for much faster processing.

4. FPGA scheduling and isolation on YARN

In Hadoop 3.1.0, YARN comes with first-class support for FPGA scheduling and isolation. This is available for both Docker and non-Docker containers within YARN's resource management and job scheduling stack.
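FPGA support is configured much like GPU support. The sketch below assumes a cluster that uses only FPGAs: it registers the resource type and enables the NodeManager plugin. The vendor-specific plugin settings that a working setup also needs are omitted here.

```xml
<!-- resource-types.xml: register FPGA as a resource type -->
<configuration>
   <property>
      <name>yarn.resource-types</name>
      <value>yarn.io/fpga</value>
   </property>
</configuration>

<!-- yarn-site.xml: enable the FPGA resource plugin on each NodeManager -->
<configuration>
   <property>
      <name>yarn.nodemanager.resource-plugins</name>
      <value>yarn.io/fpga</value>
   </property>
</configuration>
```

On a cluster with both device types, both values would be listed comma-separated, for example yarn.io/gpu,yarn.io/fpga.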

5. Support for specifying absolute resources

Hadoop 3.1.0 lets administrators specify queue capacities as absolute resources, in terms of X memory, Y vcores, Z GPUs, etc. This gives better control over how resources are shared in the Hadoop cluster. Previously, capacities could only be given as percentages.
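For the Capacity Scheduler, absolute resources are written in square brackets in capacity-scheduler.xml. The sketch below assumes the standard root.default queue; the numbers are illustrative only, and memory is given in MB.

```xml
<configuration>
   <!-- Guaranteed capacity: 10240 MB of memory and 12 vcores -->
   <property>
      <name>yarn.scheduler.capacity.root.default.capacity</name>
      <value>[memory=10240,vcores=12]</value>
   </property>
   <!-- Upper limit: 20480 MB of memory and 24 vcores -->
   <property>
      <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
      <value>[memory=20480,vcores=24]</value>
   </property>
</configuration>
```

As far as we understand, a queue hierarchy should use one style consistently (either percentages or absolute values) rather than mixing the two.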

6. Support for data stored outside HDFS

A new storage type, PROVIDED, has been added in Hadoop 3.1.0. It lets the administrator map data held in an external storage system into HDFS, so that the data is subsequently accessed and managed through HDFS.
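A rough sketch of the hdfs-site.xml side, assuming the property names from the HDFS Provided Storage documentation: the NameNode enables PROVIDED support, and a DataNode lists a PROVIDED location next to its local disks. The /data/dn directory and the remoteFS://remoteFS-authority/path/to/data URI are hypothetical placeholders.

```xml
<configuration>
   <!-- Enable PROVIDED storage support in the NameNode -->
   <property>
      <name>dfs.namenode.provided.enabled</name>
      <value>true</value>
   </property>
   <!-- Tag one DataNode location with the PROVIDED storage type
        (both paths are placeholders) -->
   <property>
      <name>dfs.datanode.data.dir</name>
      <value>[DISK]/data/dn,[PROVIDED]remoteFS://remoteFS-authority/path/to/data</value>
   </property>
</configuration>
```

Before the data can be read, the external namespace also has to be imported into the NameNode's image with the tooling described in that documentation; those steps are not shown here.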

7. New features added in Hadoop 3.1.0

Here is the list of new features added in Hadoop 3.1.0:

Support meta tag element in Hadoop XML configurations
[Umbrella] Extend the YARN resource model for easier resource-type management and profiles
[Umbrella] Support maintenance state for datanodes
Implement linkMergeSlash and linkFallback for ViewFileSystem
Add additional deSelects params in RMWebServices#getAppReport
Tool to estimate resource requirements of an application pipeline based on prior executions
Support for head in FSShell
[Umbrella] Native YARN framework layer for services and beyond
[Umbrella] Simplified discovery of services via DNS mechanisms
Add S3A committer for zero-rename commits to S3 endpoints
Allow HDFS block replicas to be provided by an external storage system
[Umbrella] Rich placement constraints in YARN
SnapshotDiff - Provide an iterator-based listing API for calculating snapshotDiff
Source: http://hadoop.apache.org/docs/r3.1.0/hadoop-project-dist/hadoop-common/release/3.1.0/CHANGES.3.1.0.html
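The S3A committers from the list above are enabled through configuration. A minimal sketch, assuming MapReduce jobs that write to s3a:// paths and the property names from the S3A committer documentation:

```xml
<configuration>
   <!-- Route output committers for s3a:// paths through the S3A committer factory -->
   <property>
      <name>mapreduce.outputcommitter.factory.scheme.s3a</name>
      <value>org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory</value>
   </property>
   <!-- Pick one of the new committers: directory, partitioned or magic -->
   <property>
      <name>fs.s3a.committer.name</name>
      <value>directory</value>
   </property>
</configuration>
```

The directory committer is chosen here because it needs no extra setup; the magic committer requires additional settings in this release.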

8. Major fixes in Hadoop 3.1.0

The following are the major fixes that come with Hadoop 3.1.0:

Hadoop Common: 141 fixes
HDFS: 266 fixes
YARN: 329 fixes
MapReduce: 32 fixes

That is 768 fixes in total (141 + 266 + 329 + 32).

You can view full details at https://s.apache.org/apache-hadoop-3.1.0-all-tickets.
