Course: Big Data & Hadoop (Admin)
Duration: 4 Days

The Big Data Hadoop Certification course gives you in-depth knowledge of the Big Data framework using Hadoop and Spark, including HDFS, YARN, and MapReduce. You will learn to use Pig, Hive, and Impala to process and analyze large datasets stored in HDFS, and to use Sqoop and Flume for data ingestion. The Big Data Hadoop Administrator training builds the knowledge and skills you need to become a successful Hadoop administrator. You will also master real-time data processing with Spark, including functional programming in Spark, implementing Spark applications, parallel processing, and Spark RDD optimization techniques.

Who should attend:

  • Systems Administrators and IT Managers
  • Windows Administrators
  • Linux Administrators
  • IT Administrators and Operators
  • IT Systems Engineers
  • Data Engineers and Web Engineers
  • Data Analytics Administrators and Database Administrators
  • Cloud Systems Administrators
  • Mainframe Professionals
  • Big Data Architects

Candidates should have fundamental knowledge of a programming language and of the Linux environment. Prior knowledge of Apache Hadoop is not required.

Introduction to Big Data and Hadoop

  • What is Big Data?
  • Types of Data & Data Growth
  • Need for Big Data
  • Characteristics of Big Data
  • Big Data Technology – Capabilities
  • Big Data – Use Cases
  • Traditional Data Warehouse – Definition & Limitations
  • Big Data Warehouse
  • Introduction to Hadoop
  • Problems with Distributed Processing
  • Hadoop Core Components & Key Characteristics
  • History, Milestones and Ecosystem of Hadoop

Hadoop Architecture

  • Hadoop Cluster on commodity hardware
  • Hadoop core services and components
  • Regular file system vs. Hadoop
  • HDFS Key Features
  • HDFS Architecture
  • HDFS operation principle
  • Understanding Hadoop directory structure
  • DataNode Failures
  • HDFS Shell Commands (sample commands follow this list)
  • HDFS File Permissions
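
For orientation, a few representative HDFS shell commands of the kind practiced in this module; the paths and file names are placeholders:

  hdfs dfs -mkdir -p /user/hadoop/input            # create a directory in HDFS
  hdfs dfs -put data.txt /user/hadoop/input        # copy a local file into HDFS
  hdfs dfs -ls /user/hadoop/input                  # list the directory
  hdfs dfs -chmod 640 /user/hadoop/input/data.txt  # set HDFS file permissions
  hdfs dfs -cat /user/hadoop/input/data.txt        # print the file contents

The older hadoop fs spelling of these commands is equivalent.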

Planning Your Hadoop Cluster

  • Understanding Configuration Files (sample snippet follows this list)
  • Rack Awareness
  • Single Node Hadoop setup on Local Machine / GCP Platform
  • Multi-Node Hadoop setup on Local Machine / GCP Platform
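
As a taste of the configuration files this module covers, here is a minimal core-site.xml for a single-node setup; the hostname and port are placeholders, and exact values depend on your distribution and topology:

  <configuration>
    <property>
      <!-- URI of the default file system; points at the NameNode -->
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
    </property>
  </configuration>

A matching hdfs-site.xml would set the dfs.replication property (3 by default; 1 is typical for a single-node cluster).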

MapReduce

  • Introduction to MapReduce
  • Understanding JobTracker, TaskTracker
  • Hadoop MapReduce example (WordCount)
  • Hadoop MapReduce Characteristics
  • Understanding Blocks and Input Splits
  • Understanding RecordReader, Mapper, Combiner, Partitioner
  • Understanding Shuffle, Sort, Reducer
  • Writing a MapReduce application (WordCount sketch follows this list)
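
A sketch of the WordCount example referenced above, following the standard Apache Hadoop MapReduce tutorial; treat it as a minimal skeleton rather than production code:

  import java.io.IOException;
  import java.util.StringTokenizer;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class WordCount {

    // Mapper: emits (word, 1) for every token in the input line
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();
      public void map(Object key, Text value, Context context)
          throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
          word.set(itr.nextToken());
          context.write(word, one);
        }
      }
    }

    // Reducer: sums the counts collected for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      private IntWritable result = new IntWritable();
      public void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) sum += val.get();
        result.set(sum);
        context.write(key, result);
      }
    }

    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = Job.getInstance(conf, "word count");
      job.setJarByClass(WordCount.class);
      job.setMapperClass(TokenizerMapper.class);
      job.setCombinerClass(IntSumReducer.class); // reducer doubles as combiner: summing is associative
      job.setReducerClass(IntSumReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

Package the class into a jar and submit it with hadoop jar wordcount.jar WordCount /user/hadoop/input /user/hadoop/output, reusing the illustrative HDFS paths from the earlier module.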

YARN

  • Understanding the limitations of MapReduce v1
  • YARN Architecture (sample CLI commands follow this list)
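
A few YARN CLI commands that make the architecture concrete; the application ID is a placeholder, and yarn logs assumes log aggregation is enabled:

  yarn node -list           # NodeManagers registered with the ResourceManager
  yarn application -list    # applications currently tracked by the ResourceManager
  yarn logs -applicationId application_1700000000000_0001   # aggregated container logs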

Cluster Maintenance

  • Checking the HDFS Status (sample commands follow this list)
  • Copying data between Clusters
  • Adding and Removing Cluster Nodes
  • Rebalancing the Cluster
  • Upgrading the Cluster
  • Understanding Checkpointing, Safemode, Metadata, and Data Backup
  • Understanding High Availability
  • Understanding Federation
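
Typical maintenance commands for these tasks; the hostnames and the threshold value are illustrative:

  hdfs dfsadmin -report         # HDFS status: capacity, live and dead DataNodes
  hdfs dfsadmin -safemode get   # check whether the NameNode is in safemode
  hadoop distcp hdfs://nn1:8020/src hdfs://nn2:8020/dst   # copy data between clusters
  hdfs balancer -threshold 10   # rebalance blocks across DataNodes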

Installing & Managing Hadoop Ecosystem Projects

  • Pig
  • Hive
  • Sqoop
  • Flume
  • ZooKeeper
  • HBase
  • Oozie

Cluster Monitoring, Troubleshooting, and Optimizing

  • NameNode and JobTracker Web UIs
  • View and Manage Hadoop Log Files (example follows this list)
  • Ganglia Monitoring Tool
  • Understanding Nagios
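
The Web UIs and log files above need no extra tooling: the NameNode Web UI listens on port 50070 in Hadoop 1.x/2.x (9870 in Hadoop 3.x), and daemon logs sit under the Hadoop log directory; the exact path varies by installation:

  ls $HADOOP_HOME/logs/                              # one .log/.out pair per daemon
  tail -f $HADOOP_HOME/logs/hadoop-*-namenode-*.log  # follow the NameNode log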

Populating HDFS from External Sources

  • Using Sqoop to import data from RDBMS to HDFS (sample command follows this list)
  • Gathering Logs from multiple systems using Flume
  • Using Hive as a Data Warehouse Tool
  • Using Pig and HBase
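
A representative Sqoop import of the kind practiced here; the JDBC URL, credentials, table name, and paths are placeholders:

  sqoop import \
    --connect jdbc:mysql://dbhost/sales \
    --username etl --password-file /user/etl/sqoop.pw \
    --table orders \
    --target-dir /user/hadoop/orders

Sqoop runs the import as a parallel, map-only MapReduce job; --num-mappers controls the degree of parallelism.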

Commercial Distributions

  • Setting up and using Cloudera
  • Setting up and using Hortonworks