Big Data Hadoop

Course Info

  • Category: IT Program
  • Trainer: Mr. Ashutosh
  • Duration: 2 Months.

Big Data Hadoop Course

Session 1 – Introduction to Big Data

1. Importance of Data
2. ESG Report on Analytics
3. Big Data & Its Hype
4. What is Big Data?
5. Structured vs Unstructured data
6. Definition of Big Data
7. Big Data Users & Scenarios
8. Challenges of Big Data

Session 2 – Hadoop

1. History of Hadoop
2. Hadoop Ecosystem
3. Hadoop Animal Planet
4. When to use & when not to use Hadoop
5. What is Hadoop?
6. Key Distinctions of Hadoop
7. Hadoop Components/Architecture
8. Understanding Storage Components
9. Understanding Processing Components
10. Anatomy of a File Write
11. Anatomy of a File Read

Session 3 – Understanding Hadoop Cluster

1. Handout discussion
2. Walkthrough of CDH setup
3. Hadoop Cluster Modes
4. Hadoop Configuration files
5. Understanding Hadoop Cluster configuration
6. Data Ingestion to HDFS

Session 4 – MapReduce

1. Meet MapReduce
2. Word Count Algorithm – Traditional approach
3. Traditional approach on a Distributed system
4. Traditional approach – Drawbacks
5. MapReduce approach
6. Input & Output Forms of a MR program
7. Map, Shuffle & Sort, Reduce Phases
8. Workflow & Transformation of Data
9. Word Count Code walkthrough
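
The Map, Shuffle & Sort, and Reduce phases listed above can be previewed with a small in-memory sketch. This is not Hadoop code and not the course's own walkthrough: it is a plain Python illustration, and the function names (map_phase, shuffle_and_sort, reduce_phase, word_count) are hypothetical names chosen for this example. It assumes words are separated by whitespace and compared case-insensitively.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle & Sort: group all values by key, then order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """Reduce: sum the counts collected for a single word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over a list of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle_and_sort(mapped))


if __name__ == "__main__":
    print(word_count(["big data hadoop", "hadoop big cluster"]))
```

In a real cluster the map and reduce calls run as separate tasks on different nodes and the framework performs the grouping step; here everything runs in a single process purely to show how the data changes shape between phases.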

Session 5 – MapReduce

1. Input Split & HDFS Block
2. Relation between Split & Block
3. MR Flow with Single Reduce Task
4. MR flow with multiple Reducers
5. Data Locality Optimization
6. Speculative Execution

Session 6 – Advanced MapReduce

1. Combiner
2. Partitioner
3. Counters
4. Hadoop Data Types
5. Custom Data Types
6. Input Format & Hierarchy
7. Output Format & Hierarchy
8. Side Data distribution - Distributed cache

Session 7 – Advanced MapReduce

1. Joins
2. Map side Join using Distributed cache
3. Reduce side Join
4. MRUnit – A unit testing framework

Session 8 – Pig

1. What is Pig?
2. Why Pig?
3. Pig vs SQL
4. Execution Types or Modes
5. Running Pig
6. Pig Data types
7. Pig Latin relational Operators
8. Multi Query execution
9. Pig Latin Diagnostic Operators

Session 9 – Pig

1. Pig Latin Macro & UDF statements
2. Pig Latin Commands
3. Pig Latin Expressions
4. Schemas
5. Pig Functions
6. Pig Latin File Loaders
7. Pig UDF & executing a Pig UDF

Session 10 – Hive

1. Introduction to Hive
2. Pig vs Hive
3. Hive Limitations & Possibilities
4. Hive Architecture
5. Metastore
6. Hive Data Organization
7. HiveQL
8. SQL vs HiveQL
9. Hive Data types
10. Data Storage
11. Managed & External Tables

Session 11 – Hive

1. Partitions & Buckets
2. Storage Formats
3. Built-in SerDes
4. Importing Data
5. Alter & Drop Commands
6. Data Querying

Session 12 – Hive

1. Using MR Scripts
2. Hive Joins
3. Sub Queries
4. Views
5. UDFs

Session 13 – HBase

1. Introduction to NoSQL & HBase
2. Row & Column oriented storage
3. Characteristics of a huge DB
4. What is HBase?
5. HBase Data-Model
6. HBase vs RDBMS
7. HBase architecture
8. HBase in operation
9. Loading Data into HBase
10. HBase shell commands
11. HBase operations through Java
12. HBase operations through MR

Session 14 – ZooKeeper & Oozie

1. Introduction to ZooKeeper
2. Distributed Coordination
3. ZooKeeper Data Model
4. ZooKeeper Service
5. ZooKeeper in HBase
6. Introduction to Oozie
7. Oozie workflow

Session 15 – Sqoop

1. Introduction to Sqoop
2. Sqoop design
3. Sqoop Commands
4. Sqoop Import & Export Commands
5. Sqoop Incremental load Commands

Session 16 – Hadoop 2.0 & YARN

1. Hadoop 1 Limitations
2. HDFS Federation
3. NameNode High Availability
4. Introduction to YARN
5. YARN Applications
6. YARN Architecture
7. Anatomy of a YARN application

Session 17 – Project Discussion

1. Java to MapReduce Conversion
2. MapReduce Project

Session 18 – Project Discussion

1. Hive Project
2. Pig Project

Course Highlights:

1. Hands-on assignments from each session
2. Instructor-led learning sessions
3. Interactive sessions & hands-on practice
4. Lifetime access to Knowledge Base
5. Interview and job perspectives