Big Data/Hadoop Course Outline
Training mode: we offer face-to-face classroom training, online training, and fast-track programs.
This course is targeted at architects, administrators, and developers.
Attend once and play any role you wish!
http://big-data-training-in-chennai.blogspot.in/
Module 1
Big Data Getting Started
What is Big Data?
What is Apache Hadoop?
History of Hadoop
Understanding distributed file systems and Hadoop
Hadoop ecosystem components
Hadoop use cases
Ubuntu Installation
JDK Installation
Module 2
Hadoop Distributed File System
Hadoop V1 & V2 Architecture
Eclipse Installation
Overview of HDFS
Communication Protocols
Rack Awareness
Hadoop cluster Topology
Setting up SSH for Hadoop Cluster
Running Hadoop in pseudo-distributed mode
Linux basic commands
HDFS file commands
Reading and writing to HDFS programmatically
Hands-on Lab Exercises
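A minimal sketch of the "Reading and writing to HDFS programmatically" topic above, using the org.apache.hadoop.fs.FileSystem API. The file path and the pseudo-distributed setup are assumptions for the lab, not fixed lab material.

// Sketch: write a file to HDFS and read it back through the FileSystem API.
// Assumes fs.defaultFS in core-site.xml points at the pseudo-distributed cluster.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();           // picks up core-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/training/sample.txt");  // hypothetical path for the exercise
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeBytes("hello hdfs\n");                 // write one line to the new file
        }
        try (BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(file)))) {
            System.out.println(in.readLine());              // read the line back
        }
        fs.close();
    }
}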
Module 3
MapReduce Framework
Java Basics
Anatomy of a MapReduce Program
Writables
InputFormat
OutputFormat
Streaming API
Inherent failure handling
Reading and writing
Hands-on Lab Exercises
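To tie the anatomy topics together (Mapper, Reducer, Writables, InputFormat/OutputFormat), here is a minimal word-count sketch against the org.apache.hadoop.mapreduce API. Class names are illustrative and the input/output paths come from the command line.

// Sketch: classic word count showing the three parts of a MapReduce program.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Mapper: emits (word, 1) for every token in a line
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        protected void map(LongWritable key, Text value, Context ctx)
                throws java.io.IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) { word.set(token); ctx.write(word, ONE); }
            }
        }
    }
    // Reducer: sums the counts for each word
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws java.io.IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }
    // Driver: wires the InputFormat, Mapper, Reducer and OutputFormat together
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}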
Module 4
Advanced MapReduce Programming
Input splits, RecordReader, Mapper, Partition & Shuffle, Reduce, OutputFormat
Writing MapReduce program
Streaming in Hadoop
Counters
Performance Tuning
Joins
Sorting
Determining the optimal number of reducers and partitions
Hadoop cluster – Performance tuning
Hands-on Lab Exercises
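As a sketch of how the partition & shuffle stage and the "optimal number of reducers" topic connect, the following hypothetical Partitioner routes keys by a leading year prefix while the driver pins the reducer count. The key layout and the numbers are assumptions for illustration, not a tuning recommendation.

// Sketch: custom Partitioner plus an explicit reducer count.
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class YearPartitioner extends Partitioner<Text, IntWritable> {
    // Route keys that start with the same 4-character year to the same reducer
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String year = key.toString().substring(0, Math.min(4, key.getLength()));
        return (year.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// In the driver:
//   job.setPartitionerClass(YearPartitioner.class);
//   job.setNumReduceTasks(8);   // a common starting point is ~0.95 * (nodes * reduce slots)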
Module 5
Apache Hadoop Administration
Best Practices for Hadoop setup and infrastructure
Hadoop cluster installation preparation
- Cluster network design
- Installation of the Linux operating system
- Configuring SSH
- Walkthrough on rack topology and setup
Managing a Hadoop cluster
- HDFS cluster management
- Secondary NameNode configuration
- TaskTracker management
- Configuring HDFS quotas
- Configuring the Fair Scheduler
- Upgrading Hadoop
- Deploying and managing Hadoop clusters with Ambari
Monitoring a Hadoop cluster
- Monitoring with Ganglia
- Monitoring with Ambari
- Monitoring with Nagios
Hadoop cluster performance tuning
- Benchmarking and profiling
- Using compression for input and output
- Configuring optimal map and reduce slots for the TaskTracker
- Fine-tuning the JobTracker configuration
- Fine-tuning the TaskTracker configuration
- Tuning shuffle, merge and sort parameters
Security implementation
- Kerberos security implementation
Workflow Scheduler
Capacity Scheduler
Fair Scheduler
dfsadmin & mradmin commands
Administration of HCatalog and Hive
Backup and Recovery
Scenario-based exercises
- DataNode failure & recovery
- NameNode failure & recovery
- JobTracker & TaskTracker failure & recovery
- Removing DataNodes
- Adding DataNodes
- Commissioning and decommissioning of nodes
Module 6
Pig and Pig Latin
Installation and configuration
Running Pig Latin through the Grunt shell
Writing programs
- Filter, Load & Store functions
Writing user defined functions
Working with Scripts
Lab Exercises
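One way to run the "Writing programs" topic from Java is Pig's PigServer embedding API; this is a minimal local-mode sketch, and the relation names, file paths and queries are hypothetical.

// Sketch: embedding Pig Latin in Java via PigServer (local mode).
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigEmbedded {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.LOCAL);
        pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
        pig.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
        pig.registerQuery("grouped = GROUP words BY word;");
        pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(words);");
        pig.store("counts", "wordcount-out");   // STORE the final relation to a directory
    }
}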
Module 7
HBase and ZooKeeper
NoSQL vs SQL
CAP Theorem
Architecture
Installation
Configuration
Java API
MR integration
Performance Tuning
Lab Exercises
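A minimal sketch of the Java API topic, assuming the Connection/Table client API (HBase 1.x or later); the table, column family, qualifier and row key are hypothetical lab values.

// Sketch: one Put and one Get against an HBase table.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseQuickstart {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {
            // Write one cell: row "u1", column family "info", qualifier "name"
            Put put = new Put(Bytes.toBytes("u1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Asha"));
            table.put(put);
            // Read the cell back
            Result result = table.get(new Get(Bytes.toBytes("u1")));
            System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
    }
}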
Module 8
Hive
Features of Hive
Architecture
Installation and configuration
HiveQL
Lab Exercises
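HiveQL in the lab can also be driven from Java; this is a sketch using the HiveServer2 JDBC driver, assuming HiveServer2 listens on localhost:10000 and using a hypothetical table.

// Sketch: running HiveQL over JDBC against HiveServer2.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQlExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS pageviews (url STRING, hits INT)");
            try (ResultSet rs = stmt.executeQuery("SELECT url, SUM(hits) FROM pageviews GROUP BY url")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }
}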
Module 9
Other Hadoop Ecosystem Components; Apache Spark and its Ecosystem
Overview of Ambari, Oozie, Mahout
Installing & configuring Sqoop and mysql-server
Installing & configuring Flume
Scala basics
- Environment setup, REPL
- Control statements, variables and expressions
- Classes, objects, traits, types and methods
- First-class functions, higher-order functions, procedures
- Closures, currying
- Working with SBT
Introduction to Spark and its ecosystem
- Spark overview
- Spark installation & ecosystem walkthrough
- Running a sample
Spark architecture and programming with RDDs
- Spark architecture, Spark shell, Spark context
- RDD introduction and basic programming with RDDs
- Common transformations and actions
- Working with Spark jobs
Parquet
- Parquet data model and file format
- Writing and reading Parquet files
Analytics with Spark SQL
- Spark SQL basics: creating tables and working with data
- Advanced Spark SQL queries
- DataFrames/SchemaRDDs
- Loading and saving data from/to RDDs, Hive and databases
- Performance of MapReduce vs Spark
Advanced programming
- Persistence and caching
- Accumulators and broadcast variables
- Pair and numeric RDD operations
- Pre-partitioning and RDD partitioning
- Working with various file formats and file systems
- Running Spark in a cluster
- Tuning and debugging
Lab Exercises
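A minimal sketch of "Basic programming with RDDs" using Spark's Java API (a word count with two transformations and one action), assuming Spark 2.x; the paths and the local master are placeholders.

// Sketch: RDD word count with the Spark Java API.
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("word count").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("input.txt");                   // load: one record per line
            JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())  // transformation: split into words
                .mapToPair(word -> new Tuple2<>(word, 1))                       // transformation: pair each word with 1
                .reduceByKey(Integer::sum);                                     // transformation: sum counts per word
            counts.saveAsTextFile("wordcount-out");                             // action: write the results
        }
    }
}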
Module 10
Hadoop on Cloud
Hosting Hadoop on Amazon EC2
EMR Hands-on