04 November 2020

# Apache Hadoop

  • Hadoop Distributed File System (HDFS): DataNode, NameNode, Secondary NameNode

Scikit-Learn and Introduction to Hadoop
  • Introduction to Scikit-Learn
  • Built-in Algorithms
  • What Hadoop Is and Why It Is Popular
  • Distributed Computation and Functional Programming
  • Understanding the MapReduce Framework
  • Sample MapReduce Job Run
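The MapReduce and functional-programming topics above come down to three steps: map each record to key/value pairs, group the pairs by key (the shuffle), and reduce each group to a result. As a minimal single-process sketch in plain Python (no Hadoop involved; the function names are illustrative only), a word count looks like this:

```python
from collections import defaultdict
from functools import reduce


def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)


def shuffle_phase(pairs):
    """Shuffle/sort: group all values that share the same key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: fold each key's list of values into a single total."""
    return {key: reduce(lambda a, b: a + b, values, 0) for key, values in groups.items()}


def word_count(documents):
    return reduce_phase(shuffle_phase(map_phase(documents)))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
    # prints: {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

On a real cluster the same three steps run in parallel across many nodes, with Hadoop performing the shuffle between the map and reduce phases.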

Hadoop and Python

  • Pig and Hive Basics
  • Hadoop Streaming
  • MapReduce Job Run Using Python
  • Writing a Pig UDF in Python
  • Writing a Hive UDF in Python
  • Pydoop and mrjob Basics
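Hadoop Streaming lets any executable that reads standard input and writes standard output act as a mapper or reducer; records are exchanged as tab-separated key/value lines, and the framework sorts mapper output by key before the reducer sees it. The sketch below assumes a single hypothetical script, `wordcount.py`, that picks its role from a command-line argument (`map` or `reduce`); that convention is made up for this example and is not something Hadoop requires.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every word, as a Streaming mapper would print."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts per word.

    Assumes the input is already sorted by key, which Hadoop guarantees
    between the map and reduce phases (groupby only merges adjacent keys).
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def main(argv):
    # Hypothetical convention for this sketch: one script, role chosen by argv[1].
    mode = argv[1] if len(argv) > 1 else "map"
    stage = mapper if mode == "map" else reducer
    for out in stage(sys.stdin):
        print(out)


if __name__ == "__main__":
    main(sys.argv)
```

Before submitting the job with the streaming JAR's `-mapper` and `-reducer` options, the pipeline can be checked locally with `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`, where `sort` stands in for Hadoop's shuffle.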

Hadoop
  • Big Data and Hadoop Introduction
  • What is Big Data and Hadoop?
  • Challenges of Big Data
  • Traditional Approach vs. Hadoop
  • Hadoop Architecture
  • Distributed Model
  • Block-Structured File System
  • Technologies supporting Big Data
  • Replication
  • Fault Tolerance
  • Why Hadoop?
  • Hadoop Ecosystem
  • Use cases of Hadoop
  • Fundamental Design Principles of Hadoop
  • Comparison of Hadoop vs. RDBMS
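The block-structured file system and replication topics reduce to simple arithmetic: HDFS splits a file into fixed-size blocks and stores each block several times. A rough sketch, assuming the stock defaults of a 128 MB block size (`dfs.blocksize`, Hadoop 2.x and later; Hadoop 1.x used 64 MB) and a replication factor of 3 (`dfs.replication`); the helper name `hdfs_footprint` is hypothetical:

```python
MB = 1024 * 1024


def hdfs_footprint(file_size_bytes, block_size=128 * MB, replication=3):
    """Return (number of blocks, raw bytes stored across the cluster).

    Defaults mirror the stock dfs.blocksize and dfs.replication settings;
    both are configurable per cluster and per file.
    """
    # Integer ceiling division: a partial final block still counts as a block.
    blocks = (file_size_bytes + block_size - 1) // block_size
    # The final block only occupies as many bytes as it holds, so raw usage
    # is the file size times the replication factor.
    raw_bytes = file_size_bytes * replication
    return blocks, raw_bytes


if __name__ == "__main__":
    blocks, raw = hdfs_footprint(500 * MB)
    print(blocks, raw // MB)  # prints: 4 1500
```

Because the last block only takes up as much disk space as the data it holds, a 500 MB file uses four blocks but about 1,500 MB of raw capacity, not 1,536 MB (4 blocks × 128 MB × 3).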

Hadoop Cluster Architecture
  • Hadoop Cluster and Architecture
  • The Five Hadoop 1.x Daemons (NameNode, Secondary NameNode, DataNode, JobTracker, TaskTracker)
  • Hands-On Exercise
  • Typical Workflow
  • Hands-On Exercise
  • Writing Files to HDFS
  • Hands-On Exercise
  • Reading Files from HDFS
  • Hands-On Exercise
  • Rack Awareness
  • Before MapReduce
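Rack awareness matters mainly for replica placement. For a replication factor of 3, the HDFS documentation describes the default policy as: first replica on the writer's node if it is a DataNode, second on a node in a different rack, third on a different node in that same remote rack. The following is a simplified sketch of that policy under an assumed rack topology; the function, rack, and node names are hypothetical, the real placement code also checks things such as free space and node load, and when the writer is not a DataNode this sketch simply picks any random node.

```python
import random


def place_replicas(topology, writer, rng=random):
    """Choose three DataNodes for one block (simplified default policy).

    topology: hypothetical mapping of rack name -> list of DataNode names.
    writer:   the client's host; used as the first replica if it is a DataNode.
    """
    node_rack = {node: rack for rack, nodes in topology.items() for node in nodes}
    # Replica 1: the writer's own node, or a random DataNode otherwise.
    first = writer if writer in node_rack else rng.choice(sorted(node_rack))
    local_rack = node_rack[first]
    # Replicas 2 and 3: two distinct nodes in one rack other than the first's.
    remote_racks = [r for r in sorted(topology) if r != local_rack and len(topology[r]) >= 2]
    if not remote_racks:
        raise ValueError("need a second rack with at least two DataNodes")
    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote_rack], 2)
    return [first, second, third]


if __name__ == "__main__":
    topo = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
    print(place_replicas(topo, "n1", random.Random(0)))
```

Keeping two replicas in one remote rack limits cross-rack write traffic while still surviving the loss of an entire rack.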

Module 9
  • Joins, Subqueries, and Views
  • Integration and Data Manipulation with Hive
  • User-Defined Functions
  • Appending Data to an Existing Hive Table
  • Static vs. Dynamic Partitioning
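In Hive, static partitioning names the partition value in the statement (for example `PARTITION (country='US')`), while dynamic partitioning takes it from a column of the query result (`PARTITION (country)`), which typically requires `hive.exec.dynamic.partition.mode=nonstrict` when no static partition key is given. The sketch below models only the resulting `column=value` directory layout in plain Python; it is an illustration under that assumption, not Hive's implementation, and the function names are hypothetical.

```python
from collections import defaultdict


def static_partition(rows, column, value):
    """Static: the partition value is fixed in the statement, e.g.
    PARTITION (country='US'), so every row lands in that one directory."""
    return {f"{column}={value}": [dict(row) for row in rows]}


def dynamic_partition(rows, column):
    """Dynamic: the partition value is read from each row, so a single
    insert can create many column=value directories."""
    partitions = defaultdict(list)
    for row in rows:
        row = dict(row)
        # The partition column is encoded in the directory path, not the data file.
        key = row.pop(column)
        partitions[f"{column}={key}"].append(row)
    return dict(partitions)


if __name__ == "__main__":
    rows = [{"id": 1, "country": "US"}, {"id": 2, "country": "IN"}, {"id": 3, "country": "US"}]
    print(dynamic_partition(rows, "country"))
```

Static partitioning suits loads where each batch belongs to one known partition; dynamic partitioning is convenient when a single source table spans many partition values.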
