FAQ: #Apache Hadoop

04 November 2020


Hadoop Distributed File System (HDFS) - DataNode, NameNode, Secondary NameNode
MapReduce - Map phase, Reduce phase
YARN - NodeManager, ResourceManager
Hive - Metastore, Driver, Query Compiler, HiveServer
Pig
HBase - HBase Master, RegionServer
Mahout
Zookeeper
Oozie
Sqoop
Flume - Source, Channel, Sink
Ambari
Apache Drill
Apache Spark
Solr and Lucene
Scala
Presto

Scikit-Learn and Introduction to Hadoop

Introduction to Scikit-Learn
Inbuilt Algorithms for Use
What Hadoop Is and Why It Is Popular
Distributed Computation and Functional Programming
Understanding the MapReduce Framework
Sample MapReduce Job Run

Hadoop and Python

Pig and Hive Basics
Streaming Feature in Hadoop
MapReduce Job Run Using Python
Writing a Pig UDF in Python
Writing a Hive UDF in Python
Pydoop and MRjob Basics
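
As a small illustration of the streaming and Python MapReduce topics listed above, here is a minimal word-count sketch. It assumes the Hadoop Streaming convention of passing records as tab-separated key/value lines. The function names (mapper, reducer, run_local) are hypothetical and chosen only for this example, and the whole pipeline is simulated in a single process rather than on a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts per word; the input must already be sorted by key."""
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce locally (no cluster involved)."""
    return list(reducer(sorted(mapper(lines))))


# Example: count words across two input lines.
result = run_local(["Hadoop and Python", "hadoop streaming"])
```

In an actual streaming job, the mapper and reducer would typically be separate scripts that read from standard input and write to standard output, with the framework performing the sort between the two stages. The local sorted() call here only stands in for that step.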

HADOOP

Big Data and Hadoop Introduction
What is Big Data and Hadoop?
Challenges of Big Data
Traditional Approach vs. Hadoop
Hadoop Architecture
Distributed Model
Block-Structured File System
Technologies supporting Big Data
Replication
Fault Tolerance
Why Hadoop?

Hadoop Ecosystem

Use cases of Hadoop
Hadoop Ecosystem

Fundamental Design Principles of Hadoop

Comparison of Hadoop vs. RDBMS

Hadoop Cluster Architecture

Hadoop Cluster and Architecture

5 Daemons
Hands-On Exercise
Typical Workflow
Hands-On Exercise
Writing Files to HDFS
Hands-On Exercise
Reading Files from HDFS
Hands-On Exercise
Rack Awareness
Before MapReduce

Module-9

Joins, Subqueries, and Views
Integration and Data Manipulation with Hive
User-Defined Functions
Appending Data to an Existing Hive Table
Static Partitioning vs. Dynamic Partitioning
