alirezaAsadi2018 edited this page Aug 19, 2019 · 24 revisions

NOTE: The compatible Hadoop version is 2.7.7
NOTE: The compatible HBase version is 1.2.4

Map Hostnames Without DNS

Append an entry like this to /etc/hosts on every node:

<server-ip> <hostname>

Create User For Hadoop

adduser hadoop
usermod -aG sudo hadoop

Modify the /etc/hosts file

For example, on the master node:

127.0.0.1 localhost.localdomain localhost
master-ip server.domain.com server
slave1-ip slave1
slave2-ip slave2
slave3-ip slave3
master-ip master

Download

Download a release tarball from the Apache releases page with wget and extract it with tar xvf.
Move the extracted folder to /usr/local/hadoop, then grant ownership of the folder to the hadoop user with chown:

chown -R hadoop:hadoop /usr/local/hadoop
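The steps above can be sketched as a small script. The archive.apache.org URL is an assumption (prefer a nearby Apache mirror if it is slow), and with DRY_RUN left at its default of 1 the script only prints the commands so you can review them before running with DRY_RUN=0:

```shell
#!/bin/sh
# Sketch of the download/install steps for Hadoop 2.7.7.
# The archive.apache.org URL is an assumption; prefer a nearby Apache mirror.
HADOOP_VERSION=2.7.7
TARBALL="hadoop-${HADOOP_VERSION}.tar.gz"
URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/${TARBALL}"

run() {
    # DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to run them.
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi
}

run wget "$URL"
run tar xvf "$TARBALL"
run sudo mv "hadoop-${HADOOP_VERSION}" /usr/local/hadoop
run sudo chown -R hadoop:hadoop /usr/local/hadoop
```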

Very important: from now on, switch to the hadoop user with su hadoop.

In .bashrc

Append the lines below, replacing /usr/lib/jvm/jdk1.8.0_211 with your Java home directory:

export J2SDKDIR="/usr/lib/jvm/jdk1.8.0_211"
export J2REDIR="/usr/lib/jvm/jdk1.8.0_211/jre"
export JAVA_HOME="/usr/lib/jvm/jdk1.8.0_211"
export DERBY_HOME="/usr/lib/jvm/jdk1.8.0_211/db"
export HADOOP_HOME="/usr/local/hadoop"
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/db/bin:$JAVA_HOME/jre/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

WARNING: You must source .bashrc and hadoop-env.sh after changing them (e.g. source ~/.bashrc).

Initial Configuration for Each Node

In core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>

In hadoop-env.sh

export JAVA_HOME="/usr/lib/jvm/jdk1.8.0_211"
export HDFS_NAMENODE_USER="hadoop"
export HDFS_DATANODE_USER="hadoop"
export HDFS_SECONDARYNAMENODE_USER="hadoop"
export YARN_RESOURCEMANAGER_USER="hadoop"
export YARN_NODEMANAGER_USER="hadoop"

In hdfs-site.xml for master

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        <property>
                <!-- the dfs.namenode.name.dir property is only for the master -->
                <name>dfs.namenode.name.dir</name>
                <value>file:///usr/local/hadoop/hdfs/data</value>
        </property>
        <property>
                <name>dfs.permissions.enabled</name>
                <value>false</value>
        </property>
</configuration>

In hdfs-site.xml for the slaves

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        <property>
                <!-- the dfs.datanode.data.dir property is only for the slaves -->
                <name>dfs.datanode.data.dir</name>
                <value>file:///usr/local/hadoop/hdfs/data</value>
        </property>
        <property>
                <name>dfs.permissions.enabled</name>
                <value>false</value>
        </property>
</configuration>

In mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.jobtracker.address</name>
        <value>master:54311</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

In yarn-site.xml

<configuration>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
</configuration>

Format HDFS Data

hdfs namenode -format

NOTE: The older hadoop namenode -format form still works but is deprecated.

If you are using a firewall, you’ll need to open ports 9000, 54311, 50070 (Hadoop HDFS web UI), and 8088 (YARN web UI).
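For example, assuming ufw (Ubuntu's default firewall front end), the rules could look like the ones below; the loop only writes the commands to a file so you can review them before running:

```shell
#!/bin/sh
# Sketch, assuming the ufw firewall front end (an assumption; adapt for
# iptables/firewalld). Writes one allow rule per Hadoop port for review.
# 9000 = HDFS, 54311 = JobTracker address, 50070 = HDFS web UI, 8088 = YARN web UI.
for port in 9000 54311 50070 8088; do
    echo "sudo ufw allow ${port}/tcp"
done > ufw_commands.sh
cat ufw_commands.sh
```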

Set Up SSH for Each Node:

Do this on each node. First run su hadoop and make sure the prompt shows hadoop@<hostname>, then generate a key pair with an empty passphrase:

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

Check that the key was generated:

cat ~/.ssh/id_rsa.pub

Copy this key, then on every other node open ~/.ssh/authorized_keys with nano and paste the key there. You can also copy the key with:

ssh-copy-id hadoop@hostname.example.com

On master open this file:

nano ~/.ssh/config

and write these:

Host master
    HostName [hostname]
    User hadoop
    IdentityFile ~/.ssh/id_rsa

Host slave1
    HostName [hostname]
    User hadoop
    IdentityFile ~/.ssh/id_rsa

Host slave2
    HostName [hostname]
    User hadoop
    IdentityFile ~/.ssh/id_rsa

Host slave3
    HostName [hostname]
    User hadoop
    IdentityFile ~/.ssh/id_rsa
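Writing the four stanzas by hand is error-prone; a loop like the sketch below can generate them into a local file for review before you append it to ~/.ssh/config. The example.com hostnames are placeholders for your real ones:

```shell
#!/bin/sh
# Sketch: generate the ~/.ssh/config stanzas for the master and slaves.
# The example.com hostnames are placeholders for your real hostnames.
OUT=ssh_config_generated    # review this file, then append it to ~/.ssh/config
: > "$OUT"
for host in master slave1 slave2 slave3; do
    cat >> "$OUT" <<EOF
Host ${host}
    HostName ${host}.example.com
    User hadoop
    IdentityFile ~/.ssh/id_rsa

EOF
done
```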

Configure the Master Node:

Configure the master node the same way as the other nodes.

Configure the slaves and masters files:

On the master node:

nano $HADOOP_HOME/etc/hadoop/slaves

Write:

localhost
hadoop-worker-01-server-ip
hadoop-worker-02-server-ip
hadoop-worker-03-server-ip

and:

nano $HADOOP_HOME/etc/hadoop/masters

Write:

master

NOTE: Write the masters file on all nodes.

NOTE: It may be necessary to add the line below to hadoop-env.sh:

export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
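The slaves and masters files above can be generated with a short script. The worker IPs are placeholders, and the files are written locally so you can copy them to $HADOOP_HOME/etc/hadoop/ on every node afterwards:

```shell
#!/bin/sh
# Sketch: generate the slaves and masters files locally, then copy them
# to $HADOOP_HOME/etc/hadoop/ on every node. Worker IPs are placeholders.
{
    echo "localhost"
    echo "192.168.1.11"    # hadoop-worker-01-server-ip
    echo "192.168.1.12"    # hadoop-worker-02-server-ip
    echo "192.168.1.13"    # hadoop-worker-03-server-ip
} > slaves
echo "master" > masters
cat slaves masters
```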

Set SSH Port for Hadoop and HBase

In hadoop-env.sh there is the HADOOP_SSH_OPTS environment variable; it holds extra options that the start/stop scripts pass to ssh when they log in to the nodes, so if your nodes listen on a non-default SSH port you can set it like so:

export HADOOP_SSH_OPTS="-p <num>"

Likewise, in hbase-env.sh:

export HBASE_SSH_OPTS="-p <num>"

Once you are done setting all the configs, restart the Hadoop services. ([Alireza]: Don't use these; they are deprecated.)

stop-all.sh
start-all.sh

[Alireza]: Use these instead:

start-dfs.sh
start-yarn.sh
stop-dfs.sh
stop-yarn.sh

A useful reference guide (this one targets Ubuntu 18.04):

https://linuxconfig.org/how-to-install-hadoop-on-ubuntu-18-04-bionic-beaver-linux

