alirezaAsadi2018 edited this page Aug 19, 2019 · 24 revisions

NOTE: The compatible Hadoop version is 2.7.7
NOTE: The compatible HBase version is 1.2.4

Map Hostnames Without DNS

Append an entry like this to /etc/hosts on every node:

<server-ip> <hostname>

Create User For Hadoop

adduser hadoop
usermod -aG sudo hadoop

Modify the /etc/hosts file

For example, on the master node:

127.0.0.1 localhost.localdomain localhost
master-ip server.domain.com server
slave1-ip slave1
slave2-ip slave2
slave3-ip slave3
master-ip master

Download

Download a release tarball from the Apache releases page with wget and extract it with tar xvf.
Move the extracted folder to /usr/local/hadoop, then grant ownership of the folder to the hadoop user with chown:

chown -R hadoop:hadoop /usr/local/hadoop
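The steps above can be sketched as a small script. The archive.apache.org URL is an assumption (prefer a nearby Apache mirror if it is slow), and with DRY_RUN left at its default of 1 the script only prints the commands so you can review them before running with DRY_RUN=0:

```shell
#!/bin/sh
# Sketch of the download/install steps for Hadoop 2.7.7.
# The archive.apache.org URL is an assumption; prefer a nearby Apache mirror.
HADOOP_VERSION=2.7.7
TARBALL="hadoop-${HADOOP_VERSION}.tar.gz"
URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/${TARBALL}"

run() {
    # DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to run them.
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi
}

run wget "$URL"
run tar xvf "$TARBALL"
run sudo mv "hadoop-${HADOOP_VERSION}" /usr/local/hadoop
run sudo chown -R hadoop:hadoop /usr/local/hadoop
```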

Very important: from now on, switch to the hadoop user with su hadoop.

In .bashrc

Append the lines below, replacing /usr/lib/jvm/jdk1.8.0_211 with your Java home directory:

export J2SDKDIR="/usr/lib/jvm/jdk1.8.0_211"
export J2REDIR="/usr/lib/jvm/jdk1.8.0_211/jre"
export JAVA_HOME="/usr/lib/jvm/jdk1.8.0_211"
export DERBY_HOME="/usr/lib/jvm/jdk1.8.0_211/db"
export HADOOP_HOME="/usr/local/hadoop"
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/db/bin:$JAVA_HOME/jre/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

WARNING: You must source .bashrc and hadoop-env.sh after changing them (e.g. source ~/.bashrc).

Initial Configuration for Each Node

In core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>

In hadoop-env.sh

export JAVA_HOME="/usr/lib/jvm/jdk1.8.0_211"
export HDFS_NAMENODE_USER="hadoop"
export HDFS_DATANODE_USER="hadoop"
export HDFS_SECONDARYNAMENODE_USER="hadoop"
export YARN_RESOURCEMANAGER_USER="hadoop"
export YARN_NODEMANAGER_USER="hadoop"

In hdfs-site.xml for master

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        <property>
                <!-- the dfs.namenode.name.dir property is only for the master -->
                <name>dfs.namenode.name.dir</name>
                <value>file:///usr/local/hadoop/hdfs/data</value>
        </property>
        <property>
                <name>dfs.permissions.enabled</name>
                <value>false</value>
        </property>
</configuration>

In hdfs-site.xml for the slaves

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        <property>
                <!-- the dfs.datanode.data.dir property is only for the slaves -->
                <name>dfs.datanode.data.dir</name>
                <value>file:///usr/local/hadoop/hdfs/data</value>
        </property>
        <property>
                <name>dfs.permissions.enabled</name>
                <value>false</value>
        </property>
</configuration>

In mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.jobtracker.address</name>
        <value>master:54311</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

In yarn-site.xml

<configuration>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
</configuration>

Format HDFS Data

hdfs namenode -format

NOTE: The older hadoop namenode -format form still works but is deprecated.

If you are using a firewall, you’ll need to open ports 9000, 54311, 50070 (Hadoop HDFS web UI), and 8088 (YARN web UI).
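For example, assuming ufw (Ubuntu's default firewall front end), the rules could look like the ones below; the loop only writes the commands to a file so you can review them before running:

```shell
#!/bin/sh
# Sketch, assuming the ufw firewall front end (an assumption; adapt for
# iptables/firewalld). Writes one allow rule per Hadoop port for review.
# 9000 = HDFS, 54311 = JobTracker address, 50070 = HDFS web UI, 8088 = YARN web UI.
for port in 9000 54311 50070 8088; do
    echo "sudo ufw allow ${port}/tcp"
done > ufw_commands.sh
cat ufw_commands.sh
```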

Set Up SSH for Each Node:

Do this on each node. First run su hadoop and make sure the prompt shows hadoop@<hostname>, then generate a key pair with an empty passphrase:

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

Check that the key was generated:

cat ~/.ssh/id_rsa.pub

Copy this key, then on every other node open ~/.ssh/authorized_keys with nano and paste the key there. You can also copy the key with:

ssh-copy-id hadoop@hostname.example.com

On master open this file:

nano ~/.ssh/config

and write these:

Host master
    HostName [hostname]
    User hadoop
    IdentityFile ~/.ssh/id_rsa

Host slave1
    HostName [hostname]
    User hadoop
    IdentityFile ~/.ssh/id_rsa

Host slave2
    HostName [hostname]
    User hadoop
    IdentityFile ~/.ssh/id_rsa

Host slave3
    HostName [hostname]
    User hadoop
    IdentityFile ~/.ssh/id_rsa
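Writing the four stanzas by hand is error-prone; a loop like the sketch below can generate them into a local file for review before you append it to ~/.ssh/config. The example.com hostnames are placeholders for your real ones:

```shell
#!/bin/sh
# Sketch: generate the ~/.ssh/config stanzas for the master and slaves.
# The example.com hostnames are placeholders for your real hostnames.
OUT=ssh_config_generated    # review this file, then append it to ~/.ssh/config
: > "$OUT"
for host in master slave1 slave2 slave3; do
    cat >> "$OUT" <<EOF
Host ${host}
    HostName ${host}.example.com
    User hadoop
    IdentityFile ~/.ssh/id_rsa

EOF
done
```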

Configure the Master Node:

Configure the master node the same way as the other nodes.

Configure the slaves and masters files:

On the master node:

nano $HADOOP_HOME/etc/hadoop/slaves

Write:

localhost
hadoop-worker-01-server-ip
hadoop-worker-02-server-ip
hadoop-worker-03-server-ip

and:

nano $HADOOP_HOME/etc/hadoop/masters

Write:

master

NOTE: Write the masters file on all nodes.

NOTE: It may be necessary to add the line below to hadoop-env.sh:

export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
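The slaves and masters files above can be generated with a short script. The worker IPs are placeholders, and the files are written locally so you can copy them to $HADOOP_HOME/etc/hadoop/ on every node afterwards:

```shell
#!/bin/sh
# Sketch: generate the slaves and masters files locally, then copy them
# to $HADOOP_HOME/etc/hadoop/ on every node. Worker IPs are placeholders.
{
    echo "localhost"
    echo "192.168.1.11"    # hadoop-worker-01-server-ip
    echo "192.168.1.12"    # hadoop-worker-02-server-ip
    echo "192.168.1.13"    # hadoop-worker-03-server-ip
} > slaves
echo "master" > masters
cat slaves masters
```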

Set SSH Port for Hadoop and HBase

In hadoop-env.sh there is the HADOOP_SSH_OPTS environment variable; it holds extra options that the start/stop scripts pass to ssh when they log in to the nodes, so if your nodes listen on a non-default SSH port you can set it like so:

export HADOOP_SSH_OPTS="-p <num>"

Likewise, in hbase-env.sh:

export HBASE_SSH_OPTS="-p <num>"

Once you are done setting all the configs, restart the Hadoop services. ([Alireza]: Don't use these; they are deprecated.)

stop-all.sh
start-all.sh

[Alireza]: Use these instead:

start-dfs.sh
start-yarn.sh
stop-dfs.sh
stop-yarn.sh

A useful reference guide (this one targets Ubuntu 18.04):

https://linuxconfig.org/how-to-install-hadoop-on-ubuntu-18-04-bionic-beaver-linux

