Tuesday, 15 December 2015

Installing Hadoop 2.7.1 on Ubuntu 14.04 in Pseudo-Distributed Mode - for New Users

When I started installing Hadoop, it looked like a mountain to climb. Eventually I figured out that, with the right instructions, it takes only 5-10 minutes. This post shows how to install Hadoop without much explanation of why each step is needed.

Install Java

Check whether you already have a suitable version of Java with the command

$ java -version
At the time of writing, Hadoop 2.7.x required Java 7 or later; this guide uses OpenJDK 7.
If Java is not installed, install it with the command
$ sudo apt-get install openjdk-7-jdk
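
The Java installation path is needed later for JAVA_HOME. One way to find it is shown below; the path in the comment is only the typical location on 64-bit Ubuntu, so check your own output.

$ readlink -f $(which java)
# e.g. /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
# JAVA_HOME is the directory above jre/bin, i.e. /usr/lib/jvm/java-7-openjdk-amd64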

 

Install ssh

$ sudo apt-get install ssh

 

Configure ssh

$ ssh-keygen -t rsa -P ""

You will be prompted:
"Enter file in which to save the key (/home/<your-user>/.ssh/id_rsa)"
Do not type a file name; just press Enter to accept the default.

$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$ ssh localhost
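
The first connection may ask you to confirm the host's authenticity; type yes. If the login succeeds without asking for a password, passwordless ssh is working; leave the session again with:

$ exit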

Download and Extract Hadoop

Download the Hadoop 2.7.1 binary tarball from http://hadoop.apache.org/releases.html
Extract it under /usr/local (the tar command below assumes the tarball is in /usr/local; otherwise give its full path to tar)

$ cd /usr/local
$ sudo tar -xzf  hadoop-2.7.1.tar.gz  
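
Optionally, give your own user ownership of the extracted directory so later Hadoop commands do not need sudo (this assumes a single-user setup):

$ sudo chown -R $USER:$USER /usr/local/hadoop-2.7.1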

 

Update .bashrc

It lives in your home directory (~). Because the file name starts with a dot, a plain ls does not show it; use
$ ls -al
$ vi .bashrc

# Set JAVA_HOME - check location in your system
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop-2.7.1 

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
 
# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}
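
After saving .bashrc, reload it so the new variables take effect in the current shell:

$ source ~/.bashrc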

Update hadoop-env.sh

The location in my system : /usr/local/hadoop-2.7.1/etc/hadoop/hadoop-env.sh 

$ vi /usr/local/hadoop-2.7.1/etc/hadoop/hadoop-env.sh
# Replace the existing "export JAVA_HOME=..." line with:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

Update core-site.xml

# The location in my system: /usr/local/hadoop-2.7.1/etc/hadoop/core-site.xml
# Add the following within <configuration> ... </configuration>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The URI's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The URI's authority is used to determine the host, port, etc. for a filesystem.
  </description>
</property>
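
The directory set in hadoop.tmp.dir must exist and be writable by the user running Hadoop, so create it before formatting HDFS:

$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown $USER:$USER /app/hadoop/tmp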

Update mapred-site.xml

# The location of mapred-site.xml.template in my system: /usr/local/hadoop-2.7.1/etc/hadoop
# Rename mapred-site.xml.template to mapred-site.xml (see the commands below), then add the following within <configuration> ... </configuration>
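One way to do the rename:

$ cd /usr/local/hadoop-2.7.1/etc/hadoop
$ mv mapred-site.xml.template mapred-site.xml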

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
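
Note that mapred.job.tracker is an MRv1 setting. Since start-yarn.sh is used below, you may also want MapReduce jobs to run on YARN; a commonly used addition (not part of the original post, so treat it as an assumption about your setup) is to put this in mapred-site.xml:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

and this within <configuration> ... </configuration> in yarn-site.xml in the same directory:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>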

Update hdfs-site.xml

# The location in my system: /usr/local/hadoop-2.7.1/etc/hadoop/hdfs-site.xml
# Add the following within <configuration> ... </configuration>
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>

 

Formatting the HDFS filesystem via the NameNode

$ /usr/local/hadoop-2.7.1/bin/hdfs namenode -format
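
If formatting succeeds, the output should include a line similar to the following (the exact wording can vary between versions):

INFO common.Storage: Storage directory /app/hadoop/tmp/dfs/name has been successfully formatted.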

 

Starting Hadoop

$ /usr/local/hadoop-2.7.1/sbin/start-dfs.sh
$ /usr/local/hadoop-2.7.1/sbin/start-yarn.sh

 

Check if it is working right

$ jps 
# ResourceManager, Jps, SecondaryNameNode, NodeManager, NameNode, DataNode
# Note that JobTracker and TaskTracker have been replaced with ResourceManager and NodeManager
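
With everything running, the output looks roughly like this (the process IDs will differ on your machine):

$ jps
4865 NameNode
5011 DataNode
5197 SecondaryNameNode
5361 ResourceManager
5671 NodeManager
5980 Jps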

To Stop

$ /usr/local/hadoop-2.7.1/sbin/stop-dfs.sh
$ /usr/local/hadoop-2.7.1/sbin/stop-yarn.sh

To Check the Status

Open http://localhost:50070/ in a browser.
It shows the NameNode web UI; the number of live nodes should be 1.
If live nodes is 0, jps will usually not show a DataNode process either; in that case, follow the steps in the next section.
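
A quick command-line sanity check (using the same install path as above) is to create a directory in HDFS and list the root:

$ /usr/local/hadoop-2.7.1/bin/hdfs dfs -mkdir -p /user/$USER
$ /usr/local/hadoop-2.7.1/bin/hdfs dfs -ls /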

If jps does not display DataNode

Clear the Hadoop temporary directory and reformat the NameNode. Note that this erases any data already stored in HDFS.
$ /usr/local/hadoop-2.7.1/sbin/stop-dfs.sh
$ /usr/local/hadoop-2.7.1/sbin/stop-yarn.sh
$ rm -Rf  /app/hadoop/tmp/*
$ /usr/local/hadoop-2.7.1/bin/hdfs namenode -format
$ /usr/local/hadoop-2.7.1/sbin/start-dfs.sh
$ /usr/local/hadoop-2.7.1/sbin/start-yarn.sh

  


