When I started installing Hadoop, it looked like a mountain to climb. Eventually I figured out that, with the right instructions, it takes only 5-10 minutes. This blog simply shows how to install Hadoop, without much explanation of why each step is needed.
Install Java
Check which version of Java you have using the command
$ java -version
At the time of publishing this blog, the latest compatible version was Java 7.
If you do not have the latest version installed, then install using the command
$ sudo apt-get install openjdk-7-jdk
Install ssh
$ sudo apt-get install ssh
Configure ssh
$ ssh-keygen -t rsa -P ""
You will be prompted:
"Enter file in which to save the key (/root/.ssh/id_rsa):"
Do not type a filename; just press Enter to accept the default.
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$ ssh localhost
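If ssh localhost still prompts for a password, the usual culprit is file permissions: sshd ignores key files that are group- or world-accessible. A quick fix, assuming stock OpenSSH defaults:

```shell
# tighten permissions so sshd will accept the key files
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"
if [ -f "$HOME/.ssh/authorized_keys" ]; then
  chmod 600 "$HOME/.ssh/authorized_keys"
fi
```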
Download and Extract Hadoop
Download hadoop 2.7.1 binary tarball from http://hadoop.apache.org/releases.html
Extract and save in /usr/local (assuming the tarball was downloaded to the current directory):
$ sudo mv hadoop-2.7.1.tar.gz /usr/local
$ cd /usr/local
$ sudo tar -xzf hadoop-2.7.1.tar.gz
Update .bashrc
It can be found in your home directory (~). Since it is a hidden file, plain ls will not show it; use
$ ls -al
$ vi .bashrc
# Set JAVA_HOME - check location in your system
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop-2.7.1
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}
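The JAVA_HOME path above is what it is on my system; yours may differ. A small helper of my own (not part of Hadoop) that derives it by following the symlink chain from wherever the java binary resolves:

```shell
# derive JAVA_HOME from the real location of the java binary;
# e.g. .../java-7-openjdk-amd64/bin/java -> .../java-7-openjdk-amd64
detect_java_home() {
  local java_bin
  java_bin="$(readlink -f "$(command -v java)")" || return 1
  dirname "$(dirname "$java_bin")"
}
```

Usage: export JAVA_HOME="$(detect_java_home)"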
Update hadoop-env.sh
The location in my system : /usr/local/hadoop-2.7.1/etc/hadoop/hadoop-env.sh
$ vi /usr/local/hadoop-2.7.1/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Update core-site.xml
# The location in my system: /usr/local/hadoop-2.7.1/etc/hadoop
# Copy the following within <configuration> ... </configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
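Put together, my complete core-site.xml looks like this (the surrounding <configuration> tags are already present in the stock file):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system.</description>
  </property>
</configuration>
```

Two notes: /app/hadoop/tmp does not exist yet, so create it and make it writable by the user running Hadoop, e.g. `$ sudo mkdir -p /app/hadoop/tmp` followed by `$ sudo chown $USER /app/hadoop/tmp`. Also, in Hadoop 2.x fs.default.name is deprecated in favour of fs.defaultFS; the old name still works but prints a warning.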
Update mapred-site.xml
# The location of mapred-site.xml.template in my system: /usr/local/hadoop-2.7.1/etc/hadoop
# Rename (copy) mapred-site.xml.template to mapred-site.xml:
$ cd /usr/local/hadoop-2.7.1/etc/hadoop
$ sudo cp mapred-site.xml.template mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>
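One hedged addition of my own: in Hadoop 2.x the JobTracker no longer exists and MapReduce normally runs on YARN, so the job-tracker property above is honored only by the legacy framework. The property that Hadoop 2.x setups usually add to mapred-site.xml instead is:

```xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>Run MapReduce jobs on YARN rather than the legacy local/classic framework.</description>
</property>
```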
Update hdfs-site.xml
# The location in my system: /usr/local/hadoop-2.7.1/etc/hadoop
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>
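As with the other files, the property sits inside the <configuration> tags, so the assembled hdfs-site.xml ends up as:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.</description>
  </property>
</configuration>
```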
Formatting the HDFS filesystem via the NameNode
$ /usr/local/hadoop-2.7.1/bin/hdfs namenode -format
Starting Hadoop
$ /usr/local/hadoop-2.7.1/sbin/start-dfs.sh
$ /usr/local/hadoop-2.7.1/sbin/start-yarn.sh
Check if it is working right
$ jps
# Expected: ResourceManager, Jps, SecondaryNameNode, NodeManager, NameNode, DataNode
# Note that JobTracker and TaskTracker have been replaced with ResourceManager and NodeManager
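Eyeballing six process names gets old quickly. Here is a tiny helper of my own (not a Hadoop tool) that checks a jps listing for the expected daemons:

```shell
# report which of the expected Hadoop daemons appear in a jps listing;
# grep -w avoids "SecondaryNameNode" matching as "NameNode"
check_daemons() {
  local d
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    if printf '%s\n' "$1" | grep -qw "$d"; then
      echo "$d: running"
    else
      echo "$d: MISSING"
    fi
  done
}

# usage: check_daemons "$(jps)"
```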
To Stop
$ /usr/local/hadoop-2.7.1/sbin/stop-dfs.sh
$ /usr/local/hadoop-2.7.1/sbin/stop-yarn.sh
To Check the Status
Open http://localhost:50070/ in a browser.
It should show the NameNode status, and the number of live nodes should be 1.
If the DataNode failed to start, live nodes will show 0 and jps will not display DataNode.
If jps does not display DataNode:
$ /usr/local/hadoop-2.7.1/sbin/stop-dfs.sh
$ /usr/local/hadoop-2.7.1/sbin/stop-yarn.sh
$ rm -Rf /app/hadoop/tmp/*
$ /usr/local/hadoop-2.7.1/bin/hdfs namenode -format
$ /usr/local/hadoop-2.7.1/sbin/start-dfs.sh
$ /usr/local/hadoop-2.7.1/sbin/start-yarn.sh