When I started installing Hadoop, it looked like a mountain to climb. Eventually I figured out that, with the right instructions, it takes only 5-10 minutes. This blog simply shows how to install Hadoop, without much explanation of why each step is needed.
Install Java
Check which version of Java you have using the command
$ java -version
At the time of publishing this blog, the latest compatible version was Java 7.
If you do not have the latest version installed, then install using the command
$ sudo apt-get install openjdk-7-jdk
Install ssh
$ sudo apt-get install ssh
Configure ssh
$ ssh-keygen -t rsa -P ""
You will be prompted:
"Enter file in which to save the key (/root/.ssh/id_rsa):"
Do not type a filename; just press Enter to accept the default.
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$ ssh localhost
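If ssh localhost still prompts for a password, the usual culprit is file permissions: sshd ignores key files that are group- or world-accessible. A quick fix, assuming stock OpenSSH defaults:

```shell
# tighten permissions so sshd will accept the key files
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"
if [ -f "$HOME/.ssh/authorized_keys" ]; then
  chmod 600 "$HOME/.ssh/authorized_keys"
fi
```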
Download and Extract Hadoop
Download hadoop 2.7.1 binary tarball from http://hadoop.apache.org/releases.html
Extract and save in /usr/local (assuming the tarball was downloaded to the current directory):
$ sudo mv hadoop-2.7.1.tar.gz /usr/local
$ cd /usr/local
$ sudo tar -xzf hadoop-2.7.1.tar.gz
Update .bashrc
It can be found in your home directory (~). Since it is a hidden file, plain ls will not show it; use
$ ls -al
$ vi .bashrc
# Set JAVA_HOME - check location in your system
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop-2.7.1
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}
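The JAVA_HOME path above is what it is on my system; yours may differ. A small helper of my own (not part of Hadoop) that derives it by following the symlink chain from wherever the java binary resolves:

```shell
# derive JAVA_HOME from the real location of the java binary;
# e.g. .../java-7-openjdk-amd64/bin/java -> .../java-7-openjdk-amd64
detect_java_home() {
  local java_bin
  java_bin="$(readlink -f "$(command -v java)")" || return 1
  dirname "$(dirname "$java_bin")"
}
```

Usage: export JAVA_HOME="$(detect_java_home)"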
Update hadoop-env.sh
The location in my system : /usr/local/hadoop-2.7.1/etc/hadoop/hadoop-env.sh
$ vi /usr/local/hadoop-2.7.1/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Update core-site.xml
# The location in my system: /usr/local/hadoop-2.7.1/etc/hadoop
# Copy the following within <configuration> ... </configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
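Put together, my complete core-site.xml looks like this (the surrounding <configuration> tags are already present in the stock file):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system.</description>
  </property>
</configuration>
```

Two notes: /app/hadoop/tmp does not exist yet, so create it and make it writable by the user running Hadoop, e.g. `$ sudo mkdir -p /app/hadoop/tmp` followed by `$ sudo chown $USER /app/hadoop/tmp`. Also, in Hadoop 2.x fs.default.name is deprecated in favour of fs.defaultFS; the old name still works but prints a warning.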
Update mapred-site.xml
# The location of mapred-site.xml.template in my system: /usr/local/hadoop-2.7.1/etc/hadoop
# Rename (copy) mapred-site.xml.template to mapred-site.xml:
$ cd /usr/local/hadoop-2.7.1/etc/hadoop
$ sudo cp mapred-site.xml.template mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>
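One hedged addition of my own: in Hadoop 2.x the JobTracker no longer exists and MapReduce normally runs on YARN, so the job-tracker property above is honored only by the legacy framework. The property that Hadoop 2.x setups usually add to mapred-site.xml instead is:

```xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>Run MapReduce jobs on YARN rather than the legacy local/classic framework.</description>
</property>
```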
Update hdfs-site.xml
# The location in my system: /usr/local/hadoop-2.7.1/etc/hadoop
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>
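As with the other files, the property sits inside the <configuration> tags, so the assembled hdfs-site.xml ends up as:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.</description>
  </property>
</configuration>
```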
Formatting the HDFS filesystem via the NameNode
$ /usr/local/hadoop-2.7.1/bin/hdfs namenode -format
Starting Hadoop
$ /usr/local/hadoop-2.7.1/sbin/start-dfs.sh
$ /usr/local/hadoop-2.7.1/sbin/start-yarn.sh
Check if it is working right
$ jps
# Expected: ResourceManager, Jps, SecondaryNameNode, NodeManager, NameNode, DataNode
# Note that JobTracker and TaskTracker have been replaced with ResourceManager and NodeManager
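Eyeballing six process names gets old quickly. Here is a tiny helper of my own (not a Hadoop tool) that checks a jps listing for the expected daemons:

```shell
# report which of the expected Hadoop daemons appear in a jps listing;
# grep -w avoids "SecondaryNameNode" matching as "NameNode"
check_daemons() {
  local d
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    if printf '%s\n' "$1" | grep -qw "$d"; then
      echo "$d: running"
    else
      echo "$d: MISSING"
    fi
  done
}

# usage: check_daemons "$(jps)"
```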
To Stop
$ /usr/local/hadoop-2.7.1/sbin/stop-dfs.sh
$ /usr/local/hadoop-2.7.1/sbin/stop-yarn.sh
To Check the Status
Open http://localhost:50070/ in a browser.
It should show the NameNode status, and the number of live nodes should be 1.
If the DataNode failed to start, live nodes will show 0 and jps will not display DataNode.
If jps does not display DataNode:
$ /usr/local/hadoop-2.7.1/sbin/stop-dfs.sh
$ /usr/local/hadoop-2.7.1/sbin/stop-yarn.sh
$ rm -Rf /app/hadoop/tmp/*
$ /usr/local/hadoop-2.7.1/bin/hdfs namenode -format
$ /usr/local/hadoop-2.7.1/sbin/start-dfs.sh
$ /usr/local/hadoop-2.7.1/sbin/start-yarn.sh