Hi, here I am going to show you how to set up a pseudo-distributed (single node) Hadoop cluster on an Ubuntu VM.
Prerequisites -
- understanding of Hadoop
- VMware Player
- Ubuntu
Things you need
- VMware Player: a simple Google search will take you to the VMware website, which gives you info and a download link for the latest version of the player. Download and install it.
- Ubuntu VM image: again, Google will help you here (the version I used is ubuntu-14.04).
- and of course a laptop :-).
Setting up the VM -
After installing VMware Player and extracting the Ubuntu image to a directory of your choice, double click the VMware icon on your desktop, click on 'Open a Virtual Machine' and go to the directory where you extracted Ubuntu; you will find a ubuntu.vmx file, double click on it and then play the VM. (You can edit the settings of the VM later, if you want to.)
The next thing you need is to install the Java Development Kit (JDK), as Hadoop requires a Java installation. For a hadoop-1.x cluster JDK 6 works well; we are looking at the installation of hadoop-1.2.0 here, as Hadoop-2.x needs some extra configuration (YARN settings etc.).
Updating Ubuntu
Go to the terminal and give this command - ' $>sudo apt-get update '.
Type the password when prompted. (Note that for this VM image the user name is 'user' and the password is 'password'.)
Installing JDK
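If java is not installed yet, the OpenJDK 6 package (the same one the JAVA_HOME path in hadoop-env.sh points to later) should do; on Ubuntu 14.04 something like this works:
Command is - ' $>sudo apt-get install openjdk-6-jdk '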
Check the java installation by giving the command - ' $>java -version '.
SSH Settings
Hadoop needs password-less SSH access, as the jobtracker and namenode need to communicate frequently with the tasktrackers and datanodes, so let's install an openssh-server now. Command is - ' $>sudo apt-get install openssh-server '
It might prompt you about the extra disk space and ask whether you want to continue or not; just type 'y'.
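If you want to be sure the SSH daemon actually came up after the install, you can check it with something like:
Command is - ' $>sudo service ssh status '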
Download Hadoop
Now download and extract Hadoop - either go to http://archive.apache.org/dist/hadoop/core/hadoop-1.2.0 or type this command in the terminal
' $>wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.0/hadoop-1.2.0.tar.gz'
Wait a few minutes till it gets downloaded, then extract it.
command is - ' $>tar -xvf hadoop-1.2.0.tar.gz '
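As a quick sanity check, listing the extracted directory should show the familiar bin and conf folders along with the Hadoop jars:
Command is - ' $>ls hadoop-1.2.0 '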
Configuring Hadoop
There are several files that control the configuration of a Hadoop installation, but at this point we require only these 4:
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
- hadoop-env.sh
core-site.xml
It contains the default values for the core Hadoop properties; fs.default.name tells the daemons and clients where the HDFS namenode listens.
Command - ' $>sudo gedit hadoop-1.2.0/conf/core-site.xml '
Just type this in the gedit window for core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>
hdfs-site.xml
It holds the configuration settings for the HDFS daemons - namenode, secondary namenode and datanodes. Since it is a single node cluster, the replication factor is set to '1'.
Command - ' $>sudo gedit hadoop-1.2.0/conf/hdfs-site.xml '
Add the properties below inside the <configuration> block:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
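For reference, here is how the complete hdfs-site.xml would look with the same <configuration> wrapper used in core-site.xml (this is just the two properties above placed inside that block):
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>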
mapred-site.xml
It holds the configuration settings for the MapReduce daemons - the jobtracker and the tasktrackers.
Command- ' $>sudo gedit hadoop-1.2.0/conf/mapred-site.xml '
Add the properties below under the <configuration> block:
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
Getting IP address of your machine
Command- ' $>ifconfig '
Then note down the IP address written after 'inet addr:' on your terminal.
Now edit /etc/hosts; the command for it is as follows
Command - ' $>sudo gedit /etc/hosts '
In the hosts file, type the IP address you noted down a moment ago, followed by a space and then localhost; save and close the window.
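Just as an illustration (the address below is made up; use whatever ifconfig reported on your VM), the line you add to /etc/hosts would look like:
192.168.1.10 localhost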
Creating ssh key
Command: ssh-keygen -t rsa -P ""
Now append the public key to the authorized keys with the following command
Command: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
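You can quickly verify the password-less setup before moving on; this should log you in without asking for a password (the very first time it will only ask you to confirm the host fingerprint):
Command: ssh localhost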
Reboot the VM
Configuring hadoop-env.sh
Command: sudo gedit hadoop-1.2.0/conf/hadoop-env.sh
Now set the path for JAVA_HOME in the hadoop-env.sh file: un-comment the JAVA_HOME export line in it and set it to the path below.
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
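The exact directory name depends on which JDK package and architecture you installed (the path above is the 32-bit OpenJDK 6 one), so it is worth confirming it exists before saving:
Command: ls /usr/lib/jvm/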
Now change the directory to your Hadoop installation, e.g. ' $>cd hadoop-1.2.0 '
Format the namenode with the command
bin/hadoop namenode -format
Now type the command
bin/start-dfs.sh
This will start the HDFS daemons. After they are up, start the MapReduce daemons (jobtracker and tasktracker); the command is as follows
bin/start-mapred.sh
That's it :-) If you want to check whether your Hadoop cluster has started correctly, just run the command jps.
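On a healthy pseudo-distributed Hadoop 1.x setup, jps should list processes along these lines (the process IDs will of course differ):
NameNode
DataNode
SecondaryNameNode
JobTracker
TaskTracker
Jps
You can also open the namenode web UI at http://localhost:50070 and the jobtracker UI at http://localhost:50030 inside the VM's browser to confirm everything is up.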
That's it for the single node Hadoop 1 cluster setup on VMware Player; next we will try to set up a multi node hadoop-2.6 cluster, watch out for the next post!