Getting Started with HADOOP - Single Server, Multiple Node Simulation


Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.

1. Install the Java SDK if it is not already installed

Check what is currently available:
java -version

In my case java version "1.6.0_22" was available, so I located the compiler:
whereis javac

If you don't have a recent Java SDK installed, download and install the JDK:
wget http://download.oracle.com/otn-pub/java/jdk/6u30-b12/jdk-6u30-linux-i586-rpm.bin
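The .rpm.bin download is a self-extracting installer; on an RPM-based distribution it typically just needs to be made executable and run as root:
chmod +x jdk-6u30-linux-i586-rpm.bin
./jdk-6u30-linux-i586-rpm.bin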

Make sure to note the location where you install the JDK; it is needed later when setting JAVA_HOME. In my case JAVA_HOME was /usr.
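If you're not sure which directory to use, one quick check (assuming javac is on your PATH) is to resolve the symlink to the real binary; JAVA_HOME is then the directory two levels above it:
readlink -f $(which javac)
# e.g. /usr/bin/javac means JAVA_HOME=/usr; a path like /usr/java/jdk1.6.0_30/bin/javac (example only) would mean JAVA_HOME=/usr/java/jdk1.6.0_30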

2. Do some basic tasks to make life easier

mkdir /hadoop
groupadd hadoop
useradd -g hadoop hadoop
passwd hadoop

(Create the group before the user; on many distributions useradd hadoop would otherwise auto-create a group of the same name, making a later groupadd fail.)

3. Get the stable release of Hadoop and install it into /hadoop/hadoop
cd /hadoop
wget http://ftp.heanet.ie/mirrors/www.apache.org/dist/hadoop/common/stable/hadoop-1.0.3.tar.gz

tar -xvf hadoop-1.0.3.tar.gz

mv hadoop-1.0.3 hadoop
export HADOOP_INSTALL=/hadoop/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
Set these up in your login profile to ensure they are set each time you login.
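For example, assuming a bash login shell, append them to ~/.bash_profile:
echo 'export HADOOP_INSTALL=/hadoop/hadoop' >> ~/.bash_profile
echo 'export PATH=$PATH:$HADOOP_INSTALL/bin' >> ~/.bash_profile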

4. Set up Hadoop environment variables

Edit hadoop-env.sh and set JAVA_HOME to the location noted in step 1:

cd /hadoop/hadoop/conf
vi hadoop-env.sh
export JAVA_HOME=/usr

Verify it's working:
hadoop version
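This should print the release you unpacked, beginning with a line like the following (followed by build details such as the Subversion revision and compile date):
Hadoop 1.0.3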

5. Set up the local configuration files

First create the local directory Hadoop will use for its working data (referenced as hadoop.tmp.dir below):

mkdir -p /var/hadoop/cache/

core-site.xml sets the default filesystem URI and the local working directory:

vi /hadoop/hadoop/conf/core-site.xml
<?xml version="1.0"?>
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost/</value>
  </property>
 <property>
   <name>hadoop.tmp.dir</name>
   <value>/var/hadoop/cache</value>
 </property>
</configuration>


Set dfs.replication to 1, since a pseudo-distributed cluster has only one datanode:

vi /hadoop/hadoop/conf/hdfs-site.xml

<?xml version="1.0"?>
<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

mapred-site.xml points MapReduce clients at the local jobtracker:

vi /hadoop/hadoop/conf/mapred-site.xml
<?xml version="1.0"?>
<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>

6. Set up passwordless SSH access to the local machine (the start scripts use ssh to launch the daemons)
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost # test
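If ssh localhost still prompts for a password, the usual culprit is file permissions on the key material:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys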

7. Format a new distributed file system
hadoop namenode -format
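Among the output, look for a confirmation that the storage directory (derived from hadoop.tmp.dir) was set up, something like:
# ... Storage directory /var/hadoop/cache/dfs/name has been successfully formatted.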

8. Start the hadoop daemons
start-all.sh
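Each daemon runs in its own JVM, so jps (bundled with the JDK) is a quick way to confirm they all came up; the PIDs below are illustrative:
jps
# 2001 NameNode
# 2102 DataNode
# 2203 SecondaryNameNode
# 2304 JobTracker
# 2405 TaskTracker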

9. Test it out: create a folder /testdir, copy the .xml files from the Hadoop conf folder into it, then load it into HDFS as a folder called input
mkdir /testdir
cd /hadoop/hadoop
cp conf/*.xml /testdir
cd /
hadoop fs -put testdir input
# Check if new distributed folder input exists
hadoop fs -ls
# Check contents of distributed folder
hadoop fs -ls input
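The listing should show the copied .xml files, along these lines (file count, sizes, dates and user will differ):
# Found 7 items
# -rw-r--r--   1 root supergroup   178 2012-07-01 12:00 /user/root/input/core-site.xml
# ...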

10. To view the namenode web UI for the pseudo-cluster

http://localhost:50070/dfshealth.jsp

To view the jobtracker web UI for the pseudo-cluster
http://localhost:50030/

11. To view the contents of the files you've added to the DFS (the testdir folder was copied into HDFS as input, so the files sit directly under input)
hadoop fs -cat input/*

12. To remove the files from the DFS
hadoop fs -rm input/*
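When you're finished, the companion script to start-all.sh shuts all the daemons down again:
stop-all.sh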
