Getting Started with HADOOP - Single Server, Multiple Node Simulation

Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.

1. Install the Java SDK if it is not already installed
java -version
In my case java version "1.6.0_22" was available.
whereis javac

If you don't have a recent Java SDK installed, download and install a JDK.

Make sure to note the location where you install it, for use later when setting JAVA_HOME. In my case JAVA_HOME was set to /usr.
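If you're not sure where your JDK lives, JAVA_HOME can be derived from the resolved javac path. A minimal sketch — the path below is only an illustrative example; substitute the output of `readlink -f $(which javac)` on your own machine:

```shell
# Resolve javac and strip the trailing /bin/javac to get JAVA_HOME.
# The example path is hypothetical; distros differ.
JAVAC_PATH=/usr/lib/jvm/java-6-openjdk/bin/javac
JAVA_HOME=$(dirname "$(dirname "$JAVAC_PATH")")
echo "$JAVA_HOME"   # → /usr/lib/jvm/java-6-openjdk
```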

2. Do some basic tasks to make life easier

groupadd hadoop
useradd -g hadoop hadoop
passwd hadoop
mkdir /hadoop
chown hadoop:hadoop /hadoop

3. Get a stable release of Hadoop and install it into /hadoop/hadoop
cd /hadoop
# download from an Apache mirror or the archive, e.g.:
wget http://archive.apache.org/dist/hadoop/core/hadoop-1.0.3/hadoop-1.0.3.tar.gz

tar -xvf hadoop-1.0.3.tar.gz

mv hadoop-1.0.3 hadoop
export HADOOP_INSTALL=/hadoop/hadoop
Set this up in your login profile to ensure it is set each time you log in.
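For example, the following lines can go at the end of ~/.bash_profile (assuming a bash login shell; the paths match the layout used above) so the variable — and the hadoop command itself — are available at every login:

```shell
# ~/.bash_profile additions -- paths match the /hadoop/hadoop install above
export HADOOP_INSTALL=/hadoop/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
```

Adding $HADOOP_INSTALL/bin to PATH is what lets you type `hadoop version` later without the full path.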

4. Set up the Hadoop environment variables

cd /hadoop/hadoop/conf
Edit hadoop-env.sh and set JAVA_HOME so the daemons can find Java:
export JAVA_HOME=/usr

Verify it's working:
hadoop version

5. Setup local configuration files

mkdir -p /var/hadoop/cache/

The property bodies below are the standard minimal single-node settings from the Hadoop 1.x docs; hadoop.tmp.dir is assumed to point at the cache directory created above.

vi /hadoop/hadoop/conf/core-site.xml
<?xml version="1.0"?>
<!-- core-site.xml -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/cache</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

vi /hadoop/hadoop/conf/hdfs-site.xml

<?xml version="1.0"?>
<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>


vi /hadoop/hadoop/conf/mapred-site.xml
<?xml version="1.0"?>
<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

6. Set up passwordless SSH access to the local machine
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost # test

7. Format a new distributed file system
hadoop namenode -format

8. Start the hadoop daemons
start-all.sh
# jps should now list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker
jps

9. Test it out: create a folder /testdir, copy the .xml files from the Hadoop conf folder into /testdir/conf, then upload the folder to HDFS
mkdir -p /testdir/conf
cp /hadoop/hadoop/conf/*.xml /testdir/conf
cd /
hadoop fs -put testdir input
# Check that the new distributed folder input exists
hadoop fs -ls
# Check the contents of the distributed folder
hadoop fs -ls input

10. To view the NameNode web UI for the pseudo cluster, open:
http://localhost:50070/

To view the JobTracker for the pseudo cluster, open:
http://localhost:50030/

11. To view contents of files you've added to the DFS
hadoop fs -cat input/conf/*

12. To remove files from DFS
hadoop fs -rm input/conf/*

