Getting Started with HADOOP - Single Server, Multiple Node Simulation


Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.

1. Install the Java SDK if it is not already installed
java -version
In my case java version "1.6.0_22" was already available, so I just confirmed where the compiler lives:
whereis javac

If you don't have a recent Java SDK installed, download and install the JDK:
wget http://download.oracle.com/otn-pub/java/jdk/6u30-b12/jdk-6u30-linux-i586-rpm.bin

Make sure to note the location where you install it; you will need it later to set JAVA_HOME. In my case JAVA_HOME was /usr.
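If you are unsure what JAVA_HOME should be, one way to find out (assuming javac is on your PATH and GNU readlink is available) is to resolve the javac symlink and drop the trailing /bin/javac from the result:

readlink -f $(which javac)
# e.g. an output of /usr/lib/jvm/jdk1.6.0_30/bin/javac (hypothetical path) would mean JAVA_HOME=/usr/lib/jvm/jdk1.6.0_30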

2. Do some basic setup to make life easier: create an install directory and a dedicated hadoop user and group

mkdir /hadoop
useradd hadoop
passwd hadoop
groupadd hadoop
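If you ran the commands above as root, you will probably also want the new hadoop user to own the install directory (an assumption about how the remaining steps are run):

chown -R hadoop:hadoop /hadoop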

3. Get a stable release of Hadoop and install it into /hadoop/hadoop
cd /hadoop
wget http://ftp.heanet.ie/mirrors/www.apache.org/dist/hadoop/common/stable/hadoop-1.0.3.tar.gz

tar -xvf hadoop-1.0.3.tar.gz

mv hadoop-1.0.3 hadoop
export HADOOP_INSTALL=/hadoop/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
Add these exports to your login profile so they are set each time you log in.
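A minimal sketch of doing that, assuming the hadoop user's shell reads ~/.bashrc (adjust to ~/.bash_profile or your distribution's convention):

echo 'export HADOOP_INSTALL=/hadoop/hadoop' >> ~/.bashrc
echo 'export PATH=$PATH:$HADOOP_INSTALL/bin' >> ~/.bashrc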

4. Set up the Hadoop environment variables

cd /hadoop/hadoop/conf
vi hadoop-env.sh
export JAVA_HOME=/usr

Verify it's working (the output should report Hadoop 1.0.3):
hadoop version

5. Set up the local configuration files

First create the local directory that will back hadoop.tmp.dir (ownership of it is covered just after the config files below):
mkdir -p /var/hadoop/cache/

vi /hadoop/hadoop/conf/core-site.xml
<?xml version="1.0"?>
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/cache</value>
  </property>
</configuration>


vi /hadoop/hadoop/conf/hdfs-site.xml

<?xml version="1.0"?>
<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

vi /hadoop/hadoop/conf/mapred-site.xml
<?xml version="1.0"?>
<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
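Since hadoop.tmp.dir points at /var/hadoop/cache, the user that will run the daemons needs to be able to write there. Assuming the directory was created as root and Hadoop will run as the hadoop user:

chown -R hadoop:hadoop /var/hadoop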

6. Set up passwordless SSH access to the local machine
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost # test
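If the test still prompts for a password, overly permissive modes on ~/.ssh are a common culprit; tightening them is worth trying (an assumption about your distribution's defaults):

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys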

7. Format a new distributed file system
hadoop namenode -format

8. Start the hadoop daemons
start-all.sh
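A quick way to confirm all five daemons came up (assuming the JDK's jps tool is on your PATH):

jps
# should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker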

9. Test it out: create a folder /testdir, copy the files with the .xml extension from the Hadoop conf folder into /testdir/conf, and load them into HDFS
cd /hadoop/hadoop
mkdir -p /testdir/conf
cp conf/*.xml /testdir/conf
cd /
hadoop fs -put testdir input
# Check if new distributed folder input exists
hadoop fs -ls
# Check contents of distributed folder
hadoop fs -ls input
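If you also want to exercise MapReduce rather than just HDFS, the examples jar bundled with the release can be run against the files you just uploaded. A sketch, assuming the jar kept its default name inside the install directory:

hadoop jar $HADOOP_INSTALL/hadoop-examples-1.0.3.jar grep input/conf output 'dfs[a-z.]+'
# view the matches found by the job
hadoop fs -cat output/*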

10. To view the NameNode web UI for the pseudo-cluster

http://localhost:50070/dfshealth.jsp

To view the JobTracker web UI for the pseudo-cluster
http://localhost:50030/

11. To view the contents of files you've added to the DFS
hadoop fs -cat input/conf/*

12. To remove files from DFS
hadoop fs -rm input/conf/*
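When you're finished, the daemons started in step 8 can be shut down with the companion script:

stop-all.sh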
