Getting Started with HADOOP - Single Server, Multiple Node Simulation
Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.
1. Install the Java JDK if it is not already installed
java -version
In my case java version "1.6.0_22" was already available.
To find where the JDK is installed:
whereis javac
If you don't have a recent JDK installed, download and install one:
wget http://download.oracle.com/otn-pub/java/jdk/6u30-b12/jdk-6u30-linux-i586-rpm.bin
Make a note of the install location; you will need it later when setting JAVA_HOME. In my case JAVA_HOME was /usr.
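If you are not sure what to use for JAVA_HOME, resolving the javac path usually points at the right directory (a quick check, assuming javac is on your PATH):
readlink -f $(which javac)
# drop the trailing /bin/javac from the result to get JAVA_HOME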
2. Create a working directory and a dedicated hadoop user and group (see the ownership note after these commands)
mkdir /hadoop
groupadd hadoop
useradd -g hadoop hadoop
passwd hadoop
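If you plan to run the Hadoop daemons as the hadoop user rather than root, also give that user ownership of the working directory (an extra step, not strictly required if you run everything as root):
chown -R hadoop:hadoop /hadoop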
3. Get the stable release of Hadoop and install it into /hadoop/hadoop
cd /hadoop
wget http://ftp.heanet.ie/mirrors/www.apache.org/dist/hadoop/common/stable/hadoop-1.0.3.tar.gz
tar -xvf hadoop-1.0.3.tar.gz
mv hadoop-1.0.3 hadoop
export HADOOP_INSTALL=/hadoop/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
Add these exports to your login profile to ensure they are set each time you log in.
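For example, assuming a bash login shell:
echo 'export HADOOP_INSTALL=/hadoop/hadoop' >> ~/.bashrc
echo 'export PATH=$PATH:$HADOOP_INSTALL/bin' >> ~/.bashrc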
4. Set up the Hadoop environment variables
cd /hadoop/hadoop/conf
vi hadoop-env.sh
export JAVA_HOME=/usr
Verify it's working:
hadoop version
5. Set up the local configuration files
mkdir -p /var/hadoop/cache/
vi /hadoop/hadoop/conf/core-site.xml
<?xml version="1.0"?>
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/cache</value>
  </property>
</configuration>
vi /hadoop/hadoop/conf/hdfs-site.xml
<?xml version="1.0"?>
<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
vi /hadoop/hadoop/conf/mapred-site.xml
<?xml version="1.0"?>
<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
6. Set up passwordless SSH access to the local machine
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost # test
7. Format a new distributed file system
hadoop namenode -format
8. Start the hadoop daemons
start-all.sh
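You can confirm the daemons came up with jps, which ships with the JDK; expect to see NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker listed:
jps
# If any are missing, check the logs under $HADOOP_INSTALL/logs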
9. Test it out: create a folder /testdir, copy the .xml files from the Hadoop conf folder into /testdir/conf, then load the folder into HDFS as input
mkdir -p /testdir/conf
cd /hadoop/hadoop
cp conf/*.xml /testdir/conf
cd /
hadoop fs -put testdir input
# Check if new distributed folder input exists
hadoop fs -ls
# Check contents of distributed folder
hadoop fs -ls input
10. To view the NameNode web UI for the pseudo-cluster
http://localhost:50070/dfshealth.jsp
To view the JobTracker web UI for the pseudo-cluster
http://localhost:50030/
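If you prefer the command line, dfsadmin should print a similar summary of DFS capacity and the live datanode:
hadoop dfsadmin -report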
11. To view contents of files you've added to the DFS
hadoop fs -cat input/conf/*
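To check that MapReduce works end to end, you can also run one of the example jobs shipped with the release against the same files (a sketch; I'm assuming the examples jar in this release is named hadoop-examples-1.0.3.jar and sits at the top of the install directory):
hadoop jar $HADOOP_INSTALL/hadoop-examples-1.0.3.jar grep input/conf output 'dfs[a-z.]+'
# the output folder must not already exist
hadoop fs -cat output/*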
12. To remove files from DFS
hadoop fs -rm input/conf/*
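When you are finished experimenting, the companion script to start-all.sh shuts the daemons down again:
stop-all.sh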