Setting Up a Hadoop Cluster

Machines involved

Host     OS          Hostname         IP
master   CentOS 7.2  master.hadoop    192.168.142.129
slave1   CentOS 7.2  1.slave.hadoop   192.168.142.131
slave2   CentOS 7.2  2.slave.hadoop   192.168.142.132
slave3   CentOS 7.2  3.slave.hadoop   192.168.142.133
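
The Hostname column assumes each machine has actually been given that name; if not, it can be set per node, for example with hostnamectl (run the matching command on each machine):

# On the master; use the corresponding name on each slave,
# e.g. hostnamectl set-hostname 1.slave.hadoop on slave1
hostnamectl set-hostname master.hadoop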

Operations on all nodes

Configure hosts

vim /etc/hosts

# Add the following
192.168.142.129 master.hadoop
192.168.142.131 1.slave.hadoop
192.168.142.132 2.slave.hadoop
192.168.142.133 3.slave.hadoop
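
Once the entries are in place, name resolution can be checked from any node (a minimal check using the hostnames above):

ping -c 1 master.hadoop
ping -c 1 1.slave.hadoop
ping -c 1 2.slave.hadoop
ping -c 1 3.slave.hadoop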

Create the user, group, and related directories

  • Create the group and user
    
    groupadd hadoop
    useradd -d /home/hadoop -m hadoop -g hadoop

Set a password for the hadoop user

passwd hadoop

Generate an SSH key pair

sudo -Hu hadoop ssh-keygen -t rsa
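
ssh-keygen prompts for a file location and a passphrase; accepting the defaults is fine. As an optional variant (not part of the original steps), the key can also be generated non-interactively:

# Empty passphrase, default key path for the hadoop user
sudo -Hu hadoop ssh-keygen -t rsa -N "" -f /home/hadoop/.ssh/id_rsa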


  • Create the related directories

    mkdir -p /hadoop/tmp
    mkdir -p /hadoop/hdfs/data
    mkdir -p /hadoop/hdfs/name
    chown -R hadoop:hadoop /hadoop/

Install the NTP service and other related packages

yum install ntp -y
systemctl enable ntpd
systemctl start ntpd

yum install openssl-devel -y
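
Optionally, verify that ntpd is actually synchronizing the clock (assumes the default CentOS 7 tooling):

ntpq -p        # lists the NTP peers and their sync state
timedatectl    # shows "NTP synchronized: yes" once the clock is in sync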

Install the JDK

  • Remove the OpenJDK packages that ship with CentOS
# List the installed JDK packages
rpm -qa | grep jdk

# Sample output
java-1.8.0-openjdk-1.8.0.121-0.b13.el7_3.x86_64
java-1.8.0-openjdk-headless-1.8.0.121-0.b13.el7_3.x86_64
java-1.7.0-openjdk-1.7.0.131-2.6.9.0.el7_3.x86_64
java-1.7.0-openjdk-headless-1.7.0.131-2.6.9.0.el7_3.x86_64
copy-jdk-configs-1.2-1.el7.noarch

# Remove them
yum remove java-1.7.0-openjdk-headless.x86_64 -y
yum remove java-1.7.0-openjdk -y
yum remove java-1.8.0-openjdk-headless-1.8.0.121-0.b13.el7_3.x86_64 -y
yum remove java-1.8.0-openjdk -y
cd /usr/src/
wget http://download.oracle.com/otn-pub/java/jdk/8u121-b13/e9e7ea248e2c4826b92b3f075a80e441/jdk-8u121-linux-x64.rpm?AuthParam=1489989096_11dbb8b04d10d8c53d34c4aea30bdd71
# The link above has an expiry time and only works temporarily

# After downloading, rename the file to jdk1.8.0_121.rpm
mv jdk-8u121-linux-x64.rpm?AuthParam=1489989096_11dbb8b04d10d8c53d34c4aea30bdd71 jdk1.8.0_121.rpm

# Renaming directly with wget -O was also tried, but for some reason the download was much slower that way

rpm -ivh jdk1.8.0_121.rpm
# Installed to /usr/java/jdk1.8.0_121
# The install path differs for other JDK versions
  • Configure the Java environment variables
vim /etc/profile.d/java.sh

Add the following

#!/bin/bash
JAVA_HOME=/usr/java/jdk1.8.0_121
JRE_HOME=$JAVA_HOME/jre
PATH=$PATH:$JAVA_HOME/bin
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar
export PATH JAVA_HOME JRE_HOME CLASSPATH

Apply the changes immediately and verify

source /etc/profile.d/java.sh
java -version
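
If the profile script is correct, java now resolves to the freshly installed JDK; a quick sanity check:

echo $JAVA_HOME    # should print /usr/java/jdk1.8.0_121
which java         # shows which java binary the shell picks up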

Install Hadoop

  • Download Hadoop
cd /usr/src/
wget http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz

tar -zxvf hadoop-2.7.3.tar.gz

# Copy to the install directory
cp /usr/src/hadoop-2.7.3/ /usr/local/hadoop/ -R

chown -R hadoop:hadoop /usr/local/hadoop/
  • Configure the Hadoop environment variables
vim /etc/profile.d/hadoop.sh

Add the following

#!/bin/bash
HADOOP_HOME=/usr/local/hadoop
PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export PATH HADOOP_HOME

Apply the changes immediately

source /etc/profile.d/hadoop.sh
hadoop version

Sample output

Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar

Configure Hadoop

HADOOP_HOME=/usr/local/hadoop
Because JAVA_HOME is already set as a system-wide environment variable, there is no need to edit hadoop-env.sh or yarn-env.sh to set JAVA_HOME by hand.
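
Should the daemons nevertheless complain that JAVA_HOME is not set, it can still be hard-coded in hadoop-env.sh (optional; path as installed above):

# /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_121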

  • core-site.xml
vim /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master.hadoop:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
</configuration>
  • hdfs-site.xml
vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/hadoop/hdfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master.hadoop:9001</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
  • /usr/local/hadoop/etc/hadoop/mapred-site.xml
    cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
    vim /usr/local/hadoop/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
    </property>
    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>master.hadoop:50030</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master.hadoop:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master.hadoop:19888</value>
    </property>
    <property>
         <name>mapred.job.tracker</name>
         <value>http://master.hadoop:9001</value>
    </property>
</configuration>
  • yarn-site.xml
    vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master.hadoop</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master.hadoop:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master.hadoop:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master.hadoop:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master.hadoop:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master.hadoop:8088</value>
    </property>
</configuration>
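
Since every node needs identical configuration, an alternative to editing the files on each machine is to edit them once on the master and copy them out, for example with scp (a sketch using the hostnames above; it will prompt for the hadoop password until passwordless SSH is set up later):

scp /usr/local/hadoop/etc/hadoop/*.xml hadoop@1.slave.hadoop:/usr/local/hadoop/etc/hadoop/
scp /usr/local/hadoop/etc/hadoop/*.xml hadoop@2.slave.hadoop:/usr/local/hadoop/etc/hadoop/
scp /usr/local/hadoop/etc/hadoop/*.xml hadoop@3.slave.hadoop:/usr/local/hadoop/etc/hadoop/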

Per-node operations

Before doing the steps below, make sure all of the steps above have been completed on every node (including the master).

  • Operations on the master node
# Configure slaves
vim /usr/local/hadoop/etc/hadoop/slaves

# Delete the original localhost entry and add the following
1.slave.hadoop
2.slave.hadoop
3.slave.hadoop
# Passwordless SSH login
# Run these line by line; each step prompts for the hadoop user's password
su hadoop
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@master.hadoop
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@1.slave.hadoop
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@2.slave.hadoop
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@3.slave.hadoop

# Test connectivity
su hadoop
ssh master.hadoop
exit
ssh 1.slave.hadoop
exit
ssh 2.slave.hadoop
exit
ssh 3.slave.hadoop
exit
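
Before starting HDFS for the first time, the NameNode has to be formatted once; run this as the hadoop user on the master (it initializes the directory configured in dfs.namenode.name.dir, so do it only on a fresh cluster):

su hadoop
hdfs namenode -format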

Basic usage

Run on the master node

cd /usr/local/hadoop/sbin/

# Switch to the hadoop user
su hadoop

# Start
./start-all.sh

# Stop
./stop-all.sh
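
After start-all.sh, the running daemons can be checked with jps, and the web UIs are reachable on the ports configured above (50070 is the HDFS NameNode default in Hadoop 2.7):

# On the master: expect NameNode, SecondaryNameNode and ResourceManager
jps

# On a slave: expect DataNode and NodeManager
ssh 1.slave.hadoop jps

# Web UIs (from a browser that can resolve the hostnames)
# HDFS NameNode:        http://master.hadoop:50070
# YARN ResourceManager: http://master.hadoop:8088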