我们这里通过oozie调度一个MR的程序的执行,MR的程序可以是自己写的,也可以是hadoop工程自带的,我们这里就选用hadoop工程自带的MR程序来运行wordcount的示例
准备以下数据上传到HDFS的/oozie/input路径
hdfs dfs -mkdir -p /oozie/input vim wordcount.txt hello world hadoop spark hive hadoop将数据上传到hdfs对应目录
hdfs dfs -put wordcount.txt /oozie/input将需要调度的资源都准备好放到一个文件夹下面去,包括jar包,job.properties,以及workflow.xml。 拷贝MR的任务模板
cd /export/servers/oozie-4.1.0-cdh5.14.0 cp -ra examples/apps/map-reduce/ oozie_works/删掉MR任务模板lib目录下自带的jar包
cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib rm -rf oozie-examples-4.1.0-cdh5.14.0.jar从上一步的删除当中,可以看到需要调度的jar包存放在了 /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib这个目录下,所以我们把我们需要调度的jar包也放到这个路径下即可
cp /export/servers/hadoop-2.6.0-cdh5.14.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.0.jar /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib/修改job.properties
cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce vim job.properties nameNode=hdfs://node01:8020 jobTracker=node01:8032 queueName=default examplesRoot=oozie_works oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/map-reduce/workflow.xml outputDir=/oozie/output inputDir=/oozie/input修改workflow.xml
cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce vim workflow.xml <?xml version="1.0" encoding="UTF-8"?> <!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --> <workflow-app xmlns="uri:oozie:workflow:0.5" name="map-reduce-wf"> <start to="mr-node"/> <action name="mr-node"> <map-reduce> <job-tracker>${jobTracker}</job-tracker> <name-node>${nameNode}</name-node> <prepare> <delete path="${nameNode}/${outputDir}"/> </prepare> <configuration> <property> <name>mapred.job.queue.name</name> <value>${queueName}</value> </property> <!-- <property> <name>mapred.mapper.class</name> <value>org.apache.oozie.example.SampleMapper</value> </property> <property> <name>mapred.reducer.class</name> <value>org.apache.oozie.example.SampleReducer</value> </property> <property> <name>mapred.map.tasks</name> <value>1</value> </property> <property> <name>mapred.input.dir</name> <value>/user/${wf:user()}/${examplesRoot}/input-data/text</value> </property> <property> <name>mapred.output.dir</name> <value>/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}</value> </property> --> <!-- 开启使用新的API来进行配置 --> <property> <name>mapred.mapper.new-api</name> <value>true</value> </property> <property> <name>mapred.reducer.new-api</name> <value>true</value> </property> <!-- 指定MR的输出key的类型 --> <property> <name>mapreduce.job.output.key.class</name> <value>org.apache.hadoop.io.Text</value> </property> <!-- 指定MR的输出的value的类型--> <property> <name>mapreduce.job.output.value.class</name> <value>org.apache.hadoop.io.IntWritable</value> </property> <!-- 指定输入路径 --> <property> <name>mapred.input.dir</name> <value>${nameNode}/${inputDir}</value> </property> <!-- 指定输出路径 --> <property> <name>mapred.output.dir</name> <value>${nameNode}/${outputDir}</value> </property> <!-- 指定执行的map类 --> <property> <name>mapreduce.job.map.class</name> <value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value> </property> <!-- 指定执行的reduce类 --> <property> <name>mapreduce.job.reduce.class</name> <value>org.apache.hadoop.examples.WordCount$IntSumReducer</value> </property> <!-- 配置map task的个数 --> <property> <name>mapred.map.tasks</name> <value>1</value> </property> </configuration> </map-reduce> <ok to="end"/> <error to="fail"/> </action> <kill name="fail"> <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message> </kill> <end name="end"/> </workflow-app>执行调度任务,然后通过oozie的11000端口进行查看任务结果
cd /export/servers/oozie-4.1.0-cdh5.14.0 bin/oozie job -oozie http://hadoop01:11000/oozie -config oozie_works/map-reduce/job.properties -run任务运行成功了,但是oozie那块儿一直在running,不知道是什么bug。
nice!