OutputFormat in MapReduce

2023-11-10

TextOutputFormat

The default OutputFormat. Each record is written as one line of the form key.toString() + "\t" + value.toString(). The output is a plain text file.
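The line layout described above can be sketched in plain Java. This is only an illustration of the format, not Hadoop's own code; the class name TextLineSketch and the method formatLine are hypothetical.

```java
final class TextLineSketch {
    // Hypothetical helper: builds the text line for one record,
    // i.e. key.toString() + "\t" + value.toString().
    static String formatLine(Object key, Object value) {
        return key.toString() + "\t" + value.toString();
    }

    public static void main(String[] args) {
        // A word-count style record: key "hello", value 3
        System.out.println(formatLine("hello", 3));
    }
}
```

Because the separator is a tab, the resulting file can be split back into key and value on the first tab of each line.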


SequenceFileOutputFormat

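Writes each record as a binary key-value pair rather than as a line of text (for example, a file path as the key and the file's contents as the value). The output is a binary SequenceFile, commonly used as the input of a subsequent MapReduce job.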
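To see why binary key-value output is convenient as input for a later job, here is a deliberately simplified plain-Java sketch. It is not the real SequenceFile layout (which also carries a header, sync markers, and optional compression); the class BinaryPairsSketch and its record layout are assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

final class BinaryPairsSketch {
    // Serialize the pairs as: record count, then (UTF string key, 4-byte int value) per record.
    static byte[] writePairs(Map<String, Integer> pairs) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(pairs.size());
            for (Map.Entry<String, Integer> entry : pairs.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Read the pairs back with their types intact; no text parsing is needed.
    static Map<String, Integer> readPairs(byte[] data) {
        Map<String, Integer> pairs = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                int value = in.readInt();
                pairs.put(key, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pairs;
    }
}
```

The reader recovers typed keys and values directly, which is the property that makes a binary format a natural hand-off between chained jobs.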


Custom OutputFormat

1) The MyOutputFormat class

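import java.io.IOException;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// The key type is T1 and the value type is T2 (placeholders for your own types)
public class MyOutputFormat extends FileOutputFormat<T1, T2> {

    // Return an instance of a RecordWriter implementation
    @Override
    public RecordWriter<T1, T2> getRecordWriter(TaskAttemptContext taskAttemptContext)
            throws IOException, InterruptedException {
        return new MyRecordWriter(taskAttemptContext);
    }
}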

2) The MyRecordWriter class

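import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyRecordWriter extends RecordWriter<T1, T2> {

    private FSDataOutputStream fSDataOutputStream; // HDFS output stream

    // Open the stream in the constructor
    public MyRecordWriter(TaskAttemptContext taskAttemptContext) throws IOException {
        Configuration configuration = taskAttemptContext.getConfiguration(); // current job configuration
        FileSystem fileSystem = FileSystem.get(configuration);               // current HDFS
        String outputDir = configuration.get(FileOutputFormat.OUTDIR);       // job output directory on HDFS
        // Custom output file inside the output directory. Path(parent, child) inserts the
        // separator that plain string concatenation would miss. Every task uses this same
        // name, so this assumes a single reduce task.
        fSDataOutputStream = fileSystem.create(new Path(outputDir, "myOutputFile"));
    }

    // Custom output logic goes here
    @Override
    public void write(T1 k, T2 v) throws IOException, InterruptedException {
    }

    // Close the stream
    @Override
    public void close(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
        IOUtils.closeStream(fSDataOutputStream);
    }
}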
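The write method above is left empty. As one possible way to fill it in, the following plain-Java sketch writes a record as a tab-separated UTF-8 line to any java.io.OutputStream. The class RecordLineSketch and its methods are hypothetical helpers, not part of Hadoop; they only assume that the target stream is an OutputStream, which FSDataOutputStream is.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class RecordLineSketch {
    // Encode one record as UTF-8 bytes: key, tab, value, newline.
    static byte[] encode(Object key, Object value) {
        String line = key + "\t" + value + "\n";
        return line.getBytes(StandardCharsets.UTF_8);
    }

    // Write one encoded record to the given stream.
    static void writeRecord(OutputStream out, Object key, Object value) throws IOException {
        out.write(encode(key, value));
    }
}
```

Under these assumptions, the body of write(T1 k, T2 v) could be a single call such as RecordLineSketch.writeRecord(fSDataOutputStream, k, v). Encoding explicitly as UTF-8 avoids depending on the platform default charset.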

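3) Set the OutputFormat in the Driver class

job.setOutputFormatClass(MyOutputFormat.class);

Because MyOutputFormat extends FileOutputFormat, the job still checks that an output directory is set, so the Driver also needs the usual FileOutputFormat.setOutputPath call.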

 
