scala快速上手(十) Spark WordCount

it2023-05-28  72

WordCount

准备 Maven,修改 Maven 的 settings.xml 配置,增加阿里云的仓库。

<mirror> <id>aliyunmaven</id> <mirrorOf>*</mirrorOf> <name>阿里云公共仓库</name> <url>https://maven.aliyun.com/repository/public</url> </mirror>

1. 创建 Maven 项目导入 pom.xml 文件

安装 Maven 仓库管理工具,版本要求是 3.2 版本以上。新建 Maven 项目,配置 pom.xml。导入必要的包。

2. Spark-Scala 版本的 WordCount

val conf = new SparkConf() conf.setMaster("local") conf.setAppName("scala-wc") val sc = new SparkContext(conf) val lines = sc.textFile("./data/words") val words = lines.flatMap(line=>{line.split(" ")}) val pairWords = words.map(word=>{new Tuple2(word,1)}) val result = pairWords.reduceByKey((v1:Int,v2:Int)=>{v1+v2}) result.foreach(println)

3. Spark-Java 版本的 WordCount

SparkConf conf = new SparkConf(); conf.setMaster("local"); conf.setAppName("java-wc"); JavaSparkContext sc = new JavaSparkContext(conf); JavaRDD<String> lines = sc.textFile("./data/words"); JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() { @Override public Iterator<String> call(String s) throws Exception { String[] split = s.split(" "); return Arrays.asList(split).iterator(); } }); JavaPairRDD<String, Integer> pairWords = words.mapToPair(new PairFunction<String, String, Integer>() { @Override public Tuple2<String, Integer> call(String word) throws Exception { return new Tuple2<>(word, 1); } }); JavaPairRDD<String, Integer> result =pairWords.reduceByKey(new Function2<Integer, Integer,Integer>() { @Override public Integer call(Integer v1, Integer v2) throws Exception{ return v1 + v2; } }); result.foreach(new VoidFunction<Tuple2<String, Integer>>() { @Override public void call(Tuple2<String, Integer> tuple2) throws Exception { System.out.println(tuple2); } }); sc.stop();

 

最新回复(0)