SparkSQL 字符串函数

it2023-01-25  56

SparkSQL 字符串函数

1.样例类的定义

case class Student(name:String,subject:String,score:String)

2.测试数据展示

val spark: SparkSession = SparkSession.builder().appName("test").master("local[*]").getOrCreate() //导入隐式包 import spark.implicits._ val df= spark.sparkContext.textFile("file:///E:\\MyEclipse\\java\\myfactory\\data\\data.txt").map(line =>{ val arr: Array[String] = line.split(",") Student(arr(0),arr(1),arr(2)) }).toDF() df.show(false) spark.close() +--------+-------+-----+ |name |subject|score| +--------+-------+-----+ |zhangsan|语文 |90 | |zhangsan|数学 |85 | |zhangsan|英语 |85 | |zhangsan|物理 |95 | |lisi |语文 |90 | |lisi |化学 |100 | |wangwu |物理 |95 | |wangwu |化学 |95 | +--------+-------+-----+

3.collect_set和collect_list

将同组的的字段合并成集合,collect_list(不去重),collect_set(去重) val df1: DataFrame = df.groupBy("name").agg(collect_list("subject").alias("subjects"),collect_set($"score").alias("scores")) .select("name", "subjects","scores") df1.show(false) +--------+----------------+------------+ |name |subjects |scores | +--------+----------------+------------+ |wangwu |[物理, 化学] |[95] | |zhangsan|[语文, 数学, 英语, 物理]|[95, 85, 90]| |lisi |[语文, 化学] |[100, 90] | +--------+----------------+------------+

4.concat_ws

作用1:将字段值为集合的值,按照指定符号进行拼接(集合转字符串) val df2: DataFrame = df1.withColumn("all_subjects", concat_ws("-", $"subjects")) .withColumn("all_scores",concat_ws("-",$"scores")) df2.show(false) +--------+----------------+------------+------------+----------+ |name |subjects |scores |all_subjects|all_scores| +--------+----------------+------------+------------+----------+ |wangwu |[物理, 化学] |[95] |物理-化学 |95 | |zhangsan|[语文, 数学, 英语, 物理]|[95, 85, 90]|语文-数学-英语-物理 |95-85-90 | |lisi |[语文, 化学] |[100, 90] |语文-化学 |100-90 | +--------+----------------+------------+------------+----------+ 作用2:将字段按照指定的字符进行拼接(字符串拼接) val test: DataFrame = df.withColumn("test", concat_ws("-", $"subject", $"score")) test.show(false) +--------+-------+-----+------+ | name|subject|score| test| +--------+-------+-----+------+ |zhangsan| 语文| 90| 语文-90| |zhangsan| 数学| 85| 数学-85| |zhangsan| 英语| 85| 英语-85| |zhangsan| 物理| 95| 物理-95| | lisi| 语文| 90| 语文-90| | lisi| 化学| 100|化学-100| | wangwu| 物理| 95| 物理-95| | wangwu| 化学| 95| 化学-95| +--------+-------+-----+------+

5.concat

将多个字段进行拼接 val df3: DataFrame = df.withColumn("info", concat($"name",$"subject", $"score")) df3.show(false) +--------+-------+-----+------------+ |name |subject|score|info | +--------+-------+-----+------------+ |zhangsan|语文 |90 |zhangsan语文90| |zhangsan|数学 |85 |zhangsan数学85| |zhangsan|英语 |85 |zhangsan英语85| |zhangsan|物理 |95 |zhangsan物理95| |lisi |语文 |90 |lisi语文90 | |lisi |化学 |100 |lisi化学100 | |wangwu |物理 |95 |wangwu物理95 | |wangwu |化学 |95 |wangwu化学95 | +--------+-------+-----+------------+
最新回复(0)