sparksqlError in query: resolved attribute(s)报错分析

it2024-05-13  51

sparksql Error in query: resolved attribute(s)报错分析

项目场景:

sparksql Error in query: resolved attribute(s) XX#4245L missing from XX#4520,XX#684L 错误


问题描述:

提示:spark实际生产环境中 使用sparksql进行同表关联时 报错代码如下

select distinct a.imsi from ( select imsi,ho_time,longitude,latitude from ( select imsi,ho_time,longitude,latitude,row_number() over (partition by imsi order by ho_time) as num from tmp_highrail_users )s where s.num=1 ) a left join ( select imsi,ho_time,longitude,latitude from ( select imsi,ho_time,longitude,latitude,row_number() over (partition by imsi order by ho_time) as num from tmp_highrail_users )s where s.num=2 )b on a.imsi=b.imsi

报错代码

Error in query: resolved attribute(s) ho_time#4245L missing from imsi#4520,ho_time#684L,station_key#824,longitude#4399,section_id#678,cellkey#677,h_railway_name#680,msisdn#4522,hr_distance#682,h_railway_station#681,latitude#4400,user_use_times#683L,day#4382 in operator !Project [day#4382, ho_time#4245L, imsi#4520, msisdn#4522, h_railway_name#680, longitude#4399, latitude#4400];; Aggregate [count(1) AS count(1)#4605L] +- Join LeftOuter, (imsi#1199 = imsi#4520) :- SubqueryAlias a : +- Project [imsi#1199, ho_time#684L, longitude#816, latitude#817] : +- Filter (num#4240 = 1) : +- SubqueryAlias s : +- Project [imsi#1199, ho_time#684L, longitude#816, latitude#817, num#4240] : +- Project [imsi#1199, ho_time#684L, longitude#816, latitude#817, num#4240, num#4240] : +- Window [row_number() windowspecdefinition(imsi#1199, ho_time#684L ASC NULLS FIRST, ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS num#4240], [imsi#1199], [ho_time#684L ASC NULLS FIRST] : +- Project [imsi#1199, ho_time#684L, longitude#816, latitude#817] : +- SubqueryAlias tmp_highrail_users : +- Project [day#799, ho_time#684L, imsi#1199, msisdn#1201, 1 AS is_railway_user#1192, h_railway_name#680 AS stand_railway_name#1193, longitude#816, latitude#817]

原因分析:

根据报错,猜测:以上代码作为子查询,将结果供父查询时, 父查询没有解析到子查询结果中的字段。

联想到曾经在hive官网上看到,在join或者union时,必须指定字段别名,否则会丢失数据。


解决方案:

代码起别名 修改代码如下

select a.aimsi as imsi from ( select s.imsi as aimsi, s.ho_time as aho_time, s.longitude as alongitude, s.latitude as alatitude from ( select imsi,ho_time,longitude,latitude,row_number() over (partition by imsi order by ho_time) as num from tmp_highrail_users )s where s.num=1 ) a left join ( select s.imsi as bimsi, s.ho_time as bho_time, s.longitude as blongitude, s.latitude as blatitude from ( select imsi,ho_time,longitude,latitude,row_number() over (partition by imsi order by ho_time) as num from tmp_highrail_users )s where s.num=2 )b on a.aimsi = b.bimsi
最新回复(0)