Hive高级应用(二)

it2025-01-17  3

目录

九、Hive中表的类型-外部表十 、管理表和外部表的区别以及应用场景十一、分区表(1)分区表概念(2)分区表创建一(3)分区表创建二 十二、Hive分区表在企业中的应用十三、Hive表数据的六种加载方式十四、Hive结果数据的保存方式

九、Hive中表的类型-外部表

管理表/内部表:MANAGED_TABLE外部表:EXTERNAL_TABLE CREATE EXTERNAL TABLE IF NOT EXISTS db_name.table_name (col_name data_type ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE; eg: CREATE EXTERNAL TABLE student_ext ( userid string, username string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

hive表,两份数据:1.HDFS源数据 2.metastore数据/元数据

十 、管理表和外部表的区别以及应用场景

1.管理表和外部表的区别:

drop table: 管理表 -> HDFS(删除) + metastore元数据 (删除) 外部表 -> HDFS(不删除) + metastore元数据 (删除)

2.应用场景:一份网站的日志数据,用于共享日志数据

统计分析 (日志数据) -> HDFS(共享日志数据) eg: CREATE EXTERNAL TABLE student_ext_1 ( userid string, username string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/user/hive/warehouse/db_hive.db/student'; 用户推荐 (日志数据) -> HDFS(共享日志数据) eg: CREATE EXTERNAL TABLE student_ext_2 ( userid string, username string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/user/hive/warehouse/db_hive.db/student';

十一、分区表

(1)分区表概念

1.分区概念 将一个大表的数据根据一定的条件拆分成若干个小表,每个小表对应一个分区

2.为什么要分区

数据检索性能最大化数据并行计算的最大化

3.Hive中分区表对应一个单独的文件夹,这个这件夹中的数据就是分区数据

在企业实际应用中,hive表基本都要做分区,例如下面案例:

订单表:hive表(select * from order) 0001,henry,2018-09-09,product-1,50 0002,chenry,2018-10-09,product-2,50 0003,ben,2018-09-09,product-3,50 0004,cherry,2018-10-09,product-4,50 0005,jone,2018-10-09,product-5,50 0006,lili,2018-09-09,product-6,50 0007,chen,2018-10-09,product-7,50 0008,wiwi,2018-09-09,product-8,50 订单分区表:Hive表(201809) (select * from order where order_date = 201809) 0001,henry,2018-09-09,product-1,50 0003,ben,2018-09-09,product-3,50 0006,lili,2018-09-09,product-6,50 0008,wiwi,2018-09-09,product-8,50 订单分区表:Hive表(201810) (select * from order where order_date = 201810) 0002,chenry,2018-10-09,product-2,50 0004,cherry,2018-10-09,product-4,50 0005,jone,2018-10-09,product-5,50 0007,chen,2018-10-09,product-7,50

(2)分区表创建一

1.创建分区表:标准用法

CREATE TABLE order_partition ( userid string, username string, order_date string, product_name string, price string ) PARTITIONED BY (month string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

2.加载数据

load data local inpath '/opt/datas/201809.txt' into table order_partition partition(month = '201809'); load data local inpath '/opt/datas/201810.txt' into table order_partition partition(month = '201810');

3.查看数据

select * from order_partition;

(3)分区表创建二

创建顺序: no partition

1.HDFS上创建数据表的目录

dfs -mkdir -p /user/hive/warehouse/db_hive.db/nopartition;

2.加载数据(put)

dfs -put /opt/datas/order0910.txt /user/hive/warehouse/db_hive.db/nopartition;

3.创建表

CREATE TABLE nopartition ( userid string, username string, order_date string, product_name string, price string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

4.查看数据

select * from nopartition;

partition

1.HDFS上创建数据表的目录

dfs -mkdir -p /user/hive/warehouse/db_hive.db/order_par/month=201809; dfs -mkdir -p /user/hive/warehouse/db_hive.db/order_par/month=201810;

2.加载数据(put)

dfs -put /opt/datas/201809.txt /user/hive/warehouse/db_hive.db/order_par/month=201809; dfs -put /opt/datas/201810.txt /user/hive/warehouse/db_hive.db/order_par/month=201810;

3.创建表

create table order_par(userid string,username string,order_date string,product_name string,price string) partitioned by (month string) row format delimited fields terminated by ',' stored as textfile;

4.创建元数据

alter table order_par add partition(month = '201809'); alter table order_par add partition(month = '201810');

5.查看数据

select * from order_par;

十二、Hive分区表在企业中的应用

日志数据:每天都有一份,我们要所有的数据加载到Hive表中并进行分析。 Hive表如何设计:

分区表(month + day + time)

第一种方案(不可取)

1.创建表 2.加载数据

load data local inpath '数据源' into table 数据表 partition(month = '01' ,day = '01',time ='00')

第二种方案(经常采用的),采用脚本的方式

1.hdfs上创建数据表目录

dfs -mkdir -p /user/hive/warehouse/db_hive.db/order_part/month=201809/day=01/time=00; dfs -mkdir -p /user/hive/warehouse/db_hive.db/order_part/month=201809/day=01/time=01; dfs -mkdir -p /user/hive/warehouse/db_hive.db/order_part/month=201809/day=01/time=02; dfs -mkdir -p /user/hive/warehouse/db_hive.db/order_part/month=201809/day=01/time=03;

2.加载数据

dfs -put /opt/datas/order_2018090100.txt /user/hive/warehouse/db_hive.db/order_part/month=201809/day=01/time=00; dfs -put /opt/datas/order_2018090101.txt /user/hive/warehouse/db_hive.db/order_part/month=201809/day=01/time=01; dfs -put /opt/datas/order_2018090102.txt /user/hive/warehouse/db_hive.db/order_part/month=201809/day=01/time=02; dfs -put /opt/datas/order_2018090103.txt /user/hive/warehouse/db_hive.db/order_part/month=201809/day=01/time=03;

3.创建表

create table 数据表(userid string,username string,order_date string,product_name string,price string) partitioned by (month string,day string,time string) row format delimited fields terminated by ',' stored as textfile;

4.分区数据(metastore元数据)

alter table 数据表 add partition(month = '201809',day='01',time='00'); alter table 数据表 add partition(month = '201809',day='01',time='01'); alter table 数据表 add partition(month = '201809',day='01',time='02'); alter table 数据表 add partition(month = '201809',day='01',time='03'); alter table 数据表 add partition(month = '201809',day='01',time='04'); alter table 数据表 add partition(month = '201809',day='01',time='05'); alter table 数据表 add partition(month = '201809',day='01',time='06');

5.数据分析

十三、Hive表数据的六种加载方式

创建表用于测试

create table order(userid string,username string,order_date string,product_name string,price string) row format delimited fields terminated by ',' stored as textfile;

1.加载本地数据文件到hive表中

load data local inpath '/opt/datas/order0910.txt' into table db_hive.order;

2.加载HDFS文件到hive表中

load data inpath '/user/caizhengjie/datas/order0910.txt' into table db_hive.order;

3.加载数据覆盖表中已有的数据

load data inpath '/user/caizhengjie/datas/order0910.txt' overwrite into table db_hive.order;

4.创建表时通过select加载数据

create table order as select * from nopartition;

5.创建表是,通过insert 加载数据

create table order like nopartition; insert into table order select * from nopartition;

6.创建表是,通过location指定的数据目录加载

1)创建数据表目录 2)将数据文件放到指定的目录 3)创建表并location指定数据目录

十四、Hive结果数据的保存方式

1.结果数据保存到本地文件中

INSERT OVERWRITE LOCAL DIRECTORY '/opt/datas/hive/' SELECT * FROM order INSERT OVERWRITE LOCAL DIRECTORY '/opt/datas/kfk/' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' SELECT * FROM order

2.结果数据保存到hdfs

INSERT OVERWRITE DIRECTORY '/user/caizhengjie/datas/' SELECT * FROM order;

3.管道符

bin/hive -e "select * from db_hive.order;" > /opt/datas/hive/order.txt

以上内容仅供参考学习,如有侵权请联系我删除! 如果这篇文章对您有帮助,左下角的大拇指就是对博主最大的鼓励。 您的鼓励就是博主最大的动力!

最新回复(0)