Taking the word-count algorithm as an example, this section shows how to use Hive in a real application.
1. Create the input directory (the output directory will be generated automatically)
Here input is the input directory and output is the output directory.
Commands:
cd /usr/local/hadoop
mkdir input
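If you want to confirm that the directory was created, a simple ls works:
ls /usr/local/hadoop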
2. Create two test files, file1.txt and file2.txt, in the input folder
Commands:
cd /usr/local/hadoop/input
echo "hello world" > file1.txt
echo "hello hadoop" > file2.txt
3. Compile and run the program in the terminal
View the files:
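For example, the contents of the two test files can be checked with cat:
cat /usr/local/hadoop/input/file1.txt
cat /usr/local/hadoop/input/file2.txt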
4. Implement word counting with HiveQL
Only the following seven lines of code are needed, and there is no need to compile them into a jar to run:
create table docs(line string);
load data inpath 'input' overwrite into table docs;
create table word_count as
select word, count(1) as count from
(select explode(split(line,' ')) as word from docs) w
group by word
order by word;
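After these statements run, the counts land in the word_count table and can be inspected with an ordinary query. Given the two test files created above, hello appears twice while hadoop and world each appear once (order by word sorts alphabetically):
select * from word_count;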
5. Create the fusioninsight-hive Project
6. Import the dependency jars into the newly created fusioninsight-hive Project
7. Create a Package named com.fusioninsight.hive
8. Create the Java Classes needed for the lab
Under the Package created in the previous step, create a Java Class named HiveJdbc:
Then create a Java Class named HiveUDF:
9. Write the HiveJdbc code
package com.fusioninsight.hive;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;

public class HiveJdbc {
    private static final String HIVE_DRIVER = "org.apache.hive.jdbc.HiveDriver";
    private static final String ZK_HOSTS = "host1:24002,host2:24002,host3:24002";
    private static final String SERVICES_DISCOVERY_MODE = "zooKeeper";
    private static final String ZOOKEEPER_NAMESPACE = "hiveserver2";

    // Register the Hive JDBC driver once when the class is loaded.
    static {
        try {
            Class.forName(HIVE_DRIVER);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
    }

    // Build a JDBC URL that discovers HiveServer2 instances through ZooKeeper.
    private String getConnectionUrl() {
        return "jdbc:hive2://" + ZK_HOSTS
                + "/" + ";serviceDiscoveryMode=" + SERVICES_DISCOVERY_MODE
                + ";zooKeeperNamespace=" + ZOOKEEPER_NAMESPACE;
    }

    private Connection getConnection(String url) {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(url);
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return connection;
    }

    private PreparedStatement getStatement(Connection connection, String sql) throws SQLException {
        return connection.prepareStatement(sql);
    }

    // Execute a DDL statement; no result set is produced.
    private void execDDL(String sql) {
        Connection connection = getConnection(getConnectionUrl());
        try {
            PreparedStatement statement = getStatement(connection, sql);
            statement.executeUpdate();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

    // Execute a query and hand the result set back to the caller.
    // The connection is left open so the ResultSet stays readable;
    // a production version would close it after consuming the results.
    private ResultSet execDQL(String sql) {
        ResultSet resultSet = null;
        Connection connection = getConnection(getConnectionUrl());
        try {
            PreparedStatement statement = getStatement(connection, sql);
            resultSet = statement.executeQuery();
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return resultSet;
    }

    // Print the column names on one line, then every row, tab-separated.
    private void printResultSet(ResultSet resultSet) throws SQLException {
        ResultSetMetaData resultSetMetaData = resultSet.getMetaData();
        int columnCount = resultSetMetaData.getColumnCount();
        StringBuilder columnName = new StringBuilder();
        StringBuilder value = new StringBuilder();
        for (int i = 1; i <= columnCount; i++) {
            columnName.append(resultSetMetaData.getColumnName(i)).append("\t");
        }
        while (resultSet.next()) {
            for (int i = 1; i <= columnCount; i++) {
                value.append(resultSet.getString(i)).append("\t");
            }
            value.append("\n");
        }
        System.out.printf("%s\n%s", columnName, value);
    }

    // DML statements take the same path as DDL.
    private void execDML(String sql) {
        execDDL(sql);
    }

    public static void main(String[] args) {
        HiveJdbc jdbc = new HiveJdbc();
        ResultSet resultSet = jdbc.execDQL("SELECT * FROM username.bucket_test LIMIT 5");
        try {
            jdbc.printResultSet(resultSet);
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
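With the constants above, getConnectionUrl() yields jdbc:hive2://host1:24002,host2:24002,host3:24002/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2, so the driver asks ZooKeeper for an available HiveServer2 instance instead of targeting a fixed host. Note that a security-enabled FusionInsight cluster would also require authentication parameters in the URL; this sketch assumes a normal (non-secured) connection.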
10. Write the HiveUDF code
package com.fusioninsight.hive;

import org.apache.hadoop.hive.ql.exec.UDF;

public class HiveUDF extends UDF {
    // Single-argument form: append "hello" to the input string.
    public String evaluate(String str) {
        return str + "hello";
    }

    // Varargs form: concatenate all inputs, each followed by ":".
    public String evaluate(String... strs) {
        StringBuilder sb = new StringBuilder();
        for (String s : strs) {
            sb.append(s).append(":");
        }
        return sb.toString();
    }
}
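As written, evaluate("abc") returns "abchello" and evaluate("a", "b") returns "a:b:". Note that org.apache.hadoop.hive.ql.exec.UDF is Hive's legacy UDF interface; it is sufficient for simple cases like this one, while newer Hive releases recommend GenericUDF for more complex argument types.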
11. Run the HiveJdbc program and check the result
12. Package the program and configure the Hive user-defined function
13. Put the jar file onto HDFS
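A minimal sketch of this step, assuming the packaged jar is named fusioninsight-hive.jar and /user/hive/udf is the chosen HDFS directory (both are placeholder names):
hdfs dfs -mkdir -p /user/hive/udf
hdfs dfs -put fusioninsight-hive.jar /user/hive/udf/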
14. Create the function
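Assuming the jar path from the previous step, the function can be registered from beeline with standard HiveQL; the function name my_udf is a placeholder:
create function my_udf as 'com.fusioninsight.hive.HiveUDF' using jar 'hdfs:///user/hive/udf/fusioninsight-hive.jar';
select my_udf('abc');    -- returns abchello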