[An ITer's Self-Cultivation: A Growth Diary] Learning HBase Java API Operations

2025-08-05


I. Operating HBase with the Java API

1. Connecting to the HBase service

1.1 Add the dependency

<!-- hbase -->
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>${hbase.version}</version>
</dependency>

1.2 Configure the local hosts file (on Windows it is usually in C:\Windows\System32\drivers\etc); adjust the entries to your environment

192.168.xx.xxx master
192.168.xx.xxx slave1
192.168.xx.xxx slave2
192.168.xx.xxx master-s
192.168.xx.xxx slave1-s
192.168.xx.xxx slave2-s

1.3 Create a Java Maven project and copy HBase's log4j.properties into the project's resources directory beforehand.

1.4 Code

package cn.hbase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.junit.Test;

import java.io.IOException;

public class OperateHbase {

    @Test
    public void test() throws IOException {
        // 1. Create the configuration object and point it at the ZooKeeper quorum
        Configuration configuration = HBaseConfiguration.create();
        configuration.set("hbase.zookeeper.quorum", "master-s:2181,slave1-s:2181,slave2-s:2181");
        // 2. Get the connection and the Admin; try-with-resources releases both (step 4)
        try (Connection connection = ConnectionFactory.createConnection(configuration);
             Admin admin = connection.getAdmin()) {
            // 3. Test: does the table exist?
            boolean b = admin.tableExists(TableName.valueOf("test1"));
            System.out.println(b);
        }
    }
} // class end

Result

true

This shows that the table 'test1' currently exists in HBase; if it did not, the result would be false.

2. Extracting an HBaseUtils utility class

package cn.hbase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;

import java.io.IOException;

public class HBaseUtils {

    private final static String KEY = "hbase.zookeeper.quorum";
    private final static String VALUE = "master-s:2181,slave1-s:2181,slave2-s:2181";
    private static Configuration configuration;

    static {
        // 1. Create the configuration object
        configuration = HBaseConfiguration.create();
        configuration.set(KEY, VALUE);
    }

    // Note: every call opens a new Connection that is never closed; acceptable for a demo only
    public static Admin getAdmin() {
        try {
            Connection connection = ConnectionFactory.createConnection(configuration);
            Admin admin = connection.getAdmin();
            return admin;
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

    // Close the Admin
    public static void close(Admin admin) {
        try {
            if (admin != null) admin.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

2.1 Test class template

package cn.hbase;

import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;

public class DemoNamespace {

    private HBaseAdmin admin;

    @Before
    public void before() {
        // Use our own utility class
        admin = (HBaseAdmin) HBaseUtils.getAdmin();
    }

    // Actual logic: list the namespaces
    @Test
    public void listNamespace() throws IOException {
    }

    @After
    public void after() {
        HBaseUtils.close(admin);
    }
}

3. CRUD on HBase namespaces

(Create, Retrieve, Update and Delete)

3.1 Code

package cn.hbase;

import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.NamespaceDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;

public class DemoNamespace {

    private HBaseAdmin admin;

    @Before
    public void before() {
        // Use our own utility class
        admin = (HBaseAdmin) HBaseUtils.getAdmin();
    }

    // List the namespaces
    @Test
    public void listNamespace() throws IOException {
        System.out.println("listNamespace() runs as follows....");
        // 1. Get the array of namespace descriptors
        NamespaceDescriptor[] namespaceDescriptors = admin.listNamespaceDescriptors();
        // 2. Print the namespace names
        for (NamespaceDescriptor namespaceDescriptor : namespaceDescriptors) {
            System.out.println(namespaceDescriptor.getName());
        }
    }

    // Create a namespace
    @Test
    public void createNamespace() throws IOException {
        System.out.println("createNamespace() runs as follows....");
        // 1. Build the NamespaceDescriptor
        NamespaceDescriptor ns1 = NamespaceDescriptor.create("ns1").build();
        ns1.setConfiguration("name", "Lq");
        // 2. Submit it through admin
        admin.createNamespace(ns1);
    }

    // List the tables of a namespace, approach 1: table names
    @Test
    public void listNamespaceTables1() throws IOException {
        System.out.println("listNamespaceTables1() runs as follows....");
        // 1. Get the table names in the "default" namespace
        TableName[] defaultTables = admin.listTableNamesByNamespace("default");
        // 2. Display
        for (TableName defaultTable : defaultTables) {
            System.out.println(defaultTable.getNameAsString());
        }
    }

    // List the tables of a namespace, approach 2: table descriptors
    @Test
    public void listNamespaceTables2() throws IOException {
        System.out.println("listNamespaceTables2() runs as follows....");
        // 1. Get the table descriptors in the "hbase" namespace
        HTableDescriptor[] tableDescriptors = admin.listTableDescriptorsByNamespace("hbase");
        // 2. Display
        for (HTableDescriptor tableDescriptor : tableDescriptors) {
            System.out.println(tableDescriptor.getNameAsString());
        }
    }

    // Modify the attributes of a namespace, looked up by name
    @Test
    public void changeNamespaceAttribute() throws IOException {
        System.out.println("changeNamespaceAttribute() runs as follows....");
        // 1. Build the NamespaceDescriptor
        NamespaceDescriptor ns1 = NamespaceDescriptor.create("ns1").build();
        ns1.setConfiguration("name", "Lq");
        ns1.setConfiguration("age", "23");
        System.out.println("Before: ===" + ns1);
        ns1.setConfiguration("name", "Marry");
        // 2. Submit the modification through admin
        admin.modifyNamespace(ns1);
        System.out.println("After: ===" + ns1);
    }

    // Delete a namespace by name
    @Test
    public void delete() throws IOException {
        System.out.println("delete() runs as follows....");
        admin.deleteNamespace("ns1");
    }

    @After
    public void after() {
        HBaseUtils.close(admin);
    }
}

3.2 Partial test results

listNamespace() runs as follows....
default
hbase
listNamespaceTables1() runs as follows....
student
taxi
test1
changeNamespaceAttribute() runs as follows....
Before: ==={NAME => 'ns1', age => '23', name => 'Lq'}
After: ==={NAME => 'ns1', age => '23', name => 'Marry'}

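4. Table operations in HBase with the Java API

All of the code involved is in (4.11).

4.1 Create a table

4.2 List a table's column families

4.3 Modify column families by table name

4.4 Delete a column family from a table by name

4.5 Delete a table

4.6 put with a single row key

4.7 put with multiple row keys

4.8 Query the column names and values of a given column family by row key

4.9 scan

4.10 Delete a given row key

4.11 Code

DemoTables.java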

package cn.hbase;

import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;
import java.util.*;

import static cn.hbase.HBaseUtils.showResult;

public class DemoTables {

    private HBaseAdmin admin;
    private Table table;

    @Before
    public void before() {
        // Use our own utility class
        admin = (HBaseAdmin) HBaseUtils.getAdmin();
        table = HBaseUtils.getTable("test1");
    }

    // Create a table from a table name and a column family
    @Test
    public void createTable() throws IOException {
        System.out.println("createTable() runs as follows:");
        // Build the table name
        TableName testtable = TableName.valueOf("t1");
        // Build the table descriptor
        HTableDescriptor tableDescriptor = new HTableDescriptor(testtable);
        // Build the column family descriptor
        HColumnDescriptor familyColum = new HColumnDescriptor("name");
        // Add the column family
        tableDescriptor.addFamily(familyColum);
        // Submit through admin
        admin.createTable(tableDescriptor);
    }

    // List the column families of a given table
    @Test
    public void listTableColumnFamily() throws IOException {
        System.out.println("listTableColumnFamily() runs as follows:");
        // Get the table descriptor
        HTableDescriptor tableDescriptor = admin.getTableDescriptor(TableName.valueOf("t1"));
        // Get the column families
        HColumnDescriptor[] columnFamilies = tableDescriptor.getColumnFamilies();
        for (HColumnDescriptor columnFamily : columnFamilies) {
            System.out.println(columnFamily.getNameAsString());
        }
    }

    // Add a column family to an existing table, looked up by table name
    @Test
    public void alterTableColumnFamily() throws IOException {
        System.out.println("alterTableColumnFamily() runs as follows:");
        TableName tableName = TableName.valueOf("test1");
        HColumnDescriptor ageColumn = new HColumnDescriptor("age");
        // Go through admin. Calling addFamily() on the descriptor returned by
        // admin.getTableDescriptor() fails with "HTableDescriptor is read-only".
        admin.addColumnFamily(tableName, ageColumn);
    }

    // Delete a column family from a table by family name
    @Test
    public void deleteTableColumnFamily() throws IOException {
        System.out.println("deleteTableColumnFamily() runs as follows:");
        TableName tableName = TableName.valueOf("t1");
        // Approach 1: this call deletes the column family
        admin.deleteColumnFamily(tableName, Bytes.toBytes("sex"));
        // Approach 2 (marked deprecated):
        // admin.deleteColumn(tableName, Bytes.toBytes("age"));
        // tableDescriptor.removeFamily(...) also fails with "HTableDescriptor is read-only",
        // so use admin as above.
    }

    // Delete a table: 1. disable it if it is not disabled yet  2. delete it
    @Test
    public void deleteTable() throws IOException {
        System.out.println("deleteTable() runs as follows:");
        TableName tableName = TableName.valueOf("test1");
        // Check before deleting
        if (!admin.isTableDisabled(tableName)) {
            admin.disableTable(tableName);
        }
        admin.deleteTable(tableName);
    }

    /** put with a single row key; shell equivalent: put 't1', 'r1', 'c1', 'value' */
    @Test
    public void put1() throws IOException {
        System.out.println("put1() runs as follows:");
        // Create the Put object for row key 001
        Put put = new Put(Bytes.toBytes("001"));
        // Assemble the column data: family, qualifier, value
        put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Lq"));
        // Submit through table
        table.put(put);
    }

    /** put with multiple row keys; shell equivalent: put 't1', 'r1', 'c1', 'value' */
    @Test
    public void put2() throws IOException {
        System.out.println("put2() runs as follows:");
        // Create the Put objects
        Put put2 = new Put(Bytes.toBytes("002"));
        Put put3 = new Put(Bytes.toBytes("003"));
        // Assemble the column data
        put2.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Marry"));
        put3.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Zach"));
        List<Put> plist = new ArrayList<>();
        plist.add(put2);
        plist.add(put3);
        // Submit through table
        table.put(plist);
    }

    /** Shell equivalent: t.get 'r1'. Query the column names and values of a given family by row key. */
    @Test
    public void get1() throws IOException {
        System.out.println("get1() runs as follows:");
        // Create the Get object for the row to query
        Get get = new Get(Bytes.toBytes("001"));
        Result result = table.get(get);
        // Print the qualifier/value pairs of the given column family
        NavigableMap<byte[], byte[]> navigableMaps = result.getFamilyMap(Bytes.toBytes("info"));
        for (Map.Entry<byte[], byte[]> entry : navigableMaps.entrySet()) {
            System.out.println(Bytes.toString(entry.getKey()) + "-->" + Bytes.toString(entry.getValue()));
        }
    }

    /** Query family, column name and value by row key, using result.cellScanner() to walk the cells. */
    @Test
    public void get2() throws IOException {
        System.out.println("get2() runs as follows:");
        Get get = new Get(Bytes.toBytes("001"));
        Result result = table.get(get);
        // Use the helper from our utility class that wraps Result handling
        showResult(result);
    }

    /** scan */
    @Test
    public void scan() throws IOException {
        System.out.println("scan() runs as follows:");
        Scan scan = new Scan();
        // Pass the Scan to table.getScanner(); the scanner is closed when the block exits
        try (ResultScanner scanner = table.getScanner(scan)) {
            for (Result nextResult : scanner) {
                showResult(nextResult);
            }
        }
    }

    @Test
    public void deleteByRowkey() throws IOException {
        System.out.println("deleteByRowkey() runs as follows:");
        System.out.println("Before deletion:");
        scan();
        Delete delete = new Delete(Bytes.toBytes("003"));
        table.delete(delete);
        System.out.println("After deletion:");
        scan();
    }

    @After
    public void after() {
        HBaseUtils.close(admin);
        HBaseUtils.close(table);
    }
}

HBaseUtils utility class

package cn.hbase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;

import java.io.IOException;

public class HBaseUtils {

    private final static String KEY = "hbase.zookeeper.quorum";
    private final static String VALUE = "master-s:2181,slave1-s:2181,slave2-s:2181";
    private static Configuration configuration;

    static {
        // 1. Create the configuration object
        configuration = HBaseConfiguration.create();
        configuration.set(KEY, VALUE);
    }

    public static Admin getAdmin() {
        try {
            Connection connection = ConnectionFactory.createConnection(configuration);
            Admin admin = connection.getAdmin();
            return admin;
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

    // Table operations need a Table object, hence the two helpers below.
    // No-argument version: returns a default table, set to "student" here for demonstration.
    public static Table getTable() {
        return getTable("student");
    }

    // Version with an argument: returns the table with the given name
    public static Table getTable(String tablename) {
        try {
            Connection connection = ConnectionFactory.createConnection(configuration);
            return connection.getTable(TableName.valueOf(tablename));
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

    // The test classes print Result objects many times, so that logic is wrapped here
    public static void showResult(Result result) throws IOException {
        // result.cellScanner() walks the cells contained in the result
        CellScanner cellScanner = result.cellScanner();
        while (cellScanner.advance()) {
            Cell current = cellScanner.current();
            System.out.print(new String(CellUtil.cloneRow(current)) + "~ ");
            System.out.print(new String(CellUtil.cloneFamily(current)) + ":");
            System.out.print(new String(CellUtil.cloneQualifier(current)) + "-->");
            System.out.print(new String(CellUtil.cloneValue(current)) + "\n");
        }
        System.out.println("=================");
    }

    // Refactored out of the DemoFilter methods
    public static void showFilterResult(Filter filter) throws IOException {
        // Handle missing values
        if (filter instanceof SingleColumnValueFilter) {
            SingleColumnValueFilter singleColumnValueFilter = (SingleColumnValueFilter) filter;
            singleColumnValueFilter.setFilterIfMissing(true);
        }
        Scan scan = new Scan();
        // The original reference can still be passed here: the cast copies nothing,
        // singleColumnValueFilter and filter refer to the same object on the heap
        scan.setFilter(filter);
        // Pass the Scan to table.getScanner(); the scanner is closed when the block exits
        try (ResultScanner scanner = getTable("test1").getScanner(scan)) {
            for (Result nextResult : scanner) {
                showResult(nextResult);
            }
        }
    }

    // Close the Admin
    public static void close(Admin admin) {
        try {
            if (admin != null) admin.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Close the Table
    public static void close(Table table) {
        try {
            if (table != null) table.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Overload that combines the two close() methods above
    public static void close(Admin admin, Table table) {
        close(admin);
        close(table);
    }
}

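II. Advanced queries in HBase

1. A sample requirement for filters

/*
 * Business logic: select * from test1 where age < 50 and name="Marry"
 */

2. Filters

Purpose: neither get() nor scan() can express a conditional query on its own; a filter adds condition-based filtering to a scan().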

2.1 SingleColumnValueFilter: single-column value filter

// Constructor arguments:
//   byte[] family      column family
//   byte[] qualifier   column name
//   CompareOp op       enum value (CompareOperator in newer versions)
//   byte[] value       column value
// !!! Note: rows that do not contain the filtered column at all are also returned
/*
 * Business logic: select * from test1 where sex="M"
 */
@Test
public void singleColumnValueFilter() throws IOException {
    // Create the single-column value filter
    SingleColumnValueFilter singleColumnValueFilter = new SingleColumnValueFilter(
            Bytes.toBytes("info"),
            Bytes.toBytes("sex"),
            CompareFilter.CompareOp.EQUAL,
            Bytes.toBytes("M")
    );
    Scan scan = new Scan();
    scan.setFilter(singleColumnValueFilter);
    // Pass the Scan to table.getScanner()
    ResultScanner scanner = table.getScanner(scan);
    Iterator<Result> iterator = scanner.iterator();
    while (iterator.hasNext()) {
        Result nextResult = iterator.next();
        showResult(nextResult);
    }
}

Result:

Business logic: select * from test1 where sex='M'
001~ info:age-->23
001~ info:name-->Lq
=================
002~ info:age-->18
002~ info:name-->Marry
=================
004~ info:age-->80
004~ info:name-->jack
004~ info:sex-->M
=================
005~ info:hight-->175
=================
006~ info:hight-->215
=================

Tip

1. If the filter condition is on age, rows that have no age column are not filtered out; they stay in the result.
2. In most cases the values put into HBase are byte arrays converted from strings, so comparison defaults to lexicographic order. For a numeric comparison, insert the value in put() with a numeric overload such as Bytes.toBytes(int val).
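The second tip can be checked without a cluster. The sketch below is plain Java and does not call HBase: ofString and ofInt are hypothetical stand-ins for Bytes.toBytes(String) and Bytes.toBytes(int), assuming UTF-8 text and a 4-byte big-endian int, and compare assumes cell values are ordered by unsigned, byte-by-byte comparison.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteOrderSketch {

    // Stand-in for Bytes.toBytes(String): the UTF-8 bytes of the text (assumption)
    static byte[] ofString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Stand-in for Bytes.toBytes(int): 4 bytes, big-endian (assumption)
    static byte[] ofInt(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Unsigned lexicographic comparison of two byte arrays (requires Java 9+)
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        // As strings, "9" sorts after "50" because the byte for '9' is larger than the byte for '5'
        System.out.println(compare(ofString("9"), ofString("50")) > 0); // true
        // As fixed-width ints, 9 sorts before 50
        System.out.println(compare(ofInt(9), ofInt(50)) < 0);           // true
    }
}
```

Under the same assumptions, negative ints would sort after positive ones, because their leading byte is 0xFF when read as unsigned; the numeric encoding only orders non-negative values correctly.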

2.2 FilterList: filtering on multiple column values

/*
 * Business logic: select * from test1 where age < 50 and name="Marry"
 */
@Test
public void multipleColumnValueFilter() throws IOException {
    SingleColumnValueFilter ageFilter = new SingleColumnValueFilter(
            Bytes.toBytes("info"),
            Bytes.toBytes("age"),
            CompareFilter.CompareOp.LESS,
            Bytes.toBytes("50")
    );
    SingleColumnValueFilter nameFilter = new SingleColumnValueFilter(
            Bytes.toBytes("info"),
            Bytes.toBytes("name"),
            CompareFilter.CompareOp.EQUAL,
            Bytes.toBytes("Marry")
    );
    // Filter out rows where the column is missing;
    // without this, rows that lack the filtered column are also returned
    ageFilter.setFilterIfMissing(true);
    nameFilter.setFilterIfMissing(true);
    // Combine into a FilterList
    // MUST_PASS_ALL --> and    MUST_PASS_ONE --> or
    FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    filterList.addFilter(ageFilter);
    filterList.addFilter(nameFilter);
    Scan scan = new Scan();
    scan.setFilter(filterList);
    // Pass the Scan to table.getScanner()
    ResultScanner scanner = table.getScanner(scan);
    Iterator<Result> iterator = scanner.iterator();
    while (iterator.hasNext()) {
        Result nextResult = iterator.next();
        showResult(nextResult);
    }
}

Result

Business logic: select * from test1 where age < 50 and name='Marry'
002~ info:age-->18
002~ info:name-->Marry
=================
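To make the interplay of MUST_PASS_ALL, MUST_PASS_ONE and setFilterIfMissing() easier to follow, here is a small in-memory model in plain Java. It is only a sketch of the semantics described above, not HBase code: column, mustPassAll, mustPassOne and scan are hypothetical helpers, each row is modelled as a map from column name to string value, and the sample rows are taken from the results shown in 2.1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class FilterListSketch {

    // Models a single-column value filter: if the column is absent,
    // the row passes unless filterIfMissing is true
    static Predicate<Map<String, String>> column(String name, Predicate<String> test,
                                                 boolean filterIfMissing) {
        return row -> row.containsKey(name) ? test.test(row.get(name)) : !filterIfMissing;
    }

    // MUST_PASS_ALL behaves like a logical AND over the filters
    static Predicate<Map<String, String>> mustPassAll(List<Predicate<Map<String, String>>> filters) {
        return row -> filters.stream().allMatch(f -> f.test(row));
    }

    // MUST_PASS_ONE behaves like a logical OR over the filters
    static Predicate<Map<String, String>> mustPassOne(List<Predicate<Map<String, String>>> filters) {
        return row -> filters.stream().anyMatch(f -> f.test(row));
    }

    // Returns the row keys whose rows pass the filter, in table order
    static List<String> scan(Map<String, Map<String, String>> table,
                             Predicate<Map<String, String>> filter) {
        List<String> keys = new ArrayList<>();
        table.forEach((key, row) -> {
            if (filter.test(row)) keys.add(key);
        });
        return keys;
    }

    static Map<String, Map<String, String>> sampleTable() {
        Map<String, Map<String, String>> t = new LinkedHashMap<>();
        t.put("001", Map.of("age", "23", "name", "Lq"));
        t.put("002", Map.of("age", "18", "name", "Marry"));
        t.put("004", Map.of("age", "80", "name", "jack", "sex", "M"));
        t.put("005", Map.of("hight", "175"));
        return t;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> table = sampleTable();
        Predicate<Map<String, String>> age = column("age", v -> v.compareTo("50") < 0, true);
        Predicate<Map<String, String>> name = column("name", "Marry"::equals, true);

        System.out.println(scan(table, mustPassAll(List.of(age, name)))); // [002]
        System.out.println(scan(table, mustPassOne(List.of(age, name)))); // [001, 002]
        // Without filterIfMissing, rows lacking the column slip through, as in 2.1
        System.out.println(scan(table, column("sex", "M"::equals, false))); // [001, 002, 004, 005]
    }
}
```

The age test compares strings, mirroring the lexicographic comparison of string-encoded values; it gives the expected answer here only because all sample ages have two digits.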

2.3 Refactoring HBaseUtils again

// Refactored out of the DemoFilter methods into HBaseUtils
public static void showFilterResult(Filter filter) throws IOException {
    Scan scan = new Scan();
    scan.setFilter(filter);
    // Pass the Scan to table.getScanner(); "test1" is the table all queries in this section target
    ResultScanner scanner = getTable("test1").getScanner(scan);
    Iterator<Result> iterator = scanner.iterator();
    while (iterator.hasNext()) {
        Result nextResult = iterator.next();
        showResult(nextResult);
    }
}

The FilterList test is rewritten as

@Test
public void multipleColumnValueFilter() throws IOException {
    SingleColumnValueFilter ageFilter = new SingleColumnValueFilter(
            Bytes.toBytes("info"),
            Bytes.toBytes("age"),
            CompareFilter.CompareOp.LESS,
            Bytes.toBytes("50")
    );
    SingleColumnValueFilter nameFilter = new SingleColumnValueFilter(
            Bytes.toBytes("info"),
            Bytes.toBytes("name"),
            CompareFilter.CompareOp.EQUAL,
            Bytes.toBytes("Marry")
    );
    // Filter out rows where the column is missing
    ageFilter.setFilterIfMissing(true);
    nameFilter.setFilterIfMissing(true);
    // Combine into a FilterList
    // MUST_PASS_ALL --> and    MUST_PASS_ONE --> or
    FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    filterList.addFilter(ageFilter);
    filterList.addFilter(nameFilter);
    showFilterResult(filterList);
}

2.4 FamilyFilter: column family filter

/*
 * Business logic: select * from test1 where columnFamily="info"
 */
@Test
public void columnFamilyFilter() throws IOException {
    FamilyFilter familyFilter = new FamilyFilter(
            CompareFilter.CompareOp.EQUAL,
            new BinaryComparator(Bytes.toBytes("info"))
    );
    System.out.println("Business logic: select * from test1 where columnFamily='info'");
    showFilterResult(familyFilter);
}

Result

Business logic: select * from test1 where columnFamily='info'
001~ info:age-->23
001~ info:name-->Lq
=================
002~ info:age-->20
002~ info:name-->Marry
=================
003~ info:age-->62
003~ info:name-->AD
003~ info:sex-->F
=================
004~ info:age-->80
004~ info:name-->jack
004~ info:sex-->M
=================
005~ info:hight-->175
005~ info:name-->Lucy
=================
006~ info:hight-->215
=================

2.5 QualifierFilter: column name filter

/*
 * Business logic: select * from test1 where column='hight'
 */
@Test
public void qualifierFamilyFilter() throws IOException {
    QualifierFilter qualifierFilter = new QualifierFilter(
            CompareFilter.CompareOp.EQUAL,
            new BinaryComparator(Bytes.toBytes("hight"))
    );
    System.out.println("Business logic: select * from test1 where column='hight'");
    showFilterResult(qualifierFilter);
}

Result

Business logic: select * from test1 where column='hight'
005~ info:hight-->175
=================
006~ info:hight-->215
=================

2.6 ColumnPrefixFilter: column name prefix filter

/*
 * Business logic: select * from test1 where column=' s* '
 */
@Test
public void columnPrefixFilter() throws IOException {
    ColumnPrefixFilter columnPrefixFilter = new ColumnPrefixFilter(Bytes.toBytes("s"));
    System.out.println("Business logic: select * from test1 where column=' s* '");
    showFilterResult(columnPrefixFilter);
}

Result

Business logic: select * from test1 where column=' s* '
003~ info:sex-->F
=================
004~ info:sex-->M
=================

2.7 MultipleColumnPrefixFilter: filter on multiple column name prefixes

/*
 * Business logic: select * from test1 where column=' s* ' or column = ' h* '
 */
@Test
public void multipleColumnPrefixFilter() throws IOException {
    byte[][] prefixes = new byte[][]{Bytes.toBytes("s"), Bytes.toBytes("h")};
    MultipleColumnPrefixFilter multipleColumnPrefixFilter = new MultipleColumnPrefixFilter(prefixes);
    System.out.println("Business logic: select * from test1 where column=' s* ' or column = ' h* '");
    showFilterResult(multipleColumnPrefixFilter);
}

Result

Business logic: select * from test1 where column=' s* ' or column = ' h* '
003~ info:sex-->F
=================
004~ info:sex-->M
=================
005~ info:hight-->175
=================
006~ info:hight-->215
=================

2.8 RowFilter: row key filter

/*
 * Business logic: select * from test1 where row >= 004
 */
@Test
public void rowkeyFilter() throws IOException {
    RowFilter rowFilter = new RowFilter(
            CompareFilter.CompareOp.GREATER_OR_EQUAL,
            new BinaryComparator(Bytes.toBytes("004"))
    );
    System.out.println("Business logic: select * from test1 where row >= 004");
    showFilterResult(rowFilter);
}

Result

Business logic: select * from test1 where row >= 004
004~ info:age-->80
004~ info:name-->jack
004~ info:sex-->M
=================
005~ info:hight-->175
005~ info:name-->Lucy
=================
006~ info:hight-->215
=================

3. Column Value Comparators

3.1 RegexStringComparator: regular-expression string comparator

/*
 * Business logic: select * from test1 where name like 'L%'
 * A query in the style of a regular expression
 */
// Constructor arguments:
//   byte[] family      column family
//   byte[] qualifier   column name
//   CompareOp op       enum value
//   RegexStringComparator
// !!! Note: rows that do not contain the filtered column at all are also returned
@Test
public void regexStringComparator() throws IOException {
    SingleColumnValueFilter nameFilter = new SingleColumnValueFilter(
            Bytes.toBytes("info"),
            Bytes.toBytes("name"),
            CompareFilter.CompareOp.EQUAL,
            new RegexStringComparator("^L")
    );
    // nameFilter.setFilterIfMissing(true);  // missing-value handling now lives in the utility class
    showFilterResult(nameFilter);
}

Result

Business logic: select * from test1 where name like 'L%'
001~ info:age-->23
001~ info:name-->Lq
=================
005~ info:hight-->175
005~ info:name-->Lucy
=================

A small improvement along the way: the SingleColumnValueFilter.setFilterIfMissing() call moves into the utility method, guarded by a type check so that a FilterList passed to the same method is left untouched.

// Refactored out of the DemoFilter methods into HBaseUtils
public static void showFilterResult(Filter filter) throws IOException {
    // Handle missing values
    if (filter instanceof SingleColumnValueFilter) {
        SingleColumnValueFilter singleColumnValueFilter = (SingleColumnValueFilter) filter;
        singleColumnValueFilter.setFilterIfMissing(true);
    }
    Scan scan = new Scan();
    // The original reference can still be passed here: the cast copies nothing,
    // singleColumnValueFilter and filter refer to the same object on the heap
    scan.setFilter(filter);
    // Pass the Scan to table.getScanner()
    ResultScanner scanner = getTable("test1").getScanner(scan);
    Iterator<Result> iterator = scanner.iterator();
    while (iterator.hasNext()) {
        Result nextResult = iterator.next();
        showResult(nextResult);
    }
}

3.2 SubstringComparator: substring comparator

/*
 * Business logic: select * from test1 where name like '%a%'
 */
// Constructor arguments:
//   byte[] family      column family
//   byte[] qualifier   column name
//   CompareOp op       enum value
//   SubstringComparator
// !!! Note: rows that do not contain the filtered column at all are also returned
@Test
public void substringComparator() throws IOException {
    SingleColumnValueFilter nameFilter = new SingleColumnValueFilter(
            Bytes.toBytes("info"),
            Bytes.toBytes("name"),
            CompareFilter.CompareOp.EQUAL,
            new SubstringComparator("a")
    );
    // nameFilter.setFilterIfMissing(true);  // missing-value handling now lives in the utility class
    showFilterResult(nameFilter);
}

Result

Business logic: select * from test1 where name like '%a%'
002~ info:age-->20
002~ info:name-->Marry
=================
003~ info:age-->62
003~ info:name-->AD
003~ info:sex-->F
=================
004~ info:age-->80
004~ info:name-->jack
004~ info:sex-->M
=================

3.3 BinaryPrefixComparator: binary prefix comparator

/*
 * Business logic: select * from test1 where name like 'L%'
 */
// Constructor arguments:
//   byte[] family      column family
//   byte[] qualifier   column name
//   CompareOp op       enum value
//   BinaryPrefixComparator
// !!! Note: rows that do not contain the filtered column at all are also returned
@Test
public void binaryPrefixComparator() throws IOException {
    SingleColumnValueFilter nameFilter = new SingleColumnValueFilter(
            Bytes.toBytes("info"),
            Bytes.toBytes("name"),
            CompareFilter.CompareOp.EQUAL,
            new BinaryPrefixComparator(Bytes.toBytes("L"))
    );
    // nameFilter.setFilterIfMissing(true);  // missing-value handling now lives in the utility class
    showFilterResult(nameFilter);
}

Result

Business logic: select * from test1 where name like 'L%'
001~ info:age-->23
001~ info:name-->Lq
=================
005~ info:hight-->175
005~ info:name-->Lucy
=================

3.4 BinaryComparator: binary comparator

/*
 * Business logic: select * from test1 where name="Lucy"
 */
// Constructor arguments:
//   byte[] family      column family
//   byte[] qualifier   column name
//   CompareOp op       enum value
//   BinaryComparator
// !!! Note: rows that do not contain the filtered column at all are also returned
@Test
public void binaryComparator() throws IOException {
    SingleColumnValueFilter nameFilter = new SingleColumnValueFilter(
            Bytes.toBytes("info"),
            Bytes.toBytes("name"),
            CompareFilter.CompareOp.EQUAL,
            new BinaryComparator(Bytes.toBytes("Lucy"))
    );
    // nameFilter.setFilterIfMissing(true);  // missing-value handling now lives in the utility class
    showFilterResult(nameFilter);
}

Result

Business logic: select * from test1 where name='Lucy'
005~ info:hight-->175
005~ info:name-->Lucy
=================
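The four comparators differ only in how they decide that a value "equals" the pattern. The plain-Java sketch below models that decision for the EQUAL operator, using the name values from the results above. It is an approximation rather than HBase code: regex, substring, binaryPrefix and binary are hypothetical helpers, and the substring check is written as case-insensitive, which is how SubstringComparator is documented and which would explain why 'AD' matched "a" in 3.2.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ComparatorSketch {

    // Regex comparator model: the pattern may match anywhere in the value
    static Predicate<String> regex(String expr) {
        Pattern pattern = Pattern.compile(expr);
        return value -> pattern.matcher(value).find();
    }

    // Substring comparator model: case-insensitive containment
    static Predicate<String> substring(String part) {
        String needle = part.toLowerCase(Locale.ROOT);
        return value -> value.toLowerCase(Locale.ROOT).contains(needle);
    }

    // Binary prefix comparator model: the value's leading bytes equal the prefix bytes
    static Predicate<String> binaryPrefix(String prefix) {
        byte[] p = prefix.getBytes(StandardCharsets.UTF_8);
        return value -> {
            byte[] b = value.getBytes(StandardCharsets.UTF_8);
            return b.length >= p.length && Arrays.equals(Arrays.copyOf(b, p.length), p);
        };
    }

    // Binary comparator model: all bytes are equal
    static Predicate<String> binary(String expected) {
        byte[] e = expected.getBytes(StandardCharsets.UTF_8);
        return value -> Arrays.equals(value.getBytes(StandardCharsets.UTF_8), e);
    }

    static List<String> select(List<String> values, Predicate<String> test) {
        return values.stream().filter(test).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("Lq", "Marry", "AD", "jack", "Lucy");
        System.out.println(select(names, regex("^L")));       // [Lq, Lucy]
        System.out.println(select(names, substring("a")));    // [Marry, AD, jack]
        System.out.println(select(names, binaryPrefix("L"))); // [Lq, Lucy]
        System.out.println(select(names, binary("Lucy")));    // [Lucy]
    }
}
```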

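III. Bloom filters in HBase

1. What is a Bloom filter

Purpose: test whether an element is a member of a set. The answer "not present" is always correct, while "present" may occasionally be a false positive.

2. A classic application of Bloom filters

Web crawlers: checking whether a URL has already been crawled.

3. What a Bloom filter actually does in HBase

Purpose: speed up get/scan lookups and reduce query time.

How it works: take a get such as get 'test1','001'. Written data first lands in the MemStore; once the flush threshold is reached, the MemStore is flushed to disk as an HFile. As the data volume grows, more and more files are produced. To find out which HFile holds a given row key, every HFile in the Store would otherwise have to be scanned, which is too slow. This is where the Bloom filter helps: it lets HBase skip HFiles that definitely do not contain the key. The author notes that the Bloom filter is off by default in HBase (after all, the bit array costs memory); this depends on the version, and recent releases default a column family's BLOOMFILTER setting to ROW, so check your own installation.

NONE: disabled
ROW: row level
ROWCOL: row plus column level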
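The idea can be sketched in plain Java. The code below is not how HBase implements its Bloom filters: Bloom is a minimal bit-array filter with a hypothetical choice of hash functions, StoreFile is a hypothetical stand-in for an HFile that keeps a ROW-level filter over its row keys, and the sizes (1024 bits, 3 hashes) are arbitrary. It only shows that a get needs to open the files whose filter answers "might contain".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class BloomSketch {

    // Minimal Bloom filter: k bit positions per key in a fixed-size bit array
    static final class Bloom {
        private final BitSet bits;
        private final int size;
        private final int hashes;

        Bloom(int size, int hashes) {
            this.bits = new BitSet(size);
            this.size = size;
            this.hashes = hashes;
        }

        // Derives the i-th position from two base hashes (double hashing)
        private int index(String key, int i) {
            byte[] b = key.getBytes(StandardCharsets.UTF_8);
            int h1 = Arrays.hashCode(b);
            int h2 = fnv1a(b);
            return Math.floorMod(h1 + i * h2, size);
        }

        void add(String key) {
            for (int i = 0; i < hashes; i++) bits.set(index(key, i));
        }

        // false means "definitely absent"; true means "possibly present"
        boolean mightContain(String key) {
            for (int i = 0; i < hashes; i++) {
                if (!bits.get(index(key, i))) return false;
            }
            return true;
        }
    }

    // 32-bit FNV-1a hash, used here only as a second, independent hash
    static int fnv1a(byte[] data) {
        int h = 0x811C9DC5;
        for (byte x : data) {
            h ^= (x & 0xff);
            h *= 16777619;
        }
        return h;
    }

    // Hypothetical stand-in for an HFile with a ROW-level Bloom filter
    static final class StoreFile {
        final Bloom bloom = new Bloom(1024, 3);

        StoreFile(String... rowKeys) {
            for (String key : rowKeys) bloom.add(key);
        }
    }

    // Counts the files a get for rowKey would still have to read
    static int filesToRead(List<StoreFile> files, String rowKey) {
        int count = 0;
        for (StoreFile file : files) {
            if (file.bloom.mightContain(rowKey)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<StoreFile> files = List.of(
                new StoreFile("001", "002"),
                new StoreFile("003", "004"),
                new StoreFile("005", "006"));
        System.out.println("files to read for 001: "
                + filesToRead(files, "001") + " of " + files.size());
    }
}
```

A file that really holds the key is never skipped, since a Bloom filter has no false negatives; a file that does not hold it is usually, but not always, skipped.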

IV. HBase's addressing mechanism
