[HBase Trail] (2) Operating HBase with hbase shell commands and the Java API (including complex queries with filters, and an HBase utility class)

Date: 2023-04-22

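Contents

0. Prerequisites
1. Command-line usage
    ① DDL: creating, deleting, altering, and listing tables
    ② DML: inserting, deleting, updating, and querying data
    ③ Atomic increment with incr
2. Filters: complex queries
    ① Built-in filters
    ② Comparators
    ③ Worked examples
3. Using the Java API
    ① HBase utility class
    ② Usage
4. Closing notes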

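0. Prerequisites

The previous installment showed how to set up HBase with Docker:
[HBase Trail] (1) Building an HBase cluster with Docker

This one starts using HBase through hbase shell commands and the Java API.
An introduction to HBase itself, how it works internally, and its architecture will be published together in a later summary post.

This post walks through the HBase shell commands for creating, deleting, altering, and querying both tables and data.
It also lists the filters and comparators mentioned in the official documentation, with what each one does, and then uses them from the shell and from Java.
For the Java API there is also a small utility class, along with a brief look at how to use it.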

1. Command-line usage

First, open the interactive shell:

./hbase shell

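① DDL: creating, deleting, altering, and listing tables

While running these commands, keep an eye on the Tables page of the web UI, where the details of each created table show up.

1) Create a table
    create <table>,<column family 1>,<column family 2>...
    Example: create 'user','address','info'

2) List all user tables
    list

3) Describe a table, which shows the attributes of each column family
    describe <table>
    Example: describe 'user'

4) Change an attribute of a column family
    alter <table>,{NAME=><column family>,<attribute>=><value>}
    Example: alter 'user',{NAME=>'address',VERSIONS=>3}
    This sets the maximum number of versions of the address column family to 3, so the family keeps the 3 most recent versions of each cell and discards anything older.
    The default is 1: only the latest version is kept, and all older versions are removed when new data arrives.

5) Drop a table: disable it first, then drop it
    disable <table>
    drop <table>
    Example:
    disable 'user'
    drop 'user'

6) Create a namespace
    create_namespace <namespace>
    Example: create_namespace 'iceclean'

7) List all namespaces
    list_namespace

8) Create a table in a given namespace
    create <namespace:table>,<column family 1>,<column family 2>...
    Example: create 'iceclean:skill','live','program'
    The earlier create commands did not name a namespace, so those tables were created in the default namespace.

9) Drop a namespace
    The namespace must be empty (contain no tables); if it is not, drop its tables first.
    drop_namespace <namespace>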

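② DML: inserting, deleting, updating, and querying data

Note: every insert and delete command accepts an optional trailing timestamp; if it is omitted, the system generates one.

1) Insert data (this is also how you update: writing to the same column again is an update)
    put <table>,<row key>,<column family:column>,<value>
    Example: put 'user','1','info:name','iceclean'

2) Delete a column family or a column
    delete <table>,<row key>,<column family>
    delete <table>,<row key>,<column family:column>
    Note: if the column has several versions, the newest one is deleted (so the previous version becomes visible again).

3) Delete an entire row
    deleteall <table>,<row key>

4) Remove all data from a table
    truncate <table>

5) Query by row key, optionally narrowed to a column family or a column
    get <table>,<row key>
    get <table>,<row key>,<column family>
    get <table>,<row key>,<column family:column>
    Note: if the values contain Chinese characters, add the parameter {FORMATTER => "toString"}

6) Query a column including a given number of versions
    get <table>,<row key>,{COLUMN=><column family:column>,VERSIONS=><count>}
    Example: get 'user','101c',{COLUMN=>'info:name',VERSIONS=>10}
    Note: even though this asks for 10 versions, and the cell may well have been modified many times, only 1 version is shown if the column family was created to keep a single version. The older versions were never retained, so there is nothing to return.

7) Full table scan
    scan <table>

8) Count the rows in a table
    count <table>

9) Range scan, left-closed and right-open; a missing STARTROW means the smallest row key, a missing STOPROW means the largest
    scan <table>,{STARTROW=><row key>,STOPROW=><row key>}
    Example: scan 'user',{STARTROW=>'101a',STOPROW=>'103f'}

10) Query specific columns by adding the parameter
    {COLUMNS => ['<column family 1>:<column 1>', '<column family 2>:<column 2>', ...]}

11) Limit the number of rows returned by adding the parameter
    {LIMIT => 3}

12) Inspect the operation history of the data; VERSIONS caps how many versions of each cell are shown
    scan <table>,{RAW=>true,VERSIONS=>10}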
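The range scan in item 9 is easier to predict once you keep in mind that row keys are ordered as bytes, not as numbers. The following is only a sketch in plain Java with no HBase dependency: a TreeMap stands in for the sorted table, and the class name RangeScanSketch and its sample row keys are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class RangeScanSketch {

    // A TreeMap keeps String keys in lexicographic order, which matches
    // byte order for plain ASCII row keys. It is only a stand-in for a table.
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> table = new TreeMap<>();
        for (String rowKey : new String[] {"101a", "101c", "102b", "103f", "2", "10"}) {
            table.put(rowKey, "row " + rowKey);
        }
        return table;
    }

    // subMap(from, to) is left-closed and right-open, like STARTROW/STOPROW.
    static List<String> scan(TreeMap<String, String> table, String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, stopRow).keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = sampleTable();
        // Row key "2" sorts after "103f" because '2' > '1' byte-wise.
        System.out.println(table.keySet());
        // The stop row "103f" itself is excluded from the result.
        System.out.println(scan(table, "101a", "103f"));
    }
}
```

If numeric ordering matters, a common workaround is to left-pad numeric row keys to a fixed width so that byte order and numeric order agree.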

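③ Atomic increment with incr

Counters come up often: likes, favorites, and view counts all need a column value that can be incremented.
A column created with put does not support incrementing; use incr instead, with the following syntax:

incr <table>, <row key>, <column family:column>, [increment, default 1]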
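One way to see why a put-created column cannot be incremented: as far as I know, HBase counters are stored as 8-byte big-endian longs, while a shell put stores the text bytes of whatever was typed. The sketch below uses only the JDK to compare the two encodings; CounterEncodingSketch and its methods are hypothetical helpers, not HBase API, and it illustrates the storage format only, not the atomicity that incr provides on the server.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CounterEncodingSketch {

    // Counter layout: 8 bytes, big-endian (ByteBuffer's default byte order).
    static byte[] encodeCounter(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // What a plain text write of the same number would store.
    static byte[] encodeAsString(long value) {
        return Long.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Reading a counter only makes sense for an 8-byte cell.
    static long decodeCounter(byte[] cell) {
        if (cell.length != Long.BYTES) {
            throw new IllegalArgumentException("not a counter cell: " + cell.length + " bytes");
        }
        return ByteBuffer.wrap(cell).getLong();
    }

    // Local, non-atomic stand-in for an increment.
    static byte[] increment(byte[] cell, long amount) {
        return encodeCounter(decodeCounter(cell) + amount);
    }

    public static void main(String[] args) {
        byte[] counter = encodeCounter(60);
        byte[] text = encodeAsString(60);
        System.out.println("counter cell: " + counter.length + " bytes");
        System.out.println("text cell: " + text.length + " bytes");
        System.out.println("after increment: " + decodeCounter(increment(counter, 1)));
    }
}
```

In the sample data later in this post, 'info:store-int' is written with put, so it would fall on the text side of this comparison.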

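2. Filters: complex queries

get can only look data up by row key.
When rows have to be selected by a condition, get is not enough, and that is where filters come in.

Filters are usually combined with scan.
Under the hood the shell calls the same HBase Java API; the later Spring Boot integration will instantiate the filter classes directly.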

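① Built-in filters

Every filter below has a corresponding Java implementation class.

| Category | Filter | Purpose |
|---|---|---|
| Row key filters | RowFilter | Compares and filters on the row key |
| | PrefixFilter | Filters on a row key prefix |
| | KeyOnlyFilter | Returns only the key of each cell, without its value |
| | FirstKeyOnlyFilter | Returns only the first cell of each row, key and value included |
| | InclusiveStopFilter | Used in place of ENDROW so that the stop row itself is returned |
| Column filters | FamilyFilter | Column family filter; returns only data from the matching family |
| | QualifierFilter | Column qualifier filter; returns only data from the matching column name |
| | ColumnPrefixFilter | Filters on a column name prefix |
| | MultipleColumnPrefixFilter | Filters column names on any of several prefixes |
| | ColumnRangeFilter | Filters on a range of column names |
| Value filters | ValueFilter | Finds the key-value pairs whose value matches the condition |
| | SingleColumnValueFilter | Filters rows on a given column and value, like where key [operator] value |
| | SingleColumnValueExcludeFilter | Like SingleColumnValueFilter, but the tested column is left out of the returned row |
| Other filters | ColumnPaginationFilter | Paginates the columns of a row, returning only the limit columns starting at offset, that is [offset, offset+limit) |
| | PageFilter | Paginates the result by row |
| | TimestampsFilter | Filters on timestamps; matches by equality and accepts several timestamps |
| | ColumnCountGetFilter | Limits the number of key-value pairs returned per logical row; used with get |
| | DependentColumnFilter | Lets you name a reference column that decides which other columns are kept |

② Comparators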

Whether in Java or in the shell, a filter needs a comparator to do its matching.

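| Comparator | Purpose | Shell expression |
|---|---|---|
| BinaryComparator | Matches the complete byte array (whole string) | binary:value |
| BinaryPrefixComparator | Matches a byte array prefix (string prefix) | binaryprefix:value |
| BitComparator | Matches on bits | bit:value |
| NullComparator | Matches null values | null |
| RegexStringComparator | Matches a regular expression | regexstring:value |
| SubstringComparator | Matches a substring | substring:value |

③ Worked examples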

First, create the table and some demo data.
The table describes front-end tag elements. Each tag can have its own set of attributes, so this makes good use of HBase's ability to add columns freely: every element stores only the attributes it actually has.
Only two rows are inserted here; add more yourself as needed.

The sample table uses a single column family, because multiple column families hurt HBase performance. Unless there is a real need, one column family is enough.

create 'stardust', 'info'

# Sample data
put 'stardust', '1', 'info:name', 'btn1'
put 'stardust', '1', 'info:kind', 'button'
put 'stardust', '1', 'info:belong', 'root'
put 'stardust', '1', 'info:x', '100px'
put 'stardust', '1', 'info:y', '200px'
put 'stardust', '1', 'info:width', '50px'
put 'stardust', '1', 'info:height', '100px'
put 'stardust', '1', 'info:store-int', 60

put 'stardust', '2', 'info:name', 'text1'
put 'stardust', '2', 'info:kind', 'label'
put 'stardust', '2', 'info:belong', 'root'
put 'stardust', '2', 'info:x', '100px'
put 'stardust', '2', 'info:y', '150px'
put 'stardust', '2', 'info:width', '30px'
put 'stardust', '2', 'info:height', '100px'
put 'stardust', '2', 'info:store-string', 'label content'

Query examples:

# Find the element whose id (row key) is 1
scan 'stardust', {FILTER => "RowFilter(=, 'binary:1')"}

# Find elements that can store a string (returns only the matching column)
scan 'stardust', {FILTER => "QualifierFilter(=, 'binary:store-string')"}

# Find elements of kind button
scan 'stardust', {FILTER => "SingleColumnValueFilter('info', 'kind', =, 'binary:button')"}

# Find columns whose value is 100px (returns only the matching columns)
scan 'stardust', {FILTER => "ValueFilter(=, 'binary:100px')"}

# Find elements whose x coordinate is 100px and whose name contains te
# (the table only has the 'info' column family, so both conditions use it)
scan 'stardust', {FILTER => "SingleColumnValueFilter('info', 'x', =, 'binary:100px') AND SingleColumnValueFilter('info', 'name', =, 'substring:te')"}
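To get a feel for what comparator expressions such as 'binary:button' or 'substring:te' mean when used with the = operator, here is a rough plain-Java imitation. It is a sketch of the semantics as I understand them (substring matching ignores case, regexstring matches anywhere in the value), not HBase's implementation; ComparatorSketch and matches are hypothetical names, and only four comparator types and the = operator are covered.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class ComparatorSketch {

    // Evaluates "<type>:<operand>" against a cell value, for "=" only.
    static boolean matches(String expression, String cellValue) {
        int colon = expression.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected <type>:<operand>, got " + expression);
        }
        String type = expression.substring(0, colon);
        String operand = expression.substring(colon + 1);
        switch (type) {
            case "binary":
                // Whole value must be identical.
                return cellValue.equals(operand);
            case "binaryprefix":
                // Value must start with the operand.
                return cellValue.startsWith(operand);
            case "substring":
                // Assumption: case-insensitive containment.
                return cellValue.toLowerCase(Locale.ROOT).contains(operand.toLowerCase(Locale.ROOT));
            case "regexstring":
                // Assumption: the pattern may match anywhere in the value.
                return Pattern.compile(operand).matcher(cellValue).find();
            default:
                throw new IllegalArgumentException("unsupported comparator: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("binary:button", "button"));
        System.out.println(matches("substring:te", "text1"));
        System.out.println(matches("substring:te", "btn1"));
        System.out.println(matches("binaryprefix:la", "label"));
    }
}
```

Against the two sample rows, 'substring:te' accepts the name text1 and rejects btn1, which is why the last shell query above only keeps row 2.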

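3. Using the Java API

Connection is heavyweight and thread-safe, so create it once and keep reusing it.
Table (HTable) is lightweight and not thread-safe, so close it after each use and obtain a new one next time.

① HBase utility class

The class below is adapted from other utility classes found online; underneath it calls the HBase Java API.
The part you need to change is the ZooKeeper configuration in the static initializer.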

import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Pair;

public class HbaseUtils {

    // One shared connection: heavyweight and thread-safe.
    private static Connection connection;

    static {
        Configuration configuration = HBaseConfiguration.create();
        // Change this to your own ZooKeeper quorum.
        configuration.set("hbase.zookeeper.quorum", "hadoop001:12181,hadoop002:12182,hadoop003:12183");
        try {
            connection = ConnectionFactory.createConnection(configuration);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Creates a table with the given column families; returns false if it already exists or creation fails.
    public static boolean createTable(String tableName, String... columnFamilies) {
        try (Admin admin = connection.getAdmin()) {
            if (admin.tableExists(TableName.valueOf(tableName))) {
                return false;
            }
            TableDescriptorBuilder tableDescriptor = TableDescriptorBuilder.newBuilder(TableName.valueOf(tableName));
            Arrays.stream(columnFamilies).forEach(columnFamily ->
                    tableDescriptor.setColumnFamily(ColumnFamilyDescriptorBuilder
                            .newBuilder(Bytes.toBytes(columnFamily))
                            .setMaxVersions(1)
                            .build()));
            admin.createTable(tableDescriptor.build());
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    public static boolean deleteTable(String tableName) {
        try (Admin admin = connection.getAdmin()) {
            // A table must be disabled before it can be deleted.
            admin.disableTable(TableName.valueOf(tableName));
            admin.deleteTable(TableName.valueOf(tableName));
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    // Writes a single column of a row.
    public static boolean putRow(String tableName, String rowKey, String columnFamilyName, String qualifier, String value) {
        try (Table table = connection.getTable(TableName.valueOf(tableName))) {
            Put put = new Put(Bytes.toBytes(rowKey));
            put.addColumn(Bytes.toBytes(columnFamilyName), Bytes.toBytes(qualifier), Bytes.toBytes(value));
            table.put(put);
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    // Writes several columns of a row at once. The generic type was lost in the original listing;
    // Pair<String, String> from org.apache.hadoop.hbase.util is assumed here (qualifier, value).
    public static boolean putRow(String tableName, String rowKey, String columnFamilyName, List<Pair<String, String>> pairList) {
        try (Table table = connection.getTable(TableName.valueOf(tableName))) {
            Put put = new Put(Bytes.toBytes(rowKey));
            pairList.forEach(pair ->
                    put.addColumn(Bytes.toBytes(columnFamilyName), Bytes.toBytes(pair.getFirst()), Bytes.toBytes(pair.getSecond())));
            table.put(put);
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    // Fetches and prints a row; returns null only if the lookup fails.
    public static Result getRow(String tableName, String rowKey) {
        try (Table table = connection.getTable(TableName.valueOf(tableName))) {
            Result result = table.get(new Get(Bytes.toBytes(rowKey)));
            printResult(result);
            return result;
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

    public static void printResult(Result result) {
        List<Cell> cells = result.listCells();
        if (cells == null) {
            // An empty result has no cells to print.
            return;
        }
        for (Cell cell : cells) {
            // Column family name
            String cf = Bytes.toString(cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength());
            // Column name
            String cn = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());
            // Value
            String value = Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
            System.out.println(cf + ":" + cn + " => " + value);
        }
    }

    public static String getCell(String tableName, String rowKey, String columnFamily, String qualifier) {
        try (Table table = connection.getTable(TableName.valueOf(tableName))) {
            Get get = new Get(Bytes.toBytes(rowKey));
            get.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(qualifier));
            Result result = table.get(get);
            byte[] resultValue = result.getValue(Bytes.toBytes(columnFamily), Bytes.toBytes(qualifier));
            return Bytes.toString(resultValue);
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

    // The getScanner methods hand the scanner to the caller, who must close it when done.
    public static ResultScanner getScanner(String tableName) {
        try {
            Table table = connection.getTable(TableName.valueOf(tableName));
            Scan scan = new Scan();
            return table.getScanner(scan);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    public static ResultScanner getScanner(String tableName, FilterList filterList) {
        try {
            Table table = connection.getTable(TableName.valueOf(tableName));
            Scan scan = new Scan();
            scan.setFilter(filterList);
            return table.getScanner(scan);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    public static ResultScanner getScanner(String tableName, String startRowKey, String endRowKey, FilterList filterList) {
        try {
            Table table = connection.getTable(TableName.valueOf(tableName));
            Scan scan = new Scan();
            scan.withStartRow(Bytes.toBytes(startRowKey));
            scan.withStopRow(Bytes.toBytes(endRowKey));
            scan.setFilter(filterList);
            return table.getScanner(scan);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    public static boolean deleteRow(String tableName, String rowKey) {
        try (Table table = connection.getTable(TableName.valueOf(tableName))) {
            table.delete(new Delete(Bytes.toBytes(rowKey)));
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    public static boolean deleteColumn(String tableName, String rowKey, String familyName, String qualifier) {
        try (Table table = connection.getTable(TableName.valueOf(tableName))) {
            Delete delete = new Delete(Bytes.toBytes(rowKey));
            delete.addColumn(Bytes.toBytes(familyName), Bytes.toBytes(qualifier));
            table.delete(delete);
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }
}

② Usage

(1) Delete the table

Start by deleting the table that was created from the command line above:

void deleteTest() {
    if (HbaseUtils.deleteTable("stardust")) {
        System.out.println("Table deleted");
    }
}

(2) Create the table and insert data

void initStardust() {
    // Create the table
    HbaseUtils.createTable("stardust", "info");

    // Insert data
    String[] kinds = {"button", "label", "input", "img", "div"};
    String kind;
    Random random = new Random(new Date().getTime());
    for (int i = 1; i <= 9; i++) {
        kind = kinds[random.nextInt(5)];
        HbaseUtils.putRow("stardust", "" + i, "info", "id", "" + i);
        HbaseUtils.putRow("stardust", "" + i, "info", "name", kind + random.nextInt(100));
        HbaseUtils.putRow("stardust", "" + i, "info", "kind", "" + kind);
        HbaseUtils.putRow("stardust", "" + i, "info", "belong", "0");
        HbaseUtils.putRow("stardust", "" + i, "info", "x", random.nextInt(500) + "px");
        HbaseUtils.putRow("stardust", "" + i, "info", "y", random.nextInt(500) + "px");
        HbaseUtils.putRow("stardust", "" + i, "info", "width", random.nextInt(500) + "px");
        HbaseUtils.putRow("stardust", "" + i, "info", "height", random.nextInt(500) + "px");

        // Give each element one randomly chosen extra attribute
        int skill = random.nextInt(100);
        if (skill < 40) {
            HbaseUtils.putRow("stardust", "" + i, "info", "store-int", random.nextInt(100) + "");
        } else if (skill < 70) {
            HbaseUtils.putRow("stardust", "" + i, "info", "store-string", kind + " content :" + random.nextInt(100));
        } else if (skill < 85) {
            HbaseUtils.putRow("stardust", "" + i, "info", "modify", "(1, 2|x, y|x=x+y)");
        } else if (skill < 95) {
            HbaseUtils.putRow("stardust", "" + i, "info", "check", "(1|x|x=10)");
        } else {
            HbaseUtils.putRow("stardust", "" + i, "info", "clock", "0/30 * * * * ?");
        }
    }
}

(3) Print the whole table

void printTable(String tableName) {
    ResultScanner scanner = HbaseUtils.getScanner(tableName);
    if (scanner != null) {
        for (Result result : scanner) {
            HbaseUtils.printResult(result);
            System.out.println("---");
        }
        // Release the scanner once the rows have been read
        scanner.close();
    }
}

(4) Filter elements whose x is at least 100px and y is at least 300px

Note that values are compared as strings here, not as numbers,
so the filters use regular expressions rather than plain greater-than or less-than comparisons.

void scanTest() {
    // x >= 100px: any three-digit pixel value
    SingleColumnValueFilter start = new SingleColumnValueFilter(
            Bytes.toBytes("info"),
            Bytes.toBytes("x"),
            CompareOperator.EQUAL,
            new RegexStringComparator("[1-9][0-9][0-9]px"));

    // y >= 300px: three-digit pixel values starting with 3 to 9
    SingleColumnValueFilter end = new SingleColumnValueFilter(
            Bytes.toBytes("info"),
            Bytes.toBytes("y"),
            CompareOperator.EQUAL,
            new RegexStringComparator("[3-9][0-9][0-9]px"));

    // Both conditions must hold
    FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL, start, end);
    ResultScanner scanner = HbaseUtils.getScanner("stardust", filterList);
    if (scanner != null) {
        for (Result result : scanner) {
            HbaseUtils.printResult(result);
            System.out.println("---");
        }
        scanner.close();
    }
}
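The reason a plain greater-than comparison does not work on values like "90px" can be checked without HBase at all. The sketch below uses only the JDK; PixelCompareSketch and its methods are hypothetical names, and the pattern is just an example of the same shape as the ones used for the pixel columns.

```java
import java.util.regex.Pattern;

public class PixelCompareSketch {

    // Example pattern: three-digit pixel values, 100px to 999px.
    static final Pattern AT_LEAST_100PX = Pattern.compile("[1-9][0-9][0-9]px");

    // Lexicographic comparison, which is what a byte-wise comparator sees.
    static boolean lexGreaterOrEqual(String value, String threshold) {
        return value.compareTo(threshold) >= 0;
    }

    // Numeric comparison after stripping the "px" suffix.
    static boolean numericGreaterOrEqual(String value, String threshold) {
        return parsePx(value) >= parsePx(threshold);
    }

    static int parsePx(String value) {
        return Integer.parseInt(value.substring(0, value.length() - 2));
    }

    static boolean regexAtLeast100(String value) {
        return AT_LEAST_100PX.matcher(value).find();
    }

    public static void main(String[] args) {
        // As strings, "90px" sorts after "100px" because '9' > '1'.
        System.out.println(lexGreaterOrEqual("90px", "100px"));
        // As numbers, 90 is smaller than 100.
        System.out.println(numericGreaterOrEqual("90px", "100px"));
        // The regular expression gives the numerically expected answer.
        System.out.println(regexAtLeast100("90px"));
        System.out.println(regexAtLeast100("250px"));
    }
}
```

The regular expression approach only works because the sample values stay below 1000px; storing fixed-width or binary-encoded numbers would be a more general alternative.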

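The utility class can be used flexibly, and filters in particular can be combined in many ways, so no further examples are given here.
What matters is understanding how the utility class works; it can also be extended with more features, since what is shown here is only a very simple one.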

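4. Closing notes

Working with HBase through a utility class is still cumbersome, especially when complex queries call for many filters.
The next installment therefore introduces Phoenix, another Apache project, which lets you work with HBase through SQL much as you would with MySQL.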

The fish on the ice was over long ago (IceClean)
