ShardingSphere解决大数据存储

时间：2023-04-17

1、什么是分库分表

分库分表就是为了解决由于数据量过大而导致数据库性能降低的问题，将原来独立的数据
库拆分成若干数据库组成，将数据大表拆分成若干数据表组成，使得单一数据库、单一数
据表的数据量变小，从而达到提升数据库性能的目的

2.水平分表

有5000万数据要入库，根据阿里巴巴规范，一张表最多存500万数据，现在有5000万数据，所以先建立20张同样的表，每张表入库大约250万数据，用sharding sphere来分表

配置如下：

# shardingjdbc分片策略# 配置数据源，给数据源起名称spring.shardingsphere.datasource.names=m1# 一个实体类对应两张表，覆盖spring.main.allow-bean-definition-overriding=true#配置数据源具体内容，包含连接池，驱动，地址，用户名和密码spring.shardingsphere.datasource.m1.type=com.alibaba.druid.pool.DruidDataSourcespring.shardingsphere.datasource.m1.driver-class-name=com.mysql.cj.jdbc.Driverspring.shardingsphere.datasource.m1.url=jdbc:mysql://localhost:3306/course_db?serverTimezone=GMT%2B8spring.shardingsphere.datasource.m1.username=rootspring.shardingsphere.datasource.m1.password=root123#指定course表分布情况，配置表在哪个数据库里面，表名称都是什么 m1.course_1 , m1.course_2spring.shardingsphere.sharding.tables.course.actual-data-nodes=m1.course_$->{1..2}# 指定course表里面主键cid 生成策略 SNOWFLAKEspring.shardingsphere.sharding.tables.course.key-generator.column=cidspring.shardingsphere.sharding.tables.course.key-generator.type=SNOWFLAKE# 指定分片策略约定cid值偶数添加到course_1表，如果cid是奇数添加到course_2表spring.shardingsphere.sharding.tables.course.table-strategy.inline.sharding-column=cidspring.shardingsphere.sharding.tables.course.table-strategy.inline.algorithm-expression=course_$->{cid % 2 + 1}# 打开sql输出日志spring.shardingsphere.props.sql.show=true

注意这里的id为雪花算法，所以cid长度很大，mysql建立主键推荐用自增的id，所以我们换一个生成id的方法。

1.根据spi机制：在resource下面新建文件夹：meta-INF/services，然后新建文件：org.apache.shardingsphere.spi.keygen.ShardingKeyGenerator

在文件里面新增内容：

com.atguigu.shardingjdbcdemo.SimpleShardingKeyGenerator

内容如下：public class SimpleShardingKeyGenerator implements ShardingKeyGenerator { private AtomicLong atomic = new AtomicLong(0); @Getter @Setter private Properties properties = new Properties(); @Override public Comparable<?> generateKey() { return atomic.incrementAndGet(); } @Override public String getType() { //声明类型 return "SIMPLE"; } @Override public Properties getProperties() { return properties; }}

同时修改配置文件：

spring.shardingsphere.sharding.tables.course.key-generator.type=SIMPLE

这样5000万数据按照自增的顺序分布在一个数据库20个表中，完成了分表的操作

测试了一下：500w数据入库大概：跑了79分钟，好慢

如果是5000w数据，那就分10个数据库，一个数据库20张表，每张表25000数据，

那么一个数据库就是500w,10个数据库就是5000w达到需求了，其实可以每张表

500w数据库，那么20张表就是1000w,10个数据库就是1亿，如果是10台服务器，可以轻松

容下10亿数据，如果有100亿的量，可以水平扩展表。20张表*50个数据库*100台服务器

20*500w*50*100=5000亿，是的5000亿，这个数据量惊人啊。

查询的话：会遍历20个数据库，效率有点低

========================水平分库分表=========================

同样要将500万数据放入5个数据库中，每个数据库4张表，完成分库分表操作

配置如下：

# shardingjdbc分片策略# 配置数据源，给数据源起名称,# 水平分库，配置两个数据源spring.shardingsphere.datasource.names=m1,m2,m3,m4,m5# 一个实体类对应两张表，覆盖spring.main.allow-bean-definition-overriding=true#配置第一个数据源具体内容，包含连接池，驱动，地址，用户名和密码spring.shardingsphere.datasource.m1.type=com.alibaba.druid.pool.DruidDataSourcespring.shardingsphere.datasource.m1.driver-class-name=com.mysql.cj.jdbc.Driverspring.shardingsphere.datasource.m1.url=jdbc:mysql://localhost:3306/edu_db_1?serverTimezone=GMT%2B8spring.shardingsphere.datasource.m1.username=rootspring.shardingsphere.datasource.m1.password=root1234%#配置第二个数据源具体内容，包含连接池，驱动，地址，用户名和密码spring.shardingsphere.datasource.m2.type=com.alibaba.druid.pool.DruidDataSourcespring.shardingsphere.datasource.m2.driver-class-name=com.mysql.cj.jdbc.Driverspring.shardingsphere.datasource.m2.url=jdbc:mysql://localhost:3306/edu_db_2?serverTimezone=GMT%2B8spring.shardingsphere.datasource.m2.username=rootspring.shardingsphere.datasource.m2.password=root1234%#配置第三个数据源具体内容，包含连接池，驱动，地址，用户名和密码spring.shardingsphere.datasource.m3.type=com.alibaba.druid.pool.DruidDataSourcespring.shardingsphere.datasource.m3.driver-class-name=com.mysql.cj.jdbc.Driverspring.shardingsphere.datasource.m3.url=jdbc:mysql://localhost:3306/edu_db_3?serverTimezone=GMT%2B8spring.shardingsphere.datasource.m3.username=rootspring.shardingsphere.datasource.m3.password=root1234%#配置第四个数据源具体内容，包含连接池，驱动，地址，用户名和密码spring.shardingsphere.datasource.m4.type=com.alibaba.druid.pool.DruidDataSourcespring.shardingsphere.datasource.m4.driver-class-name=com.mysql.cj.jdbc.Driverspring.shardingsphere.datasource.m4.url=jdbc:mysql://localhost:3306/edu_db_4?serverTimezone=GMT%2B8spring.shardingsphere.datasource.m4.username=rootspring.shardingsphere.datasource.m4.password=root1234%#配置第五个数据源具体内容，包含连接池，驱动，地址，用户名和密码spring.shardingsphere.datasource.m5.type=com.alibaba.druid.pool.DruidDataSourcespring.shardingsphere.datasource.m5.driver-class-name=com.mysql.cj.jdbc.Driverspring.shardingsphere.datasource.m5.url=jdbc:mysql://localhost:3306/edu_db_5?serverTimezone=GMT%2B8spring.shardingsphere.datasource.m5.username=rootspring.shardingsphere.datasource.m5.password=root1234%#指定数据库分布情况，数据库里面表分布情况# m1 m2 course_1 course_2spring.shardingsphere.sharding.tables.course.actual-data-nodes=m$->{1..5}.course_$->{1..4}# 指定course表里面主键cid 生成策略 SNOWFLAKEspring.shardingsphere.sharding.tables.course.key-generator.column=cidspring.shardingsphere.sharding.tables.course.key-generator.type=SIMPLE# 指定表分片策略约定cid值偶数添加到course_1表，如果cid是奇数添加到course_2表spring.shardingsphere.sharding.tables.course.table-strategy.inline.sharding-column=cidspring.shardingsphere.sharding.tables.course.table-strategy.inline.algorithm-expression=course_$->{cid % 5 + 1}# 指定数据库分片策略约定user_id是偶数添加m1，是奇数添加m2spring.shardingsphere.sharding.tables.course.database-strategy.inline..sharding-column=user_idspring.shardingsphere.sharding.tables.course.database-strategy.inline.algorithm-expression=m$->{user_id % 4 + 1}# 打开sql输出日志spring.shardingsphere.props.sql.show=true

测试的时候我们就不入库500w数据了，太慢了，先入库20000数据量，看看结果

================垂直分库，也就是专表专用================

配置如下：

# shardingjdbc分片策略# 配置数据源，给数据源起名称,# 水平分库，配置两个数据源spring.shardingsphere.datasource.names=m1,m2,m0# 一个实体类对应两张表，覆盖spring.main.allow-bean-definition-overriding=true#配置第一个数据源具体内容，包含连接池，驱动，地址，用户名和密码spring.shardingsphere.datasource.m1.type=com.alibaba.druid.pool.DruidDataSourcespring.shardingsphere.datasource.m1.driver-class-name=com.mysql.cj.jdbc.Driverspring.shardingsphere.datasource.m1.url=jdbc:mysql://localhost:3306/edu_db_1?serverTimezone=GMT%2B8spring.shardingsphere.datasource.m1.username=rootspring.shardingsphere.datasource.m1.password=root1234%#配置第二个数据源具体内容，包含连接池，驱动，地址，用户名和密码spring.shardingsphere.datasource.m2.type=com.alibaba.druid.pool.DruidDataSourcespring.shardingsphere.datasource.m2.driver-class-name=com.mysql.cj.jdbc.Driverspring.shardingsphere.datasource.m2.url=jdbc:mysql://localhost:3306/edu_db_2?serverTimezone=GMT%2B8spring.shardingsphere.datasource.m2.username=rootspring.shardingsphere.datasource.m2.password=root1234%#配置第三个数据源具体内容，包含连接池，驱动，地址，用户名和密码spring.shardingsphere.datasource.m0.type=com.alibaba.druid.pool.DruidDataSourcespring.shardingsphere.datasource.m0.driver-class-name=com.mysql.cj.jdbc.Driverspring.shardingsphere.datasource.m0.url=jdbc:mysql://localhost:3306/user_db?serverTimezone=GMT%2B8spring.shardingsphere.datasource.m0.username=rootspring.shardingsphere.datasource.m0.password=root1234%# 配置user_db数据库里面t_user 专库专表spring.shardingsphere.sharding.tables.t_user.actual-data-nodes=m$->{0}.t_user# 指定course表里面主键cid 生成策略 SNOWFLAKEspring.shardingsphere.sharding.tables.t_user.key-generator.column=user_idspring.shardingsphere.sharding.tables.t_user.key-generator.type=SNOWFLAKE# 指定表分片策略约定cid值偶数添加到course_1表，如果cid是奇数添加到course_2表spring.shardingsphere.sharding.tables.t_user.table-strategy.inline.sharding-column=user_idspring.shardingsphere.sharding.tables.t_user.table-strategy.inline.algorithm-expression=t_user#指定数据库分布情况，数据库里面表分布情况# m1 m2 course_1 course_2spring.shardingsphere.sharding.tables.course.actual-data-nodes=m$->{1..2}.course_$->{1..2}# 指定course表里面主键cid 生成策略 SNOWFLAKEspring.shardingsphere.sharding.tables.course.key-generator.column=cidspring.shardingsphere.sharding.tables.course.key-generator.type=SNOWFLAKE# 指定表分片策略约定cid值偶数添加到course_1表，如果cid是奇数添加到course_2表spring.shardingsphere.sharding.tables.course.table-strategy.inline.sharding-column=cidspring.shardingsphere.sharding.tables.course.table-strategy.inline.algorithm-expression=course_$->{cid % 2 + 1}# 指定数据库分片策略约定user_id是偶数添加m1，是奇数添加m2#spring.shardingsphere.sharding.default-database-strategy.inline.sharding-column=user_id#spring.shardingsphere.sharding.default-database-strategy.inline.algorithm-expression=m$->{user_id % 2 + 1}spring.shardingsphere.sharding.tables.course.database-strategy.inline..sharding-column=user_idspring.shardingsphere.sharding.tables.course.database-strategy.inline.algorithm-expression=m$->{user_id % 2 + 1}# 打开sql输出日志spring.shardingsphere.props.sql.show=true

根据上面的配置，t_user表只入库m0数据库

垂直分库

============公共表的使用======================

比如一张字典表，在所有的数据库中，所有的表中都需要用到，

配置如下：

# shardingjdbc分片策略# 配置数据源，给数据源起名称,# 水平分库，配置两个数据源spring.shardingsphere.datasource.names=m1,m2,m0# 一个实体类对应两张表，覆盖spring.main.allow-bean-definition-overriding=true#配置第一个数据源具体内容，包含连接池，驱动，地址，用户名和密码spring.shardingsphere.datasource.m1.type=com.alibaba.druid.pool.DruidDataSourcespring.shardingsphere.datasource.m1.driver-class-name=com.mysql.cj.jdbc.Driverspring.shardingsphere.datasource.m1.url=jdbc:mysql://localhost:3306/edu_db_1?serverTimezone=GMT%2B8spring.shardingsphere.datasource.m1.username=rootspring.shardingsphere.datasource.m1.password=root1234%#配置第二个数据源具体内容，包含连接池，驱动，地址，用户名和密码spring.shardingsphere.datasource.m2.type=com.alibaba.druid.pool.DruidDataSourcespring.shardingsphere.datasource.m2.driver-class-name=com.mysql.cj.jdbc.Driverspring.shardingsphere.datasource.m2.url=jdbc:mysql://localhost:3306/edu_db_2?serverTimezone=GMT%2B8spring.shardingsphere.datasource.m2.username=rootspring.shardingsphere.datasource.m2.password=root1234%#配置第三个数据源具体内容，包含连接池，驱动，地址，用户名和密码spring.shardingsphere.datasource.m0.type=com.alibaba.druid.pool.DruidDataSourcespring.shardingsphere.datasource.m0.driver-class-name=com.mysql.cj.jdbc.Driverspring.shardingsphere.datasource.m0.url=jdbc:mysql://localhost:3306/user_db?serverTimezone=GMT%2B8spring.shardingsphere.datasource.m0.username=rootspring.shardingsphere.datasource.m0.password=root1234%# 配置公共表spring.shardingsphere.sharding.broadcast-tables=t_udictspring.shardingsphere.sharding.tables.t_udict.key-generator.column=dictidspring.shardingsphere.sharding.tables.t_udict.key-generator.type=SNOWFLAKE# 配置user_db数据库里面t_user 专库专表spring.shardingsphere.sharding.tables.t_user.actual-data-nodes=m$->{0}.t_user# 指定course表里面主键cid 生成策略 SNOWFLAKEspring.shardingsphere.sharding.tables.t_user.key-generator.column=user_idspring.shardingsphere.sharding.tables.t_user.key-generator.type=SNOWFLAKE# 指定表分片策略约定cid值偶数添加到course_1表，如果cid是奇数添加到course_2表spring.shardingsphere.sharding.tables.t_user.table-strategy.inline.sharding-column=user_idspring.shardingsphere.sharding.tables.t_user.table-strategy.inline.algorithm-expression=t_user#指定数据库分布情况，数据库里面表分布情况# m1 m2 course_1 course_2spring.shardingsphere.sharding.tables.course.actual-data-nodes=m$->{1..2}.course_$->{1..2}# 指定course表里面主键cid 生成策略 SNOWFLAKEspring.shardingsphere.sharding.tables.course.key-generator.column=cidspring.shardingsphere.sharding.tables.course.key-generator.type=SNOWFLAKE# 指定表分片策略约定cid值偶数添加到course_1表，如果cid是奇数添加到course_2表spring.shardingsphere.sharding.tables.course.table-strategy.inline.sharding-column=cidspring.shardingsphere.sharding.tables.course.table-strategy.inline.algorithm-expression=course_$->{cid % 2 + 1}# 指定数据库分片策略约定user_id是偶数添加m1，是奇数添加m2#spring.shardingsphere.sharding.default-database-strategy.inline.sharding-column=user_id#spring.shardingsphere.sharding.default-database-strategy.inline.algorithm-expression=m$->{user_id % 2 + 1}spring.shardingsphere.sharding.tables.course.database-strategy.inline..sharding-column=user_idspring.shardingsphere.sharding.tables.course.database-strategy.inline.algorithm-expression=m$->{user_id % 2 + 1}# 打开sql输出日志spring.shardingsphere.props.sql.show=true

运行结果：

三个数据中t_dict都有数据，删除操作的话，三张表的数据都会被清除

上一篇：Netty

下一篇：elasticsearch