即席查询调研报告
即席查询调研报告
什么是即席查询
即席查询(Ad Hoc)是用户根据自己的需求,灵活的选择查询条件,系统能够根据用户的选择生成相应的统计报表。即席查询与普通应用查询最大的不同是普通的应用查询是定制开发的,而即席查询是指那些用户在使用系统时,根据自己当时的需求自定义的查询。
测试组件介绍
Apache Doris
是一个现代化的MPP分析型数据库产品。仅需亚秒级响应时间即可获得查询结果,有效地支持实时数据分析。支持数据实时导入和实时查询,并且可以支持10PB以上的超大数据集。提供 OLAP 数据分析。(同时支持聚合和明细查询分析)
starRocks
fork于 Apache Doris,在此基础上做了一些增强
特性
1、联邦查询
StarRocks支持使用外表的方式进行联邦查询,当前可以支持Hive,MySQL,Elasticserach三种类型的外表,用户无需通过数据导入,可以直接进行数据查询加速。
2、全面向量化引擎
CPU需要支持AVX2指令集, cat /proc/cpuinfo |grep avx2有结果输出表明CPU支持,如果没有支持,建议更换机器,StarRocks使用向量化技术需要一定的指令集支持才能发挥效果。
3、智能查询优化
StarRocks通过CBO优化器(Cost Based Optimizer)可以对复杂查询自动优化。无需人工干预,就可以通过统计信息合理估算执行成本,生成更优的执行计划,大大提高了Adhoc和ETL场景的数据分析效率。
4、智能物化视图
StarRocks支持智能的物化视图。用户可以通过创建物化视图,预先计算生成预聚合表用于加速聚合类查询请求。
StarRocks的物化视图能够在数据导入时自动完成汇聚,与原始表数据保持一致。并且在查询的时候,用户无需指定物化视图,StarRocks能够自动选择最优的物化视图来满足查询请求。
5、标准SQL
StarRocks支持标准的SQL语法,包括聚合,JOIN,排序,窗口函数,自定义函数等功能。StarRocks可以完整支持TPC-H的22个SQL和TPC-DS的99个SQL。此外,StarRocks还兼容MySQL协议语法,可使用现有 的各种客户端工具、BI软件访问StarRocks,
对StarRocks中的数据进行拖拽式分析。
6、流批一体
StarRocks支持实时和批量两种数据导入方式,支持的数据源有Kafka, HDFS, 本地文件,支持的数据格式有ORC, Parquet和CSV等,StarRocks可以实时消费Kafka数据来完成数据导入,保证数据不丢不重(exactly once)。StarRocks也可以从本地或者远程(HDFS)批量导入数据。
Presto
Presto是一个分布式SQL查询引擎, 它被设计为用来专门进行高速、实时的数据分析。它支持标准的ANSI SQL,包括复杂查询、聚合(aggregation)、连接(join)和窗口函数(window functions),动态编译/利用cpu的指令集并行化执行。它的产生是为了解决hive的MR太慢的问题,Presto 本身并不存储数据,但是可以接入多种数据源,并且支持跨数据源的级联查询。Presto是一个OLAP的工具,擅长对海量数据进行复杂的分析。
ClickHouse
Clickhouse由俄罗斯yandex公司开发。是一款用于大数据实时分析的列式数据库管理系统。通过向量化的执行以及对cpu底层指令集的使用,可以对海量数据进行并行处理,从而加快数据的处理速度。
除此之外,adhoc还有Impala,但经社区反馈其多表查询性能和presto差不多,但是单表查询方面却不如presto好。而且Impala有很多不支持的地方,例如:不支持Date数据类型,不支持ORC文件格式等等,需要采用parquet格式进行查询,而且Impala在查询时占用的内存很大。因此直接pass掉
测试方法
TPC-H基准压测
TPC-H 基准测试是由TPC-D发展而来的,是面向商品零售业的决策支持系统测试基准,它定义了8张表,22个查询,遵循SQL92;基准的数据库模式遵循第三范式;新兴的数据仓库开始采用新的模型,如星型模型、雪花模型。
表的信息:2个事实表:lineorder,orders 6个维度表:customer,part,parsupp,supplier,region,nation
数据查询:22条标准SQL查询测试语句:统计查询、多表关联、sum、复杂条件、group by、order by等组合方式
SSB基准压测
SSB(Star Schema Benchmark)是在TPC-H基础之上,麻省州立大学波士顿校区的研究人员定义的基于现实商业应用的数据模型,是学术界和工业界广泛使用的一个星型模型测试集(来源论文),比较公正和中立。
表的信息:1张大宽表:lineorder_flat;1张事实表:lineorder4张维度表:customer,part,dates,supplier
数据查询:2类查询场景(单表和多表join)每类场景下有13条标准SQL查询测试语句:统计查询、多表关联、sum、复杂条件、group by、order by等组合方式
本次主要通过ssb基准进行测试,通过这个测试集合可以方便的对比各种OLAP产品的基础性能指标。Clickhouse 通过改写SSB,将星型模型打平转化成宽表,改造成了一个单表测试benchmark。本报告记进行了StarRocks、ApacheDoris和Clickhouse在SSB单表测试,以及StarRocks和ApacheDoris的多表测试,由于Clickhouse对多表Join支持有限,这也不是clickhouse的强项,所以在多表测试中并未将Clickhouse加入测试。
测试环境
硬件环境
机器 | 1台腾讯云主机 |
---|---|
CPU | 16核(超线程) |
内存 | 64G |
软件环境
容器化环境部署,StarRocks,Apache Doris和Clickhouse容器分别部署在相同的机器上进行测试,都为standalone模式。
StarRocks和Apache Doris的基础镜像为centos:7.9,Clickhouse基础镜像为Ubuntu 20,考虑到容器本身对资源的消耗很小,可以忽略不计。
软件版本:StarRocks community 1.18.2 ,Aapche Doris 0.14.0,Clickhouse 21.8.6.15
测试数据
测试数据生成
StarRocks和Apache Doris测试数据生成
编译 :
wget https://starrocks-public.oss-cn-beijing.aliyuncs.com/ssb-poc-0.9.2.zip
unzip ssb-poc-0.9.2.zip
cd ssb-poc
make && make install
生成数据:
cd output
bin/gen-ssb.sh 10 data_dir
注意: 如果使用因子100,生成6亿条记录的初始文件,约59G;使用因子1000,生成60亿条记录的初始文件,约600G;本次使用因子10,生成6千万条记录的初始文件,约6G。
创建表结构:
修改配置文件conf/starrocks.conf,指定脚本操作的集群地址
# for mysql cmd
mysql_host: 192.168.1.1 #fe地址
mysql_port: 9030
mysql_user: root
mysql_password:
database: ssb
# cluster ports
http_port: 8030
be_heartbeat_port: 9050
broker_port: 8000
# parallel_fragment_exec_instance_num
parallel_num: 1
...
使用Stream load导入单表数据
bin/stream_load.sh data_dir
插入数据到宽表lineorder_flat
bin/flat_insert.sh
或者创建分区宽表lineorder_flat
CREATE TABLE `lineorder_flat` (
`lo_orderdate` date NOT NULL COMMENT "",
`lo_orderkey` int(11) NOT NULL COMMENT "",
`lo_linenumber` tinyint(4) NOT NULL COMMENT "",
`lo_custkey` int(11) NOT NULL COMMENT "",
`lo_partkey` int(11) NOT NULL COMMENT "",
`lo_suppkey` int(11) NOT NULL COMMENT "",
`lo_orderpriority` varchar(100) NOT NULL COMMENT "",
`lo_shippriority` tinyint(4) NOT NULL COMMENT "",
`lo_quantity` tinyint(4) NOT NULL COMMENT "",
`lo_extendedprice` int(11) NOT NULL COMMENT "",
`lo_ordtotalprice` int(11) NOT NULL COMMENT "",
`lo_discount` tinyint(4) NOT NULL COMMENT "",
`lo_revenue` int(11) NOT NULL COMMENT "",
`lo_supplycost` int(11) NOT NULL COMMENT "",
`lo_tax` tinyint(4) NOT NULL COMMENT "",
`lo_commitdate` date NOT NULL COMMENT "",
`lo_shipmode` varchar(100) NOT NULL COMMENT "",
`c_name` varchar(100) NOT NULL COMMENT "",
`c_address` varchar(100) NOT NULL COMMENT "",
`c_city` varchar(100) NOT NULL COMMENT "",
`c_nation` varchar(100) NOT NULL COMMENT "",
`c_region` varchar(100) NOT NULL COMMENT "",
`c_phone` varchar(100) NOT NULL COMMENT "",
`c_mktsegment` varchar(100) NOT NULL COMMENT "",
`s_region` varchar(100) NOT NULL COMMENT "",
`s_nation` varchar(100) NOT NULL COMMENT "",
`s_city` varchar(100) NOT NULL COMMENT "",
`s_name` varchar(100) NOT NULL COMMENT "",
`s_address` varchar(100) NOT NULL COMMENT "",
`s_phone` varchar(100) NOT NULL COMMENT "",
`p_name` varchar(100) NOT NULL COMMENT "",
`p_mfgr` varchar(100) NOT NULL COMMENT "",
`p_category` varchar(100) NOT NULL COMMENT "",
`p_brand` varchar(100) NOT NULL COMMENT "",
`p_color` varchar(100) NOT NULL COMMENT "",
`p_type` varchar(100) NOT NULL COMMENT "",
`p_size` tinyint(4) NOT NULL COMMENT "",
`p_container` varchar(100) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`lo_orderdate`, `lo_orderkey`)
COMMENT "OLAP"
PARTITION BY RANGE(`lo_orderdate`)
(PARTITION p1 VALUES [('0000-01-01'), ('1993-01-01')),
PARTITION p2 VALUES [('1993-01-01'), ('1994-01-01')),
PARTITION p3 VALUES [('1994-01-01'), ('1995-01-01')),
PARTITION p4 VALUES [('1995-01-01'), ('1996-01-01')),
PARTITION p5 VALUES [('1996-01-01'), ('1997-01-01')),
PARTITION p6 VALUES [('1997-01-01'), ('1998-01-01')),
PARTITION p7 VALUES [('1998-01-01'), ('1999-01-01')))
DISTRIBUTED BY HASH(`lo_orderkey`) BUCKETS 48
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);
clickhouse测试数据生成
编译 :
git clone https://github.com/vadimtk/ssb-dbgen.git
cd ssb-dbgen
make
其他参考链接,使用因子10,生成6千万条测试数据。
测试数据详情
表名 | 行数 | 解释 |
---|---|---|
lineorder | 6千万 | SSB商品订单表 |
customer | 30万 | SSB客户表 |
part | 80万 | SSB 零部件表 |
supplier | 2万 | SSB 供应商表 |
dates | 2556 | 日期表 |
lineorder_flat | 6千万 | SSB打平后的宽表 |
测试sql
单表测试SQL
--Q1.1
SELECT sum(lo_extendedprice * lo_discount) AS `revenue`
FROM lineorder_flat
WHERE lo_orderdate >= '1993-01-01' and lo_orderdate <= '1993-12-31' AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;
--Q1.2
SELECT sum(lo_extendedprice * lo_discount) AS revenue FROM lineorder_flat
WHERE lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-01-31' AND lo_discount BETWEEN 4 AND 6 AND lo_quantity BETWEEN 26 AND 35;
--Q1.3
SELECT sum(lo_extendedprice * lo_discount) AS revenue
FROM lineorder_flat
WHERE weekofyear(lo_orderdate) = 6 AND lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-12-31'
AND lo_discount BETWEEN 5 AND 7 AND lo_quantity BETWEEN 26 AND 35;
--Q2.1
SELECT sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_category = 'MFGR#12' AND s_region = 'AMERICA'
GROUP BY year, p_brand
ORDER BY year, p_brand;
--Q2.2
SELECT
sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_brand >= 'MFGR#2221' AND p_brand <= 'MFGR#2228' AND s_region = 'ASIA'
GROUP BY year, p_brand
ORDER BY year, p_brand;
--Q2.3
SELECT sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_brand = 'MFGR#2239' AND s_region = 'EUROPE'
GROUP BY year, p_brand
ORDER BY year, p_brand;
--Q3.1
SELECT c_nation, s_nation, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue FROM lineorder_flat
WHERE c_region = 'ASIA' AND s_region = 'ASIA' AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_nation, s_nation, year
ORDER BY year ASC, revenue DESC;
--Q3.2
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_nation = 'UNITED STATES' AND s_nation = 'UNITED STATES' AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
--Q3.3
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_city in ( 'UNITED KI1' ,'UNITED KI5') AND s_city in ( 'UNITED KI1' ,'UNITED KI5') AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
--Q3.4
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_city in ('UNITED KI1', 'UNITED KI5') AND s_city in ( 'UNITED KI1', 'UNITED KI5') AND lo_orderdate >= '1997-12-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
--Q4.1
set vectorized_engine_enable = FALSE;
SELECT year(lo_orderdate) AS year, c_nation, sum(lo_revenue - lo_supplycost) AS profit FROM lineorder_flat
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2')
GROUP BY year, c_nation
ORDER BY year ASC, c_nation ASC;
--Q4.2
SELECT year(lo_orderdate) AS year,
s_nation, p_category, sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2')
GROUP BY year, s_nation, p_category
ORDER BY year ASC, s_nation ASC, p_category ASC;
--Q4.3
SELECT year(lo_orderdate) AS year, s_city, p_brand,
sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE s_nation = 'UNITED STATES' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_category = 'MFGR#14'
GROUP BY year, s_city, p_brand
ORDER BY year ASC, s_city ASC, p_brand ASC;
多表测试SQL
--Q1.1
select sum(lo_revenue) as revenue
from lineorder join dates on lo_orderdate = d_datekey
where d_year = 1993 and lo_discount between 1 and 3 and lo_quantity < 25;
--Q1.2
select sum(lo_revenue) as revenue
from lineorder
join dates on lo_orderdate = d_datekey
where d_yearmonthnum = 199401
and lo_discount between 4 and 6
and lo_quantity between 26 and 35;
--Q1.3
select sum(lo_revenue) as revenue
from lineorder
join dates on lo_orderdate = d_datekey
where d_weeknuminyear = 6 and d_year = 1994
and lo_discount between 5 and 7
and lo_quantity between 26 and 35;
--Q2.1
select sum(lo_revenue) as lo_revenue, d_year, p_brand
from lineorder
inner join dates on lo_orderdate = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_category = 'MFGR#12' and s_region = 'AMERICA'
group by d_year, p_brand
order by d_year, p_brand;
--Q2.2
select sum(lo_revenue) as lo_revenue, d_year, p_brand
from lineorder
join dates on lo_orderdate = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_brand between 'MFGR#2221' and 'MFGR#2228' and s_region = 'ASIA'
group by d_year, p_brand
order by d_year, p_brand;
--Q2.3
select sum(lo_revenue) as lo_revenue, d_year, p_brand
from lineorder
join dates on lo_orderdate = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_brand = 'MFGR#2239' and s_region = 'EUROPE'
group by d_year, p_brand
order by d_year, p_brand;
--Q3.1
select c_nation, s_nation, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where c_region = 'ASIA' and s_region = 'ASIA'and d_year >= 1992 and d_year <= 1997
group by c_nation, s_nation, d_year
order by d_year asc, lo_revenue desc;
--Q3.2
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where c_nation = 'UNITED STATES' and s_nation = 'UNITED STATES'
and d_year >= 1992 and d_year <= 1997
group by c_city, s_city, d_year
order by d_year asc, lo_revenue desc;
--Q3.3
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where (c_city='UNITED KI1' or c_city='UNITED KI5')
and (s_city='UNITED KI1' or s_city='UNITED KI5')
and d_year >= 1992 and d_year <= 1997
group by c_city, s_city, d_year
order by d_year asc, lo_revenue desc;
--Q3.4
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where (c_city='UNITED KI1' or c_city='UNITED KI5') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_yearmonth
= 'Dec1997'
group by c_city, s_city, d_year
order by d_year asc, lo_revenue desc;
--Q4.1
select d_year, c_nation, sum(lo_revenue) - sum(lo_supplycost) as profit
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')
group by d_year, c_nation
order by d_year, c_nation;
--Q4.2
select d_year, s_nation, p_category, sum(lo_revenue) - sum(lo_supplycost) as profit
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA'and s_region = 'AMERICA'
and (d_year = 1997 or d_year = 1998)
and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')
group by d_year, s_nation, p_category
order by d_year, s_nation, p_category;
--Q4.3
select d_year, s_city, p_brand, sum(lo_revenue) - sum(lo_supplycost) as profit
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA'and s_nation = 'UNITED STATES'
and (d_year = 1997 or d_year = 1998)
and p_category = 'MFGR#14'
group by d_year, s_city, p_brand
order by d_year, s_city, p_brand;
运行测试用例
本测试采用单核测试
clickhouse
在客户端执行命令
set max_threads=1;
clickhouse 使用上面单表测试SQL进行单表性能测试
starrocks和Apache doris
在客户端执行命令
set global parallel_fragment_exec_instance_num = 1;
starrocks和Apache doris分别单表和多表测试,执行上面的SQL
或者执行上面生成测试数据生成的脚本分别进行单表和多表测试
执行SQL:
bin/benchmark.sh -p -d ssb
bin/benchmark.sh -p -d ssb-flat
测试结果
单表测试结果
SQL | StarRocks 用时(ms)fisrt | StarRocks 用时(ms) | Apache Doris 用时(ms)first | Apache Doris 用时(ms) | Clickhouse用时(ms)first | Clickhouse用时(ms) |
---|---|---|---|---|---|---|
Q1.1 | 6882 | 155 | 398 | 416 | 479 | 213 |
Q1.2 | 129 | 90 | 206 | 203 | 62 | 24 |
Q1.3 | 98 | 95 | 187 | 200 | 7 | 7 |
Q2.1 | 1715 | 529 | 7869 | 7678 | 2856 | 1199 |
Q2.2 | 405 | 437 | 6151 | 6329 | 965 | 984 |
Q2.3 | 183 | 160 | 796 | 724 | 935 | 952 |
Q3.1 | 993 | 1056 | 8170 | 8034 | 1710 | 1342 |
Q3.2 | 213 | 218 | 1127 | 1001 | 971 | 655 |
Q3.3 | 175 | 118 | 575 | 565 | 417 | 429 |
Q3.4 | 119 | 163 | 217 | 268 | 25 | 19 |
Q4.1 | 1134 | 1100 | 9730 | 8981 | 2158 | 1656 |
Q4.2 | 383 | 359 | 2905 | 2694 | 346 | 323 |
Q4.3 | 284 | 263 | 802 | 1019 | 330 | 298 |
多表测试结果
SQL | StarRocks 用时(ms)first | StarRocks 用时(ms) | Apache Doris 用时(ms)fisrt | Apache Doris 用时(ms) |
---|---|---|---|---|
Q1.1 | 2349 | 46 | 5066 | 247 |
Q1.2 | 3775 | 42 | 165 | 114 |
Q1.3 | 47 | 45 | 141 | 174 |
Q2.1 | 11486 | 126 | 3450 | 505 |
Q2.2 | 204 | 177 | 513 | 501 |
Q2.3 | 103 | 92 | 428 | 422 |
Q3.1 | 569 | 245 | 2714 | 954 |
Q3.2 | 995 | 160 | 697 | 451 |
Q3.3 | 129 | 86 | 487 | 413 |
Q3.4 | 46 | 46 | 397 | 404 |
Q4.1 | 1335 | 261 | 837 | 825 |
Q4.2 | 122 | 96 | 741 | 744 |
Q4.3 | 89 | 79 | 504 | 515 |
结果分析
从Linux 系统缓存命中率方面分析
从测试结果可以看出同一个查询引擎,第一次运行的结果都会比再次运行的时间要长。因此禁用系统交换分区和提高缓存命中率可以有效地提高分析引擎的查询效率。
从不同查询引擎方面分析
-
从单表测试结果分析,StarRocks 的性能以微弱的优势领先于 Clickhouse,但都明显高于Apache Doris。
-
从多表测试结果分析,StarRocks 的性能明显高于Apache Doris。
-
另外从社区反馈分析,Clickhouse的性能在单表性能方面明显高于presto,参考链接; Presto和DorisDB在同等环境下性能对比,ORC格式下StarRocks 查询Hive外表的性能大约是Presto的3.35倍,参考链接
结论
采用满分为5来比较各方面性能
StarRocks | Apache Doris | Clickhouse | presto | |
---|---|---|---|---|
单表查询性能 | 5 | 4 | 5 | 4 |
多表查询性能 | 5 | 4 | 1 | 3 |
外表支持 | 3(有限支持) | 2(有限支持) | 0(不支持) | 5(丰富) |
社区活跃度 | 1 (1.2k) | 2 (3.3k) | 5 (19.5k) | 4 (12.4k) |
时间:2021-09-23
基于以上结论,得出以下各自使用场景
clickhouse场景
- 海量数据,但又不希望单节点的存储空间消耗太高。
- 宽表,为了业务方便,可能会把很多相关数据列都整合到一个表里。
StarRocks场景
- 同时存在大量单表查询和多表聚合的需求
- 存在实时增量的场景(例如:kafka实时接入)
presto场景
- 同时存在大量单表查询和多表聚合的需求,同时对性能要求不是太高
- 存在多种数据源级联查询的场景
- 可以与hive/hdfs深入绑定,利用hdfs本地化计算的优势
由于StarRocks 是Apache Doris的加强版,因此推荐直接使用 StarRocks
附件
StarRocks docker-compose参考链接
Apache doris docker-compose参考链接
clickhouse部署参考链接
presto部署参考链接