数智资源网

How to Build Successful Business Intelligence on Hadoop: Seven Essential Tips

木马童年 2020-10-18 14:40

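Business intelligence (BI) is without doubt the top use case when enterprises deploy Hadoop. Drawing on a recently published benchmark, we have compiled the Hadoop SQL engines best suited to each type of workload. Let's take a look: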

1. No Single Best Engine

The benchmark results show that there is no one-size-fits-all, general-purpose engine for executing BI queries. "Depending on raw data size, query complexity, and the target number of end-users, enterprises will find that each engine has its own 'sweet spot,'" according to the study's findings.

2. Small Vs. Big Data

The benchmark shows that Impala and Spark SQL are the stars when it comes to queries against small data sets. AtScale, the company behind the benchmark, said that the most recent release of Hive LLAP (Live Long and Process) shows acceptable query response times on small data sets, and that Presto also shows promise for these types of queries.

3. Few Vs. Many

This metric looks at the performance when the data is hit with many queries at the same time. Presto, which AtScale included for the first time in this benchmark test, showed the best results for concurrency testing. Impala continued its strong concurrent query performance. Hive and Spark SQL registered significant improvements on this metric in the current benchmark test.
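A concurrency measurement of this kind can be sketched roughly as follows. This is only an illustration, not AtScale's methodology: `measure_concurrent_latency` and `fake_query` are hypothetical names, and the stand-in query simply sleeps instead of calling a real engine.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, List


def measure_concurrent_latency(run_query: Callable[[], None], concurrency: int) -> List[float]:
    """Issue `concurrency` queries at the same time; return each latency in seconds."""

    def timed_call(_: int) -> float:
        start = time.perf_counter()
        run_query()
        return time.perf_counter() - start

    # One worker thread per simulated end user, so all queries are in flight together.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_call, range(concurrency)))


def fake_query() -> None:
    # Assumption: stand-in for a real engine call (Impala, Presto, etc.).
    time.sleep(0.01)


if __name__ == "__main__":
    for users in (1, 5, 25):
        latencies = measure_concurrent_latency(fake_query, users)
        print(f"{users} users: mean {mean(latencies):.3f}s, max {max(latencies):.3f}s")
```

With a real engine client substituted for `fake_query`, comparing the mean and maximum latency as the user count grows gives a rough picture of how an engine holds up under simultaneous queries.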

4. Complex Queries

AtScale's Klahr warns that, while Impala and Presto do well on concurrency, the results shifted as queries became more complex. When it came to complex queries, Spark SQL started to outperform Impala, Klahr told InformationWeek. "You need to have a multi-engine strategy and a mechanism that can automatically route end-user queries to the right engine without the end-user having to think about 'Am I writing a Spark query or an Impala query?'" he said, noting that AtScale does perform that kind of automatic routing to the best engine.
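The routing idea Klahr describes might look something like the minimal sketch below. It is not AtScale's implementation: the `QueryProfile` fields, the threshold values, and the `choose_engine` function are all hypothetical, and the rules only loosely mirror the benchmark findings reported here (Spark SQL for complex queries, Presto for high concurrency, Impala otherwise).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryProfile:
    """Hypothetical summary of a query, used only for the routing decision."""
    join_count: int        # number of joins in the query
    concurrent_users: int  # expected number of simultaneous end users


# Illustrative thresholds; these numbers are assumptions, not benchmark values.
COMPLEX_JOIN_COUNT = 3
HIGH_CONCURRENCY_USERS = 25


def choose_engine(profile: QueryProfile) -> str:
    """Pick an engine name so the end user never has to choose one."""
    # Complex queries: Spark SQL was reported to start outperforming Impala.
    if profile.join_count >= COMPLEX_JOIN_COUNT:
        return "spark_sql"
    # Many simultaneous users: Presto showed the best concurrency results.
    if profile.concurrent_users >= HIGH_CONCURRENCY_USERS:
        return "presto"
    # Simple, low-concurrency queries: Impala was among the strongest engines.
    return "impala"


if __name__ == "__main__":
    print(choose_engine(QueryProfile(join_count=5, concurrent_users=2)))
    print(choose_engine(QueryProfile(join_count=1, concurrent_users=100)))
    print(choose_engine(QueryProfile(join_count=0, concurrent_users=3)))
```

A real router would likely estimate these inputs from the query plan and cluster load rather than take them as given, and would need thresholds tuned against the organization's own workloads.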

5. Large Data Sets

Querying big data sets generally means slower results. The fastest performing engines for these data sets were Spark SQL at less than 20 seconds, followed by Impala at less than 40 seconds. Response times for both of these engines improved significantly from the benchmark six months ago to today. Hive and Presto returned results in just over 2 minutes. Increasing the number of joins generally increased processing time, according to AtScale. Spark SQL and Impala were more likely to perform best as the number of joins increased.

6. Everybody Wins

All the engines that were evaluated registered significant performance improvements since AtScale's last benchmark test six months ago -- on the order of 2x to 4x, according to the company. "This is great news for those enterprises deploying BI workloads to Hadoop. We believe that a best-of-breed strategy -- best engine, best semantic BI layer, best visualization tool -- will lead enterprises down the most successful path to BI-on-Hadoop success," the company said in its benchmark report.

7. Open Source Advances

Klahr told InformationWeek in an interview that between the first edition of the benchmark six months ago and today, the query performance of Hive improved by 3.5x, Spark by 2.5x, and Impala by 3x. "If I'm a buyer or an executive, these improvements are going to make me stop and question any investment on a proprietary Hadoop engine," Klahr said, because these open source tools are being improved at a rapid pace.
