2020-05-15

Druid 0.17入门(4)—— 数据查询方式大全

Druid 0.17入门(4)—— 数据查询方式大全


本文介绍Druid查询数据的方式,首先我们保证数据已经成功载入。

Druid查询基于HTTP,Druid提供了查询视图,并对结果进行了格式化。

Druid提供了三种查询方式,SQL,原生JSON,CURL。

一、SQL查询

我们用wiki的数据为例

查询10条最多的页面编辑

SELECT page, COUNT(*) AS EditsFROM wikipediaWHERE TIMESTAMP '2015-09-12 00:00:00' <= "__time" AND "__time" < TIMESTAMP '2015-09-13 00:00:00'GROUP BY pageORDER BY Edits DESCLIMIT 10

我们在Query视图中操作

会有提示

选择Smart query limit会自动限制行数

Druid还提供了命令行查询sql 可以运行bin/dsql进行操作

Welcome to dsql, the command-line client for Druid SQL.Type "\h" for help.dsql>

提交sql

dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;┌──────────────────────────────────────────────────────────┬───────┐│ page              │ Edits │├──────────────────────────────────────────────────────────┼───────┤│ Wikipedia:Vandalismusmeldung        │ 33 ││ User:Cyde/List of candidates for speedy deletion/Subpage │ 28 ││ Jeremy Corbyn           │ 27 ││ Wikipedia:Administrators' noticeboard/Incidents   │ 21 ││ Flavia Pennetta           │ 20 ││ Total Drama Presents: The Ridonculous Race    │ 18 ││ User talk:Dudeperson176123        │ 18 ││ Wikipédia:Le Bistro/12 septembre 2015     │ 18 ││ Wikipedia:In the news/Candidates       │ 17 ││ Wikipedia:Requests for page protection     │ 17 │└──────────────────────────────────────────────────────────┴───────┘Retrieved 10 rows in 0.06s.

还可以通过Http发送SQL

curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages-sql.json { "page": "Wikipedia:Vandalismusmeldung", "Edits": 33 }, { "page": "User:Cyde/List of candidates for speedy deletion/Subpage", "Edits": 28 }, { "page": "Jeremy Corbyn", "Edits": 27 }, { "page": "Wikipedia:Administrators' noticeboard/Incidents", "Edits": 21 }, { "page": "Flavia Pennetta", "Edits": 20 }, { "page": "Total Drama Presents: The Ridonculous Race", "Edits": 18 }, { "page": "User talk:Dudeperson176123", "Edits": 18 }, { "page": "Wikipédia:Le Bistro/12 septembre 2015", "Edits": 18 }, { "page": "Wikipedia:In the news/Candidates", "Edits": 17 }, { "page": "Wikipedia:Requests for page protection", "Edits": 17 }]

更多SQL示例

时间查询

SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeletedFROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'GROUP BY 1

分组查询

SELECT channel, page, SUM(added)FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'GROUP BY channel, pageORDER BY SUM(added) DESC

查询原始数据

SELECT user, pageFROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00'LIMIT 5

定时查询

也可以在dsql里操作

dsql> EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;│ DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"virtualColumns":[],"dimension":{"type":"default","dimension":"page","outputName":"d0","outputType":"STRING"},"metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.001Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"context":{},"descending":false}], signature=[{d0:STRING, a0:LONG}]) │└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘Retrieved 1 row in 0.03s.

二、原生JSON查询

Druid支持基于Json的查询

{ "queryType" : "topN", "dataSource" : "wikipedia", "intervals" : ["2015-09-12/2015-09-13"], "granularity" : "all", "dimension" : "page", "metric" : "count", "threshold" : 10, "aggregations" : [ {  "type" : "count",  "name" : "count" } ]}

把json粘贴到json 查询模式窗口

Json查询是通过向router和broker发送请求

curl -X POST '<queryable_host>:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/json' -d @<query_json_file>

Druid提供了丰富的查询方式

Aggregation查询

Timeseries查询
{ "queryType": "timeseries", "dataSource": "sample_datasource", "granularity": "day", "descending": "true", "filter": { "type": "and", "fields": [  { "type": "selector", "dimension": "sample_dimension1", "value": "sample_value1" },  { "type": "or",  "fields": [   { "type": "selector", "dimension": "sample_dimension2", "value": "sample_value2" },   { "type": "selector", "dimension": "sample_dimension3", "value": "sample_value3" }  ]  } ] }, "aggregations": [ { "type": "longSum", "name": "sample_name1", "fieldName": "sample_fieldName1" }, { "type": "doubleSum", "name": "sample_name2", "fieldName": "sample_fieldName2" } ], "postAggregations": [ { "type": "arithmetic",  "name": "sample_divide",  "fn": "/",  "fields": [  { "type": "fieldAccess", "name": "postAgg__sample_name1", "fieldName": "sample_name1" },  { "type": "fieldAccess", "name": "postAgg__sample_name2", "fieldName": "sample_name2" }  ] } ], "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ]}
TopN查询
{ "queryType": "topN", "dataSource": "sample_data", "dimension": "sample_dim", "threshold": 5, "metric": "count", "granularity": "all", "filter": { "type": "and", "fields": [  {  "type": "selector",  "dimension": "dim1",  "value": "some_value"  },  {  "type": "selector",  "dimension": "dim2",  "value": "some_other_val"  } ] }, "aggregations": [ {  "type": "longSum",  "name": "count",  "fieldName": "count" }, {  "type": "doubleSum",  "name": "some_metric",  "fieldName": "some_metric" } ], "postAggregations": [ {  "type": "arithmetic",  "name": "average",  "fn": "/",  "fields": [  {   "type": "fieldAccess",   "name": "some_metric",   "fieldName": "some_metric"  },  {   "type": "fieldAccess",   "name": "count",   "fieldName": "count"  }  ] } ], "intervals": [ "2013-08-31T00:00:00.000/2013-09-03T00:00:00.000" ]}
GroupBy查询
{ "queryType": "groupBy", "dataSource": "sample_datasource", "granularity": "day", "dimensions": ["country", "device"], "limitSpec": { "type": "default", "limit": 5000, "columns": ["country", "data_transfer"] }, "filter": { "type": "and", "fields": [  { "type": "selector", "dimension": "carrier", "value": "AT&T" },  { "type": "or",  "fields": [   { "type": "selector", "dimension": "make", "value": "Apple" },   { "type": "selector", "dimension": "make", "value": "Samsung" }  ]  } ] }, "aggregations": [ { "type": "longSum", "name": "total_usage", "fieldName": "user_count" }, { "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" } ], "postAggregations": [ { "type": "arithmetic",  "name": "avg_usage",  "fn": "/",  "fields": [  { "type": "fieldAccess", "fieldName": "data_transfer" },  { "type": "fieldAccess", "fieldName": "total_usage" }  ] } ], "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ], "having": { "type": "greaterThan", "aggregation": "total_usage", "value": 100 }}

Metadata查询

TimeBoundary 查询
{ "queryType" : "timeBoundary", "dataSource": "sample_datasource", "bound"  : < "maxTime" | "minTime" > # optional, defaults to returning both timestamps if not set "filter" : { "type": "and", "fields": [<filter>, <filter>, ...] } # optional}
SegmentMetadata查询
{ "queryType":"segmentMetadata", "dataSource":"sample_datasource", "intervals":["2013-01-01/2014-01-01"]}
DatasourceMetadata查询
{ "queryType" : "dataSourceMetadata", "dataSource": "sample_datasource"}

Search查询

{ "queryType": "search", "dataSource": "sample_datasource", "granularity": "day", "searchDimensions": [ "dim1", "dim2" ], "query": { "type": "insensitive_contains", "value": "Ke" }, "sort" : { "type": "lexicographic" }, "intervals": [ "2013-01-01T00:00:00.000/2013-01-03T00:00:00.000" ]}

查询建议

用Timeseries和TopN替代GroupBy

取消查询

DELETE /druid/v2/{queryId}
curl -X DELETE " "error" : "Query timeout", "errorMessage" : "Timeout waiting for task.", "errorClass" : "java.util.concurrent.TimeoutException", "host" : "druid1.example.com:8083"}

三、CURL

基于Http的查询

curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages.json id="四、客户端查询">四、客户端查询

客户端查询是基于json的

具体查看 https://druid.apache.org/libraries.html

比如python查询的pydruid

from pydruid.client import *from pylab import pltquery = PyDruid(druid_url_goes_here, 'druid/v2')ts = query.timeseries( datasource='twitterstream', granularity='day', intervals='2014-02-02/p4w', aggregations={'length': doublesum('tweet_length'), 'count': doublesum('count')}, post_aggregations={'avg_tweet_length': (Field('length') / Field('count'))}, filter=Dimension('first_hashtag') == 'sochi2014')df = query.export_pandas()df['timestamp'] = df['timestamp'].map(lambda x: x.split('T')[0])df.plot(x='timestamp', y='avg_tweet_length', ylim=(80, 140), rot=20,  title='Sochi 2014')plt.ylabel('avg tweet length (chars)')plt.show()

实时流式计算整理了Druid入门指南
持续更新中~

更多实时数据分析相关博文与科技资讯,欢迎关注 "实时流式计算"

获取《Druid实时大数据分析》电子书,请在公号后台回复 "Druid"


No comments:

Post a Comment