跳到主要内容
版本: 最新版本-3.5

approx_top_k

返回 expr 中出现频率最高的 k 个 item 值及其近似计数。

此函数从 v3.0 版本开始支持。

语法

APPROX_TOP_K(<expr> [ , <k> [ , <counter_num> ] ] )

参数

  • expr: STRING、BOOLEAN、DATE、DATETIME 或数值类型的表达式。

  • k:可选的 INTEGER 字面值,大于 0。如果未指定 k,则默认为 5。最大值为 100000

  • counter_num:可选的 INTEGER 字面值,大于或等于 kcounter_num 越大,结果越准确。但是,这也会增加 CPU 和内存成本。

    • 最大值为 100000
    • 如果未指定 counter_num,则默认为 max(min(2 * k, 100), 100000)

返回值

结果作为 STRUCT 类型的 ARRAY 返回,其中每个 STRUCT 包含一个用于值的 item 字段(具有其原始输入类型)和一个 count 字段(BIGINT 类型),其中包含近似的出现次数。数组按 count 降序排序。

聚合函数返回表达式 expr 中出现频率最高的 k 个 item 值及其近似计数。每个计数的误差可能高达 2.0 * numRows / counter_num,其中 numRows 是总行数。较高的 counter_num 值以增加内存使用量为代价提供更好的准确性。项目数少于 counter_num 的表达式将产生精确的项目计数。结果包括 NULL 值作为结果中自己的项目。

示例

使用scores表中的数据作为示例。

-- Calculate the score distribution of each subject.
MySQL > SELECT subject, APPROX_TOP_K(score) AS top_k FROM scores GROUP BY subject;
+---------+--------------------------------------------------------------------------------------------------------------------+
| subject | top_k |
+---------+--------------------------------------------------------------------------------------------------------------------+
| physics | [{"item":99,"count":2},{"item":null,"count":1},{"item":100,"count":1},{"item":85,"count":1},{"item":60,"count":1}] |
| english | [{"item":null,"count":1},{"item":92,"count":1},{"item":98,"count":1},{"item":100,"count":1},{"item":85,"count":1}] |
| NULL | [{"item":90,"count":1}] |
| math | [{"item":80,"count":2},{"item":null,"count":1},{"item":92,"count":1},{"item":95,"count":1},{"item":70,"count":1}] |
+---------+--------------------------------------------------------------------------------------------------------------------+

-- Calculate the score distribution of the math subject.
MySQL > SELECT subject, APPROX_TOP_K(score) AS top_k FROM scores WHERE subject IN ('math') GROUP BY subject;
+---------+-------------------------------------------------------------------------------------------------------------------+
| subject | top_k |
+---------+-------------------------------------------------------------------------------------------------------------------+
| math | [{"item":80,"count":2},{"item":null,"count":1},{"item":95,"count":1},{"item":92,"count":1},{"item":70,"count":1}] |
+---------+-------------------------------------------------------------------------------------------------------------------+

关键词

APPROX_TOP_K