์ง‘๊ณ„ ํ•จ์ˆ˜ยถ

์ง‘๊ณ„ ํ•จ์ˆ˜๋Š” ํ–‰ ์ „์ฒด์˜ ๊ฐ’์— ๋Œ€ํ•œ ์—ฐ์‚ฐ์„ ํ†ตํ•ด ํ•ฉ๊ณ„, ํ‰๊ท , ๊ณ„์‚ฐ, ์ตœ์†Œ๊ฐ’/์ตœ๋Œ€๊ฐ’, ํ‘œ์ค€ ํŽธ์ฐจ, ์ถ”์ •๊ณผ ๊ฐ™์€ ์ˆ˜ํ•™์  ๊ณ„์‚ฐ์€ ๋ฌผ๋ก ์ด๊ณ , ๋ช‡ ๊ฐ€์ง€ ๋น„์ˆ˜ํ•™์  ์—ฐ์‚ฐ๋„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

์ง‘๊ณ„ ํ•จ์ˆ˜๋Š” ์—ฌ๋Ÿฌ ํ–‰(์‹ค์ œ๋กœ๋Š” 0๊ฐœ, 1๊ฐœ ๋˜๋Š” ๊ทธ ์ด์ƒ์˜ ํ–‰)์„ ์ž…๋ ฅ๊ฐ’์œผ๋กœ ๋ฐ›์•„ ๋‹จ์ผ ์ถœ๋ ฅ๊ฐ’์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ๋ฐ˜๋Œ€๋กœ, ์Šค์นผ๋ผ ํ•จ์ˆ˜๋Š” ํ•˜๋‚˜์˜ ํ–‰์„ ์ž…๋ ฅ๊ฐ’์œผ๋กœ ๋ฐ›์•„ ํ•˜๋‚˜์˜ ํ–‰(ํ•˜๋‚˜์˜ ๊ฐ’)์„ ์ถœ๋ ฅ๊ฐ’์œผ๋กœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

์ง‘๊ณ„ ํ•จ์ˆ˜๋Š” ์ž…๋ ฅ๊ฐ’์— 0๊ฐœ์˜ ํ–‰์ด ํฌํ•จ๋˜์–ด ์žˆ์„ ๋•Œ๋„ ํ•ญ์ƒ ์ •ํ™•ํžˆ ํ•˜๋‚˜์˜ ํ–‰์„ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค. ์ผ๋ฐ˜์ ์œผ๋กœ, ์ž…๋ ฅ๊ฐ’์— 0๊ฐœ์˜ ํ–‰์ด ํฌํ•จ๋˜๋Š” ๊ฒฝ์šฐ ์ถœ๋ ฅ๊ฐ’์€ NULL์ž…๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ์ง‘๊ณ„ ํ•จ์ˆ˜๋Š”๊ฐœ์˜ ํ–‰์„ ์ „๋‹ฌํ•  ๋•Œ 0, ๋นˆ ๋ฌธ์ž์—ด ๋˜๋Š” ๊ธฐํƒ€ ๊ฐ’์„ ๋ฐ˜ํ™˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์ด ํ•ญ๋ชฉ์˜ ๋‚ด์šฉ:

ํ•จ์ˆ˜์˜ ๋ชฉ๋ก(ํ•˜์œ„ ์นดํ…Œ๊ณ ๋ฆฌ๋ณ„)ยถ

ํ•จ์ˆ˜ ์ด๋ฆ„

์ฐธ๊ณ 

์ผ๋ฐ˜ ์ง‘๊ณ„

๋‹ค๋ฅธ ์ง‘๊ณ„ ํ•จ์ˆ˜์™€ ๋‹ค๋ฅธ ๊ตฌ๋ฌธ์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

๋‹ค๋ฅธ ์ง‘๊ณ„ ํ•จ์ˆ˜์™€ ๋‹ค๋ฅธ ๊ตฌ๋ฌธ์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

STDDEV์™€ STDDEV_SAMP๋Š” ๋ณ„์นญ์ž…๋‹ˆ๋‹ค.

VAR_POP ์˜ ๋ณ„์นญ์ž…๋‹ˆ๋‹ค.

VAR_SAMP ์˜ ๋ณ„์นญ์ž…๋‹ˆ๋‹ค.

๋น„ํŠธ ์ง‘๊ณ„

๋ถ€์šธ ์ง‘๊ณ„

ํ•ด์‹œ

๋ฐ˜์ •ํ˜• ๋ฐ์ดํ„ฐ ์ง‘๊ณ„

์„ ํ˜• ํšŒ๊ท€

ํ†ต๊ณ„ ๋ฐ ํ™•๋ฅ 

๊ณ ์œ  ๊ฐ’ ๊ณ„์‚ฐ

์นด๋””๋„๋ฆฌํ‹ฐ ์ถ”์ • . (HyperLogLog ์‚ฌ์šฉ)

HLL ์˜ ๋ณ„์นญ์ž…๋‹ˆ๋‹ค.

์ง‘๊ณ„ ํ•จ์ˆ˜๊ฐ€ ์•„๋‹ˆ๋ฉฐ, DATASKETCHES_HLL_ACCUMULATE ๋˜๋Š” DATASKETCHES_HLL_COMBINE ์˜ ์Šค์นผ๋ผ ์ž…๋ ฅ๊ฐ’์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

์ง‘๊ณ„ ํ•จ์ˆ˜๊ฐ€ ์•„๋‹ˆ๋ฉฐ, HLL_ACCUMULATE ๋˜๋Š” HLL_COMBINE ์˜ ์Šค์นผ๋ผ ์ž…๋ ฅ๊ฐ’์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

์œ ์‚ฌ์„ฑ ์ถ”์ • . (MinHash ์‚ฌ์šฉ)

APPROXIMATE_SIMILARITY ์˜ ๋ณ„์นญ์ž…๋‹ˆ๋‹ค.

๋นˆ๋„ ์ถ”์ • . (๊ณต๊ฐ„ ์ ˆ์•ฝ ์‚ฌ์šฉ)

์ง‘๊ณ„ ํ•จ์ˆ˜๊ฐ€ ์•„๋‹ˆ๋ฉฐ, APPROX_TOP_K_ACCUMULATE ๋˜๋Š” APPROX_TOP_K_COMBINE ์˜ ์Šค์นผ๋ผ ์ž…๋ ฅ๊ฐ’์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

๋ฐฑ๋ถ„์œ„์ˆ˜ ์ถ”์ • . (t-Digest ์‚ฌ์šฉ)

์ง‘๊ณ„ ํ•จ์ˆ˜๊ฐ€ ์•„๋‹ˆ๋ฉฐ, APPROX_PERCENTILE_ACCUMULATE ๋˜๋Š” APPROX_PERCENTILE_COMBINE ์˜ ์Šค์นผ๋ผ ์ž…๋ ฅ๊ฐ’์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

์ง‘๊ณ„ ์œ ํ‹ธ๋ฆฌํ‹ฐ

์ง‘๊ณ„ ํ•จ์ˆ˜๋Š” ์•„๋‹ˆ์ง€๋งŒ, GROUP BY ์ฟผ๋ฆฌ๋กœ ์ƒ์„ฑ๋œ ํ–‰์˜ ์ง‘๊ณ„ ์ˆ˜์ค€์„ ๊ฒฐ์ •ํ•˜๊ธฐ ์œ„ํ•ด ์ง‘๊ณ„ ํ•จ์ˆ˜์™€ ํ•จ๊ป˜ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

GROUPING ์˜ ๋ณ„์นญ์ž…๋‹ˆ๋‹ค.

์†Œ๊ฐœ ์˜ˆยถ

๋‹ค์Œ ์˜ˆ์—์„œ๋Š” ์ง‘๊ณ„ ํ•จ์ˆ˜(AVG)์™€ ์Šค์นผ๋ผ ํ•จ์ˆ˜(COS)์˜ ์ฐจ์ด์ ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. ์Šค์นผ๋ผ ํ•จ์ˆ˜๋Š” ๊ฐ ์ž…๋ ฅ ํ–‰์— ๋Œ€ํ•ด ํ•˜๋‚˜์˜ ์ถœ๋ ฅ ํ–‰์„ ๋ฐ˜ํ™˜ํ•˜๋Š” ๋ฐ˜๋ฉด, ์ง‘๊ณ„ ํ•จ์ˆ˜๋Š” ์—ฌ๋Ÿฌ ์ž…๋ ฅ ํ–‰์— ๋Œ€ํ•ด ํ•˜๋‚˜์˜ ์ถœ๋ ฅ ํ–‰์„ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค.

๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ…Œ์ด๋ธ”์„ ๋งŒ๋“ค๊ณ  ๊ฐ’์œผ๋กœ ์ฑ„์›๋‹ˆ๋‹ค.

CREATE TABLE simple (x INTEGER, y INTEGER);
INSERT INTO simple (x, y) VALUES
    (10, 20),
    (20, 44),
    (30, 70);
Copy

ํ…Œ์ด๋ธ”์„ ์ฟผ๋ฆฌํ•ฉ๋‹ˆ๋‹ค.

SELECT x, y 
    FROM simple
    ORDER BY x,y;
Copy
+----+----+
|  X |  Y |
|----+----|
| 10 | 20 |
| 20 | 44 |
| 30 | 70 |
+----+----+

์Šค์นผ๋ผ ํ•จ์ˆ˜๋Š” ๊ฐ ์ž…๋ ฅ ํ–‰์— ๋Œ€ํ•ด ํ•˜๋‚˜์˜ ์ถœ๋ ฅ ํ–‰์„ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค.

SELECT COS(x)
    FROM simple
    ORDER BY x;
Copy
+---------------+
|        COS(X) |
|---------------|
| -0.8390715291 |
|  0.4080820618 |
|  0.1542514499 |
+---------------+

์ง‘๊ณ„ ํ•จ์ˆ˜๋Š” ์—ฌ๋Ÿฌ ์ž…๋ ฅ ํ–‰์— ๋Œ€ํ•ด ํ•˜๋‚˜์˜ ์ถœ๋ ฅ ํ–‰์„ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค.

SELECT SUM(x)
    FROM simple;
Copy
+--------+
| SUM(X) |
|--------|
|     60 |
+--------+

์ง‘๊ณ„ ํ•จ์ˆ˜ ๋ฐ NULL ๊ฐ’ยถ

์ผ๋ถ€ ์ง‘๊ณ„ ํ•จ์ˆ˜๋Š” NULL ๊ฐ’์„ ๋ฌด์‹œํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, AVG ๋Š” ๋‹ค์Œ ์ˆ˜์‹์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ฐ’ 1, 5, NULL ์˜ ํ‰๊ท ์ด 3 ์ธ ๊ฒƒ์œผ๋กœ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.

(1 + 5) / 2 = 3

๋ถ„์ž์™€ ๋ถ„๋ชจ์—์„œ ๋ชจ๋‘ NULL์ด ์•„๋‹Œ ๊ฐ’ ๋‘ ๊ฐœ๋งŒ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

์ง‘๊ณ„ ํ•จ์ˆ˜์— ์ „๋‹ฌ๋œ ๋ชจ๋“  ๊ฐ’์ด NULL์ธ ๊ฒฝ์šฐ ์ง‘๊ณ„ ํ•จ์ˆ˜๋Š” NULL์„ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค.

์ผ๋ถ€ ์ง‘๊ณ„ ํ•จ์ˆ˜์—๋Š” ๋‘˜ ์ด์ƒ์˜ ์—ด์ด ์ „๋‹ฌ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์˜ˆ:

SELECT COUNT(col1, col2) FROM table1;
Copy

์ด๋Ÿฐ ๊ฒฝ์šฐ์—๋Š” ๊ฐœ๋ณ„ ์—ด์ด NULL์ด๋ฉด ์ง‘๊ณ„ ํ•จ์ˆ˜๊ฐ€ ํ–‰์„ ๋ฌด์‹œํ•ฉ๋‹ˆ๋‹ค.

์˜ˆ๋ฅผ ๋“ค์–ด, ๋‹ค์Œ ์ฟผ๋ฆฌ์—์„œ๋Š” ์„ ํƒํ•œ ์—ด์—์„œ ๋„ค ํ–‰ ์ค‘ ์„ธ ํ–‰์— ํ•˜๋‚˜ ์ด์ƒ์˜ NULL ๊ฐ’์ด ์žˆ์œผ๋ฏ€๋กœ COUNT ๋Š” 4 ๊ฐ€ ์•„๋‹ˆ๋ผ 1 ์„ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค.

๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ…Œ์ด๋ธ”์„ ๋งŒ๋“ค๊ณ  ๊ฐ’์œผ๋กœ ์ฑ„์›๋‹ˆ๋‹ค.

CREATE OR REPLACE TABLE test_null_aggregate_functions (x INT, y INT);
INSERT INTO test_null_aggregate_functions (x, y) VALUES
  (1, 2),         -- No NULLs.
  (3, NULL),      -- One but not all columns are NULL.
  (NULL, 6),      -- One but not all columns are NULL.
  (NULL, NULL);   -- All columns are NULL.
Copy

ํ…Œ์ด๋ธ”์„ ์ฟผ๋ฆฌํ•ฉ๋‹ˆ๋‹ค.

SELECT COUNT(x, y) FROM test_null_aggregate_functions;
Copy
+-------------+
| COUNT(X, Y) |
|-------------|
|           1 |
+-------------+

๋‘ ๊ฐœ ์ด์ƒ์˜ ์—ด์„ ์ฐธ์กฐํ•˜๋Š” ์‹์œผ๋กœ SUM ์„ ํ˜ธ์ถœํ•˜๊ณ  ํ•ด๋‹น ์—ด ์ค‘ ํ•˜๋‚˜ ์ด์ƒ์ด NULL์ธ ๊ฒฝ์šฐ ์ด ์‹์€ NULL๋กœ ๊ณ„์‚ฐ๋˜๊ณ  ํ–‰์€ ๋ฌด์‹œ๋ฉ๋‹ˆ๋‹ค.

SELECT SUM(x + y) FROM test_null_aggregate_functions;
Copy
+------------+
| SUM(X + Y) |
|------------|
|          3 |
+------------+

๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ด ๋™์ž‘์€ ์ผ๋ถ€ ์—ด์ด NULL์ผ ๋•Œ ํ–‰์„ ์‚ญ์ œํ•˜์ง€ ์•Š๋Š” GROUP BY ์˜ ๋™์ž‘๊ณผ ๋‹ค๋ฆ…๋‹ˆ๋‹ค.

SELECT x AS X_COL, y AS Y_COL 
  FROM test_null_aggregate_functions 
  GROUP BY x, y;
Copy
+-------+-------+
| X_COL | Y_COL |
|-------+-------|
|     1 |     2 |
|     3 |  NULL |
|  NULL |     6 |
|  NULL |  NULL |
+-------+-------+