Skip to content

Instantly share code, notes, and snippets.

@javisantana
Last active November 18, 2020 12:28
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save javisantana/461157c4dd00cc15cfc111feb27f0893 to your computer and use it in GitHub Desktop.
Save javisantana/461157c4dd00cc15cfc111feb27f0893 to your computer and use it in GitHub Desktop.

Looks like filters are pushed down when filtering an "UNION ALL". Also an example on how to use EXPLAIN in clickhouse and a different view of seeing what is going on with the traces, this lines show how much data clickhouse is reading:

Selected 1 parts by date, 1 parts by key, 2 marks by primary key, 2 marks to read from 1 ranges
Reading approx. 16384 rows with 1 streams

The example

➜  tests clickhouse client -q "select version()"
20.8.2.3


➜  tests cat union_all_push_down.sql 


drop table if exists a;
drop table if exists b;

create table a (
    a Date,
    b Int32
) Engine = MergeTree() order by a;

create table b (
    a Date,
    b Int32
) Engine = MergeTree() order by a;


insert into a select today() - toIntervalDay(number/1000) as a, number as b from numbers(1000000);
insert into b select today() - toIntervalDay(number/1000) as a, number as b from numbers(1000000);

explain plan
select count() from (
    select * from a 
    union all
    select * from b
)
where a > today() - interval 3 day;


➜  tests cat union_all_push_down.sql | clickhouse client --send_logs_level=trace -mn  2>&1 | grep SelectExecutor
[xyz] 2020.11.18 13:23:13.662368 [ 46219 ] {e3edaa54-3a22-4e06-8119-4e6a2b2912c3} <Debug> default.a (SelectExecutor): Key condition: (column 0 in [18582, +inf))
[xyz] 2020.11.18 13:23:13.662407 [ 46219 ] {e3edaa54-3a22-4e06-8119-4e6a2b2912c3} <Trace> default.a (SelectExecutor): Used optimized inclusion search over index for part all_1_1_0 with 16 steps
[xyz] 2020.11.18 13:23:13.662416 [ 46219 ] {e3edaa54-3a22-4e06-8119-4e6a2b2912c3} <Debug> default.a (SelectExecutor): Selected 1 parts by date, 1 parts by key, 2 marks by primary key, 2 marks to read from 1 ranges
[xyz] 2020.11.18 13:23:13.662453 [ 46219 ] {e3edaa54-3a22-4e06-8119-4e6a2b2912c3} <Trace> default.a (SelectExecutor): Reading approx. 16384 rows with 1 streams
[xyz] 2020.11.18 13:23:13.662567 [ 46219 ] {e3edaa54-3a22-4e06-8119-4e6a2b2912c3} <Debug> default.b (SelectExecutor): Key condition: (column 0 in [18582, +inf))
[xyz] 2020.11.18 13:23:13.662588 [ 46219 ] {e3edaa54-3a22-4e06-8119-4e6a2b2912c3} <Trace> default.b (SelectExecutor): Used optimized inclusion search over index for part all_1_1_0 with 16 steps
[xyz] 2020.11.18 13:23:13.662600 [ 46219 ] {e3edaa54-3a22-4e06-8119-4e6a2b2912c3} <Debug> default.b (SelectExecutor): Selected 1 parts by date, 1 parts by key, 2 marks by primary key, 2 marks to read from 1 ranges
[xyz] 2020.11.18 13:23:13.662618 [ 46219 ] {e3edaa54-3a22-4e06-8119-4e6a2b2912c3} <Trace> default.b (SelectExecutor): Reading approx. 16384 rows with 1 streams


➜  tests cat union_all_push_down.sql | clickhouse client -mn                                                    
Expression (Projection)
  Expression (Before ORDER BY and SELECT)
    Aggregating
      Expression (Before GROUP BY)
        Filter (WHERE)
          Union
            Expression (Projection)
              Expression (Before ORDER BY and SELECT)
                Filter (WHERE)
                  ReadFromStorage (Read from MergeTree)
            Expression (Projection)
              Expression (Before ORDER BY and SELECT)
                Filter (WHERE)
                  ReadFromStorage (Read from MergeTree)
                  
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment