Hlavní navigace

Názor k článku Korelované vnořené dotazy: proč nepoužívat a čím nahradit od Pavel Stěhule - Dobrý den, dobrá připomínka, na SELF JOIN jsem si...

  • Článek je starý, nové názory již nelze přidávat.
  • 15. 3. 2008 19:53

    Pavel Stěhule

    Dobrý den,

    dobrá připomínka, na SELF JOIN jsem si nevzpomněl - a máte pravdu, že vůči korelovanému dotazu je to stále vynikající řešení (a to bez toho, že bych se uchýlil k uloženým procedurám).

    --bez indexu
    postgres=# explain analyze select t1.id, t1.sale_date, t1.product, t1.sale_price, sum(t2.sale_price) from 
    history t1 inner join history t2 on t1.id >= t2.id and t1.product = t2.product
    group by t1.id, t1.sale_date, t1.product, t1.sale_price
    order by t1.id                                          
    ;
                                                                  QUERY PLAN                                                              
    --------------------------------------------------------------------------------------------------------------------------------------
     Sort  (cost=8241.68..8254.18 rows=5001 width=19) (actual time=2073.968..2081.068 rows=5001 loops=1)
       Sort Key: t1.id
       Sort Method:  quicksort  Memory: 480kB
       ->  HashAggregate  (cost=7871.91..7934.42 rows=5001 width=19) (actual time=2049.964..2060.161 rows=5001 loops=1)
             ->  Merge Join  (cost=824.54..6521.46 rows=108036 width=19) (actual time=83.099..1632.142 rows=171001 loops=1)
                   Merge Cond: ((t1.product)::text = (t2.product)::text)
                   Join Filter: (t1.id >= t2.id)
                   ->  Sort  (cost=412.27..424.77 rows=5001 width=15) (actual time=43.820..51.850 rows=5001 loops=1)
                         Sort Key: t1.product
                         Sort Method:  quicksort  Memory: 324kB
                         ->  Seq Scan on history t1  (cost=0.00..105.01 rows=5001 width=15) (actual time=0.124..10.570 rows=5001 loops=1)
                   ->  Sort  (cost=412.27..424.77 rows=5001 width=11) (actual time=39.241..605.987 rows=337001 loops=1)
                         Sort Key: t2.product
                         Sort Method:  quicksort  Memory: 324kB
                         ->  Seq Scan on history t2  (cost=0.00..105.01 rows=5001 width=11) (actual time=0.064..8.448 rows=5001 loops=1)
     Total runtime: 2087.921 ms
    (16 rows)
    
    postgres=# explain analyze SELECT sale_date, product, sale_price, 
                      COALESCE((SELECT SUM(sale_price) 
                                   FROM history 
                                  WHERE product = o.product 
                                    AND id <= o.id), 0) as total from history o; query plan ----------------------------------------------------------------------------------------------------------------------------------------- seq scan on o (cost=0.00..438222.42 rows=5001 width=15) (actual time=0.193..8727.369 loops=1) subplan ->  Aggregate  (cost=87.60..87.61 rows=1 width=4) (actual time=1.735..1.736 rows=1 loops=5001)
               ->  Index Scan using history_pkey on history  (cost=0.00..87.59 rows=1 width=4) (actual time=0.804..1.670 rows=34 loops=5001)
                     Index Cond: (id <= $1) filter: ((product)::text 
    total runtime: 8746.833 ms (7 rows) 
    postgres=# create index fxxx on history (product,id);
    CREATE INDEX
    Time: 59,664 ms
    postgres=# explain analyze select t1.id, t1.sale_date, t1.product, t1.sale_price, sum(t2.sale_price) from 
    history t1 inner join history t2 on t1.id >= t2.id and t1.product = t2.product
    group by t1.id, t1.sale_date, t1.product, t1.sale_price
    order by t1.id
    ;
                                                                   QUERY PLAN                                                               
    ----------------------------------------------------------------------------------------------------------------------------------------
     Sort  (cost=3541.92..3554.42 rows=5001 width=19) (actual time=1524.078..1531.054 rows=5001 loops=1)
       Sort Key: t1.id
       Sort Method:  quicksort  Memory: 480kB
       ->  HashAggregate  (cost=3172.15..3234.66 rows=5001 width=19) (actual time=1500.061..1511.894 rows=5001 loops=1)
             ->  Nested Loop  (cost=0.00..1821.70 rows=108036 width=19) (actual time=0.345..1072.113 rows=171001 loops=1)
                   ->  Seq Scan on history t1  (cost=0.00..105.01 rows=5001 width=15) (actual time=0.118..7.966 rows=5001 loops=1)
                   ->  Index Scan using fxxx on history t2  (cost=0.00..0.33 rows=1 width=11) (actual time=0.022..0.087 rows=34 loops=5001)
                         Index Cond: (((t2.product)::text = (t1.product)::text) AND (t1.id >= t2.id))
     Total runtime: 1537.842 ms
    (9 rows)
    
    Time: 1541,110 ms
    postgres=# explain analyze select * from report1();
                                                        QUERY PLAN                                                     
    -------------------------------------------------------------------------------------------------------------------
     Function Scan on report1  (cost=0.00..260.00 rows=1000 width=44) (actual time=227.831..234.816 rows=5001 loops=1)
     Total runtime: 241.232 ms
    (2 rows)
    
    Time: 241,836 ms
    
    

    U mne je SELF JOIN cca 4x rychlejší než varianta používající korelovaný poddotaz. Nicméně uložená procedura běžela 0.27 ms, tj. minimálně 8x rychleji (JOIN), cca 24x rychleji než korelovaný dotaz - což je dost na to, abych preferoval uloženou proceduru nebo na Oraclu analytický dotaz.

    díky za opravu a doplnění