SQLintersection session – Parallel Query Execution, Part 1

I attended the SQL intersection conference in November 2014 in Las Vegas, Nevada.  I am including my notes from the sessions in this blog, primarily for my own reference later.  For anyone that comes across these notes, please take them with a grain of salt – I may very easily have misheard something or have written it down incorrectly.

Session Title:  Parallel Query Execution, Part 1
Instructor: Paul White

Within a parallel plan, each thread is effectively running a serial plan

  • Parallel seeks and scans
    • 2005 and earlier use parallel page provider
    • 2008+ use range enumerators
      • May see ACCESS_METHODS_DATASET_PARENT latch waits
    • Only forward scans and seeks are supported
      • Backward scans can only happen serially
    • Exchange types
      • Hash – Distributed based on hash function
      • Round Robin – Sent to next available consumer
      • Broadcast – Pushed to all consumers
      • Demand – Row at a time (rare)
      • Range – Mostly index build plans
    • Exchanges can be sort order preserving or not
    • Root node has property called ThreadStats
      • Will show number of threads reserved and number of threads used
      • May not use all reserved threads depending on timing of thread allocated in the query (can reuse threads that are already complete)
    • exec_context_id is the thread number
      • 0 = serial
    • Query to identify node id where CXPACKET wait may be happening
    • Parallel merge join requires order preservation
      • Does not scale well to high DOP
    • Parallel hash join
      • Generally scales well to high DOP
      • Bitmap reduction – can only appear in parallel plans, and can greatly reduce work to be done (i.e., early reduction)
    • Parallel nested loops join
      • Inner side runs serially
      • Optimizer is “biased” against them
      • TF 8649 – Force parallel plan
      • Various tricks or plan guides to force usage
        • Such as specifying row goals
        • Express queries as APPLY – As query is pushed toward parallel nested loop queries, can push toward smaller memory grants and more efficient use of resources
      • Parallel sorts
        • Observer memory fraction percentage
        • Tricks to avoid spills
          • Cast strings to larger strings (i.e., select cast(str as varchar(100))) to force larger memory grant
          • Split sort into smaller sorts (initial select into temp table, then cross apply to temp table)
        • Avoid too much parallelism – set the cost threshold for parallelism
        • Parallel deadlocks
          • Once deadlock is detected, contents of CXPACKET buffers spill to tempdb
          • Interesting crafted demo that causes spills only at even DOPs (but not with odd)!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.