SQLintersection session – Mastering SQL Server Execution Plan Analysis

I attended the SQL intersection conference in November 2014 in Las Vegas, Nevada.  I am including my notes from the sessions in this blog, primarily for my own reference later.  For anyone that comes across these notes, please take them with a grain of salt – I may very easily have misheard something or have written it down incorrectly.

Session Title:  Mastering SQL Server Execution Plan Analysis
Instructor: Paul White

 

  • SQL technically has “bags” not “sets”
  • Binding order: from, on, join, where, group by, cube/rollup, having, select, distinct, order by, top
    • Can refer to elements defined in previous items
  • “Anybody using Hekaton? No, me either.  Don’t see any need for it.”
  • Query optimization pipeline – multiple steps
  • SQL Server not strongly typed; implicit type conversions can have serious consequences for performance
  • Constant folding: Evaluate deterministic expressions, such as 1 + 1
  • OPTION RECOMPILE vs WITH RECOMPILE
    • Option recompile is very powerful tool, often worth paying the price to recompile for each value of a parameter
    • Does not work if assigning to variable
    • Option recompile enables parameter embedding
    • Can provide more aggressive (better) query plans
    • Substitutes parameters, does constant folding
    • Use with caution!
  • TF 3604 – Output to client
  • TF 8605
    • Outputs tree in message tab
    • Particularly useful to view when plan does not optimize as expected
  • SQL Server does not treat CTEs as a “fence” – i.e., not evaluated separately; rather, rolled into the plan (treated as internal view)
    • Even causes view read counts to increase
  • Simplification
    • Eliminate unused columns
    • Rewrite logical subqueries as inner/outer join/apply
    • Full and outer joins to inner joins (if, i.e., WHERE clause does not allow nulls)
    • Remove redundancies and empty expressions
    • Primarily targets auto-generated SQL
    • WITH QUERYRULEOFF XX
    • Can combine query with things like check constraints to further restrict/simplify query
    • Computed columns (persisted) can be simplified, but only if expression is in the same order!
    • Joins that check for existence but FK exists
      • See MVP Deep Dives Volume 1 for a method of querying against a generic view (Rob Farley) (or here)
      • Unless FK is multicolumn
    • Cardinality estimation
      • Uses histograms, density, trie trees (short strings)
      • Uses derived stats
      • Heavily influences initial join order
      • When writing expressions, always ask how the CE will come up with an estimate
    • New 2014 cardinality estimator
      • The compatibility level of the context DB controls which CE is used
      • TF 2312 = new, TF 9481 = old
      • Lots of square roots (exponential backoff)
        • Example – join 2 tables
          • t0 = most selective
          • t1 = less selective
          • old: t0 * t1
          • new: t0 * sqrt(t1)
        • “If you want an exciting support life, use linked servers and throw in lots of new features”
        • Trivial Plan Queries
          • Queries without joins only
          • Property on root node of query plan: OptimizationLevel = Trivial vs Full
          • Subqueries prevent trival plans, so can use SELECT (SELECT col) FROM tbl to force full optimization
          • TF 8757 also prevents trivial
        • Cost-based optimization
          • Cascades framework (SearchSpace) – readily available
          • TF 8606 – show tree at different stages
          • TF 8612 – Show cardinality estimate (2012+)
        • Memo groups
          • Logical properties per group are the same (column lists)
          • Physical properties vary – different ways explored for doing the same thing
            • Sort order, cardinality est., parallelism, execution mode, Halloween protection
          • Visualization: tree with single node, then add variations to grow tree on out
          • Based on applying rules (395 of them in SQL2014)
            • Most rules are not used most of the time
            • Small set of rules most of the time
          • Works very well in vast majority of cases
          • Query hints work by disabling rules in the optimizer
            • FORCE ORDER query hint disables many rules – has a big impact on plan
            • Other hints imply FORCE ORDER, such as inner LOOP join
            • OPTION (QUERYRULEOFF)
            • DBCC SHOWONRULES – List of rules currently enabled. Examples:
              • JNToNL = logical join to nested loop
              • LOJNToNL = left outer logical join to nested loop
              • FOJNnoneqToSM = full outer join, non-equality, to sort merge
            • DBCC SHOWOFFRULES
            • dm_exec_query_transformation_stats – shows how frequently rules are used
            • See Paul’s query Q07 – which rules for test query
            • TF 8619 (2012+) – also shows applied rules
            • TF 8621 (2012+) – more detailed
          • Optimization phases
            • Transaction processing (Search 0)
              • At least 3 tables
              • Mostly OLTP queries, small number of rows with nested loops
            • Quick plan (Search 1)
              • Subset of rules
              • If cost exceed threshold for parallelism, run second time with parallel enabled
            • Full optimization (Search 2)
              • At least 5 tables
              • If Search1 resulted in serial, only serial plans searched; same for parallel
              • All rules available
            • Costing
              • Always assumes cold cache
              • Doesn’t consider specific hardware, mostly (but does account for CPU cores and memory)
              • Many assumptions that may or may not be true
              • Costing numbers originally mean an estimate of number of seconds on “Nick’s” machines (dev at MS)
              • Now completely meaningless for most purposes
                • Not really valid to compare two queries
                • Not even valid within a query; i.e., if it says sort of 60% of query, doesn’t make it so
                • Useful only as a sign post
              • Plan guides don’t necessarily reduce compilation time
                • Still runs Search 2
                • Guide used to shape search
              • TF 8608 – Shows initial memo groups
              • TF 8675 – Shows search phase info
              • TF 8606 – Shows tree info
              • TF 8615 – Shows final memos
              • TF 8607 – Output tree from optimizer
              • TF 7352 – Copy out tree (similar to plan)
              • Query execution engine
                • The “true” execution plan is called the MXC (“execution context”)
                  • Far more detailed that the user execution plan
                  • Many internal details are hidden
                • The building blocks of the execution plan are “iterators”
                  • Have a generic interface
                • fn_PhysLocFormatter(%%physloc%%)
                  • Returns file, page, slot for row
                • TF 8666 – Adds some debugging data to plan (see the F4 properties, InternalDebuggingInfo)
                • Interesting query with 9 different seeks:
                  • LastName in (‘aaa’, ‘bbb’, ‘ccc’) AND FirstName in (‘ddd’, ‘eee’, ‘fff’)
                  • Plan gives only one index seek, but:
                    • Properties show 9 different seek predicates
                  • Physically does 9 seeks
                  • Limited to 64 seek operations
                  • Beware that seeks can be hidden in this way
                • Filter iterator – predicates than can’t be used in seeks
                  • May have start-up expression: can control whether or not entire subtree is executed at all
                • Residual – hidden filter
                  • Filter predicate is pushed to the child iterator
                  • Because it is hidden, is easy to overlook
                  • “In a lot of ways, this is the major takeaway of the day”
                • TF 9130 – Reveal pushed predicates (doesn’t work for join predicates)
                • Also: If predicate is too large to fit on the tooltip, SSMS will simply leave it off! Have to look at properties window.
                • Nested loops
                  • Be sure to look at “outer references” property
                  • SSMS shows estimated execution stats based on one loop; actual exec stats on aggregated loops, so they generally don’t correlate
                  • If having performance issues with nested loop joins, be sure to check for residual predicates. May be able to rewrite to push predicate further down the tree
                • TF 2324 – Disables implications of predicates
                • Row-mode parallel plans
                • TF 8649 – Force parallel plan
                • If a table has a computed column, and the table is referenced in the query (even if the computed column is not referenced), will not get a parallel query (computed column is considered a scalar function)
                • Batch mode parallel execution – Rather than operating one row at a time, operates on between 64 and 900 at once
                  • Queue up batches; and thread can process next batch
                • Merge Joins – Check if many-to-many property is true; usually indicates a query error
                • See Paul’s blog on residual predicates on hash joins (make sure types match)
                • Bitmap – either simple or Bloom filter bitmap (for parallel plans only)
                • TOP (n) PERCENT must read the entire set and will include an eager spool
                • Sorts have a memory fraction property
                • Spool is a place in tempdb to cache rows in case they may be needed later
                  • Eager spool often indicates a missing index
                • Merge join operator can be performing a UNION ALL rather than a standard join!
                • SSMS update operator can actually be updating multiple indexes; view properties to see for sure
                • Wide vs. Narrow update plans
                  • Wide: Each index updated by separate operator
                  • Narrow: Multiple indexes updated by one operator
                • Split, Sort, Collapse operators in a row generally indicates that a unique index is being updated
                • TF 4138 (documented, supported) – Disable row goal behavior
                  • TOP adds row goal behavior
                • Queries that execute LIKE against a variable can have a Constant Expression that computes at run-time the range begin/end values
                • GetRangeWithMismatchedTypes (only in XML) indicates a type mismatch that must be converted to a range (i.e., datetime to date)
                • One way to handle catch-all search queries (rather than dynamic SQL or option recompile) is to use a columnstore index

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.