SQLintersection session – Squeezing the Most out of Row and Page Compression

I attended the SQL intersection conference in November 2014 in Las Vegas, Nevada.  I am including my notes from the sessions in this blog, primarily for my own reference later.  For anyone that comes across these notes, please take them with a grain of salt – I may very easily have misheard something or have written it down incorrectly.

Session Title:  Squeezing the Most out of Row and Page Compression
Instructor: Andrew J. Kelly

A highly underutilized feature in SQL Server

    • Row compression
      • Fixed length columns stored as variable
      • Does not compress BLOB/LOB (i.e., LOB pages)
      • Storage format is a superset of vardecimal
      • Overhead for more metadata (i.e., variable lengths), but optimized for this
      • Recommended for high update systems
      • Overall better compression than page compression
    • Page compression
      • Does row compression first
      • Once page is full, further compression based on column and page dictionary
      • Also no BLOB/LOG (LOB pages)
      • Recommended for high read workloads
    • Can be enabled at table, index (clustered and/or nonclustered), partition level
      • Partition can be some or all
    • sp_estimate_data_compression_savings
      • Copies sample into tempdb and compresses it; extrapolate to entire table
    • Can be enabled or disabled online (essentially does an online rebuild)
    • Nulls and zero stored as four bits
    • No trailing spaces on char columns
    • Column prefix
      • Determine common prefix on columns
      • Even on binary and integer data
      • Stores prefixes, then uses pointers to prefixes
    • Page dictionary
      • Find repeating columns
      • Store dictionary, use pointer to dictionary entries
    • If bulk-inserting or inserting into a heap with page compression, will only get row compression
      • Unless TABLOCK is specified
    • May increase size of compressed backups
    • Don’t compress small objects without a good reason

SQLintersection session – A Tour through C# and Visual Basic Language Changes

I attended the SQL intersection conference in November 2014 in Las Vegas, Nevada.  I am including my notes from the sessions in this blog, primarily for my own reference later.  For anyone that comes across these notes, please take them with a grain of salt – I may very easily have misheard something or have written it down incorrectly.

This is one of two development session I attended.

Session Title:  A Tour through C# and Visual Basic Language Changes
Instructor: Kathleen Dollard

Visual Studio v14 is coded in native language (C#, VB)

    • Open source, transparent, etc.
    • Language extensibilities through analyzers
    • Language features have cleaner semantics
      • Some of these are in flux and may not happen
      • public int Id {get;} [=1;]
        • Can only init during construction; or, have initializer in code
      • Remove class name from static method calls
      • Initializers for dictionaries
        • new Dictionary<string, Customer> = { [“A”] = new Cust(…), [“B”] = new Cust(…) }
      • await in catch/finally blocks
      • Conditional catch (exception filters)
        • catch (Exception ex) if (ex.Code == 4) { … }
      • Expression body members
        • public string FullName => FirstName + “ “ + LastName;
      • Null propagation (aka null conditionals; sometimes call Elvis operator!)
        • x ?. x.Y
        • Evaluates to null if x is null, else evaluates to x.Y
        • Example
          • If (Event != null) { Event.Invoke (…); }
          • Now becomes: Event?.Event.Invoke (…)
        • array ?[index]
          • Checks if index has value (?)
        • String interpolation
          • Format(“{0} {1} {2}”, x, y, z)
          • Can now be written as $”\{x} \{y} \{z}”  (maybe will require \ prefix instead of $; still in limbo)
        • nameof
          • void method(int abc) { if (abc < 0} throw new ArgumentOutOfRangeException(nameof(abc)); }
        • Visual Studio features
          • Breakpoint peek
            • Debug breakpoints (conditions, actions)
            • Tip: set hit count = 999999, will show actual hit count on exception
            • Tip: if breakpoints don’t hit, close VS, delete sou files and restart VS
          • Performance tips
            • Stepping through code will show elapsed time for running
          • Name refactoring
            • Ctrl RR while on an identifier will change all references
          • Other refactoring
            • Select code block, extract block
          • Touch to zoom or scroll
          • Improved coloration (i.e., hover tips on identifier)
          • Word part intelligence (ArgExp will find ArgumentException)
          • Proper case menus

Peek in XAML

SQLintersection session – Anatomy of a Data Modification: Locking and Isolation

I attended the SQL intersection conference in November 2014 in Las Vegas, Nevada.  I am including my notes from the sessions in this blog, primarily for my own reference later.  For anyone that comes across these notes, please take them with a grain of salt – I may very easily have misheard something or have written it down incorrectly.

Session Title:  Anatomy of a Data Modification: Locking and Isolation
Instructor: Kimberly Tripp

  • Versioning is a powerful and underused mode that can solve many problems
  • SQL uses locks on resources as indicators (i.e., intent shared, intent exclusive)
  • Use SSD for most random IO drives; for server consolidation scenarios, this may be a common log drive
  • SQL Server can be in 1 of 4 states at any given time
    • Locking only
    • Locking + versioning
      • RCSI (statement-level read consistency)
      • Snapshot isolation (transaction-level read consistency)
      • RCSI + snapshot isolation
    • Lock compatibility
      • X – not compatible with IX, IS, S
      • IX – not compatible with S; is compatible with IS
      • IS – compatible with S
    • dm_exec_sessions: isolation level + session setting
      • 1 = Read uncommitted
      • 2 = Read committed (no distinction between locking vs. versioning)
      • 3 = Repeatable reads
      • 4 = Serializable
      • 5 = Snapshot
    • Whitepapers

SQLintersection session – Performance Troubleshooting Using Waits and Latches

I attended the SQL intersection conference in November 2014 in Las Vegas, Nevada.  I am including my notes from the sessions in this blog, primarily for my own reference later.  For anyone that comes across these notes, please take them with a grain of salt – I may very easily have misheard something or have written it down incorrectly.

Session Title:  Performance Troubleshooting Using Waits and Latches
Instructor: Paul Randal

 

  • Keep historical snapshots of wait stats
  • See 2014 whitepaper on using wait stats
  • dm_os_schedulers; interesting columns
    • visible_online, visible_online (dac), hidden_online
    • context_switch: number of context switches
    • loads
    • quantum_length_us: length of the scheduler’s quantum in microseconds (time until the thread must yield)
  • Waiter list is completely unordered. Each thread will be notified when its resource is available, and can be notified in any order.  It is then moved to the runnable list
  • No limit to how many threads are on the waiter list
  • No limit to how long a thread can be on the list
  • When a thread’s quantum is exhausted, it is automatically moved to the runnable list
    • “Check for abort or yield” function called by thread – SQLOS will notify if quantum is exhausted or if the user has cancelled
    • NON_YIELDING_THREAD error is always a bug. Must bounce SQL to kill it
    • Must yield even if no other threads are on the scheduler
  • Threads on the waiter list do not periodically wake up to see if resource is available
  • When a thread owning a lock releases the lock and another thread is waiting on the resource, it transfers ownership to the second thread and moves it off the waiter list. That is, the thread is “signaled.”
  • Time that a thread sits on the runnable queue is called signal wait time. This is usually very small and generally not of interest.
  • Runnable queue is ordered as first-in, first-out
    • Special case: Resource governor allows for threads to be given priority. However, this is hard to set up and nobody uses it
  • dm_os_schedulers.runnable_tasks_count. If this amount is large (perhaps >= 10 per scheduler) this indicates CPU pressure
  • Waits happen. It is simply a part of SQL Server.
  • Wait time = resource wait time + signal wait time
  • dm_os_waiting_tasks – What is happening right now?
  • dm_os_wait_stats – What has happened
    • Aggregate view of wait stats since they were last cleared
    • Capture snapshots over time. Every 15-30 minutes should be sufficient
    • Figure out how long to retain
  • Coming by about the end of the year: sqlskills.com will have a repository to wait types! (and latch types)
  • Capture waits via extended events: sqlos.wait_info event
  • A couple of demos on collecting XEvent, including SQL Server call stacks!
  • Latch whitepaper [.pdf]
  • Superlatches – efficiency mechanism on NUMA nodes
    • Can produce benign error message in log: “Failure to allocate super-latch promotion threshold” – nothing to do about it
  • dm_os_latch_stats: Ignore unless wait stats show latch waits
  • Clearing wait/latch stats: DBCC SQLPERF
  • Spinlock – synch mechanism that is lighter than a latch
    • Uses CPU-level instruction to get a bit
    • Tries a number of times (500 to 1000); if still cannot acquire will backoff and sleep on the CPU, then try again
    • Does not yield thread, so runnable threads have to wait –high signal wait times will be the result
    • In practice, this does not result in a SOS_NONYIELDING thread scenario (which takes about 60 seconds to come up)
    • Rare to see, but can happen
    • Diagnosing requires lots of internals knowledge; see this whitepaper [.pdf]
  • Common wait types
    • CXPACKET – parallelization
    • OLEDB – Linked servers or DMVs
    • ASYNC_NETWORK_IO – TDS packets sent to client; awaiting ack.
  • Common latch types
    • ACCESS_METHODS_DATASET_PARENT – multithreading where worker thread is getting next set of work
    • ACCESS_METHODS_SCAN_RANGE_GENERATION
    • ACCESS_METHODS_HOBT_COUNT
  • Wait types
    • PAGEIOLATCH_XX – Waiting for data to be read into memory
      • XX = SH – Page to be read
      • XX = EX – Page to be written
    • Determining which table a page belong to
  • dm_os_buffer_descriptors – one row per page in the buffer pool
  • dm_os_virtual_file_stats
    • For each DB file – IOs, etc.
    • Snapshot this table from time to time
  • PAGE_LATCH_XX
    • Waiting for access to memory page
    • Often indicates tempdb contention
    • TF 1118 – Removes SGAM contention
      • Also, TF 3226 – Don’t write backup success messages to the log
      • TF 4199 – changes query opt., not necessarily recommended for all systems
    • For PFS contention, add move tempdb files
      • If logical cores < 8, tempdb files = number of logical cores
      • Else use 8 tempdb files; if contention in tempdb not alleviated, add 4 more at a time until problem is alleviated
    • May also indicate excessive page splits or an insert hotspot
  • LCK_M_XX wait
    • Analyzing – see this Michael Swart blog post
    • Especially look for LCK_M_RS_XX; indicates serializable isolation level
    • On SharePoint systems consider using snapshot isolation
  • Row locks escalate to table locks
  • Page locks escalate to table locks
  • Row locks never escalate to page locks
  • WRITELOG wait
    • LOGBUFFER wait – copying log block to log cache
    • Once log block is written, waits for SPID 1 (the log writer) to asynchronously flush the log to disk; when done, signals the original thread
    • Limit on how many log writes per database can be enqueued at once
      • Pre-2012: 32
      • 2012+: 112
    • Can be symptomatic of page splits or many small txns
    • Delayed durability
      • Only enable is SLA has data loss permissible
      • Can significantly improve throughput on small transactions
    • CXPACKET
      • Completely normal on thread 0 (execution_context_id = 0) (controller thread)
      • Only of concern if wait is high on a thread other than 0
      • See Bob Door’s blog post on how to track foreign memory access (foreign pages in performance monitor)
      • Consider increasing cost threshold for parallelism
    • SOS_SCHEDULER_YIELD
      • Have to use XEvents to view call stacks to really identify source!
    • LATCH_XX
      • Latch class largely controls understanding of root cause; see this resource page
    • FGCB_ADD_REMOVE latches
      • Often auto-growth related
    • DBCC_XX latches
      • Related to CHECKDB
      • Nothing to do about these
      • Note that persisted computed columns can significantly increase latching (DBCC_OBJECT_METADATA)
    • PREEMPTION_OS_XX waits
      • Call-out to Windows
      • XX = name of Win API method; Google for “MSDN XX”
    • ASYNC_NETWORK_IO waits
      • Client is likely doing RBAR on the resultset rather than caching the results client-side

Many, many other wait types

SQLintersection session – Mastering SQL Server Execution Plan Analysis

I attended the SQL intersection conference in November 2014 in Las Vegas, Nevada.  I am including my notes from the sessions in this blog, primarily for my own reference later.  For anyone that comes across these notes, please take them with a grain of salt – I may very easily have misheard something or have written it down incorrectly.

Session Title:  Mastering SQL Server Execution Plan Analysis
Instructor: Paul White

 

  • SQL technically has “bags” not “sets”
  • Binding order: from, on, join, where, group by, cube/rollup, having, select, distinct, order by, top
    • Can refer to elements defined in previous items
  • “Anybody using Hekaton? No, me either.  Don’t see any need for it.”
  • Query optimization pipeline – multiple steps
  • SQL Server not strongly typed; implicit type conversions can have serious consequences for performance
  • Constant folding: Evaluate deterministic expressions, such as 1 + 1
  • OPTION RECOMPILE vs WITH RECOMPILE
    • Option recompile is very powerful tool, often worth paying the price to recompile for each value of a parameter
    • Does not work if assigning to variable
    • Option recompile enables parameter embedding
    • Can provide more aggressive (better) query plans
    • Substitutes parameters, does constant folding
    • Use with caution!
  • TF 3604 – Output to client
  • TF 8605
    • Outputs tree in message tab
    • Particularly useful to view when plan does not optimize as expected
  • SQL Server does not treat CTEs as a “fence” – i.e., not evaluated separately; rather, rolled into the plan (treated as internal view)
    • Even causes view read counts to increase
  • Simplification
    • Eliminate unused columns
    • Rewrite logical subqueries as inner/outer join/apply
    • Full and outer joins to inner joins (if, i.e., WHERE clause does not allow nulls)
    • Remove redundancies and empty expressions
    • Primarily targets auto-generated SQL
    • WITH QUERYRULEOFF XX
    • Can combine query with things like check constraints to further restrict/simplify query
    • Computed columns (persisted) can be simplified, but only if expression is in the same order!
    • Joins that check for existence but FK exists
      • See MVP Deep Dives Volume 1 for a method of querying against a generic view (Rob Farley) (or here)
      • Unless FK is multicolumn
    • Cardinality estimation
      • Uses histograms, density, trie trees (short strings)
      • Uses derived stats
      • Heavily influences initial join order
      • When writing expressions, always ask how the CE will come up with an estimate
    • New 2014 cardinality estimator
      • The compatibility level of the context DB controls which CE is used
      • TF 2312 = new, TF 9481 = old
      • Lots of square roots (exponential backoff)
        • Example – join 2 tables
          • t0 = most selective
          • t1 = less selective
          • old: t0 * t1
          • new: t0 * sqrt(t1)
        • “If you want an exciting support life, use linked servers and throw in lots of new features”
        • Trivial Plan Queries
          • Queries without joins only
          • Property on root node of query plan: OptimizationLevel = Trivial vs Full
          • Subqueries prevent trival plans, so can use SELECT (SELECT col) FROM tbl to force full optimization
          • TF 8757 also prevents trivial
        • Cost-based optimization
          • Cascades framework (SearchSpace) – readily available
          • TF 8606 – show tree at different stages
          • TF 8612 – Show cardinality estimate (2012+)
        • Memo groups
          • Logical properties per group are the same (column lists)
          • Physical properties vary – different ways explored for doing the same thing
            • Sort order, cardinality est., parallelism, execution mode, Halloween protection
          • Visualization: tree with single node, then add variations to grow tree on out
          • Based on applying rules (395 of them in SQL2014)
            • Most rules are not used most of the time
            • Small set of rules most of the time
          • Works very well in vast majority of cases
          • Query hints work by disabling rules in the optimizer
            • FORCE ORDER query hint disables many rules – has a big impact on plan
            • Other hints imply FORCE ORDER, such as inner LOOP join
            • OPTION (QUERYRULEOFF)
            • DBCC SHOWONRULES – List of rules currently enabled. Examples:
              • JNToNL = logical join to nested loop
              • LOJNToNL = left outer logical join to nested loop
              • FOJNnoneqToSM = full outer join, non-equality, to sort merge
            • DBCC SHOWOFFRULES
            • dm_exec_query_transformation_stats – shows how frequently rules are used
            • See Paul’s query Q07 – which rules for test query
            • TF 8619 (2012+) – also shows applied rules
            • TF 8621 (2012+) – more detailed
          • Optimization phases
            • Transaction processing (Search 0)
              • At least 3 tables
              • Mostly OLTP queries, small number of rows with nested loops
            • Quick plan (Search 1)
              • Subset of rules
              • If cost exceed threshold for parallelism, run second time with parallel enabled
            • Full optimization (Search 2)
              • At least 5 tables
              • If Search1 resulted in serial, only serial plans searched; same for parallel
              • All rules available
            • Costing
              • Always assumes cold cache
              • Doesn’t consider specific hardware, mostly (but does account for CPU cores and memory)
              • Many assumptions that may or may not be true
              • Costing numbers originally mean an estimate of number of seconds on “Nick’s” machines (dev at MS)
              • Now completely meaningless for most purposes
                • Not really valid to compare two queries
                • Not even valid within a query; i.e., if it says sort of 60% of query, doesn’t make it so
                • Useful only as a sign post
              • Plan guides don’t necessarily reduce compilation time
                • Still runs Search 2
                • Guide used to shape search
              • TF 8608 – Shows initial memo groups
              • TF 8675 – Shows search phase info
              • TF 8606 – Shows tree info
              • TF 8615 – Shows final memos
              • TF 8607 – Output tree from optimizer
              • TF 7352 – Copy out tree (similar to plan)
              • Query execution engine
                • The “true” execution plan is called the MXC (“execution context”)
                  • Far more detailed that the user execution plan
                  • Many internal details are hidden
                • The building blocks of the execution plan are “iterators”
                  • Have a generic interface
                • fn_PhysLocFormatter(%%physloc%%)
                  • Returns file, page, slot for row
                • TF 8666 – Adds some debugging data to plan (see the F4 properties, InternalDebuggingInfo)
                • Interesting query with 9 different seeks:
                  • LastName in (‘aaa’, ‘bbb’, ‘ccc’) AND FirstName in (‘ddd’, ‘eee’, ‘fff’)
                  • Plan gives only one index seek, but:
                    • Properties show 9 different seek predicates
                  • Physically does 9 seeks
                  • Limited to 64 seek operations
                  • Beware that seeks can be hidden in this way
                • Filter iterator – predicates than can’t be used in seeks
                  • May have start-up expression: can control whether or not entire subtree is executed at all
                • Residual – hidden filter
                  • Filter predicate is pushed to the child iterator
                  • Because it is hidden, is easy to overlook
                  • “In a lot of ways, this is the major takeaway of the day”
                • TF 9130 – Reveal pushed predicates (doesn’t work for join predicates)
                • Also: If predicate is too large to fit on the tooltip, SSMS will simply leave it off! Have to look at properties window.
                • Nested loops
                  • Be sure to look at “outer references” property
                  • SSMS shows estimated execution stats based on one loop; actual exec stats on aggregated loops, so they generally don’t correlate
                  • If having performance issues with nested loop joins, be sure to check for residual predicates. May be able to rewrite to push predicate further down the tree
                • TF 2324 – Disables implications of predicates
                • Row-mode parallel plans
                • TF 8649 – Force parallel plan
                • If a table has a computed column, and the table is referenced in the query (even if the computed column is not referenced), will not get a parallel query (computed column is considered a scalar function)
                • Batch mode parallel execution – Rather than operating one row at a time, operates on between 64 and 900 at once
                  • Queue up batches; and thread can process next batch
                • Merge Joins – Check if many-to-many property is true; usually indicates a query error
                • See Paul’s blog on residual predicates on hash joins (make sure types match)
                • Bitmap – either simple or Bloom filter bitmap (for parallel plans only)
                • TOP (n) PERCENT must read the entire set and will include an eager spool
                • Sorts have a memory fraction property
                • Spool is a place in tempdb to cache rows in case they may be needed later
                  • Eager spool often indicates a missing index
                • Merge join operator can be performing a UNION ALL rather than a standard join!
                • SSMS update operator can actually be updating multiple indexes; view properties to see for sure
                • Wide vs. Narrow update plans
                  • Wide: Each index updated by separate operator
                  • Narrow: Multiple indexes updated by one operator
                • Split, Sort, Collapse operators in a row generally indicates that a unique index is being updated
                • TF 4138 (documented, supported) – Disable row goal behavior
                  • TOP adds row goal behavior
                • Queries that execute LIKE against a variable can have a Constant Expression that computes at run-time the range begin/end values
                • GetRangeWithMismatchedTypes (only in XML) indicates a type mismatch that must be converted to a range (i.e., datetime to date)
                • One way to handle catch-all search queries (rather than dynamic SQL or option recompile) is to use a columnstore index