Advanced Oracle Troubleshooting v.2.1

by Tanel Põder

This seminar is concentrated entirely on Oracle troubleshooting – understanding what exactly is Oracle doing right now or what was it doing when the problem occurred. You will gain the skill to systematically work out the reasons for crashes, hangs, bad performance or other misbehavior.

The seminar takes a holistic approach for end-to-end troubleshooting. It will explain the full lifecycle of a database request, from database client libraries and network to Oracle database kernel and underlying OS. For each layer a troubleshooting technique is provided, along with advice on using the right tool for the right problem at the right time.

The seminar days are full of intensive learning, reading dumps, stack traces and querying advanced V$/X$ views. You’ll also be using OS-level tools and custom scripts provided to you for real-time and post-mortem diagnosis.

The emphasis is put on practical troubleshooting, the safety comes first and most techniques are designed to require no change to database schemas or instance parameters.

 

Objectives:

  • Understand internals of core Oracle kernel components and their interaction with OS
  • Be proficient in systematic end-to-end troubleshooting, using the right tool for right problem
  • Have the ability to proactively eliminate potential database performance issues

 

Non-Objectives:

  • This seminar is not a beginner database tuning course
  • This seminar is not an SQL tuning or CBO course (and it is not a SQL developer class)

 

You will learn:

  • Systematic approach for advanced Oracle troubleshooting and performance tuning
  • Oracle, OS interaction and using OS-level tools for troubleshooting Oracle from "outside"
  • Using the full power of Oracle’s built in instrumentation for troubleshooting
  • Troubleshooting crashes, ORA-600’s and complex hangs
  • Troubleshooting latch and mutex contention
  • Troubleshooting enqueue lock contention and deadlocks
  • Troubleshooting Oracle private, shared and OS memory problems
  • Troubleshooting Oracle cursor management and library cache problems
  • Troubleshooting buffer cache, logical and physical I/O problems
  • Troubleshooting undo errors, excessive redo and transaction related problems

Course Topics

1. Systematic approach for Advanced Oracle Troubleshooting
    • How to pick the correct starting point for troubleshooting and how to drill down systematically
    • Right tool for the right problem: understanding the advantages and shortcomings of different performance troubleshooting tools
    • Getting the most out of Oracle's instrumentation
    • Oracle Wait Interface internals
    • Using wait events, dynamic performance counters and metrics for troubleshooting
    • Using Active Session History and Time Model metrics
    • Troubleshooting high CPU usage
2. Troubleshooting physical IO and buffer cache issues
    • Troubleshooting IO problems, at Oracle database and OS level
    • Troubleshooting excessive IO
    • Troubleshooting slow IO
    • Using IO wait histograms
    • Measuring OS-level and hardware IO performance
    • Understanding iostat output
    • Troubleshooting buffer busy waits
    • Troubleshooting DBWR and free buffer waits
    • Configuring the OS IO layer optimally for Oracle
3. Troubleshooting commit, redo, undo and transaction issues
    • Troubleshooting unexpected TX and TM enqueue waits
    • Troubleshooting long running transactions
    • Troubleshooting undo generation and ORA-1555 errors
    • Troubleshooting excessive redolog generation and "log buffer space" waits
    • Troubleshooting LGWR, "log file sync" waits and commit performance
4. Troubleshooting Oracle SGA/PGA/UGA and OS memory issues
    • PGA/UGA memory usage troubleshooting and monitoring
    • Finding which cursors and operations consume the most of the private memory
    • Troubleshooting ORA-4030 errors
    • Troubleshooting SGA_TARGET, MEMORY_TARGET and SGA resizing related problems
    • Troubleshooting OS memory shortage and excessive paging
5. Troubleshooting shared pool and library cache memory allocation issues
    • Shared pool and SGA memory allocation internals
    • Troubleshooting shared pool memory usage and excessive flushing
    • Understanding and troubleshooting ORA-4031 errors
    • Troubleshooting library cache cursor sharing issues
    • Troubleshooting cursor leaks
    • Optimal cursor management strategy for high performance applications
6. Troubleshooting enqueue lock waits and deadlocks
    • Understanding commonly used enqueue types and their meaning
    • Mapping enqueue wait events and the exact item waited for using lock identifiers
    • Reading enqueue deadlock / ORA-60 traces
    • Making Oracle dump more information about deadlocks
    • Tracing enqueue locking activity
    • We will look into multiple enqueue waits which can cause headache, such as:
      • enq: TX - contention
      • enq: TM - contention
      • enq: KO - fast object checkpoint
      • enq: RO - fast object reuse
      • enq: CF - contention
      • enq: HW - contention
      • enq: HV - contention
      • enq: TS - contention
      • enq: TT - contention
      • enq: TO - contention
      • enq: SQ - contention
7. Troubleshooting latch contention
    • Oracle latching implementation and internals
    • Troubleshooting the classic latch contention waits such as:
      • latch: shared pool
      • latch: library cache / latch: library cache pin
      • latch: cache buffers chains
      • latch: cache buffers lru chain
      • latch: object queue header operation
      • latch: row cache objects
      • latch: undo global data
    • Identifying the problematic child latch from generic "latch free" wait events
    • Advanced latch contention troubleshooting using V$LATCHHOLDER and X$KSUPRLAT
    • Identifying the session and exact reason causing the latch to be busy
    • Systematically identifying the hot block causing cache buffer chains latch contention
8. Troubleshooting Library cache, Mutex and "cursor: pin" contention
    • Library cache and cursor lookup internals
    • Troubleshooting library cache lock/pin contention
    • Library cache hash buckets and their protection by mutexes in Oracle 11g
    • Oracle mutex implementation and internals
    • Cursor pinning using mutexes
    • Identifying mutex holder from Oracle Wait interface data
    • Reducing mutex contention by better cursor management
    • Common mutex-related bugs and issues
    • Advanced mutex contention troubleshooting using V$MUTEX_SLEEP_HISTORY
9. Troubleshooting complex hangs and spins
    • Logging on to a hung instance when even SYSDBA can't log on
    • Getting hanganalyze dumps and understanding hanganalyze output
    • Using V$WAIT_CHAINS for identifying blockers in 11g
    • Understanding state objects and reading systemstate dumps
    • Finding why a process is hung doing nothing even when the wait interface says the session isn't waiting
    • Finding why a process is burning CPU when Oracle reports it's doing nothing
10. Troubleshooting crashes, bugs and ORA-600/ORA-7445 errors
    • Reading ORA-600/ORA-7445 dumps
    • Reading ERRORSTACK dumps
    • Getting Oracle to dump extra information on error messages
    • Reading Oracle kernel stack traces
    • Proactive OS configuration for gathering Oracle crash information
    • Reading crash reasons from OS corefiles
Upcoming Sessions
Duration
Online: 5 days
Classroom training typically runs from 9:00 AM until 4:00 PM daily. Online training typically runs from 12:00 PM ET until 4:00 PM ET daily.

Please note that actual times may vary.
Prerequisites
This seminar is for Advanced to Expert Senior DBAs and Performance Engineers.