Home > Oracle Tips > Oracle Voices > Why Oracle hashing is so cool
Oracle Tips:
EMAIL THIS
 TIPS & NEWSLETTERS TOPICS 

ORACLE VOICES

Why Oracle hashing is so cool


Craig A. Shallahamer
04.01.2005
Rating: -4.03- (out of 5)


Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us   


Craig's Corner
Insights on Oracle technologies and trends by Craig Shallahamer
About the Author

Craig Shallahamer has 18-plus years experience in IT. As the president of OraPub, Inc., his objective is to empower Oracle performance managers by "doing" and teaching others to "do" whole system performance optimization (reactive and proactive) for Oracle-based systems.

In addition to course development and delivery, Craig is a consultant who was previously involved with developing a landmark performance management product, technically reviews Oracle books and articles, and keynotes at various Oracle conferences.

To view more of Craig's work, visit www.orapub.com.

Like politics, culture, and life's adventures, searching algorithms can be diverse and fascinating. But what's the best algorithm?

The Greeks embraced dissent and open criticism. Others embraced various authoritarian models. Similarly, when Oracle kernel architects needed a search algorithm, they had many different options to choose from. Search algorithms can use either an authoritarian approach or a more discussion-based approach. For example, when determining a SQL execution plan, the simplest approach is to pick a very authoritarian algorithm. The result was Oracle's rule-based optimizer. But as we will see, when complexity increases, an authoritarian model becomes very inefficient and can produce non-optimal results.

When the possible options are known and limited, and the environment stable, an authoritarian model can be very fast and reliable. One example is when Oracle needs to determine if a block is in the buffer cache or when it needs to know if a SQL statement is in the library cache. Oracle could use a discussion-based here but in this situation, the alternative outcomes and scope are limited and well-defined. The buffer is either in the cache or it is not in the cache. So Oracle chose an algorithm called hashing.

Hashing is so cool. It's lightening fast and requires only a small amount of memory as opposed to a large amount of disk space (think index) or CPU time (think cost based optimization). There is basically a simple calculation, a jump to a memory location, and then hopefully a short in-memory sequential search. When the sequential search is short, the result is an incredibly fast search -- much faster than using any kind of index or a cost-based approach.

Hashing consists of only a few components: a hash function, hash buckets, and hash chains. I'll take a minute to describe each of these.

Remember when you were a kid and you and your friends had your own special secret code. You know, like when an "a" would equal 1 and a "b" would equal 2, and so on. So a word like "ace" would be represented as "1 3 5". The first part of a hash function is to convert the input into a single number. In our example, the hashing function would probably sum the numbers, which would be 9.

The really cool thing about hashing functions is that regardless of what the summed number is, the "hashed" output is guaranteed to be within a given range. There are no exceptions. For example, we can define a hash function to only output numbers from 0 to 99. So if an input is 2, it may be "hashed" to 5. If the input is 243543, it may be hashed to 54.

Since the hash range is limited, numbers can hash to the same output number. For example, the number 2 and the number 98 may both hash to 5. That's not necessarily a bad thing. However, a really good hashing function will spread out the output evenly: given 1000 randomly picked inputs, 0 would be the output ten times, just like 50 would be output ten times. But wait -- it gets even better...

There is also a structure called a hash bucket. Each output possibility is related to a single hash bucket. If the output range is from 0 to 99, there will be 100 hash buckets. When the input value is hashed to 5, we say it has been "hashed to bucket five." Starting in Oracle 9i, the number of hash buckets is set to just over twice the number of buffers. Strange you say? Read on...

Associated with each bucket is a hash chain. Each node on the chain is associated with a cached Oracle block. If I'm checking to see if block 121 is cached, Oracle hashes the block to a specific hash bucket, and then the bucket's associated hash chain is sequentially searched looking for the specific buffer. (The algorithm is actually a little more complicated than this, but you should get the general idea.)

While searching memory is fast, sequentially searching a long hash chain takes a relatively long time and consumes some CPU time. To minimize the sequential search time, short hash chains are needed. This is why Oracle sets the number of buckets to just over twice the number of blocks cached. This means that the average hash chain will be half a buffer long. But why the big deal about sequentially searching memory? Because it not only affects your server process but perhaps many other server processes!

Have you ever heard of the cache buffer chain latch before? A single cache buffer chain latch "protects" one or more cache buffer chains. Before a server process can access a cache buffer chain, it must acquire the related cache buffer chain latch. So if the buffer chain you are sequentially searching is long, not only will you will hold the latch for a relatively long time, but other processes who need that latch will be waiting. And until you release the latch, they will either be repeatedly asking for the latch (called "spinning on the latch," which consumes CPU time) or sleeping (which posts a wait event).

If you are reading this article it is probably because your culture, to some extent, embraces open discussion and dissent just like the ancient Greeks. This approach works well in a complex and dynamic environment. That is why Oracle moved to the cost-based optimizer for choosing SQL execution plans. But for answering a simple and highly constrained question like, "Is the block in the cache?", a more authoritarian method is appropriate, such as hashing. It's an elegant solution to a question that gets asked billions and billions of times each day by Oracle server processes around the world.

Rate this Tip
To rate tips, you must be a member of SearchOracle.com.
Register now to start rating these tips. Log in if you are already a member.




Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us   


RELATED CONTENT
Oracle Voices
Essential performance forecasting, part 2: I/O
Essential performance forecasting, part 1
Performance tuning: The limits and limitlessness of time
Oracle myths debunked
Why it's OK to have more latches than CPUs
Performance war
Why every performance tool stinks
Triangulation: Hunting down performance bottlenecks

Oracle database performance problems and tuning
Oracle 11g data compression
Varchar or number for better performance?
Do statistics on SYS-owned objects hurt performance in 10g?
Inside the Oracle 11g SQL Performance Advisor, part 1
Inside the Oracle 11g SQL Performance Advisor, part 2
Difference between driving table and driver table in Oracle
Best design for E-Business Suite on hard drive
20GB data dictionary causing performance problems
Using the cost-based optimizer to improve Database 10g performance
Online tablespace reorganization in Oracle 9i

Oracle internals
LAST_OPER_TYPE column in v$sga_dynamic_components
Inside the Oracle 11g SQL Performance Advisor, part 1
Inside the Oracle 11g SQL Performance Advisor, part 2
How to read a STATSPACK report
Tuning the Oracle database with initialization parameters
Cannot create services in Oracle 9i
Move tables from system tablespace to users tablespace
Move Oracle home to another partition
Limit on CLOB datatype in Oracle 10g
Logging into Oracle utilities when database is down
Oracle internals Research

RELATED GLOSSARY TERMS
Terms from Whatis.com − the technology online dictionary
autonomous transaction  (SearchOracle.com)
delimiter  (SearchOracle.com)
extent  (SearchOracle.com)
flexfield  (SearchOracle.com)
responsibility  (SearchOracle.com)

RELATED RESOURCES
2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
Search Bitpipe.com for the latest white papers and business webcasts
Whatis.com, the online computer dictionary

DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.

HomeNewsTopicsTipsAsk the ExpertsWebcastsWhite PapersProductsBlogs
About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
SEARCH 
TechTarget provides enterprise IT professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective IT purchase decisions and managing their organizations' IT projects - with its network of technology-specific Web sites, events and magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Reprints  |  Site Map




All Rights Reserved, Copyright 2003 - 2008, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts