Insights on Oracle technologies and trends by Craig Shallahamer
We have all heard that it doesn't make sense to have more latches than the number of CPUs. Why? Because you can never use more latches simultaneously than the number of CPUs. While this appears to make perfect sense, it conceals a hidden lie. And here's why...
Suppose you are the warden of an extremely high-security prison filled with the most notoriously violent prisoners. Twenty guards are under your command, each responsible for twenty prisoners. To avoid a massive prison riot, these groups of twenty-one (one guard and his twenty prisoners) are kept separated. To reduce the chances of a gun battle, only five guns exist, and the twenty guards must share them. You have appointed an additional supervisory guard to allocate the five guns among the guards. A guard cannot leave his post under any circumstances, even to aid another guard. And finally, because it would be foolish to wrestle with a prisoner, a guard can never engage a prisoner without a gun and can engage only one prisoner at a time.
The warden was very pleased with this setup because it ensured each prisoner was watched, reduced the chances of a prison riot, and the number of guns required was kept to a minimum. However, sometimes a situation might arise where it would become difficult for one guard to handle twenty prisoners. Even if the guard was able to get one of the guns, he would still be in a very precarious position.
This kept the warden up late at night. He knew in his heart that eventually one of his guards would be overtaken by his twenty prisoners. The thought of being responsible for the death of one of his guards haunted him each night. He decided to call for outside help and hired a highly paid consultant to advise him. The consultant simply said, "Hire more guards."
The warden was shocked, because a guard could engage only one prisoner, and then only if he had one of the five guns! The warden replied, "Hire more guards with no additional guns?" The highly paid consultant simply replied, "Yes."
The warden was completely baffled and was starting to question the consultant's advice. The warden said, "You're crazy! Even if I hire one guard for each prisoner, there are only five guns available, and without a gun, a guard cannot engage a prisoner."
The highly paid consultant confidently drew back, pulled out a nice long Cuban cigar, lit it, took a couple of long draws, and finally said, "Yes, you are correct. However, currently if a guard has two prisoners who need 'attention,' he can attend to only one and the other must wait. With additional guards, the chances are better that a guard with a gun will be available for a prisoner in need." The warden was amazed at the highly paid consultant's wisdom.
The warden replied, "I understand. Currently, if a guard engages one of his twenty prisoners, he cannot attend to even one additional prisoner under his watch. With additional guards, while he is engaging one prisoner (with a gun) another guard next to him (also with a gun) could engage another prisoner."
So the mystery was solved, and the highly paid consultant received a handsome bonus.
Now let's take a look at how this relates to Oracle latching. In our story, each gun represents a CPU, each guard represents a latch, and each prisoner a memory structure piece (e.g., one of the cache buffer chains). While it is true that there are only five guns/CPUs available, the trick is allocating the five guns to any of the memory structure pieces (e.g., one of the hash bucket chains) as quickly as possible.
In Oracle, a single latch often covers multiple parts of a memory structure (e.g., multiple cache buffer chains). This means that if two processes need to access two different pieces covered by the same latch, they cannot access them at the same time: one must wait, because a single latch covers both. One solution is to add more latches, even if the total number of latches far exceeds the number of CPUs. The general idea is to bring the ratio of latches to memory structure pieces closer to one, which increases the likelihood that the latch covering any given piece will be available. That's why the highly paid consultant suggested hiring more guards, even though he knew there were only five guns available.
Let's take a look at a very specific example. One such situation where it makes good sense to have the number of latches much larger than the number of CPUs is with the cache buffer chain latch. Let me quickly explain.
Without getting into the details, when Oracle asks, "Is this block in the buffer cache?" it is directed to one of the cache buffer chains. That chain is then searched sequentially for the correct buffer. Starting in Oracle 9, there are just over twice as many chains as there are block buffers.
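To make that lookup path concrete, here is a minimal Python sketch of a hash-chained buffer cache: a block's (file number, block number) pair is hashed to pick one chain, and only that chain is searched sequentially. It is an illustration under stated assumptions, not Oracle's implementation. The class and function names are hypothetical, the hash is Python's built-in `hash()` rather than Oracle's internal function, and "just over twice as many chains as buffers" is modeled, as an assumption, as the smallest prime greater than twice the buffer count.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BufferHeader:
    # Hypothetical stand-in for a buffer header identified by its block address.
    file_no: int
    block_no: int
    data: str = ""


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def chain_count_for(num_buffers: int) -> int:
    # Assumption: "just over twice as many chains as buffers" is modeled
    # as the smallest prime greater than 2 * num_buffers.
    n = 2 * num_buffers + 1
    while not is_prime(n):
        n += 1
    return n


class BufferCache:
    def __init__(self, num_buffers: int):
        self.chains: List[List[BufferHeader]] = [
            [] for _ in range(chain_count_for(num_buffers))
        ]

    def _chain_index(self, file_no: int, block_no: int) -> int:
        # Hypothetical hash: Python's hash() of the block address,
        # not Oracle's internal hash function.
        return hash((file_no, block_no)) % len(self.chains)

    def insert(self, header: BufferHeader) -> None:
        self.chains[self._chain_index(header.file_no, header.block_no)].append(header)

    def lookup(self, file_no: int, block_no: int) -> Optional[BufferHeader]:
        # "Is this block in the buffer cache?": hash to one chain,
        # then search that chain sequentially.
        chain = self.chains[self._chain_index(file_no, block_no)]
        for header in chain:
            if header.file_no == file_no and header.block_no == block_no:
                return header
        return None
```

For 1,000 buffers this model creates 2,003 chains. Inserting a header for file 4, block 123 and looking it up again walks only the single chain the hash selects, which is why short chains keep the sequential search cheap.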
The number of cache buffer chain (CBC) latches is configurable through an instance parameter. Each CBC latch protects one or more cache buffer chains. That is, before kernel code can access a specific chain, it must first get the CBC latch that covers that chain. Starting in Oracle 9, a single CBC latch may cover over 700 cache buffer chains. CBC latch contention occurs when multiple processes attempt to access one or more chains protected by the same CBC latch. So one potential solution to CBC latch contention (and there are many) is to increase the number of CBC latches.
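The latch-coverage rule can be sketched the same way. In this hypothetical Python model, each chain is assigned to exactly one latch (round-robin by chain index, which is an assumption; Oracle's actual assignment is internal), and code must hold that latch before reading or modifying the chain. A `threading.Lock` stands in for a latch, although real latches spin before sleeping rather than simply blocking.

```python
import threading


class LatchedChains:
    """Hypothetical model: chains of entries, each chain covered by one latch."""

    def __init__(self, num_chains: int, num_latches: int):
        if not 0 < num_latches <= num_chains:
            raise ValueError("need 1 <= num_latches <= num_chains")
        self.chains = [[] for _ in range(num_chains)]
        # threading.Lock stands in for a latch (real latches spin, then sleep).
        self.latches = [threading.Lock() for _ in range(num_latches)]

    def latch_index(self, chain_index: int) -> int:
        # Assumption: chains are assigned to latches round-robin.
        return chain_index % len(self.latches)

    def chains_per_latch(self) -> float:
        return len(self.chains) / len(self.latches)

    def add(self, chain_index: int, key) -> None:
        # Get the covering latch before modifying the chain.
        with self.latches[self.latch_index(chain_index)]:
            self.chains[chain_index].append(key)

    def contains(self, chain_index: int, key) -> bool:
        # Get the covering latch, then search the chain sequentially.
        with self.latches[self.latch_index(chain_index)]:
            return any(entry == key for entry in self.chains[chain_index])

    def share_latch(self, chain_a: int, chain_b: int) -> bool:
        # True if processes touching these two chains would contend for one latch.
        return self.latch_index(chain_a) == self.latch_index(chain_b)
```

With 44,800 chains and 64 latches, each latch covers 700 chains, so chains 5 and 69 share a latch and two processes touching them serialize even though they want different chains. Raising the latch count to 4,480 drops coverage to 10 chains per latch and gives those two chains separate latches.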
Why? Because with more latches, the latch you must get is more likely to be available, since each latch protects fewer chains. In fact, the general solution is not based on the number of CPUs at all. The idea is to reduce the chances that more than one process will request the same latch at the same time.
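The claim that the right latch count does not depend on the CPU count can be checked with a birthday-problem style calculation. The number of CPUs only caps how many latch requests can be in flight at once; whether those requests collide depends on how many latches they are spread across. This Python sketch assumes each concurrent request targets a latch independently and uniformly at random, which is optimistic, since real workloads often concentrate on a few hot blocks.

```python
def prob_latch_collision(concurrent_requests: int, num_latches: int) -> float:
    """Probability that at least two of `concurrent_requests` simultaneous
    requests target the same latch, assuming each request picks one of
    `num_latches` latches independently and uniformly at random."""
    if concurrent_requests > num_latches:
        # Pigeonhole: some latch must be requested twice.
        return 1.0
    p_all_distinct = 1.0
    for i in range(concurrent_requests):
        # The i-th request must avoid the i latches already taken.
        p_all_distinct *= (num_latches - i) / num_latches
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    cpus = 5  # at most five requests can be in flight at once
    for latches in (5, 64, 1024):
        p = prob_latch_collision(cpus, latches)
        print(f"{latches:5d} latches: P(shared latch) = {p:.4f}")
```

For five concurrent requests (the five guns in the story), five latches give a 96.16% chance that at least two requests want the same latch, while 1,024 latches bring that chance below 1%, even though the CPU count never changed.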
So while Oracle processes can never use more CBC latches simultaneously than there are CPUs, it is a very good idea to have more CBC latches than CPUs.
I hope this clears some things up and rekindles your interest in Oracle internals. If you want to learn more, I cover all this in more detail (including hands-on work) in my Advanced Reactive Performance Management class.