When we introduced locks and locking, we only worried about providing the same type of locking granularity as the available WATCH command—on the level of the market key that we were updating. But because we’re constructing locks manually, and we’re less concerned about the market in its entirety than we are with whether an item is still in the market, we can actually lock on a finer level of detail. If we replace the market-level lock with one specific to the item to be bought or sold, we can reduce lock contention and increase performance.
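To make the change in granularity concrete, here's a minimal sketch of a purchase that locks only the one listing being bought rather than the whole market. The function and helper names here are ours for illustration (the helpers are simplified restatements of the chapter's SETNX-based lock, without expiration handling), and the sketch assumes a redis-py client created with decode_responses=True so the lock identifier comparison works on plain strings.

```python
import time
import uuid

def acquire_lock(conn, lockname, acquire_timeout=10):
    # Simplified SETNX lock in the spirit of the chapter's helper
    # (no expiration handling here).
    identifier = str(uuid.uuid4())
    end = time.time() + acquire_timeout
    while time.time() < end:
        if conn.setnx('lock:' + lockname, identifier):
            return identifier
        time.sleep(.001)
    return False

def release_lock(conn, lockname, identifier):
    # Delete the lock only if we still hold it (the chapter's full version
    # wraps this check in WATCH/MULTI to close a small race window).
    lockname = 'lock:' + lockname
    if conn.get(lockname) == identifier:
        conn.delete(lockname)
        return True
    return False

def purchase_item_with_fine_grained_lock(conn, buyerid, itemid, sellerid):
    buyer = "users:%s" % buyerid
    seller = "users:%s" % sellerid
    item = "%s.%s" % (itemid, sellerid)
    inventory = "inventory:%s" % buyerid

    # Lock just this one listing, not the entire 'market:' ZSET, so two
    # buyers only contend when they're after the same item.
    locked = acquire_lock(conn, 'market:' + item)
    if not locked:
        return False

    pipe = conn.pipeline(True)
    try:
        # Check that the item is still listed and the buyer can afford it.
        pipe.zscore("market:", item)
        pipe.hget(buyer, 'funds')
        price, funds = pipe.execute()
        if price is None or funds is None or price > float(funds):
            return None

        # Move the money and the item.
        pipe.hincrby(seller, 'funds', int(price))
        pipe.hincrby(buyer, 'funds', -int(price))
        pipe.sadd(inventory, itemid)
        pipe.zrem("market:", item)
        pipe.execute()
        return True
    finally:
        release_lock(conn, 'market:' + item, locked)
```

Because the lock key is derived from the single listing ('lock:market:itemid.sellerid'), two buyers only collide when they're going after exactly the same item, which is what drives the reduced contention in the results below.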
Let’s look at the results in table 6.3, which shows the same simulation that produced table 6.2, only now with locks over just the individual items being listed or sold, rather than over the entire market.
| Configuration | Listed items | Bought items | Purchase retries | Average wait per purchase |
| --- | --- | --- | --- | --- |
| 1 lister, 1 buyer, no lock | 145,000 | 27,000 | 80,000 | 14ms |
| 1 lister, 1 buyer, with lock | 51,000 | 50,000 | 0 | 1ms |
| 1 lister, 1 buyer, with fine-grained lock | 113,000 | 110,000 | 0 | <1ms |
| 5 listers, 1 buyer, no lock | 331,000 | <200 | 50,000 | 150ms |
| 5 listers, 1 buyer, with lock | 68,000 | 13,000 | <10 | 5ms |
| 5 listers, 1 buyer, with fine-grained lock | 192,000 | 36,000 | 0 | <2ms |
| 5 listers, 5 buyers, no lock | 206,000 | <600 | 161,000 | 498ms |
| 5 listers, 5 buyers, with lock | 21,000 | 20,500 | 0 | 14ms |
| 5 listers, 5 buyers, with fine-grained lock | 116,000 | 111,000 | 0 | <3ms |
With fine-grained locking, we’re performing 220,000–230,000 listing and buying operations regardless of the number of listing and buying processes. We have no retries, and even under full load, we’re seeing less than 3 milliseconds of latency. Our listed-to-sold ratio is again almost exactly the same as our ratio of listing to buying processes. Even better, we never get into the situation we saw without locks, where there’s so much contention that latencies shoot through the roof and items are rarely sold.
Let’s take a moment to look at our data as a few graphs so that we can see the relative scales. In figure 6.3, we can see that both locking methods result in many more items being purchased across all tested loads than the WATCH-based method.
Looking at figure 6.4, we can see that the WATCH-based method has to perform many thousands of expensive retries to complete the few sales that it does.
And in figure 6.5, we can see that the same WATCH contention that caused the huge number of retries and the low number of completed purchases also drove latency up significantly when no lock was used.
What these simulations and charts show overall is that, under heavy load, using a lock can reduce retries, reduce latency, improve performance, and be tuned to the granularity we need.
Our simulation is limited. One major case it doesn’t simulate is where many more buyers are unable to buy items because they’re stuck waiting for others. It also doesn’t simulate an effect known as dogpiling: as transactions take longer to complete, more transactions overlap while trying to complete, which increases the time each individual transaction takes and, with it, the chance that a time-limited transaction will fail. That substantially increases the failure and retry rates for all transactions, and it’s especially harmful in the WATCH-based version of our market buying and selling.
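To see where that time limit bites, here's a hedged sketch of a WATCH-based purchase loop in the style of the one this chapter compares against (the names and the 10-second deadline are illustrative, not the book's exact listing). Every failed attempt burns part of the deadline, so as contention grows, more and more calls fall out the bottom of the loop and fail outright.

```python
import time
import redis

def purchase_item_with_watch(conn, buyerid, itemid, sellerid):
    buyer = "users:%s" % buyerid
    seller = "users:%s" % sellerid
    item = "%s.%s" % (itemid, sellerid)
    inventory = "inventory:%s" % buyerid
    end = time.time() + 10                     # time-limited transaction

    pipe = conn.pipeline()
    while time.time() < end:
        try:
            # Watching the whole market means ANY listing or sale by anyone
            # invalidates our attempt -- this is the source of the retries.
            pipe.watch("market:", buyer)
            price = pipe.zscore("market:", item)
            funds = float(pipe.hget(buyer, 'funds') or 0)
            if price is None or price > funds:
                pipe.unwatch()
                return None

            pipe.multi()
            pipe.hincrby(seller, 'funds', int(price))
            pipe.hincrby(buyer, 'funds', -int(price))
            pipe.sadd(inventory, itemid)
            pipe.zrem("market:", item)
            pipe.execute()
            return True
        except redis.exceptions.WatchError:
            pass                               # a watched key changed; retry
    return False                               # deadline hit: the dogpiling failure mode
```

Under dogpiling, each retry takes longer because more clients are retrying at the same time, which pushes still more callers past the deadline and into that final return False.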
The choice to lock an entire structure or just a small portion of it can be easy. In our case, the critical data we were watching was a small piece of the whole (one item in a marketplace), so locking that small piece made sense. There are situations where it isn’t just one small piece, or where it may make sense to lock multiple parts of a structure. That’s where the decision between locking small pieces of data and locking an entire structure gets difficult: using multiple small locks can lead to deadlocks, which can prevent any work from being performed at all.
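One common way to keep multiple small locks from deadlocking is to agree on a single global acquisition order. The helper below is a hypothetical sketch (not from this chapter) that acquires a set of fine-grained locks in sorted-name order and backs out cleanly on failure, reusing the acquire_lock() and release_lock() helpers shown earlier.

```python
def acquire_locks_in_order(conn, locknames, acquire_timeout=10):
    # Acquire every requested lock in one global order (sorted by name), so
    # two clients that need overlapping sets of locks can never end up each
    # holding a lock the other is waiting for.
    acquired = []
    for name in sorted(locknames):
        identifier = acquire_lock(conn, name, acquire_timeout)
        if not identifier:
            # Timed out on one lock: back out the ones we already hold so we
            # don't stall everyone else on a partially acquired set.
            for held_name, held_id in reversed(acquired):
                release_lock(conn, held_name, held_id)
            return False
        acquired.append((name, identifier))
    return acquired                  # list of (lockname, identifier) pairs
```

Sorting the lock names is what removes the circular-wait condition: any two clients that need overlapping locks will always contend on the earliest shared name first, rather than each grabbing a different lock and waiting on the other.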