Transaction and Concurrency
In this note it explained in more detail some topics about transaction and concurrency, such as deadlock handling and concurrency control schemes.
Deadlock Handling
In this chapter we will approach some mechanisms databases implement to prevent, mitigate and handle deadlocks.
Deadlock Prevention
A deadlock prevention strategy is a way that database try to avoid deadlocks happening (this is therefore done before a deadlock happens). The five deadlock prevention strategies that exist are:
- Pre-declaration: Require a transaction to acquire all the needed locks for that transaction before the transaction ends.
- Graph-based protocol: Impose a partial order on all data items, and force transactions to lock items based in the order imposed
- Wait-Die Scheme: When two transactions are deadlock, the system can force the "younger transaction" (higher timestamp) to be rolled back (dies), so the oldest transaction can eventually get access to the data locked that was causing the deadlock. When transactions are rolled back, they are restarted with their original timestamp (to give preference to older transaction)
- Wait-Wound Scheme: Similar to the strategy above, but instead fully rolling back the younger transaction, it is rolled back a limited number of instructions (not a restart)
- Timeout-based Schemes: Database sets a timeout whenever a transaction is waiting for a deadlock; when that timeout expires, that transaction is rolled back. This strategy has some problems, namely unnecessary rollbacks can happen, and starvation can also happen.
Concurrency Control Schemes
In this section we will discuss and present some of the possible concurrency control schemes that database systems can implement, such as:
Multiple Granularity
(todo)
Multiversion Schemes
Multiversion schemes work on the idea of keeping different (old) versions of each data item, to be able to increase concurrency. This type of concurrency control can be described with three key ideas:
- Whenever a write(Q) is issued, that creates a new version of that data item
- Usage of timestamps to label versions
- Upon a read(Q) performed by a transaction , the value that should be returned by this function depends on the timestamp of ; an appropriate version must be chosen
In this chapter, we will explore two different variants of multiversion schemes:
Multiversion Timestamp Ordering
In multiversion timestamp ordering, we can establish the following principles:
- Each data item has its own set of versions:
- Each version has the following two timestamps:
- W-Timestamp - timestamp of the transaction that created version
- R-Timestamp - timestamp of the transaction Q_k$
Algorithm
In the algorithm, a transaction can issue two operations: read(Q) and write(Q). Let denote the version with the largest W-timestamp, such that W-timestamp TS()
- If a transaction issues a read(Q), then:
- the value returned is
- if R-Timestamp() < TS(), then set R-Timestamp() = TS()
- If transaction issues a write(Q), then:
- If R-Timestamp() > TS(), we rollback the transaction with new timestamp
- If W-timestamp() = TS(), then version is overwritten
- Otherwise, a new version is created, and W-timestamp() = R-Timestamp()= TS()
We can easily see that, using this algorithm, read requests never fail.
Guarantees
- This protocol guarantees serializability
- However, it does not guarantee recoverability or cascadelessness
Snapshot Isolation
In snapshot isolation, the idea is to give each transaction, when they are created, a snapshot of the entire database; updates and reads of the transaction will be done locally, in that snapshot; it is at commit time that it is evaluated whether it is possible to update the database values, or if the transaction needs to be rolled back. Just like in multiversion timestamp ordering, read requests never fail (and also never need to wait for reads).
Multiversioning in Snapshot Isolation
In snapshot isolation, both transactions and data items have timestamp:
- Transactions have two types of timestamps:
- StartTS() - represents the timestamp at the time the transaction entered the system
- CommitTS() - represents the timestamp at the time the transaction commits
- Data items only have one type of timestamp:
- W-timestamp() - it is equal to the Commit() that created version
When a transaction reads a data item , it will read the most recent version such that W-timestamp() Start(), and it will not see any updates with timestamp higher than StartTS().
Validation
During transaction execution, no conflicts arise, as transactions work on their own snapshot; conflicts only surge at commit time, and it is necessary to have a validation policy, to know which transactions to commit, and which to roll back.
To define this validation policy, we first have to define what concurrent transactions are:
- Two transactions and are said to be concurrent if:
- StartTS( StartTS( CommitTS(), or
- StartTS( StartTS( _CommitTS()
If these two transactions update the same data item, both will be operating on their own private snapshot, and when changes are committed, one of the updates will be overwritten by the other. There are two approaches to solve this:
First Committer Wins
In this approach, at the commit time of transaction and for every data item that updated, the database manager will analyze if there is any version such that:
- StartTS() W-timestamp() CommitTS()
If there is, that means that another transaction updated while was being executed. According to first committer wins, should not be allowed to commit, and must be rolled backed. In the case there was no such that would verify the above condition, then would be allowed to commit.
First Updater Wins
This approach considers the usage of locks on data items; whenever a transaction wants to update , it must acquire an exclusive lock on . The algorithm, for a given transaction trying to update is this:
- Transaction tries to acquire lock on
- If it is able to get lock:
- If has been updated by a concurrent transaction (verify StartTS() W-timestamp() CommitTS()), needs t be rolled back
- Otherwise, is allowed to do the update
- If it is not able to get the lock, assuming holds the lock:
- Wait until commits or aborts
- If aborts, is able to get the lock; go to step 2
- If aborts, must be rolled back
Locks are released when transactions commit or abort