Transaction isolation levels in databases

The transaction isolation level is a setting that determines the extent to which logically parallel transactions in a DBMS are allowed to observe inconsistent data. The isolation scale contains a range of levels ranked from lowest to highest; a higher isolation level means better data consistency, but may reduce the number of transactions that can physically run in parallel. Conversely, a lower isolation level allows more concurrent transactions but weakens the consistency of the data. Thus, by choosing a transaction isolation level, the developer of an information system is, to a certain extent, trading off speed against guaranteed consistency of the data received from the system.

Concurrency problems when using transactions

With parallel execution of transactions, the following problems are possible:

  • lost update - when the same data is modified concurrently by different transactions, all changes are lost except the last one;
  • dirty read - reading data added or modified by a transaction that will later be rolled back rather than committed;
  • non-repeatable read - data read earlier within a transaction turns out to have changed when it is read again;
  • phantom reads - one transaction selects a set of rows by the same criteria several times during its execution. Between these selections, another transaction inserts rows, or changes columns in rows, that match the selection criteria of the first transaction, and commits successfully. As a result, identical queries in the first transaction return different sets of rows.

Consider situations in which these problems may arise.

Lost update

A situation in which the same data is modified concurrently by different transactions and one of the changes is lost.

Suppose there are two transactions running at the same time:

Transaction 1                           Transaction 2
UPDATE tbl1 SET f2=f2+20 WHERE f1=1;    UPDATE tbl1 SET f2=f2+25 WHERE f1=1;

Both transactions change the value of the f2 field; after both complete, the field's value should have increased by 45. In reality, the following sequence of actions may occur:

  • Both transactions read the current state of the field at essentially the same time. Exact physical simultaneity is not required; it is enough that the second read operation completes before either transaction writes its result.
  • Both transactions calculate the new field value by adding 20 and 25, respectively, to the value they read.
  • Both transactions try to write the result of their calculation back to field f2. Since two writes cannot physically happen at the same time, one write operation will in fact execute earlier and the other later; the later write overwrites the result of the earlier one.

As a result, after both transactions complete, the value of the f2 field may have increased not by 45 but by 20 or 25; that is, one of the data-modifying transactions "disappears".
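
One common way to avoid a lost update is to lock the row explicitly before the read-modify-write cycle, for example with SELECT ... FOR UPDATE (supported by PostgreSQL, Oracle, MySQL/InnoDB and others). A minimal sketch, assuming the tbl1 table from the example above:

-- Transaction 1 (transaction 2 runs the same pattern with +25):
BEGIN;
-- Takes a row lock; a concurrent SELECT ... FOR UPDATE on the same row
-- waits here until this transaction commits or rolls back.
SELECT f2 FROM tbl1 WHERE f1 = 1 FOR UPDATE;
UPDATE tbl1 SET f2 = f2 + 20 WHERE f1 = 1;
COMMIT;

Note that when the read and the increment happen inside a single UPDATE statement, as in the example above, most DBMSs already serialize the two writes through row locks; the lost update typically appears when an application reads the value in one statement and writes the computed result in another.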

Dirty read

Reading data added or modified by a transaction that is subsequently rolled back rather than committed.

Suppose you have two transactions opened by different applications that execute the following SQL statements:

Transaction 1                               Transaction 2
UPDATE tbl1 SET f2 = f2 + 1 WHERE f1 = 1;
                                            SELECT f2 FROM tbl1 WHERE f1 = 1;
ROLLBACK WORK;

Transaction 1 changes the value of the f2 field, then transaction 2 selects the value of this field, and then transaction 1 is rolled back. As a result, the value read by the second transaction differs from the value stored in the database.

Non-repeatable read

A situation in which data read earlier within a transaction turns out to have changed when it is read again.

Suppose you have two transactions opened by different applications that execute the following SQL statements:

Transaction 1                          Transaction 2
                                       SELECT f2 FROM tbl1 WHERE f1=1;
UPDATE tbl1 SET f2=f2+1 WHERE f1=1;
COMMIT;
                                       SELECT f2 FROM tbl1 WHERE f1=1;

Transaction 2 selects the value of the f2 field, then transaction 1 changes that value and commits. If transaction 2 then selects the value of f2 again, it gets a different result. This situation is especially unacceptable when data is read in order to partially modify it and write it back to the database.

Phantom reads

A situation in which the same query, re-executed within the same transaction, returns different sets of rows.

Suppose you have two transactions opened by different applications that execute the following SQL statements:

Transaction 1                              Transaction 2
                                           SELECT SUM(f2) FROM tbl1;
INSERT INTO tbl1 (f1,f2) VALUES (15,20);
COMMIT;
                                           SELECT SUM(f2) FROM tbl1;

Transaction 2 runs an SQL statement that uses all the values in the f2 field. Then transaction 1 inserts a new row, so re-executing the SQL statement in transaction 2 produces a different result. This situation is called a phantom read. It differs from a non-repeatable read in that the result of the repeated query changes not because previously read data was modified or deleted, but because new (phantom) rows have appeared.

Isolation levels

The "transaction isolation level" refers to the degree of protection provided by the internal mechanisms of the DBMS (that is, it does not require special programming) against all or some of the above types of data inconsistencies arising from parallel execution of transactions. The SQL-92 standard defines a scale of four isolation levels: Read uncommitted, Read committed, Repeatable read, Serializable. The first of them is the weakest, the last is the strongest, each subsequent one includes all the previous ones.

Read uncommitted

The lowest (first) isolation level. It guarantees only that there are no lost updates: if several concurrent transactions try to modify the same table row, the final value of the row is determined by the whole set of successfully committed transactions. At this level a transaction can read not only logically inconsistent data, but also data whose changes have not yet been committed.

A typical way to implement this isolation level is to lock the data while an update command is executing, which ensures that commands modifying the same rows, even when running in parallel, are actually executed sequentially and none of the changes is lost. Read-only transactions are never blocked at this isolation level.
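
For example, in Microsoft SQL Server a session can opt into this level explicitly; a sketch, assuming the tbl1 table from the earlier examples:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN TRANSACTION;
-- May return a value written by another transaction that has not yet
-- committed (a dirty read), but does not block behind the writer's locks.
SELECT f2 FROM tbl1 WHERE f1 = 1;
COMMIT;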

Read committed

Most industrial DBMSs, in particular Microsoft SQL Server, PostgreSQL and Oracle, use this level by default. At this level dirty reads are prevented; however, while one transaction is running, another transaction can complete successfully and commit its changes, so the first transaction may end up working with a different set of data.

A Read committed implementation can be based on one of two approaches: locking or versioning.

Locking the data being read and modified.

The writing transaction locks the data it modifies, so that reading transactions operating at the Read committed level or higher cannot access it until the writer completes, which prevents dirty reads; the locks taken by a reading transaction are released as soon as its SELECT operation completes (so a "non-repeatable read" can still occur at this isolation level).
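
A sketch of the resulting timeline, assuming a purely lock-based DBMS and the tbl1 table from the earlier examples:

-- Transaction 1 (writer):
BEGIN;
UPDATE tbl1 SET f2 = f2 + 1 WHERE f1 = 1;  -- takes an exclusive lock on the row

-- Transaction 2 (reader at Read committed) now runs:
SELECT f2 FROM tbl1 WHERE f1 = 1;          -- blocks on the writer's lock

-- Transaction 1:
COMMIT;                                    -- releases the lock; the reader's SELECT
                                           -- completes with the committed value and
                                           -- drops its shared lock immediately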

Keeping multiple versions of modified rows in parallel.

Each time a row is changed, the DBMS creates a new version of it, which the transaction that changed the data continues to work with, while any other "reading" transaction is returned the last committed version. The advantage of this approach is that it is faster, since it avoids blocking. However, compared with the locking approach it consumes significantly more RAM, which is spent on storing row versions. In addition, when several transactions change data in parallel, they may produce conflicting changes to the same data (since there are no locks, nothing prevents this). The transaction that commits first saves its changes in the main database, and the remaining parallel transactions cannot be committed (since that would lose the first transaction's update). The only thing the DBMS can do in such a situation is roll back the remaining transactions and report an error such as "The record has already been changed."
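
PostgreSQL, for example, behaves this way for transactions running at the Repeatable read level or higher; a sketch:

-- Session A:
BEGIN ISOLATION LEVEL REPEATABLE READ;
UPDATE tbl1 SET f2 = f2 + 20 WHERE f1 = 1;

-- Session B, in parallel:
BEGIN ISOLATION LEVEL REPEATABLE READ;
UPDATE tbl1 SET f2 = f2 + 25 WHERE f1 = 1;  -- waits for session A

-- Session A:
COMMIT;
-- Session B's UPDATE now fails with
--   ERROR: could not serialize access due to concurrent update
-- and session B must be rolled back and retried.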

The specific implementation method is chosen by the DBMS developers, and in some cases it can be configured. For example, MS SQL Server uses locks by default, but (in version 2005 and higher) switches to the versioning strategy when the READ_COMMITTED_SNAPSHOT database option is enabled; Oracle works only according to the versioning scheme. Informix (since version 11.1) can prevent conflicts between read and write transactions via the USELASTCOMMITTED configuration parameter, so that a reading transaction receives the latest committed data.
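
A sketch of enabling the versioned variant in SQL Server, assuming a database named db1:

-- Requires that no other sessions are using the database while it is switched on.
ALTER DATABASE db1 SET READ_COMMITTED_SNAPSHOT ON;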

Repeatable read

The level at which a reading transaction "does not see" changes to data that it has previously read, and no other transaction can change data read by the current transaction until it completes.

Shared locks are applied to all data read by any statement in the transaction and are held until the transaction completes. This prevents other transactions from modifying rows that have been read by the still-running transaction. However, other transactions can insert new rows that match the search criteria of statements in the current transaction; when such a statement is re-executed, the transaction fetches the new rows, resulting in a phantom read. Since shared locks are held until the end of the transaction rather than released at the end of each statement, the degree of parallelism is lower than at the READ COMMITTED isolation level, so using this and higher isolation levels without need is generally not recommended.
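
A sketch of the non-repeatable-read scenario from earlier, re-run at this level (standard syntax; whether a parallel writer blocks or simply stays invisible depends on whether the DBMS uses locking or versioning):

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
SELECT f2 FROM tbl1 WHERE f1 = 1;  -- first read
-- A parallel UPDATE of this row now either blocks (locking DBMSs)
-- or is not visible to this transaction (versioning DBMSs).
SELECT f2 FROM tbl1 WHERE f1 = 1;  -- guaranteed to return the same value
COMMIT;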

Serializable

The highest level of isolation; transactions are completely isolated from each other, and each executes as if no concurrent transactions existed. Only at this level are concurrent transactions free of the "phantom read" effect.
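
Re-running the phantom-read scenario from earlier at this level; a sketch:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
SELECT SUM(f2) FROM tbl1;
-- A parallel INSERT into tbl1 now either waits (locking DBMSs, which take
-- range locks) or makes one of the transactions fail at commit
-- (versioning DBMSs); either way the repeated query sees no phantom rows.
SELECT SUM(f2) FROM tbl1;  -- same result as the first query
COMMIT;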

Support for transaction isolation in real DBMSs

DBMSs that provide transactions do not always support all four levels, and some introduce additional ones. There are also various nuances in how isolation is provided.

For example, Oracle does not support the lowest level at all, since its implementation of transactions excludes "dirty reads", and it does not formally allow the Repeatable read level to be set; that is, it supports only Read committed (the default) and Serializable. At the same time, at the level of individual statements it does in fact guarantee repeatable reads: if a SELECT in the first transaction selects a set of rows while a parallel second transaction changes some of them, the result set obtained by the first transaction contains the unchanged rows, as if the second transaction did not exist. Oracle also supports so-called READ ONLY transactions, which conform to Serializable but cannot themselves modify data.
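
Both modes are selected with Oracle's SET TRANSACTION statement; a sketch:

-- Serializable transaction:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Read-only transaction (sees a snapshot as of its start, cannot modify data):
SET TRANSACTION READ ONLY;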

Microsoft SQL Server supports all four standard transaction isolation levels and, additionally, the SNAPSHOT level, at which a transaction sees the data state that was committed before it started, together with changes made by itself; that is, it behaves as if it had taken a snapshot of the database at its start and were working with that snapshot. It differs from Serializable in that no locks are used; as a consequence, committing may prove impossible if a concurrent transaction has changed the same data first, in which case the second transaction raises an error on COMMIT and is cancelled.
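
Snapshot isolation must first be enabled per database; a sketch, assuming a database named db1:

ALTER DATABASE db1 SET ALLOW_SNAPSHOT_ISOLATION ON;

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT f2 FROM tbl1 WHERE f1 = 1;   -- sees the state as of the transaction start
UPDATE tbl1 SET f2 = f2 + 1 WHERE f1 = 1;
-- Aborts with an update-conflict error if another transaction has changed
-- this row since the snapshot was taken.
COMMIT;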

