Could Anyone Tell Me About The Lazy Loading And Transaction Of Django?

The background is this:

data = User.objects.get(pk=1) 
if data.age > 20:
    with transaction.atomic():
        data.age -= 2
        data.save()

I want to know what happens if many processes run this code at the same time. Each process would fetch the data at the same moment, outside any transaction; say age is 30.

Then one process carries on, computes age - 2 = 28, and saves.

When the next process then runs data.age -= 2 and saves, would the stored age end up as 26 or 28? If it is 28, does that mean the transaction was added in the wrong place? Or does it mean the transaction cannot help here, because it would need to cover the SELECT that fetches the data as well as the change and the save, whereas here it was added without the SELECT?

Second question:

What if I do it like this:

data = User.objects.get(pk=1)
with transaction.atomic():
    if data.age > 20:
        data.age -= 2
        data.save()

This version opens the transaction before the data.age > 20 check. With lazy loading, the SQL would only run when I actually use the data, for example in data.age > 20. But by the time the SQL really runs, the transaction has already been opened. So I want to know: does this version run the SQL inside the transaction?

Thanks a lot, nice people.

Answer

There are two issues we need to address here: transactions and locking, and lazy loading (which your code doesn't appear to use).

You have a race condition in all your examples: multiple requests fetching the age of the same user will each try to update the database table to set 28 if they all fetched 30 before any of them committed the transaction.

It doesn't matter here whether the column was fetched inside or outside of the transaction. All that the transaction guarantees is that all writes will succeed together or fail together. Depending on the database's isolation level, repeated reads inside the transaction may also see consistent data, but the transaction will not prevent other transactions from reading the same row and updating it based on what they read.

That's because an atomic transaction only (briefly) locks rows when writing the data; all the changes in the transaction are written together, as one unit. But that doesn't mean that what you write to the database is correct: multiple transactions can read 30 as the age, and will all write 28 to the row when it is their turn to get the lock and have their commit succeed.
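To make the lost update concrete, here is a minimal plain-Python simulation. It uses no Django and no database: the dict row stands in for the database row, and the helpers read_age and write_age are hypothetical names for the two halves of one request, split apart so the interleaving is explicit.

```python
# Simulation only: `row` stands in for the database row.
row = {"age": 30}

def read_age():
    # Step 1: a request reads the current value into its own memory.
    return row["age"]

def write_age(stale_age):
    # Step 2: the request writes back a value computed from what it read earlier.
    if stale_age > 20:
        row["age"] = stale_age - 2

# Both requests read before either one writes.
seen_by_a = read_age()
seen_by_b = read_age()

write_age(seen_by_a)  # request A stores 28
write_age(seen_by_b)  # request B also stores 28, overwriting A's update

print(row["age"])  # 28, not 26: one decrement was lost
```

Wrapping only the write step in a transaction would not change this outcome, because each request still computes its new value from the 30 it read earlier.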

But, to address the lazy loading question: unless you explicitly marked the age column with defer(), you are not using any lazy loading. The age value will have been loaded with all the other User data when executing the User.objects.get() method. It doesn't really matter here, because even if the data.age > 20 test triggered a separate statement to read the age column, you would still be reading a value that can go stale (you can read 30 just before another transaction commits and writes 28).

What you need then, is to lock the row before reading, so that other requests can't read the wrong value. If you lock first, then read, then commit, then unlock, other requests will have to wait until the lock is released and then read the age column.

You can use the select_for_update() method to lock a specific row, at which point any other request trying to get a lock on the same row will have to wait until you are done with the lock:

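from django.db import transaction

with transaction.atomic():
    # Lock the row (SELECT ... FOR UPDATE) before reading it.
    data = User.objects.select_for_update().get(pk=1)
    if data.age > 20:
        data.age -= 2
        data.save()
# The row lock is released when the transaction ends.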
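The lock-then-read pattern can be illustrated without a database. The following is only an analogy, assuming a threading.Lock as a stand-in for the row lock; row, row_lock and decrement_age are hypothetical names, not Django APIs.

```python
import threading

row = {"age": 30}
row_lock = threading.Lock()  # stand-in for the database row lock

def decrement_age():
    # Take the lock *before* reading, so no other worker can read a value
    # that is about to be overwritten.
    with row_lock:
        age = row["age"]
        if age > 20:
            row["age"] = age - 2
    # Leaving the `with` block releases the lock for the next worker.

threads = [threading.Thread(target=decrement_age) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(row["age"])  # 24: all three decrements were applied
```

Each worker waits its turn, which is exactly the serialisation cost that makes locking a bottleneck under load.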

However, you should only use locking as a last resort. Locking will create a performance bottleneck, because now requests have to wait for each other. Unless your actual use case is more complex and covers multiple reads and writes that all have to be executed as one unit and you can only use Python code to make the decisions, you do not need to resort to using row locking.

Instead, if you need to update a column atomically, you should use an update() query with a filter on the age, at which point it is the database that determines if the age needs updating. Together with an F() expression, you then leave the whole calculation to the database, which executes this atomically:

from django.db.models import F

rowcount = User.objects.filter(pk=1, age__gt=20).update(age=F('age') - 2)
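To see what the database does with such a statement, here is a small sketch using Python's built-in sqlite3 module rather than Django. The table name app_user and the helper decrement_age are made up for illustration, and the SQL is only roughly what an update() with a filter and an F() expression would produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_user (id INTEGER PRIMARY KEY, age INTEGER)")
conn.execute("INSERT INTO app_user (id, age) VALUES (1, 30)")
conn.commit()

def decrement_age(user_id):
    # The condition and the arithmetic both live in one UPDATE statement,
    # so the database evaluates them against the current row value.
    cur = conn.execute(
        "UPDATE app_user SET age = age - 2 WHERE id = ? AND age > 20",
        (user_id,),
    )
    conn.commit()
    return cur.rowcount  # 1 if the row was updated, 0 if the filter excluded it

counts = [decrement_age(1) for _ in range(6)]
final_age = conn.execute("SELECT age FROM app_user WHERE id = 1").fetchone()[0]

print(counts)     # [1, 1, 1, 1, 1, 0]
print(final_age)  # 20
```

The returned row count plays the same role as rowcount in the Django call: it tells you whether the condition still held when the update ran.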

For more complex scenarios, you could use a conditional expression to determine the final value in an update.
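Django's conditional expressions (Case and When) compile to a SQL CASE expression. As a rough sketch of the idea, again using the built-in sqlite3 module instead of Django, the example below applies a made-up rule: subtract 2, but never drop below 20, and leave younger users untouched. The table app_user and the rule itself are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_user (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO app_user (id, age) VALUES (?, ?)",
    [(1, 30), (2, 21), (3, 18)],
)

# One statement decides the final value per row; no Python-side read is needed.
conn.execute(
    """
    UPDATE app_user
    SET age = CASE
        WHEN age - 2 >= 20 THEN age - 2  -- normal case: subtract 2
        WHEN age > 20 THEN 20            -- would undershoot: clamp to 20
        ELSE age                         -- 20 or younger: unchanged
    END
    """
)
conn.commit()

ages = dict(conn.execute("SELECT id, age FROM app_user ORDER BY id").fetchall())
print(ages)  # {1: 28, 2: 20, 3: 18}
```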

Using UPDATE syntax with appropriate filters and expressions moves the work to the database, both the test for your condition and the value calculation, and the database does this while it holds the row lock for the write. That ensures the lock is held for the minimum amount of time possible, reducing the bottleneck.

source: stackoverflow.com