High-performance database operations with Django ORM
Table of Contents
- QuerySet Evaluation
- select_related
- prefetch_related
- Combining Relations
- Partial Loading
- Bulk Operations
- Aggregation
- Raw SQL
- Database Routers
- Profiling
- Quick Reference
1. QuerySet Evaluation
QuerySets are Lazy
# No database hit until evaluated
qs = User.objects.filter(active=True) # No query yet!
# EVALUATION TRIGGERS:
# ├── Iteration: for user in qs
# ├── Slicing with step: qs[::2]
# ├── len(): len(qs)
# ├── list(): list(qs)
# ├── bool(): if qs
# ├── Pickling
# └── repr()
Avoid Multiple Evaluations
# ❌ BAD: Multiple evaluations
users = User.objects.filter(active=True)
count = len(users) # Query 1
first = users[0] # Query 2
for user in users: # Query 3
print(user.name)
# ✅ GOOD: Single evaluation
users = list(User.objects.filter(active=True)) # Query 1
count = len(users) # No query
first = users[0] # No query
for user in users: # No query
print(user.name)
2. select_related
Use for ForeignKey and OneToOneField relationships (creates SQL JOIN).
Basic Usage
# ❌ N+1: Each order.customer triggers a query
orders = Order.objects.all()[:100]
for order in orders:
print(order.customer.name) # 100 additional queries!
# ✅ JOIN: Single query with related data
orders = Order.objects.select_related('customer').all()[:100]
for order in orders:
print(order.customer.name) # No additional queries
Multiple Relations
orders = Order.objects.select_related(
'customer',
'customer__company',
'shipping_address'
).all()[:100]
# Generated SQL:
# SELECT orders.*, customers.*, companies.*, addresses.*
# FROM orders
# JOIN customers ON orders.customer_id = customers.id
# JOIN companies ON customers.company_id = companies.id
# JOIN addresses ON orders.shipping_address_id = addresses.id
# LIMIT 100
3. prefetch_related
Use for ManyToMany and reverse ForeignKey relationships (creates separate IN query).
Basic Usage
# ❌ N+1: Each author.books triggers a query
authors = Author.objects.all()[:100]
for author in authors:
for book in author.books.all(): # 100 additional queries!
print(book.title)
# ✅ Prefetch: 2 queries total
authors = Author.objects.prefetch_related('books').all()[:100]
for author in authors:
for book in author.books.all(): # No additional queries
print(book.title)
# Generated SQL:
# Query 1: SELECT * FROM authors LIMIT 100
# Query 2: SELECT * FROM books WHERE author_id IN (1, 2, 3, ..., 100)
Prefetch Object for Custom Filtering
from django.db.models import Prefetch
# Prefetch with filtering
authors = Author.objects.prefetch_related(
Prefetch(
'books',
queryset=Book.objects.filter(published=True).order_by('-year'),
to_attr='published_books' # Store as list, not manager
)
).all()
for author in authors:
for book in author.published_books: # Access as list
print(book.title)
4. Combining Relations
select_related + prefetch_related
# Complex example with both
orders = Order.objects.select_related(
'customer', # ForeignKey: JOIN
'customer__company', # Nested ForeignKey: JOIN
).prefetch_related(
'items', # Reverse FK: Separate query
'items__product', # Prefetch products for items
Prefetch(
'items__product__reviews',
queryset=Review.objects.filter(rating__gte=4)
)
).filter(
status='completed'
).order_by('-created_at')[:50]
Nested Prefetch with select_related
# Prefetch books, then select_related on publisher
authors = Author.objects.prefetch_related(
Prefetch(
'books',
queryset=Book.objects.select_related('publisher')
)
).all()
5. Partial Loading
only() - Load Only Specified Fields
users = User.objects.only('id', 'name', 'email').all()[:100]
# Other fields loaded lazily on access (causes extra queries!)
defer() - Load All Except Specified
users = User.objects.defer('bio', 'profile_json').all()[:100]
# Deferred fields loaded lazily on access
Important: Include FK Fields
# ❌ This breaks prefetch_related
orders = Order.objects.only('id', 'total').prefetch_related('items').all()
# Missing customer_id causes N+1!
# ✅ Include the FK field
orders = Order.objects.only('id', 'total', 'customer_id').prefetch_related('items').all()
values() and values_list()
# Return dicts instead of model instances
users = User.objects.filter(active=True).values('id', 'name', 'email')
# Return tuples
emails = User.objects.values_list('email', flat=True)
6. Bulk Operations
bulk_create
# ❌ SLOW: Save each object
for data in dataset:
User.objects.create(name=data['name'], email=data['email'])
# ✅ FAST: bulk_create
users = [User(name=d['name'], email=d['email']) for d in dataset]
User.objects.bulk_create(users, batch_size=1000)
# With ignore_conflicts (PostgreSQL)
User.objects.bulk_create(users, ignore_conflicts=True)
# With update_conflicts (PostgreSQL, upsert)
User.objects.bulk_create(
users,
update_conflicts=True,
update_fields=['name'],
unique_fields=['email']
)
bulk_update
users = User.objects.filter(active=True)
for user in users:
user.last_login = timezone.now()
User.objects.bulk_update(users, ['last_login'], batch_size=1000)
update() - Single Query
# ✅ Most efficient for simple updates
User.objects.filter(active=True).update(last_login=timezone.now())
# Single UPDATE query, no model instantiation
delete() - Single Query
# ✅ Efficient bulk delete
User.objects.filter(deleted_at__isnull=False).delete()
7. Aggregation
Aggregate (Returns dict)
from django.db.models import Count, Avg, Sum
stats = Order.objects.aggregate(
total=Sum('amount'),
average=Avg('amount'),
count=Count('id')
)
# {'total': 50000, 'average': 250, 'count': 200}
Annotate (Adds field to each row)
authors = Author.objects.annotate(
book_count=Count('books'),
avg_rating=Avg('books__reviews__rating')
).filter(book_count__gt=5)
F Expressions (Database-side operations)
from django.db.models import F
# Increment without fetching
Product.objects.update(price=F('price') * 1.1) # 10% increase
# Use in annotations
orders = Order.objects.annotate(
profit=F('revenue') - F('cost')
)
Conditional Aggregation
from django.db.models import Q
orders = Order.objects.aggregate(
completed=Count('id', filter=Q(status='completed')),
pending=Count('id', filter=Q(status='pending'))
)
8. Raw SQL
Raw Query with Model Mapping
users = User.objects.raw('''
SELECT u.*, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id
HAVING COUNT(o.id) > 5
''')
Direct Database Execution
from django.db import connection
with connection.cursor() as cursor:
cursor.execute('''
SELECT name, COUNT(*) as cnt
FROM products
GROUP BY name
ORDER BY cnt DESC
LIMIT 10
''')
results = cursor.fetchall()
Parameterized Queries (Prevent SQL Injection)
cursor.execute(
'SELECT * FROM users WHERE email = %s',
[user_email]
)
9. Database Routers
Read Replica Configuration
# settings.py
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql',
'NAME': 'mydb',
'HOST': 'primary.db.example.com',
},
'replica': {
'ENGINE': 'django.db.backends.postgresql',
'NAME': 'mydb',
'HOST': 'replica.db.example.com',
}
}
Router Implementation
# routers.py
class ReadReplicaRouter:
def db_for_read(self, model, **hints):
return 'replica'
def db_for_write(self, model, **hints):
return 'default'
def allow_relation(self, obj1, obj2, **hints):
return True
def allow_migrate(self, db, app_label, model_name=None, **hints):
return db == 'default'
# settings.py
DATABASE_ROUTERS = ['myapp.routers.ReadReplicaRouter']
Explicit Routing
User.objects.using('replica').all() # Force read from replica
User.objects.using('default').all() # Force read from primary
10. Profiling
django-debug-toolbar
# Shows:
# ├── SQL queries with timing
# ├── Duplicate queries
# ├── N+1 detection
# └── Query explain plans
INSTALLED_APPS = [
'debug_toolbar',
]
MIDDLEWARE = [
'debug_toolbar.middleware.DebugToolbarMiddleware',
]
INTERNAL_IPS = ['127.0.0.1']
Query Logging
# settings.py
LOGGING = {
'version': 1,
'handlers': {
'console': {
'class': 'logging.StreamHandler',
},
},
'loggers': {
'django.db.backends': {
'level': 'DEBUG',
'handlers': ['console'],
},
},
}
Programmatic Query Count
from django.db import connection, reset_queries
from django.conf import settings
settings.DEBUG = True
reset_queries()
# Your code here
users = list(User.objects.all())
print(f'Queries: {len(connection.queries)}')
for query in connection.queries:
print(f'{query["time"]}s: {query["sql"][:100]}')
11. Quick Reference
N+1 Prevention Cheatsheet
# ForeignKey / OneToOne → select_related (JOIN)
Order.objects.select_related('customer')
# ManyToMany / Reverse FK → prefetch_related (IN query)
Author.objects.prefetch_related('books')
# Both combined
Order.objects.select_related('customer').prefetch_related('items')
# Filtered prefetch
Prefetch('books', queryset=Book.objects.filter(published=True))
Query Optimization Checklist
├── [ ] Identified N+1 with debug toolbar
├── [ ] Added select_related for FK traversal
├── [ ] Added prefetch_related for reverse FK / M2M
├── [ ] Used only/defer to limit fields
├── [ ] Used iterator() for large datasets
├── [ ] Used exists() instead of count() for boolean
├── [ ] Used values()/values_list() for simple data
├── [ ] Used update() instead of save() for bulk
├── [ ] Added indexes for filtered/ordered columns
└── [ ] Verified with EXPLAIN ANALYZE
Performance Tips
DO:
├── Use select_related for ForeignKey
├── Use prefetch_related for ManyToMany
├── Use bulk_create/bulk_update for batch ops
├── Use update() for simple bulk updates
├── Use iterator() for memory efficiency
├── Use values() when you don't need models
└── Profile with django-debug-toolbar
DON'T:
├── Call .all() inside loops
├── Use len(qs) when .count() works
├── Use list(qs) just to check existence
├── Forget to include FK fields with only()
├── Use save() in loops (use bulk_update)
└── Ignore N+1 warnings in debug toolbar