# Performance Optimization Audit This document identifies performance bottlenecks in OLDP and provides actionable, code-level recommendations. Findings are organized by severity. **Context:** OLDP serves legal documents (cases, laws, courts). Data is updated approximately once per week, making it an excellent candidate for aggressive caching. ## Profiling Report Measured profiling methodology and branch verification results (API + frontend + case detail) are documented in: - [Performance Profiling Methodology and Findings (2026-02-22)](performance-profiling-2026-02-22.md) Infrastructure-level tuning beyond Django (Gunicorn, Nginx, MariaDB, Elasticsearch, Redis, plus Docker Compose examples) is documented in: - [Performance Optimization Beyond Django](performance-beyond-django.md) --- ## 1. Database Query Issues (HIGH) ### 1.1 N+1 Queries in `get_related()` Methods **Cases** (`oldp/apps/cases/models.py:258-269`): `Case.get_related()` loops over `RelatedCase` objects and accesses the `related_content` FK on each iteration, triggering a separate query per related item. **Laws** (`oldp/apps/laws/models.py:377-386`): Same pattern in `Law.get_related()` with `RelatedLaw`. **Fix:** Add `.select_related("related_content")` to the queryset in both methods: ```python # cases/models.py — Case.get_related() for item in RelatedCase.objects.filter(seed_content=self) \ .select_related("related_content") \ .order_by("-score")[:n]: items.append(item.related_content) # laws/models.py — Law.get_related() for item in RelatedLaw.objects.filter(seed_content=self) \ .select_related("related_content") \ .order_by("-score")[:n]: items.append(item.related_content) ``` ### 1.2 N+1 in `get_content_as_html()` `oldp/apps/cases/models.py:202-224` — Calls `get_reference_markers()` and `get_markers()`, each triggering separate DB queries. This method is invoked on every case detail page view (`oldp/apps/cases/views.py:91`). **Fix:** Prefetch reference markers and annotation markers on the case queryset in the detail view, or cache the rendered HTML output per case. ### 1.3 Inefficient `get_latest_law_book()` `oldp/apps/laws/views.py:61-75` — Uses `len(candidates)` which forces full queryset evaluation instead of using `.exists()` or `.first()`. **Fix:** ```python def get_latest_law_book(book_slug): candidate = LawBook.objects.filter(slug=book_slug, latest=True).first() if candidate is None: logger.info("Law book not found: %s", book_slug) raise Http404() # Check for duplicates (should not happen) count = LawBook.objects.filter(slug=book_slug, latest=True).count() if count > 1: logger.warning("Book has more than one instance with latest=true: %s", book_slug) return candidate ``` ### 1.4 Uncached `Law.get_next()` and `has_next()` `oldp/apps/laws/models.py:328-347` — Each call to `get_next()` or `has_next()` triggers a separate DB query. When both are called (e.g., in templates), two queries are executed for the same information. **Fix:** Cache the result of `get_next()` on the instance and derive `has_next()` from it: ```python def get_next(self): if not hasattr(self, "_next_cache"): try: self._next_cache = Law.objects.get(previous=self.id) except Law.DoesNotExist: self._next_cache = None except Law.MultipleObjectsReturned: logger.error(f"Multiple laws found with previous={self.id}") self._next_cache = Law.objects.filter(previous=self.id).first() return self._next_cache def has_next(self): return self.get_next() is not None ``` ### 1.5 `Law.get_latest_revision_url()` — 2 DB Queries Per Call `oldp/apps/laws/models.py:397-414` — Fetches the latest book and then checks for law existence on every call. Particularly expensive if called in list contexts. **Fix:** Cache on the instance, or avoid calling in list views entirely. ### 1.6 Missing `select_related` in API Views | Location | Missing | Fix | |---|---|---| | `oldp/apps/laws/api_views.py:49-50` | `LawViewSet.get_queryset()` | Add `.select_related("book")` | | `oldp/apps/annotations/api_views.py:32-33` | `AnnotationLabelViewSet.get_queryset()` | Add `.select_related("owner")` | | `oldp/apps/annotations/api_views.py:62-64` | `CaseAnnotationViewSet` queryset | Add `"label__owner"` to existing `select_related` | ### 1.7 Missing `defer()` on Heavy Text Fields | Location | Issue | Fix | |---|---|---| | `oldp/apps/laws/views.py:107` | `view_book()` loads all Law fields including `content` for list display | Add `.defer("content", "footnotes")` | | `oldp/apps/laws/api_views.py:97` | `LawBookViewSet` loads `changelog`, `footnotes`, `sections` (large JSON) | Add `.defer("changelog", "footnotes", "sections")` for list action | | `oldp/apps/laws/sitemaps.py:10-11` | `LawSitemap` queryset loads `content` | Add `.defer("content", "footnotes")` | Note: `CaseSitemap` (`oldp/apps/cases/sitemaps.py:10-12`) correctly uses `.defer(*Case.defer_fields_list_view)`. --- ## 2. Caching Gaps (HIGH) ### 2.1 Views Missing Cache Entirely | View | Location | Impact | |---|---|---| | `CourtCasesListView` | `oldp/apps/courts/views.py:55-84` | DB queries on every page load | | `CustomSearchView` | `oldp/apps/search/views.py:55-87` | Elasticsearch queries on every request | | `autocomplete_view` | `oldp/apps/search/views.py:200-212` | New `SearchQuerySet` on every keystroke | **Fix for `CourtCasesListView`:** Add `@method_decorator(cache_page(settings.CACHE_TTL))` or use the `@cache_per_user` decorator. **Fix for `autocomplete_view`:** Cache results by query string in Django's cache framework: ```python from django.core.cache import cache def autocomplete_view(request): query = request.GET.get("q", "") cache_key = f"autocomplete:{query}" suggestions = cache.get(cache_key) if suggestions is None: try: sqs = SearchQuerySet().autocomplete(title=query)[:5] suggestions = [result.title for result in sqs] except Exception as e: logger.error("Autocomplete search failed: %s", str(e)) suggestions = [] cache.set(cache_key, suggestions, timeout=settings.CACHE_TTL) return JsonResponse({"results": suggestions}) ``` ### 2.2 API Endpoints with Insufficient or No Caching | Endpoint | Location | Current | Recommended | |---|---|---|---| | Case API | `oldp/apps/cases/api_views.py:88` | 60 seconds | 15 minutes (`settings.CACHE_TTL`) | | Law API | `oldp/apps/laws/api_views.py` | None | Add `cache_page(settings.CACHE_TTL)` on dispatch | | LawBook API | `oldp/apps/laws/api_views.py` | None | Add `cache_page(settings.CACHE_TTL)` on dispatch | | Court API | `oldp/api/views.py:19-43` | None | Add `cache_page(settings.CACHE_TTL)` on dispatch | | City API | `oldp/api/views.py:91-97` | None | Add `cache_page(settings.CACHE_TTL)` on dispatch | | State API | `oldp/api/views.py:100-106` | None | Add `cache_page(settings.CACHE_TTL)` on dispatch | ### 2.3 No Template Fragment Caching Zero `{% cache %}` tags found across all templates. Static fragments like the navbar, footer, and sidebar re-render on every request. **Fix:** Wrap stable template fragments with Django's `{% cache %}` tag: ```html {% load cache %} {% cache 3600 navbar %} ... navbar HTML ... {% endcache %} ``` ### 2.4 Homepage Counts on Every Cache Miss `oldp/apps/homepage/views.py:24-25` — `Law.objects.all().count()` and `Case.get_queryset(request).count()` execute on every cache miss. The `@cache_per_user` decorator mitigates this but each unique user still triggers these queries. **Fix:** Cache the counts separately with a longer TTL (e.g., 1 hour), since exact counts are not critical: ```python from django.core.cache import cache laws_count = cache.get("homepage_laws_count") if laws_count is None: laws_count = Law.objects.count() cache.set("homepage_laws_count", laws_count, timeout=3600) ``` --- ## 3. Elasticsearch / Search (MEDIUM) ### 3.1 `datetime.now()` in Queryset `oldp/apps/search/views.py:82` — `datetime.datetime.now()` is called on every search request for the date facet end boundary. **Fix:** Use a fixed date or compute it once per day via caching. ### 3.2 Heavy Facet Processing in Python `oldp/apps/search/views.py:88-161` — `get_search_facets()` performs nested iteration over facet data, building URL parameters in Python on every search request. **Fix:** Cache the facet processing result keyed by the query + selected facets combination. ### 3.3 Autocomplete Without Caching `oldp/apps/search/views.py:206` — Creates a new `SearchQuerySet()` per request with no caching. See fix in section 2.1 above. --- ## 4. Middleware & HTTP (MEDIUM) ### 4.1 GZip Compression Disabled `oldp/settings.py:138` — `GZipMiddleware` is commented out. Responses are sent uncompressed, increasing transfer sizes. **Fix:** Uncomment `'django.middleware.gzip.GZipMiddleware'` and place it first in the middleware list. Alternatively, enable gzip at the reverse proxy level (nginx). ### 4.2 No Conditional Request Support No `ConditionalGetMiddleware` is configured. The server cannot return `304 Not Modified` for unchanged content, forcing full response re-transmission. **Fix:** Add `'django.middleware.http.ConditionalGetMiddleware'` to `MIDDLEWARE`. ### 4.3 FlatpageFallbackMiddleware on Every 404 `oldp/settings.py:137` — `FlatpageFallbackMiddleware` triggers a DB query on every 404 response to check for a matching flatpage. **Impact:** Low on most requests but adds latency on every 404 (bots, broken links, etc.). ### 4.4 Static File Hashing Disabled `oldp/settings.py:291` — `CompressedManifestStaticFilesStorage` (whitenoise) is commented out. Static files are served without content hashes, preventing long-lived browser cache headers. **Fix:** Uncomment the whitenoise storage backend: ```python STATICFILES_STORAGE = 'whitenoise.storage.CompressedManifestStaticFilesStorage' ``` --- ## 5. Context Processor Overhead (LOW) `oldp/apps/lib/context_processors.py:31` — `reverse("flatpages", ...)` is called on every request to resolve the API info URL. `oldp/apps/lib/context_processors.py:35` — `get_version()` is called on every request. **Fix:** Compute these once at module load time: ```python _API_INFO_URL = None _APP_VERSION = None def global_context_processor(request): global _API_INFO_URL, _APP_VERSION if _API_INFO_URL is None: _API_INFO_URL = reverse("flatpages", kwargs={"url": "/api/"}) if _APP_VERSION is None: _APP_VERSION = get_version() return { ... "api_info_url": _API_INFO_URL, "app_version": _APP_VERSION, } ``` --- ## 6. SQL Injection Risk + Performance (LOW) `oldp/apps/sources/views.py:43` — Raw SQL with Python string formatting: ```python where_clause = ' WHERE c.created_date > "{}"'.format(diff_str) ``` This is a **SQL injection vulnerability**. Although `diff_str` is derived from `datetime.timedelta`, the pattern is dangerous and should be replaced. **Fix:** Use parameterized queries: ```python if "delta" in date_range: diff = today - datetime.timedelta(**date_range["delta"]) where_clause = " WHERE c.created_date > %s" params = [diff.strftime("%Y-%m-%d")] else: where_clause = "" params = [] # ... cursor.execute(query, params) ``` --- ## 7. Deployment / Infrastructure Recommendations ### 7.1 Cache Backend The default cache backend is file-based (`oldp/settings.py:255`). For production, use Redis for significantly better performance: ```python CACHES = { "default": { "BACKEND": "django.core.cache.backends.redis.RedisCache", "LOCATION": "redis://127.0.0.1:6379", } } ``` ### 7.2 Cache TTL `CACHE_TTL = 60 * 15` (15 minutes) at `oldp/settings.py:254`. Given weekly data updates, this could be extended to 1-4 hours for most views, with explicit cache invalidation on data import. ### 7.3 Database Query Logging Enable query logging in development to catch N+1 issues early: ```python # settings_dev.py LOGGING["loggers"]["django.db.backends"] = { "level": "DEBUG", "handlers": ["console"], } ``` Or use `django-debug-toolbar` to inspect query counts per request. --- ## Summary — Priority Matrix | Priority | Finding | Effort | Impact | |---|---|---|---| | HIGH | N+1 in `get_related()` (1.1) | Low | High — reduces queries from N+1 to 1 | | HIGH | Missing caching on search views (2.1) | Low | High — ES queries are expensive | | HIGH | API caching gaps (2.2) | Low | High — repeated identical API calls | | HIGH | Missing `select_related` in API views (1.6) | Low | Medium — extra query per serialized object | | HIGH | Missing `defer()` on heavy fields (1.7) | Low | Medium — reduces memory and transfer | | MEDIUM | GZip compression disabled (4.1) | Low | Medium — reduces response sizes ~60-70% | | MEDIUM | Static file hashing disabled (4.4) | Low | Medium — enables long-lived browser caching | | MEDIUM | Search facet processing (3.2) | Medium | Medium — heavy Python on every search | | MEDIUM | Template fragment caching (2.3) | Medium | Medium — avoids re-rendering static HTML | | LOW | Context processor overhead (5) | Low | Low — minor per-request cost | | LOW | SQL injection in sources (6) | Low | Low (security fix, staff-only view) | | LOW | `get_latest_law_book` queryset eval (1.3) | Low | Low — minor per-request cost |