Performance Optimization Audit

This document identifies performance bottlenecks in OLDP and provides actionable, code-level recommendations. Findings are organized by severity.

Context: OLDP serves legal documents (cases, laws, courts). Data is updated approximately once per week, making it an excellent candidate for aggressive caching.

Profiling Report

Measured profiling methodology and branch verification results (API + frontend + case detail) are documented in:

Infrastructure-level tuning beyond Django (Gunicorn, Nginx, MariaDB, Elasticsearch, Redis, plus Docker Compose examples) is documented in:


1. Database Query Issues (HIGH)

1.2 N+1 in get_content_as_html()

oldp/apps/cases/models.py:202-224 — Calls get_reference_markers() and get_markers(), each triggering separate DB queries. This method is invoked on every case detail page view (oldp/apps/cases/views.py:91).

Fix: Prefetch reference markers and annotation markers on the case queryset in the detail view, or cache the rendered HTML output per case.
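
The second option (caching the rendered HTML per case) can be sketched as follows. This is a minimal illustration, not OLDP's code: `get_or_render`, the key format, and the `DictCache` stand-in are hypothetical. In Django, `django.core.cache.cache` offers the same `get`/`set` calls. The sketch assumes a last-modified value is available for each case, so that an updated case naturally receives a fresh cache entry.

```python
class DictCache:
    """In-memory stand-in exposing the get/set subset of Django's cache API."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        # timeout is accepted for API compatibility but ignored in this stand-in.
        self._data[key] = value


def get_or_render(cache, case_id, updated_at, render, timeout=3600):
    """Return cached HTML for a case, rendering it only on a cache miss."""
    key = f"case_html:{case_id}:{updated_at}"
    html = cache.get(key)
    if html is None:
        html = render()
        cache.set(key, html, timeout=timeout)
    return html


calls = []


def expensive_render():
    # Placeholder for the marker lookups and HTML assembly.
    calls.append(1)
    return "<p>rendered case</p>"


cache = DictCache()
first = get_or_render(cache, 42, "2024-01-01", expensive_render)
second = get_or_render(cache, 42, "2024-01-01", expensive_render)
```

The second call is served from the cache, so the marker queries run once per case revision instead of once per page view.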

1.3 Inefficient get_latest_law_book()

oldp/apps/laws/views.py:61-75 — Uses len(candidates) which forces full queryset evaluation instead of using .exists() or .first().

Fix:

from django.http import Http404

def get_latest_law_book(book_slug):
    # Fetch at most two rows in a single query: enough to return the book
    # and to detect duplicates without a separate COUNT query.
    candidates = list(LawBook.objects.filter(slug=book_slug, latest=True)[:2])
    if not candidates:
        logger.info("Law book not found: %s", book_slug)
        raise Http404()
    if len(candidates) > 1:
        # Should not happen: only one revision may be flagged latest=True.
        logger.warning("Book has more than one instance with latest=true: %s", book_slug)
    return candidates[0]

1.4 Uncached Law.get_next() and has_next()

oldp/apps/laws/models.py:328-347 — Each call to get_next() or has_next() triggers a separate DB query. When both are called (e.g., in templates), two queries are executed for the same information.

Fix: Cache the result of get_next() on the instance and derive has_next() from it:

def get_next(self):
    if not hasattr(self, "_next_cache"):
        try:
            self._next_cache = Law.objects.get(previous=self.id)
        except Law.DoesNotExist:
            self._next_cache = None
        except Law.MultipleObjectsReturned:
            logger.error("Multiple laws found with previous=%s", self.id)
            self._next_cache = Law.objects.filter(previous=self.id).first()
    return self._next_cache

def has_next(self):
    return self.get_next() is not None

1.5 Law.get_latest_revision_url() — 2 DB Queries Per Call

oldp/apps/laws/models.py:397-414 — Fetches the latest book and then checks for law existence on every call. Particularly expensive if called in list contexts.

Fix: Cache on the instance, or avoid calling in list views entirely.
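
Instance-level caching can be sketched with `functools.cached_property` (Python 3.8+), which stores the computed value on the instance after the first access. `LawStub` and `_resolve_latest_url` are hypothetical stand-ins for the model and its two queries, and the URL value is illustrative only.

```python
from functools import cached_property


class LawStub:
    """Hypothetical stand-in for a model instance; not OLDP's Law class."""

    def __init__(self, slug):
        self.slug = slug
        self.lookups = 0  # counts how often the expensive path runs

    def _resolve_latest_url(self):
        # Placeholder for the two database queries described above.
        self.lookups += 1
        return f"/law/{self.slug}/latest"

    @cached_property
    def latest_revision_url(self):
        # Computed on first access, then stored in the instance __dict__.
        return self._resolve_latest_url()


law = LawStub("gg")
urls = [law.latest_revision_url for _ in range(3)]
```

The cached value lives only as long as the instance. In a list view every row is a separate instance, so this still costs one lookup per row, which is why avoiding the call in list contexts remains the stronger option.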

1.7 Missing defer() on Heavy Text Fields

| Location | Issue | Fix |
| --- | --- | --- |
| oldp/apps/laws/views.py:107 | view_book() loads all Law fields including content for list display | Add .defer("content", "footnotes") |
| oldp/apps/laws/api_views.py:97 | LawBookViewSet loads changelog, footnotes, sections (large JSON) | Add .defer("changelog", "footnotes", "sections") for list action |
| oldp/apps/laws/sitemaps.py:10-11 | LawSitemap queryset loads content | Add .defer("content", "footnotes") |

Note: CaseSitemap (oldp/apps/cases/sitemaps.py:10-12) correctly uses .defer(*Case.defer_fields_list_view).


2. Caching Gaps (HIGH)

2.1 Views Missing Cache Entirely

| View | Location | Impact |
| --- | --- | --- |
| CourtCasesListView | oldp/apps/courts/views.py:55-84 | DB queries on every page load |
| CustomSearchView | oldp/apps/search/views.py:55-87 | Elasticsearch queries on every request |
| autocomplete_view | oldp/apps/search/views.py:200-212 | New SearchQuerySet on every keystroke |

Fix for CourtCasesListView: Decorate the class with @method_decorator(cache_page(settings.CACHE_TTL), name="dispatch") or use the @cache_per_user decorator.

Fix for autocomplete_view: Cache results by query string in Django’s cache framework:

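import hashlib
import logging

from django.conf import settings
from django.core.cache import cache
from django.http import JsonResponse
from haystack.query import SearchQuerySet  # assumes django-haystack

logger = logging.getLogger(__name__)

def autocomplete_view(request):
    query = request.GET.get("q", "")
    # Hash the query so arbitrary user input always yields a valid cache key.
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    cache_key = f"autocomplete:{digest}"
    suggestions = cache.get(cache_key)
    if suggestions is None:
        try:
            sqs = SearchQuerySet().autocomplete(title=query)[:5]
            suggestions = [result.title for result in sqs]
        except Exception as e:
            logger.error("Autocomplete search failed: %s", e)
            # Do not cache failures: a transient outage should not stick.
            return JsonResponse({"results": []})
        cache.set(cache_key, suggestions, timeout=settings.CACHE_TTL)
    return JsonResponse({"results": suggestions})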

2.2 API Endpoints with Insufficient or No Caching

| Endpoint | Location | Current | Recommended |
| --- | --- | --- | --- |
| Case API | oldp/apps/cases/api_views.py:88 | 60 seconds | 15 minutes (settings.CACHE_TTL) |
| Law API | oldp/apps/laws/api_views.py | None | Add cache_page(settings.CACHE_TTL) on dispatch |
| LawBook API | oldp/apps/laws/api_views.py | None | Add cache_page(settings.CACHE_TTL) on dispatch |
| Court API | oldp/api/views.py:19-43 | None | Add cache_page(settings.CACHE_TTL) on dispatch |
| City API | oldp/api/views.py:91-97 | None | Add cache_page(settings.CACHE_TTL) on dispatch |
| State API | oldp/api/views.py:100-106 | None | Add cache_page(settings.CACHE_TTL) on dispatch |

2.3 No Template Fragment Caching

Zero {% cache %} tags found across all templates. Static fragments like the navbar, footer, and sidebar re-render on every request.

Fix: Wrap stable template fragments with Django’s {% cache %} tag:

{% load cache %}
{% cache 3600 navbar %}
  ... navbar HTML ...
{% endcache %}

2.4 Homepage Counts on Every Cache Miss

In oldp/apps/homepage/views.py:24-25, Law.objects.all().count() and Case.get_queryset(request).count() execute on every cache miss. The @cache_per_user decorator mitigates this, but each unique user still triggers these queries.

Fix: Cache the counts separately with a longer TTL (e.g., 1 hour), since exact counts are not critical:

from django.core.cache import cache

laws_count = cache.get("homepage_laws_count")
if laws_count is None:
    laws_count = Law.objects.count()
    cache.set("homepage_laws_count", laws_count, timeout=3600)

3. Elasticsearch / Search (MEDIUM)

3.1 datetime.now() in Queryset

In oldp/apps/search/views.py:82, datetime.datetime.now() is called on every search request for the date facet end boundary.

Fix: Use a fixed date or compute it once per day via caching.
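
One way to keep the end boundary stable for a whole day is to round it up to the next midnight, so that every request on the same day produces the same value. The helper name `facet_end_boundary` is hypothetical, and the sketch assumes day-level precision is sufficient for the date facet.

```python
import datetime


def facet_end_boundary(now=None):
    """Return the midnight that follows `now`, as a naive datetime."""
    if now is None:
        now = datetime.datetime.now()
    next_day = now.date() + datetime.timedelta(days=1)
    return datetime.datetime.combine(next_day, datetime.time.min)


morning = datetime.datetime(2024, 5, 17, 8, 30)
evening = datetime.datetime(2024, 5, 17, 22, 5)
same_boundary = facet_end_boundary(morning) == facet_end_boundary(evening)
```

Because the boundary no longer changes between requests, two identical searches on the same day build identical queries, which is a precondition for caching their results.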

3.2 Heavy Facet Processing in Python

In oldp/apps/search/views.py:88-161, get_search_facets() performs nested iteration over facet data, building URL parameters in Python on every search request.

Fix: Cache the facet processing result keyed by the query + selected facets combination.
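
A cache key for this purpose needs to be identical for equivalent requests regardless of parameter order. The sketch below builds such a key from the query string and the selected facets. `facet_cache_key` and the `facets:` prefix are hypothetical, and the facet values in the usage lines are examples only.

```python
import hashlib
import json


def facet_cache_key(query, selected_facets):
    """Build a deterministic, backend-safe cache key for a search request."""
    payload = json.dumps(
        {"q": query, "facets": sorted(selected_facets)},
        sort_keys=True,
        ensure_ascii=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"facets:{digest}"


key_a = facet_cache_key("mietrecht", ["court:bgh", "year:2020"])
key_b = facet_cache_key("mietrecht", ["year:2020", "court:bgh"])
key_c = facet_cache_key("mietrecht", ["year:2021"])
```

Hashing keeps the key short and free of whitespace, which matters for backends such as memcached that restrict key length and characters.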

3.3 Autocomplete Without Caching

oldp/apps/search/views.py:206 — Creates a new SearchQuerySet() per request with no caching. See fix in section 2.1 above.


4. Middleware & HTTP (MEDIUM)

4.1 GZip Compression Disabled

In oldp/settings.py:138, GZipMiddleware is commented out. Responses are sent uncompressed, increasing transfer sizes.

Fix: Uncomment 'django.middleware.gzip.GZipMiddleware' and place it first in the middleware list. Alternatively, enable gzip at the reverse proxy level (nginx).
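
To estimate the savings before changing the configuration, a response body can be compressed offline with Python's `gzip` module. The HTML below is synthetic and highly repetitive, so its ratio is not representative. Substituting a saved page from the site would give a realistic figure. `compression_ratio` is a hypothetical helper.

```python
import gzip


def compression_ratio(body: bytes) -> float:
    """Return compressed size divided by original size."""
    if not body:
        return 1.0
    return len(gzip.compress(body)) / len(body)


sample = (
    "<li class='result'><a href='/case/1'>Case title</a></li>\n" * 200
).encode("utf-8")
ratio = compression_ratio(sample)
print(f"compressed to {ratio:.1%} of original size")
```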

4.2 No Conditional Request Support

No ConditionalGetMiddleware is configured. The server cannot return 304 Not Modified for unchanged content, forcing full response re-transmission.

Fix: Add 'django.middleware.http.ConditionalGetMiddleware' to MIDDLEWARE.
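
The mechanism behind a 304 response can be illustrated without Django: the server derives an ETag from the response body and compares it with the `If-None-Match` header sent by the client. This is a simplified illustration of the idea, not the middleware's actual implementation, and `respond` is a hypothetical helper.

```python
import hashlib


def respond(body: bytes, if_none_match=None):
    """Return (status, headers, body), honouring a client-supplied ETag."""
    etag = '"' + hashlib.sha256(body).hexdigest() + '"'
    headers = {"ETag": etag}
    if if_none_match == etag:
        # The client's copy is current: send headers only.
        return 304, headers, b""
    return 200, headers, body


page = b"<html>unchanged legal text</html>"
status_first, headers_first, _ = respond(page)
status_repeat, _, body_repeat = respond(page, if_none_match=headers_first["ETag"])
```

With this approach the body is still generated on the server. The saving is in transfer size, so it complements view-level caching rather than replacing it.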

4.3 FlatpageFallbackMiddleware on Every 404

In oldp/settings.py:137, FlatpageFallbackMiddleware triggers a DB query on every 404 response to check for a matching flatpage.

Impact: Low on most requests but adds latency on every 404 (bots, broken links, etc.).

4.4 Static File Hashing Disabled

In oldp/settings.py:291, CompressedManifestStaticFilesStorage (whitenoise) is commented out. Static files are served without content hashes, preventing long-lived browser cache headers.

Fix: Uncomment the whitenoise storage backend:

STATICFILES_STORAGE = 'whitenoise.storage.CompressedManifestStaticFilesStorage'

5. Context Processor Overhead (LOW)

In oldp/apps/lib/context_processors.py:31, reverse("flatpages", ...) is called on every request to resolve the API info URL.

In oldp/apps/lib/context_processors.py:35, get_version() is called on every request.

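Fix: Compute these once, lazily on first use, and reuse the module-level values on later requests (resolving the URL at import time can fail if the URL configuration is not yet loaded):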

_API_INFO_URL = None
_APP_VERSION = None

def global_context_processor(request):
    global _API_INFO_URL, _APP_VERSION
    if _API_INFO_URL is None:
        _API_INFO_URL = reverse("flatpages", kwargs={"url": "/api/"})
    if _APP_VERSION is None:
        _APP_VERSION = get_version()
    return {
        # ... existing context entries ...
        "api_info_url": _API_INFO_URL,
        "app_version": _APP_VERSION,
    }
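
The same compute-once behaviour can be expressed without `global` by wrapping each lookup in `functools.lru_cache`. In this sketch `resolve_api_info_url` and `read_app_version` are hypothetical stand-ins for the `reverse(...)` and `get_version()` calls, and their return values are placeholders.

```python
from functools import lru_cache

call_log = []


@lru_cache(maxsize=None)
def resolve_api_info_url():
    # Stand-in for reverse("flatpages", kwargs={"url": "/api/"}).
    call_log.append("url")
    return "/pages/api/"


@lru_cache(maxsize=None)
def read_app_version():
    # Stand-in for get_version().
    call_log.append("version")
    return "0.0.0"


def global_context_processor(request):
    return {
        "api_info_url": resolve_api_info_url(),
        "app_version": read_app_version(),
    }


contexts = [global_context_processor(request=None) for _ in range(3)]
```

Each helper runs once per process, and tests can reset the cached values through the `cache_clear()` method that `lru_cache` attaches to the wrapped function.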

6. SQL Injection Risk + Performance (LOW)

oldp/apps/sources/views.py:43 — Raw SQL with Python string formatting:

where_clause = ' WHERE c.created_date > "{}"'.format(diff_str)

This pattern is an SQL injection risk. Although diff_str is currently derived from a datetime.timedelta, building SQL with string formatting is a classic injection vector and should be replaced.

Fix: Use parameterized queries:

if "delta" in date_range:
    diff = today - datetime.timedelta(**date_range["delta"])
    where_clause = " WHERE c.created_date > %s"
    params = [diff.strftime("%Y-%m-%d")]
else:
    where_clause = ""
    params = []

# ...
cursor.execute(query, params)
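
The effect of parameter binding can be demonstrated with the standard library's `sqlite3` module. Note that `sqlite3` uses `?` placeholders, whereas the Django cursor in the fix above uses `%s`. The table and the `count_since` helper are hypothetical and mirror only the `created_date` filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (id INTEGER PRIMARY KEY, created_date TEXT)")
conn.executemany(
    "INSERT INTO cases (created_date) VALUES (?)",
    [("2024-01-10",), ("2024-03-05",), ("2024-06-20",)],
)


def count_since(connection, since):
    """Count rows newer than `since`; the value is bound, never interpolated."""
    row = connection.execute(
        "SELECT COUNT(*) FROM cases c WHERE c.created_date > ?", (since,)
    ).fetchone()
    return row[0]


recent = count_since(conn, "2024-02-01")
# A hostile value is treated as plain data and cannot alter the statement.
hostile = count_since(conn, '2024-02-01" OR "1"="1')
```

In the hostile call the whole string is compared as a single value, so the `OR` clause never becomes part of the SQL.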

7. Deployment / Infrastructure Recommendations

7.1 Cache Backend

The default cache backend is file-based (oldp/settings.py:255). For production, use Redis, which keeps entries in memory and avoids a filesystem read per lookup. The built-in backend below requires Django 4.0 or later:

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379",
    }
}

7.2 Cache TTL

CACHE_TTL is set to 60 * 15 (15 minutes) at oldp/settings.py:254. Given weekly data updates, this could be extended to 1-4 hours for most views, combined with explicit cache invalidation on data import.
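
Explicit invalidation on import can be as simple as embedding a data version in every cache key and bumping it when an import finishes. Old entries then stop being read and expire on their own. The sketch uses a plain dict as a stand-in for the cache backend, and `versioned_key` and `on_import_finished` are hypothetical names. Django's cache API also offers `cache.clear()` as a blunter alternative.

```python
store = {"data_version": 1}  # stand-in for a shared cache backend


def versioned_key(name):
    """Prefix a cache key with the current data version."""
    return f"v{store['data_version']}:{name}"


def on_import_finished():
    """Call after a data import so that subsequent reads miss old entries."""
    store["data_version"] += 1


store[versioned_key("homepage_laws_count")] = 1000
hit_before = versioned_key("homepage_laws_count") in store
on_import_finished()
hit_after = versioned_key("homepage_laws_count") in store
```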

7.3 Database Query Logging

Enable query logging in development to catch N+1 issues early (Django only logs SQL statements when DEBUG is True):

# settings_dev.py
LOGGING["loggers"]["django.db.backends"] = {
    "level": "DEBUG",
    "handlers": ["console"],
}

Or use django-debug-toolbar to inspect query counts per request.


Summary — Priority Matrix

| Priority | Finding | Effort | Impact |
| --- | --- | --- | --- |
| HIGH | N+1 in get_related() (1.1) | Low | High: reduces queries from N+1 to 1 |
| HIGH | Missing caching on search views (2.1) | Low | High: ES queries are expensive |
| HIGH | API caching gaps (2.2) | Low | High: repeated identical API calls |
| HIGH | Missing select_related in API views (1.6) | Low | Medium: extra query per serialized object |
| HIGH | Missing defer() on heavy fields (1.7) | Low | Medium: reduces memory and transfer |
| MEDIUM | GZip compression disabled (4.1) | Low | Medium: reduces response sizes ~60-70% |
| MEDIUM | Static file hashing disabled (4.4) | Low | Medium: enables long-lived browser caching |
| MEDIUM | Search facet processing (3.2) | Medium | Medium: heavy Python on every search |
| MEDIUM | Template fragment caching (2.3) | Medium | Medium: avoids re-rendering static HTML |
| LOW | Context processor overhead (5) | Low | Low: minor per-request cost |
| LOW | SQL injection in sources (6) | Low | Low (security fix, staff-only view) |
| LOW | get_latest_law_book queryset eval (1.3) | Low | Low: minor per-request cost |