Performance Profiling Methodology and Findings (2026-02-22)

This document summarizes the profiling methodology and findings for the current feature branch performance work, including cache correctness fixes, API caching verification, and frontend/case-page profiling.

Scope

Profiled and verified:

  • API endpoints:

    • /api/

    • /api/cases/

    • /api/laws/

    • /api/law_books/

  • Frontend pages:

    • / (frontpage)

    • /case/ (case list)

    • /case/foo-case (case detail, fixture data)

The work included both correctness fixes (cache variance/safety) and performance improvements (query reductions, caching).

Environment and Tooling

  • Date: February 22, 2026

  • Python env: .venv

  • Profiling packages installed with uv:

    • django-silk

    • django-querycount

    • gprof2dot

Commands used to install profiling packages

source .venv/bin/activate
uv pip install django-silk==5.4.3 django-querycount==0.8.3 gprof2dot

Important Runtime Setup Notes

Env variable names (important)

django-configurations uses different env names for different settings types in this project:

  • Database override: DATABASE_URL=...

  • Profiling toggles: DJANGO_PROFILING_ENABLED=1

  • Querycount toggle: DJANGO_QUERYCOUNT_ENABLED=1

  • File cache location: DJANGO_FILE_CACHE_LOCATION=/tmp/...

Using DJANGO_DATABASE_URL does not override the DB here.

Temp SQLite DB location under escalated commands

When profiling with escalated commands, the process did not see the same SQLite file contents created in the sandbox workspace path. The reliable approach was:

  • create/migrate/load the profiling DB in /tmp

  • run DevConfiguration against /tmp/oldp-dev-profile.sqlite3

Methodology

1. Validate profiling middleware stack

  • Enabled Silk + QueryCount via env vars

  • Verified:

    • middleware activation

    • /silk/ route registration

    • X-DjangoQueryCount-* headers on responses

    • Silk request recording in DB

2. Build a temporary profiling DB

Using a fresh SQLite DB for reproducible profiling:

TMPDIR=/tmp SQLITE_TMPDIR=/tmp \
DJANGO_CONFIGURATION=DevConfiguration \
DATABASE_URL=sqlite:////tmp/oldp-dev-profile.sqlite3 \
DJANGO_PROFILING_ENABLED=1 \
DJANGO_QUERYCOUNT_ENABLED=1 \
DJANGO_FILE_CACHE_LOCATION=/tmp/django_cache_devprofile \
.venv/bin/python manage.py migrate --noinput

Load fixture data:

TMPDIR=/tmp SQLITE_TMPDIR=/tmp \
DJANGO_CONFIGURATION=DevConfiguration \
DATABASE_URL=sqlite:////tmp/oldp-dev-profile.sqlite3 \
DJANGO_PROFILING_ENABLED=1 \
DJANGO_QUERYCOUNT_ENABLED=1 \
DJANGO_FILE_CACHE_LOCATION=/tmp/django_cache_devprofile \
.venv/bin/python manage.py loaddata \
  locations/countries.json \
  locations/states.json \
  locations/cities.json \
  courts/courts.json \
  sources/default.json \
  laws/laws.json \
  cases/cases.json

3. Run profiled local server (DevConfiguration)

TMPDIR=/tmp SQLITE_TMPDIR=/tmp \
DJANGO_CONFIGURATION=DevConfiguration \
DATABASE_URL=sqlite:////tmp/oldp-dev-profile.sqlite3 \
DJANGO_PROFILING_ENABLED=1 \
DJANGO_QUERYCOUNT_ENABLED=1 \
DJANGO_FILE_CACHE_LOCATION=/tmp/django_cache_devprofile \
.venv/bin/python manage.py runserver 127.0.0.1:8014 --noreload

4. Measure cold vs warm requests (2-pass)

Two sequential requests per endpoint were issued with curl, capturing headers and timing:

  • status

  • X-DjangoQueryCount-Count

  • Vary

  • Cache-Control

  • ETag

  • time_total

Example:

curl -sS -o /dev/null -D - -w 'time_total=%{time_total}\n' \
  http://127.0.0.1:8014/api/laws/

5. Use Silk SQL traces for root-cause analysis

For hotspots (especially /case/foo-case), Silk SQL traces were inspected to attribute remaining queries to:

  • view code

  • model methods

  • template-triggered calls

Findings

A. DevConfiguration profiling (with real cache behavior)

Endpoint results (cold -> warm)

Endpoint

Querycount

Time

/api/

2 -> 2

0.173s -> 0.043s

/api/cases/

6 -> 4

0.089s -> 0.042s

/api/laws/

6 -> 2

0.115s -> 0.066s

/api/law_books/

6 -> 2

0.070s -> 0.036s

/

8 -> 2

0.528s -> 0.044s

/case/

6 -> 2

0.109s -> 0.091s

/case/foo-case (before case-detail follow-up optimizations)

12 -> 8

0.091s -> 0.091s

Interpretation

  • Cache behavior is working for API and frontend pages under DevConfiguration.

  • Cached API endpoints show the expected Vary headers (including Authorization, Cookie, Accept-Language, Host).

  • Frontpage and case list show significant query-count reductions on warm requests.

  • Case detail remained the main hotspot after the initial pass.

B. Case Detail (/case/foo-case) Deep Dive and Optimizations

Warm-request SQL (before follow-up case optimizations)

Silk showed 3 application SQL queries on warm anonymous requests:

  1. public annotation markers (item.get_markers(request))

  2. related cases (item.get_related())

  3. references query (item.get_grouped_references() via template include)

This showed that the existing shared case cache only covered:

  • case object

  • reference markers

It did not yet cache template-driven shared data (references, related_cases) or anonymous marker/content rendering.

Optimization 2: Cache anonymous public markers + rendered content

Implemented anonymous-only shared caches for:

  • public markers list

  • annotated rendered content (insert_markers(...) result)

This is safe for anonymous users because marker visibility is public-only.

Final /case/foo-case result (anonymous)

  • cold: 12 querycount, ~0.613s

  • warm: 2 querycount, ~0.049s

Silk warm SQL after this step:

  • 0 application SQL queries

  • Remaining querycount is profiling instrumentation writes (Silk)

C. Cache Correctness Fixes Verified During Profiling

The branch now includes cache correctness protections that were important to verify while profiling:

  • API cache variance includes:

    • Authorization

    • Cookie

    • Accept-Language

    • Host

  • Role-based page cache keys vary by:

    • method

    • host

    • language

    • path

    • role

  • Role cache skips responses that:

    • set cookies

    • vary on cookie

    • are not cache-safe

This avoids cross-user/session and cross-locale/domain cache contamination while keeping aggressive caching enabled.

D. Remaining Bottlenecks / Next Targets

After the case-detail follow-up optimizations:

  • Warm anonymous /case/foo-case is effectively DB-free.

  • Remaining performance work is more likely CPU/template/middleware/compression than DB on warm hits.

  • Best next profiling target:

    • /search/?q=...

    • validate search facet cache hit behavior

    • inspect cache-miss CPU hotspots (facet processing and template rendering)

E. Continued Profiling: Search and Law Pages (same temp DB, DevConfiguration)

This follow-up pass profiled:

  • /search/?q=gg

  • /search/autocomplete?q=gg

  • /law/gg/ (law book page)

  • /law/gg/artikel-1 (law detail page)

using the same profiling DB and the same DevConfiguration methodology as above, with a fresh file cache directory.

Endpoint results (cold -> warm)

Endpoint

Querycount

Time

/search/?q=gg

2 -> 2

0.683s -> 0.042s

/search/autocomplete?q=gg

2 -> 4

0.064s -> 0.036s

/law/gg/

408 -> 2

1.615s -> 0.045s

/law/gg/artikel-1

22 -> 2

0.168s -> 0.034s

Silk request metadata (application SQL focus)

querycount includes profiling/instrumentation overhead, so Silk request metadata and SQL traces were used to measure application SQL:

  • /search/?q=gg

    • cold: meta_num_queries = 1

    • warm: meta_num_queries = 1

    • warm Silk SQL count: 0

  • /search/autocomplete?q=gg

    • cold: meta_num_queries = 1

    • warm: meta_num_queries = 1

    • warm Silk SQL count: 0

  • /law/gg/

    • cold: meta_num_queries = 226

    • warm: meta_num_queries = 1

    • warm Silk SQL count: 0

  • /law/gg/artikel-1

    • cold: meta_num_queries = 14

    • warm: meta_num_queries = 1

    • warm Silk SQL count: 0

Interpretation

  • Search and law pages are all effectively DB-free on warm cache hits (application SQL 0 in Silk traces).

  • /search/?q=gg shows a large cold-to-warm drop (~683ms -> ~42ms), which confirms the role-level page cache is paying off.

  • /law/gg/ is now the standout cold-path hotspot (very high SQL query volume before cache).

Search profiling caveat (important)

In this local profiling environment, Elasticsearch was not reachable and the server logged connection failures for the search requests. The search views still returned 200 via the application’s error-handling/fallback path.

Implication:

  • the search endpoint timings above are still useful for measuring page-cache impact and local fallback rendering behavior

  • they do not represent true indexed-search performance under a healthy Elasticsearch backend

F. New Bottleneck Identified: Cold /law/gg/ Page (LawBook.sections N+1)

Observed pattern

Silk SQL trace for cold /law/gg/ showed:

  • 203 SQL rows recorded

  • 226 application SQL queries (meta_num_queries)

  • 198 duplicate queries of the same form:

SELECT laws_lawbook.id, laws_lawbook.sections FROM laws_lawbook WHERE laws_lawbook.id = ... LIMIT 21

Root cause

This is a template-driven N+1 on the law book page:

  • oldp/apps/laws/templates/laws/book.html loops over items and calls item.get_section (oldp/apps/laws/templates/laws/book.html:18, oldp/apps/laws/templates/laws/book.html:21)

  • Law.get_section() calls self.book.get_sections() (oldp/apps/laws/models.py:378)

  • get_sections() accesses self.sections (oldp/apps/laws/models.py:142)

  • In view_book, the queryset defers book__sections via Law.defer_fields_list_view (oldp/apps/laws/views.py:107, oldp/apps/laws/models.py:285)

Because book__sections is deferred, each item.get_section call triggers a lazy refresh of LawBook.sections, causing repeated identical queries.

G. Smaller Cold-Path Pattern: Duplicate get_revision_dates() Queries on Law Pages

Both cold /law/gg/ and /law/gg/artikel-1 traces show duplicate revision-date queries:

  • SELECT laws_lawbook.revision_date ... WHERE code = ... ORDER BY revision_date DESC (executed twice)

Likely cause:

  • templates call book.get_revision_dates multiple times (iteration + length check), e.g. laws/book.html:43 and laws/book.html:49

  • LawBook.get_revision_dates() returns a queryset each time and does not cache/materialize (oldp/apps/laws/models.py:186)

This is a smaller issue than the book.sections N+1 and mostly affects cold requests, but it is still a clean optimization candidate (cache on instance or materialize once in the view).

H. Implemented Follow-up Fix for Law Pages and Re-Profiling Results

A follow-up patch was applied to remove the template-driven duplicate method calls on law pages without undefering book__sections on every Law row.

Implemented changes

  • view_book(...)

    • materialize revision_dates = list(book.get_revision_dates()) once

    • materialize items list once

    • compute item.display_section from book.get_sections() once in the view (avoid item.get_section() in template loop)

  • view_law(...)

    • materialize revision_dates = list(book.get_revision_dates()) once

    • materialize related_laws = item.get_related() once

  • templates updated to use revision_dates, related_laws, and item.display_section

This removes:

  • the /law/gg/ cold LawBook.sections N+1

  • duplicate get_revision_dates() calls in both law templates

  • duplicate item.get_related() calls in laws/law.html

Re-profiled results (same temp DB, DevConfiguration)

Endpoint

Before (cold -> warm)

After (cold -> warm)

/law/gg/ querycount

408 -> 2

10 -> 2

/law/gg/ time

1.615s -> 0.045s

0.676s -> 0.056s

/law/gg/artikel-1 querycount

22 -> 2

20 -> 2

/law/gg/artikel-1 time

0.168s -> 0.034s

0.154s -> 0.037s

Silk application SQL (post-fix)

  • /law/gg/

    • cold: meta_num_queries = 5 (down from 226)

    • warm: meta_num_queries = 1 (profiling write only)

    • warm Silk SQL count: 0

  • /law/gg/artikel-1

    • cold: meta_num_queries = 11 (down from 14)

    • warm: meta_num_queries = 1 (profiling write only)

    • warm Silk SQL count: 0

Remaining cold law-page queries after the fix

The largest law-book cold-path issue is resolved. Remaining cold queries on /law/gg/ are now small and expected (book lookup, count check, revision dates, law list query).

On /law/gg/artikel-1, the remaining cold queries include:

  • law + book fetches

  • related laws query

  • referencing cases query

  • revision-date query

These are much smaller than the removed book.sections N+1 and mostly only matter on cache misses.

I. Final Consolidated Post-Fix Sweep (API + Frontend + Laws + Search)

After the case-detail and law-page follow-up fixes, a final DevConfiguration profiling sweep was run against a fresh file-cache directory over:

  • /api/

  • /api/cases/

  • /api/laws/

  • /api/law_books/

  • /

  • /case/

  • /case/foo-case

  • /law/gg/

  • /law/gg/artikel-1

  • /search/?q=gg

  • /search/autocomplete?q=gg

Header/timing results (cold -> warm)

Endpoint

Querycount

Time

/api/

2 -> 2

0.162s -> 0.040s

/api/cases/

6 -> 2

0.080s -> 0.036s

/api/laws/

6 -> 2

0.117s -> 0.052s

/api/law_books/

6 -> 2

0.067s -> 0.039s

/

8 -> 2

0.514s -> 0.041s

/case/

6 -> 2

0.158s -> 0.045s

/case/foo-case

14 -> 2

0.118s -> 0.049s

/law/gg/

10 -> 2

0.121s -> 0.038s

/law/gg/artikel-1

20 -> 2

0.204s -> 0.035s

/search/?q=gg

2 -> 2

0.088s -> 0.046s

/search/autocomplete?q=gg

2 -> 4

0.061s -> 0.040s

Silk request metadata summary (application SQL)

For the latest warm requests in this sweep:

  • all endpoints above had warm_sql_count = 0 in Silk

  • meta_num_queries = 1 for most warm endpoints (profiling instrumentation write)

  • querycount header values > 1 on warm responses are instrumentation/middleware overhead, not app SQL

Selected cold meta_num_queries from this final sweep:

  • /api/cases/: 3

  • /api/laws/: 3

  • /api/law_books/: 4

  • /: 5

  • /case/: 4

  • /case/foo-case: 6

  • /law/gg/: 6

  • /law/gg/artikel-1: 10

Practical conclusion

For the anonymous request paths profiled here, the current branch has reached a strong state:

  • warm cache hits are effectively database-free across API and frontend pages tested

  • remaining performance work is mostly cold-path optimization and CPU/template/render work

J. Profiling Still Worth Doing (Not Completed in This Session)

The highest-value remaining profiling work is:

  1. Search with a real Elasticsearch backend available

    • current local search timings reflect fallback/error-handling behavior because Elasticsearch was unreachable

    • true search miss/hit performance (including facet processing under real results) still needs validation

  2. Authenticated and staff variants

    • cache variance/correctness has been fixed, but performance for:

      • authenticated non-staff users

      • staff users (extra annotation/marker queries and controls)

    • has not been profiled end-to-end in this session

  3. CPU profiling on warm pages

    • DB work is mostly eliminated on warm hits

    • next likely costs: template render, marker insertion, middleware stack, compression

    • Silk Python profiler reported environment conflicts earlier (Another profiling tool is already active), so a dedicated CPU-profiler pass may need a cleaner runtime

K. Phase 1 (Conservative) Implementation Update: Anonymous Search/API Speed

Implemented Django-only anonymous optimizations focused on public search endpoints and shared request patterns:

  • Added dispatch caching + safe Vary headers to public search APIs:

    • /api/cases/search/

    • /api/laws/search/

  • Hardened autocomplete cache behavior:

    • normalized query handling (trimmed input)

    • blank-query short-circuit ({"results": []})

    • cache key now varies by host and language

    • cache key normalization is case-insensitive (improves hit rate for anon traffic)

  • Small cold-path optimization for court case list pages:

    • CourtCasesListView now fetches court with select_related("state")

Validation completed in this session

Targeted tests (mocked search backend, no Elasticsearch required):

  • search API cache headers + response caching behavior

  • autocomplete normalization cache hit behavior

  • autocomplete blank-query short-circuit

  • autocomplete cache-key host/language variance

Validation still pending (for this phase)

  • ES-backed profiling of /api/cases/search/ and /api/laws/search/ under a reachable Elasticsearch backend

    • local environment still lacks reachable ES, so only mocked/test validation was performed for the search APIs

Test command used

source .venv/bin/activate
python manage.py test \
  oldp.apps.search.tests.test_api_cache_headers \
  oldp.apps.search.tests.test_views.MockedSearchViewsTestCase.test_autocomplete_blank_query_short_circuits \
  oldp.apps.search.tests.test_views.MockedSearchViewsTestCase.test_autocomplete_cache_key_varies_by_host_and_language \
  oldp.apps.search.tests.test_views.MockedSearchViewsTestCase.test_autocomplete_cache_normalizes_query \
  --verbosity 2

L. Phase 2 (Conservative) Implementation Update: Public Docs/Autocomplete + Small View Wins

Implemented additional low-risk anonymous optimizations that do not require Elasticsearch:

  • Wrapped public API schema/docs routes with cache_per_role(settings.CACHE_TTL):

    • /api/schema.json

    • /api/schema.yaml

    • /api/schema/ (Swagger UI)

    • /api/docs/ (ReDoc)

  • Wrapped courts autocomplete endpoints with cache_per_role(settings.CACHE_TTL):

    • /court/autocomplete/

    • /court/autocomplete/state/

  • Small cold-path query reduction:

    • CourtCasesListView court lookup now uses select_related("state")

  • Homepage anonymous rendering tightened:

    • latest cases query is sliced in the view (only fetch 10 rows)

    • homepage count cache keys now vary by host/language

Phase 2 profiling run (DevConfiguration, temp SQLite, no ES dependency)

Measured endpoints (cold -> warm):

Endpoint

Querycount

Time

/api/schema.json

2 -> 2

1.599s -> 0.055s

/api/schema.yaml

4 -> 2

1.564s -> 0.046s

/api/schema/

2 -> 2

0.057s -> 0.043s

/api/docs/

4 -> 2

0.046s -> 0.044s

/court/autocomplete/?q=ag

4 -> 2

0.045s -> 0.031s

/court/autocomplete/state/?q=ba

6 -> 2

0.047s -> 0.031s

/court/ag-aalen/

6 -> 2

0.427s -> 0.033s

Silk application SQL summary (Phase 2 endpoints)

All warm requests above had:

  • warm_sql_count = 0 (application SQL)

  • meta_num_queries = 1 for most endpoints (profiling instrumentation write)

Selected cold meta_num_queries:

  • /api/schema.json: 1

  • /api/schema.yaml: 1

  • /court/autocomplete/: 2

  • /court/autocomplete/state/: 3

  • /court/ag-aalen/: 4

Important header caveat for drf-yasg schema/docs

Even though internal Django caching now speeds up repeated schema/docs requests, drf-yasg responses still emit cache-preventing headers such as:

  • Cache-Control: max-age=0, no-cache, no-store, must-revalidate, private

Implication:

  • internal server-side caching works (as shown by timing improvements and Silk warm SQL = 0)

  • downstream/browser caching is still effectively disabled for these endpoints

If desired later, a follow-up can adjust schema/docs response headers explicitly (carefully) to enable browser/proxy caching.

Caveats

  • django-querycount numbers include profiling middleware/instrumentation overhead (especially Silk writes), so warm cached requests will not necessarily show 0.

  • Timing is local development timing (SQLite + dev server), useful for relative comparisons and regression detection, not production latency estimates.

Files Changed During This Profiling Work (Relevant to Findings)

  • oldp/settings.py (profiling/querycount runtime activation + querycount config keys)

  • oldp/utils/cache_per_user.py (cache correctness + variance)

  • oldp/apps/search/views.py (facet caching)

  • oldp/apps/laws/api_views.py (LawBookViewSet.defer(...))

  • oldp/apps/cases/api_views.py (API cache vary headers)

  • oldp/api/views.py (API cache vary headers)

  • oldp/apps/cases/views.py (case detail caching improvements)

  • oldp/apps/cases/templates/cases/case.html (avoid template-triggered related query)