# Development Guide

## 1. Architecture and Boundaries
CoAuthors is a query frontend: it does not build DBLP data or compute coauthorships itself. Responsibilities are split into three layers:
- HTTP/UI layer (`app.py`, `templates/`, `static/`)
  - Renders `GET /`.
  - Exposes local runtime APIs (cache + telemetry).
- Runtime persistence layer (`runtime_store.py`)
  - Stores visits, cache entries, query events, and logs in SQLite.
- Backend integration layer (`static/query_app.js` + `API_BASE_URL`)
  - Forwards coauthor queries to DblpService `POST /api/coauthors/pairs`.
Author resolution, coauthor pair computation, and DB constraints are implemented in CoAuthors/DblpService.
## 2. End-to-End Request Flow

### 2.1 Page Visit Flow
- Browser requests `GET /`.
- `app.py` calls `RuntimeStore.record_page_visit()` to write `page_visits` and increment `visit_count`.
- Server renders `index.html` with:
  - `app_version`
  - `visit_count`
  - `api_base` (from `API_BASE_URL`)
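The visit-recording step can be sketched with the standard `sqlite3` module. This is a minimal sketch, not the actual `runtime_store.py` code: the `runtime_counters (name, value)` and `page_visits (id, visited_at)` schemas, `init_schema`, and the free-function form of `record_page_visit` are all hypothetical (the real method lives on `RuntimeStore`).

```python
import sqlite3
from datetime import datetime, timezone


def init_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical minimal schema; the real runtime_store.py tables may differ.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runtime_counters ("
        "name TEXT PRIMARY KEY, value INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_visits ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, visited_at TEXT NOT NULL)"
    )


def record_page_visit(conn: sqlite3.Connection) -> int:
    """Log one visit and return the updated visit_count (hypothetical signature)."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # one transaction: both writes commit together or not at all
        conn.execute("INSERT INTO page_visits (visited_at) VALUES (?)", (now,))
        # Upsert the counter row: create it at 1, or increment the existing value.
        conn.execute(
            "INSERT INTO runtime_counters (name, value) VALUES ('visit_count', 1) "
            "ON CONFLICT(name) DO UPDATE SET value = value + 1"
        )
    row = conn.execute(
        "SELECT value FROM runtime_counters WHERE name = 'visit_count'"
    ).fetchone()
    return row[0]
```

Keeping the visit row and the counter increment in one transaction means `visit_count` cannot drift from the number of `page_visits` rows if a write fails halfway.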
### 2.2 Coauthor Query Flow (with cache)
- Frontend reads the left/right inputs and splits them into lines (`parseLines`).
- Each author entry is normalized (`sanitizeAuthorEntries`):
  - trim and collapse whitespace;
  - strip organization suffixes (for example `Name || Org`, `Name (Org)`);
  - de-duplicate while preserving first-occurrence order.
- Build the payload (`left`/`right`/`exact_base_match`/`limit_per_pair`/`author_limit`/`year_min`).
- Build the cache key: `pairs:v1:<fnv1a32(json_payload)>`.
- Read the local runtime cache via `POST /api/runtime/cache/get`.
- On a cache hit, render directly; on a miss, call DblpService `POST /api/coauthors/pairs`.
- Asynchronously write a successful remote result to the cache via `POST /api/runtime/cache/put`.
- Report telemetry via `POST /api/runtime/query/event` for both success and failure.
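The normalization rules can be illustrated with a Python port. This is a sketch under assumptions: the real logic is JavaScript in `static/query_app.js`. The port assumes that empty entries are dropped, that de-duplication is an exact, case-sensitive match after normalization, and that only one trailing `(...)` group is removed. `parse_lines` and `sanitize_author_entries` are illustrative names.

```python
import re

_WHITESPACE = re.compile(r"\s+")
_PAREN_SUFFIX = re.compile(r"\s*\([^()]*\)\s*$")  # one trailing "(Org)" group


def parse_lines(text: str) -> list[str]:
    # Split textarea content into raw lines (assumed to mirror parseLines).
    return text.splitlines()


def sanitize_author_entries(entries: list[str]) -> list[str]:
    seen: set[str] = set()
    result: list[str] = []
    for raw in entries:
        # Trim and collapse internal whitespace runs to a single space.
        name = _WHITESPACE.sub(" ", raw).strip()
        # "Name || Org": keep only the text before the first "||".
        name = name.split("||", 1)[0].strip()
        # "Name (Org)": drop the trailing parenthesized group.
        name = _PAREN_SUFFIX.sub("", name).strip()
        # Skip blanks and keep only the first occurrence of each name.
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result
```

For example, the lines `"  Jane   Doe "`, `"Jane Doe || MIT"`, and `"John Roe (Stanford)"` normalize to `["Jane Doe", "John Roe"]`.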
Important: cache write and telemetry reporting are best-effort and non-blocking. Failures are swallowed and do not fail the main query path.
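The best-effort rule can be sketched in Python. In this sketch the cache and telemetry calls are injected as plain callables: `run_pairs_query` and the event field `cache_hit` are hypothetical, while `success`, `duration_ms`, and `error_message` come from the documented schema. The sketch runs the cache write and telemetry inline rather than fire-and-forget as the browser does. It still shows the property that matters: their failures, and a failed cache read, never fail the query, while a DblpService failure is still surfaced.

```python
import time
from typing import Any, Callable, Optional

Json = dict[str, Any]


def _best_effort(fn: Callable[..., Any], *args: Any) -> None:
    """Call fn and swallow any error: observability must not fail the query."""
    try:
        fn(*args)
    except Exception:
        pass


def run_pairs_query(
    payload: Json,
    cache_key: str,
    cache_get: Callable[[str], Optional[Json]],
    cache_put: Callable[[str, Json], None],
    fetch_remote: Callable[[Json], Json],
    report_event: Callable[[Json], None],
) -> Json:
    started = time.monotonic()
    try:
        cached = cache_get(cache_key)
    except Exception:
        cached = None  # degrade: a cache read failure is treated as a miss

    try:
        result = cached if cached is not None else fetch_remote(payload)
    except Exception as exc:
        elapsed_ms = int((time.monotonic() - started) * 1000)
        _best_effort(report_event, {"success": False, "duration_ms": elapsed_ms,
                                    "error_message": str(exc)[:1000]})
        raise  # the backend failure itself still reaches the caller

    if cached is None:
        _best_effort(cache_put, cache_key, result)  # cache only successful remote results
    elapsed_ms = int((time.monotonic() - started) * 1000)
    _best_effort(report_event, {"success": True, "cache_hit": cached is not None,
                                "duration_ms": elapsed_ms})
    return result
```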
## 3. Cache Design (Detailed)

### 3.1 Cache Layers
- L1 (frontend memory state): in-page render state (for example, the selected pair).
- L2 (SQLite persistent cache): `query_cache`, effective across requests and page refreshes.
### 3.2 L2 Key/Value
- Key: `cache_key` (current namespace `pairs:v1:*`).
- Value: full JSON response (`response_json`).
- Hit metadata: `hit_count`, `last_hit_at`.
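The key scheme `pairs:v1:<fnv1a32(json_payload)>` can be reproduced in Python for debugging. FNV-1a 32-bit itself is standard (offset basis `0x811C9DC5`, prime `0x01000193`). The rest of this sketch rests on assumptions that were not checked against `query_app.js`:

- the payload is serialized compactly, like `JSON.stringify`;
- the hash runs over UTF-8 bytes (a JavaScript version hashing UTF-16 code units would agree only for ASCII input);
- the digest is rendered as 8 lowercase hex digits.

`build_cache_key` is a hypothetical helper.

```python
import json

FNV32_OFFSET = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a32(data: bytes) -> int:
    h = FNV32_OFFSET
    for byte in data:
        h ^= byte                          # FNV-1a: XOR first...
        h = (h * FNV32_PRIME) & 0xFFFFFFFF  # ...then multiply, keeping 32 bits
    return h


def build_cache_key(payload: dict, namespace: str = "pairs:v1") -> str:
    # Compact JSON with insertion-ordered keys, approximating JSON.stringify.
    text = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return f"{namespace}:{fnv1a32(text.encode('utf-8')):08x}"
```

Because the hash covers the serialized text, the same logical payload built with a different field order yields a different key. The payload fields should therefore always be assembled in a fixed order.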
### 3.3 Invalidation Behavior
- There is currently no TTL, no LRU eviction, and no capacity limit.
- Writing an existing key overwrites the old value (`ON CONFLICT DO UPDATE`).
- If the underlying DBLP data changes, the cache is not invalidated automatically. You must clear the cache manually or bump the key namespace (for example `pairs:v2`).
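The L2 cache semantics above can be sketched with `sqlite3`, under stated assumptions. The sketch shows an upsert on `cache_key`, hit metadata updated on read, and prefix-based manual clearing. The `updated_at` column, the helper names, and the choice to keep `hit_count` across overwrites are hypothetical. Only the column names from 3.2, the 30 s timeout, and WAL come from this guide.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import Any, Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS query_cache (
    cache_key     TEXT PRIMARY KEY,
    response_json TEXT NOT NULL,
    hit_count     INTEGER NOT NULL DEFAULT 0,
    last_hit_at   TEXT,
    updated_at    TEXT NOT NULL
)
"""


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def connect(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path, timeout=30)  # wait up to 30 s for a locked database
    conn.execute("PRAGMA journal_mode=WAL")   # in-memory databases stay in "memory" mode
    conn.execute(SCHEMA)
    return conn


def cache_put(conn: sqlite3.Connection, key: str, data: dict[str, Any]) -> None:
    with conn:  # same key overwrites the old value; hit metadata is kept
        conn.execute(
            "INSERT INTO query_cache (cache_key, response_json, updated_at) VALUES (?, ?, ?) "
            "ON CONFLICT(cache_key) DO UPDATE SET "
            "response_json = excluded.response_json, updated_at = excluded.updated_at",
            (key, json.dumps(data), _now()),
        )


def cache_get(conn: sqlite3.Connection, key: str) -> Optional[dict[str, Any]]:
    with conn:
        row = conn.execute(
            "SELECT response_json FROM query_cache WHERE cache_key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE query_cache SET hit_count = hit_count + 1, last_hit_at = ? "
            "WHERE cache_key = ?",
            (_now(), key),
        )
    return json.loads(row[0])


def cache_clear(conn: sqlite3.Connection, prefix: str = "") -> int:
    # Manual invalidation: delete every key starting with prefix ("" clears all).
    with conn:
        cur = conn.execute(
            "DELETE FROM query_cache WHERE substr(cache_key, 1, ?) = ?",
            (len(prefix), prefix),
        )
    return cur.rowcount
```

Clearing by prefix (for example `pairs:v1:`) removes stale entries after a namespace bump, which otherwise stay in the table indefinitely because there is no TTL or eviction.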
## 4. Concurrency Model and Overload Behavior (Wait / Reject / Degrade)
| Scenario | Constraint | Behavior when exceeded | Result |
|---|---|---|---|
| Frontend author count | max 50 per side | reject in browser (no request sent) | UI error |
| DblpService author count | max `MAX_ENTRIES_PER_SIDE` (default 50, hard-capped at 50) | immediate reject | `400` |
| `limit_per_pair` | clamped to `[1, MAX_LIMIT]` (default `MAX_LIMIT=200`) | no reject, auto-clamp | `200` |
| `author_limit` | clamped to at most `MAX_AUTHOR_RESOLVE` (default 800) | no reject, auto-clamp | `200` |
| Runtime SQLite write contention | SQLite `timeout=30s` + WAL | wait for lock first | timeout raises an exception (typically `500`) |
| DblpService DB lock contention | `PRAGMA busy_timeout=30000` | wait for lock first | timeout fails (commonly `500`; DB unavailable can be `503`) |
| Pipeline start while running | only one pipeline thread allowed | immediate reject | `409 Pipeline is already running` |
| Pipeline reset while running | reset forbidden while running | immediate reject | `409 Cannot reset while running` |
| Runtime cache read failure | none | degrade to direct backend query | request continues |
| Runtime cache write / telemetry failure | none | ignore failure | request continues |
Summary: concurrent requests are mainly handled by waiting on DB locks, business-limit violations are rejected immediately, and runtime-observability failures (cache and telemetry) are handled by degrading gracefully.
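The split between rejecting and clamping in the table can be sketched in Python. The constants use the documented defaults. `RequestRejected` and `apply_limits` are hypothetical names, not DblpService code. No lower bound is applied to `author_limit` because none is documented.

```python
MAX_ENTRIES_PER_SIDE = 50   # documented default (also the hard cap)
MAX_LIMIT = 200             # documented default
MAX_AUTHOR_RESOLVE = 800    # documented default


class RequestRejected(Exception):
    """Hypothetical error type carrying the HTTP status to return."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(detail)
        self.status = status


def apply_limits(left: list[str], right: list[str],
                 limit_per_pair: int, author_limit: int) -> tuple[int, int]:
    """Reject business-limit violations; clamp tunable limits instead of rejecting."""
    if not left or not right:
        raise RequestRejected(400, "left and right must both be non-empty")
    if max(len(left), len(right)) > MAX_ENTRIES_PER_SIDE:
        raise RequestRejected(400, f"at most {MAX_ENTRIES_PER_SIDE} authors per side")
    # Out-of-range values are clamped silently, so the request still returns 200.
    clamped_limit = max(1, min(limit_per_pair, MAX_LIMIT))
    clamped_authors = min(author_limit, MAX_AUTHOR_RESOLVE)  # only an upper bound is documented
    return clamped_limit, clamped_authors
```

Clamping tuning knobs keeps old or hand-edited clients working. Rejecting oversized author lists early bounds the cost of the expensive pair computation.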
## 5. API Validation and Error Semantics

### 5.1 CoAuthors runtime APIs (`app.py`)
- `POST /api/runtime/cache/get`
  - `key`: length `1..256`
  - invalid input returns `422` via FastAPI/Pydantic
- `POST /api/runtime/cache/put`
  - `key`: length `1..256`
  - `data`: JSON object
  - blank key after trim returns `400`
- `POST /api/runtime/query/event`
  - `left_count`/`right_count`: `0..500`
  - `total_pairs`: `0..250000`
  - `duration_ms`: `0..86400000`
  - `error_message`: max length `1000`
  - out-of-range values return `422`
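The status-code contract above can be expressed in plain Python for use in client-side checks or tests. This sketch deliberately does not reproduce the FastAPI/Pydantic models in `app.py`. It restates the documented bounds and returns the status code each rule implies. Two assumptions are built in: fields absent from an event are optional, and Pydantic's `422` checks run before the handler's blank-key `400` check. The function names are hypothetical.

```python
from typing import Any, Optional

MAX_KEY_LEN = 256
MAX_ERROR_MESSAGE = 1000
# Documented inclusive bounds for POST /api/runtime/query/event.
EVENT_INT_RANGES = {
    "left_count": (0, 500),
    "right_count": (0, 500),
    "total_pairs": (0, 250_000),
    "duration_ms": (0, 86_400_000),
}


def _bad_key(key: Any) -> bool:
    return not isinstance(key, str) or not 1 <= len(key) <= MAX_KEY_LEN


def check_cache_get(key: Any) -> Optional[int]:
    """Return the implied HTTP status, or None if the request is valid."""
    return 422 if _bad_key(key) else None


def check_cache_put(key: Any, data: Any) -> Optional[int]:
    if _bad_key(key) or not isinstance(data, dict):
        return 422  # schema validation fails first
    if not key.strip():
        return 400  # handler-level check: blank key after trim
    return None


def check_query_event(event: dict[str, Any]) -> Optional[int]:
    for field, (low, high) in EVENT_INT_RANGES.items():
        value = event.get(field)
        if value is not None and not (isinstance(value, int) and low <= value <= high):
            return 422
    message = event.get("error_message")
    if message is not None and (not isinstance(message, str) or len(message) > MAX_ERROR_MESSAGE):
        return 422
    return None
```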
### 5.2 DblpService query API
- `POST /api/coauthors/pairs` requires both the `left` and `right` lists to be non-empty; otherwise it returns `400`.
- A missing database file or incomplete schema returns `503`.
## 6. Runtime Tables and Observability
`runtime_store.py` initializes:

- `runtime_counters`
- `page_visits`
- `query_cache`
- `query_events`
- `event_logs`
Recommended KPIs:

- query volume: `query_event_count`
- cache hits: `cache_hit_count`
- cache writes: `cache_write_count`
- cache size: row count of `query_cache`
- error ratio: share of `query_events` rows with `success=0`
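The KPIs above can be read with a few queries. This is a sketch under assumptions: the three counters are presumed to live as `(name, value)` rows in `runtime_counters`, and `query_events` is presumed to have an integer `success` column. Neither layout is confirmed by this guide, and `read_kpis` is a hypothetical helper.

```python
import sqlite3
from typing import Union


def read_kpis(conn: sqlite3.Connection) -> dict[str, Union[int, float]]:
    # Assumed layout: runtime_counters(name TEXT, value INTEGER).
    counters = dict(conn.execute("SELECT name, value FROM runtime_counters").fetchall())
    cache_size = conn.execute("SELECT COUNT(*) FROM query_cache").fetchone()[0]
    # SUM(success = 0) counts failed events; COALESCE covers an empty table.
    total, failed = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(success = 0), 0) FROM query_events"
    ).fetchone()
    return {
        "query_volume": counters.get("query_event_count", 0),
        "cache_hits": counters.get("cache_hit_count", 0),
        "cache_writes": counters.get("cache_write_count", 0),
        "cache_size": cache_size,
        "error_ratio": failed / total if total else 0.0,
    }
```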
## 7. Extension Guidelines
- Keep business/query logic in DblpService; CoAuthors should remain limited to orchestration and presentation.
- Bump the cache key namespace when cache semantics change (for example `pairs:v2`).
- Prefer appending telemetry fields to `query_events.extra_json` over adding columns, to avoid schema churn.
- For any new high-cost feature, define and document its overload strategy explicitly: wait, reject, or degrade.
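Adding a telemetry field through `extra_json` needs no migration, as this sketch shows. The `query_events (success, duration_ms, extra_json)` column set and `record_query_event` are assumptions for illustration. When SQLite's JSON functions are available, `json_extract(extra_json, '$.field')` can query such fields later.

```python
import json
import sqlite3
from typing import Any, Optional


def record_query_event(conn: sqlite3.Connection, success: bool, duration_ms: int,
                       extra: Optional[dict[str, Any]] = None) -> int:
    """Insert one event; new ad-hoc fields go into extra_json instead of new columns."""
    with conn:
        cur = conn.execute(
            "INSERT INTO query_events (success, duration_ms, extra_json) VALUES (?, ?, ?)",
            (int(success), duration_ms, json.dumps(extra or {}, sort_keys=True)),
        )
    return cur.lastrowid
```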