The interesting part isn't the waveform: what Tepna's deterministic harness suggests about the open frontier in synthetic physiological data

Tepna Project  ·  [corresponding author — to be assigned]

Tepna physiological-signal suite — cross-node perspective

Perspective v1 · June 2026 · No new analysis tool · scope note: parked for later, out of current build scope

Abstract

Single-signal synthetic physiological generation — plausible-looking ECG, PPG, SpO₂, or CGM traces from GANs and diffusion models — is a mature research area; reproducing one channel that looks right is largely a solved problem. Yet the failures that actually break a multi-device analysis suite live almost entirely outside the waveform: in timestamp pathology, in cross-modal temporal coherence, and in the provenance metadata that lets a derived metric be trusted. Tepna's harness already models these as first-class concerns — a frozen Clock Contract, an event-bus currency that reconstructs absolute time across nodes, and a build-hash provenance gate — and does so deterministically, without any learned generator. This note records, while the observation is fresh, four directions where synthetic-data work is still genuinely open, why Tepna is unusually well-positioned to study them, and which would be worth a real paper once the suite's core scope is met. This is a perspective, not a study; it makes no empirical claims.

Keywords: synthetic data · physiological time-series · multimodal fusion · timestamp normalization · data provenance · deterministic generation · benchmarking

1. Why write this down now

The question that prompted this note was simply whether Tepna is interesting for synthetic-data work, or whether the field has moved past it. The honest answer is split, and the split is itself the contribution: the obvious target (better-looking signals) is largely done elsewhere, while the targets Tepna happens to take seriously (time, cross-modal coherence, provenance) remain under-served. That asymmetry is easy to lose once the build resumes, so it is recorded here as a parked agenda rather than pursued — explicitly out of the current scope.

2. What is already solved (and not our frontier)

Generating a single physiological channel that passes visual and short-horizon statistical inspection is well-trodden: adversarial and diffusion models for ECG morphology, PPG synthesis, and CGM trajectory generation are established. If the goal were "produce fake vitals that look real," off-the-shelf approaches suffice and Tepna adds little. We therefore treat per-channel realism as a solved input, not a research question, and direct attention to the layers above it.

3. The open frontier, as the harness exposes it

  1. Timestamp pathology as a modeled phenomenon [open]. Synthetic corpora almost never model how clocks actually misbehave, yet multi-device fusion breaks here, not in the morphology. Tepna's Clock Contract enumerates the failure surface precisely: floating wall-clock time with no zone, DMY/MDY ambiguity resolved per-file, monotonic midnight rollover from time-only stamps, zoned-versus-naive equivalence, and stamp-less rows that must read as null rather than "now". A synthetic generator that reproduces O2Ring vs Welltory vs Polar Sensor Logger stamp quirks would be more valuable for testing real pipelines than any prettier waveform.
  2. Cross-node temporal coherence [open]. The hard problem is not one stream but several that must stay mutually consistent: SpO₂, HRV, raw ECG, and CGM sharing physiology, drift, and anchors so an event on one channel lands at the right instant on the others. Tepna's Ganglior event bus already defines this currency: startEpochMs plus wall-clock event strings reconstructed to absolute time, monotonic past midnight. Correlated multimodal synthesis with shared temporal anchors is substantially harder than single-channel generation and remains largely unaddressed.
  3. Provenance as a data property [open]. Most synthetic datasets ship a tensor and nothing about its trustworthiness. Tepna carries two metadata layers that synthetic work usually omits entirely: a build-hash provenance gate that ties every export to a reproducible build skeleton (the hash fingerprints the bundle template, not the executed code, which is itself an instructive limitation), and a 5-level evidence ladder grading each derived metric by how well-validated it is. Synthesizing data that also carries faithful, gradeable provenance, including realistic degradation of that provenance, is an unexplored direction.
  4. Deterministic generation as a benchmark substrate [tractable now]. Tepna's cohort harness generates stress fixtures deterministically (patient k is a pure function of k), with no model training, so a fixture set is byte-reproducible and auditable. This is the right tool for adversarially probing parsers and fusion logic (malformed stamps, wrapped clocks, mixed locales, dropped channels). The open question is whether a deterministic, fully-specified fixture suite can serve as a shared "where do consumer multi-signal pipelines break" benchmark, complementary to, and more reproducible than, a learned generator.
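
The midnight-rollover and null-stamp cases in items 1 and 2 can be made concrete with a small sketch. This is illustrative JavaScript, not Tepna's Ganglior implementation: the function names reconstructAbsolute and timeOfDayMs are hypothetical, and the sketch assumes that stamps arrive in recording order, that they share the UTC frame of startEpochMs, that the first stamp is at or after the anchor, and that no gap between consecutive stamps reaches 24 hours.

```javascript
// Hypothetical sketch; names and assumptions are illustrative, not Tepna's API.
const DAY_MS = 24 * 60 * 60 * 1000;

// Parse "HH:MM:SS" into milliseconds since midnight; null if absent or malformed.
function timeOfDayMs(stamp) {
  if (typeof stamp !== "string") return null;
  const m = /^(\d{2}):(\d{2}):(\d{2})$/.exec(stamp.trim());
  if (!m) return null;
  const h = Number(m[1]);
  const min = Number(m[2]);
  const s = Number(m[3]);
  if (h > 23 || min > 59 || s > 59) return null;
  return ((h * 60 + min) * 60 + s) * 1000;
}

// Reconstruct absolute epoch ms for time-only stamps, anchored at startEpochMs.
// Assumes ordered stamps, a shared UTC frame, and gaps shorter than 24 h.
function reconstructAbsolute(startEpochMs, stamps) {
  const startOfDay = Math.floor(startEpochMs / DAY_MS) * DAY_MS; // UTC midnight of the anchor day
  let dayOffset = 0;
  let prevTod = startEpochMs - startOfDay;
  return stamps.map((stamp) => {
    const tod = timeOfDayMs(stamp);
    if (tod === null) return null; // a stamp-less row reads as null, never "now"
    if (tod < prevTod) dayOffset += DAY_MS; // time of day went backwards: clock wrapped past midnight
    prevTod = tod;
    return startOfDay + dayOffset + tod;
  });
}

const start = Date.UTC(2026, 5, 1, 23, 59, 0); // 2026-06-01T23:59:00Z
const out = reconstructAbsolute(start, ["23:59:30", "", "00:00:10", "00:01:00"]);
console.log(out.map((t) => (t === null ? null : new Date(t).toISOString())));
```

In this run the empty stamp stays null, and the two stamps after it land on 2 June rather than wrapping back to 1 June, so the reconstructed sequence stays monotonic across midnight.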

4. A rough map of effort vs novelty

Table 1. Where each layer sits. "Maturity" is the field's, not Tepna's; "fit" is how directly the existing harness supports the work.
Layer                              Field maturity   Harness fit   Needs ML?
Single-channel waveform realism    mature           low           yes
Timestamp / clock pathology        open             high          no
Cross-modal temporal coherence     open             high          partly
Provenance & evidence metadata     open             high          no
Deterministic fixture benchmark    emerging         high          no

Three of the four open directions need no learned generator at all — they are specification, generation, and verification problems, which is exactly what the deterministic cohort harness is good at. That keeps any future work cheap to start and reproducible by construction.
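
As a hedged illustration of "reproducible by construction", the sketch below derives every field of a fixture from the patient index alone, using a small seeded generator (mulberry32, a widely circulated public-domain PRNG) instead of Math.random or the system clock. The function makePatient, the field names, and the pathology labels are hypothetical and are not taken from Tepna's cohort harness.

```javascript
// Hypothetical fixture generator; field names and labels are illustrative only.

// mulberry32: small deterministic PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const PATHOLOGIES = [
  "none",
  "naive-wall-clock",
  "dmy-mdy-ambiguous",
  "midnight-rollover",
  "missing-stamp",
];

// Patient k is a pure function of k: no Math.random, no Date.now, no shared state.
function makePatient(k) {
  const rand = mulberry32(k);
  const pathology = PATHOLOGIES[Math.floor(rand() * PATHOLOGIES.length)];
  const startEpochMs = Date.UTC(2026, 0, 1) + Math.floor(rand() * 86400) * 1000;
  const spo2 = Array.from({ length: 5 }, () => 90 + Math.floor(rand() * 10));
  return { id: `patient-${k}`, pathology, startEpochMs, spo2 };
}

// Regenerating the same index yields a byte-identical serialization.
const first = JSON.stringify(makePatient(7));
const second = JSON.stringify(makePatient(7));
console.log(first === second); // true
```

Because each patient depends only on its own index, any single fixture can be regenerated in isolation for an audit without rebuilding the rest of the cohort.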

5. What a real paper would look like (later)

The most defensible first paper is the narrowest: a deterministic timestamp-pathology benchmark. Enumerate the Clock Contract's failure surface, generate a fixture corpus that exercises each case with known ground truth, run the production parsers against it, and report where consumer-style parsers (and, separately, naive new Date(str) baselines) diverge from the contract. It needs no cohort physiology, no model training, and grounds out entirely in code Tepna already ships. The multimodal-coherence and provenance-degradation directions are larger and better deferred until the cross-node Integrator scope is complete.
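
A minimal sketch of that benchmark loop, restricted to the DMY/MDY case, might look like the following. Everything here is an assumption made for illustration: the resolution rule (decide day/month order once per file from any stamp with a field above 12), the "ambiguous" and "conflict" outcomes, the fixture names, and the alwaysMdy baseline are hypothetical and are not taken from the Clock Contract or from Tepna's parsers.

```javascript
// Hypothetical benchmark sketch; the rule and fixtures are illustrative assumptions.

// Decide a file's day/month order from all of its "a/b/yyyy" stamps at once.
// Returns "DMY", "MDY", "ambiguous" (nothing disambiguates), or "conflict".
function resolveDateOrder(stamps) {
  let sawDmy = false;
  let sawMdy = false;
  for (const s of stamps) {
    const m = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(s);
    if (!m) continue; // skip rows that are not slash-separated dates
    const a = Number(m[1]);
    const b = Number(m[2]);
    if (a > 12 && b <= 12) sawDmy = true; // first field can only be a day
    if (b > 12 && a <= 12) sawMdy = true; // second field can only be a day
  }
  if (sawDmy && sawMdy) return "conflict";
  if (sawDmy) return "DMY";
  if (sawMdy) return "MDY";
  return "ambiguous";
}

// Each fixture pairs an input file with known ground truth.
const fixtures = [
  { name: "eu-export", stamps: ["03/04/2026", "13/04/2026"], expected: "DMY" },
  { name: "us-export", stamps: ["03/04/2026", "04/13/2026"], expected: "MDY" },
  { name: "all-low", stamps: ["03/04/2026", "05/06/2026"], expected: "ambiguous" },
];

// Run any resolver over the corpus and list the fixtures where it diverges.
function divergences(resolver, corpus) {
  return corpus
    .filter((f) => resolver(f.stamps) !== f.expected)
    .map((f) => f.name);
}

const alwaysMdy = () => "MDY"; // naive baseline that assumes US order for every file

console.log(divergences(resolveDateOrder, fixtures)); // no divergences
console.log(divergences(alwaysMdy, fixtures)); // diverges on eu-export and all-low
```

The same divergences harness could wrap a production parser or a new Date(str) baseline. The report is then simply the list of fixtures on which each candidate departs from ground truth.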

Scope note. This is a perspective and parking document, not a result. Nothing here is measured; the "open" labels reflect a reading of the field, not a survey. Its only job is to keep the observation — that Tepna's value to synthetic-data research is in time, coherence, and provenance rather than waveforms — from being rediscovered later. Pursue only after the suite's core build scope is fulfilled.

6. Reproducibility

References

  1. Project documentation: CLAUDE.md (Clock Contract, provenance & evidence gates), INTEGRATOR-BUILD-BRIEF.md, COHORT-WORKFLOW-GUIDE.md, Tepna suite.
  2. Survey references on GAN/diffusion synthesis of ECG, PPG, and CGM time-series — to be added if this is promoted to a study.
  3. References on multimodal time-series alignment and on synthetic-data provenance/benchmarking — to be added at submission.

© 2026 Michal Planicka · Tepna v1.0.0 · Apache-2.0 · Asheville, NC · not a medical device