Observation Models#
Modules: leaspy.models.obs_models
While the Logistic Model defines the ideal, noise-free disease trajectory, real-world data is messy. The Observation Model bridges this gap by defining the probability of observing specific data points given the model's prediction:

\[p\big(y \mid y_{model}(t)\big)\]
The Abstract Base: ObservationModel#
This interface describes how data “attaches” to the model, formally defining the Negative Log-Likelihood (NLL) that algorithms minimize. Its main responsibilities are:
- Data Connection: extracts the relevant features from the raw `Dataset`.
- Likelihood Computation: defines the statistical distribution of the residuals (Gaussian, Bernoulli, or Weibull depending on the model).
- Variable Generation: creates the `nll_attach` variables in the computational graph (DAG).
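These three responsibilities can be sketched as a small abstract interface. This is an illustrative sketch only, not the actual Leaspy `ObservationModel` API; the class and method names (`ObservationModelSketch`, `extract`, `nll_attach`) are invented for the example.

```python
# Illustrative sketch (NOT the real Leaspy API) of the three responsibilities
# of an observation model: data connection, likelihood computation, and
# exposing an NLL "attachment" term for the optimizer.
from abc import ABC, abstractmethod

import numpy as np


class ObservationModelSketch(ABC):
    """How observed data 'attaches' to model predictions via an NLL."""

    @abstractmethod
    def extract(self, dataset: dict) -> np.ndarray:
        """Data connection: pull the relevant features out of the raw dataset."""

    @abstractmethod
    def nll_attach(self, y: np.ndarray, prediction: np.ndarray) -> float:
        """Likelihood computation: negative log-likelihood of y given the prediction."""


class GaussianSketch(ObservationModelSketch):
    """Gaussian residuals with a fixed noise level sigma."""

    def __init__(self, sigma: float = 0.1):
        self.sigma = sigma

    def extract(self, dataset):
        # Hypothetical dataset layout: a dict with a "values" entry.
        return np.asarray(dataset["values"], dtype=float)

    def nll_attach(self, y, prediction):
        # Per-observation Gaussian NLL, summed over all observations.
        r = (y - prediction) / self.sigma
        return float(np.sum(0.5 * r**2 + np.log(self.sigma) + 0.5 * np.log(2 * np.pi)))
```

A Bernoulli or Weibull observation model would subclass the same interface and only swap out the distribution inside `nll_attach`.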
The Standard: GaussianObservationModel#
Leaspy primarily uses a Gaussian observation model. It assumes that the observed data is simply the model's prediction plus random noise:

\[y = y_{model}(t) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)\]
Note: The word "Logistic" in `LogisticModel` refers to the shape of the mean prediction \(y_{model}(t)\) (a sigmoid curve). The noise on top of that prediction is always Gaussian. These are two independent choices.
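The independence of the two choices is easy to see in a small simulation: the mean trajectory is a sigmoid in time, and the observations scatter around it with Gaussian noise. The function name `logistic_mean` and the parameter values (`t0`, `rate`, noise level) are illustrative, not Leaspy defaults.

```python
# The two independent choices: the *mean* trajectory is logistic (a sigmoid
# in time), while the *noise* around that mean is Gaussian.
import numpy as np

rng = np.random.default_rng(0)

def logistic_mean(t, t0=70.0, rate=0.2):
    """Noise-free sigmoid trajectory y_model(t) (illustrative parameters)."""
    return 1.0 / (1.0 + np.exp(-rate * (t - t0)))

t = np.linspace(50, 90, 100)                            # ages in years
y_model = logistic_mean(t)                              # logistic shape
y_obs = y_model + rng.normal(0.0, 0.05, size=t.shape)   # Gaussian noise
```

Swapping `logistic_mean` for a linear trajectory would change the mean shape without touching the Gaussian noise assumption, and vice versa.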
Handling Noise (FullGaussianObservationModel)#
The estimation of the noise level \(\sigma\) is integrated directly into the model fitting (in the M-step). We support two different noise structures, regardless of how many features the model has:
- Scalar Noise (Homoscedastic): the model estimates a single global \(\sigma\) shared by all features. This constrains the model to assume that every biomarker has the same noise level.
- Diagonal Noise (Heteroscedastic): the model estimates a distinct \(\sigma_k\) for each feature. This is crucial for multivariate models where some sources of data may be much noisier than others.
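The difference between the two structures shows up directly in the closed-form maximum-likelihood estimate of the noise: a single RMS over all residuals versus one RMS per feature. This is a minimal numpy sketch assuming a residual matrix of shape `(n_observations, n_features)`; it is not the Leaspy M-step code.

```python
# Sketch of the two noise structures on a residual matrix r = y - y_model
# of shape (n_observations, n_features).
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_features = 500, 3
true_sigmas = np.array([0.02, 0.1, 0.3])              # per-feature noise levels
r = rng.normal(0.0, true_sigmas, size=(n_obs, n_features))

# Scalar (homoscedastic): one global sigma pooled over all features
sigma_scalar = np.sqrt(np.mean(r**2))

# Diagonal (heteroscedastic): one sigma per feature (per-column RMS)
sigma_diag = np.sqrt(np.mean(r**2, axis=0))
```

With heterogeneous features like these, the scalar estimate lands somewhere between the cleanest and noisiest feature, over-penalizing the clean one and under-penalizing the noisy one; the diagonal estimate recovers each level separately.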
Why does this matter?
The noise level acts as a natural "weighting" mechanism. The actual NLL that the algorithm minimizes per observation is (from `NormalFamily._nll`):

\[\mathrm{NLL}_k = \frac{1}{2}\left(\frac{y_k - \mu_k}{\sigma_k}\right)^2 + \ln(\sigma_k) + \frac{1}{2}\ln(2\pi)\]
Notice the two competing terms:
- \(\frac{1}{2}\left(\frac{y_k - \mu_k}{\sigma_k}\right)^2\) — blows up as \(\sigma_k \to 0\), so it pushes \(\sigma_k\) up whenever residuals are large.
- \(\ln(\sigma_k)\) — grows with \(\sigma_k\), pushing it down and preventing the model from inflating \(\sigma_k\) to trivially ignore any feature.
This balance is what allows \(\sigma_k\) to be meaningfully estimated during fitting. If the model learns that Feature A is very noisy (large \(\sigma_A\)) and Feature B is clean (small \(\sigma_B\)), it will prioritize fitting Feature B accurately, while being more forgiving of deviations in Feature A.
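The balance between the two terms can be checked numerically: for fixed residuals, the NLL as a function of \(\sigma\) has a unique minimum at exactly the RMS of the residuals (setting the derivative to zero gives \(\sigma^2 = \overline{r^2}\)). A small grid-search sketch:

```python
# Numeric check of the balance: for fixed residuals r, the Gaussian NLL
#   sum( r_i^2 / (2 sigma^2) + ln(sigma) )
# is minimized at sigma = RMS(r). (The constant ln(2*pi)/2 term is dropped
# since it does not depend on sigma.)
import numpy as np

def nll(residuals, sigma):
    return np.sum(0.5 * (residuals / sigma) ** 2 + np.log(sigma))

rng = np.random.default_rng(2)
residuals = rng.normal(0.0, 0.2, size=1000)   # pretend residuals y - mu

sigmas = np.linspace(0.01, 1.0, 2000)
best_sigma = sigmas[np.argmin([nll(residuals, s) for s in sigmas])]
rms = np.sqrt(np.mean(residuals**2))          # closed-form optimum
```

Neither extreme wins: \(\sigma \to 0\) is ruled out by the quadratic term, \(\sigma \to \infty\) by the log term, and the estimated noise settles at the residual scale, which is exactly what makes the per-feature weighting meaningful.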