# Observation Models

**Modules:** `leaspy.models.obs_models`

While the **Logistic Model** defines the ideal, noise-free disease trajectory, real-world data is messy. The **Observation Model** bridges this gap by defining the probability of observing specific data points given the model's prediction:

$$
P(y_{observed} \mid y_{model})
$$

## The Abstract Base: `ObservationModel`

This interface describes how data "attaches" to the model, formally defining the **Negative Log-Likelihood (NLL)** that the algorithms minimize. Its main responsibilities are:

1. **Data Connection**: Extracts the relevant features from the raw `Dataset`.
2. **Likelihood Computation**: Defines the statistical distribution of the residuals (Gaussian, Bernoulli, or Weibull, depending on the model).
3. **Variable Generation**: Creates the `nll_attach` variables in the computational graph (DAG).

## The Standard: `GaussianObservationModel`

Leaspy primarily uses a Gaussian observation model. It assumes that the observed data is simply the model's prediction plus random noise:

$$
y_{observed} = y_{model}(t) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2)
$$

> **Note**: The word "Logistic" in `LogisticModel` refers to the *shape of the mean prediction* $y_{model}(t)$ (a sigmoid curve). The noise on top of that prediction is always Gaussian. These are two independent choices.

### Handling Noise (`FullGaussianObservationModel`)

The estimation of the noise level $\sigma$ is integrated directly into model fitting (in the M-step). Two noise structures are supported, regardless of how many features the model has:

* **Scalar Noise (Homoscedastic)**: The model estimates a **single global $\sigma$** shared by all features. This constrains the model to assume that every biomarker has the same noise level.
* **Diagonal Noise (Heteroscedastic)**: The model estimates a **distinct $\sigma_k$ for each feature**.
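The per-observation Gaussian NLL under these two noise structures can be sketched with plain NumPy. This is a minimal illustration, not Leaspy's actual API: the `gaussian_nll` helper and all the numbers below are hypothetical, and NumPy broadcasting stands in for the scalar-vs-diagonal choice.

```python
import numpy as np

def gaussian_nll(y_obs, y_model, sigma):
    """Per-observation Gaussian NLL.

    sigma may be a scalar (homoscedastic: one global noise level)
    or a length-K vector (heteroscedastic: one sigma_k per feature);
    broadcasting handles both cases identically.
    """
    residual = (y_obs - y_model) / sigma
    return 0.5 * residual**2 + np.log(sigma) + 0.5 * np.log(2 * np.pi)

# One visit with two features (illustrative values only).
y_obs = np.array([0.60, 0.42])
y_model = np.array([0.50, 0.40])

# Scalar noise: a single sigma shared by both features.
nll_scalar = gaussian_nll(y_obs, y_model, 0.05)

# Diagonal noise: a distinct sigma_k per feature
# (feature 0 assumed noisy, feature 1 assumed clean).
nll_diag = gaussian_nll(y_obs, y_model, np.array([0.10, 0.02]))
```

With diagonal noise, the same raw residual of 0.10 on the noisy feature contributes a much smaller squared-error term than it would under the clean feature's small $\sigma_k$, which is exactly the per-feature weighting described below.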
This is crucial for multivariate models, where some sources of data might be much noisier than others.

**Why does this matter?** The noise level acts as a natural "weighting" mechanism. The actual NLL that the algorithm minimizes per observation is (from `NormalFamily._nll`):

$$
\text{NLL}_k = \frac{1}{2}\left(\frac{y_k - \mu_k}{\sigma_k}\right)^2 + \ln(\sigma_k) + \frac{1}{2}\ln(2\pi)
$$

Notice the **two competing terms**:

* $\frac{1}{2}\left(\frac{y_k - \mu_k}{\sigma_k}\right)^2$ pushes $\sigma_k$ **up**: inflating the noise level shrinks the penalty paid for large residuals.
* $\ln(\sigma_k)$ pushes $\sigma_k$ **down**, preventing it from growing without bound and being used to trivially "explain away" any feature.

This balance is what allows $\sigma_k$ to be meaningfully estimated during fitting: the two terms cancel exactly when $\sigma_k^2$ equals the mean squared residual. If the model learns that Feature A is very noisy (large $\sigma_A$) and Feature B is clean (small $\sigma_B$), it will prioritize fitting Feature B accurately, while being more forgiving of deviations in Feature A.
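The balance between the two terms can be checked numerically with a standalone NumPy sketch (not Leaspy code; the helper name and the simulated residuals are hypothetical): scanning the total NLL over candidate $\sigma$ values, the minimum lands at the root-mean-square residual, which is the closed-form maximum-likelihood estimate of $\sigma$.

```python
import numpy as np

def total_nll(residuals, sigma):
    # Sum over observations of the per-observation Gaussian NLL:
    # 0.5 * (r / sigma)^2 + ln(sigma) + 0.5 * ln(2 * pi).
    return np.sum(0.5 * (residuals / sigma) ** 2
                  + np.log(sigma)
                  + 0.5 * np.log(2 * np.pi))

# Simulated residuals with a "true" noise level of 0.3.
rng = np.random.default_rng(0)
residuals = rng.normal(0.0, 0.3, size=1000)

# Scan candidate sigmas: for small sigma the squared-error term
# dominates (too harsh on residuals); for large sigma the log term
# dominates (noise inflated to ignore the data). The minimum sits
# where they balance.
sigmas = np.linspace(0.05, 1.0, 500)
nlls = np.array([total_nll(residuals, s) for s in sigmas])
best_sigma = sigmas[np.argmin(nlls)]

# Closed-form MLE: sigma^2 = mean squared residual.
rms = np.sqrt(np.mean(residuals**2))
```

The grid minimum and the closed-form RMS agree (up to the grid spacing), which is why the M-step can update each $\sigma_k$ directly from the current residuals.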