Architecture & Data Flow

This section gives a simplified explanation of Leaspy’s internal architecture from a code perspective. The guide may look long, but it condenses more than 200 files and dozens of functions into a handful of modules. You can read it in two ways: skim the simplified versions here, or go through the individual modules you want to understand in depth. If you do not plan to develop new features or methods, the overview is enough.

Key steps include:

  1. Data Preparation: Adapting raw data and putting it into the Data format.

  2. Model Fitting: Creating a model (e.g. model = LogisticModel(...)) and fitting it (model.fit(...)).

  3. Personalization (Optional): Estimating individual trajectories for each patient (model.personalize(...)).
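As a rough illustration of the first step, the sketch below groups long-format rows (one row per visit) by patient, which is conceptually what putting raw data into a per-individual format amounts to. It uses only the standard library and is not Leaspy's implementation: the column names ID and TIME follow Leaspy's input convention, while group_visits, the score feature, and the sample values are made up for the example.

```python
from collections import defaultdict

# Made-up long-format rows: one row per (patient, visit).
rows = [
    {"ID": "P1", "TIME": 71.5, "score": 0.35},
    {"ID": "P1", "TIME": 70.0, "score": 0.20},
    {"ID": "P2", "TIME": 65.2, "score": 0.10},
]


def group_visits(rows):
    """Group visits by patient ID and sort them by time (hypothetical helper)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["ID"]].append((row["TIME"], row["score"]))
    # Sorting tuples orders each patient's visits chronologically.
    return {patient: sorted(visits) for patient, visits in grouped.items()}


visits_by_patient = group_visits(rows)
```

Each patient then maps to a chronological list of (time, value) pairs, which is the kind of per-individual structure a fitting step can iterate over.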

This guide offers two levels of depth: a global explanation and a detailed deep-dive. Feel free to explore according to your needs.

High-Level Overview

When you run a method, a lot happens under the hood. Here is a simplified code snippet that runs a LogisticModel:

from leaspy.io.data import Data
from leaspy.models import LogisticModel

# alzheimer_df is assumed to be a pandas DataFrame of longitudinal data loaded beforehand
data = Data.from_dataframe(alzheimer_df)  # Data part

# Model creation
model = LogisticModel(name="test-model", source_dimension=2)

# Fitting
model.fit(data, "mcmc_saem", seed=42, n_iter=100, progress_bar=False)

Inside the leaspy library, most of the code you interact with is organized into modules, which contain classes. A class bundles methods (functions attached to the class) and attributes (data stored on the object). For example, the logistic.py module defines two classes, each providing its own methods and attributes.
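The vocabulary above can be made concrete with a few lines of plain Python. This is a minimal sketch: ToyModel and describe are hypothetical names chosen for the example and do not exist in Leaspy.

```python
class ToyModel:
    """Hypothetical class used only to illustrate the vocabulary."""

    def __init__(self, name):
        self.name = name  # attribute: data stored on the object

    def describe(self):  # method: function attached to the class
        return f"model named {self.name}"


model = ToyModel(name="test-model")

# Attributes live on the instance, methods live on the class.
attributes = vars(model)
methods = [
    attr_name
    for attr_name, value in vars(ToyModel).items()
    if callable(value) and not attr_name.startswith("__")
]
```

Here attributes holds the single entry name, and methods lists describe. The same kind of introspection can be applied to any class of a library to see what it stores and what it can do.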

If you’re a bit lost, you can check what a module, class, method, and attribute are.

You can visualize the structure of the logistic module (/src/leaspy/models/logistic.py) like this:

        %%{init: {"flowchart": {"rankSpacing": 10, "nodeSpacing": 20}} }%%
flowchart TD
    %% Softer cohesive palette (indigo/lilac/teal/sand)
    classDef module fill:#EEF2FF,stroke:#4F46E5,stroke-width:2px,color:#1F2A5A,rx:8,ry:8;
    classDef cls    fill:#F3E8FF,stroke:#7C3AED,stroke-width:2px,color:#3B0764,rx:8,ry:8;
    classDef method fill:#E6FFFB,stroke:#0F766E,stroke-width:1px,color:#134E4A,rx:6,ry:6;
    classDef attr   fill:#FFF7ED,stroke:#C2410C,stroke-width:1px,color:#7C2D12,rx:6,ry:6;

    %% Micro spacer (nearly zero height)
    classDef micro fill:transparent,stroke:transparent,color:transparent,font-size:1px;

    %% Module Subgraph
    subgraph Module["__Module: logistic(.py)__"]
        direction TB

        %% tiny spacer so the subgraph title doesn't get overlapped
        ModuleMicro["."]:::micro

        %% Class 1: Mixin (no pad inside -> less empty space)
        subgraph ClassMixin["__Class: LogisticInitializationMixin__"]
            direction TB
            Method1("Method: _compute_initial_values_for_model_parameters"):::method
        end

        %% Class 2: Model
        subgraph ClassModel["__Class: LogisticModel__"]
            direction TB
            Attr1("Attribute: name"):::attr
            Method2("Method: \__init__"):::method
            Method3("Method: get_variables_specs"):::method
            Method4("Method: metric"):::method
            Method5("Method: model_with_sources"):::method

            %% Force vertical listing inside the class
            Attr1 --> Method2 --> Method3 --> Method4 --> Method5
        end

        %% Force vertical stacking of the two class subgraphs
        ClassMixin --> ClassModel

        %% Anchor the micro node so it actually affects layout
        ModuleMicro --> ClassMixin
    end

    %% Apply Classes
    Module:::module
    ClassMixin:::cls
    ClassModel:::cls

    %% Hide all layout-forcing edges (keeps it clean)
    linkStyle default stroke:transparent,stroke-width:0;
    

This example shows how a module is structured. We will go deeper into the other modules that make up the Leaspy workflow, focusing on the simplest scenario: a logistic regression.

Simplified workflow structure

When you create a model and fit it, a lot happens under the hood. For instance, LogisticModel inherits methods and attributes from a chain of classes: it inherits from RiemanianManifoldModel, which in turn inherits from other classes, and so on.
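Python exposes such a chain through the __mro__ attribute of a class (its method resolution order). The sketch below uses empty stand-in classes that borrow names from this guide. The chain is deliberately shortened, and the order of the intermediate classes is an assumption for illustration, not a copy of Leaspy's source.

```python
# Empty stand-ins: the names come from this guide, the bodies are not Leaspy's.
class ModelInterface:
    pass


class BaseModel(ModelInterface):
    pass


class StatefulModel(BaseModel):
    pass


class RiemanianManifoldModel(StatefulModel):
    pass


class LogisticModel(RiemanianManifoldModel):
    pass


# __mro__ lists the class itself first, then each ancestor, ending with object.
chain = [cls.__name__ for cls in LogisticModel.__mro__]
print(" -> ".join(chain))
```

With Leaspy installed, the same __mro__ lookup on the real LogisticModel should list its actual ancestors.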

Here is the inheritance chain for the Logistic model. You can click on the nodes to see the details of each class. The diagram in the “Simplified version” tab highlights the essential modules for a standard logistic regression execution, offering a cleaner starting point. The “Complete version” shows the full inheritance hierarchy.

Why This Architecture?

While you could theoretically write a LogisticModel as one massive 5000-line class, Leaspy breaks it down into a compositional inheritance chain. Each class in the diagram above adds a specific layer of capability:

  • Reusability: LinearModel and JointModel reuse 90% of the same code as LogisticModel (parameter storage, algorithm compatibility). They only override the final mathematical formulas.

  • Extensibility: If you want to create a model with a different time behavior, you don’t start from scratch. You might branch off after McmcSaemCompatibleModel and implement your own time reparameterization, while keeping all the algorithm compatibility for free.

This “layer” approach means complex features (like Riemannian manifold geometry or MCMC sampling) are implemented once and shared across all models, rather than being copy-pasted into every new model type.
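The layering idea can be sketched in a few lines: a shared base class implements storage and evaluation once, and each concrete model overrides only its formula. All names here (SharedBase, ToyLogistic, ToyLinear, curve, trajectory) and the parameter values are hypothetical and far simpler than Leaspy's classes.

```python
import math


class SharedBase:
    """Hypothetical shared layer: parameter storage and trajectory evaluation."""

    def __init__(self, **parameters):
        self.parameters = dict(parameters)  # written once, reused by every model

    def curve(self, t):
        # Each concrete model supplies its own formula.
        raise NotImplementedError

    def trajectory(self, times):
        # Shared logic: identical for every model built on this layer.
        return [self.curve(t) for t in times]


class ToyLogistic(SharedBase):
    def curve(self, t):
        p = self.parameters
        return 1.0 / (1.0 + math.exp(-p["rate"] * (t - p["midpoint"])))


class ToyLinear(SharedBase):
    def curve(self, t):
        p = self.parameters
        return p["intercept"] + p["rate"] * t


logistic_values = ToyLogistic(rate=1.0, midpoint=70.0).trajectory([70.0])
linear_values = ToyLinear(intercept=0.5, rate=0.1).trajectory([0.0, 1.0])
```

Adding a third model in this toy setup means writing one curve method; the storage and trajectory code comes for free from the base layer.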

You can read the Simplified Overview below for a quick summary of how all these pieces fit together, or follow the path in the table of contents to explore each module in depth one by one.

Simplified Overview

To perform a logistic regression, Leaspy coordinates a stack of specialized modules that transform raw data into a mathematical trajectory.

The Mathematical Core: It begins with the Observation Model, which links your noisy measurements to the theoretical curves. Underlying this is the Logistic Model, which imposes the specific S-shape of the progression, supported by the Riemannian Manifold Model, which handles the geometric mixing of multiple biomarkers. The Time Reparametrized Model personalizes this process by warping the timeline for each subject.
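A single-biomarker sketch can show how these pieces compose. It rests on simplifying assumptions: the time warp is taken as exp(xi) * (t - tau), the curve is a standard logistic, and the observation noise is Gaussian. The function names and parameter values are hypothetical, and the multi-biomarker mixing handled by the Riemannian Manifold Model is left out entirely.

```python
import math


def reparametrized_time(t, tau, xi):
    """Warp an individual's timeline: shift by tau, speed up by exp(xi) (assumed form)."""
    return math.exp(xi) * (t - tau)


def logistic_curve(s):
    """S-shaped curve in (0, 1), equal to 0.5 at s = 0."""
    return 1.0 / (1.0 + math.exp(-s))


def individual_value(t, tau, xi):
    """Theoretical value for one subject at age t."""
    return logistic_curve(reparametrized_time(t, tau, xi))


def gaussian_log_likelihood(observed, predicted, noise_std):
    """Observation model sketch: score a noisy measurement against the curve."""
    variance = noise_std ** 2
    return -0.5 * math.log(2.0 * math.pi * variance) - (observed - predicted) ** 2 / (2.0 * variance)


# Two hypothetical subjects with the same onset time but different speeds.
slow = individual_value(t=72.0, tau=70.0, xi=-0.5)
fast = individual_value(t=72.0, tau=70.0, xi=0.5)
```

In this toy version, both subjects cross 0.5 at t = tau, and the subject with the larger xi is further along the curve at any later age.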

The Software Infrastructure: Supporting this math is a robust backend. ModelInterface and BaseModel define the standard blueprint and orchestration logic (like .fit()). StatefulModel acts as the model’s memory, holding the actual values of parameters during execution. Finally, McmcSaemCompatibleModel acts as a translator, ensuring the model provides the specific statistics needed by the generic MCMC-SAEM optimization algorithm.
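The division of roles can be sketched with toy classes. Everything below is hypothetical: the class and method names are invented for the example, and toy_algorithm simply computes a mean. It stands in for a generic algorithm and is not MCMC-SAEM.

```python
from abc import ABC, abstractmethod


class ToyInterface(ABC):
    """Blueprint: what every model must expose."""

    @abstractmethod
    def fit(self, data):
        ...


class ToyBase(ToyInterface):
    """Orchestration: fit() delegates the numerical work to an algorithm."""

    def fit(self, data):
        toy_algorithm(self, data)
        return self


class ToyStateful(ToyBase):
    """Memory: holds the current parameter values."""

    def __init__(self):
        self.state = {"mean": 0.0}


class ToyAlgorithmCompatible(ToyStateful):
    """Translator: exposes the statistics the generic algorithm asks for."""

    def sufficient_statistics(self, data):
        return {"sum": sum(data), "count": len(data)}

    def update_parameters(self, stats):
        self.state["mean"] = stats["sum"] / stats["count"]


def toy_algorithm(model, data):
    """Generic routine: relies only on the two translator methods."""
    stats = model.sufficient_statistics(data)
    model.update_parameters(stats)


model = ToyAlgorithmCompatible().fit([1.0, 2.0, 3.0])
```

The algorithm never looks inside the model: it only asks for statistics and hands them back. That separation is what lets one generic routine serve several model types.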

If you want more details about a specific module, you can click on its corresponding node in the index. For now, let’s start with the base of the inheritance chain: ModelInterface.