# Architecture & Data Flow This section provides a simplified explanation of Leaspy's internal architecture from a code perspective. Even if this guide could seem long and tedious, we simplified the work for over 200 files and dousents of functions in just some modules. You have two ways to read it: taking a look to the simplified versions here, or take a look to the modules you want to have a deeper inderstanding about how works leaspy inside. If you are not going to developpe new features/methods, the overview is enough. Key steps include: 1. **Data Preparation**: Adapting raw data and putting it into the `Data` format. 2. **Model Fitting**: Creating a model (e.g. `model = LogisticModel(...)`) and fitting it (`model.fit(...)`). 3. **Personalization** (Optional): Estimating individual trajectories for each patient (`model.personalize(...)`). This guide offers two levels of depth: a global explanation and a detailed deep-dive. Feel free to explore according to your needs. ## High-Level Overview When you run a method, a lot happens under the hood. Here is a simplified code snippet that runs a `LogisticModel`: ```python from leaspy.io.data import Data from leaspy.models import LogisticModel data = Data.from_dataframe(alzheimer_df) # Data part model = LogisticModel(name="test-model", source_dimension=2) # Model creation model.fit(data, "mcmc_saem", seed=42, n_iter=100,progress_bar=False) # Fitting ``` Inside the `leaspy` library, most of the code you interact with is organized into **modules**, which contain **classes**. A class bundles **methods** (functions attached to the class) and **attributes** (data stored on the object). For example, the `LogisticModel.py` module defines two classes, each providing its own methods and attributes. > If you’re a bit lost, you can check what a [module](https://docs.python.org/3/glossary.html#term-module), [class](https://docs.python.org/3/glossary.html#term-class), [method](https://docs.python.org/3/glossary.html#term-method), and [attribute](https://docs.python.org/3/glossary.html#term-attribute) are. You can visualize the structure of the `logistic` (*inside /src/leaspy/models/logistic.py*) module like this: ```{mermaid} %%{init: {"flowchart": {"rankSpacing": 10, "nodeSpacing": 20}} }%% flowchart TD %% Softer cohesive palette (indigo/lilac/teal/sand) classDef module fill:#EEF2FF,stroke:#4F46E5,stroke-width:2px,color:#1F2A5A,rx:8,ry:8; classDef cls fill:#F3E8FF,stroke:#7C3AED,stroke-width:2px,color:#3B0764,rx:8,ry:8; classDef method fill:#E6FFFB,stroke:#0F766E,stroke-width:1px,color:#134E4A,rx:6,ry:6; classDef attr fill:#FFF7ED,stroke:#C2410C,stroke-width:1px,color:#7C2D12,rx:6,ry:6; %% Micro spacer (nearly zero height) classDef micro fill:transparent,stroke:transparent,color:transparent,font-size:1px; %% Module Subgraph subgraph Module["__Module: logistic(.py)__"] direction TB %% tiny spacer so the subgraph title doesn't get overlapped ModuleMicro["."]:::micro %% Class 1: Mixin (no pad inside -> less empty space) subgraph ClassMixin["__Class: LogisticInitializationMixin__"] direction TB Method1("Method: _compute_initial_values_for_model_parameters"):::method end %% Class 2: Model subgraph ClassModel["__Class: LogisticModel__"] direction TB Attr1("Attribute: name"):::attr Method2("Method: \__init__"):::method Method3("Method: get_variables_specs"):::method Method4("Method: metric"):::method Method5("Method: model_with_sources"):::method %% Force vertical listing inside the class Attr1 --> Method2 --> Method3 --> Method4 --> Method5 end %% Force vertical stacking of the two class subgraphs ClassMixin --> ClassModel %% Anchor the micro node so it actually affects layout ModuleMicro --> ClassMixin end %% Apply Classes Module:::module ClassMixin:::cls ClassModel:::cls %% Hide all layout-forcing edges (keeps it clean) linkStyle default stroke:transparent,stroke-width:0; ``` This example allows us to see how is structured a module, we will go deeper in the other modules that compose the worflow from `leaspy`, specially in the most the most simple scenario: a logistic regression. ## Simplified workflow structure When you create your model and you fit it a lot happens under the hood. For instance `LogisticModel` inherits methods and attributes from other classes in a chain. `LogisticModel` inherits from `RiemanianManifoldModel`, which inherits from other classes, and so on. Here is the inheritance chain for the Logistic model. You can click on the nodes to see the details of each class. The diagram in the "Simplified version" tab highlights the essential modules for a standard logistic regression execution, offering a cleaner starting point. The "Complete version" shows the full inheritance hierarchy. `````{tabs} ````{tab} Simplified version ```{image} ../../_static/images/arch1.png :alt: Simplified Inheritance Diagram :align: center :width: 50% ``` ```` ````{tab} Complete version ```{image} ../../_static/images/arch2.png :alt: Complete Inheritance Diagram :align: center :width: 80% ``` ```` ````` ## Why This Architecture? While you could theoretically write a `LogisticModel` as one massive 5000-line class, Leaspy breaks it down into a **compositional inheritance chain**. Each class in the diagram above adds a specific layer of capability: * **Reusability**: `LinearModel` and `JointModel` reuse 90% of the same code as `LogisticModel` (parameter storage, algorithm compatibility). They only override the final mathematical formulas. * **Extensibility**: If you want to create a model with a different time behavior, you don't start from scratch. You might branch off after `McmcSaemCompatibleModel` and implement your own time reparameterization, while keeping all the algorithm compatibility for free. This "layer" approach means complex features (like Riemannian manifold geometry or MCMC sampling) are implemented once and shared across all models, rather than being copy-pasted into every new model type. You can read the **Simplified Overview** below for a quick summary of how all these pieces fit together, or follow the path in the table of contents to explore each module in depth one by one. ```{dropdown} Simplified Overview :color: primary :icon: info To perform a logistic regression, Leaspy coordinates a stack of specialized modules that transform raw data into a mathematical trajectory. **The Mathematical Core** It begins with the **Observation Model**, which links your noisy measurements to the theoretical curves. Underlying this is the **Logistic Model**, which imposes the specific S-shape of the progression, supported by the **Riemannian Manifold Model** which handles the geometric mixing of multiple biomarkers. The **Time Reparametrized Model** personalizes this process by warping the timeline for each subject. **The Software Infrastructure** Supporting this math is a robust backend. **ModelInterface** and **BaseModel** define the standard blueprint and orchestration logic (like `.fit()`). **StatefulModel** acts as the model's memory, holding the actual values of parameters during execution. Finally, **McmcSaemCompatibleModel** acts as a translator, ensuring the model provides the specific statistics needed by the generic MCMC-SAEM optimization algorithm. ``` If you want more details about a specific module, you can click on its corresponding node in the index. For now, let's start with the base of the inheritance chain: [`ModelInterface`](logistic/ModelInterface.md).