# BaseModel **Module:** `leaspy.models.base` **Inherits from:** [`ModelInterface`](ModelInterface.md) While [`ModelInterface`](ModelInterface.md) defines the strict contract (the *what*), `BaseModel` provides the concrete implementation of the orchestration layer (the *how*). Every model inherits from BaseModel to gain the built-in infrastructure needed to run optimization algorithms like MCMC-SAEM without rewriting the boilerplate code. ## The Bridge Between Model, Algorithm, and Data When you write `model.fit(data)`, three components need to work together: the **model** (which defines the mathematical equations), the **algorithm** (which optimizes parameters), and the **data** (observations from patients). BaseModel acts as the bridge, so any algorithm can work with any model type. ## Anatomy of fit(): The Three-Step Orchestration When you call: ```python model = LogisticModel(name="test-model", source_dimension=2) model.fit(data, algorithm="mcmc_saem", n_iter=1000, seed=0) ``` BaseModel's `fit()` method executes three critical steps: ### 1. Data Standardization The first step normalizes your input into a consistent format: ```python dataset = BaseModel._get_dataset(data) ``` You might pass a pandas DataFrame, a Leaspy `Data` object, or a `Dataset` directly. The algorithm doesn't care about these differences — it always receives a standardized `Dataset` object. This abstraction allows algorithms to focus on optimization logic rather than data format handling. However models like `JointModels` need some specifications, so we advice to always give a `Data` object to your `fit()`. ### 2. Model Initialization (First-Time Setup) ```python if not self.is_initialized: self.initialize(dataset) ``` On the first call to `fit()`, `BaseModel` triggers the initialization process. While `BaseModel` handles the state flag (`is_initialized`), the actual calculation of starting parameters is **delegated** to the specific model class (e.g., [`LogisticInitializationMixin`](LogisticInitializationMixin.md)). This step ensures: - **Dimension validation**: Verifying data matches model structure. - **Paramater Initialization**: Computing heuristics for starting values (implemented by subclasses). The `is_initialized` flag ensures this setup happens only once. ### 3. Algorithm Factory and Execution ```python algorithm = BaseModel._get_algorithm(algorithm, algorithm_settings, **kwargs) algorithm.run(self, dataset) ``` Finally, `BaseModel` instantiates the requested algorithm (e.g., MCMC-SAEM) and hands over control. **Crucial Point**: Once `algorithm.run()` is called, **BaseModel's job is done**. The algorithm takes over the driver's seat. It will call back into the model to perform specific mathematical operations (like updating parameters or computing likelihoods), but the *loop itself* belongs to the algorithm. BaseModel defines these methods as abstract interfaces, guaranteeing that any concrete model implementation will provide the operations needed by the algorithm. See [`McmcSaemCompatibleModel`](McmcSaemCompatibleModel.md) to understand how the algorithm interacts with the model during the optimization loop. ## Dimension vs Features: Providing the Output Structure It is crucial to distinguish between two concepts: 1. **Output Dimension (N)**: The number of observed variables (e.g., test scores) you want to predict. 2. **Source Dimension (K)**: The number of independent drivers (latent sources) in the model. *This is a separate hyperparameter.* When configuring the model, you are setting the **Output Dimension**: ```python # Approach 1: Explicit dimension (names inferred later from data) model = LogisticModel(name="test-model", dimension=4, source_dimension=2) # Approach 2: Explicit names (dimension inferred from list length) model = LogisticModel(name="test-model", features=["memory", "language", "motor", "behavior"], source_dimension=2) ``` In both cases, we are telling the model: *"You will predict 4 outputs."* * **Approach 1**: The model waits until `fit(data)` to learn that the column names are "memory", "language", etc. * **Approach 2**: The model knows the names immediately. This is safer because it will throw an error if you accidentally pass a dataset with columns ["A", "B", "C", "D"] instead of the expected ["memory", ...]. ## From Abstract to Concrete: The Inheritance Chain BaseModel is abstract — you cannot instantiate it directly. Concrete models like LogisticModel inherit from BaseModel through a chain of intermediate classes, each adding capabilities: - **BaseModel**: Provides `fit()` orchestration and abstract method contracts - **StatefulModel**: Adds parameter storage and state management - **McmcSaemCompatibleModel**: Implements methods needed specifically for MCMC-SAEM - **LogisticModel**: Implements the logistic sigmoid equation and parameter initialization Each layer fulfills part of the contract BaseModel established. By the time you reach LogisticModel, all abstract methods have concrete implementations. This allows the algorithm to call methods like `compute_individual_trajectory()` and receive actual predictions based on the logistic curve formula.