BaseModel#

Module: leaspy.models.base

Inherits from: ModelInterface

While ModelInterface defines the strict contract (the what), BaseModel provides the concrete implementation of the orchestration layer (the how). Every model inherits from BaseModel to gain the built-in infrastructure needed to run optimization algorithms like MCMC-SAEM without rewriting the boilerplate code.

The Bridge Between Model, Algorithm, and Data#

When you write model.fit(data), three components need to work together: the model (which defines the mathematical equations), the algorithm (which optimizes parameters), and the data (observations from patients). BaseModel acts as the bridge, so any algorithm can work with any model type.

Anatomy of fit(): The Three-Step Orchestration#

When you call:

model = LogisticModel(name="test-model", source_dimension=2)
model.fit(data, algorithm="mcmc_saem", n_iter=1000, seed=0)

BaseModel’s fit() method executes three critical steps:

1. Data Standardization#

The first step normalizes your input into a consistent format:

dataset = BaseModel._get_dataset(data)

You might pass a pandas DataFrame, a Leaspy Data object, or a Dataset directly. The algorithm does not care about these differences: it always receives a standardized Dataset object. This abstraction lets algorithms focus on optimization logic rather than data format handling. However, some models, such as JointModels, need additional specifications, so we advise always passing a Data object to fit().
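As a rough illustration of this normalization step, the sketch below funnels several input types into one standardized object. ToyData, ToyDataset, and to_dataset are hypothetical stand-ins invented for this example, with a plain list of rows playing the role of a DataFrame; Leaspy's own _get_dataset may work differently.

```python
from dataclasses import dataclass


@dataclass
class ToyData:
    """Hypothetical stand-in for a user-facing data container."""
    rows: list  # list of (patient_id, time, value) tuples


@dataclass
class ToyDataset:
    """Hypothetical stand-in for the standardized object algorithms consume."""
    rows: list


def to_dataset(data):
    """Return a ToyDataset whatever supported input type was given."""
    if isinstance(data, ToyDataset):
        return data  # already standardized: pass through unchanged
    if isinstance(data, ToyData):
        return ToyDataset(rows=list(data.rows))
    if isinstance(data, list):  # raw rows, playing the role of a DataFrame
        return ToyDataset(rows=list(data))
    raise TypeError(f"Unsupported data type: {type(data).__name__}")


raw_rows = [("p1", 70.0, 0.2), ("p1", 71.5, 0.3)]
from_raw = to_dataset(raw_rows)
from_data = to_dataset(ToyData(rows=raw_rows))
already = ToyDataset(rows=raw_rows)
```

Whatever the caller passes in, downstream code only ever has to handle one type, which is the point of the standardization step.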

2. Model Initialization (First-Time Setup)#

if not self.is_initialized:
    self.initialize(dataset)

On the first call to fit(), BaseModel triggers the initialization process. While BaseModel handles the state flag (is_initialized), the actual calculation of starting parameters is delegated to the specific model class (e.g., LogisticInitializationMixin).

This step ensures:

  • Dimension validation: Verifying data matches model structure.

  • Parameter initialization: Computing heuristics for starting values (implemented by subclasses).

The is_initialized flag ensures this setup happens only once.
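The initialize-once guard can be sketched with a toy class. ToyModel, its mean-based heuristic, and the initialize_calls counter are assumptions made for this example only and do not reflect how Leaspy models compute their starting values.

```python
class ToyModel:
    """Hypothetical model showing the initialize-once guard."""

    def __init__(self, dimension):
        self.dimension = dimension
        self.is_initialized = False
        self.initial_level = None
        self.initialize_calls = 0  # bookkeeping for the demonstration only

    def initialize(self, dataset):
        # Dimension validation: every row must match the model structure.
        for row in dataset:
            if len(row) != self.dimension:
                raise ValueError(
                    f"Expected {self.dimension} values per row, got {len(row)}"
                )
        # Parameter initialization: a simple heuristic (the overall mean).
        values = [value for row in dataset for value in row]
        self.initial_level = sum(values) / len(values)
        self.initialize_calls += 1
        self.is_initialized = True

    def fit(self, dataset):
        if not self.is_initialized:
            self.initialize(dataset)
        # ... the optimization itself would run here ...


model = ToyModel(dimension=2)
model.fit([[0.25, 0.75], [0.5, 0.5]])
model.fit([[0.0, 1.0], [1.0, 0.0]])  # second call skips initialize()
```

After both calls, initialize() has run exactly once, so the starting values still come from the first dataset.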

3. Algorithm Factory and Execution#

algorithm = BaseModel._get_algorithm(algorithm, algorithm_settings, **kwargs)
algorithm.run(self, dataset)

Finally, BaseModel instantiates the requested algorithm (e.g., MCMC-SAEM) and hands over control.

Crucial Point: Once algorithm.run() is called, BaseModel’s job is done. The algorithm takes over the driver’s seat. It will call back into the model to perform specific mathematical operations (like updating parameters or computing likelihoods), but the loop itself belongs to the algorithm. BaseModel defines these methods as abstract interfaces, guaranteeing that any concrete model implementation will provide the operations needed by the algorithm.

See McmcSaemCompatibleModel to understand how the algorithm interacts with the model during the optimization loop.
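The hand-over described above can be sketched as follows. All names here (ToyBaseModel, ToyMeanModel, ToyIterativeAlgorithm, compute_loss, update_parameters) are hypothetical, and the simple averaging loop stands in for a real optimizer; it is not MCMC-SAEM and not Leaspy's actual interface.

```python
from abc import ABC, abstractmethod


class ToyBaseModel(ABC):
    """Hypothetical base class: owns fit() but not the optimization loop."""

    def fit(self, dataset, algorithm):
        algorithm.run(self, dataset)  # control passes to the algorithm here

    @abstractmethod
    def compute_loss(self, dataset):
        """Operation the algorithm relies on; subclasses must provide it."""

    @abstractmethod
    def update_parameters(self, dataset, step_size):
        """Operation the algorithm relies on; subclasses must provide it."""


class ToyMeanModel(ToyBaseModel):
    """Estimates a single mean by moving toward the data average."""

    def __init__(self):
        self.mean = 0.0

    def compute_loss(self, dataset):
        return sum((x - self.mean) ** 2 for x in dataset) / len(dataset)

    def update_parameters(self, dataset, step_size):
        target = sum(dataset) / len(dataset)
        self.mean += step_size * (target - self.mean)


class ToyIterativeAlgorithm:
    """Owns the loop and only talks to the model through its interface."""

    def __init__(self, n_iter, step_size=0.5):
        self.n_iter = n_iter
        self.step_size = step_size
        self.loss_history = []

    def run(self, model, dataset):
        for _ in range(self.n_iter):
            model.update_parameters(dataset, self.step_size)
            self.loss_history.append(model.compute_loss(dataset))


model = ToyMeanModel()
algorithm = ToyIterativeAlgorithm(n_iter=20)
model.fit([1.0, 2.0, 3.0], algorithm)
```

Because the algorithm depends only on the abstract methods, any other subclass of the toy base could be passed to the same loop unchanged.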

Dimension vs Features: Providing the Output Structure#

It is crucial to distinguish between two concepts:

  1. Output Dimension (N): The number of observed variables (e.g., test scores) you want to predict.

  2. Source Dimension (K): The number of independent drivers (latent sources) in the model. This is a separate hyperparameter.

When configuring the model, you set the Output Dimension either directly (dimension) or indirectly (features); source_dimension is passed separately:

# Approach 1: Explicit dimension (names inferred later from data)
model = LogisticModel(name="test-model", dimension=4, source_dimension=2)

# Approach 2: Explicit names (dimension inferred from list length)
model = LogisticModel(name="test-model", features=["memory", "language", "motor", "behavior"], source_dimension=2)

In both cases, we are telling the model: “You will predict 4 outputs.”

  • Approach 1: The model waits until fit(data) to learn that the column names are “memory”, “language”, etc.

  • Approach 2: The model knows the names immediately. This is safer because it raises an error if you accidentally pass a dataset with columns [“A”, “B”, “C”, “D”] instead of the expected [“memory”, …].
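The difference between the two approaches can be made concrete with a small validation sketch. ToyFeatureSpec and its validate method are hypothetical helpers written for this example; they mimic the behavior described above and are not part of Leaspy.

```python
class ToyFeatureSpec:
    """Hypothetical holder for a model's output structure."""

    def __init__(self, dimension=None, features=None):
        if features is not None:
            if dimension is not None and dimension != len(features):
                raise ValueError("dimension does not match len(features)")
            dimension = len(features)  # Approach 2: infer dimension from names
        self.dimension = dimension
        self.features = list(features) if features is not None else None

    def validate(self, columns):
        """Check dataset columns against the declared structure."""
        columns = list(columns)
        if self.dimension is not None and len(columns) != self.dimension:
            raise ValueError(
                f"Expected {self.dimension} columns, got {len(columns)}"
            )
        if self.features is None:
            self.features = columns  # Approach 1: names learned from the data
        elif columns != self.features:
            raise ValueError(
                f"Expected columns {self.features}, got {columns}"
            )


names = ["memory", "language", "motor", "behavior"]

by_dimension = ToyFeatureSpec(dimension=4)
by_dimension.validate(["A", "B", "C", "D"])  # accepted: only the count is known

by_names = ToyFeatureSpec(features=names)
try:
    by_names.validate(["A", "B", "C", "D"])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

With only a dimension, the wrong column names pass silently because the count matches; with explicit names, the same dataset is rejected.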

From Abstract to Concrete: The Inheritance Chain#

BaseModel is abstract — you cannot instantiate it directly. Concrete models like LogisticModel inherit from BaseModel through a chain of intermediate classes, each adding capabilities:

  • BaseModel: Provides fit() orchestration and abstract method contracts

  • StatefulModel: Adds parameter storage and state management

  • McmcSaemCompatibleModel: Implements methods needed specifically for MCMC-SAEM

  • LogisticModel: Implements the logistic sigmoid equation and parameter initialization

Each layer fulfills part of the contract BaseModel established. By the time you reach LogisticModel, all abstract methods have concrete implementations. This allows the algorithm to call methods like compute_individual_trajectory() and receive actual predictions based on the logistic curve formula.
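A compressed version of such a chain can be sketched with Python's abc module. ToyBase, ToyStateful, and ToyLogistic are hypothetical classes, the single-argument compute_individual_trajectory signature is a simplification assumed for this example, and the plain sigmoid used here is not Leaspy's actual logistic parameterization.

```python
from abc import ABC, abstractmethod
import math


class ToyBase(ABC):
    """Top of the chain: declares the contract, cannot be instantiated."""

    @abstractmethod
    def compute_individual_trajectory(self, timepoints):
        ...


class ToyStateful(ToyBase):
    """Adds parameter storage; still abstract."""

    def __init__(self):
        self.parameters = {}


class ToyLogistic(ToyStateful):
    """Concrete leaf: fulfills the remaining abstract method."""

    def __init__(self, midpoint, growth_rate):
        super().__init__()
        self.parameters = {"midpoint": midpoint, "growth_rate": growth_rate}

    def compute_individual_trajectory(self, timepoints):
        t0 = self.parameters["midpoint"]
        rate = self.parameters["growth_rate"]
        return [1.0 / (1.0 + math.exp(-rate * (t - t0))) for t in timepoints]


try:
    ToyBase()
    base_is_abstract = False
except TypeError:
    base_is_abstract = True

model = ToyLogistic(midpoint=70.0, growth_rate=0.5)
trajectory = model.compute_individual_trajectory([60.0, 70.0, 80.0])
```

Only the leaf class can be instantiated, because it is the first point in the chain where every abstract method has an implementation.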