EmbeddingELModel

class mowl.base_models.EmbeddingELModel(dataset, embed_dim, batch_size, extended=True, model_filepath=None, load_normalized=False, device='cpu', learning_rate=0.001, neg_sampling_gcis=None)[source]

Bases: Model

Abstract class for \(\mathcal{EL}\) embedding methods.

Parameters:
  • dataset (mowl.datasets.Dataset) – mOWL dataset to use for training and evaluation.

  • embed_dim (int) – The embedding dimension.

  • batch_size (int) – The batch size to use for training.

  • extended (bool, optional) – If True, the model works with all 7 EL normal forms. This is reflected in the DataLoaders that are generated, and the model must provide 7 loss functions. If False, the model works with 4 normal forms only, merging the 3 extra ones into the normal forms they originate from. Defaults to True.

  • load_normalized (bool, optional) – If True, the ontology is assumed to be normalized and GCIs are extracted directly. Defaults to False.

  • device (str, optional) – The device to use for training. Defaults to “cpu”.

  • neg_sampling_gcis (list of str, optional) – List of GCI names for which negative sampling should be applied during training. If None (default), negative sampling is applied automatically to all GCIs declared in the module’s neg_capable_gcis (i.e. only what the module actually supports). Pass an explicit list to override this — a NotImplementedError is raised at the start of training if any requested GCI is not in neg_capable_gcis. Bot GCIs ("gci0_bot", "gci1_bot", "gci3_bot") are never subject to negative sampling.
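The resolution rule described for neg_sampling_gcis can be sketched in plain Python. This is only an illustration under stated assumptions: resolve_neg_sampling_gcis is a hypothetical helper, not part of mOWL, and silently skipping bot GCIs in an explicit list is an assumption about behavior the text above does not spell out.

```python
# Hypothetical helper; mOWL's internal implementation may differ.
BOT_GCIS = {"gci0_bot", "gci1_bot", "gci3_bot"}  # never negatively sampled


def resolve_neg_sampling_gcis(neg_sampling_gcis, neg_capable_gcis):
    """Return the GCI names that would receive negative samples."""
    capable = [g for g in neg_capable_gcis if g not in BOT_GCIS]
    if neg_sampling_gcis is None:
        # Default: everything the module declares it supports.
        return capable
    # Assumption: bot GCIs in an explicit list are dropped, not rejected.
    requested = [g for g in neg_sampling_gcis if g not in BOT_GCIS]
    unsupported = [g for g in requested if g not in capable]
    if unsupported:
        raise NotImplementedError(
            f"No negative sampling support for: {unsupported}"
        )
    return requested
```

With neg_capable_gcis equal to ["gci2"], passing None yields ["gci2"], while passing ["gci3"] raises NotImplementedError.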

Changed in version 2.0.0: Added the ‘load_normalized’ parameter.

Attributes Summary

class_embeddings

Returns a dictionary with class names as keys and class embeddings as values.

eval_gci_name

The GCI type to use for evaluation (e.g., 'gci0', 'gci1', 'gci2', 'gci3').

evaluation_model

Returns the evaluation model for use with evaluators.

head_entities

individual_embeddings

Returns a dictionary with individual names as keys and individual embeddings as values.

object_property_embeddings

Returns a dictionary with object property names as keys and object property embeddings as values.

tail_entities

testing_dataloaders

Returns the testing dataloaders for each GCI type.

testing_datasets

Returns the testing datasets for each GCI type.

testing_set

training_dataloaders

Returns the training dataloaders for each GCI type.

training_datasets

Returns the training datasets for each GCI type.

training_set

validation_dataloaders

Returns the validation dataloaders for each GCI type.

validation_datasets

Returns the validation datasets for each GCI type.

Methods Summary

add_axioms(*axioms)

This method adds axioms to the dataset contained in the model and reorders the embedding information for each entity accordingly.

compute_loss(pos_scores[, neg_scores])

Compute loss from positive and negative scores.

eval_method(data)

Evaluation method used for scoring.

from_pretrained(model)

This method loads a pretrained model from a file.

generate_negatives(gci_name, gci_dataset)

Generate negative samples for a given GCI type.

get_embeddings()

Get trained embeddings for entities, relations, and individuals.

get_negative_sampling_config()

Returns the active negative sampling configuration.

get_optimizer()

Create and return the optimizer.

get_regularization_loss()

Get regularization loss from the module.

init_module()

load_best_model()

Load the best model from the model filepath.

load_pairwise_eval_data()

score(axiom)

Returns the score of the given axiom.

train(epochs[, validate_every, epoch_callback])

Train the model.

Attributes Documentation

class_embeddings
eval_gci_name

The GCI type to use for evaluation (e.g., ‘gci0’, ‘gci1’, ‘gci2’, ‘gci3’). Must be explicitly set before evaluation.

Return type:

str

evaluation_model

Returns the evaluation model for use with evaluators.

If a custom evaluation model has been set via the setter, it is returned. Otherwise, for EL models, this returns the module, which can be called with (data, gci_name). In that case, eval_gci_name must already be set.

Return type:

torch.nn.Module

Raises:

ValueError – If no custom model is set and eval_gci_name has not been set
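The fallback order described above can be illustrated with a small stand-in class. EvalModelHolder and its attribute _evaluation_model are hypothetical names used only for this sketch; the real property lives on EmbeddingELModel and returns a torch.nn.Module.

```python
class EvalModelHolder:
    """Hypothetical stand-in mirroring the documented fallback logic."""

    def __init__(self, module):
        self.module = module
        self.eval_gci_name = None
        self._evaluation_model = None

    @property
    def evaluation_model(self):
        # A custom model set via the setter takes precedence.
        if self._evaluation_model is not None:
            return self._evaluation_model
        # Falling back to the module requires eval_gci_name.
        if self.eval_gci_name is None:
            raise ValueError(
                "Set eval_gci_name or assign a custom evaluation model first."
            )
        return self.module

    @evaluation_model.setter
    def evaluation_model(self, value):
        self._evaluation_model = value
```

In this sketch, reading evaluation_model before setting either eval_gci_name or a custom model raises ValueError, which matches the documented contract.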

head_entities
individual_embeddings
object_property_embeddings
tail_entities
testing_dataloaders

Returns the testing dataloaders for each GCI type. Each dataloader is an instance of torch.utils.data.DataLoader

Return type:

dict

testing_datasets

Returns the testing datasets for each GCI type. Each dataset is an instance of mowl.datasets.el.ELDataset

Return type:

dict

testing_set
training_dataloaders

Returns the training dataloaders for each GCI type. Each dataloader is an instance of torch.utils.data.DataLoader

Return type:

dict

training_datasets

Returns the training datasets for each GCI type. Each dataset is an instance of mowl.datasets.el.ELDataset

Return type:

dict

training_set
validation_dataloaders

Returns the validation dataloaders for each GCI type. Each dataloader is an instance of torch.utils.data.DataLoader

Return type:

dict

validation_datasets

Returns the validation datasets for each GCI type. Each dataset is an instance of mowl.datasets.el.ELDataset

Return type:

dict

Methods Documentation

add_axioms(*axioms)[source]

This method adds axioms to the dataset contained in the model and reorders the embedding information for each entity accordingly. New entities are initialized with random embeddings.

Parameters:

axioms (org.semanticweb.owlapi.model.OWLAxiom) – Axioms to be added to the dataset.

Added in version 0.2.0.

compute_loss(pos_scores, neg_scores=None)[source]

Compute loss from positive and negative scores.

Override this method to use different loss functions (e.g., MSE loss).

Parameters:
  • pos_scores (torch.Tensor) – Scores for positive samples (should be minimized)

  • neg_scores (torch.Tensor or None) – Scores for negative samples (should be maximized), or None

Returns:

Combined loss value

Return type:

torch.Tensor
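To make the roles of the two arguments concrete, here is a framework-free arithmetic sketch. It is not mOWL's default loss: the hinge form and the margin parameter are assumptions chosen for illustration, and plain Python floats stand in for torch.Tensor values.

```python
def combined_loss(pos_scores, neg_scores=None, margin=2.0):
    """Mean positive score plus a hinge penalty on low negative scores.

    Hypothetical illustration only; `margin` is not a mOWL parameter.
    """
    # Positive scores are minimized directly.
    loss = sum(pos_scores) / len(pos_scores)
    if neg_scores is not None:
        # Negative scores are pushed up until they exceed the margin.
        hinge = [max(0.0, margin - s) for s in neg_scores]
        loss += sum(hinge) / len(hinge)
    return loss
```

An override in a real subclass would perform the equivalent operations on tensors and return a torch.Tensor so that gradients can flow.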

eval_method(data)[source]

Evaluation method used for scoring. Override if needed.

Parameters:

data (torch.Tensor) – Input data for evaluation

Returns:

Evaluation scores

Return type:

torch.Tensor

Raises:

ValueError – If eval_gci_name has not been set

from_pretrained(model)[source]

This method loads a pretrained model from a file.

Parameters:

model (str) – Path to the pretrained model file.

Added in version 0.2.0.

generate_negatives(gci_name, gci_dataset)[source]

Generate negative samples for a given GCI type.

Override this method for custom negative sampling strategies.

Parameters:
  • gci_name (str) – Name of the GCI type (e.g., ‘gci2’)

  • gci_dataset (torch.Tensor) – The dataset containing positive samples

Returns:

Negative samples tensor, or None if no negatives for this GCI type

Return type:

torch.Tensor or None
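One simple strategy that a custom override might follow is to copy each positive sample and replace a single column with an index drawn from a pool. The sketch below shows that idea with plain tuples in place of tensors; corrupt_column is a hypothetical helper, and mOWL's built-in strategy may differ.

```python
import random


def corrupt_column(rows, column, pool, rng=None):
    """Return copies of `rows` with one column replaced by a pool index."""
    rng = rng or random.Random()
    negatives = []
    for row in rows:
        corrupted = list(row)
        # Replace only the chosen column; other columns stay untouched.
        corrupted[column] = rng.choice(pool)
        negatives.append(tuple(corrupted))
    return negatives
```

Note that a randomly drawn index can occasionally reproduce a true positive; whether and how to filter such cases is a design choice left to the override.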

get_embeddings()[source]

Get trained embeddings for entities, relations, and individuals.

Returns:

Tuple of (entity_embeddings, relation_embeddings, individual_embeddings)

Return type:

tuple

get_negative_sampling_config()[source]

Returns the active negative sampling configuration.

When neg_sampling_gcis is None (the default), the configuration is derived automatically from the intersection of _DEFAULT_NEG_SAMPLING_CONFIG and the module’s neg_capable_gcis — so only GCIs that the module genuinely supports are included.

When neg_sampling_gcis is set explicitly, only those GCIs are included. Training will raise NotImplementedError if any of them are absent from neg_capable_gcis.

Override this method to customise which GCI types require negative sampling and how negatives should be generated.

Returns:

Dictionary mapping GCI names to their negative sampling config. Each entry has:

  • 'index_pool': 'classes' or 'individuals' — pool to sample from

  • 'corrupt_column': int — which column of the data tensor to replace

Return type:

dict
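The shape of the returned dictionary, and the difference between the default and explicit cases, can be sketched as follows. The entries in DEFAULT_CONFIG are invented for illustration (the real _DEFAULT_NEG_SAMPLING_CONFIG is not shown on this page), and active_config is a hypothetical helper.

```python
# Invented example values; not the real _DEFAULT_NEG_SAMPLING_CONFIG.
DEFAULT_CONFIG = {
    "gci1": {"index_pool": "classes", "corrupt_column": 2},
    "gci2": {"index_pool": "classes", "corrupt_column": 2},
}


def active_config(default_config, neg_capable_gcis, neg_sampling_gcis=None):
    """Select the config entries that would be active."""
    if neg_sampling_gcis is None:
        # Default: intersection of the defaults and the module's capabilities.
        names = [g for g in default_config if g in neg_capable_gcis]
    else:
        # Explicit list: taken as given; unsupported names fail at training.
        names = list(neg_sampling_gcis)
    return {g: default_config[g] for g in names if g in default_config}
```

In this sketch the explicit list is not filtered against neg_capable_gcis, which mirrors the documented behavior that the mismatch is reported only when training starts.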

get_optimizer()[source]

Create and return the optimizer.

Override this method to use a different optimizer or configuration.

Returns:

Optimizer instance

Return type:

torch.optim.Optimizer

get_regularization_loss()[source]

Get regularization loss from the module.

Override this method if your module has a regularization loss.

Returns:

Regularization loss value

Return type:

torch.Tensor

init_module()[source]
load_best_model()[source]

Load the best model from the model filepath.

load_pairwise_eval_data()[source]
score(axiom)[source]

Returns the score of the given axiom.

Parameters:

axiom (org.semanticweb.owlapi.model.OWLAxiom) – The axiom to score.

Added in version 0.2.0.

train(epochs, validate_every=1, epoch_callback=None)[source]

Train the model.

This is the generic training loop for EL embedding models. Subclasses can customize behavior by overriding:

  • get_negative_sampling_config(): Configure which GCIs need negatives

  • generate_negatives(): Custom negative sampling strategy

  • compute_loss(): Custom loss computation (e.g., MSE loss)

  • get_regularization_loss(): Add regularization

  • get_optimizer(): Use a different optimizer

Parameters:
  • epochs (int) – Number of training epochs

  • validate_every (int, optional) – Validate and log every N epochs. Defaults to 1.

  • epoch_callback (callable, optional) – Optional callable invoked after each epoch as epoch_callback(epoch, model), where epoch is the 0-based epoch index and model is this model instance. Use it to capture snapshots for animation, custom logging, or early stopping. Defaults to None.
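The epoch_callback contract can be demonstrated without training anything. In the sketch below, SnapshotRecorder and fake_train are hypothetical stand-ins: fake_train only imitates the documented call epoch_callback(epoch, model) with a 0-based epoch index and does not reproduce the real loop.

```python
class SnapshotRecorder:
    """Callable that records the epoch index each time it fires."""

    def __init__(self, every=1):
        self.every = every
        self.epochs_seen = []

    def __call__(self, epoch, model):
        # `epoch` is 0-based, as documented for epoch_callback.
        if epoch % self.every == 0:
            self.epochs_seen.append(epoch)


def fake_train(epochs, epoch_callback=None):
    """Hypothetical stand-in for train(); shows only the callback contract."""
    model = object()  # placeholder for the model instance
    for epoch in range(epochs):
        # ... one epoch of training would happen here ...
        if epoch_callback is not None:
            epoch_callback(epoch, model)


recorder = SnapshotRecorder(every=2)
fake_train(5, epoch_callback=recorder)
print(recorder.epochs_seen)  # [0, 2, 4]
```

With a real model, the same recorder object would be passed as the epoch_callback argument of train(); inside __call__ it could then read embeddings from the model argument instead of only storing the epoch index.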