EmbeddingELModel
- class mowl.base_models.EmbeddingELModel(dataset, embed_dim, batch_size, extended=True, model_filepath=None, load_normalized=False, device='cpu', learning_rate=0.001, neg_sampling_gcis=None)[source]
Bases: Model
Abstract class for \(\mathcal{EL}\) embedding methods.
- Parameters:
dataset (mowl.datasets.Dataset) – mOWL dataset to use for training and evaluation.
embed_dim (int) – The embedding dimension.
batch_size (int) – The batch size to use for training.
extended (bool, optional) – If True, the model works with 7 EL normal forms. This is reflected in the DataLoaders that are generated, and the model must define 7 loss functions. If False, the model works with 4 normal forms only, merging the 3 extra forms into their corresponding origin normal forms. Defaults to True.
load_normalized (bool, optional) – If True, the ontology is assumed to be normalized and GCIs are extracted directly. Defaults to False.
device (str, optional) – The device to use for training. Defaults to “cpu”.
neg_sampling_gcis (list of str, optional) – List of GCI names for which negative sampling should be applied during training. If None (default), negative sampling is applied automatically to all GCIs declared in the module’s neg_capable_gcis (i.e. only what the module actually supports). Pass an explicit list to override this; a NotImplementedError is raised at the start of training if any requested GCI is not in neg_capable_gcis. Bot GCIs ("gci0_bot", "gci1_bot", "gci3_bot") are never subject to negative sampling.
Changed in version 2.0.0: Added the ‘load_normalized’ parameter.
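The effect of extended on the GCI groups can be pictured with a small sketch. The helper merge_bot_gcis, the dictionary layout, and the axiom strings are hypothetical and are not part of mOWL; only the GCI names come from the parameter descriptions above. The sketch assumes each bot form folds into the normal form it is named after.

```python
# Hypothetical illustration; mOWL's internal merging may differ.
BOT_TO_ORIGIN = {"gci0_bot": "gci0", "gci1_bot": "gci1", "gci3_bot": "gci3"}


def merge_bot_gcis(gcis_by_name, extended=True):
    """Return GCI groups keyed by normal form name.

    With extended=True the 7 groups are kept separate. With extended=False
    each *_bot group is appended to its origin group, leaving 4 groups.
    """
    if extended:
        return {name: list(axioms) for name, axioms in gcis_by_name.items()}
    merged = {}
    for name, axioms in gcis_by_name.items():
        target = BOT_TO_ORIGIN.get(name, name)
        merged.setdefault(target, []).extend(axioms)
    return merged


groups = {
    "gci0": ["A SubClassOf B"], "gci1": [], "gci2": [], "gci3": [],
    "gci0_bot": ["C SubClassOf Nothing"], "gci1_bot": [], "gci3_bot": [],
}
seven_forms = merge_bot_gcis(groups, extended=True)
four_forms = merge_bot_gcis(groups, extended=False)
```

Here four_forms["gci0"] holds both the ordinary and the bot axiom, while seven_forms keeps them under separate keys.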
Attributes Summary
class_embeddings: Returns a dictionary with class names as keys and class embeddings as values.
eval_gci_name: The GCI type to use for evaluation (e.g., 'gci0', 'gci1', 'gci2', 'gci3').
evaluation_model: Returns the evaluation model for use with evaluators.
individual_embeddings: Returns a dictionary with individual names as keys and individual embeddings as values.
object_property_embeddings: Returns a dictionary with object property names as keys and object property embeddings as values.
testing_dataloaders: Returns the testing dataloaders for each GCI type.
testing_datasets: Returns the testing datasets for each GCI type.
training_dataloaders: Returns the training dataloaders for each GCI type.
training_datasets: Returns the training datasets for each GCI type.
validation_dataloaders: Returns the validation dataloaders for each GCI type.
validation_datasets: Returns the validation datasets for each GCI type.
Methods Summary
add_axioms(*axioms): This method adds axioms to the dataset contained in the model and reorders the embedding information for each entity accordingly.
compute_loss(pos_scores[, neg_scores]): Compute loss from positive and negative scores.
eval_method(data): Evaluation method used for scoring.
from_pretrained(model): This method loads a pretrained model from a file.
generate_negatives(gci_name, gci_dataset): Generate negative samples for a given GCI type.
get_embeddings(): Get trained embeddings for entities, relations, and individuals.
get_negative_sampling_config(): Returns the active negative sampling configuration.
get_optimizer(): Create and return the optimizer.
get_regularization_loss(): Get regularization loss from the module.
Load the best model from the model filepath.
score(axiom): Returns the score of the given axiom.
train(epochs[, validate_every, epoch_callback]): Train the model.
Attributes Documentation
- class_embeddings
- eval_gci_name
The GCI type to use for evaluation (e.g., ‘gci0’, ‘gci1’, ‘gci2’, ‘gci3’). Must be explicitly set before evaluation.
- Return type:
- evaluation_model
Returns the evaluation model for use with evaluators.
If a custom evaluation model has been set via the setter, it is returned. Otherwise, for EL models, this returns the module which can be called with (data, gci_name). Requires eval_gci_name to be set in the latter case.
- Return type:
- Raises:
ValueError – If no custom model is set and eval_gci_name has not been set
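The fallback order described for evaluation_model can be sketched with a stand-in class. EvalModelSketch, its private _custom_eval_model slot, and toy_module are hypothetical names used only for illustration; the sketch mirrors the documented behaviour and is not mOWL's implementation.

```python
class EvalModelSketch:
    """Stand-in that mimics the documented fallback, not mOWL's class."""

    def __init__(self, module):
        self.module = module            # callable as module(data, gci_name)
        self.eval_gci_name = None       # must be set before evaluation
        self._custom_eval_model = None  # hypothetical private slot

    @property
    def evaluation_model(self):
        # A custom model set through the setter takes precedence.
        if self._custom_eval_model is not None:
            return self._custom_eval_model
        # Falling back to the module requires eval_gci_name.
        if self.eval_gci_name is None:
            raise ValueError("eval_gci_name must be set before evaluation")
        return self.module

    @evaluation_model.setter
    def evaluation_model(self, value):
        self._custom_eval_model = value


def toy_module(data, gci_name):
    return f"scored {len(data)} rows as {gci_name}"


model = EvalModelSketch(toy_module)
try:
    model.evaluation_model
    raised = False
except ValueError:
    raised = True

model.eval_gci_name = "gci2"
result = model.evaluation_model([(0, 1, 2)], model.eval_gci_name)
```

Accessing the property before eval_gci_name is set raises ValueError in this sketch; once it is set, the module itself is returned and can be called with (data, gci_name).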
- head_entities
- individual_embeddings
- object_property_embeddings
- tail_entities
- testing_dataloaders
Returns the testing dataloaders for each GCI type. Each dataloader is an instance of torch.utils.data.DataLoader.
- Return type:
- testing_datasets
Returns the testing datasets for each GCI type. Each dataset is an instance of mowl.datasets.el.ELDataset.
- Return type:
- testing_set
- training_dataloaders
Returns the training dataloaders for each GCI type. Each dataloader is an instance of torch.utils.data.DataLoader.
- Return type:
- training_datasets
Returns the training datasets for each GCI type. Each dataset is an instance of mowl.datasets.el.ELDataset.
- Return type:
- training_set
- validation_dataloaders
Returns the validation dataloaders for each GCI type. Each dataloader is an instance of torch.utils.data.DataLoader.
- Return type:
- validation_datasets
Returns the validation datasets for each GCI type. Each dataset is an instance of mowl.datasets.el.ELDataset.
- Return type:
Methods Documentation
- add_axioms(*axioms)[source]
This method adds axioms to the dataset contained in the model and reorders the embedding information for each entity accordingly. New entities are initialized with random embeddings.
- Parameters:
axioms (org.semanticweb.owlapi.model.OWLAxiom) – Axioms to be added to the dataset.
Added in version 0.2.0.
- compute_loss(pos_scores, neg_scores=None)[source]
Compute loss from positive and negative scores.
Override this method to use different loss functions (e.g., MSE loss).
- Parameters:
pos_scores (torch.Tensor) – Scores for positive samples (should be minimized)
neg_scores (torch.Tensor or None) – Scores for negative samples (should be maximized), or None
- Returns:
Combined loss value
- Return type:
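The contract above (positive scores pushed down, negative scores pushed up) can be met in several ways. The sketch below shows one possibility, a margin-based combination. The function name, the margin parameter, and the use of plain Python lists in place of torch.Tensor are all assumptions made so the example runs without dependencies; it is not mOWL's default loss.

```python
def compute_loss_sketch(pos_scores, neg_scores=None, margin=1.0):
    """Mean positive score plus a hinge penalty on low negative scores."""
    # Positive scores are minimized directly.
    loss = sum(pos_scores) / len(pos_scores)
    if neg_scores is not None:
        # Negative scores below the margin are penalized, which pushes them up.
        hinge = [max(0.0, margin - s) for s in neg_scores]
        loss += sum(hinge) / len(hinge)
    return loss


loss_pos_only = compute_loss_sketch([0.2, 0.4])
loss_with_neg = compute_loss_sketch([0.2, 0.4], [0.5, 2.0])
```

An override in a subclass would follow the same signature as compute_loss(pos_scores, neg_scores=None) and return a single combined value.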
- eval_method(data)[source]
Evaluation method used for scoring. Override if needed.
- Parameters:
data (torch.Tensor) – Input data for evaluation
- Returns:
Evaluation scores
- Return type:
- Raises:
ValueError – If eval_gci_name has not been set
- from_pretrained(model)[source]
This method loads a pretrained model from a file.
- Parameters:
model (str) – Path to the pretrained model file.
Added in version 0.2.0.
- generate_negatives(gci_name, gci_dataset)[source]
Generate negative samples for a given GCI type.
Override this method for custom negative sampling strategies.
- Parameters:
gci_name (str) – Name of the GCI type (e.g., ‘gci2’)
gci_dataset (torch.Tensor) – The dataset containing positive samples
- Returns:
Negative samples tensor, or None if no negatives for this GCI type
- Return type:
torch.Tensor or None
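A common way to build negatives is to copy each positive sample and replace one column with a randomly drawn index. The sketch below illustrates that idea under stated assumptions: corrupt_column is a hypothetical helper, rows are plain tuples rather than a torch.Tensor, and the three-column (class, relation, class) layout is assumed for illustration. It is not mOWL's generate_negatives.

```python
import random


def corrupt_column(positives, index_pool, column, seed=None):
    """Copy each positive row and replace one column with a random index."""
    rng = random.Random(seed)
    negatives = []
    for row in positives:
        corrupted = list(row)
        corrupted[column] = rng.choice(index_pool)
        negatives.append(tuple(corrupted))
    return negatives


# Assumed layout for illustration: (class, relation, class).
positives = [(0, 5, 1), (2, 5, 3)]
class_ids = [0, 1, 2, 3, 4]
negatives = corrupt_column(positives, class_ids, column=2, seed=0)
```

Note that uniform sampling can occasionally reproduce a positive row; the sketch does not filter such cases.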
- get_embeddings()[source]
Get trained embeddings for entities, relations, and individuals.
- Returns:
Tuple of (entity_embeddings, relation_embeddings, individual_embeddings)
- Return type:
- get_negative_sampling_config()[source]
Returns the active negative sampling configuration.
When neg_sampling_gcis is None (the default), the configuration is derived automatically from the intersection of _DEFAULT_NEG_SAMPLING_CONFIG and the module’s neg_capable_gcis, so only GCIs that the module genuinely supports are included.
When neg_sampling_gcis is set explicitly, only those GCIs are included. Training will raise NotImplementedError if any of them are absent from neg_capable_gcis.
Override this method to customise which GCI types require negative sampling and how negatives should be generated.
- Returns:
Dictionary mapping GCI names to their negative sampling config. Each entry has:
'index_pool': 'classes' or 'individuals', the pool to sample from.
'corrupt_column': int, the column of the data tensor to replace.
- Return type:
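The two resolution modes can be sketched as follows. DEFAULT_CONFIG and resolve_neg_config are hypothetical stand-ins, and the entries shown are invented values, so the contents of the real _DEFAULT_NEG_SAMPLING_CONFIG may differ. The sketch also validates eagerly, whereas the documentation places the NotImplementedError check at the start of training.

```python
# Hypothetical defaults; the real _DEFAULT_NEG_SAMPLING_CONFIG may differ.
DEFAULT_CONFIG = {
    "gci1": {"index_pool": "classes", "corrupt_column": 2},
    "gci2": {"index_pool": "classes", "corrupt_column": 2},
}


def resolve_neg_config(default_config, neg_capable_gcis, neg_sampling_gcis=None):
    """Return {gci_name: config} for the GCIs that get negative samples."""
    if neg_sampling_gcis is None:
        # Automatic mode: defaults restricted to what the module supports.
        return {
            name: cfg
            for name, cfg in default_config.items()
            if name in neg_capable_gcis
        }
    # Explicit mode: every requested GCI must be supported by the module.
    unsupported = [n for n in neg_sampling_gcis if n not in neg_capable_gcis]
    if unsupported:
        raise NotImplementedError(f"No negative sampling for: {unsupported}")
    return {n: default_config[n] for n in neg_sampling_gcis if n in default_config}


auto = resolve_neg_config(DEFAULT_CONFIG, neg_capable_gcis={"gci2"})
explicit = resolve_neg_config(DEFAULT_CONFIG, {"gci1", "gci2"}, ["gci1"])
```

In this sketch auto contains only "gci2", because the module declares no support for "gci1", while explicit contains only the requested "gci1".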
- get_optimizer()[source]
Create and return the optimizer.
Override this method to use a different optimizer or configuration.
- Returns:
Optimizer instance
- Return type:
- get_regularization_loss()[source]
Get regularization loss from the module.
Override this method if your module has a regularization loss.
- Returns:
Regularization loss value
- Return type:
- score(axiom)[source]
Returns the score of the given axiom.
- Parameters:
axiom (org.semanticweb.owlapi.model.OWLAxiom) – The axiom to score.
Added in version 0.2.0.
- train(epochs, validate_every=1, epoch_callback=None)[source]
Train the model.
This is the generic training loop for EL embedding models. Subclasses can customize behavior by overriding:
- get_negative_sampling_config(): Configure which GCIs need negatives
- generate_negatives(): Custom negative sampling strategy
- compute_loss(): Custom loss computation (e.g., MSE loss)
- get_regularization_loss(): Add regularization
- get_optimizer(): Use a different optimizer
- Parameters:
epochs (int) – Number of training epochs
validate_every (int, optional) – Validate and log every N epochs. Defaults to 1.
epoch_callback (callable, optional) – Optional callable invoked after each epoch as epoch_callback(epoch, model), where epoch is the 0-based epoch index and model is this model instance. Use it to capture snapshots for animation, custom logging, or early stopping. Defaults to None.
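The callback contract can be illustrated without running a real training job. In the sketch below, run_epochs is a hypothetical stand-in for train that models only the callback invocation, and the dictionary toy_model stands in for a model instance; neither is part of mOWL.

```python
def run_epochs(model, epochs, epoch_callback=None):
    """Stand-in for train(): only the callback contract is modelled."""
    for epoch in range(epochs):      # 0-based epoch index
        model["step"] += 1           # placeholder for one epoch of training
        if epoch_callback is not None:
            epoch_callback(epoch, model)


snapshots = []


def record_snapshot(epoch, model):
    # Copy the state so later epochs do not overwrite earlier snapshots.
    snapshots.append((epoch, dict(model)))


toy_model = {"step": 0}
run_epochs(toy_model, epochs=3, epoch_callback=record_snapshot)
```

With an actual model, a function with the same (epoch, model) signature would be passed as the epoch_callback argument of train.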