
Commit 6dddfa6

authored
Merge pull request #2 from alkidbaci/develop
Updated owlapy version, fixed bugs and some refactoring
2 parents 70322f2 + a4d86ca commit 6dddfa6

16 files changed

Lines changed: 315 additions & 838 deletions

.github/workflows/test.yml

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+name: Python package
+
+on: [push,pull_request]
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.10.13"]
+
+    steps:
+    - uses: actions/checkout@v3
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v4
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install -e .
+    - name: Test with pytest
+      run: |
+        pip install pytest
+        python -m pytest -p no:warnings -x

LICENSE

Lines changed: 21 additions & 661 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 4 additions & 81 deletions
@@ -1,5 +1,9 @@
 # OntoSample
 
+[![Downloads](https://static.pepy.tech/badge/ontosample)](https://pepy.tech/project/ontosample)
+[![Downloads](https://img.shields.io/pypi/dm/ontosample)](https://pypi.org/project/ontosample/)
+[![Pypi](https://img.shields.io/badge/pypi-0.2.6-blue)](https://pypi.org/project/ontosample/0.2.6/)
+
 OntoSample is a python package that offers classic sampling techniques for OWL ontologies/knowledge
 bases. Furthermore, we have tailored the classic sampling techniques to the setting of concept
 learning making use of learning problem.
@@ -47,87 +51,6 @@ sampler.save_sample(kb=sampled_kb, filename='sampled_kb')
 
 Check the [examples](https://github.com/alkidbaci/OntoSample/tree/main/examples) folder for more.
 
-
-## About the paper
-
-### Abstact
-
-Node classification is an important task in many fields, e.g., predicting entity types in knowledge graphs, classifying papers in citation
-graphs, or classifying nodes in social networks. In many cases, it
-is crucial to explain why certain predictions are made. Towards
-this end, concept learning has been proposed as a means of interpretable node classification: given positive and negative examples
-in a knowledge base, concepts in description logics are learned that
-serve as classification models. However, state-of-the-art concept
-learners, including EvoLearner and CELOE exhibit long runtimes.
-In this paper, we propose to accelerate concept learning with graph
-sampling techniques. We experiment with seven techniques and tailor them to the setting of concept learning. In our experiments, we
-achieve a reduction in training size by over 90% while maintaining
-a high predictive performance.
-
-### Reproducing paper results
-
-You will find in examples folder the script used to generate the results in paper.
-`evaluation_table_generator.py` generates every result for each dataset-sampler-sampling_size
-combination and store them in a csv.
-
-#### To generate results of Table 2
-Install the whole ontolearn package to use its learning algorithms like EvoLearner and CELOE because
-they are not included here to keep the number of dependencies low.
-
-```shell
-pip install ontolearn
-```
-
-The evaluation results for a certain sampling percentage can be simply reproduced by using `examples/evaluation_table_generator.py`.
-
-There are the following arguments that the user can give:
-- `learner` → type of learner: 'evolerner' or 'celeo'.
-- `datasets_and_lp` → list containing the name of the json files that contains the path to the knowledge graph and
-the learning problem.
-- `samplers` → list of the abbreviation of the samplers as strings.
-- `csv_path` → path of the csv file to save the results.
-- `sampling_size` → the sampling percentage
-- `iterations` → number of iterations for each sampler
-
-Table 2 results can be generated using the following instructions:
-
-1. Execute the script `evaluation_table_generator.py` using the default parameters.
-2. After the script has finished executing, set the argument `--learner` to `celoe`
-3. Set the csv path to another path by using the `--csv_path` argument.
-4. Execute again.
-
-In the end you will have 2 csv files, one for each learner.
-
-> **Note 1**: Not all datasets are included in the project because some of them are too large.
-> You can download all the SML-bench datasets [here](https://github.com/SmartDataAnalytics/SML-Bench/tree/updates/learningtasks).
-> They need to go to their respective folder named after them inside KGs directory.
-
-> **Note 2**: Keep in mind that this file needs a considerable amount of time to execute (more than 40 hours for each concept learner
-> depending on the machine specifications) when using the default values which were also used to construct
-> the results for the paper.
->
-> If you want quicker execution, you can enter a lower number of iterations.
-
----------------------------------------------------
-
-#### To generate results of Figure 1
-
-To generate results used in Figure 1 you need to follow the instructions below
-when writing the command to execute the script `examples/evaluation_table_generator.py`:
-
-
-```shell
-cd examples
-python evaluation_table_generator.py --datasets_and_lp {"hepatitis_lp.json", "carcinogenesis_lp.json"} --samplers {"RNLPC", "RWJLPC", "RWJPLPC", "RELPC", "FFLPC"} --sampling_size 0.25
-```
-
-Repeat the command for sampling sizes of `0.20`, `0.15`, `0.10`, `0.5`
-
-
-> **Note:** Make sure to set a different csv path using the `--csv_path` argument each time you execute to avoid
-> overriding the previous results.
-
-
 ### Citing
 
 ```

examples/evaluation_table_generator.py

Lines changed: 1 addition & 1 deletion
@@ -131,7 +131,7 @@ def start(args):
     p = set(examples['positive_examples'])
     n = set(examples['negative_examples'])
     for individual in removed_individuals:
-        individual_as_str = individual.get_iri().as_str()
+        individual_as_str = individual.str
         if individual_as_str in p:
             p.remove(individual_as_str)
         if individual_as_str in n:
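In the changed line, the script now reads each removed individual's IRI string through the `str` attribute instead of `get_iri().as_str()`, and the surrounding loop drops those IRIs from the positive and negative example sets. A minimal sketch of that pruning step follows; `Individual` is a hypothetical stand-in that models only the `str` attribute, and `prune_examples` is an illustrative helper, not a function in the repository.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Individual:
    """Hypothetical stand-in for a named individual; only the IRI string
    exposed as `str` (as used in the diff) is modelled."""
    str: str


def prune_examples(positives: Set[str], negatives: Set[str],
                   removed: Iterable[Individual]) -> Tuple[Set[str], Set[str]]:
    """Return copies of the example sets without the removed individuals."""
    p, n = set(positives), set(negatives)
    for individual in removed:
        individual_as_str = individual.str
        # discard() is a no-op when the IRI is absent, so no membership test is needed
        p.discard(individual_as_str)
        n.discard(individual_as_str)
    return p, n
```

`set.discard` replaces the diff's `if ... in p: p.remove(...)` pair; both leave the set unchanged when the IRI is not present.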

ontolearn_light/abstracts.py

Lines changed: 4 additions & 10 deletions
@@ -4,20 +4,18 @@
 from abc import ABCMeta, abstractmethod
 from typing import Set, List, Tuple, Iterable, TypeVar, Generic, ClassVar, Optional
 from owlapy.class_expression import OWLClassExpression
-from owlapy.owl_ontology import OWLOntology
+from owlapy.owl_ontology import Ontology
 from owlapy.utils import iter_count
 from .data_struct import Experience
 from .utils import read_csv
 from collections import OrderedDict
-
+from owlapy import owl_expression_to_dl
 _N = TypeVar('_N') #:
 _KB = TypeVar('_KB', bound='AbstractKnowledgeBase') #:
 
 logger = logging.getLogger(__name__)
 
-# @TODO:CD: Each Class definiton in abstract.py should share a prefix, e.g., BaseX or AbstractX.
-# @TODO:CD: All imports must be located on top of the script
-from owlapy import owl_expression_to_dl
+
 class EncodedLearningProblem(metaclass=ABCMeta):
     """Encoded Abstract learning problem for use in Scorers."""
     __slots__ = ()
@@ -28,7 +26,6 @@ class EncodedPosNegLPStandardKind(EncodedLearningProblem, metaclass=ABCMeta):
     __slots__ = ()
 
 
-# @TODO: Why we need Generic[_N] and if we need it why we di not use it in all other abstract classes?
 class AbstractScorer(Generic[_N], metaclass=ABCMeta):
     """
     An abstract class for quality functions.
@@ -54,7 +51,6 @@ def score_elp(self, instances: set, learning_problem: EncodedLearningProblem) ->
         """
         if len(instances) == 0:
             return False, 0
-        # @TODO: It must be moved to the top of the abstracts.py
         from ontolearn_light.learning_problem import EncodedPosNegLPStandard
         if isinstance(learning_problem, EncodedPosNegLPStandard):
             tp = len(learning_problem.kb_pos.intersection(instances))
@@ -82,7 +78,6 @@ def score2(self, tp: int, fn: int, fp: int, tn: int) -> Tuple[bool, Optional[flo
         """
         pass
 
-    # @TODO:CD: Why there is '..' in AbstractNode
     def apply(self, node: 'AbstractNode', instances, learning_problem: EncodedLearningProblem) -> bool:
         """Apply the quality function to a search tree node after calculating the quality score on the given instances.
 
@@ -99,7 +94,6 @@ def apply(self, node: 'AbstractNode', instances, learning_problem: EncodedLearni
             f'Expected EncodedLearningProblem but got {type(learning_problem)}'
         assert isinstance(node, AbstractNode), \
             f'Expected AbstractNode but got {type(node)}'
-        # @TODO: It must be moved to the top of the abstracts.py
         from ontolearn_light.search import _NodeQuality
         assert isinstance(node, _NodeQuality), \
             f'Expected _NodeQuality but got {type(_NodeQuality)}'
@@ -331,7 +325,7 @@ class AbstractKnowledgeBase(metaclass=ABCMeta):
 
     # CD: This function is used as "a get method". Insteadf either access the atttribute directly
     # or use it as a property @abstractmethod
-    def ontology(self) -> OWLOntology:
+    def ontology(self) -> Ontology:
         """The base ontology of this knowledge base."""
         pass
 
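`score_elp` derives its counts from set operations between a concept's instances and the encoded examples; only the `tp` line (`len(learning_problem.kb_pos.intersection(instances))`) is visible in the hunk. A minimal sketch of the full split, assuming `fn`, `fp` and `tn` are computed analogously from `kb_pos` and `kb_neg` before they are passed to `score2(tp, fn, fp, tn)`. `Counts` and `confusion_counts` are hypothetical names, and plain strings stand in for individuals.

```python
from typing import NamedTuple, Set


class Counts(NamedTuple):
    tp: int
    fn: int
    fp: int
    tn: int


def confusion_counts(instances: Set[str], kb_pos: Set[str], kb_neg: Set[str]) -> Counts:
    """Split the positive and negative examples by whether the concept covers them."""
    tp = len(kb_pos & instances)  # positives covered by the concept
    fn = len(kb_pos - instances)  # positives the concept misses
    fp = len(kb_neg & instances)  # negatives the concept wrongly covers
    tn = len(kb_neg - instances)  # negatives correctly left out
    return Counts(tp, fn, fp, tn)
```

`score_elp` itself short-circuits to `(False, 0)` when `instances` is empty, which this sketch does not model.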

ontolearn_light/concept_generator.py

Lines changed: 0 additions & 3 deletions
@@ -52,7 +52,6 @@ def union_from_iterables(a_operands: Iterable[OWLClassExpression],
                              b_operands: Iterable[OWLClassExpression]) -> Iterable[OWLObjectUnionOf]:
         """ Create an union of each class expression in a_operands with each class expression in b_operands."""
         assert (isinstance(a_operands, Generator) is False) and (isinstance(b_operands, Generator) is False)
-        # TODO: if input sizes say 10^4, we can employ multiprocessing
         seen = set()
         for i in a_operands:
             for j in b_operands:
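`union_from_iterables` pairs every operand of `a_operands` with every operand of `b_operands`, and the `seen` set suggests that duplicate unions are skipped. Because `b_operands` is traversed once per element of `a_operands`, a generator would be exhausted after the first pass, which is what the assertion guards against. A minimal sketch under those assumptions, using strings for class expressions and a `frozenset` so that `A OR B` and `B OR A` count as the same union; the real function yields `OWLObjectUnionOf` objects instead.

```python
from collections.abc import Generator
from typing import FrozenSet, Iterable, Iterator


def union_from_iterables(a_operands: Iterable[str],
                         b_operands: Iterable[str]) -> Iterator[FrozenSet[str]]:
    """Yield each distinct pairwise union of an a-operand with a b-operand once."""
    # One-shot generators are rejected because b_operands is re-read for every i
    assert not isinstance(a_operands, Generator) and not isinstance(b_operands, Generator)
    seen = set()
    for i in a_operands:
        for j in b_operands:
            union = frozenset((i, j))  # order-insensitive stand-in for a union expression
            if union in seen:
                continue
            seen.add(union)
            yield union
```

In this sketch a pair such as `(A, A)` collapses to the single-element set `{A}`; whether the real function emits or filters such self-unions is not visible in the diff.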
@@ -73,8 +72,6 @@ def intersection(self, ops: Iterable[OWLClassExpression]) -> OWLObjectIntersecti
         Returns:
             Intersection with all operands (intersections are merged).
         """
-        # TODO CD: I would rather prefer def intersection(self, a: OWLClassExpression, b: OWLClassExpression). This is
-        # TODO CD: more advantages as one does not need to create a tuple of a list before intersection two expressions.
         operands: List[OWLClassExpression] = []
         for c in ops:
             if isinstance(c, OWLObjectIntersectionOf):
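The `intersection` docstring says nested intersections are merged: when an operand is itself an `OWLObjectIntersectionOf`, its operands are spliced into the new operand list rather than nested. A minimal sketch, assuming the branch hidden below the hunk extends `operands` with the nested operands; the `Named` and `And` classes are hypothetical stand-ins for named classes and `OWLObjectIntersectionOf`.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple, Union


@dataclass(frozen=True)
class Named:
    name: str


@dataclass(frozen=True)
class And:
    operands: Tuple["Expr", ...]


Expr = Union[Named, And]


def intersection(ops: Iterable[Expr]) -> And:
    """Build one intersection, splicing in the operands of nested intersections."""
    operands: List[Expr] = []
    for c in ops:
        if isinstance(c, And):
            operands.extend(c.operands)  # merge: (A AND B) AND C becomes A AND B AND C
        else:
            operands.append(c)
    return And(tuple(operands))
```

The sketch flattens one level; that is enough when every nested intersection was itself built by this function.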

ontolearn_light/data_struct.py

Lines changed: 0 additions & 1 deletion
@@ -91,7 +91,6 @@ class Experience:
     """
 
     def __init__(self, maxlen: int):
-        # @TODO we may want to not forget experiences yielding high rewards
         self.current_states = deque(maxlen=maxlen)
         self.next_states = deque(maxlen=maxlen)
         self.rewards = deque(maxlen=maxlen)
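`Experience` keeps its transitions in three `deque(maxlen=maxlen)` buffers, so once `maxlen` items are stored, each new append evicts the oldest entry regardless of its reward (the concern behind the removed TODO). A simplified sketch of that behaviour; the `append` and `__len__` methods are hypothetical additions for illustration and are not part of the class shown in the diff.

```python
from collections import deque


class Experience:
    """Bounded store of (state, next_state, reward) transitions; simplified sketch."""

    def __init__(self, maxlen: int):
        self.current_states = deque(maxlen=maxlen)
        self.next_states = deque(maxlen=maxlen)
        self.rewards = deque(maxlen=maxlen)

    def append(self, current_state, next_state, reward) -> None:
        # When full, every deque drops its oldest item, so the three stay aligned
        self.current_states.append(current_state)
        self.next_states.append(next_state)
        self.rewards.append(reward)

    def __len__(self) -> int:
        return len(self.current_states)
```

Keeping high-reward experiences, as the removed TODO suggested, would need a different eviction rule than the FIFO behaviour that `deque(maxlen=...)` provides.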

ontolearn_light/ea_utils.py

Lines changed: 0 additions & 1 deletion
@@ -136,7 +136,6 @@ def ind_to_string(ind: List[Tree]) -> str:
     return ''.join([prim.name for prim in ind])
 
 
-# TODO: Ugly hack for now
 def owlliteral_to_primitive_string(lit: OWLLiteral, pe: Optional[Union[OWLDataProperty, OWLObjectProperty]] = None) \
         -> str:
     str_ = type(lit.to_python()).__name__ + escape(lit.get_literal())

ontolearn_light/knowledge_base.py

Lines changed: 16 additions & 35 deletions
@@ -14,15 +14,12 @@
 from owlapy.owl_datatype import OWLDatatype
 from owlapy.owl_individual import OWLNamedIndividual
 from owlapy.owl_literal import BooleanOWLDatatype, NUMERIC_DATATYPES, DoubleOWLDatatype, TIME_DATATYPES, OWLLiteral
-from owlapy.owl_ontology import OWLOntology
-from owlapy.owl_ontology_manager import OWLOntologyManager
 from owlapy.owl_property import OWLObjectProperty, OWLDataProperty, OWLObjectPropertyExpression, \
     OWLDataPropertyExpression
-from owlapy.owl_reasoner import OWLReasoner
 
 from owlapy.owl_ontology import Ontology
 from owlapy.owl_ontology_manager import OntologyManager
-from owlapy.owl_reasoner import FastInstanceCheckerReasoner, OntologyReasoner
+from owlapy.owl_reasoner import StructuralReasoner
 
 from owlapy.render import DLSyntaxObjectRenderer
 from ontolearn_light.search import EvaluatedConcept
@@ -35,18 +32,16 @@
 from .utils.static_funcs import (init_length_metric, init_hierarchy_instances,
                                  init_named_individuals, init_individuals_from_concepts)
 
-from owlapy.class_expression import OWLDataMaxCardinality, OWLDataSomeValuesFrom
-from owlapy import owl_expression_to_sparql, owl_expression_to_dl
+from owlapy.class_expression import OWLDataSomeValuesFrom
 from owlapy.owl_data_ranges import OWLDataRange
 from owlapy.class_expression import OWLDataOneOf
 
 logger = logging.getLogger(__name__)
 
 
-def depth_Default_ReasonerFactory(onto: OWLOntology) -> OWLReasoner:
+def depth_Default_ReasonerFactory(onto: Ontology) -> StructuralReasoner:
     assert isinstance(onto, Ontology)
-    base_reasoner = OntologyReasoner(ontology=onto)
-    return FastInstanceCheckerReasoner(ontology=onto, base_reasoner=base_reasoner)
+    return StructuralReasoner(ontology=onto, class_cache=False, property_cache=False)
 
 
 class KnowledgeBase(AbstractKnowledgeBase):
@@ -89,9 +84,9 @@ class KnowledgeBase(AbstractKnowledgeBase):
     @overload
     def __init__(self, *,
                  path: str,
-                 ontologymanager_factory: Callable[[], OWLOntologyManager] = OntologyManager(
+                 ontologymanager_factory: Callable[[], OntologyManager] = OntologyManager(
                      world_store=None),
-                 reasoner_factory: Callable[[OWLOntology], OWLReasoner] = None,
+                 reasoner_factory: Callable[[Ontology], StructuralReasoner] = None,
                  length_metric: Optional[OWLClassExpressionLengthMetric] = None,
                  length_metric_factory: Optional[Callable[[], OWLClassExpressionLengthMetric]] = None,
                  individuals_cache_size=128,
@@ -101,8 +96,8 @@ def __init__(self, *,
 
     @overload
     def __init__(self, *,
-                 ontology: OWLOntology,
-                 reasoner: OWLReasoner,
+                 ontology: Ontology,
+                 reasoner: StructuralReasoner,
                  load_class_hierarchy: bool = True,
                  length_metric: Optional[OWLClassExpressionLengthMetric] = None,
                  length_metric_factory: Optional[Callable[[], OWLClassExpressionLengthMetric]] = None,
@@ -112,12 +107,12 @@ def __init__(self, *,
     def __init__(self, *,
                  path: Optional[str] = None,
 
-                 ontologymanager_factory: Optional[Callable[[], OWLOntologyManager]] = None,
-                 reasoner_factory: Optional[Callable[[OWLOntology], OWLReasoner]] = None,
+                 ontologymanager_factory: Optional[Callable[[], OntologyManager]] = None,
+                 reasoner_factory: Optional[Callable[[Ontology], StructuralReasoner]] = None,
                  length_metric_factory: Optional[Callable[[], OWLClassExpressionLengthMetric]] = None,
 
-                 ontology: Optional[OWLOntology] = None,
-                 reasoner: Optional[OWLReasoner] = None,
+                 ontology: Optional[Ontology] = None,
+                 reasoner: Optional[StructuralReasoner] = None,
                  length_metric: Optional[OWLClassExpressionLengthMetric] = None,
 
                  individuals_cache_size=128,
@@ -152,14 +147,13 @@ def __init__(self, *,
             self.manager.save_world()
             logger.debug("Synced world to backend store")
 
-        reasoner: OWLReasoner
+        reasoner: StructuralReasoner
         if reasoner is not None:
             self.reasoner = reasoner
         elif reasoner_factory is not None:
             self.reasoner = reasoner_factory(self.ontology)
         else:
-            self.reasoner = FastInstanceCheckerReasoner(ontology=self.ontology, base_reasoner=OntologyReasoner(
-                ontology=self.ontology))
+            self.reasoner = StructuralReasoner(ontology=self.ontology, class_cache=False, property_cache=False)
 
         self.length_metric = init_length_metric(length_metric, length_metric_factory)
 
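After this change the constructor still resolves the reasoner in a fixed order: an explicit `reasoner` wins, then `reasoner_factory(self.ontology)`, and only then the new default `StructuralReasoner(ontology=self.ontology, class_cache=False, property_cache=False)`. The sketch below reproduces just that precedence. `StubReasoner` and `choose_reasoner` are hypothetical, and the stub replaces owlapy's `StructuralReasoner` so that the example runs without owlapy installed.

```python
from typing import Callable, Optional


class StubReasoner:
    """Hypothetical stand-in for a reasoner that records how it was created."""

    def __init__(self, ontology: object, origin: str):
        self.ontology = ontology
        self.origin = origin


def choose_reasoner(ontology: object,
                    reasoner: Optional[StubReasoner] = None,
                    reasoner_factory: Optional[Callable[[object], StubReasoner]] = None) -> StubReasoner:
    if reasoner is not None:  # 1. an explicitly passed reasoner is used as-is
        return reasoner
    if reasoner_factory is not None:  # 2. otherwise the factory builds one for this ontology
        return reasoner_factory(ontology)
    # 3. fallback; in the diff this is StructuralReasoner(..., class_cache=False, property_cache=False)
    return StubReasoner(ontology, origin="default")
```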

@@ -317,7 +311,6 @@ def tbox(self, entities: Union[Iterable[OWLClass], Iterable[OWLDataProperty], It
         If no concept-s|propert-y/ies are given, get all tbox axioms.
 
         Args:
-            @TODO: entities or namedindividuals ?!
             entities: Entities to obtain tbox axioms from. This can be a single
                 OWLClass/OWLDataProperty/OWLObjectProperty object, a list of those objects or None. If you enter a list
                 that combines classes and properties (which we don't recommend doing), only axioms for one type will be
@@ -525,8 +518,6 @@ def concept_len(self, ce: OWLClassExpression) -> int:
         Returns:
             Length of the concept.
         """
-        # @TODO: CD: Computing the length of a concept should be disantangled from KB
-        # @TODO: CD: Ideally, this should be a static function
 
         return self.length_metric.length(ce)
 
@@ -550,7 +541,7 @@ def cache_individuals(self, ce: OWLClassExpression) -> None:
             raise TypeError
         if ce in self.ind_cache:
             return
-        if isinstance(self.reasoner, FastInstanceCheckerReasoner):
+        if isinstance(self.reasoner, StructuralReasoner):
             self.ind_cache[ce] = self.reasoner._find_instances(ce) # performance hack
         else:
             temp = self.reasoner.instances(ce)
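`cache_individuals` memoises the instance set of a class expression in `self.ind_cache` and returns early on a hit; with this commit the `_find_instances` fast path is taken for `StructuralReasoner` instead of `FastInstanceCheckerReasoner`. A minimal sketch of the memoisation alone, where `compute` is a hypothetical callable standing in for the reasoner call and `InstanceCache` is an illustrative class name. The cache here is an unbounded dict; the real class also takes `individuals_cache_size`, whose eviction policy is not visible in this diff.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable


class InstanceCache:
    """Memoise the instance set of each class expression (simplified sketch)."""

    def __init__(self, compute: Callable[[Hashable], Iterable[str]]):
        self._compute = compute  # stands in for the reasoner's instance retrieval
        self.ind_cache: Dict[Hashable, FrozenSet[str]] = {}
        self.misses = 0

    def cache_individuals(self, ce: Hashable) -> None:
        if ce in self.ind_cache:  # already cached: nothing to do
            return
        self.misses += 1
        self.ind_cache[ce] = frozenset(self._compute(ce))

    def individuals(self, ce: Hashable) -> FrozenSet[str]:
        self.cache_individuals(ce)
        return self.ind_cache[ce]
```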
@@ -666,8 +657,6 @@ def data_properties_for_domain(self, domain: OWLClassExpression, data_properties
 
     def encode_learning_problem(self, lp: PosNegLPStandard):
         """
-        @TODO: A learning problem (DL concept learning problem) should not be a part of a knowledge base
-
         Provides the encoded learning problem (lp), i.e. the class containing the set of OWLNamedIndividuals
         as follows:
             kb_pos --> the positive examples set,
@@ -720,8 +709,6 @@ def evaluate_concept(self, concept: OWLClassExpression, quality_func: AbstractSc
                          encoded_learning_problem: EncodedLearningProblem) -> EvaluatedConcept:
         """Evaluates a concept by using the encoded learning problem examples, in terms of Accuracy or F1-score.
 
-        @ TODO: A knowledge base is a data structure and the context of "evaluating" a concept seems to be unrelated
-
         Note:
             This method is useful to tell the quality (e.q) of a generated concept by the concept learners, to get
             the set of individuals (e.inds) that are classified by this concept and the amount of them (e.ic).
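`evaluate_concept` scores a concept "in terms of Accuracy or F1-score". Given the four counts that `score2(tp, fn, fp, tn)` receives, the textbook definitions are accuracy = (tp + tn) / (tp + fn + fp + tn) and F1 = 2tp / (2tp + fp + fn). A minimal sketch of both; returning 0.0 on an empty denominator is an assumption, and the library's own scorers may treat that case, as well as the `bool` flag in `score2`'s return type, differently.

```python
def accuracy(tp: int, fn: int, fp: int, tn: int) -> float:
    """Share of all examples that are classified correctly."""
    total = tp + fn + fp + tn
    return (tp + tn) / total if total else 0.0


def f1(tp: int, fn: int, fp: int, tn: int) -> float:
    """Harmonic mean of precision and recall, written as 2tp / (2tp + fp + fn)."""
    # tn is accepted only to mirror score2's signature; F1 does not use it
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0
```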
@@ -752,22 +739,16 @@ def get_leaf_concepts(self, concept: OWLClass):
 
     def get_least_general_named_concepts(self) -> Generator[OWLClass, None, None]:
         """Get leaf classes.
-        @TODO: Docstring needed
-        Returns:
         """
         yield from self.class_hierarchy.leaves()
 
     def least_general_named_concepts(self) -> Generator[OWLClass, None, None]:
         """Get leaf classes.
-        @TODO: Docstring needed
-        Returns:
         """
         yield from self.class_hierarchy.leaves()
 
     def get_most_general_classes(self) -> Generator[OWLClass, None, None]:
-        """Get most general named concepts classes.
-        @TODO: Docstring needed
-        Returns:"""
+        """Get most general named concepts classes."""
         yield from self.class_hierarchy.roots()
 
     def get_direct_sub_concepts(self, concept: OWLClass) -> Iterable[OWLClass]:
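The methods in this hunk yield `self.class_hierarchy.leaves()` for the least general named concepts and `self.class_hierarchy.roots()` for the most general ones. A minimal sketch of those two notions over a hypothetical hierarchy given as a dict from each class name to its direct subclasses; the real `leaves()` and `roots()` operate on owlapy class objects and may order their results differently.

```python
from typing import Dict, Iterator, Set


def leaves(sub_classes: Dict[str, Set[str]]) -> Iterator[str]:
    """Classes without any direct subclass (least general named concepts)."""
    for cls in sorted(sub_classes):
        if not sub_classes[cls]:
            yield cls


def roots(sub_classes: Dict[str, Set[str]]) -> Iterator[str]:
    """Classes that are no other class's direct subclass (most general named concepts)."""
    children = set().union(*sub_classes.values())
    for cls in sorted(sub_classes):
        if cls not in children:
            yield cls
```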
