OPEN THESES

Theses (Bachelor/Master):

This section contains current topics and information on studies and thesis work in the field of Scientific Data Management:

  • Query Processing Guided by Mined Rules

    Making use of mined Horn rules in query processing, i.e., query planning and execution.

     

    The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. SPARQL [2] is the W3C recommended language to query RDF data. In this context, rule mining is the process of discovering patterns between entities in a knowledge graph. The mined Horn rules can be used during query processing.
    The goal of this thesis is to define an algorithm that considers mined rules (and their metrics) during query decomposition, query optimization, and query execution.
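
    As an illustration of the general idea (not the algorithm to be developed in this thesis), the following Python sketch applies a single mined rule of the form body(x, y) => head(x, y) during query rewriting: a triple pattern that uses the head predicate is expanded into a UNION with the body predicate whenever the rule's confidence is high enough. The rule, its predicates, and its confidence value are made-up examples.

        # Sketch: apply a mined Horn rule  body(x, y) => head(x, y)  during query
        # rewriting. Example rule and predicates are hypothetical.
        RULE = {
            "body": "<http://example.org/spouse>",
            "head": "<http://example.org/partner>",
            "confidence": 0.93,  # metric reported by the rule miner
        }

        def rewrite_triple_pattern(subject, predicate, obj, rule, min_confidence=0.9):
            """Expand a triple pattern with the rule body if the rule is reliable enough."""
            if predicate != rule["head"] or rule["confidence"] < min_confidence:
                return f"{subject} {predicate} {obj} ."
            return ("{ " + f"{subject} {predicate} {obj} ." + " } UNION { "
                    + f"{subject} {rule['body']} {obj} ." + " }")

        print("SELECT ?x ?y WHERE { "
              + rewrite_triple_pattern("?x", "<http://example.org/partner>", "?y", RULE)
              + " }")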

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses

    • Grundlagen der Datenbanksysteme (Introduction to Database Systems)
    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)
    • Scientific Data Management and Knowledge Graphs

     

    Topics

    • Big Data
    • Knowledge Graphs
    • Query Processing
    • Rule Mining

     

    Literature

    [1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

    [2] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/

  • Efficient Computation of Detailed Source Descriptions for Knowledge Graphs

    Efficient computation of semantic source descriptions for federations of knowledge graphs.

     

     

    When querying a system that consists of several knowledge graphs, the system needs to decide which parts of the query can be answered by which knowledge graph. Most systems use only simple source descriptions; however, more detailed source descriptions enable the system to find better plans.
    The goal of this thesis is to provide a formal definition as well as an implementation for an algorithm that efficiently collects detailed source descriptions for knowledge graphs.
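
    The sketch below illustrates one possible notion of a "detailed" source description, namely the predicates that co-occur with each class in a knowledge graph. It assumes the SPARQLWrapper library and a publicly reachable SPARQL endpoint; the thesis is expected to define richer descriptions and a more efficient collection strategy.

        # Sketch: a "detailed" source description as the set of predicates used by
        # the instances of each class. Assumes SPARQLWrapper and a public endpoint.
        from SPARQLWrapper import SPARQLWrapper, JSON

        def class_predicate_description(endpoint_url, limit=1000):
            endpoint = SPARQLWrapper(endpoint_url)
            endpoint.setQuery("""
                SELECT DISTINCT ?class ?predicate WHERE {
                    ?instance a ?class ;
                              ?predicate ?o .
                } LIMIT %d""" % limit)
            endpoint.setReturnFormat(JSON)
            description = {}
            for row in endpoint.query().convert()["results"]["bindings"]:
                description.setdefault(row["class"]["value"], set()).add(row["predicate"]["value"])
            return description  # {class IRI: set of predicate IRIs}

        print(class_predicate_description("https://dbpedia.org/sparql", limit=100))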

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses

    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)

     

    Topics

    • Big Data

    • Knowledge Graphs

    • Query Processing

  • Efficient Generation of Knowledge Graphs using RML-star with JSON and XML

    Efficiently generating RDF-star data from JSON and XML using RML-star. 

     

     

    In recent years, the amount of data generated has increased exponentially, and knowledge graphs have gained attention as data structures to integrate data and knowledge harvested from myriad data sources. Thus, there is a need for knowledge graph creation engines capable of handling data complexities like large volume, high duplicate rates, and heterogeneity. The SDM-RDFizer is a knowledge graph creation engine that follows the standard established by the RDF Mapping Language (RML). RML is a mapping language that expresses customized mapping rules from heterogeneous data structures and serializations to the RDF data model. RML-star is an extension of RML that uses the RDF-star data model. This thesis aims to extend the SDM-RDFizer so that it can execute RML-star mappings over JSON or XML data sources to create knowledge graphs.
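
    As a rough illustration of the target transformation (standard library only, not the SDM-RDFizer API), the following sketch turns records from a JSON source into RDF-star statements in which a quoted triple is annotated with its provenance. All IRIs and field names are hypothetical.

        # Sketch: mapping JSON records to RDF-star (printed as Turtle-star text).
        # Hypothetical IRIs and field names; not the SDM-RDFizer API.
        import json

        records = json.loads("""
        [
          {"drug": "Ibuprofen", "interactsWith": "Aspirin", "source": "DrugBank"},
          {"drug": "Warfarin",  "interactsWith": "Aspirin", "source": "KEGG"}
        ]
        """)

        EX = "http://example.org/"

        for r in records:
            quoted = f"<< <{EX}{r['drug']}> <{EX}interactsWith> <{EX}{r['interactsWith']}> >>"
            # Turtle-star: annotate the quoted triple with the source it was harvested from
            print(f"{quoted} <{EX}statedIn> \"{r['source']}\" .")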

     

    Theses

    • Thesis 1: JSON
    • Thesis 2: XML

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)
    • Knowledge in Mapping Languages

     

    Useful Courses

    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)
    • Scientific Data Management and Knowledge Graphs

     

    Topics

    • Big Data
    • Knowledge Graph Creation
    • Mapping Languages

     

    Literature

    [1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

    [2] https://w3c.github.io/rdf-star/cg-spec/editors_draft.html

    [3] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana, M.-E. Vidal: SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs. 2020. URL: https://doi.org/10.1145/3340531.3412881

    [4] E. Iglesias, S. Jozashoori, M.-E. Vidal: Scaling Up Knowledge Graph Creation to Large and Heterogeneous Data Sources. 2022. URL: https://doi.org/10.48550/arXiv.2201.09694

    [5] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana, M.-E. Vidal: Empowering the SDM-RDFizer Tool for Scaling Up to Complex Knowledge Graph Creation Pipelines (Under Review). 2023. URL: https://www.semantic-web-journal.net/system/files/swj3246.pdf

    [6] A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, R. Van de Walle: RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. 2014. URL: https://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf

     

  • Efficient Mining of Horn Rules from Knowledge Graphs

    Efficiently mining Horn rules from knowledge graphs using SPARQL queries.

     

     

    Rule mining is the process of discovering interesting patterns or relationships between variables. Mining rules on top of knowledge graphs means discovering patterns between the entities present in them. The goal of this thesis is to provide a formal definition as well as an implementation of an algorithm that is able to mine Horn rules from knowledge graphs. This algorithm can then be enhanced to mine rules from multiple knowledge graphs.
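
    A minimal sketch of the rule metrics involved is shown below: support and confidence of one candidate Horn rule body(x, y) => head(x, y) are estimated with SPARQL COUNT queries. It assumes the SPARQLWrapper library and a reachable endpoint; a real miner would additionally enumerate and prune candidate rules systematically.

        # Sketch: support and confidence of one candidate rule  body(x, y) => head(x, y)
        # via SPARQL COUNT queries. Assumes SPARQLWrapper; predicates are placeholders.
        from SPARQLWrapper import SPARQLWrapper, JSON

        def count(endpoint, pattern):
            endpoint.setQuery("SELECT (COUNT(*) AS ?n) WHERE { %s }" % pattern)
            endpoint.setReturnFormat(JSON)
            return int(endpoint.query().convert()["results"]["bindings"][0]["n"]["value"])

        def rule_metrics(endpoint_url, body_pred, head_pred):
            endpoint = SPARQLWrapper(endpoint_url)
            support = count(endpoint, f"?x <{body_pred}> ?y . ?x <{head_pred}> ?y .")
            body_size = count(endpoint, f"?x <{body_pred}> ?y .")
            return support, (support / body_size if body_size else 0.0)

        # Example call with hypothetical predicates:
        # rule_metrics("https://dbpedia.org/sparql",
        #              "http://dbpedia.org/ontology/spouse",
        #              "http://dbpedia.org/ontology/partner")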

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses

    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)
    • Scientific Data Management and Knowledge Graphs

     

    Topics

    • Big Data
    • Knowledge Graphs
    • Rule Mining
  • Efficient Query Processing by Discovering Synonym Predicates in Knowledge Graphs

    Complete query answers by discovering synonymous predicates.

     

     

    Knowledge graphs often contain duplicated data and metadata that have the same meaning but are defined differently. As a result, synonymous predicates may connect the same resource to different entities, which causes query engines to retrieve incomplete answers. This thesis aims to enhance query processing by discovering synonymous predicates in order to retrieve complete answers.
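
    The sketch below illustrates one simple heuristic for discovering synonym candidates (using rdflib on a tiny in-memory graph): predicate pairs are ranked by the overlap of the (subject, object) pairs they connect. A query engine could then expand a query with high-scoring synonyms to retrieve more complete answers. The data and IRIs are made up.

        # Sketch: rank predicate pairs as synonym candidates by the Jaccard overlap
        # of the (subject, object) pairs they connect. Data and IRIs are made up.
        from itertools import combinations
        from rdflib import Graph

        g = Graph()
        g.parse(data="""
            @prefix ex: <http://example.org/> .
            ex:Alice ex:worksFor   ex:AcmeCorp .
            ex:Alice ex:employedBy ex:AcmeCorp .
            ex:Bob   ex:worksFor   ex:AcmeCorp .
            ex:Bob   ex:employedBy ex:AcmeCorp .
            ex:Carol ex:worksFor   ex:Initech .
        """, format="turtle")

        pairs = {}
        for s, p, o in g:
            pairs.setdefault(p, set()).add((s, o))

        for p1, p2 in combinations(pairs, 2):
            a, b = pairs[p1], pairs[p2]
            print(f"{p1}  ~  {p2}  score={len(a & b) / len(a | b):.2f}")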

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses

    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)
    • Scientific Data Management and Knowledge Graphs

     

    Topics

    • Big Data
    • Knowledge Graphs
    • Query Processing
  • Efficient Validation of RDF Data using SHACL

    Efficiently validating integrity constraints over RDF data using SHACL and SPARQL.

     

     

    The Resource Description Framework (RDF) is the W3C standard for publishing and exchanging data on the Web. Many data sources suffer from data quality issues. The Shapes Constraint Language (SHACL) is the W3C recommendation language for defining integrity constraints over RDF data. Corman et al. [1] showed that the validation of an RDF data source against an arbitrary SHACL shape schema is NP-hard. The goal of this thesis is to define efficient methods to validate SHACL shape schemas over RDF data sources accessible via SPARQL, the W3C query language for RDF data. The implementation part of the thesis will be based on an already existing prototype for simple constraints.
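
    For readers unfamiliar with SHACL, the following sketch (using rdflib and the pySHACL library, not the thesis prototype) validates a small RDF graph against a shape that requires every ex:Person to have exactly one ex:name; the data and shapes are made up.

        # Sketch: validate a small RDF graph with pySHACL. Data and shapes are made up.
        from rdflib import Graph
        from pyshacl import validate

        data = Graph().parse(data="""
            @prefix ex: <http://example.org/> .
            ex:Alice a ex:Person ; ex:name "Alice" .
            ex:Bob   a ex:Person .
        """, format="turtle")

        shapes = Graph().parse(data="""
            @prefix sh: <http://www.w3.org/ns/shacl#> .
            @prefix ex: <http://example.org/> .
            ex:PersonShape a sh:NodeShape ;
                sh:targetClass ex:Person ;
                sh:property [ sh:path ex:name ; sh:minCount 1 ; sh:maxCount 1 ] .
        """, format="turtle")

        conforms, report_graph, report_text = validate(data, shacl_graph=shapes)
        print(conforms)      # False: ex:Bob has no ex:name and violates the shape
        print(report_text)   # human-readable validation report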

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses

    • Grundlagen der Datenbanksysteme (Introduction to Database Systems)
    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)

     

    Topics

    • Big Data

    • Knowledge Graphs

    • Quality Assessment

     

    Literature

    [1] J. Corman, J.L. Reutter, O. Savković: Semantics and Validation of Recursive SHACL. 2018. 

    [2] J. Corman, F. Florenzano, J.L. Reutter, O. Savković: Validating SHACL Constraints over a SPARQL Endpoint. 2019. 

    [3] M. Figuera, P.D. Rohde, M.-E. Vidal: Trav-SHACL: Efficiently Validating Networks of SHACL Constraints. 2021. 

  • Extending SPARQL with SHACL-validation-based Filters

    Extending SPARQL with filters based on SHACL validation results.

     

     

    The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. The Shapes Constraint Language (SHACL) [2] is the W3C recommendation language for defining integrity constraints over RDF data. In SHACL, constraints are expressed as a network of shapes, called a SHACL shape schema. A shape represents integrity constraints over the properties of a class or set of entities. However, in contrast to relational databases, those integrity constraints are not checked during data insertion. The evaluation of a SHACL shape schema reports the entities that do not satisfy the imposed constraints. Trav-SHACL [3] is an engine capable of validating SHACL shape schemas against knowledge graphs accessible via SPARQL endpoints. SPARQL [4] is the W3C recommended language to query RDF data. Recently, Rohde [5] proposed annotating query results with the validation results for more transparency.
    The goal of this thesis is to define new SPARQL filters that are capable of filtering query results given a shape and desired validation result. The implementation part of the thesis will be based on an already existing prototype for annotating the query results with the validation result.
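
    The sketch below illustrates the underlying idea rather than the SPARQL syntax to be designed: query answers are filtered by the validation result of their focus node. It reuses rdflib and pySHACL; data, shapes, and the query are hypothetical.

        # Sketch: keep only query answers whose focus node passed the SHACL validation.
        # Data, shapes, and the query are hypothetical.
        from rdflib import Graph, Namespace
        from pyshacl import validate

        SH = Namespace("http://www.w3.org/ns/shacl#")

        data = Graph().parse(data="""
            @prefix ex: <http://example.org/> .
            ex:Alice a ex:Person ; ex:name "Alice" .
            ex:Bob   a ex:Person .
        """, format="turtle")

        shapes = Graph().parse(data="""
            @prefix sh: <http://www.w3.org/ns/shacl#> .
            @prefix ex: <http://example.org/> .
            ex:PersonShape a sh:NodeShape ;
                sh:targetClass ex:Person ;
                sh:property [ sh:path ex:name ; sh:minCount 1 ] .
        """, format="turtle")

        _, report, _ = validate(data, shacl_graph=shapes)
        invalid = set(report.objects(None, SH.focusNode))  # entities violating a shape

        # "Filter" the answers of a query by the validation result of ?person
        for row in data.query("SELECT ?person WHERE { ?person a <http://example.org/Person> }"):
            if row.person not in invalid:
                print(row.person)   # only ex:Alice remains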

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses / Skills

    • Grundlagen der Datenbanksysteme (Introduction to Database Systems)
    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)

    Topics

    • Big Data

    • Knowledge Graphs

    • Query Processing

    • Quality Assessment

     

    Literature

    [1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

    [2] https://www.w3.org/TR/2017/REC-shacl-20170720/

    [3] M. Figuera, P.D. Rohde, M.-E. Vidal: Trav-SHACL: Efficiently Validating Networks of SHACL Constraints. 2021. 

    [4] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/

    [5] P.D. Rohde: SHACL Constraint Validation during SPARQL Query Processing. 2021. 

  • On-the-fly Semantification for Querying Heterogeneous Sources with SPARQL

    Extending a SPARQL query engine to other data formats using the SDM-RDFizer as a wrapper.

     

     

    The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. SPARQL [2] is the W3C recommended language to query RDF data. However, data on the Web are still available in many different formats. The SDM-RDFizer [3] is a tool that, using mappings specified in the RDF Mapping Language (RML) [4], is able to semantify data in various formats.
    The goal of this thesis is to define an efficient approach to use on-the-fly semantification for non-RDF sources to answer SPARQL queries. The implementation part of the thesis will be based on an already existing SPARQL query engine which will be extended to collect data from non-RDF sources using the SDM-RDFizer as a wrapper.
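
    A rough sketch of the wrapper idea is given below (standard library plus rdflib; it does not use the SDM-RDFizer API): a non-RDF source, here CSV text, is semantified on the fly with a hand-rolled mapping, and the SPARQL query is then evaluated over the resulting triples. All IRIs and column names are hypothetical.

        # Sketch: on-the-fly semantification of a CSV source followed by SPARQL
        # evaluation with rdflib. Hand-rolled mapping; not the SDM-RDFizer API.
        import csv, io
        from rdflib import Graph, Literal, Namespace

        EX = Namespace("http://example.org/")

        CSV_SOURCE = "id,name,country\n1,Alice,Germany\n2,Bob,France\n"

        def semantify(csv_text):
            g = Graph()
            for row in csv.DictReader(io.StringIO(csv_text)):
                person = EX["person/" + row["id"]]
                g.add((person, EX.name, Literal(row["name"])))
                g.add((person, EX.country, Literal(row["country"])))
            return g

        def answer(sparql_query, csv_text):
            # semantify the source only when a query arrives, then evaluate it
            return semantify(csv_text).query(sparql_query)

        for row in answer("SELECT ?p ?name WHERE { ?p <http://example.org/name> ?name }",
                          CSV_SOURCE):
            print(row.p, row.name)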

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses

    • Grundlagen der Datenbanksysteme (Introduction to Database Systems)
    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)
    • Scientific Data Management and Knowledge Graphs

     

    Topics

    • Big Data
    • Knowledge Graphs
    • Query Processing
    • Data Integration

     

    Literature

    [1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

    [2] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/

    [3] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana, M.-E. Vidal: SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs. 2020. URL: https://doi.org/10.1145/3340531.3412881

    [4] A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, R. Van de Walle: RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. 2014. URL: https://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf

  • Translating SPARQL Queries to Native Query Languages of Various DB Models Supporting Virtual Knowledge Graph Creation

    Translating SPARQL queries into the native query languages of various database models to support virtual knowledge graph creation.

     

     

    The Resource Description Framework (RDF) is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. The recommended language to query RDF data is SPARQL. Even though the number of publicly available knowledge graphs is increasing, many data sources are still available in classical formats like relational databases. In some cases it is not possible to transform the data models into one common format and integrate them all in one place. This thesis aims at virtual data integration by transforming the queries during query processing.
    The goal of this thesis is to support virtual knowledge graph creation by transforming SPARQL queries into query languages that are natively supported by various database models. The new approach will be integrated into an existing query engine. The work also includes analyzing the state-of-the-art translators as well as comparing their performance with the proposed approach.
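
    As a purely illustrative, hand-rolled example of such a translation, the sketch below turns one SPARQL basic graph pattern into SQL using a very simple mapping that assigns a table and columns to a class and its predicates. A real translator must cover joins, filters, optional patterns, and further database models; all names are hypothetical.

        # Sketch: translate one SPARQL basic graph pattern into SQL with a very
        # simple class/predicate-to-table/column mapping. All names are hypothetical.
        MAPPING = {
            "http://example.org/Person": {
                "table": "person",
                "subject_column": "id",
                "predicates": {
                    "http://example.org/name": "name",
                    "http://example.org/country": "country",
                },
            }
        }

        def bgp_to_sql(class_iri, predicate_iris):
            m = MAPPING[class_iri]
            columns = [m["subject_column"]] + [m["predicates"][p] for p in predicate_iris]
            return "SELECT %s FROM %s" % (", ".join(columns), m["table"])

        # SPARQL:  SELECT ?p ?name WHERE { ?p a ex:Person ; ex:name ?name }
        print(bgp_to_sql("http://example.org/Person", ["http://example.org/name"]))
        # -> SELECT id, name FROM person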

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses / Skills

    • Grundlagen der Datenbanksysteme (Introduction to Database Systems)
    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)

     

    Topics

    • Big Data

    • Knowledge Graphs

    • Query Processing

    • Data Integration

  • Privacy-aware Query Processing

    Integrating privacy mechanisms into query processing.

     

     

    The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. SPARQL [2] is the W3C recommended language to query RDF data. When a query is executed over a federation of knowledge graphs, the individual data sources may impose privacy and access policies that restrict which data may be retrieved and by whom; query processing has to respect these policies [3].
    The goal of this thesis is to integrate privacy mechanisms into query processing, i.e., source selection, query decomposition, and query execution, such that queries are answered as completely as possible without violating the privacy policies of the data providers.
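
    The sketch below illustrates privacy-aware source selection in its simplest form (purely illustrative, not the approach of [3]): a triple pattern is only routed to endpoints whose access policy permits its predicate for the requesting user. Endpoints, policies, and predicates are hypothetical.

        # Sketch: privacy-aware source selection. A predicate is only routed to an
        # endpoint whose policy allows it. Endpoints and policies are hypothetical.
        POLICIES = {
            "https://endpoint-a.example.org/sparql": {
                "public": True, "denied_predicates": set()},
            "https://endpoint-b.example.org/sparql": {
                "public": False, "denied_predicates": {"http://example.org/diagnosis"}},
        }

        def allowed_sources(predicate_iri, user_is_internal=False):
            sources = []
            for endpoint, policy in POLICIES.items():
                if not policy["public"] and not user_is_internal:
                    continue  # the whole endpoint is off-limits for external users
                if predicate_iri in policy["denied_predicates"]:
                    continue  # this predicate must not be exposed by this endpoint
                sources.append(endpoint)
            return sources

        print(allowed_sources("http://example.org/diagnosis", user_is_internal=True))
        # -> only endpoint-a remains as a legal source for this triple pattern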

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses / Skills

    • Grundlagen der Datenbanksysteme (Introduction to Database Systems)
    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)

     

    Topics

    • Big Data

    • Knowledge Graphs

    • Query Processing

    • Privacy

    Literature

    [1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

    [2] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/

    [3] K.M. Endris, Z. Almhithawi, I. Lytra, M.-E. Vidal, S. Auer: BOUNCER: Privacy-Aware Query Processing over Federations of RDF Datasets. 2018. URL: https://doi.org/10.1007/978-3-319-98809-2_5

  • Representing SPARQL Query Plans in RDF

    Representing query plans of SPARQL queries in RDF.

     

    The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. SPARQL [2] is the W3C recommended language to query RDF data. Query engines implement different methods for source selection, query decomposition, physical operators, etc. Hence, they produce different query plans.
    The goal of this thesis is to represent SPARQL query plans in RDF. This might be achieved by reusing parts of other vocabularies like SPIN [3].
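
    As a first impression of what such a representation could look like (using rdflib and a made-up vocabulary, not SPIN [3]), the sketch below describes a single hash join over two triple patterns in RDF, annotating each pattern with the endpoint it is evaluated against.

        # Sketch: describe a query plan (one hash join over two triple patterns) in RDF
        # using a made-up vocabulary. Not SPIN; purely illustrative.
        from rdflib import BNode, Graph, Literal, Namespace, RDF

        PLAN = Namespace("http://example.org/plan#")

        g = Graph()
        join, left, right = BNode(), BNode(), BNode()

        g.add((join, RDF.type, PLAN.HashJoin))
        g.add((join, PLAN.leftOperand, left))
        g.add((join, PLAN.rightOperand, right))

        g.add((left, RDF.type, PLAN.TriplePattern))
        g.add((left, PLAN.pattern, Literal("?film dbo:director ?director")))
        g.add((left, PLAN.source, Literal("https://dbpedia.org/sparql")))

        g.add((right, RDF.type, PLAN.TriplePattern))
        g.add((right, PLAN.pattern, Literal("?director dbo:birthPlace ?place")))
        g.add((right, PLAN.source, Literal("https://dbpedia.org/sparql")))

        print(g.serialize(format="turtle"))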

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses / Skills

    • Grundlagen der Datenbanksysteme (Introduction to Database Systems)
    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)

     

    Topics

    • Knowledge Graphs

    • Query Processing

    Literature

    [1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

    [2] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/

    [3] https://www.w3.org/Submission/2011/SUBM-spin-sparql-20110222/