OPEN THESES

Theses (Bachelor/Master):

This section contains current topics and information on studies and thesis work in the field of Scientific Data Management:

  • Query Processing Guided by Mined Rules

    Making use of mined Horn rules in query processing, i.e., query planning and execution.

     

    The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. SPARQL [2] is the W3C recommended language to query RDF data. In this context, rule mining is the process of discovering patterns between entities in a knowledge graph. The mined Horn rules can be used during query processing.
    The goal of this thesis is to define an algorithm that considers mined rules (and their metrics) during query decomposition, query optimization, and query execution.
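
    As an illustration of the general idea (not the algorithm to be developed in this thesis), the following Python sketch applies a single mined rule of the form body(x, y) => head(x, y) during query rewriting: a triple pattern that uses the head predicate is expanded into a UNION with the body predicate whenever the rule's confidence is high enough. The rule, its predicates, and its confidence value are made-up examples.

        # Sketch: apply a mined Horn rule  body(x, y) => head(x, y)  during query
        # rewriting. Example rule and predicates are hypothetical.
        RULE = {
            "body": "<http://example.org/spouse>",
            "head": "<http://example.org/partner>",
            "confidence": 0.93,  # metric reported by the rule miner
        }

        def rewrite_triple_pattern(subject, predicate, obj, rule, min_confidence=0.9):
            """Expand a triple pattern with the rule body if the rule is reliable enough."""
            if predicate != rule["head"] or rule["confidence"] < min_confidence:
                return f"{subject} {predicate} {obj} ."
            return ("{ " + f"{subject} {predicate} {obj} ." + " } UNION { "
                    + f"{subject} {rule['body']} {obj} ." + " }")

        print("SELECT ?x ?y WHERE { "
              + rewrite_triple_pattern("?x", "<http://example.org/partner>", "?y", RULE)
              + " }")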

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses

    • Grundlagen der Datenbanksysteme (Introduction to Database Systems)
    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)
    • Scientific Data Management and Knowledge Graphs

     

    Topics

    • Big Data
    • Knowledge Graphs
    • Query Processing
    • Rule Mining

     

    Literature

    [1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

    [2] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/

  • Efficient Computation of Detailed Source Descriptions for Knowledge Graphs

    Efficient computation of semantic source descriptions for federations of knowledge graphs.

     

     

    When querying a system that consists of several knowledge graphs, the system needs to decide which parts of the query can be answered by which knowledge graph. Most systems use only simple source descriptions; however, more detailed source descriptions enable the system to find better plans.
    The goal of this thesis is to provide a formal definition as well as an implementation for an algorithm that efficiently collects detailed source descriptions for knowledge graphs.
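
    The sketch below illustrates one possible notion of a "detailed" source description, namely the predicates that co-occur with each class in a knowledge graph. It assumes the SPARQLWrapper library and a publicly reachable SPARQL endpoint; the thesis is expected to define richer descriptions and a more efficient collection strategy.

        # Sketch: a "detailed" source description as the set of predicates used by
        # the instances of each class. Assumes SPARQLWrapper and a public endpoint.
        from SPARQLWrapper import SPARQLWrapper, JSON

        def class_predicate_description(endpoint_url, limit=1000):
            endpoint = SPARQLWrapper(endpoint_url)
            endpoint.setQuery("""
                SELECT DISTINCT ?class ?predicate WHERE {
                    ?instance a ?class ;
                              ?predicate ?o .
                } LIMIT %d""" % limit)
            endpoint.setReturnFormat(JSON)
            description = {}
            for row in endpoint.query().convert()["results"]["bindings"]:
                description.setdefault(row["class"]["value"], set()).add(row["predicate"]["value"])
            return description  # {class IRI: set of predicate IRIs}

        print(class_predicate_description("https://dbpedia.org/sparql", limit=100))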

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses

    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)

     

    Topics

    • Big Data

    • Knowledge Graphs

    • Query Processing

  • Efficient Generation of Knowledge Graphs using RML-star with JSON and XML

    Efficiently generating RDF-star data from JSON and XML using RML-star. 

     

     

    In recent years, the amount of data generated has increased exponentially, and knowledge graphs have gained attention as data structures to integrate data and knowledge harvested from myriad data sources. Thus, there is a need for knowledge graph creation engines capable of handling data complexities like large volume, high duplicate rates, and heterogeneity. The SDM-RDFizer is a knowledge graph creation engine that follows the standard established by the RDF Mapping Language (RML). RML is a mapping language that expresses customized mapping rules from heterogeneous data structures and serializations to the RDF data model. RML-star is an extension of RML that uses the RDF-star data model. This thesis aims to extend the SDM-RDFizer so that it can execute RML-star mappings over JSON or XML data sources to create knowledge graphs.
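
    As a rough illustration of the target transformation (standard library only, not the SDM-RDFizer API), the following sketch turns records from a JSON source into RDF-star statements in which a quoted triple is annotated with its provenance. All IRIs and field names are hypothetical.

        # Sketch: mapping JSON records to RDF-star (printed as Turtle-star text).
        # Hypothetical IRIs and field names; not the SDM-RDFizer API.
        import json

        records = json.loads("""
        [
          {"drug": "Ibuprofen", "interactsWith": "Aspirin", "source": "DrugBank"},
          {"drug": "Warfarin",  "interactsWith": "Aspirin", "source": "KEGG"}
        ]
        """)

        EX = "http://example.org/"

        for r in records:
            quoted = f"<< <{EX}{r['drug']}> <{EX}interactsWith> <{EX}{r['interactsWith']}> >>"
            # Turtle-star: annotate the quoted triple with the source it was harvested from
            print(f"{quoted} <{EX}statedIn> \"{r['source']}\" .")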

     

    Theses

    • Thesis 1: JSON
    • Thesis 2: XML

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)
    • Knowledge in Mapping Languages

     

    Useful Courses

    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)
    • Scientific Data Management and Knowledge Graphs

     

    Topics

    • Big Data
    • Knowledge Graph Creation
    • Mapping Languages

     

    Literature

    [1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

    [2] https://w3c.github.io/rdf-star/cg-spec/editors_draft.html

    [3] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana, M.-E. Vidal: SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs. 2020. URL: https://doi.org/10.1145/3340531.3412881

    [4] E. Iglesias, S. Jozashoori, M.-E. Vidal: Scaling Up Knowledge Graph Creation to Large and Heterogeneous Data Sources. 2022. URL: https://doi.org/10.48550/arXiv.2201.09694

    [5] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana, M.-E. Vidal: Empowering the SDM-RDFizer Tool for Scaling Up to Complex Knowledge Graph Creation Pipelines (Under Review). 2023. URL: https://www.semantic-web-journal.net/system/files/swj3246.pdf

    [6] A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, R. Van de Walle: RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. 2014. URL: https://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf

     

  • Efficient Mining of Horn Rules from Knowledge Graphs

    Efficiently mining Horn rules from knowledge graphs using SPARQL queries.

     

     

    Rule mining is the process of discovering interesting patterns or relationships between variables. Mining rules on top of knowledge graphs means discovering patterns between the entities present in them. The goal of this thesis is to provide a formal definition as well as an implementation of an algorithm that is able to mine Horn rules from knowledge graphs. This algorithm can then be enhanced to mine rules from multiple knowledge graphs.
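
    A minimal sketch of the rule metrics involved is shown below: support and confidence of one candidate Horn rule body(x, y) => head(x, y) are estimated with SPARQL COUNT queries. It assumes the SPARQLWrapper library and a reachable endpoint; a real miner would additionally enumerate and prune candidate rules systematically.

        # Sketch: support and confidence of one candidate rule  body(x, y) => head(x, y)
        # via SPARQL COUNT queries. Assumes SPARQLWrapper; predicates are placeholders.
        from SPARQLWrapper import SPARQLWrapper, JSON

        def count(endpoint, pattern):
            endpoint.setQuery("SELECT (COUNT(*) AS ?n) WHERE { %s }" % pattern)
            endpoint.setReturnFormat(JSON)
            return int(endpoint.query().convert()["results"]["bindings"][0]["n"]["value"])

        def rule_metrics(endpoint_url, body_pred, head_pred):
            endpoint = SPARQLWrapper(endpoint_url)
            support = count(endpoint, f"?x <{body_pred}> ?y . ?x <{head_pred}> ?y .")
            body_size = count(endpoint, f"?x <{body_pred}> ?y .")
            return support, (support / body_size if body_size else 0.0)

        # Example call with hypothetical predicates:
        # rule_metrics("https://dbpedia.org/sparql",
        #              "http://dbpedia.org/ontology/spouse",
        #              "http://dbpedia.org/ontology/partner")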

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses

    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)
    • Scientific Data Management and Knowledge Graphs

     

    Topics

    • Big Data
    • Knowledge Graphs
    • Rule Mining
  • Efficient Query Processing by Discovering Synonym Predicates in Knowledge Graphs

    Complete query answers by discovering synonymous predicates.

     

     

    Knowledge graphs often contain duplicated data and metadata that have the same meaning but are defined differently. As a result, synonymous predicates may connect the same resource to different entities, which causes query engines to retrieve incomplete answers. This thesis aims to enhance query processing by discovering synonymous predicates in order to retrieve complete answers.
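
    The sketch below illustrates one simple heuristic for discovering synonym candidates (using rdflib on a tiny in-memory graph): predicate pairs are ranked by the overlap of the (subject, object) pairs they connect. A query engine could then expand a query with high-scoring synonyms to retrieve more complete answers. The data and IRIs are made up.

        # Sketch: rank predicate pairs as synonym candidates by the Jaccard overlap
        # of the (subject, object) pairs they connect. Data and IRIs are made up.
        from itertools import combinations
        from rdflib import Graph

        g = Graph()
        g.parse(data="""
            @prefix ex: <http://example.org/> .
            ex:Alice ex:worksFor   ex:AcmeCorp .
            ex:Alice ex:employedBy ex:AcmeCorp .
            ex:Bob   ex:worksFor   ex:AcmeCorp .
            ex:Bob   ex:employedBy ex:AcmeCorp .
            ex:Carol ex:worksFor   ex:Initech .
        """, format="turtle")

        pairs = {}
        for s, p, o in g:
            pairs.setdefault(p, set()).add((s, o))

        for p1, p2 in combinations(pairs, 2):
            a, b = pairs[p1], pairs[p2]
            print(f"{p1}  ~  {p2}  score={len(a & b) / len(a | b):.2f}")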

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses

    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)
    • Scientific Data Management and Knowledge Graphs

     

    Topics

    • Big Data
    • Knowledge Graphs
    • Query Processing
  • Efficient Validation of RDF Data using SHACL

    Efficiently validating integrity constraints over RDF data using SHACL and SPARQL.

     

     

    The Resource Description Framework (RDF) is the W3C standard for publishing and exchanging data on the Web. Many data sources suffer from data quality issues. The Shapes Constraint Language (SHACL) is the W3C recommendation language for defining integrity constraints over RDF data. Corman et al. [1] showed that the validation of an RDF data source against an arbitrary SHACL shape schema is NP-hard. The goal of this thesis is to define efficient methods to validate SHACL shape schemas over RDF data sources accessible via SPARQL, the W3C query language for RDF data. The implementation part of the thesis will be based on an already existing prototype for simple constraints.
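
    For readers unfamiliar with SHACL, the following sketch (using rdflib and the pySHACL library, not the thesis prototype) validates a small RDF graph against a shape that requires every ex:Person to have exactly one ex:name; the data and shapes are made up.

        # Sketch: validate a small RDF graph with pySHACL. Data and shapes are made up.
        from rdflib import Graph
        from pyshacl import validate

        data = Graph().parse(data="""
            @prefix ex: <http://example.org/> .
            ex:Alice a ex:Person ; ex:name "Alice" .
            ex:Bob   a ex:Person .
        """, format="turtle")

        shapes = Graph().parse(data="""
            @prefix sh: <http://www.w3.org/ns/shacl#> .
            @prefix ex: <http://example.org/> .
            ex:PersonShape a sh:NodeShape ;
                sh:targetClass ex:Person ;
                sh:property [ sh:path ex:name ; sh:minCount 1 ; sh:maxCount 1 ] .
        """, format="turtle")

        conforms, report_graph, report_text = validate(data, shacl_graph=shapes)
        print(conforms)      # False: ex:Bob has no ex:name and violates the shape
        print(report_text)   # human-readable validation report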

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses

    • Grundlagen der Datenbanksysteme (Introduction to Database Systems)
    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)

     

    Topics

    • Big Data

    • Knowledge Graphs

    • Quality Assessment

     

    Literature

    [1] J. Corman, J.L. Reutter, O. Savković: Semantics and Validation of Recursive SHACL. 2018. 

    [2] J. Corman, F. Florenzano, J.L. Reutter, O. Savković: Validating SHACL Constraints over a SPARQL Endpoint. 2019. 

    [3] M. Figuera, P.D. Rohde, M.-E. Vidal: Trav-SHACL: Efficiently Validating Networks of SHACL Constraints. 2021. 

  • Extending SPARQL with SHACL-validation-based Filters

    Extending SPARQL with filters based on SHACL validation results.

     

     

    The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. The Shapes Constraint Language (SHACL) [2] is the W3C recommendation language for defining integrity constraints over RDF data. In SHACL, constraints are expressed as a network of shapes, called a SHACL shape schema. A shape represents integrity constraints over the properties of a class or set of entities. However, in contrast to relational databases, those integrity constraints are not checked during data insertion. The evaluation of a SHACL shape schema reports the entities that do not satisfy the imposed constraints. Trav-SHACL [3] is an engine capable of validating SHACL shape schemas against knowledge graphs accessible via SPARQL endpoints. SPARQL [4] is the W3C recommended language to query RDF data. Recently, Rohde [5] proposed annotating query results with the validation results for more transparency.
    The goal of this thesis is to define new SPARQL filters that are capable of filtering query results given a shape and desired validation result. The implementation part of the thesis will be based on an already existing prototype for annotating the query results with the validation result.
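
    The sketch below illustrates the underlying idea rather than the SPARQL syntax to be designed: query answers are filtered by the validation result of their focus node. It reuses rdflib and pySHACL; data, shapes, and the query are hypothetical.

        # Sketch: keep only query answers whose focus node passed the SHACL validation.
        # Data, shapes, and the query are hypothetical.
        from rdflib import Graph, Namespace
        from pyshacl import validate

        SH = Namespace("http://www.w3.org/ns/shacl#")

        data = Graph().parse(data="""
            @prefix ex: <http://example.org/> .
            ex:Alice a ex:Person ; ex:name "Alice" .
            ex:Bob   a ex:Person .
        """, format="turtle")

        shapes = Graph().parse(data="""
            @prefix sh: <http://www.w3.org/ns/shacl#> .
            @prefix ex: <http://example.org/> .
            ex:PersonShape a sh:NodeShape ;
                sh:targetClass ex:Person ;
                sh:property [ sh:path ex:name ; sh:minCount 1 ] .
        """, format="turtle")

        _, report, _ = validate(data, shacl_graph=shapes)
        invalid = set(report.objects(None, SH.focusNode))  # entities violating a shape

        # "Filter" the answers of a query by the validation result of ?person
        for row in data.query("SELECT ?person WHERE { ?person a <http://example.org/Person> }"):
            if row.person not in invalid:
                print(row.person)   # only ex:Alice remains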

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses / Skills

    • Grundlagen der Datenbanksysteme (Introduction to Database Systems)
    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)

    Topics

    • Big Data

    • Knowledge Graphs

    • Query Processing

    • Quality Assessment

     

    Literature

    [1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

    [2] https://www.w3.org/TR/2017/REC-shacl-20170720/

    [3] M. Figuera, P.D. Rohde, M.-E. Vidal: Trav-SHACL: Efficiently Validating Networks of SHACL Constraints. 2021. 

    [4] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/

    [5] P.D. Rohde: SHACL Constraint Validation during SPARQL Query Processing. 2021. 

  • On-the-fly Semantification for Querying Heterogeneous Sources with SPARQL

    Extending a SPARQL query engine to other data formats using the SDM-RDFizer as a wrapper.

     

     

    The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. SPARQL [2] is the W3C recommended language to query RDF data. However, data on the Web are still available in many different formats. The SDM-RDFizer [3] is a tool that, using mappings specified in the RDF Mapping Language (RML) [4], is able to semantify data in various formats.
    The goal of this thesis is to define an efficient approach to use on-the-fly semantification for non-RDF sources to answer SPARQL queries. The implementation part of the thesis will be based on an already existing SPARQL query engine which will be extended to collect data from non-RDF sources using the SDM-RDFizer as a wrapper.
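
    A rough sketch of the wrapper idea is given below (standard library plus rdflib; it does not use the SDM-RDFizer API): a non-RDF source, here CSV text, is semantified on the fly with a hand-rolled mapping, and the SPARQL query is then evaluated over the resulting triples. All IRIs and column names are hypothetical.

        # Sketch: on-the-fly semantification of a CSV source followed by SPARQL
        # evaluation with rdflib. Hand-rolled mapping; not the SDM-RDFizer API.
        import csv, io
        from rdflib import Graph, Literal, Namespace

        EX = Namespace("http://example.org/")

        CSV_SOURCE = "id,name,country\n1,Alice,Germany\n2,Bob,France\n"

        def semantify(csv_text):
            g = Graph()
            for row in csv.DictReader(io.StringIO(csv_text)):
                person = EX["person/" + row["id"]]
                g.add((person, EX.name, Literal(row["name"])))
                g.add((person, EX.country, Literal(row["country"])))
            return g

        def answer(sparql_query, csv_text):
            # semantify the source only when a query arrives, then evaluate it
            return semantify(csv_text).query(sparql_query)

        for row in answer("SELECT ?p ?name WHERE { ?p <http://example.org/name> ?name }",
                          CSV_SOURCE):
            print(row.p, row.name)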

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses

    • Grundlagen der Datenbanksysteme (Introduction to Database Systems)
    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)
    • Scientific Data Management and Knowledge Graphs

     

    Topics

    • Big Data
    • Knowledge Graphs
    • Query Processing
    • Data Integration

     

    Literature

    [1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

    [2] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/

    [3] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana, M.-E. Vidal: SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs. 2020. URL: https://doi.org/10.1145/3340531.3412881

    [4] A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, R. Van de Walle: RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. 2014. URL: https://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf

  • Translating SPARQL Queries to Native Query Languages of Various DB Models Supporting Virtual Knowledge Graph Creation

    Translating SPARQL queries into the native query languages of various database models to support virtual knowledge graph creation.

     

     

    The Resource Description Framework (RDF) is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. The recommended language to query RDF data is SPARQL. Even though the number of publicly available knowledge graphs is increasing, many data sources are still available in classical formats like relational databases. In some cases it is not possible to transform the data models into one common format and integrate them all in one place. This thesis aims at virtual data integration by transforming the queries during query processing.
    The goal of this thesis is to support virtual knowledge graph creation by transforming SPARQL queries into query languages that are natively supported by various database models. The new approach will be integrated into an existing query engine. The work also includes analyzing the state-of-the-art translators as well as comparing their performance with the proposed approach.
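
    As a purely illustrative, hand-rolled example of such a translation, the sketch below turns one SPARQL basic graph pattern into SQL using a very simple mapping that assigns a table and columns to a class and its predicates. A real translator must cover joins, filters, optional patterns, and further database models; all names are hypothetical.

        # Sketch: translate one SPARQL basic graph pattern into SQL with a very
        # simple class/predicate-to-table/column mapping. All names are hypothetical.
        MAPPING = {
            "http://example.org/Person": {
                "table": "person",
                "subject_column": "id",
                "predicates": {
                    "http://example.org/name": "name",
                    "http://example.org/country": "country",
                },
            }
        }

        def bgp_to_sql(class_iri, predicate_iris):
            m = MAPPING[class_iri]
            columns = [m["subject_column"]] + [m["predicates"][p] for p in predicate_iris]
            return "SELECT %s FROM %s" % (", ".join(columns), m["table"])

        # SPARQL:  SELECT ?p ?name WHERE { ?p a ex:Person ; ex:name ?name }
        print(bgp_to_sql("http://example.org/Person", ["http://example.org/name"]))
        # -> SELECT id, name FROM person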

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses / Skills

    • Grundlagen der Datenbanksysteme (Introduction to Database Systems)
    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)

     

    Topics

    • Big Data

    • Knowledge Graphs

    • Query Processing

    • Data Integration

  • Privacy-aware Query Processing

    Integrating privacy mechanisms into query processing.

     

     

    The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. SPARQL [2] is the W3C recommended language to query RDF data. When a query is executed over a federation of knowledge graphs, the individual data sources may impose privacy and access policies that restrict which data may be retrieved and by whom; query processing has to respect these policies [3].
    The goal of this thesis is to integrate privacy mechanisms into query processing, i.e., source selection, query decomposition, and query execution, such that queries are answered as completely as possible without violating the privacy policies of the data providers.
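
    The sketch below illustrates privacy-aware source selection in its simplest form (purely illustrative, not the approach of [3]): a triple pattern is only routed to endpoints whose access policy permits its predicate for the requesting user. Endpoints, policies, and predicates are hypothetical.

        # Sketch: privacy-aware source selection. A predicate is only routed to an
        # endpoint whose policy allows it. Endpoints and policies are hypothetical.
        POLICIES = {
            "https://endpoint-a.example.org/sparql": {
                "public": True, "denied_predicates": set()},
            "https://endpoint-b.example.org/sparql": {
                "public": False, "denied_predicates": {"http://example.org/diagnosis"}},
        }

        def allowed_sources(predicate_iri, user_is_internal=False):
            sources = []
            for endpoint, policy in POLICIES.items():
                if not policy["public"] and not user_is_internal:
                    continue  # the whole endpoint is off-limits for external users
                if predicate_iri in policy["denied_predicates"]:
                    continue  # this predicate must not be exposed by this endpoint
                sources.append(endpoint)
            return sources

        print(allowed_sources("http://example.org/diagnosis", user_is_internal=True))
        # -> only endpoint-a remains as a legal source for this triple pattern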

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses / Skills

    • Grundlagen der Datenbanksysteme (Introduction to Database Systems)
    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)
    • Komplexität von Algorithmen (Algorithms and Complexity)

     

    Topics

    • Big Data

    • Knowledge Graphs

    • Query Processing

    • Privacy

    Literature

    [1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

    [2] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/

    [3] K.M. Endris, Z. Almhithawi, I. Lytra, M.-E. Vidal, S. Auer: BOUNCER: Privacy-Aware Query Processing over Federations of RDF Datasets. 2018. URL: https://doi.org/10.1007/978-3-319-98809-2_5

  • Representing SPARQL Query Plans in RDF

    Representing query plans of SPARQL queries in RDF.

     

    The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. SPARQL [2] is the W3C recommended language to query RDF data. Query engines implement different methods for source selection, query decomposition, physical operators, etc. Hence, they produce different query plans.
    The goal of this thesis is to represent SPARQL query plans in RDF. This might be achieved by reusing parts of other vocabularies like SPIN [3].
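
    As a first impression of what such a representation could look like (using rdflib and a made-up vocabulary, not SPIN [3]), the sketch below describes a single hash join over two triple patterns in RDF, annotating each pattern with the endpoint it is evaluated against.

        # Sketch: describe a query plan (one hash join over two triple patterns) in RDF
        # using a made-up vocabulary. Not SPIN; purely illustrative.
        from rdflib import BNode, Graph, Literal, Namespace, RDF

        PLAN = Namespace("http://example.org/plan#")

        g = Graph()
        join, left, right = BNode(), BNode(), BNode()

        g.add((join, RDF.type, PLAN.HashJoin))
        g.add((join, PLAN.leftOperand, left))
        g.add((join, PLAN.rightOperand, right))

        g.add((left, RDF.type, PLAN.TriplePattern))
        g.add((left, PLAN.pattern, Literal("?film dbo:director ?director")))
        g.add((left, PLAN.source, Literal("https://dbpedia.org/sparql")))

        g.add((right, RDF.type, PLAN.TriplePattern))
        g.add((right, PLAN.pattern, Literal("?director dbo:birthPlace ?place")))
        g.add((right, PLAN.source, Literal("https://dbpedia.org/sparql")))

        print(g.serialize(format="turtle"))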

     

    Requirements

    • Enrollment at a German University
    • Good English skills (written and spoken)
    • Good programming skills (Python)

     

    Useful Courses / Skills

    • Grundlagen der Datenbanksysteme (Introduction to Database Systems)
    • Datenstrukturen und Algorithmen (Data Structures and Algorithms)
    • Knowledge Engineering und Semantic Web (Knowledge Engineering and Semantic Web)

     

    Topics

    • Knowledge Graphs

    • Query Processing

    Literature

    [1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

    [2] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/

    [3] https://www.w3.org/Submission/2011/SUBM-spin-sparql-20110222/