Open Theses

Theses (Bachelor/Master):

This section lists current topics and information on student research projects and theses in the Scientific Data Management research group:

  • Efficient Computation of Detailed Source Descriptions for Knowledge Graphs

    Computation of semantic source descriptions for federations of knowledge graphs in an efficient way.

     

    When querying a system that consists of several knowledge graphs, the system needs to decide which parts of the query can be answered from which knowledge graph. Most systems use simple source descriptions; however, more detailed source descriptions enable the system to find better query plans.
    The goal of this thesis is to provide a formal definition as well as an implementation for an algorithm that efficiently collects detailed source descriptions for knowledge graphs.
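
    To illustrate what a more detailed source description could look like, here is a minimal sketch in Python. It assumes a toy federation held in memory; the source names, the `ex:` identifiers, and the functions `describe_source` and `select_sources` are invented for this example and are not part of any existing system. Each source is described by the classes it contains and the predicates used by instances of each class, and source selection then checks which descriptions mention a predicate.

```python
from collections import defaultdict

# Toy federation: each source is a list of (subject, predicate, object) triples.
# Source names and data are made up for illustration.
FEDERATION = {
    "drugs": [
        ("ex:aspirin", "rdf:type", "ex:Drug"),
        ("ex:aspirin", "ex:treats", "ex:headache"),
    ],
    "diseases": [
        ("ex:headache", "rdf:type", "ex:Disease"),
        ("ex:headache", "rdfs:label", "Headache"),
    ],
}


def describe_source(triples):
    """Map each class to the set of predicates used by its instances."""
    classes = defaultdict(set)  # subject -> classes it belongs to
    for s, p, o in triples:
        if p == "rdf:type":
            classes[s].add(o)
    description = defaultdict(set)
    for s, p, o in triples:
        for cls in classes.get(s, ()):
            description[cls].add(p)
    return dict(description)


def select_sources(descriptions, predicate):
    """Return the sources whose description mentions the predicate."""
    return sorted(
        name
        for name, desc in descriptions.items()
        if any(predicate in preds for preds in desc.values())
    )


descriptions = {name: describe_source(t) for name, t in FEDERATION.items()}
print(select_sources(descriptions, "ex:treats"))
```

    This sketch scans every triple twice, which would not scale to large knowledge graphs; collecting such descriptions efficiently is the actual subject of the thesis.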

     

    Prerequisites

    • Enrollment at a German university

    • Good written and spoken English skills

    • Good programming skills in Python

     

    Helpful courses

    • Datenstrukturen und Algorithmen

    • Knowledge Engineering und Semantic Web

    • Komplexität von Algorithmen

     

    Topics covered

    • Big Data

    • Knowledge Graphs

    • Query Processing

  • Efficient Generation of Knowledge Graphs using RML-star with JSON and XML

    Efficiently generating RDF-star data from JSON and XML using RML-star. 

     

    In recent years, the amount of data generated has increased exponentially, and knowledge graphs have gained attention as data structures to integrate data and knowledge harvested from myriad data sources. Thus, there is a need for knowledge graph creation engines capable of handling data complexities like large volume, high duplicate rates, and heterogeneity. The SDM-RDFizer is a knowledge graph creation engine that follows the standard established by the RDF Mapping Language (RML). RML is a mapping language that expresses customized mapping rules from heterogeneous data structures and serializations to the RDF data model. RML-star is an extension of RML that targets the RDF-star data model. This thesis aims to define an extension of the SDM-RDFizer that allows the tool to transform JSON or XML data sources into knowledge graphs using RML-star mappings.
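
    As a rough illustration of the target output, the following sketch turns a JSON record into one ordinary triple and one RDF-star statement about that triple, written in Turtle-star syntax. It does not parse RML-star mappings and does not use the SDM-RDFizer; the JSON fields, the base IRI, and the function `to_rdf_star` are assumptions made for this example, and literals are not escaped.

```python
import json

# Hypothetical input; the field names are invented for this sketch.
DOC = json.loads('[{"id": "p1", "city": "Hannover", "source": "registry"}]')
BASE = "http://example.org/"


def to_rdf_star(records):
    """Emit a plain triple and an annotation on that triple per record."""
    lines = []
    for rec in records:
        subject = f"<{BASE}person/{rec['id']}>"
        city = rec["city"]
        source = rec["source"]
        inner = f'{subject} <{BASE}livesIn> "{city}"'
        # The asserted triple itself.
        lines.append(f"{inner} .")
        # A quoted triple used as the subject of a second statement.
        lines.append(f'<< {inner} >> <{BASE}statedBy> "{source}" .')
    return lines


for line in to_rdf_star(DOC):
    print(line)
```

    In an RML-star setting, the hard-coded templates above would instead be read from the mapping rules.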

     

    Theses

    • Thesis 1: JSON
    • Thesis 2: XML

     

    Prerequisites

    • Enrollment at a German university
    • Good written and spoken English skills
    • Good programming skills in Python
    • Knowledge of mapping languages

     

    Helpful courses

    • Datenstrukturen und Algorithmen
    • Knowledge Engineering und Semantic Web
    • Komplexität von Algorithmen
    • Scientific Data Management and Knowledge Graphs

     

    Topics covered

    • Big Data
    • Knowledge Graph Creation
    • Mapping Languages

     

    References

    [1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

    [2] https://w3c.github.io/rdf-star/cg-spec/editors_draft.html

    [3] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana, M.-E. Vidal: SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs. 2020. URL: https://doi.org/10.1145/3340531.3412881

    [4] E. Iglesias, S. Jozashoori, M.-E. Vidal: Scaling up knowledge graph creation to large and heterogeneous data sources. 2022. URL: https://doi.org/10.48550/arXiv.2201.09694

    [5] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana, M.-E. Vidal: Empowering the SDM-RDFizer Tool for Scaling Up to Complex Knowledge Graph Creation Pipelines (Under Review). 2023. URL: https://www.semantic-web-journal.net/system/files/swj3246.pdf

    [6] A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, R. Van de Walle: RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. 2014. URL: https://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf

     

  • Efficient Mining of Horn Rules from Knowledge Graphs

    Efficiently mining Horn rules from knowledge graphs using SPARQL queries.

     

    Rule mining is the process of discovering interesting patterns or relationships between variables. Mining rules on top of a knowledge graph means discovering patterns between the entities present in it. The goal of this thesis is to provide a formal definition as well as an implementation for an algorithm that is able to mine rules from knowledge graphs. This algorithm can then be enhanced to mine rules from multiple knowledge graphs.
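
    To make the notion of a mined rule concrete, the sketch below scores one candidate Horn rule of the form body(x, y) ⇒ head(x, y) over a toy set of triples. Support is taken as the number of (x, y) pairs satisfying both body and head, and confidence as support divided by the number of pairs satisfying the body. The data, the predicate names, and the functions `pairs` and `rule_metrics` are assumptions for this example; a real miner would also have to enumerate candidate rules and would typically obtain the counts through SPARQL queries.

```python
# Toy knowledge graph as a set of (subject, predicate, object) triples.
TRIPLES = {
    ("anna", "bornIn", "berlin"),
    ("anna", "livesIn", "berlin"),
    ("ben", "bornIn", "paris"),
    ("ben", "livesIn", "rome"),
    ("cara", "bornIn", "oslo"),
    ("cara", "livesIn", "oslo"),
}


def pairs(triples, predicate):
    """All (subject, object) pairs connected by the predicate."""
    return {(s, o) for s, p, o in triples if p == predicate}


def rule_metrics(triples, body, head):
    """Support and confidence of the rule body(x, y) => head(x, y)."""
    body_pairs = pairs(triples, body)
    support = len(body_pairs & pairs(triples, head))
    confidence = support / len(body_pairs) if body_pairs else 0.0
    return support, confidence


support, confidence = rule_metrics(TRIPLES, "bornIn", "livesIn")
print(support, round(confidence, 2))
```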

     

    Prerequisites

    • Enrollment at a German university
    • Good written and spoken English skills
    • Good programming skills in Python

     

    Helpful courses

    • Datenstrukturen und Algorithmen
    • Knowledge Engineering und Semantic Web
    • Komplexität von Algorithmen
    • Scientific Data Management and Knowledge Graphs

     

    Topics covered

    • Big Data
    • Knowledge Graphs
    • Rule Mining
  • Efficient Query Processing by Discovering Synonym Predicates in Knowledge Graphs

    Complete query answers by discovering synonymous predicates.

     

    Knowledge graphs often contain duplicated data and metadata that have the same meaning but are defined differently. Thus, synonymous predicates can connect the same resource to different entities, which leads query engines to retrieve incomplete answers. This thesis aims to enhance query processing by discovering synonymous predicates in order to retrieve complete answers.
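
    One naive signal for synonymy is how strongly two predicates overlap in the (subject, object) pairs they connect. The sketch below computes the Jaccard similarity of these pair sets for every pair of predicates. The data, the threshold, and the function `synonym_candidates` are assumptions for this example; since synonymous predicates may also link a resource to different entities, overlap alone is unlikely to be sufficient in practice.

```python
from itertools import combinations

# Toy triples; identifiers are made up for illustration.
TRIPLES = [
    ("ex:a", "ex:birthPlace", "ex:berlin"),
    ("ex:a", "ex:placeOfBirth", "ex:berlin"),
    ("ex:b", "ex:birthPlace", "ex:paris"),
    ("ex:b", "ex:placeOfBirth", "ex:paris"),
    ("ex:c", "ex:birthPlace", "ex:oslo"),
    ("ex:a", "ex:worksIn", "ex:rome"),
]


def synonym_candidates(triples, threshold=0.5):
    """Return predicate pairs whose (subject, object) sets overlap strongly."""
    by_pred = {}
    for s, p, o in triples:
        by_pred.setdefault(p, set()).add((s, o))
    result = []
    for p1, p2 in combinations(sorted(by_pred), 2):
        a, b = by_pred[p1], by_pred[p2]
        jaccard = len(a & b) / len(a | b)
        if jaccard >= threshold:
            result.append((p1, p2, jaccard))
    return result


candidates = synonym_candidates(TRIPLES)
print(candidates)
```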

     

    Prerequisites

    • Enrollment at a German university
    • Good written and spoken English skills
    • Good programming skills in Python

     

    Helpful courses

    • Knowledge Engineering und Semantic Web
    • Datenstrukturen und Algorithmen
    • Komplexität von Algorithmen
    • Scientific Data Management and Knowledge Graphs

     

    Topics covered

     

    • Big Data
    • Knowledge Graphs
    • Query Processing
  • Efficient Validation of RDF Data using SHACL

    Efficiently validating integrity constraints over RDF data using SHACL and SPARQL.

     

    The Resource Description Framework (RDF) is the W3C standard for publishing and exchanging data on the Web. Many data sources suffer from data quality issues. The Shapes Constraint Language (SHACL) is the W3C-recommended language for defining integrity constraints over RDF data. Corman et al. [1] showed that the validation of an RDF data source against an arbitrary SHACL shape schema is NP-hard. The goal of this thesis is to define efficient methods to validate SHACL shape schemas over RDF data sources accessible via SPARQL, a query language for RDF data sources. The implementation part of the thesis will be based on an already existing prototype for simple constraints.
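
    For a simple constraint, validation over a SPARQL endpoint can be reduced to a query that retrieves the violating entities. The sketch below builds such a query for a constraint equivalent to `sh:minCount 1` on a single property of a target class. The function `min_count_violations_query` and the example IRIs are assumptions for this illustration and are not taken from the existing prototype; the query is only constructed here, not sent to an endpoint.

```python
def min_count_violations_query(target_class, path):
    """SPARQL query selecting instances of the class that lack the property."""
    return (
        "SELECT DISTINCT ?x WHERE {\n"
        f"  ?x a <{target_class}> .\n"
        f"  FILTER NOT EXISTS {{ ?x <{path}> ?v }}\n"
        "}"
    )


query = min_count_violations_query(
    "http://example.org/Person", "http://example.org/name"
)
print(query)
```

    Shapes that refer to other shapes cannot be handled with one isolated query like this, which is where the complexity discussed above comes from.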

     

    Prerequisites

    • Enrollment at a German university

    • Good written and spoken English skills

    • Good programming skills in Python

     

    Helpful courses

    • Grundlagen der Datenbanksysteme

    • Datenstrukturen und Algorithmen

    • Knowledge Engineering und Semantic Web

    • Komplexität von Algorithmen

     

    Topics covered

    • Big Data

    • Knowledge Graphs

    • Quality Assessment

     

    References

    [1] J. Corman, J.L. Reutter, O. Savković: Semantics and Validation of Recursive SHACL. 2018. 

    [2] J. Corman, F. Florenzano, J.L. Reutter, O. Savković: Validating SHACL Constraints over a SPARQL Endpoint. 2019. 

    [3] M. Figuera, P.D. Rohde, M.-E. Vidal: Trav-SHACL: Efficiently Validating Networks of SHACL Constraints. 2021. 

  • Extending SPARQL with SHACL-validation-based Filters

    Extending SPARQL with Filters based on SHACL validation results. 

     

    The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. The Shapes Constraint Language (SHACL) [2] is the W3C-recommended language for defining integrity constraints over RDF data. In SHACL, constraints are expressed as a network of shapes, called a SHACL shape schema. A shape represents integrity constraints over the properties of a class or set of entities. However, in contrast to relational databases, those integrity constraints are not checked during data insertion. The evaluation of a SHACL shape schema reports the entities that do not satisfy the imposed constraints. Trav-SHACL [3] is an engine capable of validating SHACL shape schemas against knowledge graphs accessible via SPARQL endpoints. SPARQL [4] is the W3C-recommended language to query RDF data. Recently, Rohde [5] proposed annotating the query results with the results from the validation for more transparency.
    The goal of this thesis is to define new SPARQL filters that are capable of filtering query results given a shape and desired validation result. The implementation part of the thesis will be based on an already existing prototype for annotating the query results with the validation result.
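
    The intended behavior of such a filter can be sketched as a post-processing step over annotated query results. In the example below, the validation result for one shape is assumed to be available as a dictionary from entity to a Boolean, and the function `filter_by_validation` keeps only the solution mappings whose entity has the desired result. All names and data are assumptions for this illustration; the thesis would define the filter as part of the query language rather than in client code.

```python
# Assumed validation result for one shape: entity -> conforms to the shape?
VALIDATION = {"ex:alice": True, "ex:bob": False}

# Query results as a list of solution mappings (variable -> value).
BINDINGS = [
    {"person": "ex:alice"},
    {"person": "ex:bob"},
    {"person": "ex:carol"},  # not a target of the shape, so never validated
]


def filter_by_validation(bindings, variable, validation, desired=True):
    """Keep mappings whose entity has the desired validation result."""
    return [b for b in bindings if validation.get(b[variable]) == desired]


valid = filter_by_validation(BINDINGS, "person", VALIDATION, desired=True)
invalid = filter_by_validation(BINDINGS, "person", VALIDATION, desired=False)
print(valid, invalid)
```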

     

    Prerequisites

    • Enrollment at a German university

    • Good written and spoken English skills

    • Good programming skills in Python

     

    Helpful courses

    • Grundlagen der Datenbanksysteme

    • Datenstrukturen und Algorithmen

    • Knowledge Engineering und Semantic Web

    • Komplexität von Algorithmen

     

    Topics covered

    • Big Data

    • Knowledge Graphs

    • Query Processing

    • Quality Assessment

     

    References

    [1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

    [2] https://www.w3.org/TR/2017/REC-shacl-20170720/

    [3] M. Figuera, P.D. Rohde, M.-E. Vidal: Trav-SHACL: Efficiently Validating Networks of SHACL Constraints. 2021. 

    [4] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/

    [5] P.D. Rohde: SHACL Constraint Validation during SPARQL Query Processing. 2021. 

  • On-the-fly Semantification for Querying Heterogeneous Sources with SPARQL

    Extending a SPARQL query engine to other data formats using the SDM-RDFizer as a wrapper.

     

    The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. SPARQL [2] is the W3C recommended language to query RDF data. However, data on the Web are still available in many different formats. The SDM-RDFizer [3] is a tool that is able, with the use of mappings specified in the RDF Mapping Language (RML) [4], to semantify various data formats.
    The goal of this thesis is to define an efficient approach to use on-the-fly semantification for non-RDF sources to answer SPARQL queries. The implementation part of the thesis will be based on an already existing SPARQL query engine which will be extended to collect data from non-RDF sources using the SDM-RDFizer as a wrapper.
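
    The idea of a semantifying wrapper can be sketched with a CSV source: rows are converted to triples lazily, and a triple pattern is evaluated over the resulting stream without materializing the whole graph. The sketch does not call the SDM-RDFizer and does not read RML mappings; the CSV content, the base IRI, and the functions `triples_from_csv` and `match` are assumptions for this example.

```python
import csv
import io

CSV_DATA = "id,name\n1,Alice\n2,Bob\n"  # stand-in for a non-RDF source
BASE = "http://example.org/"


def triples_from_csv(text):
    """Lazily yield triples; stands in for what a mapping would declare."""
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"{BASE}person/{row['id']}"
        yield (subject, f"{BASE}name", row["name"])


def match(pattern, triples):
    """Evaluate one triple pattern; None plays the role of a variable."""
    for triple in triples:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


answers = list(match((None, f"{BASE}name", "Bob"), triples_from_csv(CSV_DATA)))
print(answers)
```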

     

    Prerequisites

    • Enrollment at a German university
    • Good written and spoken English skills
    • Good programming skills in Python

     

    Helpful courses

    • Grundlagen der Datenbanksysteme
    • Datenstrukturen und Algorithmen
    • Knowledge Engineering und Semantic Web
    • Komplexität von Algorithmen
    • Scientific Data Management and Knowledge Graphs

     

    Topics covered

    • Big Data
    • Knowledge Graphs
    • Query Processing
    • Data Integration

     

    References

    [1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

    [2] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/

    [3] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana, M.-E. Vidal: SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs. 2020. URL: https://doi.org/10.1145/3340531.3412881

    [4] A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, R. Van de Walle: RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. 2014. URL: https://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf

  • Query Processing Guided by Mined Rules

    Using mined Horn rules in query processing, i.e., query planning and execution.

     

     

    The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. SPARQL [2] is the W3C-recommended language to query RDF data. In this context, rule mining is the process of discovering patterns between entities in a knowledge graph. The mined Horn rules can be used during query processing.
    The goal of this thesis is to define an algorithm that considers mined rules (and their metrics) during query decomposition, query optimization, and query execution.

     

    Prerequisites

    • Enrollment at a German university
    • Good written and spoken English skills
    • Good programming skills in Python

     

    Helpful courses

    • Grundlagen der Datenbanksysteme
    • Datenstrukturen und Algorithmen
    • Knowledge Engineering und Semantic Web
    • Komplexität von Algorithmen
    • Scientific Data Management and Knowledge Graphs

     

    Topics covered

    • Big Data
    • Knowledge Graphs
    • Query Processing
    • Rule Mining

     

    References

    [1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

    [2] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/

  • Translating SPARQL Queries to Native Query Languages of Various DB Models Supporting Virtual Knowledge Graph Creation

     

    The Resource Description Framework (RDF) is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. The recommended language to query RDF data is SPARQL. Even though the number of publicly available knowledge graphs is increasing, many data sources are still available in classical formats like relational databases. In some cases, it is not possible to transform the data models into one common format and integrate them all in one place. This thesis aims at virtual data integration by translating the queries during query processing.
    The goal of this thesis is to support virtual knowledge graph creation by transforming SPARQL queries into query languages that are natively supported by various database models. The new approach will be integrated into an existing query engine. The work also includes analyzing the state-of-the-art translators as well as comparing their performance with the proposed approach.
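
    For the relational case, the translation can be sketched for a star-shaped pattern, i.e., several triple patterns sharing the same subject variable. The example assumes a hand-written mapping from predicates to table columns and restricts itself to a single table; the mapping, the table, and the function `star_to_sql` are invented for this illustration. The generated SQL is executed on an in-memory SQLite database to show that rows with a missing value do not produce a solution, as would be the case for the corresponding SPARQL pattern.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, subject column, object column).
MAPPING = {
    "ex:name": ("person", "id", "name"),
    "ex:city": ("person", "id", "city"),
}


def star_to_sql(predicates):
    """Translate ?s p1 ?o1 . ?s p2 ?o2 ... into SQL over a single table."""
    tables = {MAPPING[p][0] for p in predicates}
    if len(tables) != 1:
        raise ValueError("this sketch only supports a single table")
    table = tables.pop()
    subject_col = MAPPING[predicates[0]][1]
    columns = [subject_col] + [MAPPING[p][2] for p in predicates]
    # A triple pattern only matches if the value exists, hence IS NOT NULL.
    not_null = " AND ".join(f"{MAPPING[p][2]} IS NOT NULL" for p in predicates)
    return f"SELECT {', '.join(columns)} FROM {table} WHERE {not_null}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Alice", "Hannover"), (2, "Bob", None)],
)
sql = star_to_sql(["ex:name", "ex:city"])
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```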

     

    Prerequisites

    • Enrollment at a German university

    • Good written and spoken English skills

    • Good programming skills in Python

     

    Helpful courses

    • Grundlagen der Datenbanksysteme

    • Datenstrukturen und Algorithmen

    • Knowledge Engineering und Semantic Web

    • Komplexität von Algorithmen

     

    Topics covered

    • Big Data

    • Knowledge Graphs

    • Query Processing

    • Data Integration

  • Privacy-aware Query Processing

    Integrating privacy mechanisms into query processing.

     

    The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. SPARQL [2] is the W3C-recommended language to query RDF data. Query engines implemented to execute SPARQL queries over knowledge graphs usually assume that the data is not restricted by privacy policies. BOUNCER [3] is an exception to that rule and implements simple privacy policies that are considered during query processing.
    The goal of this thesis is to evaluate and potentially extend the privacy policies defined in BOUNCER. The implementation part of the thesis will be based on an already existing SPARQL query engine.

     

    Prerequisites

    • Enrollment at a German university

    • Good written and spoken English skills

    • Good programming skills in Python

     

    Helpful courses

    • Grundlagen der Datenbanksysteme
    • Datenstrukturen und Algorithmen
    • Knowledge Engineering und Semantic Web
    • Scientific Data Management and Knowledge Graphs
    • Komplexität von Algorithmen

     

    Topics covered

    • Big Data

    • Knowledge Graphs

    • Query Processing

    • Privacy

    References

    [1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

    [2] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/

    [3] K.M. Endris, Z. Almhithawi, I. Lytra, M.-E. Vidal, S. Auer: BOUNCER: Privacy-Aware Query Processing over Federations of RDF Datasets. 2018. URL: https://doi.org/10.1007/978-3-319-98809-2_5

  • Representing SPARQL Query Plans in RDF

    Representing query plans of SPARQL queries in RDF.

    The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. SPARQL [2] is the W3C recommended language to query RDF data. Query engines implement different methods for source selection, query decomposition, physical operators, etc. Hence, they produce different query plans.
    The goal of this thesis is to represent SPARQL query plans in RDF. This might be achieved by reusing parts of other vocabularies like SPIN [3].
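
    A very small sketch of the idea: a query plan given as a nested tuple is serialized to Turtle, with one blank node per operator. The `qp:` vocabulary, its namespace, the plan structure, and the function `plan_to_turtle` are made up for this example; they are not taken from SPIN or from any existing query engine.

```python
from itertools import count

# Plan nodes are tuples: ("join", left, right) or ("service", endpoint_url).
PLAN = (
    "join",
    ("service", "http://example.org/sparql1"),
    ("service", "http://example.org/sparql2"),
)


def plan_to_turtle(plan):
    """Serialize a plan tree as Turtle using a made-up qp: vocabulary."""
    ids = count(1)
    lines = ["@prefix qp: <http://example.org/queryplan#> ."]

    def visit(node):
        name = f"_:n{next(ids)}"
        if node[0] == "service":
            lines.append(f"{name} a qp:Service ; qp:endpoint <{node[1]}> .")
        else:
            left, right = visit(node[1]), visit(node[2])
            lines.append(f"{name} a qp:Join ; qp:left {left} ; qp:right {right} .")
        return name

    visit(plan)
    return "\n".join(lines)


turtle = plan_to_turtle(PLAN)
print(turtle)
```

    A full vocabulary would also need to capture physical operators, source selection decisions, and the triple patterns sent to each source.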

    Prerequisites

    • Enrollment at a German university

    • Good written and spoken English skills

    • Good programming skills in Python

     

    Helpful courses

    • Grundlagen der Datenbanksysteme
    • Datenstrukturen und Algorithmen
    • Knowledge Engineering und Semantic Web
    • Scientific Data Management and Knowledge Graphs

    Topics covered

    • Knowledge Graphs
    • Query Processing

    References

    [1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

    [2] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/

    [3] https://www.w3.org/Submission/2011/SUBM-spin-sparql-20110222/