Open Thesis Topics

  • Efficient Computation of Detailed Source Descriptions for Knowledge Graphs

    Computation of semantic source descriptions for federations of knowledge graphs in an efficient way.

     

    When querying a system that consists of several knowledge graphs, the system needs to decide which parts of the query can be answered by which knowledge graph. Most systems use simple source descriptions; however, more detailed source descriptions enable the system to find better query plans.
    The goal of this thesis is to provide a formal definition as well as an implementation of an algorithm that efficiently collects detailed source descriptions for knowledge graphs.
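
    As a rough illustration of what a more detailed source description might contain, the following sketch records, for each class in a source, the predicates used by its instances, and then uses these descriptions to pick the sources relevant for a predicate. The toy federation, the source names, and the functions `describe_source` and `select_sources` are assumptions made for this example only; they are not the approach to be developed in the thesis.

```python
from collections import defaultdict

# Hypothetical toy federation: each source is a list of (subject, predicate, object) triples.
SOURCES = {
    "people_kg": [
        ("alice", "rdf:type", "Person"),
        ("alice", "worksFor", "acme"),
    ],
    "company_kg": [
        ("acme", "rdf:type", "Company"),
        ("acme", "locatedIn", "hannover"),
    ],
}


def describe_source(triples):
    """Collect, for each class, the predicates used by instances of that class."""
    classes_of = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            classes_of[subject].add(obj)

    description = defaultdict(set)
    for subject, predicate, _obj in triples:
        if predicate == "rdf:type":
            continue
        for cls in classes_of.get(subject, ()):
            description[cls].add(predicate)
    return dict(description)


def select_sources(descriptions, predicate):
    """Return the names of all sources whose description mentions the predicate."""
    return sorted(
        name
        for name, description in descriptions.items()
        if any(predicate in predicates for predicates in description.values())
    )


descriptions = {name: describe_source(triples) for name, triples in SOURCES.items()}
print(select_sources(descriptions, "locatedIn"))
```

    A description at the class level lets a query planner skip sources that cannot contribute to a triple pattern, which is the kind of pruning that simple descriptions make harder.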

     

    Prerequisites

    • Enrollment at a German university

    • Good written and spoken English skills

    • Good programming skills in Python

     

    Helpful courses

    • Data Structures and Algorithms

    • Knowledge Engineering and Semantic Web

    • Complexity of Algorithms

     

    Topics covered

    • Big Data

    • Knowledge Graphs

    • Query Processing

  • Efficient Validation of RDF Data using SHACL

    Efficiently validating integrity constraints over RDF data using SHACL and SPARQL.

     

    The Resource Description Framework (RDF) is the W3C standard for publishing and exchanging data on the Web. Many data sources suffer from data quality issues. The Shapes Constraint Language (SHACL) is the W3C recommended language for defining integrity constraints over RDF data. Corman et al. [1] showed that the validation of an RDF data source using an arbitrary SHACL shape schema is NP-hard. The goal of this thesis is to define efficient methods to validate SHACL shape schemas over RDF data sources accessible via SPARQL, a query language for RDF data sources. The implementation part of the thesis will be based on an already existing prototype for simple constraints.
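
    To make the idea of checking a simple constraint through SPARQL more concrete, the sketch below builds a query that returns all instances of a class having fewer than a required number of values for a property (a minimum-cardinality constraint). The function name `min_count_violation_query` and the example IRIs are hypothetical; this is neither the existing prototype nor the method of any of the cited engines.

```python
def min_count_violation_query(target_class: str, path: str, min_count: int) -> str:
    """Build a SPARQL query selecting instances of target_class that have
    fewer than min_count values for the property given by path."""
    if min_count < 1:
        raise ValueError("a minimum count below 1 can never be violated")
    return (
        "SELECT ?node WHERE {\n"
        f"  ?node a <{target_class}> .\n"
        # OPTIONAL keeps nodes without any value so that they are counted as 0.
        f"  OPTIONAL {{ ?node <{path}> ?value . }}\n"
        "}\n"
        "GROUP BY ?node\n"
        f"HAVING (COUNT(?value) < {min_count})"
    )


# Example IRIs are placeholders chosen for illustration.
query = min_count_violation_query(
    "http://example.org/Person", "http://example.org/name", 1
)
print(query)
```

    Every solution of such a query is an entity violating the constraint, so an empty result means the data source satisfies it.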

     

    Prerequisites

    • Enrollment at a German university

    • Good written and spoken English skills

    • Good programming skills in Python

     

    Helpful courses

    • Foundations of Database Systems

    • Data Structures and Algorithms

    • Knowledge Engineering and Semantic Web

    • Complexity of Algorithms

     

    Topics covered

    • Big Data

    • Knowledge Graphs

    • Quality Assessment

     

    References

    [1] J. Corman, J.L. Reutter, O. Savković: Semantics and Validation of Recursive SHACL. 2018. 

    [2] J. Corman, F. Florenzano, J.L. Reutter, O. Savković: Validating SHACL Constraints over a SPARQL Endpoint. 2019. 

    [3] M. Figuera, P.D. Rohde, M.-E. Vidal: Trav-SHACL: Efficiently Validating Networks of SHACL Constraints. 2021. 

  • Efficiently Validating Property Graphs

    Efficiently validating integrity constraints over property graphs. 

     

    Property graphs are commonly used to represent knowledge. Recently, the Property Graph Shapes Language (ProGS) [1] was proposed for validating the data quality of property graphs. The goal of this thesis is to define an efficient algorithm to validate a property graph given a set of constraints expressed in ProGS.

     

    Prerequisites

    • Enrollment at a German university

    • Good written and spoken English skills

    • Good programming skills in Python

    • Experience with graph databases (e.g., Neo4j)

     

    Helpful courses

    • Foundations of Database Systems

    • Data Structures and Algorithms

    • Knowledge Engineering and Semantic Web

    • Complexity of Algorithms

     

    Topics covered

    • Graph Databases

    • Quality Assessment

     

    References

    [1] P. Seifer, R. Lämmel, S. Staab: ProGS: Property Graph Shapes Language. 2021.

  • Extending SPARQL with SHACL-validation-based Filters

    Extending SPARQL with Filters based on SHACL validation results. 

     

    The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. The Shapes Constraint Language (SHACL) [2] is the W3C recommended language for defining integrity constraints over RDF data. In SHACL, constraints are expressed as a network of shapes, called a SHACL shape schema. A shape represents integrity constraints over the properties of a class or set of entities. However, in contrast to relational databases, those integrity constraints are not checked during data insertion. The evaluation of a SHACL shape schema reports the entities that do not satisfy the imposed constraints; Trav-SHACL [3] is an engine capable of validating SHACL shape schemas against knowledge graphs accessible via SPARQL endpoints. SPARQL [4] is the W3C recommended language for querying RDF data. Recently, Rohde [5] proposed annotating query results with the results of the validation for more transparency.
    The goal of this thesis is to define new SPARQL filters that are capable of filtering query results given a shape and a desired validation result. The implementation part of the thesis will be based on an already existing prototype for annotating the query results with the validation result.
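
    The intended behavior of such a filter can be sketched on in-memory data: given query results, a variable, a shape, and a desired validation result, only the matching solutions are kept. The validation output format, the shape name `PersonShape`, and the function `filter_by_shape` are assumptions for this illustration and do not reflect the existing prototype or a proposed SPARQL syntax.

```python
# Hypothetical validation output: for each shape, entity -> True (valid) / False (invalid).
VALIDATION = {
    "PersonShape": {"alice": True, "bob": False},
}


def filter_by_shape(results, variable, shape, desired, validation=VALIDATION):
    """Keep the solution mappings whose binding for `variable` has the
    desired validation result for `shape`.

    Entities that were not validated against the shape are dropped in
    either case, since their validation result is unknown.
    """
    outcome = validation.get(shape, {})
    return [row for row in results if outcome.get(row.get(variable)) is desired]


results = [{"person": "alice"}, {"person": "bob"}, {"person": "carol"}]
valid_only = filter_by_shape(results, "person", "PersonShape", True)
invalid_only = filter_by_shape(results, "person", "PersonShape", False)
print(valid_only, invalid_only)
```

    How to treat entities without a validation result (here: `carol`) is a design decision; this sketch simply excludes them.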

     

    Prerequisites

    • Enrollment at a German university

    • Good written and spoken English skills

    • Good programming skills in Python

     

    Helpful courses

    • Foundations of Database Systems

    • Data Structures and Algorithms

    • Knowledge Engineering and Semantic Web

    • Complexity of Algorithms

     

    Topics covered

    • Big Data

    • Knowledge Graphs

    • Query Processing

    • Quality Assessment

     

    References

    [1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

    [2] https://www.w3.org/TR/2017/REC-shacl-20170720/

    [3] M. Figuera, P.D. Rohde, M.-E. Vidal: Trav-SHACL: Efficiently Validating Networks of SHACL Constraints. 2021. 

    [4] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/

    [5] P.D. Rohde: SHACL Constraint Validation during SPARQL Query Processing. 2021. 

  • Negative Sampling using Integrity Constraints

    Exploiting results from integrity constraint validation in negative sampling. 

     

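    The Resource Description Framework (RDF) is the W3C standard for publishing and exchanging data on the Web. An RDF graph is also called a knowledge graph. An embedding is a low-dimensional vector space into which high-dimensional vectors can be translated. Training knowledge graph embedding models requires negative samples, i.e., triples assumed to be false, which are commonly generated by randomly corrupting existing triples. The goal of this work is to use integrity constraints over the knowledge graph to create the negative samples.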
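
    One possible reading of this idea, sketched under strong assumptions: if a constraint states that the objects of a relation must belong to a certain class, then replacing the object of a true triple with an entity of another class yields a triple that violates the constraint and can serve as a negative sample. The toy data, the range-style constraint, and the function `constraint_based_negatives` are hypothetical and only illustrate the principle.

```python
# Hypothetical toy knowledge graph with one class per entity.
TRIPLES = {("alice", "worksFor", "acme"), ("bob", "worksFor", "initech")}
ENTITY_TYPE = {
    "alice": "Person",
    "bob": "Person",
    "acme": "Company",
    "initech": "Company",
}
# Hypothetical constraint: every object of worksFor must be a Company.
RANGE = {"worksFor": "Company"}


def constraint_based_negatives(triples, entity_type, range_constraints):
    """Corrupt the object of each triple with entities that violate the
    range constraint of its relation."""
    negatives = set()
    for head, relation, _tail in triples:
        required = range_constraints.get(relation)
        if required is None:
            continue  # no constraint available for this relation
        for candidate, cls in entity_type.items():
            if cls != required:
                negatives.add((head, relation, candidate))
    return negatives - set(triples)


negatives = constraint_based_negatives(TRIPLES, ENTITY_TYPE, RANGE)
print(sorted(negatives))
```

    In contrast to purely random corruption, a triple such as `("alice", "worksFor", "initech")` is not produced here, because it satisfies the constraint and might simply be a missing fact.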

     

    Prerequisites

    • Enrollment at a German university

    • Good written and spoken English skills

    • Good programming skills in Python

     

    Helpful courses

    • Knowledge Engineering and Semantic Web

    • Data Structures and Algorithms

    • Machine Learning for Graphs

    • (Statistical) Natural Language Processing

     

    Topics covered

    • Knowledge Graphs

    • Embeddings

  • Translating SPARQL Queries to Native Query Languages of Various DB Models Supporting Virtual Knowledge Graph Creation

     

    The Resource Description Framework (RDF) is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. The recommended language for querying RDF data is SPARQL. Even though the number of publicly available knowledge graphs is increasing, many data sources are still available only in classical formats such as relational databases. In some cases, it is not possible to transform the data models into one common format and integrate them all in one place. This thesis aims at virtual data integration by transforming the queries during query processing.
    The goal of this thesis is to support virtual knowledge graph creation by transforming SPARQL queries into query languages that are natively supported by various database models. The new approach will be integrated into an existing query engine. The work also includes analyzing state-of-the-art translators and comparing their performance with the proposed approach.
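
    For the relational case, the core of such a translation can be sketched for a single triple pattern with a constant predicate. The mapping from predicates to tables and columns, the table names, and the function `triple_pattern_to_sql` are assumptions for this example; real translators have to handle joins, data types, and mapping languages, none of which is covered here.

```python
# Hypothetical mapping: predicate -> (table, subject column, object column).
MAPPING = {
    "ex:name": ("person", "id", "name"),
    "ex:worksFor": ("employment", "person_id", "company_id"),
}


def triple_pattern_to_sql(subject, predicate, obj, mapping=MAPPING):
    """Translate one triple pattern with a constant predicate into SQL.

    Terms starting with '?' are treated as variables and become output
    columns; all other terms are treated as constants and become
    equality conditions.
    """
    table, subject_column, object_column = mapping[predicate]
    select, where = [], []
    for term, column in ((subject, subject_column), (obj, object_column)):
        if term.startswith("?"):
            select.append(f"{column} AS {term[1:]}")
        else:
            # Simplistic escaping for the sketch; real code should use
            # parameterized queries instead of string concatenation.
            escaped = term.replace("'", "''")
            where.append(f"{column} = '{escaped}'")
    columns = ", ".join(select) or "1"
    sql = f"SELECT {columns} FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql


print(triple_pattern_to_sql("?p", "ex:name", "?n"))
print(triple_pattern_to_sql("?p", "ex:worksFor", "acme"))
```

    Because the data stays in the relational database and only the query is rewritten, the knowledge graph remains virtual.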

     

    Prerequisites

    • Enrollment at a German university

    • Good written and spoken English skills

    • Good programming skills in Python

     

    Helpful courses

    • Foundations of Database Systems

    • Data Structures and Algorithms

    • Knowledge Engineering and Semantic Web

    • Complexity of Algorithms

     

    Topics covered

    • Big Data

    • Knowledge Graphs

    • Query Processing

    • Data Integration

  • Predicting the Neoadjuvant Treatment Outcome and Relevant Biomarkers of Breast Cancer Patients

    Applying knowledge graph embedding techniques to breast cancer data obtained from MHH to predict the outcome of the neoadjuvant treatment and to identify relevant biomarkers.

     

    AI has revolutionized many industries, including healthcare, thanks to the plethora of data that has been compiled in recent years together with a fruitful symbiosis with high computational power. However, the healthcare industry is highly sensitive and cannot tolerate black-box models that make accurate but blind predictions. Therefore, there remains a high potential for investigating the application of state-of-the-art machine learning models to healthcare data. We collaborate closely with Hannover Medical School (MHH) and hence use a breast cancer dataset obtained by MHH through real-world clinical studies. The data comprises three main categories: clinical, gene, and socio-economic records of breast cancer patients.

     

    Prerequisites

    • Enrollment at a German university
    • Good written and spoken English skills
    • Good programming skills in Python

     

    Helpful courses

    • Knowledge Engineering and Semantic Web
    • Machine Learning
    • Lab: Artificial Intelligence

     

    References

    [1] Bordes, Antoine, et al. "Translating embeddings for modeling multi-relational data." Advances in neural information processing systems 26 (2013).

    [2] Trouillon, Théo, et al. "Complex embeddings for simple link prediction." International conference on machine learning. PMLR, 2016.

    [3] Sun, Zhiqing, et al. "RotatE: Knowledge graph embedding by relational rotation in complex space." arXiv preprint arXiv:1902.10197 (2019).

    [4] Zhang, Shuai, et al. "Quaternion knowledge graph embeddings." Advances in neural information processing systems 32 (2019).