Open Theses
-
Capturing Visual and Textual Modalities into Knowledge Representation
Capturing knowledge from documents containing text and videos.
- Type of thesis: Master's thesis
- Language: English
- Supervisors: Shahi Dost, Maria-Esther Vidal
To understand the content of a document containing both text and videos, an artificial agent needs to jointly recognize the entities shown in the videos and mentioned in the text and link them to its background knowledge. This is a complex task that must be addressed jointly by using the entities shown in video frames, described in their captions, and linked to their background knowledge (semantics or metadata). Solving this task opens a wide range of opportunities for improving semantic visual interpretation, video captioning, visual indexing, and the grounding of visual entities in text and vice versa. The goal of this thesis is to develop a knowledge graph that uniformly represents videos and their associated frames, where each frame consists of visual objects and is described by a textual description.
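As a rough illustration of the intended representation, the following sketch turns one annotated video into triples that connect the video, its frames, the captions, and the visual objects. The `ex:` vocabulary, the identifiers, and the input layout are assumptions made for this example only; they are not taken from VTKEL [1] or from any existing tool.

```python
def video_to_triples(video):
    """Flatten a video annotation into (subject, predicate, object) triples."""
    triples = [(video["id"], "rdf:type", "ex:Video")]
    for frame in video["frames"]:
        triples.append((video["id"], "ex:hasFrame", frame["id"]))
        triples.append((frame["id"], "ex:caption", frame["caption"]))
        for obj in frame["objects"]:
            # A visual object is attached to its frame and linked to
            # an entity of the background knowledge.
            triples.append((frame["id"], "ex:depicts", obj["id"]))
            triples.append((obj["id"], "ex:linkedTo", obj["entity"]))
    return triples


# Hypothetical annotation of a single frame.
VIDEO = {
    "id": "ex:v1",
    "frames": [
        {
            "id": "ex:v1/f1",
            "caption": "A dog runs on the beach.",
            "objects": [{"id": "ex:v1/f1/o1", "entity": "ex:Dog"}],
        }
    ],
}

TRIPLES = video_to_triples(VIDEO)
```

In this layout, the same background entity (here `ex:Dog`) can be reached from a visual object and, in a fuller version, from a mention in the caption, which is what joint linking relies on.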
Prerequisites
- Enrollment at a German university
- Good written and spoken English skills
- Good programming skills in Python
Helpful courses / skills
- Scientific Data Management and Knowledge Graphs
- Knowledge of computer vision
- Knowledge of natural language processing (NLP)
Topics covered
- Big Data
- Knowledge Graphs
- Computer Vision
- NLP
Literature
[1] S. Dost, L. Serafini, M. Rospocher, L. Ballan, A. Sperduti: VTKEL: a resource for visual-textual-knowledge entity linking. 2020. URL: https://doi.org/10.1145/3341105.3373958
[2] S. Dost, L. Serafini, M. Rospocher, L. Ballan, A. Sperduti: Jointly linking visual and textual entity mentions with background knowledge. 2020. URL: https://doi.org/10.1007/978-3-030-51310-8_24
-
Efficient Computation of Detailed Source Descriptions for Knowledge Graphs
Computation of semantic source descriptions for federations of knowledge graphs in an efficient way.
- Type of thesis: Master's thesis
- Language: English
- Supervisors: Philipp D. Rohde, Maria-Esther Vidal
When querying a system that consists of several knowledge graphs, the system needs to decide which parts of the query can be answered by which knowledge graph. Most systems use simple source descriptions; however, more detailed source descriptions enable the system to find better plans.
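To make the idea concrete, the sketch below computes a class-based description for each source (which predicates the instances of a class use) and then selects the sources that can answer a star-shaped query part. The description format, the toy data, and the function names are assumptions for illustration; they do not reproduce the descriptions used by any particular federated engine.

```python
from collections import defaultdict


def describe_source(triples):
    """Collect, per class, the predicates used by its instances."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
    description = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            continue
        for cls in types.get(s, ()):
            description[cls].add(p)
    return dict(description)


def select_sources(descriptions, predicates):
    """Return the sources in which one class offers all requested predicates."""
    wanted = set(predicates)
    return sorted(
        name
        for name, desc in descriptions.items()
        if any(wanted <= preds for preds in desc.values())
    )


# Two hypothetical knowledge graphs of a federation.
SOURCES = {
    "kg_drugs": [
        ("d1", "rdf:type", "Drug"),
        ("d1", "name", "Aspirin"),
        ("d1", "treats", "dis1"),
    ],
    "kg_diseases": [
        ("dis1", "rdf:type", "Disease"),
        ("dis1", "name", "Headache"),
    ],
}
DESCRIPTIONS = {name: describe_source(t) for name, t in SOURCES.items()}
```

A description that only lists predicates would send a query part over `name` and `treats` to both sources, because both contain `name`; the class-based description narrows it down to `kg_drugs`.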
The goal of this thesis is to provide a formal definition as well as an implementation of an algorithm that efficiently collects detailed source descriptions for knowledge graphs.
Prerequisites
- Enrollment at a German university
- Good written and spoken English skills
- Good programming skills in Python
Helpful courses
- Data Structures and Algorithms
- Knowledge Engineering and Semantic Web
- Complexity of Algorithms
Topics covered
- Big Data
- Knowledge Graphs
- Query Processing
-
Efficient Generation of Knowledge Graphs using RML-star with JSON and XML
Efficiently generating RDF-star data from JSON and XML using RML-star.
- Type of thesis: 2 Master's theses
- Language: English
- Supervisors: Enrique Iglesias, Maria-Esther Vidal
In recent years, the amount of data generated has increased exponentially, and knowledge graphs have gained attention as data structures to integrate data and knowledge harvested from myriad data sources. Thus, there is a need to develop knowledge graph creation engines capable of handling data complexities such as large volume, high duplicate rate, and heterogeneity. The SDM-RDFizer is a knowledge graph creation engine that follows the standard established by the RDF Mapping Language (RML). RML is a mapping language that expresses customized mapping rules from heterogeneous data structures and serializations to the RDF data model. RML-star is an extension of RML that uses the RDF-star data model. This thesis aims to define an extension of the SDM-RDFizer that allows the tool to transform data from JSON or XML files into knowledge graphs according to RML-star mappings.
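The following sketch illustrates the kind of output such an extension has to produce: for every JSON record it emits a plain triple and an RDF-star statement about that triple. The mapping rule is hard-coded in Python instead of being read from an RML-star mapping, and the base IRI, property names, and input records are assumptions; this is not the SDM-RDFizer API.

```python
import json

BASE = "http://example.org/"
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical JSON source.
DATA = json.loads('[{"id": "p1", "employer": "acme", "since": 2019}]')


def to_rdf_star(records):
    """Emit one asserted triple and one annotation on it per record."""
    lines = []
    for rec in records:
        s = f"<{BASE}{rec['id']}>"
        p = f"<{BASE}worksFor>"
        o = f"<{BASE}{rec['employer']}>"
        since = rec["since"]
        triple = f"{s} {p} {o}"
        lines.append(f"{triple} .")
        # The quoted triple << s p o >> is the subject of the annotation.
        lines.append(f'<< {triple} >> <{BASE}since> "{since}"^^<{XSD_INT}> .')
    return lines


LINES = to_rdf_star(DATA)
```

An engine aiming at efficiency would additionally have to avoid generating the same quoted triple repeatedly, for example when many records share a subject and object.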
Theses
- Thesis 1: JSON
- Thesis 2: XML
Prerequisites
- Enrollment at a German university
- Good written and spoken English skills
- Good programming skills in Python
- Knowledge of mapping languages
Helpful courses
- Data Structures and Algorithms
- Knowledge Engineering and Semantic Web
- Complexity of Algorithms
- Scientific Data Management and Knowledge Graphs
Topics covered
- Big Data
- Knowledge Graph Creation
- Mapping Languages
Literature
[1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
[2] https://w3c.github.io/rdf-star/cg-spec/editors_draft.html
[3] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana, M.-E. Vidal: SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs. 2020. URL: https://doi.org/10.1145/3340531.3412881
[4] E. Iglesias, S. Jozashoori, M.-E. Vidal: Scaling up knowledge graph creation to large and heterogeneous data sources. 2022. URL: https://doi.org/10.48550/arXiv.2201.09694
[5] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana, M.-E. Vidal: Empowering the SDM-RDFizer Tool for Scaling Up to Complex Knowledge Graph Creation Pipelines (Under Review). 2023. URL: https://www.semantic-web-journal.net/system/files/swj3246.pdf
[6] A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, R. Van de Walle: RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. 2014. URL: https://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf
-
Efficient Mining of Horn Rules from Knowledge Graphs
Efficiently mining Horn rules from knowledge graphs using SPARQL queries.
- Type of thesis: Master's thesis
- Language: English
- Supervisors: Disha Purohit, Maria-Esther Vidal
Rule mining is the process of discovering interesting patterns or relationships between variables. Mining rules over knowledge graphs means discovering patterns between the entities they contain. The goal of this thesis is to provide a formal definition as well as an implementation of an algorithm that is able to mine rules from knowledge graphs. This algorithm can then be extended to mine rules from multiple knowledge graphs.
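As a small illustration of what evaluating a candidate rule involves, the sketch below computes two commonly used quality metrics, support and standard confidence, for a rule of the form body(x, y) ⇒ head(x, y). The toy knowledge graph and the restriction to rules with a single body atom are assumptions of this example; the thesis algorithm is expected to pose such counts as SPARQL queries and to handle more general rules.

```python
def pairs(triples, predicate):
    """All (subject, object) pairs connected by the given predicate."""
    return {(s, o) for s, p, o in triples if p == predicate}


def rule_metrics(triples, body_pred, head_pred):
    """Support and standard confidence of body_pred(x, y) => head_pred(x, y)."""
    body = pairs(triples, body_pred)
    head = pairs(triples, head_pred)
    support = len(body & head)  # pairs for which the rule holds
    confidence = support / len(body) if body else 0.0
    return support, confidence


# Hypothetical knowledge graph.
KG = [
    ("anna", "worksIn", "berlin"), ("anna", "livesIn", "berlin"),
    ("ben", "worksIn", "hannover"), ("ben", "livesIn", "hannover"),
    ("cara", "worksIn", "munich"), ("cara", "livesIn", "augsburg"),
    ("dan", "worksIn", "bonn"),
]

SUPPORT, CONFIDENCE = rule_metrics(KG, "worksIn", "livesIn")
```

Here the rule worksIn(x, y) ⇒ livesIn(x, y) holds for two of the four pairs that match the body, so its support is 2 and its standard confidence is 0.5.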
Prerequisites
- Enrollment at a German university
- Good written and spoken English skills
- Good programming skills in Python
Helpful courses
- Data Structures and Algorithms
- Knowledge Engineering and Semantic Web
- Complexity of Algorithms
- Scientific Data Management and Knowledge Graphs
Topics covered
- Big Data
- Knowledge Graphs
- Rule Mining
-
Efficient Query Processing by Discovering Synonym Predicates in Knowledge Graphs
Complete query answers by discovering synonymous predicates.
- Type of thesis: Master's thesis
- Language: English
- Supervisors: Emetis Niazmand, Maria-Esther Vidal
Knowledge graphs often contain duplicated data and metadata that have the same meaning but are defined differently. Thus, synonymous predicates can connect the same resource to different entities, which leads query engines to retrieve incomplete answers. This thesis aims to enhance query processing by discovering synonymous predicates in order to retrieve complete answers.
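One simple heuristic for finding candidate synonyms, shown here only as a sketch, compares the subject-object pairs of two predicates with the Jaccard coefficient and then expands a query with the candidates found. The threshold, the predicate names, and the toy data are assumptions; the thesis may rely on an entirely different discovery method.

```python
from collections import defaultdict
from itertools import combinations


def synonym_candidates(triples, threshold=0.5):
    """Predicate pairs whose subject-object pairs overlap strongly."""
    pairs = defaultdict(set)
    for s, p, o in triples:
        pairs[p].add((s, o))
    found = []
    for p1, p2 in combinations(sorted(pairs), 2):
        jaccard = len(pairs[p1] & pairs[p2]) / len(pairs[p1] | pairs[p2])
        if jaccard >= threshold:
            found.append((p1, p2, jaccard))
    return found


def objects(triples, subject, predicates):
    """Answer a query for the objects of a subject over a set of predicates."""
    return sorted({o for s, p, o in triples if s == subject and p in predicates})


# Hypothetical knowledge graph with two predicates for the same relation.
KG = [
    ("berlin", "dbo:country", "germany"), ("berlin", "dbp:country", "germany"),
    ("paris", "dbo:country", "france"), ("paris", "dbp:country", "france"),
    ("rome", "dbp:country", "italy"),
    ("berlin", "dbo:leader", "mayor1"),
]

CANDIDATES = synonym_candidates(KG)
```

Querying the country of `rome` with `dbo:country` alone returns nothing, whereas the expanded query over both predicates returns `italy`, which is the kind of incompleteness the thesis addresses.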
Prerequisites
- Enrollment at a German university
- Good written and spoken English skills
- Good programming skills in Python
Helpful courses
- Knowledge Engineering and Semantic Web
- Data Structures and Algorithms
- Complexity of Algorithms
- Scientific Data Management and Knowledge Graphs
Topics covered
- Big Data
- Knowledge Graphs
- Query Processing
-
Efficient Validation of RDF Data using SHACL
Efficiently validating integrity constraints over RDF data using SHACL and SPARQL.
- Type of thesis: Master's thesis
- Language: English
- Supervisors: Philipp D. Rohde, Maria-Esther Vidal
The Resource Description Framework (RDF) is the W3C standard for publishing and exchanging data on the Web. Many data sources suffer from data quality issues. The Shapes Constraint Language (SHACL) is the language recommended by the W3C for defining integrity constraints over RDF data. Corman et al. [1] showed that validating an RDF data source against an arbitrary SHACL shape schema is NP-hard. The goal of this thesis is to define efficient methods to validate SHACL shape schemas over RDF data sources accessible via SPARQL, a query language for RDF data sources. The implementation part of the thesis will be based on an existing prototype for simple constraints.
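For orientation, the sketch below checks one very simple constraint, a minimum cardinality on a property for all instances of a target class, over an in-memory list of triples. The in-memory data stands in for a SPARQL endpoint, and the function and its parameters are assumptions for illustration; it is not the existing prototype and does not cover shapes that refer to each other.

```python
def validate_min_count(triples, target_class, path, min_count):
    """Map each instance of target_class to True (valid) or False (invalid)."""
    targets = sorted({s for s, p, o in triples
                      if p == "rdf:type" and o == target_class})
    report = {}
    for node in targets:
        values = {o for s, p, o in triples if s == node and p == path}
        report[node] = len(values) >= min_count
    return report


# Hypothetical data: every Person is required to have at least one name.
KG = [
    ("p1", "rdf:type", "Person"), ("p1", "name", "Anna"),
    ("p2", "rdf:type", "Person"),
    ("c1", "rdf:type", "City"),
]

REPORT = validate_min_count(KG, "Person", "name", 1)
```

The difficulty addressed by the thesis arises once constraints link shapes to one another, because the validity of one entity then depends on the validity of its neighbours.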
Prerequisites
- Enrollment at a German university
- Good written and spoken English skills
- Good programming skills in Python
Helpful courses
- Foundations of Database Systems
- Data Structures and Algorithms
- Knowledge Engineering and Semantic Web
- Complexity of Algorithms
Topics covered
- Big Data
- Knowledge Graphs
- Quality Assessment
Literature
[1] J. Corman, J.L. Reutter, O. Savković: Semantics and Validation of Recursive SHACL. 2018.
[2] J. Corman, F. Florenzano, J.L. Reutter, O. Savković: Validating SHACL Constraints over a SPARQL Endpoint. 2019.
[3] M. Figuera, P.D. Rohde, M.-E. Vidal: Trav-SHACL: Efficiently Validating Networks of SHACL Constraints. 2021.
-
Efficiently Validating Property Graphs
Efficiently validating integrity constraints over property graphs.
- Type of thesis: Master's thesis
- Language: English
- Supervisors: Philipp D. Rohde, Maria-Esther Vidal
Property graphs are commonly used to represent knowledge. Recently, the Property Graph Shapes Language (ProGS) [1] was proposed for validating the data quality of property graphs. The goal of this thesis is to define an efficient algorithm to validate a property graph against a set of constraints expressed in ProGS.
Prerequisites
- Enrollment at a German university
- Good written and spoken English skills
- Good programming skills in Python
- Experience with graph databases (e.g., Neo4j)
Helpful courses
- Foundations of Database Systems
- Data Structures and Algorithms
- Knowledge Engineering and Semantic Web
- Complexity of Algorithms
Topics covered
- Graph Databases
- Quality Assessment
Literature
[1] P. Seifer, R. Lämmel, S. Staab: ProGS: Property Graph Shapes Language. 2021.
-
Extending SPARQL with SHACL-validation-based Filters
Extending SPARQL with Filters based on SHACL validation results.
- Type of thesis: Master's thesis
- Language: English
- Supervisors: Philipp D. Rohde, Maria-Esther Vidal
The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. The Shapes Constraint Language (SHACL) [2] is the language recommended by the W3C for defining integrity constraints over RDF data. In SHACL, constraints are expressed as a network of shapes, called a SHACL shape schema. A shape represents integrity constraints over the properties of a class or a set of entities. However, in contrast to relational databases, these integrity constraints are not checked during data insertion. The evaluation of a SHACL shape schema reports the entities that do not satisfy the imposed constraints. Trav-SHACL [3] is an engine capable of validating SHACL shape schemas against knowledge graphs accessible via SPARQL endpoints. SPARQL [4] is the W3C-recommended language for querying RDF data. Recently, Rohde [5] proposed annotating query results with the validation results for more transparency.
The goal of this thesis is to define new SPARQL filters that are capable of filtering query results given a shape and a desired validation result. The implementation part of the thesis will be based on an existing prototype for annotating query results with the validation result.
Prerequisites
- Enrollment at a German university
- Good written and spoken English skills
- Good programming skills in Python
Helpful courses
- Foundations of Database Systems
- Data Structures and Algorithms
- Knowledge Engineering and Semantic Web
- Complexity of Algorithms
Topics covered
- Big Data
- Knowledge Graphs
- Query Processing
- Quality Assessment
Literature
[1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
[2] https://www.w3.org/TR/2017/REC-shacl-20170720/
[3] M. Figuera, P.D. Rohde, M.-E. Vidal: Trav-SHACL: Efficiently Validating Networks of SHACL Constraints. 2021.
[4] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/
[5] P.D. Rohde: SHACL Constraint Validation during SPARQL Query Processing. 2021.
-
Negative Sampling using Integrity Constraints
Exploiting results from integrity constraint validation in negative sampling.
- Type of thesis: Master's thesis
- Language: English
- Supervisors: Philipp D. Rohde, Emetis Niazmand, Maria-Esther Vidal
The Resource Description Framework (RDF) is the W3C standard for publishing and exchanging data on the Web. An RDF graph is also called a knowledge graph. Embeddings map high-dimensional data into a low-dimensional vector space. Training knowledge graph embedding models requires negative samples, i.e., triples that are assumed to be false. The goal of this work is to use integrity constraints over the knowledge graph to create these negative samples.
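One possible reading of this idea, given here purely as a sketch, is to corrupt the object of a known triple only with entities that violate a constraint on the predicate, so that the resulting triple is false with respect to the constraints. The range constraints, entity types, and sampling strategy below are assumptions for illustration and are not prescribed by the topic description.

```python
import random


def negative_samples(triples, entity_types, ranges, seed=0):
    """Corrupt each object with an entity whose type violates the range."""
    rng = random.Random(seed)
    known = set(triples)
    entities = sorted(entity_types)
    negatives = []
    for s, p, o in triples:
        # Only entities that break the range constraint of p are candidates.
        candidates = [
            e for e in entities
            if entity_types[e] != ranges[p] and (s, p, e) not in known
        ]
        if candidates:
            negatives.append((s, p, rng.choice(candidates)))
    return negatives


# Hypothetical data: the object of bornIn has to be a City.
ENTITY_TYPES = {"anna": "Person", "ben": "Person",
                "berlin": "City", "paris": "City"}
RANGES = {"bornIn": "City"}
KG = [("anna", "bornIn", "berlin"), ("ben", "bornIn", "paris")]

NEGATIVES = negative_samples(KG, ENTITY_TYPES, RANGES)
```

In contrast to uniform random corruption, this strategy never produces a triple such as (anna, bornIn, paris), which is merely unknown and might in fact be true.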
Prerequisites
- Enrollment at a German university
- Good written and spoken English skills
- Good programming skills in Python
Helpful courses
- Knowledge Engineering and Semantic Web
- Data Structures and Algorithms
- Machine Learning for Graphs
- (Statistical) Natural Language Processing
Topics covered
- Knowledge Graphs
- Embeddings
-
On-the-fly Semantification for Querying Heterogeneous Sources with SPARQL
Extending a SPARQL query engine to other data formats using the SDM-RDFizer as a wrapper.
- Type of thesis: Master's thesis
- Language: English
- Supervisors: Philipp D. Rohde, Enrique Iglesias, Maria-Esther Vidal
The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. SPARQL [2] is the W3C-recommended language for querying RDF data. However, data on the Web are still available in many different formats. The SDM-RDFizer [3] is a tool that semantifies various data formats using mappings specified in the RDF Mapping Language (RML) [4].
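The sketch below conveys the idea of on-the-fly semantification on a toy scale: a wrapper produces triples from a CSV source only for the predicate a triple pattern asks for, instead of materializing the whole source as RDF first. The rule table, the IRIs, and the CSV data are assumptions of this example; in the thesis, the SDM-RDFizer and RML mappings take the place of the hard-coded rules.

```python
import csv
import io

# Hypothetical non-RDF source and rules (predicate -> CSV column).
CSV_DATA = "id,name,city\n1,Anna,Berlin\n2,Ben,Hannover\n"
RULES = {"ex:name": "name", "ex:city": "city"}


def semantify(csv_text, predicate):
    """Lazily yield only the triples for the requested predicate."""
    column = RULES[predicate]
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield (f"ex:person/{row['id']}", predicate, row[column])


def answer(csv_text, predicate, obj=None):
    """Evaluate the pattern (?s, predicate, obj); obj=None means a variable."""
    return [t for t in semantify(csv_text, predicate)
            if obj is None or t[2] == obj]


RESULT = answer(CSV_DATA, "ex:city", "Berlin")
```

An efficient approach would go further and push the selection on `Berlin` into the wrapper, so that non-matching rows are never turned into triples.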
The goal of this thesis is to define an efficient approach that uses on-the-fly semantification of non-RDF sources to answer SPARQL queries. The implementation part of the thesis will be based on an existing SPARQL query engine, which will be extended to collect data from non-RDF sources using the SDM-RDFizer as a wrapper.
Prerequisites
- Enrollment at a German university
- Good written and spoken English skills
- Good programming skills in Python
Helpful courses
- Foundations of Database Systems
- Data Structures and Algorithms
- Knowledge Engineering and Semantic Web
- Complexity of Algorithms
- Scientific Data Management and Knowledge Graphs
Topics covered
- Big Data
- Knowledge Graphs
- Query Processing
- Data Integration
Literature
[1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
[2] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/
[3] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana, M.-E. Vidal: SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs. 2020. URL: https://doi.org/10.1145/3340531.3412881
[4] A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, R. Van de Walle: RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. 2014. URL: https://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf
-
Predicting the Neoadjuvant Treatment Outcome and Relevant Biomarkers of Breast Cancer Patients
Applying knowledge graph embedding techniques to breast cancer data obtained from MHH to predict the outcome of neoadjuvant treatment and to identify relevant biomarkers.
- Type of thesis: Master's thesis
- Language: English
- Supervisors: Can Aykul, Maria-Esther Vidal
AI has revolutionized many industries, including healthcare, thanks to the plethora of data compiled in recent years together with a fruitful symbiosis with high computational power. However, healthcare is a highly sensitive domain that cannot tolerate black-box models that make accurate but blind predictions. Therefore, there remains high potential for investigating the application of state-of-the-art machine learning models to healthcare data. We collaborate closely with Hannover Medical School (MHH) and therefore use breast cancer data obtained by MHH through real-world clinical studies. The data comprise three main categories: clinical, gene, and socio-economic records of breast cancer patients.
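As a pointer to the kind of model involved, the sketch below implements the scoring function of TransE [1], which treats a relation as a translation from the head entity to the tail entity and ranks candidate triples by the distance between head + relation and tail. The two-dimensional vectors are set by hand for illustration and are neither trained embeddings nor derived from the MHH data; the entity and relation names are hypothetical.

```python
import math


def transe_score(head, relation, tail):
    """Negative L2 distance between head + relation and tail.

    A higher (less negative) score means a more plausible triple.
    """
    return -math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Hand-picked toy vectors, not learned from data.
PATIENT = [0.1, 0.2]
HAS_OUTCOME = [0.4, 0.3]
RESPONSE = [0.5, 0.5]
NO_RESPONSE = [-0.5, 0.0]

SCORE_RESPONSE = transe_score(PATIENT, HAS_OUTCOME, RESPONSE)
SCORE_NO_RESPONSE = transe_score(PATIENT, HAS_OUTCOME, NO_RESPONSE)
```

Predicting a treatment outcome then amounts to link prediction: the candidate outcome with the highest score for a patient is taken as the prediction.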
Prerequisites
- Enrollment at a German university
- Good written and spoken English skills
- Good programming skills in Python
Helpful courses
- Knowledge Engineering and Semantic Web
- Machine Learning
- Lab: Artificial Intelligence
Literature
[1] Bordes, Antoine, et al. "Translating embeddings for modeling multi-relational data." Advances in neural information processing systems 26 (2013).
[2] Trouillon, Théo, et al. "Complex embeddings for simple link prediction." International conference on machine learning. PMLR, 2016.
[3] Sun, Zhiqing, et al. "RotatE: Knowledge graph embedding by relational rotation in complex space." arXiv preprint arXiv:1902.10197 (2019).
[4] Zhang, Shuai, et al. "Quaternion knowledge graph embeddings." Advances in neural information processing systems 32 (2019).
-
Query Processing Guided by Mined Rules
Make use of mined Horn rules in query processing, i.e., query planning and execution.
- Type of thesis: Master's thesis
- Language: English
- Supervisors: Disha Purohit, Philipp D. Rohde, Maria-Esther Vidal
The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. SPARQL [2] is the W3C-recommended language for querying RDF data. In this context, rule mining is the process of discovering patterns between entities in a knowledge graph. The mined Horn rules can be used during query processing.
The goal of this thesis is to define an algorithm that considers mined rules (and their metrics) during query decomposition, query optimization, and query execution.
Prerequisites
- Enrollment at a German university
- Good written and spoken English skills
- Good programming skills in Python
Helpful courses
- Foundations of Database Systems
- Data Structures and Algorithms
- Knowledge Engineering and Semantic Web
- Complexity of Algorithms
- Scientific Data Management and Knowledge Graphs
Topics covered
- Big Data
- Knowledge Graphs
- Query Processing
- Rule Mining
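Literature
[1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
[2] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/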
-
Query Processing for SPARQL-star
Optimize a SPARQL query engine for the features of SPARQL-star.
- Type of thesis: Master's thesis
- Language: English
- Supervisors: Philipp D. Rohde, Maria-Esther Vidal
The Resource Description Framework (RDF) [1] is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. SPARQL [2] is the W3C recommended language to query RDF data. Recently, RDF-star and SPARQL-star [3] were proposed in order to make statements about statements.
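To illustrate what changes for a query engine, the sketch below matches a triple pattern whose subject is itself a quoted triple pattern. Quoted triples are represented as nested Python tuples and variables as strings starting with `?`; this representation, the data, and the function names are assumptions for this example and do not reflect the internals of the existing engine.

```python
def match(pattern, term, bindings):
    """Unify a pattern with a term; return extended bindings or None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == term else None
        return {**bindings, pattern: term}
    if (isinstance(pattern, tuple) and isinstance(term, tuple)
            and len(pattern) == len(term)):
        # Quoted triples are matched recursively, component by component.
        for sub_pattern, sub_term in zip(pattern, term):
            bindings = match(sub_pattern, sub_term, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None


def evaluate(pattern, triples):
    """All solutions of a single, possibly nested, triple pattern."""
    return [b for t in triples if (b := match(pattern, t, {})) is not None]


# Hypothetical RDF-star data: statements about employment statements.
DATA = [
    (("anna", "worksFor", "acme"), "since", "2019"),
    ("anna", "worksFor", "acme"),
    (("ben", "worksFor", "acme"), "since", "2021"),
]

SOLUTIONS = evaluate((("?who", "worksFor", "acme"), "since", "?year"), DATA)
```

Operators such as joins then have to treat a variable bound to a whole quoted triple in the same way as a variable bound to an IRI or a literal.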
The goal of this thesis is to redefine the SPARQL operators to work for SPARQL-star. The implementation part of the thesis will be based on an existing SPARQL query engine, which will be extended to comply with SPARQL-star.
Prerequisites
- Enrollment at a German university
- Good written and spoken English skills
- Good programming skills in Python
Helpful courses
- Foundations of Database Systems
- Data Structures and Algorithms
- Knowledge Engineering and Semantic Web
- Complexity of Algorithms
- Scientific Data Management and Knowledge Graphs
Topics covered
- Big Data
- Knowledge Graphs
- Query Processing
Literature
[1] https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
[2] https://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/
[3] https://w3c.github.io/rdf-star/cg-spec/editors_draft.html
-
Translating SPARQL Queries to Native Query Languages of Various DB Models Supporting Virtual Knowledge Graph Creation
- Type of thesis: Master's thesis
- Language: English
- Supervisors: Philipp D. Rohde, Maria-Esther Vidal
The Resource Description Framework (RDF) is the W3C standard for publishing and exchanging data on the Web. RDF data sources are also referred to as knowledge graphs. The recommended language to query RDF data is SPARQL. Even though the number of publicly available knowledge graphs is increasing, many data sources are still available in classical formats like relational databases. In some cases it is not possible to transform the data models into one common format and integrate them all in one place. This thesis aims at virtual data integration by transforming the queries during query processing.
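A minimal sketch of such a query transformation for the relational case is given below: a star-shaped group of triple patterns is translated into one SQL query using a mapping from predicates to table columns, and the result is executed on an in-memory SQLite database. The mapping format, table layout, and the restriction to variables in subject and object position are assumptions for illustration; they are not the approach of an existing translator.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, subject column, object column).
MAPPING = {
    "ex:name": ("person", "id", "name"),
    "ex:age": ("person", "id", "age"),
}


def translate(patterns):
    """Translate patterns sharing one subject variable into a SQL query."""
    tables = {MAPPING[p][0] for _, p, _ in patterns}
    subjects = {s for s, _, _ in patterns}
    if len(tables) != 1 or len(subjects) != 1:
        raise ValueError("sketch supports one subject over one table only")
    table = tables.pop()
    subject = subjects.pop()
    select = [f"{MAPPING[patterns[0][1]][1]} AS {subject[1:]}"]
    where = []
    for _, p, o in patterns:
        column = MAPPING[p][2]
        select.append(f"{column} AS {o[1:]}")
        # A triple only exists if the value is present in the row.
        where.append(f"{column} IS NOT NULL")
    return (f"SELECT {', '.join(select)} FROM {table} "
            f"WHERE {' AND '.join(where)}")


SQL = translate([("?p", "ex:name", "?n"), ("?p", "ex:age", "?a")])

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE person (id INTEGER, name TEXT, age INTEGER)")
connection.executemany("INSERT INTO person VALUES (?, ?, ?)",
                       [(1, "Anna", 30), (2, "Ben", None)])
ROWS = connection.execute(SQL).fetchall()
```

The row for `Ben` is not returned because a missing age means that the corresponding triple does not exist, which mirrors the semantics of the two triple patterns.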
The goal of this thesis is to support virtual knowledge graph creation by transforming SPARQL queries into query languages that are natively supported by various database models. The new approach will be integrated into an existing query engine. The work also includes analyzing state-of-the-art translators and comparing their performance with the proposed approach.
Prerequisites
- Enrollment at a German university
- Good written and spoken English skills
- Good programming skills in Python
Helpful courses
- Foundations of Database Systems
- Data Structures and Algorithms
- Knowledge Engineering and Semantic Web
- Complexity of Algorithms
Topics covered
- Big Data
- Knowledge Graphs
- Query Processing
- Data Integration
-