Enhancing Data Harmonization with LLMs
Contact persons

Master thesis project
A Knowledge Graph (KG) is a structured representation of information that captures relationships between entities in a way that is both human-readable and machine-interpretable. RDF (Resource Description Framework) is a standard model for data interchange on the web. Creating RDF Knowledge Graphs can be tedious because it involves integrating heterogeneous data sources, data cleaning, ontology development, data alignment, data mapping and validation.
The goal of the thesis is to explore the use of Large Language Models (LLMs) to enhance data harmonization related to Knowledge Graphs (KGs).
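To make the RDF model mentioned above more concrete, here is a minimal sketch in Python (standard library only) of a tiny Knowledge Graph expressed as subject-predicate-object triples and serialized to N-Triples. The `http://example.org/` IRIs, the `sensor42` entity and the helper names are illustrative assumptions, not part of the project; a real project would more likely use an established RDF library.

```python
from dataclasses import dataclass
from typing import Iterable, Union


@dataclass(frozen=True)
class IRI:
    """An RDF resource identified by an IRI."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A plain string literal (datatypes and language tags omitted for brevity)."""
    value: str


@dataclass(frozen=True)
class Triple:
    subject: IRI
    predicate: IRI
    obj: Union[IRI, Literal]


def term_to_nt(term: Union[IRI, Literal]) -> str:
    """Render one term in N-Triples syntax."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    # Escape backslashes first, then quotes and newlines.
    escaped = (
        term.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialize triples, one statement per line, each terminated by ' .'."""
    return "\n".join(
        f"{term_to_nt(t.subject)} {term_to_nt(t.predicate)} {term_to_nt(t.obj)} ."
        for t in triples
    )


# Well-known W3C vocabulary terms.
RDF_TYPE = IRI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
RDFS_LABEL = IRI("http://www.w3.org/2000/01/rdf-schema#label")

# Hypothetical example namespace and entities (assumptions for illustration).
EX = "http://example.org/"
sensor = IRI(EX + "sensor42")

graph = [
    Triple(sensor, RDF_TYPE, IRI(EX + "Sensor")),
    Triple(sensor, RDFS_LABEL, Literal("Temperature sensor")),
]

if __name__ == "__main__":
    print(to_ntriples(graph))
```

Each line of the output is one statement about `sensor42`, which is what makes the graph both human-readable and machine-interpretable.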
Research topic focus
- Investigate how to enhance traditional Knowledge Graph creation and data alignment techniques using Large Language Models (LLMs).
- Analyse existing tools and approaches, and identify relevant use cases within projects at SINTEF Digital.
- Set up tools to support data mapping, alignment and transformation, so that the selected use cases can exchange data according to the standard ontologies and data models identified.
- Enhance these tools by incorporating LLMs to support data harmonization tasks.
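The mapping and alignment step in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's tooling: the target vocabulary, the field names and the `stub_suggester` function are all hypothetical. In particular, `stub_suggester` is a fixed lookup table standing in for the place where an LLM could propose which target term a source field corresponds to.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical target vocabulary; the example.org IRIs are placeholders.
TARGET_TERMS: Dict[str, str] = {
    "temperature": "http://example.org/vocab#temperature",
    "timestamp": "http://example.org/vocab#timestamp",
}

# A suggester maps a source field name to a target term name, or None.
Suggester = Callable[[str], Optional[str]]


def stub_suggester(field: str) -> Optional[str]:
    """Stand-in for an LLM call: a fixed lookup of known synonyms."""
    synonyms = {
        "temp_c": "temperature",
        "temperatur": "temperature",
        "time": "timestamp",
        "ts": "timestamp",
    }
    return synonyms.get(field.strip().lower())


def harmonize(
    record: Dict[str, Any], suggest: Suggester
) -> Tuple[Dict[str, Any], List[str]]:
    """Rename source fields to target IRIs; collect fields left unmapped.

    Values are copied unchanged (no unit conversion or validation here).
    """
    harmonized: Dict[str, Any] = {}
    unmapped: List[str] = []
    for field, value in record.items():
        term = suggest(field)
        if term in TARGET_TERMS:
            harmonized[TARGET_TERMS[term]] = value
        else:
            # Unknown or rejected suggestions are kept for manual review.
            unmapped.append(field)
    return harmonized, unmapped


if __name__ == "__main__":
    source_a = {"Temp_C": 21.5, "ts": "2024-01-01T00:00:00Z", "operator": "A"}
    source_b = {"Temperatur": 20.9, "time": "2024-01-01T00:05:00Z"}
    for record in (source_a, source_b):
        print(harmonize(record, stub_suggester))
```

Passing the suggester in as a function keeps the harmonization logic independent of how suggestions are produced, so a rule-based baseline and an LLM-backed variant could be compared on the same use cases. Checking each suggestion against `TARGET_TERMS` means a suggestion outside the agreed vocabulary is flagged rather than silently accepted.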
Expected results and learning outcome
After successfully completing the thesis, the student should have a better understanding of, and practical experience with, Large Language Models (LLMs) and Knowledge Graphs (KGs), and of how LLMs can be used in data harmonization tasks.
Qualifications
Candidates should have a good understanding of data engineering, vocabularies and ontologies, and semantic technologies such as RDF Knowledge Graphs.
References
- Hofer, Marvin, Johannes Frey, and Erhard Rahm. "Towards self-configuring knowledge graph construction pipelines using LLMs: a case study with RML." Fifth International Workshop on Knowledge Graph Construction @ ESWC 2024. Vol. 3718. 2024. https://kg-construct.github.io/workshop/2024/resources/paper8.pdf
- Frey, Johannes, et al. "Benchmarking the abilities of large language models for RDF knowledge graph creation and comprehension: How well do LLMs speak Turtle?" arXiv preprint arXiv:2309.17122 (2023).