Enhancing Data Harmonization with LLMs

The goal of the thesis is to explore the use of Large Language Models (LLMs) to enhance data harmonization related to Knowledge Graphs (KGs).

Contact persons

An Ngoc Lam, Research Scientist
Brian Elvesæter, Senior Research Scientist

Master thesis project

A Knowledge Graph (KG) is a structured representation of information that captures relationships between entities in a way that is both human-readable and machine-interpretable. RDF (Resource Description Framework) is a standard model for data interchange on the web. Creating RDF Knowledge Graphs can be tedious because it involves integrating heterogeneous data sources, data cleaning, ontology development, data alignment, data mapping and validation.
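To make the RDF model concrete, the following minimal sketch turns one tabular record into subject-predicate-object triples and prints them in N-Triples syntax. The `http://example.org/` namespace, the record fields and the property names are illustrative assumptions, not part of any SINTEF data model; a real pipeline would more likely use an RDF library and a mapping language such as RML.

```python
# Illustrative sketch: the namespace, record fields and property names are assumptions.
EX = "http://example.org/"


def iri(local_name: str) -> str:
    """Wrap a local name in the example namespace as an N-Triples IRI."""
    return f"<{EX}{local_name}>"


def literal(value: str) -> str:
    """Serialize a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def record_to_triples(record: dict) -> list:
    """Map one source record to (subject, predicate, object) triples."""
    subject = iri(f"sensor/{record['id']}")
    return [
        (subject, iri("name"), literal(record["name"])),
        (subject, iri("locatedIn"), iri(f"site/{record['site']}")),
    ]


def to_ntriples(triples) -> str:
    """Render triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


record = {"id": "s1", "name": "Temperature probe", "site": "oslo"}
triples = record_to_triples(record)
print(to_ntriples(triples))
```

Each printed line is one edge of the graph: the sensor is linked to its name by a literal and to its site by another IRI, which is what allows records from different sources to be joined once they share identifiers and vocabulary.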

Research topic focus

Investigate how to enhance traditional Knowledge Graph creation and data alignment techniques using Large Language Models (LLMs).
Analyse current tool approaches and identify relevant use cases within projects at SINTEF Digital.
Set up tools to support data mapping, alignment and transformation, so that selected use cases can exchange data according to the identified standard ontologies and data models.
Enhance these tools by incorporating LLMs to support data harmonization tasks.
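One way an LLM could assist the alignment step is sketched below: the model is asked to map source column names to properties of a target vocabulary, and its answer is accepted only where it names a known property. This is a hypothetical illustration, not the project's method. The `ex:` properties, the prompt wording, and the `fake_llm` stand-in (used here instead of a real model call) are all assumptions.

```python
import json
from typing import Callable, Dict, List, Optional

# Hypothetical target vocabulary; a real project would load terms from the chosen ontology.
TARGET_PROPERTIES = {
    "ex:name": "Human-readable label of the entity",
    "ex:latitude": "WGS84 latitude in decimal degrees",
    "ex:longitude": "WGS84 longitude in decimal degrees",
}


def build_prompt(columns: List[str], properties: Dict[str, str]) -> str:
    """Describe the alignment task and request a JSON object as the answer."""
    vocab = "\n".join(f"- {term}: {desc}" for term, desc in properties.items())
    return (
        "Map each source column to one target property, or null if none fits.\n"
        f"Target properties:\n{vocab}\n"
        f"Source columns: {json.dumps(columns)}\n"
        "Answer with a JSON object from column name to property."
    )


def suggest_mapping(
    columns: List[str],
    properties: Dict[str, str],
    llm: Callable[[str], str],
) -> Dict[str, Optional[str]]:
    """Ask the model for a mapping and keep only suggestions found in the vocabulary."""
    raw = llm(build_prompt(columns, properties))
    try:
        suggestion = json.loads(raw)
    except json.JSONDecodeError:
        suggestion = {}
    if not isinstance(suggestion, dict):
        suggestion = {}
    mapping: Dict[str, Optional[str]] = {}
    for column in columns:
        term = suggestion.get(column)
        # Unknown or malformed terms become None and are left for manual review.
        mapping[column] = term if isinstance(term, str) and term in properties else None
    return mapping


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns one invalid term on purpose."""
    return json.dumps({"navn": "ex:name", "lat": "ex:latitude", "lon": "ex:madeUpTerm"})


mapping = suggest_mapping(["navn", "lat", "lon"], TARGET_PROPERTIES, fake_llm)
print(mapping)
```

Checking the model's output against the vocabulary keeps invented terms out of the resulting mapping, so that a human only needs to review the columns the model could not align.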

Expected results and learning outcome

After successfully submitting the thesis, the student should have a better understanding of, and practical experience with, Large Language Models (LLMs) and Knowledge Graphs (KGs), and of how LLMs can be used in data harmonization tasks.

Qualifications

Candidates should have a good understanding of data engineering, vocabularies and ontologies, and semantic technologies such as RDF Knowledge Graphs.

References

Hofer, Marvin, Johannes Frey, and Erhard Rahm. "Towards Self-Configuring Knowledge Graph Construction Pipelines Using LLMs: A Case Study with RML." Fifth International Workshop on Knowledge Graph Construction @ ESWC 2024. Vol. 3718. 2024. https://kg-construct.github.io/workshop/2024/resources/paper8.pdf
Frey, Johannes, et al. "Benchmarking the Abilities of Large Language Models for RDF Knowledge Graph Creation and Comprehension: How Well Do LLMs Speak Turtle?" arXiv preprint arXiv:2309.17122 (2023).
