
Enhancing Data Harmonization with LLMs

The goal of the thesis is to explore the use of Large Language Models (LLMs) to enhance data harmonization related to Knowledge Graphs (KGs).



Master thesis project

A Knowledge Graph (KG) is a structured representation of information that captures relationships between entities in a way that is both human-readable and machine-interpretable. RDF (Resource Description Framework) is a W3C standard model for data interchange on the web, in which information is expressed as subject-predicate-object triples. Creating RDF Knowledge Graphs can be tedious, since it involves integrating heterogeneous data sources, cleaning data, developing ontologies, and aligning, mapping and validating data.
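To make the RDF and data mapping steps more concrete, here is a minimal Python sketch. It assumes two hypothetical sources that describe sensors using different column names, maps both onto the same properties, and serializes the result as N-Triples, a line-based RDF format. The `http://example.org/` namespace, the `Sensor` class and the column mappings are illustrative assumptions. The properties come from the public schema.org vocabulary. Only the standard library is used here, whereas a real pipeline could instead use an RDF library or a mapping language such as RML.

```python
# Minimal sketch: mapping heterogeneous tabular records to RDF triples
# serialized as N-Triples. The namespace, class and column mappings are
# hypothetical examples for illustration only.

EX = "http://example.org/"       # hypothetical base namespace
SCHEMA = "https://schema.org/"   # public vocabulary used for properties
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Two sources describe the same kind of entity with different column names.
source_a = [{"id": "s1", "sensor_name": "Temp sensor 1", "loc": "Oslo"}]
source_b = [{"ID": "s2", "Name": "Humidity sensor", "City": "Trondheim"}]

# Hand-written mappings from each source's columns to ontology properties.
mapping_a = {"sensor_name": SCHEMA + "name", "loc": SCHEMA + "location"}
mapping_b = {"Name": SCHEMA + "name", "City": SCHEMA + "location"}


def literal(value: str) -> str:
    """Serialize a plain string as an N-Triples literal, escaping specials."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_triples(rows, id_column, mapping, rdf_class):
    """Yield (subject, predicate, object) triples for each source row."""
    for row in rows:
        subject = f"<{EX}sensor/{row[id_column]}>"
        yield (subject, f"<{RDF_TYPE}>", f"<{rdf_class}>")
        for column, prop in mapping.items():
            if column in row:
                yield (subject, f"<{prop}>", literal(row[column]))


def to_ntriples(triples) -> str:
    """Join triples into N-Triples lines of the form 'subject predicate object .'"""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


sensor_class = EX + "Sensor"  # hypothetical class IRI
triples = list(to_triples(source_a, "id", mapping_a, sensor_class))
triples += list(to_triples(source_b, "ID", mapping_b, sensor_class))
print(to_ntriples(triples))
```

After mapping, both sources use the same properties, so their records can be queried together even though their original column names differ.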


Research topic focus

  • Investigate how Large Language Models (LLMs) can enhance traditional Knowledge Graph creation and data alignment techniques.
  • Analyse current tools and approaches, and identify relevant use cases within projects at SINTEF Digital.
  • Set up tools for data mapping, alignment and transformation that enable the selected use cases to exchange data according to the identified standard ontologies and data models.
  • Extend these tools with LLMs to support data harmonization tasks.
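One way an LLM might support the alignment step is by proposing which ontology property each source column corresponds to. The following Python sketch illustrates that idea under explicit assumptions: the thesis does not prescribe an approach, `llm` stands for any function that takes a prompt and returns text, `fake_llm` is a hypothetical stub standing in for a real model, and the target vocabulary is a small illustrative subset. The key design choice is that the model's answer is validated against the known vocabulary, so suggestions outside that vocabulary never enter the Knowledge Graph and are instead flagged for human review.

```python
import json
from typing import Callable

# Hypothetical subset of a target ontology, with short descriptions
# that the model can use for matching.
TARGET_PROPERTIES = {
    "https://schema.org/name": "Human-readable name of the entity",
    "https://schema.org/location": "Where the entity is located",
    "https://schema.org/dateCreated": "Date the entity was created",
}


def build_prompt(columns, targets):
    """Ask the model to map each source column to one target property IRI."""
    lines = ["Map each source column to the best matching property IRI.",
             "Answer only with a JSON object {column: iri or null}.",
             "Source columns:"]
    lines += [f"- {c}" for c in columns]
    lines.append("Target properties:")
    lines += [f"- {iri}: {desc}" for iri, desc in targets.items()]
    return "\n".join(lines)


def propose_mapping(columns, targets, llm: Callable[[str], str]):
    """Query the model, keep only suggestions found in the target vocabulary,
    and flag every other column for human review."""
    raw = json.loads(llm(build_prompt(columns, targets)))
    if not isinstance(raw, dict):
        raw = {}  # unexpected answer shape: send everything to review
    accepted, needs_review = {}, []
    for column in columns:
        iri = raw.get(column)
        if iri in targets:
            accepted[column] = iri
        else:
            needs_review.append(column)
    return accepted, needs_review


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call. It returns a canned answer
    that includes one IRI outside the target vocabulary."""
    return json.dumps({
        "Sensor Name": "https://schema.org/name",
        "Site": "https://schema.org/location",
        "Installed": "https://schema.org/installationDate",  # not in TARGET_PROPERTIES
    })


accepted, review = propose_mapping(["Sensor Name", "Site", "Installed"],
                                   TARGET_PROPERTIES, fake_llm)
print(accepted)
print(review)
```

With the stub above, `Sensor Name` and `Site` are accepted, while `Installed` is routed to review because its suggested IRI is not among the target properties. Replacing `fake_llm` with a call to an actual model would leave the validation logic unchanged.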

Expected results and learning outcome

Upon completing the thesis, the student should have a better understanding of, and practical experience with, Large Language Models (LLMs) and Knowledge Graphs (KGs), including how LLMs can be applied to data harmonization tasks.

Qualifications

Candidates should have a good understanding of data engineering, vocabularies and ontologies, and semantic technologies such as RDF Knowledge Graphs.
