Specs2Transform: from the specification of semantic annotations of tabular data to ETL pipelines for big data
This topic concerns the design and development of solutions to transform specifications of data enrichment pipelines into executable pipelines.
When annotations are specified using a web application, the specifications must be converted into data transformation workflows that are applied to large data sets. These pipelines must run on Big Data management frameworks and support the features specific to the semantic data extension task. They must therefore address related challenges such as the following:
- Preserve the corrections made to the data sample
- Support revision of the links estimated by the algorithm (data management + ready for user interface)
- Support revision of the subsequent data extension operations for revised links (data management + ready for user interface)
- Support configuration of the hyperparameters of the data enrichment algorithms (data management + ready for user interface)
- Support a scalable (Big Data) architecture for a data extension framework based on linking
During the internship, the student will focus on one or more of the above-mentioned problems, depending on project priorities and the student’s skills and interests.