Implementing Dataset Distillation for Green AI: Case Studies in Health, Energy, Space, and Manufacturing
This thesis will implement dataset distillation for time series data from the health, energy, space, and manufacturing sectors, with the aim of enabling more sustainable AI applications.
Master Project Description
The growing environmental impact of artificial intelligence, particularly its computational cost and energy consumption, has sparked interest in "Green AI". One promising direction in this domain is dataset distillation: the process of creating smaller yet representative datasets that can train models efficiently with minimal loss in performance.
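One common way to make this definition concrete is the one-step formulation from Wang et al. (2018), written as a bilevel optimization problem. The notation below is chosen for this sketch and is not fixed by the project: T is the original training set, S the much smaller synthetic set, theta_0 an initial parameter vector drawn from p(theta_0), eta a learning rate, and l a training loss.

```latex
\begin{aligned}
\theta_1(\mathcal{S}) &= \theta_0 - \eta \, \nabla_{\theta}\, \ell(\mathcal{S}, \theta_0), \\
\mathcal{S}^{*} &= \arg\min_{\mathcal{S}} \;
  \mathbb{E}_{\theta_0 \sim p(\theta_0)}
  \big[ \ell\big(\mathcal{T}, \theta_1(\mathcal{S})\big) \big],
  \qquad |\mathcal{S}| \ll |\mathcal{T}|.
\end{aligned}
```

The inner line trains on the synthetic set for a single gradient step. The outer line picks the synthetic set so that the resulting model performs well on the original data.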
Research Topic Focus
- Understanding the principles and methodologies behind dataset distillation.
- Investigating the specific challenges and opportunities presented by time series data in relation to dataset distillation.
- Implementing and experimenting with dataset distillation techniques on case studies from health, energy, space, and manufacturing sectors.
- Evaluating the performance of models trained on distilled datasets compared to those trained on original datasets in terms of accuracy, efficiency, and energy consumption.
- The topic is linked to the Horizon Europe project ENFIELD on Green and Trustworthy AI.
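To give a feel for the implementation and evaluation steps listed above, here is a minimal sketch of one-step dataset distillation on a toy problem. It is an illustration under strong assumptions, not the project's method. The data is a synthetic one-dimensional regression task rather than a real time series, the model is a single weight w with prediction w * x, only the synthetic labels are learned, and all function names (make_real_data, one_step_train, distill) are hypothetical. Energy consumption is not measured here; that would require external tooling.

```python
import random


def make_real_data(n=200, true_w=3.0, noise=0.1, seed=0):
    """Toy 'original' dataset: y = true_w * x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_w * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def mse(w, xs, ys):
    """Mean squared error of the model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def one_step_train(xs, ys, lr, w0=0.0):
    """Run one gradient-descent step on the MSE, starting from w0."""
    grad = 2.0 * sum((w0 * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    return w0 - lr * grad


def distill(real_x, real_y, syn_x, inner_lr=0.5, outer_lr=0.5, steps=500):
    """Learn synthetic labels for fixed synthetic inputs syn_x.

    Objective: after ONE inner training step on (syn_x, syn_y) from w0 = 0,
    the model should have low MSE on the real data.
    """
    m = len(syn_x)
    syn_y = [0.0] * m
    for _ in range(steps):
        # Inner step: train on the current synthetic set.
        w1 = one_step_train(syn_x, syn_y, inner_lr)
        # Outer gradient: d(real loss) / d(w1).
        dloss_dw1 = 2.0 * sum(
            (w1 * x - y) * x for x, y in zip(real_x, real_y)
        ) / len(real_x)
        # With w0 = 0, w1 = inner_lr * (2/m) * sum(x_i * y_i),
        # so d(w1)/d(y_i) = inner_lr * 2 * x_i / m (chain rule below).
        syn_y = [
            y - outer_lr * dloss_dw1 * (inner_lr * 2.0 * x / m)
            for x, y in zip(syn_x, syn_y)
        ]
    return syn_y


real_x, real_y = make_real_data()
syn_x = [-1.0, 1.0]  # two synthetic points stand in for 200 real ones
syn_y = distill(real_x, real_y, syn_x)

w_distilled = one_step_train(syn_x, syn_y, lr=0.5)
w_full_one_step = one_step_train(real_x, real_y, lr=0.5)

print("MSE on real data, one step on distilled set:", mse(w_distilled, real_x, real_y))
print("MSE on real data, one step on full set:     ", mse(w_full_one_step, real_x, real_y))
```

The comparison at the end mirrors the evaluation item in the list: both models get the same training budget (one step), and both are scored on the original data. For real time series and neural models, the hand-derived gradient would be replaced by automatic differentiation, and the evaluation would also record training time and energy use.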
Expected Results
- A comprehensive understanding of dataset distillation's role in promoting Green AI.
- Effective distilled datasets for the chosen case studies, with demonstrable reduction in training costs and comparable model performance.
- Insights into the challenges and benefits of applying dataset distillation to time series data across various sectors.
Learning Outcomes
- Master the principles and techniques behind dataset distillation.
- Acquire hands-on experience in handling and distilling time series data.
- Develop a holistic understanding of the importance of Green AI and its potential benefits and challenges in real-world applications.
- Enhance problem-solving skills by tackling the complexities of diverse datasets in various sectors.
Qualifications
- Solid grounding in AI, machine learning principles, and time series analysis.
- Familiarity with relevant programming languages and tools (preferably Python).
- An analytical mindset with an interest in sustainability and Green AI.
- Previous exposure to datasets from any of the sectors of interest (health, energy, space, or manufacturing) would be advantageous.
References
- Wang, Tongzhou, et al. "Dataset distillation." arXiv preprint arXiv:1811.10959 (2018).
- Cazenavette, George, et al. "Dataset distillation by matching training trajectories." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
- Lei, Shiye, and Dacheng Tao. "A comprehensive survey to dataset distillation." arXiv preprint arXiv:2301.05603 (2023).
Contact persons/supervisors
Sagar Sen, Arda Goknil, Erik Johannes Husom