Abstract
Predicting the crystal symmetry of a compound from its chemical composition alone has remained challenging. Several machine-learning approaches can be employed, but the predictive value of popular crystallographic databases is relatively modest because of data paucity and the uneven distribution of entries across the 230 space groups. In this work, we compiled virtually all crystallographic information available to science and used it to train and test multiple machine-learning models. Composition-driven random-forest classification relying on a large set of descriptors exhibited the best performance. Models that predict the crystal system, Bravais lattice, point group and space group of inorganic compounds with high accuracy are released into the public domain.