Abstract
Encryption on the Internet is as pervasive as ever. This has protected communications and enhanced the privacy of users. Unfortunately, at the same time malware is also increasingly using encryption to hide its operation. The detection of such encrypted malware is crucial, but the traditional detection solutions assume access to payload data. To overcome this limitation, such solutions employ traffic decryption strategies that have severe drawbacks. This paper studies the usage of encryption for malicious and benign purposes using large datasets and proposes a machine learning based solution to detect malware using connection and TLS metadata without any decryption. The classification is shown to be highly accurate with high precision and recall rates by using a small number of features. Furthermore, we consider the deployment aspects of the solution and discuss different strategies to reduce the false positive rate.