A dataset of labelled device Wi-Fi probe requests for MAC address de-randomization
Pintor, L.; Atzori, L.
2022-01-01
Abstract
Probe requests are management frames emitted by devices that perform active scanning to find nearby Access Points to connect to. These messages can be captured and analysed to implement device-counting algorithms. However, the use of random MAC addresses to protect users' privacy challenges these algorithms, which must then perform address de-randomization (i.e., cluster the frames that come from the same source device by analysing distinguishing features). Datasets of labelled probe requests are needed to develop efficient de-randomization algorithms. Our dataset contains 20-minute captures collected in both isolated and "noisy" environments. Twenty-two different devices produced data in six different modes, which combine settings for display status, Wi-Fi connection, and power saving. For each mode, we captured three channels simultaneously, for a total of 315 non-empty files. A Raspberry Pi captured the messages through a sniffing algorithm specifically designed to generate this dataset. We then filtered the data by deleting the messages from known sources and by applying power thresholds that exploit the burst structure of the probe requests. To the best of our knowledge, there are no other available datasets with labelled probe requests. This kind of dataset allows a more accurate analysis of the behaviour of individual devices in different modes, as well as the training and testing of algorithms that count devices through probe requests in the presence of random MAC addresses.
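The burst-based power filtering mentioned above can be illustrated with a minimal Python sketch. It is not the authors' sniffing or filtering code: the `Probe` record, the `group_bursts` and `filter_by_power` helpers, the 1-second gap, and the -60 dBm threshold are all hypothetical choices made only for illustration. The sketch assumes that frames from one MAC address arriving close together in time form a burst, and that a burst is kept when its mean received power is above a threshold.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Probe:
    """Hypothetical record for one captured probe request."""
    timestamp: float  # seconds since the start of the capture
    mac: str          # source MAC address (possibly randomized)
    rssi: int         # received signal strength in dBm


def group_bursts(probes, max_gap=1.0):
    """Group frames with the same MAC into bursts.

    A frame joins the current burst of its MAC when it arrives at most
    `max_gap` seconds after the previous frame of that burst (assumed value).
    """
    bursts = []
    current = {}  # mac -> the burst (list) currently open for that MAC
    for probe in sorted(probes, key=lambda p: p.timestamp):
        burst = current.get(probe.mac)
        if burst is not None and probe.timestamp - burst[-1].timestamp <= max_gap:
            burst.append(probe)
        else:
            burst = [probe]
            bursts.append(burst)
            current[probe.mac] = burst
    return bursts


def filter_by_power(bursts, min_mean_rssi=-60.0):
    """Keep only bursts whose mean RSSI reaches the assumed threshold."""
    return [b for b in bursts if mean(p.rssi for p in b) >= min_mean_rssi]


if __name__ == "__main__":
    capture = [
        Probe(0.00, "aa", -40),
        Probe(0.10, "aa", -42),
        Probe(0.05, "bb", -80),  # weak frame, likely a distant device
        Probe(5.00, "aa", -45),  # same MAC, but a new burst
    ]
    kept = filter_by_power(group_bursts(capture))
    print(len(kept), "bursts kept")
```

Thresholding on a whole burst rather than on single frames is one way to use the burst structure: a single noisy reading does not decide whether the device under test or a distant device produced the frames.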