Large-Scale Human Mobility Simulations*
Abstract
The evaluation of privacy-preserving techniques for location-based services (LBS) is often based on simulations of mostly random user movements, which only partially capture real deployment scenarios. Our results show that, compared to a context-aware simulator, a random user movement simulator leads to significantly different results for a spatial-cloaking algorithm, under-protecting in some cases and over-protecting in others [1]. Indeed, a context simulator makes it possible to design models for agents, places and context; for example, one can define particular places of aggregation and let users dynamically choose which place to reach and how long to stay there. This behavior, which is not modeled by mostly random user movement simulators, can significantly affect the results of the spatial-cloaking algorithm.
In our research we created several datasets of simulated user movements using a customized version of the SIAFU agent-based context-aware simulator [2]. Each simulation is specifically designed for a particular family of LBS. On this page we briefly describe some of these datasets and make them publicly available. For a more detailed description of the context simulator we used to generate the datasets, please see [1].
The map and the other parameters common to all the simulations
We ran our simulations on the road network of Milan. Very detailed digital vector maps of the city were generously provided by the municipality of Milan (Ufficio Sistema Informativo Territoriale del Comune di Milano).
The simulation includes a total of 30,000 home buildings, 10,000 office buildings and 1,000 entertainment places; the first two values are strictly related to the considered number of inhabitants of Milan, while the third is based on real data from public sources, which also provide the geographical distribution of the places. Note that home and office buildings follow Gaussian distributions in which home buildings are more concentrated in the outskirts of the city, while office buildings are more concentrated in the center.
Following the study reported in [3], we fixed the average speed of users moving by car to 20 km/h. We also fixed the average speed of users moving on foot to 3.6 km/h.
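As a small worked example of what these two speeds imply, the sketch below converts a trip length into a travel time. The function name and the sample distances are illustrative assumptions, not part of the simulator; only the two speed values come from the text.

```python
# Average speeds stated in the text.
CAR_SPEED_KMH = 20.0
FOOT_SPEED_KMH = 3.6  # equivalent to 1 m/s


def travel_time_minutes(distance_km: float, speed_kmh: float) -> float:
    """Return the time in minutes needed to cover distance_km at speed_kmh."""
    if speed_kmh <= 0:
        raise ValueError("speed must be positive")
    return distance_km / speed_kmh * 60.0


# Hypothetical trips, chosen only for illustration.
car_trip = travel_time_minutes(5.0, CAR_SPEED_KMH)
walk = travel_time_minutes(1.2, FOOT_SPEED_KMH)
print(f"5 km by car: {car_trip:.1f} min, 1.2 km on foot: {walk:.1f} min")
```

At these speeds a 5 km car trip takes 15 minutes, which is long compared with the sampling interval of the datasets, so a single trip typically spans several recorded instants.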
Parameter | Value | View |
Road network | Milan road network | Picture |
Area of the simulated environment | 17x17 km | |
Map resolution | 1000x1000 (17 m per cell) | |
Geographic coordinates of the map | North-East corner: 45.547086 9.283593; South-East corner: 45.382612 9.283593; South-West corner: 45.382612 9.054025 | Google Maps |
Number of home buildings | 30,000 | |
Distribution of home buildings | Gaussian, more concentrated in the outskirts | Picture |
Number of office buildings | 10,000 | |
Distribution of office buildings | Gaussian, more concentrated in the center | Picture |
Number of entertainment places | 1,000 | |
Distribution of entertainment places | Derived from public sources | Picture |
Average speed by car | 20 km/h | |
Average speed on foot | 3.6 km/h | |
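The corner coordinates and the 1000x1000 map resolution listed in the table suggest a simple way to translate a map cell into geographic coordinates. The sketch below uses plain linear interpolation between the corners. It assumes that row 0 lies on the northern edge, that column 0 lies on the western edge, and that a cell is represented by its center; none of these conventions is stated in the source, and the datasets may use a different one.

```python
# Bounding box taken from the corner coordinates in the table.
NORTH_LAT, SOUTH_LAT = 45.547086, 45.382612
EAST_LON, WEST_LON = 9.283593, 9.054025
GRID_SIZE = 1000  # cells per side


def cell_to_latlon(row: int, col: int) -> tuple:
    """Map a grid cell to the (lat, lon) of its center.

    Assumed convention (not from the source): row 0 is the northernmost
    row and column 0 is the westernmost column.
    """
    if not (0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE):
        raise ValueError("cell outside the map")
    lat = NORTH_LAT - (row + 0.5) / GRID_SIZE * (NORTH_LAT - SOUTH_LAT)
    lon = WEST_LON + (col + 0.5) / GRID_SIZE * (EAST_LON - WEST_LON)
    return lat, lon


lat, lon = cell_to_latlon(500, 500)
print(f"cell (500, 500) is at about {lat:.6f}, {lon:.6f}")
```

Linear interpolation ignores the curvature of the Earth, which seems acceptable for an area of this size but is a simplification.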
The MilanoByNight simulation
In this simulation, we considered a typical deployment scenario for a friend-finder service: a large number of young people using the service on a weekend night in a large city like Milan. We carried out an in-depth study of the parameters characterizing this scenario, using different sources, including online surveys.
All probabilities related to agents' choices are modeled with probability distributions. For this specific data generation, some of the important parameters of the simulation are:
- Source and destination. These are the locations essential to define movements. They may be homes or entertainment places. Some places in some districts are more popular than others.
- StartingTime. The time at which a user leaves her home to go to the first entertainment place.
- Permanence. How long will a user stay at one entertainment place?
- NumPlaces. How many entertainment places will a user visit on one night?
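To make the role of these parameters concrete, the sketch below draws StartingTime, NumPlaces and Permanence for one agent. Every distribution and number in it (the Gaussian shapes, means, spreads and weights) is a placeholder assumption chosen for illustration; the actual distributions used in the simulation were derived from the survey and are not reproduced here. The choice of source and destination places is left out.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class NightPlan:
    start_minute: int        # StartingTime, in minutes after 7pm
    permanences: List[int]   # Permanence at each visited place, in minutes


def sample_plan(rng: random.Random) -> NightPlan:
    """Draw one agent's plan from placeholder distributions (assumptions)."""
    # StartingTime: assumed Gaussian around 9pm, clipped to the first 5 hours.
    start = min(max(round(rng.gauss(120, 45)), 0), 300)
    # NumPlaces: assumed weights for visiting 1, 2 or 3 places.
    num_places = rng.choices([1, 2, 3], weights=[0.5, 0.35, 0.15])[0]
    # Permanence: assumed Gaussian around 90 minutes, at least 15 minutes.
    permanences = [max(15, round(rng.gauss(90, 30))) for _ in range(num_places)]
    return NightPlan(start_minute=start, permanences=permanences)


rng = random.Random(42)  # fixed seed so the draw is reproducible
plan = sample_plan(rng)
print(plan)
```

NumPlaces is implicit in the length of the `permanences` list, which keeps the number of visits and the per-visit durations consistent by construction.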
To obtain a realistic model of these distributions, we prepared a survey to collect data from real users. We are still collecting data, but the parameters used in the simulation are based on interviews with more than 300 people in our target category.
Available datasets:
- DataSet 1
  - Number of agents: 100,000
  - Simulation duration: 6 hours (from 7pm to 1am)
  - Interval between two consecutive instants: 2 minutes
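The duration and sampling interval of DataSet 1 determine how many instants the dataset covers. The sketch below enumerates them. The calendar date is a placeholder, and whether the final instant at 1am is recorded is not stated in the source, so the function exposes that choice as a hypothetical `include_end` flag.

```python
from datetime import datetime, timedelta

START = datetime(2000, 1, 1, 19, 0)  # 7pm; the date itself is a placeholder
DURATION = timedelta(hours=6)
STEP = timedelta(minutes=2)


def instants(start, duration, step, include_end=False):
    """List the sampling instants of a simulation run.

    include_end is a hypothetical switch: the source does not say
    whether the closing instant (1am) is part of the dataset.
    """
    intervals = duration // step  # timedelta // timedelta yields an int
    count = intervals + 1 if include_end else intervals
    return [start + i * step for i in range(count)]


times = instants(START, DURATION, STEP)
print(len(times), times[0].time(), times[-1].time())
```

With 6 hours sampled every 2 minutes there are 180 intervals, so if every one of the 100,000 agents had one position per instant the dataset would hold on the order of 18 million records; that per-agent layout is an assumption, not something the source specifies.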
Bibliography
[1] Sergio Mascetti, Dario Freni, Claudio Bettini, Sean Wang, and Sushil Jajodia. On the Impact of User Movement Simulations in the Evaluation of LBS Privacy-Preserving Techniques. In Proc. of the 1st International Workshop on Privacy in Location-Based Applications (PiLBA). 2008.
[2] M. Martin and P. Nurmi. A generic large scale simulator for ubiquitous computing. In Proc. of the 3rd Annual International Conference on Mobile and Ubiquitous Systems, Networking & Services. IEEE Computer Society. 2006.
[3] Traffic characteristics for the estimation of pollutant emissions from road transport. Technical report. Institut national de recherche sur les transports et leur sécurité, 2006.
* This project was partially supported by the National Science Foundation (NSF) under grant N. CNS-0716567, and by the Italian MIUR under grant InterLink II04C0EC1D.