Large-Scale Human Mobility Simulations*


The evaluation of privacy-preserving techniques for LBS is often based on simulations of mostly random user movements that only partially capture real deployment scenarios. Our results show that, compared to the context-aware simulator, the random user movement simulator leads to significantly different results for a spatial-cloaking algorithm, under-protecting in some cases and over-protecting in others [1]. Indeed, with a context simulator it is possible to design models for agents, places and context; for example, one can define particular places of aggregation and let users dynamically choose which place to reach and how long to stay there. This behavior, which is not modeled by mostly random user movement simulators, can significantly affect the results of the spatial-cloaking algorithm.

In our research we created several datasets of simulated user movements using a customized version of the SIAFU agent-based context-aware simulator [2]. Each simulation is specifically designed for a particular family of LBSs. On this page we briefly describe some of these datasets and make them publicly available. For a more detailed description of the context simulator we used to generate the datasets, please see [1].

The map and the other parameters common to all the simulations

We ran our simulations on the road network of Milan. Very detailed digital vector maps of the city were generously provided by the municipality of Milan (Ufficio Sistema Informativo Territoriale del Comune di Milano).

The simulation includes a total of 30,000 home buildings, 10,000 office buildings and 1,000 entertainment places; the first two values are tied to the number of inhabitants of Milan considered in the simulation, while the third is based on real data from public sources, which also provide the geographical distribution of the places. Note that home and office buildings both follow Gaussian distributions: home buildings are more concentrated in the outskirts of the city, while office buildings are more concentrated in the center.
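
As a rough illustration of the placement described above, the following Python sketch samples building positions on a 17x17 km square (the extent given in the parameter table). The page does not say how the Gaussians are parameterized, so the standard deviations, the ring radius used for home buildings, and the function names are all assumptions made for this example; it is not the procedure used to build the datasets.

```python
import math
import random

MAP_SIDE_KM = 17.0           # side of the simulated area, from the parameter table
CENTRE = MAP_SIDE_KM / 2     # map centre on both axes


def _inside(x, y):
    return 0 <= x <= MAP_SIDE_KM and 0 <= y <= MAP_SIDE_KM


def sample_office(rng, sigma_km=3.0):
    """Office building: 2D Gaussian around the centre (sigma_km is an assumption)."""
    while True:  # reject samples that fall outside the map
        x, y = rng.gauss(CENTRE, sigma_km), rng.gauss(CENTRE, sigma_km)
        if _inside(x, y):
            return x, y


def sample_home(rng, ring_km=6.0, sigma_km=2.0):
    """Home building: Gaussian distance from the centre, peaked on a ring.

    ring_km and sigma_km are assumptions chosen only to push homes outwards.
    """
    while True:
        r = abs(rng.gauss(ring_km, sigma_km))
        theta = rng.uniform(0, 2 * math.pi)
        x, y = CENTRE + r * math.cos(theta), CENTRE + r * math.sin(theta)
        if _inside(x, y):
            return x, y


def mean_distance_from_centre(points):
    return sum(math.hypot(x - CENTRE, y - CENTRE) for x, y in points) / len(points)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
offices = [sample_office(rng) for _ in range(10_000)]
homes = [sample_home(rng) for _ in range(30_000)]
```

With these assumed parameters, `mean_distance_from_centre(homes)` comes out larger than `mean_distance_from_centre(offices)`, which is the qualitative property stated in the text.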

Following the study reported in [3], we fixed the average speed of users moving by car at 20 km/h. We also fixed the average speed of users moving on foot at 3.6 km/h.
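
To give a feel for what these average speeds imply, the short Python sketch below converts them to metres per second and estimates travel times. The helper names and the 5 km example trip are hypothetical; only the two speeds come from the text.

```python
CAR_KMH = 20.0    # average speed by car, from the text
FOOT_KMH = 3.6    # average speed on foot, from the text


def kmh_to_ms(speed_kmh):
    """Convert km/h to m/s."""
    return speed_kmh * 1000 / 3600


def travel_minutes(distance_km, speed_kmh):
    """Minutes needed to cover distance_km at a constant average speed."""
    return distance_km / speed_kmh * 60


# Hypothetical 5 km trip between two places in the city.
car_minutes = travel_minutes(5.0, CAR_KMH)
foot_minutes = travel_minutes(5.0, FOOT_KMH)
print(f"car:  {kmh_to_ms(CAR_KMH):.2f} m/s, {car_minutes:.0f} min for 5 km")
print(f"foot: {kmh_to_ms(FOOT_KMH):.2f} m/s, {foot_minutes:.0f} min for 5 km")
```

The walking speed corresponds to about one metre per second, which is convenient when reasoning about how far a pedestrian agent moves between two simulation instants.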




Road network                            Milan road network
Area of the simulated environment       17x17 km
Map resolution                          1000x1000 (17 m)
Geographic coordinates of the map       North-East corner: 45.547086 9.283593
                                        South-East corner: 45.382612 9.283593
                                        South-West corner: 45.382612 9.054025
Number of home buildings                30,000
Distribution of home buildings          Gaussian, more concentrated in the outskirts
Number of office buildings              10,000
Distribution of office buildings        Gaussian, more concentrated in the center
Number of entertainment places          1,000
Distribution of entertainment places    Derived from public sources
Average speed by car                    20 km/h
Average speed on foot                   3.6 km/h
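
When working with the datasets it can be useful to relate geographic coordinates to the 1000x1000 map grid. The Python sketch below does this by linear interpolation between the corner coordinates listed above. It assumes that the grid is aligned with those corners, that row 0 is the northern edge, and that the map projection can be ignored at this scale; the function name `to_cell` and the sample point are hypothetical, and the actual dataset format may use a different convention.

```python
# Bounding box taken from the corner coordinates in the parameter list.
NORTH, SOUTH = 45.547086, 45.382612   # latitude of the northern and southern edges
EAST, WEST = 9.283593, 9.054025       # longitude of the eastern and western edges
GRID = 1000                           # cells per side (map resolution 1000x1000)


def to_cell(lat, lon):
    """Map a (lat, lon) pair to a (row, col) cell of the grid.

    Assumption: row 0 is the northern edge and column 0 the western edge.
    """
    if not (SOUTH <= lat <= NORTH and WEST <= lon <= EAST):
        raise ValueError("point lies outside the simulated area")
    row = int((NORTH - lat) / (NORTH - SOUTH) * GRID)
    col = int((lon - WEST) / (EAST - WEST) * GRID)
    # Points exactly on the southern or eastern edge belong to the last cell.
    return min(row, GRID - 1), min(col, GRID - 1)


# Hypothetical point roughly in the middle of the simulated area.
row, col = to_cell(45.4642, 9.1900)
print(row, col)
```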

The MilanoByNight simulation

In this simulation, we considered a typical deployment scenario for a friend-finder service: a large number of young people using the service on a weekend night in a large city like Milan. We carried out an in-depth study of the parameters characterizing this scenario, drawing on different sources, including online surveys.

All of the agents' choices are modeled with probability distributions. For this specific data generation, some of the important parameters of the simulation are:

  • Source and destination. These are the locations essential to define movements. They may be homes or entertainment places. Some places in some districts are more popular than others.

  • StartingTime. The time at which a user leaves her home to go to the first entertainment place.

  • Permanence. How long will a user stay at one entertainment place?

  • NumPlaces. How many entertainment places will a user visit on one night?
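
The four parameters above can be pictured as a per-agent plan for the night. The following Python sketch draws such a plan at random. Every distribution and constant in it (the Gaussian start time, the weights for the number of places, the Gaussian permanence, the popularity weights) is a placeholder chosen for illustration; the survey-based distributions actually used in the simulation are not given on this page.

```python
import random
from dataclasses import dataclass


@dataclass
class NightPlan:
    start_minute: int   # StartingTime, in minutes after the simulation start
    stops: list         # one (place, permanence_minutes) pair per visited place


def sample_plan(rng, places, popularity):
    """Draw one agent's plan; all distributions here are illustrative assumptions."""
    # StartingTime: Gaussian around 90 minutes, clamped to a 0-300 minute window.
    start = int(min(max(rng.gauss(90, 45), 0), 300))
    # NumPlaces: between one and three places, with assumed weights.
    num_places = rng.choices([1, 2, 3], weights=[0.5, 0.35, 0.15])[0]
    # Destination: more popular places are chosen more often.
    # Drawn with replacement, so the same place may appear twice.
    chosen = rng.choices(places, weights=popularity, k=num_places)
    # Permanence: Gaussian around 75 minutes, at least 10 minutes.
    stops = [(place, int(max(rng.gauss(75, 30), 10))) for place in chosen]
    return NightPlan(start_minute=start, stops=stops)


rng = random.Random(1)  # fixed seed for reproducibility
plan = sample_plan(rng, ["bar", "club", "cinema"], popularity=[3, 2, 1])
print(plan)
```

Keeping the random generator as an explicit argument makes it easy to reproduce a given population of agents from a seed.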

In order to have a realistic model of these distributions, we prepared a survey to collect real user data. We are still collecting data, but the parameters used in the simulation are based on interviews with more than 300 people in our target category.

Available datasets:

  • DataSet 1

    • Number of agents: 100,000

    • Simulation duration: 6 hours (from 7pm to 1am)

    • Interval between two consecutive instants: 2 minutes
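
The figures above give a quick estimate of the dataset size. The Python sketch below counts the simulation instants and the resulting number of position samples, assuming one sample per agent per instant. Whether the final instant at 1am is included is not stated, so the `include_end` flag and the helper names are assumptions.

```python
AGENTS = 100_000          # number of agents in DataSet 1
DURATION_MIN = 6 * 60     # 7pm to 1am
STEP_MIN = 2              # interval between two consecutive instants


def count_instants(duration_min, step_min, include_end=False):
    """Number of sampled instants; include_end also counts the closing instant."""
    return duration_min // step_min + (1 if include_end else 0)


def clock_labels(start_hour, duration_min, step_min):
    """Wall-clock labels (HH:MM) for each instant, excluding the closing one."""
    labels = []
    for offset in range(0, duration_min, step_min):
        total = start_hour * 60 + offset
        labels.append(f"{(total // 60) % 24:02d}:{total % 60:02d}")
    return labels


instants = count_instants(DURATION_MIN, STEP_MIN)
samples = AGENTS * instants   # assumes one position sample per agent per instant
labels = clock_labels(19, DURATION_MIN, STEP_MIN)
print(instants, samples, labels[0], labels[-1])
```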


[1] Sergio Mascetti, Dario Freni, Claudio Bettini, Sean Wang, and Sushil Jajodia. On the Impact of User Movement Simulations in the Evaluation of LBS Privacy-Preserving Techniques. In Proc. of the 1st International Workshop on Privacy in Location-Based Applications (PiLBA). 2008.

[2] M. Martin, P. Nurmi. A generic large scale simulator for ubiquitous computing. In Proc. of the 3rd Annual International Conference on Mobile and Ubiquitous Systems, Networking & Services. IEEE Computer Society. 2006.

[3] Traffic characteristics for the estimation of pollutant emissions from road transport. Technical report. Institut national de recherche sur les transports et leur sécurité (INRETS). 2006.


* This project was partially supported by the National Science Foundation (NSF) under grant N. CNS-0716567, and by the Italian MIUR under grant InterLink II04C0EC1D.