The paper "Towards Active Learning Interfaces for Multi-Inhabitant Activity Recognition" by Claudio Bettini and Gabriele Civitarese has been accepted at the 16th Workshop on Context and Activity Modeling and Recognition (CoMoReA), affiliated with the 18th IEEE PerCom conference, which will take place in Austin, Texas, from the 23rd to the 27th of March.

Invited Lecture: Large Scale Data Storage and Processing on Google's Distributed Systems



17/04/2019 09:00, Aula Magna, Dipartimento di Informatica, Via Celoria 18, Milano, Italy

(This is an open lecture, but it is also part of the master's course in "Distributed and Pervasive Systems" taught by Prof. Claudio Bettini)


Lecturer: Dario Freni, Google London




Organizing the world's information and making it universally accessible and useful requires technologies that are able to handle petabytes of data quickly and reliably. This talk focuses on three crucial aspects of Google's infrastructure: storage, processing and reliability. We will present popular technologies within Google, giving an overview of their principles and main use cases. We will cover distributed storage solutions including GFS [1] (distributed file system), Bigtable [2] (distributed multi-dimensional sorted map), Spanner [3] and F1 [4] (globally distributed databases). Processing solutions that will be covered include MapReduce [5], Flume [6] (distributed processing of batch data), and MillWheel [7] (distributed processing of streaming data). These technologies are the building blocks of the publicly available platform named Cloud Dataflow [8], which will also be covered during this talk.


All papers are available from


[1] Sanjay Ghemawat et al.: The Google file system. SOSP 2003: 29-43

[2] Fay Chang et al.: Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comput. Syst. 26(2) (2008)

[3] James C. Corbett et al.: Spanner: Google's Globally Distributed Database. ACM Trans. Comput. Syst. 31(3): 8 (2013)

[4] Jeff Shute et al.: F1: A Distributed SQL Database That Scales. PVLDB 6(11): 1068-1079 (2013)

[5] Jeffrey Dean, Sanjay Ghemawat: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1): 107-113 (2008)

[6] Craig Chambers et al.: FlumeJava: easy, efficient data-parallel pipelines. PLDI 2010: 363-375

[7] Tyler Akidau et al.: MillWheel: Fault-Tolerant Stream Processing at Internet Scale. PVLDB 6(11): 1033-1044 (2013)

[8] Tyler Akidau et al.: The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing. PVLDB 8(12): 1792-1803 (2015)




Dario Freni works at Google on the Android Framework, focusing on making the Android OS easier to update. Previously, he led one of the Play Console teams, focusing on providing clear analytics metrics and app health tools to app developers. Prior to that, he was a tech lead of one of the Ads Site Reliability Engineering teams, specializing in fast and reliable large-scale data processing pipelines.


Prior to joining Google in 2011, Dario completed his Ph.D. in computer science at Università degli Studi di Milano (Italy).
