# DataMod Programme and Pre-proceedings

Download the DataMod 2017 pre-proceedings as a comprehensive ZIP archive. Individual papers can be downloaded through the links below.

### Monday, 4 September 2017 [Preliminary Programme]

**10.15 – 10.30 – Opening**

**10.30 – 11.30 – Invited Talk – Bruno Lepri –** “Understanding and rewiring cities using big data” (abstract)

**11.30 – 12.00 –** Peter Carmichael and Charles Morisset: Learning Decision Trees from Synthetic Data Models for Human Security Behaviour (pre-proceedings paper)

**12.00 – 12.30 –** Michele D’Andreagiovanni, Fabrizio Baiardi, Jacopo Lipilini, Salvatore Ruggieri and Federico Tonelli: Sequential Pattern Mining for ICT Risk Assessment and Prevention (pre-proceedings paper)

**12.30 – 14.00 – Lunch**

**14.00 – 15.00 – Invited Talk – Siobhán Clarke –** “Exploring change planning in open, complex systems” (abstract)

**15.00 – 15.30 –** Sélinde van Engelenburg, Marijn Janssen and Bram Klievink: What belongs to context? A definition, a criterion and a method for deciding on what context-aware systems should sense and adapt to (pre-proceedings paper)

**15.30 – 16.00 –** Paul Griffioen, Rob Christiaanse and Joris Hulstijn: Controlling Production Variances in Complex Business Processes (pre-proceedings paper)

**16.00 – 16.30 – Coffee Break**

**16.30 – 17.00 –** Giovanna Broccia, Paolo Milazzo and Peter Csaba Ölveczky: An Algorithm for Simulating Human Selective Attention (pre-proceedings paper)

**17.00 – 17.30 –** Oana Andrei and Muffy Calder: Temporal Analytics for Software Usage Models (pre-proceedings paper)

### Tuesday, 5 September 2017 [Preliminary Programme]

**09.00 – 10.00 – Tutorial – Paolo Milazzo –** “On DataMod approaches to systems analysis” (abstract)

**10.00 – 10.30 – Coffee Break**

**10.30 – 11.00 –** Michael Backenköhler and Verena Wolf: Student performance prediction and optimal course selection: An MDP approach (pre-proceedings paper)

**11.00 – 11.30 –** Lucia Nasti and Paolo Milazzo: A computational model of Internet addiction phenomena in social networks (pre-proceedings paper)

**11.30 – 12.30 – Invited Talk – Simone Tini –** “Applications of weak behavioral metrics in probabilistic systems” (abstract)

**12.30 – 14.00 – Lunch**

**14.00 – 14.30 – Panel Discussion**

**14.30 – 15.00 –** Ilya Zakirzyanov, Anatoly Shalyto and Vladimir Ulyantsev: Finding all minimum-size DFA consistent with given examples: SAT-based approach (pre-proceedings paper)

**15.00 – 15.30 –** Alberto Castellini and Giuditta Franco: Multivariate time-series segmentation for immunological data analysis (presentation report)

**15.30 – 16.00 –** Vashti Galpin: Modelling security in Software Defined Networks (abstract)

**16.00 – 16.15 – Closure**

### Abstracts of Invited Talks and Tutorial

**Invited Talk – Bruno Lepri – “Understanding and rewiring cities using big data”**

**Abstract:** In the last decades, cities have been largely acknowledged as complex and emergent systems as opposed to top-down planned entities. Thus, a new city science is emerging that aims at an empirical analysis of urbanization processes. However, there is an evident lack of understanding of the dynamics that regulate people's interactions, their relationship with urban characteristics, and their influence on the socio-economic outcomes of cities. Nowadays, massive streams of human behavioural data and urban data, combined with increased analytical capabilities, are creating unprecedented possibilities for understanding global patterns of human behaviour and for helping researchers to better understand relevant problems for cities and also whole societies. For example, by analysing the digital traces people leave every day (e.g., mobile phone and social media data, credit card transactions, etc.), researchers have been able, among other things, to estimate the socio-economic status of territories, to monitor the vitality of urban areas and to predict neighbourhood crime levels. In my keynote talk, I describe some recent work where we have leveraged data from public sources (e.g., national census, household surveys, cadastral data) and from commercial entities (e.g., Foursquare, mobile phone data, credit card transactions, Google Street View images, etc.) in order (i) to infer how vital and liveable a city is, (ii) to find the urban conditions (e.g., mixed land use, mobility, safety perception, etc.) that magnify and influence urban life, (iii) to study their relationship with societal outcomes such as poverty, criminality, innovation and segregation, and (iv) to envision data-driven guidelines for helping policy makers to respond to the demands of citizens. Our results open the door for a new research framework to study and to understand cities, and societies, by means of computational tools (i.e. machine learning approaches) and novel sources of data able to describe human life with an unprecedented breadth, scale and depth. Finally, I also describe and discuss several potential policy applications of our results.

**Invited Talk – Siobhán Clarke – “Exploring change planning in open, complex systems”**

**Abstract:** Modern, complex systems are likely to execute in open environments (e.g., applications running over the Internet of Things), where changes are frequent and have the potential to cause significant negative consequences for the application. A better understanding of the data in the environment will enable applications to better plan for change and remain resilient in the face of loss of data sources through, for example, mobility or battery loss. This talk explores our recent work on models for change planning in such open, complex systems. The approaches include static, multi-layer system and change modelling, through to multi-agent systems that learn and adapt to changes in the environment, and finally collaborative models for emergent behaviour detection, and for resource sharing. I discuss the work in the context of smart city applications, such as transport, energy and emergency response.

**Invited Talk – Simone Tini – “Applications of weak behavioral metrics in probabilistic systems”**

**Abstract:** Large-scale distributed and concurrent systems require formal specification and verification methods which capture both their qualitative and quantitative properties. Quantitative (or random) phenomena occur whenever the behaviour of a system is not deterministic and the uncertainty can be quantified. They arise in nearly every system either by construction or from the physical properties of the system and its environment. Probability is one of the most important measures of uncertainty and has become indispensable in several areas, such as networks, data mining, security, artificial intelligence, embedded systems, bioinformatics and many more.

Probabilistic process algebras, such as probabilistic CCS and CSP, are languages that are employed to describe probabilistic concurrent communicating systems, or probabilistic processes for short. Typically, they are obtained as extensions of classical process algebras by adding suitable operators allowing us to express probability distributions over sets of possible events or behaviours.

The operational approach has been shown to be very useful for giving semantics of concurrent systems. The most general operational model employed to describe the behavior of probabilistic processes is that of nondeterministic probabilistic labeled transition systems, or PTSs for short, which were originally introduced as probabilistic automata by Segala. PTSs allow us to model the reactive system behaviour, nondeterministic choices and probabilistic choices.

The shortcoming of operational semantics is that it is too concrete, because a PTS may contain many states that intuitively should be approximately identified. Behavioural metric semantics provide formal notions to compare probabilistic systems. They give us a notion of behavioural distance that characterizes how far apart the behaviour of two systems is. Behavioural distances can be viewed as a refinement of the notion of behavioural equivalence, which only characterizes whether two systems behave in the same way or not (according to what can be distinguished by an external observer). Bisimulation metrics proposed by Desharnais et al. are the quantitative analogue to probabilistic bisimulation equivalences and assign to each pair of processes a distance in the interval [0, 1] which measures the proximity of their quantitative properties. The distances form a pseudometric with bisimilar processes at distance 0. In particular, weak bisimulation metrics are based on the weak bisimulation game: two states s and s′ in a PTS are at distance ε ∈ [0, 1] iff a transition from process s to distribution π can be mimicked by a weak transition from s′ to distribution π′ such that the distance between the distributions π and π′ is at most ε, and conversely.

In this talk we consider the notion of weak simulation quasimetric as the asymmetric variant of weak bisimulation metric which maintains most of the properties of the original definition. However, our asymmetric version is particularly suitable to reason about protocols where the systems under consideration are not approximately equivalent. As a main application, we adopt our simulation theory in a simple probabilistic timed process calculus to derive an algebraic theory to evaluate the performance of gossip protocols.

In order to specify and verify systems in a compositional manner, it is necessary that the behavioral semantics is compatible with all operators of the language that describe these systems. In the probabilistic setting, the intuitive idea is that two systems that are close according to the considered notion of distance should be approximately inter-substitutable: Whenever a system s in a language context C[s] is replaced by a close system s′, the obtained context C[s′] should be close to C[s]. In other words, there should be some relation between the behavioral distance between s and s′ and the behavioral distance between C[s] and C[s′] so that any limited change in the behavior of a subcomponent s implies a smooth and limited change in the behavior of the composed system C[s]. We consider uniform continuity proposed by Gebler et al. as a property guaranteeing the compatibility of the metric semantics with language operators.

We investigate the compositionality of both weak bisimilarity metric and weak similarity quasimetric semantics with respect to a variety of standard process algebra operators. We show how these compositionality results can be successfully used to conduct compositional reasoning to estimate the performance of group key update protocols in a multicast setting.

**Tutorial – Paolo Milazzo – “On DataMod approaches to systems analysis”**

**Abstract:** In this tutorial we describe a framework for the classification of methodologies for the analysis of behavioral properties of complex dynamical systems. The considered approaches span from purely data-driven ones, in which systems are treated essentially as black boxes, to model-based ones, in which the internal logic of the systems under study is completely known. Several different levels of knowledge of the systems' behavioral mechanisms are included in this span, and different analysis techniques can be applied at different levels (e.g. data mining, machine learning, simulation and model checking). Then, we focus on methodologies making a synergistic use of information and/or of analysis techniques typical of different levels of knowledge (e.g. process mining, statistical model checking and applications of machine learning in formal verification). We show that the framework can be used to reason about the characteristics of such methodologies (that we call “DataMod” approaches).