Laboratory on Algorithms for Big Data a.a. 2015/16

Teacher: Rossano Venturini

CFU: 6

Period: First semester

Language: English

Lectures schedule: Tuesday 1416 (B) and Thursday 1618 (A1).

Question time: After lectures or by appointment.

Allowed Degree: Master degree in Computer Science, or Computer Science and Networking, or Business Informatics.
Goals and opportunities
The course introduces advanced algorithms and data structures of practical interest. The students will encouraged in their projects to implement, test and compare these techniques on real datasets.
The course will provide the opportunity of

facing with difficult algorithmic problems of practical interest involving big data;

evaluating the impact of efficient algorithmic solutions in the design of software managing big data;

implementing advanced software by using powerful and sophisticated libraries;

getting in touch with some companies for internships, scholarships, or thesis proposals.
Exam
The exam consists of two parts: Project and its presentation 70% – Written/oral test (or a serious attempt to compete in any online (algorithmic) contest) 30%.
Students can attend the written/oral test before the presentation/development of the project.
Background
If you wish to refresh your mind on basic Algorithms and Data Structures, I suggest you to look at the wellknown book Introduction to Algorithms by CormenLeisersonRivestStein, third edition.
References
 Introduction to Algorithms, 3rd Edition, Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, The MIT Press, 2009 (Amazon) [CCLR]
 Algorithms, 4th Edition, Robert Sedgewick, Kevin Wayne, AddisonWesley Professional, 2011 (Amazon) [RS]
 Algorithms, Sanjoy Dasgupta, Christos Papadimitriou, Umesh Vazirani, McGrawHill, 2006. (Amazon) [DPZ]
 Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Michael Mitzenmacher and Eli Upfal, Cambridge University Press, 2005 (Amazon) [MU]
 Programming Challenges: The Programming Contest Training Manual, Steven S. Skiena, Miguel A. Revilla, Springer, 2003 (Amazon)
 The C++ Standard Library: A Tutorial and Reference (2nd Edition), Nicolai M. Josuttis, AddisonWesley Professional, 2012 (Amazon)
 The C++ Programming Language, 4th Edition, Bjarne Stroustrup, AddisonWesley Professional, 2013 (Amazon)
Interesting links
 Erik D Demaine, Srini Devadas, Nancy Ann Lynch, “MIT Design and Analysis of Algorithms”, MIT Algorithm course which includes video lectures
 Erik D Demaine, “Advanced data structures”, includes video lectures
 Chandler Carruth, “Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!”, CppCon 2015
 Agner Fog, “Optimizing software in C++”
 A list of links to several libraries and useful resources
Lectures
Most of the content of the course will be covered by notes, sometime we’ll use parts of papers/books.
Date  Lecture  References 

24/09/2015  Introduction  Slides 
25/09/2015  Efficiency and profiling  Slides and references therein 
01/10/2015  Trie, compact trie, patricia trie, and ternary search tree.  Slides. Sections 7.1, 7.4 and 7.5 of these notes and this Dr.Dobb’s article. 
02/10/2015  Finding topk and Range Maximum Query.  Slides. RMQ: paper and Wikipedia. 
06/10/2015  Succinct data structures: rank/select, EliasFano representation, and LOUDS.  Slides. 
08/10/2015  Succinct tree representations: LOUDS, BP, and DFUDS. Dictionary problem: Hashing.  Slides and Slides. Hashing Chapter 11 [CCLR]. 
13/10/2015  Dictionary problem: Cuckoo hashing and perfect hashing.  Slides. Perfect Hashing Chapter 11.5 [CCLR]. Cuckoo Hashing. 
15/10/2015  Dictionary problem: bloom filter. Predecessor problem: BST, Xfast and Yfast tries.  Slides and Slides. Bloom Filter. Xfast trie. Yfast trie. 
20/10/2015  Predecessor problem: Interpolation Search, Btrees, and Fractal tree index.  Slides. Interpolation Search. Btrees Chapter 18 in [CCLR]. Fractal tree index. 
22/10/2015  Predecessor problem: Scapegoat Tree. Overview on SIMD vector extensions.  Slides and Slides (latter slides by Markus Püschel). Notes on Scapegoat tree. Take a look at the lecture on vectorization by Marco Danelutto (SPM course). 
27/10/2015  Overview on SIMD vector extensions.  Slides. 
29/10/2015  Orthogonal range query: Priority search tree, Range tree, and Fractional cascading.  Slides. Priority search tree. Range tree. Fractional cascading: 1 and 2. 
10/11/2015  Dynamic programming: Fibonacci and Knapsack problem.  Notes. Knapsack. 
12/11/2015  Dynamic programming: Unweighted shortest path and unweighted longest simple path on DAG, longest increasing subsequence, largest independent set on tree, and finding the maximum rectangle of all 1s in a binary matrix.  Chapter 6 in [DPZ]. Video. 
17/11/2015  Programming challenges: Rod cutting and Weighted interval scheduling.  Slides. 
19/11/2015  Programming challenges: Pirellone and Stair.  Slides. 
24/11/2015  Programming challenges: “Is bigger smarter?” and “weights and measures”.  Slides. 
26/11/2015  Programming challenges: “Vito’s family” and “bicoloring”.  Slides. 
01/12/2015  Programming challenges: “Apple stocks” and “find duplicates”.  Slides. 
03/12/2015  Programming challenges: “Common permutation” and “Morgan and a string”.  Slides. 