Laboratory on Algorithms for Big Data a.a. 2015/16

 
  • CFU: 6
  • Period: First semester
  • Language: English
  • Lectures schedule: Tuesday 14-16 (B) and  Thursday 16-18 (A1).
  • Question time: After lectures or by appointment.
  • Allowed Degree: Master degree in Computer Science, or Computer Science and Networking, or Business Informatics.

Goals and opportunities

The course introduces advanced algorithms and data structures of practical interest. The students will encouraged in their projects to implement, test and compare these techniques on real datasets.

The course will provide the opportunity of

  • facing with difficult algorithmic problems of practical interest involving big data;
  • evaluating the impact of efficient algorithmic solutions in the design of software managing big data;
  • implementing advanced software by using powerful and sophisticated libraries;
  • getting in touch with some companies for internships, scholarships, or thesis proposals.

Exam

The exam consists of two parts: Project and its presentation 70% – Written/oral test (or a serious attempt to compete in any online (algorithmic) contest) 30%.
Students can attend the written/oral test before the presentation/development of the project.

Background

If you wish to refresh your mind on basic Algorithms and Data Structures, I suggest you to look at the well-known book Introduction to Algorithms by Cormen-Leiserson-Rivest-Stein, third edition.

References

  • Introduction to Algorithms,  3rd Edition, Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, The MIT Press, 2009 (Amazon) [CCLR]
  • Algorithms, 4th Edition, Robert Sedgewick, Kevin Wayne, Addison-Wesley Professional, 2011 (Amazon) [RS]
  • Algorithms, Sanjoy Dasgupta, Christos Papadimitriou, Umesh Vazirani, McGraw-Hill, 2006. (Amazon) [DPZ]
  • Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Michael Mitzenmacher and Eli Upfal, Cambridge University Press, 2005 (Amazon) [MU]
  • Programming Challenges: The Programming Contest Training Manual, Steven S. Skiena, Miguel A. Revilla, Springer, 2003 (Amazon)
  • The C++ Standard Library: A Tutorial and Reference (2nd Edition), Nicolai M. Josuttis, Addison-Wesley Professional, 2012 (Amazon)
  • The C++ Programming Language, 4th Edition, Bjarne Stroustrup, Addison-Wesley Professional, 2013 (Amazon)

Interesting links

Lectures

Most of the content of the course will be covered by notes, sometime we’ll use parts of papers/books.

Date Lecture References
24/09/2015 Introduction Slides
25/09/2015 Efficiency and profiling Slides and references therein
01/10/2015 Trie, compact trie, patricia trie, and ternary search tree. Slides. Sections 7.1, 7.4 and 7.5 of these notes and this Dr.Dobb’s article.
02/10/2015 Finding top-k and Range Maximum Query. Slides. RMQ: paper and  Wikipedia.
06/10/2015 Succinct data structures: rank/select, Elias-Fano representation, and LOUDS. Slides.
08/10/2015 Succinct tree representations: LOUDS, BP, and DFUDS. Dictionary problem: Hashing. Slides and Slides. Hashing Chapter 11 [CCLR].
13/10/2015 Dictionary problem: Cuckoo hashing and perfect hashing. Slides. Perfect Hashing Chapter 11.5 [CCLR]. Cuckoo Hashing.
15/10/2015 Dictionary problem: bloom filter. Predecessor problem: BST, X-fast and Y-fast tries. Slides and SlidesBloom FilterX-fast trieY-fast trie.
20/10/2015 Predecessor problem: Interpolation Search, B-trees, and Fractal tree index. SlidesInterpolation Search. B-trees Chapter 18 in [CCLR]. Fractal tree index.
22/10/2015 Predecessor problem: Scapegoat Tree. Overview on SIMD vector extensions. Slides and Slides (latter slides by Markus Püschel). Notes on Scapegoat tree. Take a look at the lecture on vectorization  by Marco Danelutto (SPM course).
27/10/2015 Overview on SIMD vector extensions. Slides.
29/10/2015 Orthogonal range query: Priority search tree, Range tree, and Fractional cascading. Slides. Priority search tree. Range tree. Fractional cascading: 1 and 2.
10/11/2015 Dynamic programming: Fibonacci and Knapsack problem. Notes. Knapsack.
12/11/2015 Dynamic programming: Unweighted shortest path and unweighted longest simple path on DAG, longest increasing subsequence, largest independent set on tree, and finding the maximum rectangle of all 1s in a binary matrix. Chapter 6 in [DPZ]. Video.
17/11/2015 Programming challenges: Rod cutting and Weighted interval scheduling. Slides.
19/11/2015 Programming challenges: Pirellone and Stair. Slides.
24/11/2015 Programming challenges: “Is bigger smarter?” and “weights and measures”. Slides.
26/11/2015 Programming challenges: “Vito’s family” and “bicoloring”. Slides.
01/12/2015 Programming challenges: “Apple stocks” and “find duplicates”. Slides.
03/12/2015 Programming challenges: “Common permutation” and “Morgan and a string”. Slides.