Laboratory on Algorithms for Big Data a.a. 2016/17

 
  • CFU: 6
  • Period: First semester
  • Language: English
  • Lectures schedule: Tuesday 14-16 (A1) and  Thursday 16-18 (A1).
  • Question time: After lectures or by appointment.
  • Allowed Degree: Master degree in Computer Science, or Computer Science and Networking, or Business Informatics.

Goals and opportunities

The course introduces advanced algorithms and data structures of practical interest. The students will encouraged in their projects to implement, test and compare these techniques on real datasets.

The course will provide the opportunity of

  • facing with difficult algorithmic problems of practical interest involving big data;
  • evaluating the impact of efficient algorithmic solutions in the design of software managing big data;
  • implementing advanced software by using powerful and sophisticated libraries;
  • getting in touch with some companies for internships, scholarships, or thesis proposals.

Exam

The exam consists of two parts: Project and its presentation 70% – Written/oral test (or a serious attempt to compete in any online (algorithmic) contest) 30%.
Students can attend the written/oral test before the presentation/development of the project.

Background

If you wish to refresh your mind on basic Algorithms and Data Structures, I suggest you to look at the well-known book Introduction to Algorithms by Cormen-Leiserson-Rivest-Stein, third edition.

References

  • Introduction to Algorithms,  3rd Edition, Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, The MIT Press, 2009 (Amazon) [CCLR]
  • Algorithms, 4th Edition, Robert Sedgewick, Kevin Wayne, Addison-Wesley Professional, 2011 (Amazon) [RS]
  • Algorithms, Sanjoy Dasgupta, Christos Papadimitriou, Umesh Vazirani, McGraw-Hill, 2006. (Amazon) [DPZ]
  • Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Michael Mitzenmacher and Eli Upfal, Cambridge University Press, 2005 (Amazon) [MU]
  • Programming Challenges: The Programming Contest Training Manual, Steven S. Skiena, Miguel A. Revilla, Springer, 2003 (Amazon)
  • The C++ Standard Library: A Tutorial and Reference (2nd Edition), Nicolai M. Josuttis, Addison-Wesley Professional, 2012 (Amazon)
  • The C++ Programming Language, 4th Edition, Bjarne Stroustrup, Addison-Wesley Professional, 2013 (Amazon)

Interesting links

Project: Time series indexing

Description and Datasets: small and large.

Lectures

Most of the content of the course will be covered by notes, sometime we’ll use parts of papers/books.

Date Lecture References
20/09/2016 Introduction Slides
22/09/2016 Efficiency and profiling Slides and references therein. Video CppCon 2016. Video CppCon 2015.
27/09/2016 Dictionary problem: Hashing. Slides. Hashing Chapter 11 [CCLR].
29/09/2016 Dictionary problem: Cuckoo hashing and perfect hashing. Slides. Perfect Hashing Chapter 11.5 [CCLR]. Cuckoo Hashing.
04/10/2016 Dictionary problem: Minimal perfect hashing. Slides. C++ implementation of Minimal perfect hash functions.
06/10/2016 Succinct data structures: rank/select and Elias-Fano representation. SlidesC++ code.
11/10/2016 Autocompletion system. Slides. SDSL library.
13/10/2016 Succinct tree representations: LOUDS, BP, and DFUDS. Slides.
18/10/2016 Range Maximum Query and TopK retrieval. Slides.
20/10/2016 Discussion about the project.
08/11/2016 Overview on SIMD vector extensions. Slides.
10/11/2016 Problem solving: Sorting I. Problems: Towers, Little Robber Girl’s Zoo, Finding Team Member, and Megacity.
15/11/2016 Problem solving: Sorting II. Problems: Find Pair, Two Heapsand Inversion CountSolutions.
17/11/2016 Problem solving: (Static) Prefix sum. Problems: Ilya and QueriesAlice, Bob and ChocolateNumber of Ways, and Little Girl and Maximum Sum. Solutions.
22/11/2016 Problem solving: (Static) Prefix sum II.
24/11/2016 Problem solving: Prefix sum I. Fenwick tree: descriptiontutorial,  video, and code.
Problem: Update the array.
29/11/2016 Problem solving: Prefix sum II. Problems: Pashmak and Parmida’s problemNested Segments, and Propagating tree.
01/12/2016 Problem solving: Prefix sum III. Solutions
06/12/2016 Problem solving: Graph algorithms. [CCLR] Chapter 22.
Problems: Fox and Names and Learning Languages.
07/12/2016 Problem solving: Dynamic Programming I. [CCLR] Chapter 15.
Problems: 0-1 Knapsack Problem, IWGBS – 0110SS, and Woodcutters.
13/12/2016 Problem solving: Dynamic Programming II. Unweighted shortest path and unweighted longest simple path on DAG, longest increasing subsequence and largest independent set on tree.