Machine Learning
A.A. 2020/21 (First Semester)
Course Teacher – Alessio Micheli
Seminar Classes on Bayesian Learning
Teacher – Davide Bacciu
Aims – Introduction to Bayesian Learning: maximum likelihood hypothesis, MAP and Bayesian hypotheses. Representing (conditional) independence between random variables: Bayesian networks and plate notation. Parameter learning in Bayesian networks: ML and Expectation Maximization (EM). Application examples.
Students’ Office Hours – Tue. 14-16 (email contact for confirmation)
Course Book
[AIMA] Russell, S. and Norvig, P. Artificial Intelligence: A Modern Approach. Prentice Hall Series in Artificial Intelligence, 2003.
Chapter 20 “Statistical Learning Methods” – Available online here
[MML] Mitchell, T. Machine Learning. McGraw Hill, 1997.
Chapter 6 “Bayesian Learning”
A freely available online book that can serve both as a course reference and as a way to explore the course contents in more depth is
David Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press, 2012.
The chapters of interest for these seminars are 1 to 5 and 8 to 12.
All the official material for the seminars (including slides) can be found on the ML course website on Moodle. This page complements that information: for each class, it provides a selection of additional readings alongside the references to the course books.
Software
David Barber’s book is distributed with Matlab code (BRML toolbox) showing examples of Bayesian learning. An archive of the most recent software distribution can be downloaded here. It should also run seamlessly in Octave, an open-source environment largely compatible with Matlab.
An excellent Matlab package for rapidly building Bayesian models and networks is the Bayes Net Toolbox (BNT) by Kevin Murphy.
Further Matlab demos are provided as additional material in the lecture calendar section.
Lecture calendar
Lecture 1 – Introduction to Bayesian Learning
Course book – [AIMA] Sect. 20.1; [MML] Sect. 6.1-6.3, Sect. 6.5-6.9
Further readings – [5] Chapt. 1-3
Software – [BRML toolbox] Functions demoBurglar.m and demoChestClinic.m show demos of probabilistic inference on Bayesian networks (cf. Example 3.1 and Fig. 3.15 in [5])

Lecture 2 – Parameter Learning in Bayesian Networks: Learning with Complete Data
Course book – [AIMA] Sect. 20.2-20.3; [MML] Sect. 6.4, Sect. 6.5, 6.10, 6.12
Further readings – [1] Generative vs. Discriminative; [2] Tutorial on maximum likelihood estimation with Matlab code; [5] Sect. 8.8, 9.1-9.3, Chapt. 10. A step-by-step derivation of the NB prior learning rule can be found here. For those interested in learning more about Lagrange multipliers, here is a good technical source.
Software – [demoNB] A demo showing Naive Bayes learning on the 20 Newsgroup dataset.

Lecture 3 – Parameter Learning in Bayesian Networks: the EM Algorithm
Course book – [MML] Sect. 6.11
Further readings – [3] EM algorithm; [5] Chapt. 11; [4] Comparative analysis of structure learning algorithms
Software – [BoW Demo] Tutorial code with a bag-of-words application to images using Naive Bayes and Probabilistic Latent Semantic Analysis (PLSA).
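As a small illustration of the Lecture 1 topics (maximum likelihood, MAP and Bayesian hypotheses), the following Python sketch compares the three on a finite hypothesis space. It is not part of the official course material: the hypothesis values in `theta`, the `prior`, the observed counts and all function names are illustrative assumptions chosen so that the ML and MAP hypotheses differ.

```python
# Illustrative sketch (not course material): ML vs. MAP vs. Bayesian prediction
# over a finite hypothesis space. All numbers below are assumed for the example.

def likelihood(theta, n_pos, n_neg):
    """P(data | h) for i.i.d. binary outcomes, where h says P(positive) = theta."""
    return (theta ** n_pos) * ((1.0 - theta) ** n_neg)

def posterior(prior, likelihoods):
    """Bayes' rule: normalise prior * likelihood over all hypotheses."""
    unnormalised = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(unnormalised)
    return [u / z for u in unnormalised]

# Hypothetical hypotheses: each one fixes the probability of a positive outcome.
theta = [0.0, 0.25, 0.5, 0.75, 1.0]
prior = [0.1, 0.2, 0.4, 0.2, 0.1]

# Assumed observations: 3 positive outcomes and 1 negative outcome.
n_pos, n_neg = 3, 1

lik = [likelihood(t, n_pos, n_neg) for t in theta]
post = posterior(prior, lik)

# ML hypothesis: maximises P(data | h), ignoring the prior.
h_ml = theta[max(range(len(theta)), key=lambda i: lik[i])]

# MAP hypothesis: maximises P(h | data), i.e. prior times likelihood.
h_map = theta[max(range(len(theta)), key=lambda i: post[i])]

# Bayesian prediction: average the hypotheses' predictions, weighted by posterior.
p_next_pos = sum(t * p for t, p in zip(theta, post))

print("ML hypothesis:   theta =", h_ml)
print("MAP hypothesis:  theta =", h_map)
print("Bayesian P(next outcome is positive) =", round(p_next_pos, 4))
```

With these assumed numbers the prior pulls the MAP choice towards the hypothesis with the largest prior mass, while the Bayesian prediction does not commit to any single hypothesis and falls between the ML and MAP values.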
Bibliography
[1] Pernkopf, F. and Bilmes, J., Discriminative versus generative parameter and structure learning of Bayesian network classifiers. Proceedings of the 22nd international conference on Machine learning. ACM. 2005.
[2] I. J. Myung, Tutorial on maximum likelihood estimation, Journal of Mathematical Psychology, Vol. 47, No. 1 (2003), pp. 90-100.
[3] Bilmes, J.A., A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models, Technical Report, 1998.
[4] Ioannis Tsamardinos, Laura E. Brown, Constantin F. Aliferis: The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning 65(1): 31-78, 2006.
[5] David Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press, 2012.