
MACHINE LEARNING AND INFORMATION EXTRACTION FROM DATA (222MI)

Academic year 2018/2019

Period
First semester
Credits
9
Duration
72
Type of learning activity
Related/supplementary
Curriculum
[IN20+1+ - Ord. 2016] computer science applications
Shared teaching
Shared with: IN20 - 585SM - MACHINE LEARNING AND DATA ANALYTICS
Syllabus 
Language of instruction

English

Learning objectives

Knowledge and understanding.
Know the main kinds of problems that can be tackled with ML, DM, and EC, including those concerning text, natural language, and recommendation.
Know the main ML and DM techniques; know the high-level working scheme of EAs.
Know the design, development, and assessment phases of a ML system; know the main assessment metrics and procedures suitable for a ML system.

Applying knowledge and understanding.
Formulate a formal problem statement for simple practical problems in order to tackle them with ML, DM, or EC techniques.
Develop simple end-to-end ML or DM systems.
Experimentally assess a simple end-to-end ML or DM system.

Making judgements.
Judge the technical soundness of a ML or DM system.
Judge the technical soundness of the assessment of a ML or DM system.

Communication skills.
Describe, both in written and oral form, the motivations behind choices in the design, development, and assessment of a ML or DM system, possibly exploiting simple plots.

Learning skills.
Retrieve information from scientific publications about ML, DM or EC techniques not explicitly presented in this course.

Prerequisites

Basics of statistics: basic graphical tools of data exploration; summary measures of variable distribution (mean, variance, quantiles); fundamentals of probability and of univariate and multivariate distribution of random variables; basics of linear regression analysis.
Basics of linear algebra: vectors, matrices, matrix operations; diagonalization and singular value decomposition.
Basics of programming and data structures: algorithm, data types, loops, recursion, parallel execution, tree.

Contents

Part T1
Introduction to data science; data analytics, machine learning, and statistical learning approaches: common and distinctive aspects (increasingly different in name only).
Recap of the main concepts and tools of probability and statistical inference.
Elements of statistical learning; regression function; assessing model accuracy and the bias-variance trade-off; cross-validation methods.
Supervised learning and linear models; model validation and selection; hints to regularization and extensions.
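The cross-validation methods listed in Part T1 can be illustrated with a minimal sketch. This is not course material: the function names and the contiguous-fold partitioning scheme are our own illustrative choices.

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(n, k):
    """Yield (train_idx, test_idx) pairs: each fold is held out once."""
    folds = kfold_indices(n, k)
    for i, test in enumerate(folds):
        train = [j for fi, f in enumerate(folds) if fi != i for j in f]
        yield train, test
```

Each of the k models is fit on the train indices and scored on the held-out fold; averaging the k scores estimates the test error discussed in the bias-variance material.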

Part M1
Definitions of Machine Learning and Data Mining; why ML and DM are hot topics; examples of applications of ML; phases of design, development, and assessment of a ML system; terminology.
Elements of data visualization.
Supervised learning.
Tree-based methods.
Decision and regression trees: learning and prediction; role of the parameter and overfitting.
Trees aggregation: bagging, Random Forest, boosting.
Supervised learning system assessment: k-fold cross-validation; accuracy and other metrics; metrics for binary classification (FPR, FNR, EER, AUC) and ROC curves.
Support Vector Machines (SVM).
Separating hyperplane: maximal margin classifier; support vectors; learning as an optimization problem; maximal margin classifier limitations.
Soft margin classifier: learning, role of the parameter C.
Non-linearly separable problems; kernels: brief background and main options (linear, polynomial, radial); intuition behind the radial kernel; SVM with kernels.
Multiclass classification with SVM.
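The binary classification metrics listed above (accuracy, FPR, FNR) reduce to simple ratios over the confusion-matrix counts. A minimal sketch, with illustrative function and key names of our own choosing:

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, FPR, and FNR from 0/1 true and predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "FPR": fp / (fp + tn) if fp + tn else 0.0,  # false positive rate
        "FNR": fn / (fn + tp) if fn + tp else 0.0,  # false negative rate
    }
```

Sweeping a decision threshold and plotting the resulting (FPR, TPR) pairs gives the ROC curve; the area under it is the AUC.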

Part T2
Supervised learning for classification.
Training and test error rate; the Bayes classifier.
Logistic regression.
Linear and quadratic discriminant analysis.
The K-nearest neighbors classifier.
Unsupervised learning.
Dimensionality reduction methods: principal component analysis; biplot.
Cluster analysis: hierarchical methods, partitional methods (k-means algorithm).
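The k-means algorithm mentioned above alternates an assignment step and a centroid-update step (Lloyd's iteration). A plain-Python sketch; the fixed iteration count and seed are illustrative choices, not course prescriptions:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm on points given as equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = tuple(sum(x) / len(cl) for x in zip(*cl))
    return centroids, clusters
```

On two well-separated groups of points, a few iterations suffice for the partition to stabilize.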

Part M2
Text and natural language applications (text mining).
Sentiment analysis; features for text mining; common pre-processing steps; topic modeling.
Recommender systems.
Content-based filtering; collaborative filtering.
Assessment metrics: precision, recall, accuracy@K, diversity, serendipity.
Evolutionary Computation (EC).
High-level working scheme of an Evolutionary Algorithm (EA); terminology.
Generational model; selection criteria; exploration/exploitation trade-off; genetic operators with examples; fitness function; multi-objective optimization and Pareto dominance; debugging of an evolutionary search; EA issues (diversity, variational inheritance, expressiveness); fitness landscape.
Examples of common EAs: GA, GP, GE.
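The high-level working scheme of a generational EA can be sketched on the toy OneMax problem (maximize the number of 1s in a bit string). A hedged, minimal illustration; the tournament size, rates, and all names are our own choices:

```python
import random

def one_max(bits):
    """Fitness function: number of 1s in the individual."""
    return sum(bits)

def evolve(n_bits=20, pop_size=30, gens=50, seed=42):
    rng = random.Random(seed)
    # Random initial population of bit strings.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(gens):
        nxt = []
        while len(nxt) < pop_size:
            # Selection: size-3 tournament for each parent.
            p1 = max(rng.sample(pop, 3), key=one_max)
            p2 = max(rng.sample(pop, 3), key=one_max)
            # Genetic operators: one-point crossover, then bit-flip mutation.
            cut = rng.randrange(1, n_bits)
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < 1 / n_bits else b for b in child]
            nxt.append(child)
        pop = nxt  # generational replacement: offspring replace parents
    return max(pop, key=one_max)

best = evolve()
```

The selection pressure (tournament size) and the mutation rate control the exploration/exploitation trade-off discussed above.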

Teaching methods

Lectures with blackboard and slide projection; exercises, under the teacher's supervision, in tackling simple problems with ML or DM techniques.

Assessment methods

Final exam according to one of the following two options (student’s choice):

Written test + project (the final mark is the average of the two marks).
Written test with questions on theory and application with short open answers.
Project (home assignment) in which the student chooses a problem from a closed, teacher-defined set of problems and proposes a solution based on ML, DM, or EC techniques. The expected outcome is a written document (a few pages) including: the problem statement; one or more performance indexes able to capture a solution's ability to solve the problem; a description of the proposed solution from the algorithmic point of view; the results and a discussion of the experimental assessment of the solution with, if applicable, information about the data used. Students may work in groups on the project. The project will also be evaluated for clarity.

Written test only.
Written test with questions on theory and application with medium- and short-length open answers.

Reference texts

Kenneth A. De Jong. Evolutionary Computation: A Unified Approach. MIT Press, 2006.
Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer, 2009.
Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning, with Applications in R. Springer Texts in Statistics. Springer, 2014.

