Programming For Data Science

Academic year 2021/2022

Course ID: NEU0264C
Teacher: Prof. Elena Grassi (Lecturer)
Modular course: DataScience (NEU0264)
Year: 1st year
Teaching period: First semester
Type: Distinctive
Credits/Recognition: not stated
Course disciplinary sector (SSD): INF/01 - Informatica (Computer Science)
Delivery: Traditional (in person)
Language: English
Attendance: Obligatory
Type of examination: Written, with an optional oral exam

Course objectives

The course introduces how to work effectively with command-line interfaces (using the Linux shell) and the basic concepts of computer programming (using R). It also explains how to structure pipelines built from individual steps implemented with either approach, and covers some technical aspects of scientific reproducibility (package management systems and containers). These fundamental skills serve as the starting point for the Bioinformatics lessons.

Learning outcomes

By the end of the course, students will be able to work independently in the Linux shell and write simple R scripts for basic data wrangling, selecting the most appropriate data structures to visualize and interpret different datasets. They will also be able to approach higher-level problems by dividing them into simpler steps that can be tackled with basic programming constructs, and to organize those steps into small pipelines. They will know the basic concepts of software containers and how to use them in the context of reproducible science.

Course delivery

Classroom lectures and practical sessions in the computer lab.

Learning assessment methods

Written test; optional oral test.

Program

Linux shell: navigating the file system and listing files; efficient operations with metacharacters. Basic operations on text files, piping commands, and redirection.
Imperative programming: variables, control flow (if/else, loops), and functions. Visualizing flowcharts and simple sorting algorithms.
Focus on R: vectors, lists, and data frames. Vectorization and the apply family of functions as alternatives to loops. Object-oriented programming in R and an overview of its ecosystem of libraries. Plotting with ggplot2 and working with notebooks.
Pipelines: organizing principles and the main pipeline management systems, with Snakemake as an example.
Reproducibility at the software level: introduction to Docker and package management systems.
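
To give a flavor of the first program item, the following is a minimal sketch of redirection, piping, and a metacharacter in the shell. It is an illustration only, not taken from the course material: the file names (genes.txt, counts.txt) and their contents are hypothetical.

```shell
#!/bin/sh
# Work in a throwaway directory so the example leaves no files behind.
# All file names and contents below are hypothetical sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Redirection: '>' sends the output of printf into a new file.
printf 'gene_b\ngene_a\ngene_b\n' > genes.txt

# Piping: '|' feeds the sorted lines to uniq, which counts repeated lines;
# the final result is redirected into a second file.
sort genes.txt | uniq -c > counts.txt

# Metacharacter: '*' expands to every file name ending in .txt.
n_files=$(ls *.txt | wc -l | tr -d ' ')

# Count the distinct names by sorting with duplicates removed.
n_unique=$(sort -u genes.txt | wc -l | tr -d ' ')

echo "text files: $n_files, distinct names: $n_unique"
```

Each command does one small job, and the pipe combines them; the same idea of composing simple steps underlies the pipeline topics later in the program.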
