Applied bioinformatics

Academic year 2023/2024

Course ID
NEU0293
Teachers
Ivan Molineris (Lecturer)
Davide Marnetto (Lecturer)
Year
2nd year
Teaching period
Second semester
Type
Related or integrative
Credits/Recognition
4
Course disciplinary sector (SSD)
BIO/11 - molecular biology
Delivery
Traditional
Language
English
Attendance
Optional
Type of examination
Practical test
Prerequisites
Theoretical knowledge of molecular biology concepts and high-throughput analyses such as DNA and RNA sequencing. Basic knowledge of programming notions, such as: file system, commands, variables, control flow (if/else/loops), lists, functions.

Students must have mastered the concepts covered in the "Programming for Data Science" and "Bioinformatics" modules of the "Data Science" course.

Course summary

Course objectives

The aim of the course is to give students the tools they need to run computational analyses autonomously, with a specific focus on methods for analyzing Next-Generation Sequencing (NGS) big data. It is therefore designed as a natural continuation of the "Programming for Data Science" and "Bioinformatics" modules of the "Data Science" course. The first module covers the Bash command-line interface, the most commonly used environment in bioinformatics, introducing basic Bash tools and using NGS bioinformatics tools as case studies. In the second module, students will learn how to integrate such tools into a computational pipeline managed by Snakemake, a Python-based workflow manager, completing the competences needed to reach the course objectives.

Learning outcomes

Knowledge of the Linux/Unix Bash command-line interface, which is fundamental for most big data analyses, especially but not exclusively in bioinformatics. Understanding of the principles behind integrating the modular steps of a computational analysis into a complex pipeline. Knowledge of the Snakemake workflow management framework and of the basic concepts of Conda and Python. Knowledge of commonly used bioinformatics tools for the analysis of NGS data.

Ability to apply and integrate this knowledge to build bioinformatics pipelines that address biological questions using NGS data, and to transfer it to other problems. Ability to organize and independently develop computational pipelines for the analysis of biology-derived big data, making judgements about the available and necessary computational resources. Autonomy in the use and integration of computational tools to analyze NGS data.

Knowledge of the vocabulary needed to communicate with computer science professionals within the scope of the covered topics; ability to formulate biological problems from a computational perspective and to communicate algorithmic solutions.

Improved ability to learn new programming languages, thanks to a basic knowledge of the underlying principles and to analogies with already known languages, frameworks and conditions.

Program

Module 1

  1. Computer science concepts (reviewed from the Programming for Data Science module):
    1. Computer architecture
    2. Process
    3. The file system
    4. Interface and API concept
    5. Structure of a Linux/Unix system
    6. Exchange of data and services, servers
    7. Encoding: everything in bioinformatics is text
  2. The shell and commands
    1. Navigate the filesystem
    2. Filesystem permission system
  3. Unix power tools and basic programming principles
    1. awk
    2. Principles and application of parallel computing
  4. Next generation sequencing data analysis
    1. The FASTA and FASTQ file formats
    2. FastQC
    3. Analysis of overrepresented sequences
    4. Annotation of genomes and GTF
    5. Mapping with STAR or Bowtie
    6. The BAM format and its display
    7. Expression quantification or peak calling
  5. Error controls and quality assessment

Module 2

  • Pipeline organizing principles, introduction to Snakemake. Conda environments and portability. Installation of Conda and Snakemake.
  • Introduction to rules (input, output, shell) and rule dependencies. A first pipeline of two example rules.
  • Snakemake options and wildcards. Testing and debugging the example pipeline.
  • Pipeline automation: wildcards, expand, “all” rules. FASTQ quality control rules.
  • Pipeline generalization and configuration files. Rules to map FASTQ files and obtain BAM files.
  • Advanced pipelines with parameters, output attributes and rule priorities. Alignment quality control rules.
  • Exploiting computational resources: parallelization and memory resources. Expression quantification rules.
  • Snakemake is Python: Python basics, functions as input. Rules for the analysis of gene expression.

Course delivery

The course will be held entirely in the computer room, alternating short lectures with long hands-on practical sessions in which students implement what has been explained.

Learning assessment methods

Take-home practical tests in which students analyze data using bioinformatics pipelines and Unix power tools.

A report describing and commenting on the practical test (procedures and results).

The code and the report must be submitted a few days before the exam.

The exam will consist of an oral discussion of the code and the report.



Suggested readings and bibliography



Teaching Modules

Last update: 11/04/2024 11:21