Next generation sequencing data analysis using Linux power tools
Academic year 2023/2024
- Course ID
- NEU0293A
- Teacher
- Ivan Molineris (Lecturer)
- Year
- 1st year
- Teaching period
- First semester
- Type
- Related or integrative
- Credits/Recognition
- 2
- Course disciplinary sector (SSD)
- BIO/11 - molecular biology
- Delivery
- Formal authority
- Language
- English
- Attendance
- Optional
- Type of examination
- Practice test
- Type of learning unit
- Module
- Modular course
- Applied bioinformatics (NEU0293)
- Prerequisites
- Theoretical knowledge of molecular biology concepts and high-throughput analyses such as DNA and RNA sequencing. Basic knowledge of programming notions, such as: file system, commands, variables, control flow (if/else/loops), lists, functions.
Students are expected to have mastered the concepts covered in the Programming and Bioinformatics modules of the Data Science course.
Course summary
Course objectives
The aim of the course is to provide the students with the knowledge and competences necessary to autonomously run computational analyses in a UNIX environment, with a specific focus on methods for the analysis of Next-Generation Sequencing big data.
Learning outcomes
Understanding of computing processes and the basic ideas of computer architecture.
Familiarity with the UNIX environment.
Ability to proficiently run bioinformatics tools from the command line.
Understanding of parallel computing principles and their basic application.
Program
- Computer science concepts (reviewed from the Programming for Data Science module):
- Computer architecture
- Process
- The file system
- Interface and API concept
- Structure of a Linux/UNIX system
- Exchange of data and services, servers
- Encoding: everything in bioinformatics is text
- The shell and commands
- Navigate the filesystem
- Filesystem permission system
- Unix power tools and basic programming principles
- awk
- Principles and application of parallel computing
- Next generation sequencing data analysis
- The FASTA and FASTQ files
- FastQC
- Analysis of overrepresented sequences
- Annotation of genomes and GTF
- Mapping with STAR
- The BAM format and its display
- Expression quantification
- Introduction to pseudo-alignments
- Error controls and quality assessment
Course delivery
All lessons take place in the computer lab and blend lectures with practical activities.
Learning assessment methods
The learning assessment is integrated with the other module of the course. See the relevant page.
Students should produce one integrated report concerning both modules, and there will be one integrated oral examination.
Suggested readings and bibliography