Name: Computational Biology
Author: Peter Dawyndt

Computational Biology (2016–2017)

Peter Dawyndt · Universiteit Gent

Welcome to the Dodona course for the UGent module on Computational Biology (Master of Science in Computer Science and Master of Science in Statistical Data Analysis). The course contains Python3 programming exercises that support automated feedback.

You are not a member of this course.

Learning path

What's next in this course

Graph algorithms

An algorithm requiring the interpretation and manipulation of networks, or graphs. Rosalind

Status	Status	Type	Title	Class progress
			Overlap graphs
			Completing a tree
			Introduction to pattern matching
			Encoding suffix trees
			Identifying maximal repeats
			Finding the longest multiple repeat

Phylogeny

A tree modeling the evolutionary scenario deriving a collection of taxa from their proposed ancestors. Rosalind

Status	Status	Type	Title	Class progress
			Completing a tree
			Culling the forest
			Creating a distance matrix
			Paths in trees
			Inferring genotype from a pedigree

BWT and FM-indices for pattern matching

Status	Type	Title	Class progress	Status	Actions
		Construction of Burrows-Wheeler transformed string
		BWT-based read mapping

Hidden Markov Models

Hidden Markov models. Rosalind

Solve the exercises of the HMM track on Rosalind. We will use the matrix visualization developed during the project to add the remaining exercises to Dodona with newly generated test cases and sometimes with some slight modifications to the problem statement. Solutions to these exercises will be presented by the students in class. You might find some inspiration about using HMM in the following Jupyter Notebook demos:

Status	Type	Title	Class progress	Status	Actions
		Probability of a hidden path
		Probability of an outcome given a hidden path
		Viterbi decoding
		Outcome likelihood

Alignment

read chapter How do we compare DNA sequences? in the book by Compeau & Pevzner
check the Eyeless demo (Jupyter Notebook)

Status	Type	Title	Class progress	Status	Actions
		Counting point mutations
		Transitions and transversions
		Creating a distance matrix
		Edit distance
		Edit distance alignment
		Counting optimal alignments
		Global alignment with scoring matrix
		Global alignment with constant gap penalty
		Local alignment with scoring matrix
		Maximizing the gap symbols of an optimal alignment
		Multiple alignment
		Global alignment with scoring matrix and affine gap penalty
		Overlap alignment
		Semiglobal alignment
		Local alignment with scoring matrix and affine gap penalty
		Isolating symbols in alignments
		Finding a motif with modifications
		Pairwise global alignment
		Suboptimal local alignment
		Global multiple alignment

Dynamic programming

Find or design a (non-classic) dynamic programming problem that can be added to Dodona. No problems related to alignment / edit distance. Bonuses for problems that are applications in computational biology. Submit a ZIP-file to Indianio containing:

problem description (introduction - assignment - examples)
README file containing notes and references
- where did you find the assignment
- generating test cases
- non-trivial testing of solutions
your sample solution

Status	Type	Title	Class progress	Status	Actions
		Rabbits and recurrence relations
		Mortal Fibonacci rabbits
		Let's go to the movies
		Subset sum problem
		Boltzmann's tombstone
		Throwing dice
		Minimum number of jumps
		Accordion folding
		Change problem
		Matrix chain multiplication
		Bytelandian gold coins
		Counting heads
		Longest palindromic subsequence

Gene finding

read chapter 02 in the textbook: All the sequences’ men
check the Mycoplasma demo (Jupyter Notebook)

Status	Type	Title	Class progress	Status	Actions
		Transcribing DNA into RNA
		Translating RNA into protein
		Introduction to protein databases
		Finding genes with ORFs
		Protein translation

String algorithms

An algorithm involving the manipulation and properties of chains of symbols.

Status	Type	Title	Class progress	Status	Actions
		Finding a shared motif
		Perfect matchings and RNA secondary structures
		Finding a spliced motif
		Catalan numbers and RNA secondary structures
		Speeding up motif finding
		Finding a shared spliced motif
		Ordering strings of varying length lexicographically
		Maximum matchings and RNA secondary structures
		Motzkin numbers and RNA secondary structures
		Interleaving two motifs
		Introduction to pattern matching
		Finding disjoint motifs in a gene
		Encoding suffix trees
		Linguistic complexity of a genome
		Identifying maximal repeats
		Finding all similar motifs

Probability

The mathematical study of the chance of occurrence of random events, or the chance with which a specific event will occur. In preparation:

read chapter 01 in the textbook: The first look at a genome
check the Haemophilus demo (Jupyter Notebook)

Status	Type	Title	Class progress	Status	Actions
		Mendel's first law
		Introduction to random strings
		Calculating expected offspring
		Independent alleles
		Matching random motifs
		Expected number of restriction sites
		Independent segregation of chromosomes
		Counting disease carriers
		Inferring genotype from a pedigree
		Sex-linked inheritance
		The Wright-Fisher model of genetic drift
		Wright-Fisher's expected behavior
		The founder effect and genetic drift

Bioinformatics tools

If you are an accomplished coder, then you can write a separate program for every new task you encounter. In practice, these programs only need to be written once and packaged into modules, where those of us who are not great coders can use them quickly and efficiently. In this first series of exercises, we will familiarize ourselves with a sampling of some of the more popular bioinformatics tools taken from “out of the box” software. In preparation:

read chapter 00 in the textbook: Prologue
get yourself familiar with the BioPython module

Status	Type	Title	Class progress	Status	Actions
		Counting DNA nucleotides
		Transcribing DNA into RNA
		Translating RNA into protein
		Introduction to protein databases
		Finding genes with ORFs
		Protein translation
		Complementing a strand of DNA
		Computing GC-content
		Finding a motif in DNA
		Consensus and profile
		Locating restriction sites
		RNA splicing
		Enumerating k-mers lexicographically
		k-Mer composition

A warm-up challenge

Status	Type	Title	Class progress	Status	Actions
		Simple genome assembly