An Introduction to Applied Bioinformatics
Table of Contents
Getting started
Reading An Introduction to Applied Bioinformatics
Who should read IAB?
How to read IAB
Using the IPython Notebook
Reading list
Getting started with Biology
Getting started with Computer Science and programming
Philosophy of biology and popular science books
Need help?
Contributing to IAB
About the author
Acknowledgements
Fundamentals
Pairwise sequence alignment
What is a sequence alignment?
A simple procedure for aligning a pair of sequences
Step 1: Create a blank matrix where the rows and columns represent the positions in the sequences.
Step 2: Add values to the cells in the matrix.
Step 3: Identify the longest diagonals.
Step 4: Transcribe some of the possible alignments that arise from this process.
Why this simple procedure is too simplistic
Differential scoring of matches and mismatches
A better approach for global pairwise alignment using the Needleman-Wunsch algorithm
Stepwise Needleman-Wunsch alignment
Step 1: Create blank matrices.
Step 2: Compute $F$ and $T$.
Step 3: Transcribe the alignment.
Automating Needleman-Wunsch alignment with Python
A note on computing $F$ and $T$
Global versus local alignment
Smith-Waterman local sequence alignment
Step 1: Create blank matrices.
Step 2: Compute $F$ and $T$.
Step 3: Transcribe the alignment.
Automating Smith-Waterman alignment with Python
Differential scoring of gaps
How long does pairwise sequence alignment take?
Comparing implementations of Smith-Waterman
Analyzing Smith-Waterman run time as a function of sequence length
Conclusions on the scalability of pairwise sequence alignment with Smith-Waterman
Sequence homology searching
Defining the problem
Loading annotated sequences
Defining the problem
A complete homology search function
Reducing the runtime for database searches
Heuristic algorithms
Random reference sequence selection
Composition-based reference sequence collection
GC content
kmer content
Further optimizing composition-based approaches by pre-computing reference database information
Determining the statistical significance of a pairwise alignment
Metrics of alignment quality
False positives, false negatives, p-values, and alpha
Interpreting alignment scores in context
Exploring the limit of detection of sequence homology searches
Generalized dynamic programming for multiple sequence alignment
Progressive alignment
Building the guide tree
Generalization of Needleman-Wunsch (with affine gap scoring) for progressive multiple sequence alignment
Putting it all together: progressive multiple sequence alignment
Progressive alignment versus iterative alignment
Phylogenetic reconstruction
Why build phylogenies?
How phylogenies are reconstructed
Some terminology
Simulating evolution
A cautionary word about simulations
Visualizing trees with ete3
Distance-based approaches to phylogenetic reconstruction
Distances and distance matrices
Alignment-free distances between sequences
Alignment-based distances between sequences
Jukes-Cantor correction of observed distances between sequences
Phylogenetic reconstruction with UPGMA
Applying UPGMA from SciPy
Understanding the name
Phylogenetic reconstruction with neighbor-joining
Limitations of distance-based approaches
Bootstrap analysis
Parsimony-based approaches to phylogenetic reconstruction
How many possible phylogenies are there for a given collection of sequences?
Statistical approaches to phylogenetic reconstruction
Bayesian methods
Maximum likelihood methods
Rooted versus unrooted trees
Acknowledgements
Sequence mapping and clustering
De novo clustering of sequences by similarity
Furthest neighbor clustering
Nearest neighbor clustering
Centroid clustering
Three different definitions of OTUs
Comparing properties of our clustering algorithms
Reference-based clustering to assist with parallelization
Applications
Studying Microbial Diversity
Getting started: the feature table
Terminology
Measuring alpha diversity
Observed species (or Observed OTUs)
A limitation of OTU counting
Phylogenetic Diversity (PD)
Even sampling
Measuring beta diversity
Distance metrics
Bray-Curtis
Unweighted UniFrac
Even sampling
Interpreting distance matrices
Distribution plots and comparisons
Hierarchical clustering
Ordination
Polar ordination
Determining the most important axes in polar ordination
Interpreting ordination plots
Tools for using ordination in practice: scikit-bio, pandas, and matplotlib
PCoA versus PCA: what's the difference?
Are two different analysis approaches giving me the same result?
Procrustes analysis
Where to go from here
Acknowledgements
Exercises
Local sequence alignment exercises
Purpose
Background
Goals
Hints
Getting started
Question 1
Question 2
Question 3
Question 4
More hints
Multiple sequence alignment exercises
Purpose
Goals
Hints
Functions that you will need to complete the exercise.
Question 1
Question 2
Question 3
Question 4
Question 5
Question 6
Question 7
Back Matter
Glossary
Pairwise alignment (noun)