5.2 Glossary

5.2.1 Pairwise alignment (noun)

A hypothesis about which bases or amino acids in two biological sequences are derived from a common ancestral base or amino acid. By definition, the aligned sequences are of equal length, with gaps (usually denoted with -, or with . for terminal gaps) indicating hypothesized insertion or deletion events. A pairwise alignment may be represented as follows:

ACC---GTAC
CCCATCGTAG
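
The properties in this definition can be checked column by column. The following is a minimal sketch, assuming the two aligned sequences are plain Python strings; `summarize_alignment` and its `gap_chars` parameter are hypothetical names used for illustration and are not part of scikit-bio.

```python
def summarize_alignment(seq1, seq2, gap_chars="-."):
    """Count matching, mismatching, and gapped columns in a pairwise alignment."""
    # By definition, aligned sequences must have the same length.
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must be of equal length")

    summary = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(seq1, seq2):
        if a in gap_chars or b in gap_chars:
            # A gap in either sequence is a hypothesized insertion or deletion.
            summary["gap"] += 1
        elif a == b:
            summary["match"] += 1
        else:
            summary["mismatch"] += 1
    return summary


alignment_summary = summarize_alignment("ACC---GTAC", "CCCATCGTAG")
print(alignment_summary)  # {'match': 5, 'mismatch': 2, 'gap': 3}
```

Unequal-length inputs raise a ValueError, because two strings of different lengths cannot be a pairwise alignment under this definition.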

5.2.2 kmer (noun)

A kmer is a word (a run of adjacent characters) of length k within a sequence. For example, the overlapping kmers of length 5 (k = 5) in the sequence ACCGTGACCAGTTACCAGTTTGACCAA are as follows:

In [1]:
import skbio
skbio.DNA('ACCGTGACCAGTTACCAGTTTGACCAA').kmer_frequencies(k=5, overlap=True)
Out[1]:
{'ACCAA': 1,
 'ACCAG': 2,
 'ACCGT': 1,
 'AGTTA': 1,
 'AGTTT': 1,
 'CAGTT': 2,
 'CCAGT': 2,
 'CCGTG': 1,
 'CGTGA': 1,
 'GACCA': 2,
 'GTGAC': 1,
 'GTTAC': 1,
 'GTTTG': 1,
 'TACCA': 1,
 'TGACC': 2,
 'TTACC': 1,
 'TTGAC': 1,
 'TTTGA': 1}
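
The same counts can be reproduced without scikit-bio, which may help make the definition concrete. This is a minimal sketch, assuming the sequence is a plain Python string; `count_kmers` is a hypothetical helper, and its `overlap` parameter only borrows the name used in the call above without claiming to match scikit-bio's behavior in every case.

```python
from collections import Counter


def count_kmers(sequence, k, overlap=True):
    """Count the words of length k in sequence."""
    # Overlapping kmers start at every position; non-overlapping kmers
    # start every k positions (an assumption made for this sketch).
    step = 1 if overlap else k
    # The last valid start position is len(sequence) - k.
    starts = range(0, len(sequence) - k + 1, step)
    return Counter(sequence[i:i + k] for i in starts)


kmer_counts = count_kmers("ACCGTGACCAGTTACCAGTTTGACCAA", k=5)
print(kmer_counts["ACCAG"])       # 2
print(sum(kmer_counts.values()))  # 23
```

A sequence of length n contains n - k + 1 overlapping kmers, so the 27-character sequence above yields 23 kmers of length 5, of which 18 are distinct.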

It is common for bioinformaticians to substitute the value of k for the letter k in the word kmer. For example, you might hear someone say "we identified all seven-mers in our sequence" to mean that they identified all kmers of length seven.