API Reference

Operations

BioAlignments.ismatchopFunction
ismatchop(op::Operation)

Test if op is a match operation (i.e. op ∈ (OP_MATCH, OP_SEQ_MATCH, OP_SEQ_MISMATCH)).

source

Alignments

BioAlignments.AlignmentType

Defines how to align a given sequence onto a reference sequence. The alignment is represented as a sequence of elementary operations (match, insertion, deletion etc) anchored to specific positions of the input and reference sequence.

source
BioAlignments.AlignmentMethod
Alignment(anchors::Vector{AlignmentAnchor}, check=true)

Create an alignment object from a sequence of alignment anchors.

source
BioAlignments.AlignmentMethod
Alignment(cigar::AbstractString, seqpos=1, refpos=1)

Make an alignment object from a CIGAR string.

seqpos and refpos specify the starting positions of two sequences.

source
BioAlignments.seq2refMethod
seq2ref(aln::Union{Alignment, AlignedSequence, PairwiseAlignment}, i::Integer)::Tuple{Int,Operation}

Map a position i from sequence to reference.

source
BioAlignments.ref2seqMethod
ref2seq(aln::Union{Alignment, AlignedSequence, PairwiseAlignment}, i::Integer)::Tuple{Int,Operation}

Map a position i from reference to sequence.

source
BioAlignments.cigarMethod
cigar(aln::Alignment)

Make a CIGAR string encoding of aln.

This is not entirely lossless as it discards the alignments start positions.

source

Substitution matrices

Pairwise alignments

Base.countMethod
count(aln::PairwiseAlignment, target::Operation)

Count the number of positions where the target operation is applied.

source
BioAlignments.AffineGapScoreModelType
AffineGapScoreModel(submat, gap_open, gap_extend)
AffineGapScoreModel(submat, gap_open=, gap_extend=)
AffineGapScoreModel(match=, mismatch=, gap_open=, gap_extend=)

Affine gap scoring model.

This creates an affine gap scroing model object for alignment from a substitution matrix (submat), a gap opening score (gap_open), and a gap extending score (gap_extend). A consecutive gap of length k has a score of gap_open + gap_extend * k. Note that both of the gap scores should be non-positive. As a shorthand of creating a dichotomous substitution matrix, you can write as, for example, AffineGapScoreModel(match=5, mismatch=-3, gap_open=-2, gap_extend=-1).

Example

using BioSequences
using BioAlignments

# create an affine gap scoring model from a predefined substitution
# matrix and gap opening/extending scores.
affinegap = AffineGapScoreModel(BLOSUM62, gap_open=-10, gap_extend=-1)

# run global alignment between two amino acid sequenecs
pairalign(GlobalAlignment(), aa"IDGAAGQQL", aa"IDGATGQL", affinegap)

See also: SubstitutionMatrix, pairalign, CostModel

source
BioAlignments.CostModelType
CostModel(submat, insertion, deletion)
CostModel(submat, insertion=, deletion=)
CostModel(match=, mismatch=, insertion=, deletion=)

Cost model.

This creates a cost model object for alignment from substitution matrix (submat), an insertion cost (insertion), and a deletion cost (deletion). Note that both of the insertion and deletion costs should be non-negative. As a shorthand of creating a dichotomous substitution matrix, you can write as, for example, CostModel(match=0, mismatch=1, insertion=2, deletion=2).

Example

using BioAlignments

# create a cost model from a substitution matrix and indel costs
cost = CostModel(ones(128, 128) - eye(128), insertion=.5, deletion=.5)

# run global alignment to minimize edit distance
pairalign(EditDistance(), "intension", "execution", cost)

See also: SubstitutionMatrix, pairalign, AffineGapScoreModel

source
BioAlignments.pairalignFunction
pairalign(type, seq, ref, model, [options...])

Run pairwise alignment between two sequences: seq and ref.

Available types are:

  • GlobalAlignment()
  • LocalAlignment()
  • SemiGlobalAlignment()
  • OverlapAlignment()
  • EditDistance()
  • LevenshteinDistance()
  • HammingDistance()

GlobalAlignment, LocalAlignment, SemiGlobalAlignment, and OverlapAlignment are problem that maximizes alignment score between two sequences. Therefore, model should be an instance of AbstractScoreModel (e.g. AffineGapScoreModel).

EditDistance, LevenshteinDistance, and HammingDistance are problem that minimizes alignment cost between two sequences. As for EditDistance, model should be an instance of AbstractCostModel (e.g. CostModel). LevenshteinDistance and HammingDistance have predefined a cost model, so users cannot specify a cost model for these alignment types.

When you pass the score_only=true or distance_only=true option to pairalign, the result of pairwise alignment holds alignment score/distance only. This may enable some algorithms to run faster than calculating full alignment result. Other available options are documented for each alignemnt type.

Example

using BioSequences
using BioAlignments

# create affine gap scoring model
affinegap = AffineGapScoreModel(
    match=5,
    mismatch=-4,
    gap_open=-5,
    gap_extend=-3
)

# run global alignment between two DNA sequences
pairalign(GlobalAlignment(), dna"AGGTAG", dna"ATTG", affinegap)

# run local alignment between two DNA sequences
pairalign(LocalAlignment(), dna"AGGTAG", dna"ATTG", affinegap)

# you cannot specify a cost model in LevenshteinDistance
pairalign(LevenshteinDistance(), dna"AGGTAG", dna"ATTG")

See also: AffineGapScoreModel, CostModel

source
BioAlignments.alignmentFunction
alignment(aligned_sequence)

Gets the Alignment of aligned_sequence.

source
alignment(pairwise_alignment)

Gets the underlying Alignment from pairwise_alignment.

source
alignment(alignment_result)

Return the alignment if any as a PairwiseAlignment. To get the Alignment, nest the function, e.g. alignment(alignment(alignment_result)). This function returns a PairwiseAlignment instead of an Alignment for backwards-compatibility reasons.

See also: hasalignment

source
Missing docstring.

Missing docstring for seq2ref(::PairwiseAlignment, ::Integer). Check Documenter's build log for details.

Missing docstring.

Missing docstring for ref2seq(::PairwiseAlignment, ::Integer). Check Documenter's build log for details.