FASTA formatted files

NB: First read the overview in the sidebar.

FASTA is a text-based file format for representing biological sequences. A FASTA file stores a list of sequence records, each with an identifier, a description, and a sequence.

The template of a sequence record is:

>{description}
{sequence}

Here, the "identifier" is the first part of the description, up to the first whitespace (or the entire description if it contains no whitespace).

Here is an example of a chromosomal sequence:

>chrI chromosome 1
CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACC
CACACACACACATCCTAACACTACCCTAACACAGCCCTAATCTA

Here:

  • The identifier is "chrI"
  • The description is "chrI chromosome 1", containing the identifier
  • The sequence is the DNA sequence "CCACA..."
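The identifier rule above can be sketched in plain Julia. This is only an illustration of the rule as stated here, not FASTX's own parser; the helper name `identifier_of` is hypothetical.

```julia
# Hypothetical helper, not part of FASTX: applies the identifier rule
# to a single header line such as ">chrI chromosome 1".
function identifier_of(header::AbstractString)
    # Drop the leading '>' marker if present.
    desc = startswith(header, '>') ? header[2:end] : header
    # Index of the first whitespace character, or nothing if there is none.
    i = findfirst(isspace, desc)
    # Without whitespace, the whole description is the identifier.
    return i === nothing ? desc : desc[1:prevind(desc, i)]
end

identifier_of(">chrI chromosome 1")  # "chrI"
identifier_of(">chrI")               # "chrI"
```

In practice, prefer the `identifier` and `description` functions on a parsed record over re-implementing the rule.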

The FASTARecord

FASTA records are, by design, very lax in what they can contain. They can hold almost arbitrary byte sequences, including invalid Unicode and trailing whitespace on their sequence lines, which is interpreted as part of the sequence. If you want more certainty about the format, you can either check the content of the sequences with a regex or (preferably) convert them to the desired BioSequence type.
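As a minimal sketch of the regex approach mentioned above, the following checks that a record's sequence contains only a small DNA alphabet. The helper name `is_plain_dna` and the chosen alphabet are assumptions for illustration; adapt the pattern to your data.

```julia
using FASTX

# Hypothetical helper: true if the record's sequence contains only the
# letters A, C, G, T or N (case-insensitive). Any other byte, including
# trailing whitespace, makes the check fail.
is_plain_dna(record) = occursin(r"^[ACGTN]*$"i, String(sequence(record)))

good = parse(FASTARecord, ">seq1\nTAGA\nCC")
bad = parse(FASTARecord, ">some header\nTAqA\nCC")

is_plain_dna(good)  # true
is_plain_dna(bad)   # false, because of the 'q'
```

A regex only tells you whether the text looks right. Converting to a BioSequence type, as recommended above, also gives you a typed value to work with afterwards.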

FASTX.FASTA.Record - Type
FASTA.Record

Mutable struct representing a FASTA record as parsed from a FASTA file. The content of the record can be queried with the following functions: identifier, description, sequence.

FASTA records are un-typed, i.e. they are agnostic to what kind of data they contain.

See also: FASTA.Reader, FASTA.Writer

Examples

julia> rec = parse(FASTARecord, ">some header\nTAqA\nCC");

julia> identifier(rec)
"some"

julia> description(rec)
"some header"

julia> sequence(rec)
"TAqACC"

julia> typeof(description(rec)) == typeof(sequence(rec)) <: AbstractString
true

FASTAReader and FASTAWriter

FASTAWriter can optionally be passed the keyword width to control the line width. If this is zero or negative, all record sequences are written on a single line. Otherwise, lines are wrapped to the given maximal width.
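The effect of width can be sketched by writing the same record twice and reading the files back. The helper `written_with_width` is hypothetical and uses temporary files only for illustration; it relies on closing the writer to close the underlying file, as described in the reference below.

```julia
using FASTX

record = parse(FASTARecord, ">seq1 example\nACGTACGTAC")

# Hypothetical helper: write `record` to a temporary file with the given
# width and return the file's contents as a String.
function written_with_width(record, width)
    path = tempname()
    writer = FASTAWriter(open(path, "w"); width=width)
    write(writer, record)
    close(writer)  # also closes the underlying file
    text = read(path, String)
    rm(path)
    return text
end

wrapped = written_with_width(record, 4)    # sequence split into lines of at most 4
unwrapped = written_with_width(record, 0)  # sequence on a single line
```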

Reference:

FASTX.FASTA - Module
FASTA

Module under FASTX with code related to FASTA files.

FASTX.FASTA.Reader - Type
FASTA.Reader(input::IO; index=nothing, copy::Bool=true)

Create a buffered data reader of the FASTA file format. The reader is a BioGenerics.IO.AbstractReader, a stateful iterator of FASTA.Record. Readers take ownership of the underlying IO. Mutating or closing the underlying IO other than through the reader is undefined behaviour. Closing the Reader also closes the underlying IO.

See more examples in the FASTX documentation.

See also: FASTA.Record, FASTA.Writer

Arguments

  • input: data source
  • index: Optional random access index (currently only fai is supported). index can be nothing, a FASTA.Index, an IO, in which case an index will be parsed from the IO, or an AbstractString, in which case it will be treated as a path to a fai file.
  • copy::Bool: if true, iterating returns a fresh copy of each record instead of reusing the same Record. Set to false for improved performance, but be aware that iterating then mutates the record in place.

Examples

julia> rdr = FASTAReader(IOBuffer(">header\nTAG\n>another\nAGA"));

julia> records = collect(rdr); close(rdr);

julia> foreach(println, map(identifier, records))
header
another

julia> foreach(println, map(sequence, records))
TAG
AGA
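
The copy keyword described under Arguments can be sketched as follows. With copy=false the same Record is reused on every iteration, so anything you want to keep should be copied out inside the loop. Copying identifiers with `String(...)` is one way to do that and is a choice made for this sketch.

```julia
using FASTX

reader = FASTAReader(IOBuffer(">header\nTAG\n>another\nAGA"); copy=false)

ids = String[]
lengths = Int[]
for record in reader
    # `record` is overwritten on the next iteration, so store copies
    # of the data rather than the record itself.
    push!(ids, String(identifier(record)))
    push!(lengths, length(sequence(record)))
end
close(reader)

ids      # ["header", "another"]
lengths  # [3, 3]
```

Collecting the records themselves with copy=false would not be meaningful, since every element would refer to the same mutated Record.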

FASTX.FASTA.Writer - Type
FASTA.Writer(output::IO; width=70)

Create a data writer of the FASTA file format. The writer is a BioGenerics.IO.AbstractWriter. Writers take ownership of the underlying IO. Mutating or closing the underlying IO other than through the writer is undefined behaviour. Closing the writer also closes the underlying IO.

See more examples in the FASTX documentation.

See also: FASTA.Record, FASTA.Reader

Arguments

  • output: Data sink to write to
  • width: Wrapping width of sequence characters. If < 1, no wrapping.

Examples

julia> record = parse(FASTARecord, ">some header\nTAG");

julia> FASTA.Writer(open("some_file.fna", "w")) do writer
    write(writer, record) # a FASTA.Record
end

FASTX.FASTA.validate_fasta - Function
validate_fasta(io::IO) >: Nothing

Check if io is a valid FASTA file. Return nothing if it is, and an instance of another type if not.

Examples

julia> validate_fasta(IOBuffer(">a bc\nTAG\nTA")) === nothing
true

julia> validate_fasta(IOBuffer(">a bc\nT>G\nTA")) === nothing
false
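
One possible use of validate_fasta is as a guard before parsing. The sketch below is illustrative only: the helper `identifiers_if_valid` is hypothetical, and it builds a fresh IOBuffer for the reader on the assumption that validation consumes the IO it is given.

```julia
using FASTX

# Hypothetical helper: return the identifiers found in `data`, or
# `nothing` if the data does not validate as FASTA.
function identifiers_if_valid(data::String)
    # validate_fasta returns nothing exactly when the input is valid.
    FASTX.FASTA.validate_fasta(IOBuffer(data)) === nothing || return nothing
    # Use a fresh buffer for reading.
    reader = FASTAReader(IOBuffer(data))
    records = collect(reader)
    close(reader)
    return String.(identifier.(records))
end

identifiers_if_valid(">a bc\nTAG\nTA")  # ["a"]
identifiers_if_valid(">a bc\nT>G\nTA")  # nothing
```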