Records

FASTX files are treated as a sequence of records: FASTA.Record for FASTA files and FASTQ.Record for FASTQ files. For convenience, FASTARecord and FASTQRecord are aliases of FASTA.Record and FASTQ.Record.
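To make the "file as a sequence of records" idea concrete, here is a minimal sketch that reads two records from an in-memory FASTA file. It assumes FASTX.jl v2, where FASTAReader is exported as an alias of FASTA.Reader and iterating a reader yields one record at a time. Readers are not otherwise covered in this section.

```julia
using FASTX

# Two FASTA records in one in-memory "file"; the second sequence spans two lines
data = ">seq1 first\nACGT\n>seq2 second\nTTGA\nCC\n"

# Iterating the reader yields the file's records one by one
reader = FASTAReader(IOBuffer(data))
records = collect(reader)
close(reader)

# FASTARecord is just another name for FASTA.Record
@assert FASTARecord === FASTA.Record
@assert all(r -> r isa FASTA.Record, records)
@assert map(identifier, records) == ["seq1", "seq2"]
@assert sequence(String, records[2]) == "TTGACC"
```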

A Record object represents the text of a FASTX record as-is. For example, the following FASTA record:

>some header here
TAGATGAA
AA

is stored in a FASTA.Record object roughly as its constituent bytes, plus some metadata. The record object has no notion of being a DNA or RNA sequence; it is simply an array of bytes.
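A short sketch of what "simply an array of bytes" means in practice, using the record above: the line break inside the sequence is not kept, and no alphabet is checked when parsing, so protein letters are stored just as readily as nucleotides. This assumes the FASTX.jl v2 functions described in this section.

```julia
using FASTX

# The record from the text above: its sequence is split over two lines
record = parse(FASTARecord, ">some header here\nTAGATGAA\nAA")

# The sequence lines are joined; the newline is not part of the sequence
@assert description(record) == "some header here"
@assert sequence(String, record) == "TAGATGAAAA"

# No DNA/RNA/protein check happens at parse time
protein = parse(FASTARecord, ">prot\nMKVLW")
@assert sequence(String, protein) == "MKVLW"
```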

Records can be constructed from raw parts (the description, the sequence and, for FASTQ records, the quality), where:

  • description::AbstractString
  • sequence::Union{AbstractString, BioSequence}
  • quality::Union{AbstractString, Vector{<:Number}}
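A minimal sketch of constructing records from raw parts, assuming FASTX.jl v2 and BioSequences.jl v3 (for the dna"..." literal, which produces a BioSequence). The FASTQ quality is given here as a string of the same length as the sequence; the numeric-vector form listed above is not shown.

```julia
using FASTX
using BioSequences  # provides the dna"..." literal

# FASTA: description + sequence given as a String
fa = FASTARecord("contig1 assembled", "TAGATGAAAA")

# FASTQ: description + sequence given as a BioSequence + quality given as a String
fq = FASTQRecord("read1 lane 3", dna"TAGA", ";;]]")

@assert identifier(fa) == "contig1"
@assert sequence(String, fa) == "TAGATGAAAA"
@assert description(fq) == "read1 lane 3"
@assert sequence(String, fq) == "TAGA"
```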

Alternatively, they can be parsed directly from a string or an AbstractVector{UInt8}.

julia> record = parse(FASTARecord, ">abc\nAGCC\nCCGA");

julia> record2 = FASTARecord("abc", "AGCCCCGA");

julia> record == record2
true
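Parsing from raw bytes works the same way as parsing from a string. In this sketch, codeunits (a zero-copy view) and Vector{UInt8} (a copy) are two standard Julia ways to obtain an AbstractVector{UInt8} from a String; all three parses are expected to produce equal records.

```julia
using FASTX

text = ">abc\nAGCC\nCCGA"

from_string = parse(FASTARecord, text)
from_view   = parse(FASTARecord, codeunits(text))     # Base.CodeUnits <: AbstractVector{UInt8}
from_vector = parse(FASTARecord, Vector{UInt8}(text)) # copied bytes

@assert from_string == from_view == from_vector
@assert sequence(String, from_vector) == "AGCCCCGA"
```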

Records can be queried for their information, namely identifier, description and sequence (and quality, for FASTQ). By default, these functions return an AbstractString view into the record's data:

julia> record = parse(FASTARecord, ">ident desc\nUGU\nGA");

julia> (identifier(record), description(record), sequence(record))
("ident", "ident desc", "UGUGA")
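Because the returned strings are views into the record's own bytes, a value that must outlive the record (or survive it being overwritten) should be copied. A minimal sketch, using only the Base String constructor on the returned view:

```julia
using FASTX

record = parse(FASTARecord, ">ident desc\nUGU\nGA")

# Views into the record's data, not independent Strings
id_view  = identifier(record)
seq_view = sequence(record)
@assert id_view isa AbstractString
@assert seq_view == "UGUGA"

# Copy into an owned String when the value must be kept independently of the record
id = String(id_view)
@assert id isa String && id == "ident"
```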

However, you can also request the sequence as a String or as any subtype of BioSequence:

julia> record = parse(FASTARecord, ">abc\nUGC\nCCA");

julia> using BioSequences # LongRNA defined in BioSequences.jl

julia> sequence(LongRNA{2}, record)
6nt RNA Sequence:
UGCCCA

julia> sequence(String, record)
"UGCCCA"
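Since the record only stores bytes, the bytes are interpreted only when a BioSequence type is requested, so the requested alphabet has to fit the data. This sketch, assuming BioSequences.jl v3, shows the RNA conversion succeeding and a DNA conversion failing, because 'U' has no encoding in a DNA alphabet. The exact exception type is not checked here.

```julia
using FASTX
using BioSequences

record = parse(FASTARecord, ">abc\nUGC\nCCA")

# RNA alphabet: every byte is a valid RNA symbol
@assert sequence(LongRNA{2}, record) == LongRNA{2}("UGCCCA")

# DNA alphabet: 'U' cannot be encoded, so the conversion throws
threw = try
    sequence(LongDNA{2}, record)
    false
catch
    true
end
@assert threw
```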

The number of bytes in the sequence of a Record can be queried using seqsize:

julia> record = parse(FASTARecord, ">abc\nUGC\nCCA");

julia> seqsize(record)
6

Reference:

FASTX.identifier (Function)
identifier(record::Record)::AbstractString

Get the sequence identifier of record. The identifier is the part of the description before the first whitespace. If the identifier is missing, an empty string is returned. Returns an AbstractString view into the record. If the record is overwritten, the string data will be corrupted.

See also: description, sequence

Examples

julia> record = parse(FASTA.Record, ">ident_here some descr \nTAGA");

julia> identifier(record)
"ident_here"
FASTX.description (Function)
description(record::Record)::AbstractString

Get the description of record. The description is the entire header line minus the leading > (FASTA) or @ (FASTQ) symbol, including any trailing whitespace. Returns an AbstractString view into the record. If the record is overwritten, the string data will be corrupted.

See also: identifier, sequence

Examples

julia> record = parse(FASTA.Record, ">ident_here some descr \nTAGA");

julia> description(record)
"ident_here some descr "
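A small sketch of the same rules applied to a FASTQ header: the leading @ is dropped from the description, and the identifier stops at the first space. Trailing whitespace is kept in the description, so Base rstrip can be used if it is unwanted.

```julia
using FASTX

fq = parse(FASTQ.Record, "@read1 lane 3\nTAGA\n+\n;;]]")
@assert description(fq) == "read1 lane 3"   # no leading '@'
@assert identifier(fq) == "read1"           # up to the first space

fa = parse(FASTA.Record, ">ident_here some descr \nTAGA")
@assert description(fa) == "ident_here some descr "        # trailing space kept
@assert rstrip(description(fa)) == "ident_here some descr"
```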
FASTX.sequence (Function)
sequence([::Type{S}], record::Record, [part::UnitRange{Int}])::S

Get the sequence of record.

S can be a subtype of BioSequences.BioSequence, AbstractString, or String. If S is omitted, it defaults to an AbstractString subtype. If the part argument is given, only the specified part of the sequence is returned.

See also: identifier, description

Examples

julia> record = parse(FASTQ.Record, "@read1\nTAGA\n+\n;;]]");

julia> sequence(record)
"TAGA"

julia> sequence(LongDNA{2}, record)
4nt DNA Sequence:
TAGA
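The optional part argument from the signature above can be combined with any of the return types. A minimal sketch, assuming BioSequences.jl v3 for LongDNA{2}; the ranges are positions within the sequence.

```julia
using FASTX
using BioSequences

record = parse(FASTQ.Record, "@read1\nTAGA\n+\n;;]]")

@assert sequence(record, 2:3) == "AG"                        # default AbstractString view
@assert sequence(String, record, 1:2) == "TA"                # owned String
@assert sequence(LongDNA{2}, record, 3:4) == LongDNA{2}("GA") # BioSequence
```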
FASTX.seqsize (Function)
seqsize(::Record)::Int

Get the number of bytes in the sequence of a Record. Note that in the presence of non-ASCII characters, this may differ from length(sequence(record)).

See also: sequence

Examples

julia> seqsize(parse(FASTA.Record, ">hdr\nKRRLPW\nYHS"))
9

julia> seqsize(parse(FASTA.Record, ">hdr\nαβγδϵ"))
10
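To see why seqsize and length(sequence(record)) can differ, compare bytes and characters for the non-ASCII record above: each of these Greek letters takes 2 bytes in UTF-8, so 5 characters occupy 10 bytes.

```julia
using FASTX

record = parse(FASTA.Record, ">hdr\nαβγδϵ")

@assert seqsize(record) == 10            # bytes
@assert length(sequence(record)) == 5    # characters
@assert ncodeunits(sequence(String, record)) == seqsize(record)
```

For ASCII-only sequences, such as nucleotide data, the two numbers coincide.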