BED

Description

BED is a text-based file format for representing genomic annotations such as genes and transcripts. Each line of a BED file holds a variable number of tab-delimited fields; the first three fields, which denote a genomic interval, are mandatory.

Here is an example with two RNA transcripts:

chr9    68331023    68424451    NM_015110   0   +
chr9    68456943    68486659    NM_001206   0   -
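
To make the field layout concrete, the following is a minimal sketch in plain Julia that splits one such line by hand. It does not use GenomicFeatures; the variable names are illustrative, and the values are printed exactly as they appear in the file.

```julia
# A BED line as it appears in a file: fields are separated by tab characters.
line = "chr9\t68331023\t68424451\tNM_015110\t0\t+"

# Split on tabs; only the first three fields are guaranteed to exist.
fields = split(line, '\t')
@assert length(fields) >= 3 "a BED line needs at least chrom, start, and end"

chrom = String(fields[1])
chromstart = parse(Int, fields[2])   # value as written in the file
chromend = parse(Int, fields[3])

# Optional fields are read only when present.
name = length(fields) >= 4 ? String(fields[4]) : nothing
score = length(fields) >= 5 ? parse(Int, fields[5]) : nothing
strand = length(fields) >= 6 ? fields[6][1] : nothing

println((chrom, chromstart, chromend, name, score, strand))
```

In practice the BED.Reader and BED.Record types described below take care of this parsing and validation.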

I/O tools for BED are provided by the GenomicFeatures.BED module, which exports the following three types:

  • Reader type: BED.Reader
  • Writer type: BED.Writer
  • Element type: BED.Record

Examples

Here is a common workflow to iterate over all records in a BED file:

# Import the BED module.
using GenomicFeatures

# Open a BED file.
reader = open(BED.Reader, "data.bed")

# Iterate over records.
for record in reader
    # Do something with the record (see the accessor functions in the API section).
    chrom = BED.chrom(record)
    # ...
end

# Finally, close the reader.
close(reader)
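
The same loop can also be written with a do block, which closes the reader even if an error is thrown inside the loop. The sketch below is illustrative only: the temporary file and the counts dictionary are assumptions made for the example, not part of the package.

```julia
using GenomicFeatures

# Write a small tab-delimited BED file to a temporary location (illustrative data).
path, io = mktemp()
write(io, "chr9\t68331023\t68424451\tNM_015110\t0\t+\n")
write(io, "chr9\t68456943\t68486659\tNM_001206\t0\t-\n")
close(io)

# Count records per chromosome; the do block closes the reader on exit.
counts = Dict{String,Int}()
open(BED.Reader, path) do reader
    for record in reader
        chrom = BED.chrom(record)
        counts[chrom] = get(counts, chrom, 0) + 1
    end
end

println(counts)
```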

If you repeatedly access records within specific ranges, it is more efficient to construct an IntervalCollection object from a BED reader:

# Create an interval collection in memory.
icol = open(BED.Reader, "data.bed") do reader
    IntervalCollection(reader)
end

# Query overlapping records.
for interval in eachoverlap(icol, Interval("chrX", 40001, 51500))
    # A record is stored in the metadata field of an interval.
    record = metadata(interval)
    # ...
end
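
To see what an overlap query returns without needing a BED file, here is a small sketch that fills an IntervalCollection by hand. The String metadata stands in for a BED record, and the coordinates are made up for illustration; it assumes the push! method and the STRAND_POS and STRAND_NEG constants exported by GenomicFeatures.

```julia
using GenomicFeatures

# Build a small collection by hand; the String metadata stands in for a BED record.
icol = IntervalCollection{String}()
push!(icol, Interval("chrX", 39_000, 39_900, STRAND_POS, "upstream"))
push!(icol, Interval("chrX", 45_000, 46_000, STRAND_POS, "inside"))
push!(icol, Interval("chrY", 45_000, 46_000, STRAND_NEG, "other chromosome"))

# Only intervals on chrX that share at least one base with the query are returned.
hits = [metadata(i) for i in eachoverlap(icol, Interval("chrX", 40_001, 51_500))]
println(hits)
```

The interval ending at 39,900 lies entirely before the query and the chrY interval is on a different sequence, so only the second interval is expected to match.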

API

GenomicFeatures.BED.Reader (Type)

BED.Reader(input::IO; index=nothing)
BED.Reader(input::AbstractString; index=:auto)

Create a data reader of the BED file format.

The first argument specifies the data source. When it is a file path that ends with .bgz, the file is assumed to be in the blocked gzip format (BGZF), and the function looks for a tabix index file (.tbi) and reads it if one exists. See http://www.htslib.org/doc/tabix.html for the bgzip and tabix tools.

Arguments

  • input: data source
  • index: path to a tabix file

GenomicFeatures.BED.Writer (Type)

BED.Writer(output::IO)

Create a data writer of the BED file format.

Arguments

  • output: data sink
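
As a rough usage sketch, a writer can wrap any IO object, including an in-memory buffer. The example assumes that write(writer, record) is defined for BED.Writer, as is conventional for BioJulia writers; that method is not documented on this page, so treat it as an assumption.

```julia
using GenomicFeatures

# Wrap an in-memory buffer so the output can be inspected without touching disk.
buffer = IOBuffer()
writer = BED.Writer(buffer)

# Build a record from a tab-delimited line and write it.
# write(writer, record) is assumed here; it is not listed in this document.
record = BED.Record("chr9\t68331023\t68424451\tNM_015110\t0\t+")
write(writer, record)

# Collect what was written before closing the writer.
output = String(take!(buffer))
close(writer)
print(output)
```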

GenomicFeatures.BED.Record (Type)

BED.Record()

Create an unfilled BED record.

BED.Record(data::Vector{UInt8})

Create a BED record object from data.

This function verifies and indexes fields for accessors. Note that ownership of data is transferred to the new record object, so data should not be modified afterwards.

BED.Record(str::AbstractString)

Create a BED record object from str.

This function verifies and indexes fields for accessors.
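
The string constructor is handy for experimenting with the accessor functions listed below. A minimal sketch, assuming a six-field record like the one in the Description section:

```julia
using GenomicFeatures

# Fields must be separated by tab characters, as in a BED file.
record = BED.Record("chr9\t68331023\t68424451\tNM_015110\t0\t+")

# Each accessor returns one parsed field of the record.
println(BED.chrom(record))
println(BED.chromstart(record))
println(BED.chromend(record))
println(BED.name(record))
println(BED.score(record))
println(BED.strand(record))
```

Accessors for fields that are absent from the record (for example thickstart on a six-field line) cannot return a value, so only fields present in the line are queried here.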

GenomicFeatures.BED.chrom (Function)

chrom(record::Record)::String

Get the chromosome name of record.

GenomicFeatures.BED.chromstart (Function)

chromstart(record::Record)::Int

Get the starting position of record.

Note that the first base is numbered 1.
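
The BED format itself stores intervals as zero-based, half-open ranges (the start is inclusive and counted from 0, the end is exclusive), so a one-based accessor implies shifting the start by one. The following sketch shows that arithmetic in plain Julia. The helper to_one_based is hypothetical and not part of GenomicFeatures, and the sketch assumes the accessor applies this usual conversion.

```julia
# Convert a zero-based, half-open BED interval [start0, end0) to a
# one-based, closed interval [first, last].
# `to_one_based` is a hypothetical helper, not a GenomicFeatures function.
to_one_based(start0::Integer, end0::Integer) = (start0 + 1, end0)

first1, last1 = to_one_based(68331023, 68424451)

# Both conventions describe the same number of bases.
@assert last1 - first1 + 1 == 68424451 - 68331023

println((first1, last1))
```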

GenomicFeatures.BED.chromend (Function)

chromend(record::Record)::Int

Get the end position of record.

GenomicFeatures.BED.name (Function)

name(record::Record)::String

Get the name of record.

GenomicFeatures.BED.score (Function)

score(record::Record)::Int

Get the score of record, an integer between 0 and 1000.

GenomicFeatures.BED.strand (Function)

strand(record::Record)::GenomicFeatures.Strand

Get the strand of record.

GenomicFeatures.BED.thickstart (Function)

thickstart(record::Record)::Int

Get the starting position at which record is drawn thickly.

Note that the first base is numbered 1.

GenomicFeatures.BED.thickend (Function)

thickend(record::Record)::Int

Get the end position at which record is drawn thickly.

GenomicFeatures.BED.itemrgb (Function)

itemrgb(record::Record)::ColorTypes.RGB

Get the RGB value of record.

The return type is defined in ColorTypes.jl.

GenomicFeatures.BED.blockcount (Function)

blockcount(record::Record)::Int

Get the number of blocks (exons) in record.

GenomicFeatures.BED.blocksizes (Function)

blocksizes(record::Record)::Vector{Int}

Get the block (exon) sizes of record.

GenomicFeatures.BED.blockstarts (Function)

blockstarts(record::Record)::Vector{Int}

Get the block (exon) starts of record.

Note that the first base is numbered 1.
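
In the BED format, block starts are offsets relative to the record start and block sizes are lengths in bases, so absolute exon coordinates are obtained by adding each offset to the start. The sketch below uses raw zero-based file values with made-up numbers, not a real annotation. How the one-based values returned by blockstarts map onto this arithmetic is not stated in this document, so treat the sketch as an illustration of the file-level layout only.

```julia
# Hypothetical raw BED12 values (zero-based, as stored in a file).
chrom_start = 1000
block_sizes = [100, 200, 150]
block_starts = [0, 300, 850]    # offsets relative to chrom_start

# Absolute zero-based, half-open exon ranges: [start, stop).
exons = [(chrom_start + s, chrom_start + s + n) for (s, n) in zip(block_starts, block_sizes)]

for (start, stop) in exons
    println(start, "\t", stop)
end
```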
