Indexing
Base.getindex is defined for ReadDatastores:
julia> ds = open(PairedReads{DNAAlphabet{2}}, "ecoli-test-paired.prseq", "my-ecoli-pe")
Paired Read Datastore 'my-ecoli-pe': 20 reads (10 pairs)
julia> ds[5]
300nt DNA Sequence:
ACATGCACTTCAACGGCATTACTGGTGACCTCTTCGTCC…TCTATCAACGCAAAAGGGTTACACAGATAATCGTCAGCT
Indexing a read datastore creates a new sequence each time. To load a read from a datastore into an existing sequence instead, use the load_sequence! method, which overwrites that sequence in place.
julia> seq = LongSequence{DNAAlphabet{2}}()
0nt DNA Sequence:
< EMPTY SEQUENCE >
julia> load_sequence!(ds, 6, seq)
300nt DNA Sequence:
ATTACTGCGATTACTGCTGCGAATTTTTTCATGTTTATT…GTCCACTGGTTTACACAAGGTCGTAAGGGAAAAGAGGCG
julia> seq
300nt DNA Sequence:
ATTACTGCGATTACTGCTGCGAATTTTTTCATGTTTATT…GTCCACTGGTTTACACAAGGTCGTAAGGGAAAAGAGGCG
Iteration
The ReadDatastore types also support the Base.iterate interface:
julia> collect(ds)
20-element Array{LongSequence{DNAAlphabet{2}},1}:
GGGCTTTAAAATCCACTTTTTCCATATCGATAGTCACGT…ATTTCTTCGATTCTTCTTTGTCACCGCAGCCAGCAAGAG
GTGGGTTTTTATCGGCTGGCACATGTGTTGGGACAATTT…GGCTTTCAATACGCTGTTTTCCCTCGTTGTTTCATCTGT
TGAACTCCACATCCTGCGGATCGTAAACCGTCACCTCTT…TCTTCCAGGCAGGCCGCCAGGGTATCACCTTCCAGACCA
GATGAATCTGGCGGTTATTAACGGTAACAATAACCAGCA…AGACGGCAAACCGGCTGCAGGCGGTAGGTTGTTGCAGGT
ACATGCACTTCAACGGCATTACTGGTGACCTCTTCGTCC…TCTATCAACGCAAAAGGGTTACACAGATAATCGTCAGCT
ATTACTGCGATTACTGCTGCGAATTTTTTCATGTTTATT…GTCCACTGGTTTACACAAGGTCGTAAGGGAAAAGAGGCG
CGGTTGAGTTCAAAGGCAAAGATTTGCTTGCGCTGTCGC…TTTTCCGGCGGCGAGAAAAAGCGCAACGATTTTTTGCAA
TTCGTCCCTGATATAGCACATGAACGTAATCAGGCTTGA…AATCTTCCGGCATCTTCAGGAGAGCGATTTTCTCTTCCA
ACGACACATTACCGGAAATTCAGGCCGACCCGGACAGGC…TTGAACAACACGGTGGTACAATTCAGGTCGCAAGCCAGG
TCCACCACCAGAATATCGATATTATCGTGCGTCATCCTT…TCACGCCCGCGCCGCTTTCGCTGGCCGTCACGCTAATCA
CGTAACTTTATTCATATCTCTTCCCCCTCCCTGTACTTC…CTGTTACCGCATGGCGGCAGTGCGCTGGTCGATATGACC
ATCGGGTAGGGGACGGAACGAATACGACAGTCAATATTC…AAGACTTTATCGTGCGGTCCGAACCGACTTTGTGGCGGC
GCCCTGGAGCTGGTGAAAGAAGGTCGAGCGCAAGCCTGT…CAATCCTCGCGTGGCGTTGCTCAATATTGGTGAAGAAGA
GAAAGGAACATCCTGACAACACCTTCCATCGTCTTTAAT…ATAAAGGCAAATTGCACCACCATGATGCTGTCCCAATCA
GTCTGGTGGTGCCTCTTTACTTAAGGAATTTCATCCTGT…AACGATGCCAGGCACCTGCGAAACTTTCCTGCACCAGCC
GACCGTTTTTCCCCAATCCGAGAACGCATAAATCCAGAC…TTTCTTCCCGGTAATGATACGTCACTATTGGAGTGGCCC
AGAGGCCACAGCGCGCCCATAATGGCGACTGAAAGCCAG…TTCACCGCGGTGACCGGAATCAGGGCAAATTCGACATGT
AAAAGGATCGCCGACCTTAACCATTCTGAATGTGATTGG…CTGGTGCCTGTCATATTTCGAACTCTGGGGGGACAGCAT
TGAGCAAATATGCCCGACCCAGCCTCATGACAGCGATAT…ACCGAAAAAAAAGTAATCGTCGGCATGTCCGGCGGTGTC
AGGCTTTAAATTTGATCTCTTTGTTGCACAGAATATCCG…GCCAGGAAGAAACGGAGGAACCGACACCGCCGGCCATGC
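Because the datastore is iterable, it can also be consumed directly in a for loop, without first collecting every read into an array. The following is a minimal sketch of that pattern; count_short is a hypothetical helper written for illustration and is not part of ReadDatastores. It assumes only that each element supports length.

```julia
# Hypothetical helper (not part of ReadDatastores): counts the reads that
# are shorter than `minlen` by iterating, without collecting them first.
function count_short(reads, minlen)
    n = 0
    for read in reads          # any iterable works, including a ReadDatastore
        if length(read) < minlen
            n += 1
        end
    end
    return n
end
```

With the datastore opened earlier, this would be called as count_short(ds, 300). Only one read needs to be held at a time, which matters more as the datastore grows.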
Buffers
When reading a ReadDatastore sequentially, with either Base.getindex or load_sequence!, you can trade some memory for a buffer that reduces the number of times the hard disk is read and speeds up sequential access and iteration. Use the buffer method to wrap a datastore in such a buffer.
julia> bds = buffer(ds)
Buffered Paired Read Datastore 'my-ecoli-pe': 20 reads (10 pairs)
julia> for i in eachindex(bds)
           load_sequence!(bds, i, seq)
           println(length(seq))
       end
297
300
299
300
300
300
299
300
300
300
300
300
299
300
300
300
300
300
300
300