algorithm - Compressing a sequence of unique sorted numbers
This question already has an answer here:
- Compression algorithm for sorted integers
In a project I am working on, I have a sequence of numbers (around 2 billion). Each number is 4 bytes and unique, and the numbers are sorted. My goal is to read them into RAM, in uncompressed form, as fast as possible. I don't care about hard disk space.
If I store them uncompressed, I need 2 billion * 4 bytes = 8 GB, which takes around 100 seconds to read. I can instead store the data as a sequence of bits, which requires 2 billion / 8 = 250 MB and takes around 3 seconds to read.
I need to read and decompress them in 0.1-0.5 seconds (if possible) using a normal HDD. I don't care how long it takes to compress the data; I only care how long it takes to decompress it, and that needs to be done in a few milliseconds.
The randomness of the numbers is not known.
The question is: what kind of compression algorithm can compress the numbers to around 20-30 MB with a decompression time of 100-200 milliseconds on an i3/i5 CPU?
Edit: The maximum number in the sequence is 2 billion; that's why I can store it as a bit array of 250 MB. The size of the sequence is not necessarily 2 billion: it can contain anywhere from 1 to 2,000,000,000 numbers.
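The bit-sequence idea can be sketched as follows. The figures above imply a disk throughput of roughly 80 MB/s (8 GB in about 100 s, 250 MB in about 3 s). This is a minimal sketch, assuming bit i lives in byte i // 8 at position i % 8 (least-significant bit first); the function names `to_bitmap` and `from_bitmap` are hypothetical, and a real implementation for 2 billion values would use a compiled language or vectorized library rather than a Python loop.

```python
def to_bitmap(sorted_nums, max_value):
    """Return a bytearray where bit i is set iff i is in the sequence.

    Bit order (LSB first within each byte) is an illustrative assumption.
    The size depends only on max_value, not on how many numbers are stored.
    """
    bitmap = bytearray(max_value // 8 + 1)
    for n in sorted_nums:
        bitmap[n >> 3] |= 1 << (n & 7)
    return bitmap


def from_bitmap(bitmap):
    """Recover the sorted numbers by scanning the bitmap for set bits."""
    nums = []
    for byte_index, byte in enumerate(bitmap):
        while byte:
            low = byte & -byte  # isolate the lowest set bit
            nums.append(byte_index * 8 + low.bit_length() - 1)
            byte ^= low         # clear it and continue with the next one
    return nums
```

For a maximum value of 2 billion, `max_value // 8 + 1` is about 250 million bytes, matching the 250 MB estimate in the question.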
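Because the sequence length varies while the maximum value is fixed, the bit array is not always the smaller format. A minimal sketch of the size comparison, assuming exactly 4 bytes per number for the plain array and one bit per possible value for the bitmap; the helper names are hypothetical:

```python
def bitmap_bytes(max_value):
    # One bit for every possible value 0..max_value.
    return max_value // 8 + 1


def array_bytes(count):
    # Plain, uncompressed 4-byte integers.
    return 4 * count


def smaller_format(count, max_value):
    """Pick whichever uncompressed layout uses fewer bytes."""
    return "bitmap" if bitmap_bytes(max_value) < array_bytes(count) else "array"
```

With a maximum of 2 billion, the break-even point is about max_value / 32 = 62.5 million numbers: shorter sequences are smaller as a plain array, longer ones as a bitmap.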
here 2 possible approaches:
the asker proposes storing sequence of numbers bit string. e.g.: if number i in sequence, ith bit of bit string set one, otherwise it's zero. natural first thing try apply standard compression algorithms bit string , see happens.
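Trying a standard compressor on the bit string might look like the sketch below. It uses Python's `zlib` (DEFLATE) purely as one example of a "standard compression algorithm"; the answer does not name a specific one, and whether DEFLATE meets the 100-200 ms decompression target on real data is untested here. The bitmap layout (LSB-first bits) and the helper name are assumptions.

```python
import zlib


def build_bitmap(sorted_nums):
    """Bit i is set iff i is in the sequence (LSB-first within each byte)."""
    bitmap = bytearray(sorted_nums[-1] // 8 + 1)
    for n in sorted_nums:
        bitmap[n >> 3] |= 1 << (n & 7)
    return bytes(bitmap)


# Structured example: every even number below one million.
# Each byte of the bitmap is then 0b01010101, which DEFLATE compresses very well.
nums = list(range(0, 1_000_000, 2))
bitmap = build_bitmap(nums)
compressed = zlib.compress(bitmap, 9)   # slow compression is acceptable here
restored = zlib.decompress(compressed)  # only this step is time-critical
```

How well this works depends entirely on the structure of the real data: a bitmap of numbers with no exploitable pattern will barely shrink.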
from phrasing of question, seems can treat numbers in sequence 4-byte ints. so, sequence stored represents around 2*109 out of possible 232 ints. means average difference between 2 successive numbers can't more ~2.147 = 232 / (2*109). so, maybe computing sequence of differences , try compressing that. since expect large fraction of successive differences going 1's , 2's, suspect sequence might compressible.
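The difference (delta) step can be sketched as below. Each gap is written as a variable-length integer (LEB128-style: 7 data bits per byte, high bit set when more bytes follow); this varint layout is an illustrative choice, not something the answer specifies. On its own it still costs at least one byte per number, which for 2 billion numbers is larger than the 250 MB bitmap, so in practice the gap stream would be passed to a general-purpose or entropy coder that can exploit the many 1s and 2s.

```python
def delta_varint_encode(sorted_nums):
    """Encode sorted unique ints as varint-coded gaps from the previous value."""
    out = bytearray()
    prev = 0
    for n in sorted_nums:
        gap = n - prev  # the first gap is the first value itself
        prev = n
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, "more follows" flag
            gap >>= 7
        out.append(gap)  # final byte has the high bit clear
    return bytes(out)


def delta_varint_decode(data):
    """Invert delta_varint_encode by summing the decoded gaps."""
    nums = []
    prev = gap = shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += gap
            nums.append(prev)
            gap = shift = 0
    return nums
```

Small gaps such as 1 and 2 each fit in a single byte, so a dense sequence produces a long run of tiny, highly repetitive byte values.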