algorithm - Compressing a sequence of unique sorted numbers -


In a project I'm working on, I have a sequence of numbers (around 2 billion of them). Each number is 4 bytes and unique, and the numbers are sorted. The goal is to read them into RAM as fast as possible, in uncompressed format. I don't care about HDD space.

If I store them uncompressed, I need 2 billion * 4 bytes = 8 GB, which takes around 100 seconds to read. I can instead store the data as a sequence of bits, which requires 2 billion bits / 8 = 250 MB and takes around 3 seconds to read.
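
For concreteness, here is a minimal sketch of that bit-array representation: if value v is present in the sequence, bit v of the array is set. The 2,000,000,000 maximum is taken from the edit below; the helper name is mine.

    // Sketch: pack the sorted, unique numbers (all < 2,000,000,000) into a bit array.
    // 2e9 bits is roughly 250 MB.
    #include <cstdint>
    #include <vector>

    std::vector<uint8_t> to_bit_array(const std::vector<uint32_t>& sorted_values) {
        const uint64_t max_value = 2000000000ULL;          // stated maximum in the question
        std::vector<uint8_t> bits((max_value + 7) / 8, 0); // ~250 MB of zeroed bytes
        for (uint32_t v : sorted_values) {
            bits[v >> 3] |= uint8_t(1u << (v & 7));        // set bit number v
        }
        return bits;
    }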

I need to read and uncompress them in 0.1-0.5 seconds (if possible) using a normal HDD. I don't care how long it takes to compress the data; I only care how long it takes to decompress, and that needs to be done in a few milliseconds.

The randomness of the numbers is not known.

The question is: what kind of compression algorithm can compress the numbers down to around 20-30 MB with a decompression time of 100-200 milliseconds on an i3/i5 CPU?

Edit: the maximum number in the sequence is 2 billion; that's why it can be stored in a bit array of size 250 MB. The size of the sequence itself is not 2 billion: it can contain anywhere from 1 to 2,000,000,000 numbers.

Here are two possible approaches:

  1. The asker proposes storing the sequence of numbers as a bit string, e.g. if the number i is in the sequence, the ith bit of the bit string is set to one, otherwise it's zero. A natural first thing to try is to apply standard compression algorithms to that bit string and see what happens (a rough sketch follows after this list).

  2. From the phrasing of the question, it seems we can treat the numbers in the sequence as 4-byte ints. So the stored sequence represents around 2*10^9 out of the 2^32 possible ints, which means the average difference between two successive numbers can't be more than ~2.147 = 2^32 / (2*10^9). So maybe compute the sequence of differences and try compressing that instead. Since we expect a large fraction of the successive differences to be 1's and 2's, I suspect that sequence might compress well (a delta-coding sketch is given below).
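
One way to try the first approach is to run the bit array through a general-purpose compressor and measure. Below is a rough sketch using zlib (link with -lz); the helper names are mine, error handling is omitted, and whether this hits the 20-30 MB / 100-200 ms target would have to be benchmarked.

    // Sketch for approach 1: compress/decompress the ~250 MB bit array with zlib.
    #include <cstdint>
    #include <vector>
    #include <zlib.h>

    std::vector<uint8_t> deflate_bits(const std::vector<uint8_t>& bits) {
        uLongf dest_len = compressBound(bits.size());               // worst-case output size
        std::vector<uint8_t> out(dest_len);
        compress(out.data(), &dest_len, bits.data(), bits.size());  // default compression level
        out.resize(dest_len);
        return out;
    }

    std::vector<uint8_t> inflate_bits(const std::vector<uint8_t>& packed, size_t original_size) {
        std::vector<uint8_t> out(original_size);
        uLongf dest_len = original_size;
        uncompress(out.data(), &dest_len, packed.data(), packed.size());
        return out;
    }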

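And a sketch of the second approach: store the differences between successive numbers and rebuild the values with a running sum at load time. The helper names are mine; the entropy coding of the deltas (varint, zlib, etc.), which is where the actual size reduction would come from, is left out.

    // Sketch for approach 2: delta-encode the sorted values.
    // Most deltas are expected to be tiny (average ~2.147), so the delta stream
    // should compress far better than the raw 4-byte values.
    #include <cstdint>
    #include <vector>

    std::vector<uint32_t> delta_encode(const std::vector<uint32_t>& sorted_values) {
        std::vector<uint32_t> deltas;
        deltas.reserve(sorted_values.size());
        uint32_t prev = 0;
        for (uint32_t v : sorted_values) {
            deltas.push_back(v - prev);  // first delta is the value itself
            prev = v;
        }
        return deltas;
    }

    std::vector<uint32_t> delta_decode(const std::vector<uint32_t>& deltas) {
        std::vector<uint32_t> values;
        values.reserve(deltas.size());
        uint32_t running = 0;
        for (uint32_t d : deltas) {
            running += d;                // prefix sum restores the original number
            values.push_back(running);
        }
        return values;
    }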
