
Thread: [Python] Unzipping a large list


I'm working with large data files (several gigabytes) that store interleaved data from several channels: the first value of channel A, the first value of channel B, the second value of channel A, the second value of channel B, and so on. There may also be more than 2 channels sometimes. The data is stored as unsigned 16-bit integers, with the byte order like this:
Code:
[a1_0 a1_1 b1_0 b1_1 a2_0 a2_1 b2_0 b2_1 ...] 
and so on for millions of data values. The goal is to separate the channels and save them to individual files. I have a couple of questions.

First.
If I separate out each channel on its own, I get:
Code:
[a1_0 a1_1 a2_0 a2_1 ...]
[b1_0 b1_1 b2_0 b2_1 ...] 
This is easy to do with numpy (using the numpy.reshape() function and slicing the result). But I think that only involves re-interpreting the binary values; I want to actually separate the channels and save them to individual files.
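For a small array that fits in memory, here is roughly what I mean (just a sketch with two channels of made-up data):
Code:
import numpy as np

# made-up interleaved samples: a1, b1, a2, b2, ...
data = np.array([1, 10, 2, 20, 3, 30], dtype=np.uint16)

# one row per sample, one column per channel
pairs = data.reshape((-1, 2))
chan_a = pairs[:, 0]   # array([1, 2, 3], dtype=uint16)
chan_b = pairs[:, 1]   # array([10, 20, 30], dtype=uint16)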

I know I could loop over the data, but that seems terribly inefficient given how much data I have to go through.

Is there a better way to pull apart an array like this?

Second.
Since reading the whole file at once is not an option, I was going to read it in piecemeal, keep the target files open the whole time, and use the file.flush() method on each target file after each block has been processed and written.
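Roughly what I had in mind is something like this (just a sketch; the chunk size and file names are made up, and I'm assuming two channels):
Code:
import numpy as np

CHUNK = 1024 * 1024  # number of uint16 values per block (made-up size)

with open("input.data", "rb") as src, \
     open("a.data", "wb") as fa, \
     open("b.data", "wb") as fb:
    while True:
        block = np.fromfile(src, dtype=np.uint16, count=CHUNK)
        if block.size == 0:
            break
        pairs = block.reshape((-1, 2))   # one row per sample
        pairs[:, 0].tofile(fa)           # channel A values
        pairs[:, 1].tofile(fb)           # channel B values
        fa.flush()
        fb.flush()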

Does .flush() free the memory used by the file buffer?

I'm working in Python 2.7 on a Windows machine.

Quote: Originally Posted by erdaron

This is easy to do with numpy (using the numpy.reshape() function and slicing the result). But I think that only involves re-interpreting the binary values; I want to actually separate the channels and save them to individual files.
numpy operates on views and does not copy the data, so reshape does nothing to the data; it only changes how it is interpreted.
Your approach looks like the most efficient way you can do this in Python.
Why should any interpretation of binary values be required? numpy arrays can have pretty much any binary representation you wish (Fortran/C order, big/little endian, every basic datatype, etc.).
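For instance, the byte order and width can be spelled out directly in the dtype (a small illustration):
Code:
import numpy as np

le = np.dtype('<u2')          # little-endian unsigned 16-bit
be = np.dtype('>u2')          # big-endian unsigned 16-bit
data = np.arange(4, dtype=le)
swapped = data.astype(be)     # same values, byte-swapped storage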

Using reshape, slicing and memory maps, this should be simple and should achieve good performance (for Python).

e.g.:
Code:
import numpy as np

# 200 MB of test data
n = 100 * 1024**2

inp = np.memmap("input.data", dtype=np.uint16, mode='w+', shape=(n,))
inp[:] = np.zeros((n,))
inp[::4] = 1
inp[1::4] = 1
inp.flush()
del inp

import time
t = time.time()
inp = np.memmap("input.data", dtype=np.uint16, mode='r', shape=(n,))
a = np.memmap("a.data", dtype=np.uint16, mode='w+', shape=(n/2,))
b = np.memmap("b.data", dtype=np.uint16, mode='w+', shape=(n/2,))
tmp = inp.reshape((n/2, 2))
a[:] = tmp[::2, :].reshape((n/2,))
b[:] = tmp[1::2, :].reshape((n/2,))
a.flush()
b.flush()

print "%g MB/s" % ((2 * n / 1024**2) / (time.time() - t)) 
This reaches about 50 MB/s write speed, which is quite OK; the disk manages 65-70 MB/s.
(Note that the input is cached in memory if it is small enough, so throughput under realistic scenarios will be lower.)

If you need to support 32-bit operating systems, you will have to use sliding memory maps or normal file I/O.
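A sliding map could look roughly like this (only a sketch; the window size is arbitrary and the leftover values at the end of the file are ignored):
Code:
import numpy as np
import os

window = 32 * 1024**2                  # values per window, arbitrary
itemsize = np.dtype(np.uint16).itemsize
total = os.path.getsize("input.data") // itemsize

for start in range(0, total // window * window, window):
    chunk = np.memmap("input.data", dtype=np.uint16, mode='r',
                      offset=start * itemsize, shape=(window,))
    # ... process chunk (reshape, slice, write out) ...
    del chunk                          # release the mapping before the next window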

Quote: Originally Posted by erdaron
Since reading the whole file at once is not an option, I was going to read it in piecemeal, keep the target files open the whole time, and use the file.flush() method on each target file after each block has been processed and written.

Does .flush() free the memory used by the file buffer?
The file object has underlying fixed-size buffers that are flushed when they are full. This should work for any file size, so you should not need to worry about it.
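If you want explicit control anyway, you can pass a buffer size to open() (the size here is an arbitrary choice):
Code:
# open the target file with a 1 MB write buffer
out = open("a.data", "wb", 1024 * 1024)
out.write(b"\x00" * 4096)
out.close()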

