Thread: [Python] Unzipping a large list
I'm working with large data files (several gigabytes) that store interleaved data from several channels: first value of channel A, first value of channel B, second value of channel A, second value of channel B, and so on. Sometimes there may also be more than 2 channels. The data is stored as unsigned 16-bit integers, with the bytes ordered like this:

Code:
[a10, a11, b10, b11, a20, a21, b20, b21...]

and so on, for millions of data values. My goal is to separate the channels and save them to individual files. I have a couple of questions.
First. I want to separate out each channel on its own, so that I get:

Code:
[a10, a11, a20, a21...]
[b10, b11, b20, b21...]

This would be easy with numpy (using the numpy.reshape() function and then slicing the result), but I think that involves an operation of interpreting the binary values.
I know I could do this with a loop, but that seems terribly inefficient given how much data I have to go through. Is there a way to pull apart an array like this?
Second. Since reading the whole file in at once is not an option, I'm going to read it in piecemeal, keep the target files open the whole time, and use the file.flush() method on each target file after each block has been processed and written. Does .flush() free the memory used by the file buffer?
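Roughly, the loop I have in mind looks like this (just a sketch; the block size is an arbitrary choice, and split_channels stands in for whatever the separation step ends up being):

Code:
import numpy as np

BLOCK = 1024**2  # values per read; arbitrary, assumed a multiple of 4 here

def split_channels(block):
    # placeholder: two channels, two consecutive values each per frame
    pairs = block.reshape((block.size // 2, 2))
    return pairs[::2, :].reshape(-1), pairs[1::2, :].reshape(-1)

src = open("input.data", "rb")
out_a = open("a.data", "wb")
out_b = open("b.data", "wb")
while True:
    block = np.fromfile(src, dtype=np.uint16, count=BLOCK)
    if block.size == 0:
        break
    a, b = split_channels(block)
    a.tofile(out_a)
    b.tofile(out_b)
    out_a.flush()  # this is the call I am asking about
    out_b.flush()
src.close()
out_a.close()
out_b.close()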
I'm working in Python 2.7 on a Windows machine.
numpy operates on views and does not copy the data: a reshape does nothing to the data itself, it only changes how it is interpreted.
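You can check the no-copy behaviour yourself; writing through a reshaped and sliced view changes the original buffer (a minimal demonstration):

Code:
import numpy as np

x = np.arange(8, dtype=np.uint16)
y = x.reshape((4, 2))  # new shape, same underlying buffer
z = y[::2, :]          # strided slice, still no copy
z[0, 0] = 99
print x[0]                       # 99 -- the write went through the view
print np.may_share_memory(x, z)  # True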
In any case, your approach looks like the most efficient way to do this in Python.
Why should any interpretation of binary values be required? numpy arrays can have pretty much any binary representation you wish (Fortran/C order, big/little endian, every basic datatype, etc.).
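For example, if the values on disk happened to be big-endian, you would just say so in the dtype; there is no separate decoding pass (a sketch; '>u2' and '<u2' are big- and little-endian unsigned 16-bit):

Code:
import numpy as np

# the dtype string chooses the on-disk interpretation at read time
data = np.fromfile("input.data", dtype=">u2")    # big-endian uint16
# data = np.fromfile("input.data", dtype="<u2")  # little-endian uint16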
Using reshape, slicing, and memory maps, this should be simple and achieve good performance (for Python).
e.g.:
This reaches about 50 MB/s write speed, which is quite OK; my disk manages 65-70 MB/s.

Code:
import numpy as np
import time

# 100M uint16 values = 200 MB
n = 100 * 1024**2

# build a test input: channel a all ones, channel b all zeros
inp = np.memmap("input.data", dtype=np.uint16, mode='w+', shape=(n,))
inp[:] = 0
inp[::4] = 1
inp[1::4] = 1
inp.flush()
del inp

s = time.time()

inp = np.memmap("input.data", dtype=np.uint16, mode='r', shape=(n,))
a = np.memmap("a.data", dtype=np.uint16, mode='w+', shape=(n // 2,))
b = np.memmap("b.data", dtype=np.uint16, mode='w+', shape=(n // 2,))

# view the stream as frames of two consecutive values;
# even frames belong to channel a, odd frames to channel b
tmp = inp.reshape((n // 2, 2))
a[:] = tmp[::2, :].reshape((n // 2,))
b[:] = tmp[1::2, :].reshape((n // 2,))
a.flush()
b.flush()

print "%g MB/s" % ((2 * n / 1024**2) / (time.time() - s))
(Note that the input gets cached in memory if it is small enough, so throughput under realistic scenarios will be lower.)
If you need to support 32-bit operating systems, you will have to use sliding memory maps or normal file I/O.
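A sliding map just means remapping one window at a time through the offset argument of np.memmap, roughly like this (a sketch; it assumes the total length is a multiple of the window size):

Code:
import numpy as np

n = 100 * 1024**2   # total uint16 values in the file
window = 1024**2    # values per mapped window, a multiple of 4
itemsize = np.dtype(np.uint16).itemsize

out_a = open("a.data", "wb")
out_b = open("b.data", "wb")
for start in xrange(0, n, window):
    m = np.memmap("input.data", dtype=np.uint16, mode="r",
                  offset=start * itemsize, shape=(window,))
    tmp = m.reshape((window // 2, 2))
    tmp[::2, :].reshape(-1).tofile(out_a)   # channel a frames
    tmp[1::2, :].reshape(-1).tofile(out_b)  # channel b frames
    del m                                   # unmap before sliding forward
out_a.close()
out_b.close()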
The file object has underlying fixed-size buffers that are flushed when they are full. This should work for any file size; you should not need to worry about it.
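If you do want explicit control, open() accepts a buffer size as its third argument (a sketch; 1 MB is an arbitrary choice):

Code:
# 1 MB buffer per target file, flushed automatically when it fills
out_a = open("a.data", "wb", 1024**2)
out_b = open("b.data", "wb", 1024**2)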