streamio package

Submodules

streamio.sort module

sort

streamio.sort.merge(*iterables, **kwargs)

Take a list of ordered iterables; return as a single ordered generator.

Parameters: key (function) – A function that returns the key value for each item.

Directly borrowed from: http://stackoverflow.com/questions/5023266/merge-join-two-generators-in-python
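
The idea of merging ordered iterables with a key can be sketched with a small heap. This is an illustrative sketch only, not the library's source: the name merge_sorted and its keyword-only key argument are assumptions made for this example.

```python
import heapq


def merge_sorted(*iterables, key=None):
    """Merge already-sorted iterables into one sorted generator (hypothetical helper)."""
    if key is None:
        key = lambda item: item
    heap = []
    for index, iterable in enumerate(iterables):
        iterator = iter(iterable)
        for item in iterator:
            # The index is unique per input, so ties on the key never
            # fall through to comparing the items themselves.
            heap.append((key(item), index, item, iterator))
            break
    heapq.heapify(heap)
    while heap:
        _, index, item, iterator = heap[0]
        yield item
        for nxt in iterator:
            # Replace the smallest entry with the next item from the same input.
            heapq.heapreplace(heap, (key(nxt), index, nxt, iterator))
            break
        else:
            # That input is exhausted; drop its entry from the heap.
            heapq.heappop(heap)
```

Only one item per input is held at a time, which is what makes this approach suitable for very large sorted inputs.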

streamio.sort.mergesort(filename, output=None, key=None, maxitems=1000000.0, progress=True)

Given an input file sort it by performing a merge sort on disk.

Parameters:
  • filename (str or py._path.local.LocalPath) – Either a filename as a str or a py._path.local.LocalPath instance.
  • output (str or py._path.local.LocalPath or None) – An optional output filename as a str or a py._path.local.LocalPath instance.
  • key (function or None) – An optional key to sort the data on.
  • maxitems (int) – Maximum number of items to hold in memory at a time.
  • progress (bool) – Whether or not to display a progress bar

This uses py._path.local.LocalPath.make_numbered_dir to create temporary scratch space for splitting the input file into sorted chunks. The merge is processed iteratively in memory using the merge function, which is almost identical to heapq.merge but adds support for an optional key function.
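
The two phases described above (split into sorted chunks, then merge) can be sketched as follows. This is a rough sketch under stated assumptions, not the library's implementation: external_sort is a hypothetical name, the standard tempfile module stands in for make_numbered_dir, heapq.merge (with its key argument, Python 3.5+) stands in for the package's own merge, and every input line is assumed to end with a newline.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(src, dst, key=None, maxitems=1000):
    """Sort the lines of src into dst, holding at most maxitems lines in memory."""
    chunk_paths = []
    with tempfile.TemporaryDirectory() as scratch:
        # Phase 1: split the input into sorted chunk files.
        with open(src, "r", encoding="utf-8") as infile:
            while True:
                chunk = list(islice(infile, maxitems))
                if not chunk:
                    break
                chunk.sort(key=key)
                path = os.path.join(scratch, "chunk%d.txt" % len(chunk_paths))
                with open(path, "w", encoding="utf-8") as out:
                    out.writelines(chunk)
                chunk_paths.append(path)
        # Phase 2: lazily merge the sorted chunks into the output file.
        files = [open(p, "r", encoding="utf-8") for p in chunk_paths]
        try:
            with open(dst, "w", encoding="utf-8") as out:
                out.writelines(heapq.merge(*files, key=key))
        finally:
            for f in files:
                f.close()
```

Memory use is bounded by maxitems during the split and by one line per chunk during the merge.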

streamio.stat module

stat

streamio.stat.minmax(xs)

Return the min and max values for the given iterable

Parameters: xs (any iterable of single numerical values) – An iterable of values.

This function computes both the min and max in a single pass, so the given iterable is consumed only once.
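
A single-pass min/max might look like the sketch below. The name minmax_once and the choice to raise ValueError on an empty iterable are assumptions for illustration; the source does not say how empty input is handled.

```python
def minmax_once(xs):
    """Return (smallest, largest) after a single pass over xs (hypothetical helper)."""
    iterator = iter(xs)
    try:
        lo = hi = next(iterator)
    except StopIteration:
        # Assumption: empty input is an error, mirroring built-in min() and max().
        raise ValueError("minmax_once() arg is an empty iterable") from None
    for x in iterator:
        if x < lo:
            lo = x
        elif x > hi:
            hi = x
    return lo, hi
```

Because the iterable is walked once, this also works on generators, which calling min() and then max() would not.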

streamio.stream module

stream

streamio.stream.stream(filename, encoding='utf-8', skipblank=True, strip=True, stripchars='\r\n\t ')

Stream every line in the given file.

Parameters:
  • filename (str, py._path.local.LocalPath or file) – A str filename, a py._path.local.LocalPath instance or an open file instance.
  • encoding (str or None) – A str indicating the charset/encoding to use.
  • skipblank (bool) – Whether to skip blank lines (sometimes undesirable)
  • strip (bool) – Whether to strip lines of surrounding whitespace (sometimes undesirable)
  • stripchars (list, tuple or str) – An iterable of characters to strip from the surrounding line. line.strip(...) is used.

Each line in the file is read, stripped of surrounding whitespace and yielded one at a time. Blank lines are skipped by default. If the keyword argument encoding is provided and is not None, each line in the input stream is decoded using the given encoding; passing None disables unicode decoding.
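
The behaviour described above can be approximated with a short generator. This is a sketch, not the library's code: stream_lines is a hypothetical name, and it assumes an already-open binary file object rather than accepting a filename or LocalPath.

```python
import io


def stream_lines(fileobj, encoding="utf-8", skipblank=True, strip=True,
                 stripchars="\r\n\t "):
    """Yield lines from a binary file object one at a time (hypothetical helper)."""
    for line in fileobj:
        if encoding is not None:
            line = line.decode(encoding)
            chars = stripchars
        else:
            # No decoding requested: keep bytes and strip with a bytes pattern.
            chars = stripchars.encode("ascii")
        if strip:
            line = line.strip(chars)
        if skipblank and not line.strip():
            continue
        yield line


sample = io.BytesIO(b"  alpha \n\n beta\r\n")
print(list(stream_lines(sample)))  # ['alpha', 'beta']
```

Since the function is a generator, only one line is held in memory at a time regardless of file size.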

streamio.stream.csvstream(filename, encoding='utf-8', stripchars='\r\n')

Stream every line in the given file interpreting each line as CSV.

Parameters:
  • filename (str, py._path.local.LocalPath or file) – A str filename, a py._path.local.LocalPath instance or an open file instance.
  • encoding (str) – A str indicating the charset/encoding to use.
  • stripchars (list, tuple or str) – An iterable of characters to strip from the surrounding line. line.strip(...) is used.

This is a wrapper around stream where the stream is treated as CSV.
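
Treating a stream of lines as CSV can be sketched with the standard csv module, whose reader accepts any iterable of strings. The name csv_rows is hypothetical, and the input is assumed to be already-decoded text lines.

```python
import csv


def csv_rows(lines):
    """Yield each text line parsed into a list of CSV fields (hypothetical helper)."""
    # csv.reader consumes its input lazily, so the whole pipeline stays streaming.
    for row in csv.reader(lines):
        yield row
```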

streamio.stream.csvdictstream(filename, encoding='utf-8', fields=None, stripchars='\r\n')

Stream every line in the given file interpreting each line as a dictionary of fields to items.

Parameters:
  • filename (str, py._path.local.LocalPath or file) – A str filename, a py._path.local.LocalPath instance or an open file instance.
  • encoding (str) – A str indicating the charset/encoding to use.
  • stripchars (list, tuple or str) – An iterable of characters to strip from the surrounding line. line.strip(...) is used.

This is a wrapper around csvstream where each row is treated as a dict of field(s) to item(s).
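
Mapping each CSV row onto field names might be sketched as below. The name csv_dicts is hypothetical, and treating the first row as the header when fields is None is an assumption; the source does not document the fields argument.

```python
import csv


def csv_dicts(lines, fields=None):
    """Yield each CSV line as a dict of field name to value (hypothetical helper)."""
    rows = csv.reader(lines)
    if fields is None:
        # Assumption: with no explicit fields, the first row is the header.
        fields = next(rows, None)
        if fields is None:
            return
    for row in rows:
        yield dict(zip(fields, row))
```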

streamio.stream.jsonstream(filename, encoding='utf-8')

Stream every line in the given file interpreting each line as JSON.

Parameters:
  • filename (str, py._path.local.LocalPath or file) – A str filename, a py._path.local.LocalPath instance or an open file instance.
  • encoding (str) – A str indicating the charset/encoding to use.
  • stripchars (list, tuple or str) – An iterable of characters to strip from the surrounding line. line.strip(...) is used.

This is a wrapper around stream, except that it wraps each line in a loads call, essentially treating each line as a piece of valid JSON.
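
Line-delimited JSON can be decoded lazily with the standard json module, as in this sketch. The name json_lines is hypothetical, and the input is assumed to be an iterable of already-decoded, non-blank text lines.

```python
import json


def json_lines(lines):
    """Yield one decoded JSON value per input line (hypothetical helper)."""
    for line in lines:
        yield json.loads(line)
```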

streamio.stream.compress(iterable, level=9, encoding='utf-8')

Compress the given iterable of bytes using zlib compression.

Parameters:
  • iterable (An iterable of bytes (If str will be encoded)) – An iterable of bytes to compress using zlib (ZIP)
  • level (int (Default: 9)) – An optional Compression Level
  • encoding (str (Default: utf-8)) – An optional encoding to use when dealing with an iterable of str

Returns: An iterable compressed with zlib

Return type: iterable stream of bytes
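
Streaming compression of an iterable can be sketched with zlib.compressobj from the standard library. The name compress_chunks is hypothetical; the sketch only mirrors the documented parameters and is not the library's source.

```python
import zlib


def compress_chunks(iterable, level=9, encoding="utf-8"):
    """Yield zlib-compressed bytes for an iterable of bytes or str (hypothetical helper)."""
    compressor = zlib.compressobj(level)
    for chunk in iterable:
        if isinstance(chunk, str):
            chunk = chunk.encode(encoding)
        data = compressor.compress(chunk)
        if data:
            # The compressor buffers internally, so output may lag behind input.
            yield data
    # Emit whatever is still buffered once the input is exhausted.
    yield compressor.flush()
```

Joining the yielded chunks gives a complete zlib stream that zlib.decompress can read back.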

streamio.version module

Version Module

So we only have to maintain version information in one place!

Module contents

streamio - reading, writing and sorting large files.

streamio is a simple library of functions designed to read, write and sort large files using iterators so that the operations will successfully complete on systems with limited RAM.

copyright: Copyright (C) 2013 by James Mills

streamio.minmax(xs)

Return the min and max values for the given iterable

Parameters: xs (any iterable of single numerical values) – An iterable of values.

This function computes both the min and max in a single pass, so the given iterable is consumed only once.

streamio.merge(*iterables, **kwargs)

Take a list of ordered iterables; return as a single ordered generator.

Parameters: key (function) – A function that returns the key value for each item.

Directly borrowed from: http://stackoverflow.com/questions/5023266/merge-join-two-generators-in-python

streamio.mergesort(filename, output=None, key=None, maxitems=1000000.0, progress=True)

Given an input file sort it by performing a merge sort on disk.

Parameters:
  • filename (str or py._path.local.LocalPath) – Either a filename as a str or a py._path.local.LocalPath instance.
  • output (str or py._path.local.LocalPath or None) – An optional output filename as a str or a py._path.local.LocalPath instance.
  • key (function or None) – An optional key to sort the data on.
  • maxitems (int) – Maximum number of items to hold in memory at a time.
  • progress (bool) – Whether or not to display a progress bar

This uses py._path.local.LocalPath.make_numbered_dir to create temporary scratch space for splitting the input file into sorted chunks. The merge is processed iteratively in memory using the merge function, which is almost identical to heapq.merge but adds support for an optional key function.

streamio.stream(filename, encoding='utf-8', skipblank=True, strip=True, stripchars='\r\n\t ')

Stream every line in the given file.

Parameters:
  • filename (str, py._path.local.LocalPath or file) – A str filename, a py._path.local.LocalPath instance or an open file instance.
  • encoding (str or None) – A str indicating the charset/encoding to use.
  • skipblank (bool) – Whether to skip blank lines (sometimes undesirable)
  • strip (bool) – Whether to strip lines of surrounding whitespace (sometimes undesirable)
  • stripchars (list, tuple or str) – An iterable of characters to strip from the surrounding line. line.strip(...) is used.

Each line in the file is read, stripped of surrounding whitespace and yielded one at a time. Blank lines are skipped by default. If the keyword argument encoding is provided and is not None, each line in the input stream is decoded using the given encoding; passing None disables unicode decoding.

streamio.csvstream(filename, encoding='utf-8', stripchars='\r\n')

Stream every line in the given file interpreting each line as CSV.

Parameters:
  • filename (str, py._path.local.LocalPath or file) – A str filename, a py._path.local.LocalPath instance or an open file instance.
  • encoding (str) – A str indicating the charset/encoding to use.
  • stripchars (list, tuple or str) – An iterable of characters to strip from the surrounding line. line.strip(...) is used.

This is a wrapper around stream where the stream is treated as CSV.

streamio.jsonstream(filename, encoding='utf-8')

Stream every line in the given file interpreting each line as JSON.

Parameters:
  • filename (str, py._path.local.LocalPath or file) – A str filename, a py._path.local.LocalPath instance or an open file instance.
  • encoding (str) – A str indicating the charset/encoding to use.
  • stripchars (list, tuple or str) – An iterable of characters to strip from the surrounding line. line.strip(...) is used.

This is a wrapper around stream, except that it wraps each line in a loads call, essentially treating each line as a piece of valid JSON.

streamio.csvdictstream(filename, encoding='utf-8', fields=None, stripchars='\r\n')

Stream every line in the given file interpreting each line as a dictionary of fields to items.

Parameters:
  • filename (str, py._path.local.LocalPath or file) – A str filename, a py._path.local.LocalPath instance or an open file instance.
  • encoding (str) – A str indicating the charset/encoding to use.
  • stripchars (list, tuple or str) – An iterable of characters to strip from the surrounding line. line.strip(...) is used.

This is a wrapper around csvstream where each row is treated as a dict of field(s) to item(s).

streamio.compress(iterable, level=9, encoding='utf-8')

Compress the given iterable of bytes using zlib compression.

Parameters:
  • iterable (An iterable of bytes (If str will be encoded)) – An iterable of bytes to compress using zlib (ZIP)
  • level (int (Default: 9)) – An optional Compression Level
  • encoding (str (Default: utf-8)) – An optional encoding to use when dealing with an iterable of str

Returns: An iterable compressed with zlib

Return type: iterable stream of bytes
