sort
Take a list of ordered iterables; return them as a single ordered generator.
Parameters: | key – function that returns the key value for each item |
---|
Directly borrowed from: http://stackoverflow.com/questions/5023266/merge-join-two-generators-in-python
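The merging behaviour described above can be sketched with a heap. This is a minimal illustration, not the library's implementation; the name `merge_sorted` and its signature are assumptions made for the example.

```python
import heapq
from itertools import count


def merge_sorted(iterables, key=None):
    """Lazily merge already-sorted iterables into one sorted stream.

    Hypothetical helper for illustration; not part of streamio.
    """
    if key is None:
        key = lambda item: item
    heap = []
    tiebreak = count()  # unique counter so items themselves are never compared
    # Seed the heap with the first item of every non-empty iterable.
    for it in map(iter, iterables):
        for item in it:
            heap.append((key(item), next(tiebreak), item, it))
            break
    heapq.heapify(heap)
    while heap:
        _, _, item, it = heap[0]
        yield item
        for nxt in it:
            # Replace the smallest entry with the next item from the same source.
            heapq.heapreplace(heap, (key(nxt), next(tiebreak), nxt, it))
            break
        else:
            # That source is exhausted; drop its entry.
            heapq.heappop(heap)
```

Only one item per input iterable is held in memory at a time. On Python 3.5 and later, `heapq.merge(*iterables, key=key)` offers the same optional key function directly.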
Given an input file sort it by performing a merge sort on disk.
This uses py._path.local.LocalPath.make_numbered_dir to create temporary scratch space to work with when splitting the input file into sorted chunks. The merge sort is processed iteratively in memory using the merge function, which is almost identical to heapq.merge but adds support for an optional key function.
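The split-then-merge approach described above can be sketched as follows. This is an illustration under stated assumptions: it uses the standard library's `tempfile.TemporaryDirectory` for scratch space instead of `make_numbered_dir`, uses `heapq.merge` (Python 3.5+) for the merge step, and the name `sort_file_on_disk` and the `chunk_lines` parameter are hypothetical.

```python
import heapq
import os
import tempfile
from itertools import islice


def sort_file_on_disk(src, dst, chunk_lines=100_000, key=None):
    """Sort the lines of src into dst, holding at most chunk_lines lines in memory.

    Hypothetical helper for illustration; not part of streamio.
    """
    with tempfile.TemporaryDirectory() as scratch:
        chunk_paths = []
        # Phase 1: split the input into sorted chunks written to scratch space.
        with open(src, "r", encoding="utf-8") as infile:
            while True:
                chunk = list(islice(infile, chunk_lines))
                if not chunk:
                    break
                # Ensure every line ends with a newline so merged lines never fuse.
                chunk = [l if l.endswith("\n") else l + "\n" for l in chunk]
                chunk.sort(key=key)
                path = os.path.join(scratch, "chunk-%d.txt" % len(chunk_paths))
                with open(path, "w", encoding="utf-8") as out:
                    out.writelines(chunk)
                chunk_paths.append(path)
        # Phase 2: merge the sorted chunks lazily, one line per chunk at a time.
        files = [open(p, "r", encoding="utf-8") for p in chunk_paths]
        try:
            with open(dst, "w", encoding="utf-8") as out:
                out.writelines(heapq.merge(*files, key=key))
        finally:
            for f in files:
                f.close()
```

Peak memory is bounded by `chunk_lines` during the split and by one line per chunk during the merge, which is what lets the sort complete on inputs larger than RAM.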
stat
Return the min and max values for the given iterable
Parameters: | xs (Any iterable of single numerical values.) – An iterable of values |
---|
This function returns both the min and max of the given iterable by computing both at once and iterating/consuming the iterable once.
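A single-pass min/max can be sketched like this. The name `min_and_max` is hypothetical, and raising `ValueError` on an empty iterable is an assumption of the sketch, not documented behaviour.

```python
def min_and_max(xs):
    """Return (min, max) of xs while consuming the iterable exactly once.

    Hypothetical helper for illustration; not part of streamio.
    """
    it = iter(xs)
    try:
        lo = hi = next(it)
    except StopIteration:
        # Assumption: an empty input is treated as an error.
        raise ValueError("min_and_max() arg is an empty iterable") from None
    for x in it:
        if x < lo:
            lo = x
        elif x > hi:
            hi = x
    return lo, hi
```

Because the input is consumed only once, this works on generators and other one-shot iterables, where calling `min()` and then `max()` would find the iterable already exhausted on the second call.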
stream
Stream every line in the given file.
Each line in the file is read, stripped of surrounding whitespace and returned iteratively. Blank lines are ignored. If the keyword argument encoding is provided and is not None, each line in the input stream is decoded using the given encoding; passing None disables unicode decoding.
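The behaviour described above might look roughly like the sketch below. The name `stream_lines` is hypothetical, and the default of `"utf-8"` for `encoding` is an assumption; the source text does not state the default.

```python
def stream_lines(filename, encoding="utf-8"):
    """Yield each non-blank line of filename with surrounding whitespace removed.

    Hypothetical helper for illustration; not part of streamio.
    """
    # Read bytes so that decoding can be skipped entirely when encoding is None.
    with open(filename, "rb") as f:
        for raw in f:
            line = raw.strip()
            if not line:
                continue  # blank lines are ignored
            yield line.decode(encoding) if encoding is not None else line
```

Since this is a generator, only one line is held in memory at a time, regardless of file size.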
Stream every line in the given file interpreting each line as CSV.
This is a wrapper around stream where the stream is treated as CSV.
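One way to layer CSV parsing on top of a line stream is to hand the lines to the standard library's `csv.reader`, which accepts any iterable of strings. The names `_lines` and `stream_csv` below are hypothetical stand-ins, and the encoding default is an assumption.

```python
import csv


def _lines(filename, encoding="utf-8"):
    """Yield stripped, non-blank text lines (stand-in for a line streamer)."""
    with open(filename, "r", encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if line:
                yield line


def stream_csv(filename, encoding="utf-8"):
    """Lazily yield each line of filename as a list of CSV fields.

    Hypothetical helper for illustration; not part of streamio.
    """
    return csv.reader(_lines(filename, encoding))
```

A trade-off of parsing line by line is that quoted fields containing embedded newlines would not survive the per-line stripping in this sketch.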
Stream every line in the given file interpreting each line as a dictionary of fields to items.
This is a wrapper around csvstream where the stream is treated as dict of field(s) to item(s).
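The dictionary form can be sketched with `csv.DictReader`. This assumes the first non-blank line of the file holds the field names; the source text does not say how field names are supplied. The names `_lines` and `stream_csv_dicts` are hypothetical.

```python
import csv


def _lines(filename, encoding="utf-8"):
    """Yield stripped, non-blank text lines (stand-in for a line streamer)."""
    with open(filename, "r", encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if line:
                yield line


def stream_csv_dicts(filename, encoding="utf-8"):
    """Lazily yield each data line of filename as a mapping of field name to value.

    Hypothetical helper for illustration; not part of streamio.
    Assumption: the first non-blank line is the header row.
    """
    return csv.DictReader(_lines(filename, encoding))
```

Values are returned as strings; any numeric conversion is left to the caller.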
Stream every line in the given file interpreting each line as JSON.
This is a wrapper around stream, except that it parses each line, essentially treating each line as a piece of valid JSON.
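Reading a file of one JSON document per line can be sketched as below. The sketch uses `json.loads`, the standard-library call that parses JSON text; the name `stream_json` and the encoding default are assumptions made for the example.

```python
import json


def stream_json(filename, encoding="utf-8"):
    """Yield one decoded JSON value per non-blank line of filename.

    Hypothetical helper for illustration; not part of streamio.
    """
    with open(filename, "r", encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if line:
                # Each line must be a complete JSON document on its own.
                yield json.loads(line)
```

A line that is not valid JSON raises `json.JSONDecodeError` at the point where that line is reached, so earlier values will already have been yielded.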
Compress the given iterable of bytes using zlib compression.
Returns: | An iterable compressed with zlib |
---|---|
Return type: | iterable stream of bytes |
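Streaming compression of an iterable of byte chunks can be sketched with `zlib.compressobj`. The name `compress_chunks` and the `level` parameter are assumptions made for the example.

```python
import zlib


def compress_chunks(chunks, level=6):
    """Yield zlib-compressed bytes for an iterable of byte chunks.

    Hypothetical helper for illustration; not part of streamio.
    """
    compressor = zlib.compressobj(level)
    for chunk in chunks:
        data = compressor.compress(chunk)
        if data:  # the compressor may buffer input and emit nothing yet
            yield data
    # Emit whatever is still buffered, plus the stream trailer.
    yield compressor.flush()
```

Joining the yielded pieces gives a single zlib stream, so `zlib.decompress(b"".join(compress_chunks(chunks)))` returns the concatenated input.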
Version Module
So we only have to maintain version information in one place!
streamio - reading, writing and sorting large files.
streamio is a simple library of functions designed to read, write and sort large files using iterators so that the operations will successfully complete on systems with limited RAM.
copyright: | Copyright (C) 2013 by James Mills |
---|