Core objects

Bag object

class Bag(obj=None, **kw)

Generic extension to dictionaries, where elements can be accessed both by keys and attributes.

The parameters required to create a Bag can be:

  • A regular dictionary or another Bag.
  • A list of tuples (key, value).
  • A set of optional keyword arguments (key=value).

As a subclass of dict, a Bag object inherits its attributes and methods.

Examples

>>> bag = Bag(a=1, b=2, c=3)
>>> bag
{'a': 1, 'c': 3, 'b': 2}
>>> bag = Bag(bag)
>>> bag
{'a': 1, 'c': 3, 'b': 2}
>>> bag = Bag([('a',1), ('b',2), ('c',3)])
>>> bag
{'a': 1, 'c': 3, 'b': 2}
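
The combined key and attribute access described above can be illustrated with a minimal sketch. The class name AttrDict is hypothetical and this is not the Bag implementation; it only assumes that attribute lookups fall back to dictionary keys.

```python
class AttrDict(dict):
    """Hypothetical stand-in for Bag: a dict whose keys are also attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as keys() or update() keep working as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Store attributes as dictionary items.
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name)


bag = AttrDict(a=1, b=2, c=3)      # keyword arguments
bag.d = 4                          # attribute assignment creates a key
pairs = AttrDict([('a', 1), ('b', 2)])   # list of (key, value) tuples
copy_of_bag = AttrDict(bag)        # another mapping
```

Because the sketch subclasses dict, every constructor form accepted by dict is accepted here as well.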

Methods

clear
copy
fromkeys
get
has_key
items
iteritems
iterkeys
itervalues
keys
pop
popitem
set_defaults
setdefault
update
values

Cluster object

class Cluster(darray, increment=1, operator=<ufunc 'less_equal'>)

Groups consecutive data from an array according to a clustering condition.

A cluster is defined as a group of consecutive values differing by at most the increment value.

Missing values are not handled: the input sequence must therefore be free of missing values.

Parameters:

darray : ndarray

Input data array to clusterize.

increment : {float}, optional

Increment between two consecutive values to group. By default, use a value of 1.

operator : {function}, optional

Comparison operator for the definition of clusters. By default, use numpy.less_equal.

Examples

>>> A = [0, 0, 1, 2, 2, 2, 3, 4, 3, 4, 4, 4]
>>> klust = Cluster(A,0)
>>> [list(_) for _ in klust.clustered]
[[0, 0], [1], [2, 2, 2], [3], [4], [3], [4, 4, 4]]
>>> klust.uniques
array([0, 1, 2, 3, 4, 3, 4])
>>> x = [ 1.8, 1.3, 2.4, 1.2, 2.5, 3.9, 1. , 3.8, 4.2, 3.3, 
...       1.2, 0.2, 0.9, 2.7, 2.4, 2.8, 2.7, 4.7, 4.2, 0.4]
>>> Cluster(x,1).starts
array([ 0,  2,  3,  4,  5,  6,  7, 10, 11, 13, 17, 19])
>>> Cluster(x,1.5).starts
array([ 0,  6,  7, 10, 13, 17, 19])
>>> Cluster(x,2.5).starts
array([ 0,  6,  7, 19])
>>> Cluster(x,2.5,greater).starts
array([ 0,  1,  2,  3,  4,  5,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18])
>>> y = [ 0, -1, 0, 0, 0, 1, 1, -1, -1, -1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
>>> Cluster(y,1).starts
array([ 0,  1,  2,  5,  7, 10, 12, 16, 18])
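
The clustering condition can be sketched with plain NumPy. This is not the library's implementation: the function name cluster_sketch is hypothetical, and the sketch assumes that the operator is applied to the absolute difference between consecutive values and the increment, with a new cluster starting wherever the comparison is false. How the class treats differences exactly equal to the increment is not established here.

```python
import numpy as np


def cluster_sketch(data, increment=1, operator=np.less_equal):
    """Hypothetical helper: return (starts, slices, clustered, uniques)."""
    data = np.asarray(data).ravel()
    # True where two consecutive values belong to the same cluster
    # (assumption: the comparison uses the absolute difference).
    same = operator(np.abs(np.diff(data)), increment)
    # A cluster starts at index 0 and right after every break.
    starts = np.concatenate(([0], np.nonzero(~same)[0] + 1))
    ends = np.concatenate((starts[1:], [data.size]))
    slices = [slice(int(s), int(e)) for s, e in zip(starts, ends)]
    clustered = [data[s].tolist() for s in slices]
    uniques = data[starts]
    return starts, slices, clustered, uniques


A = [0, 0, 1, 2, 2, 2, 3, 4, 3, 4, 4, 4]
starts, slices, clustered, uniques = cluster_sketch(A, 0)
print(clustered)   # groups of identical consecutive values
print(uniques)     # first value of each group
```

With an increment of 0, the sketch groups runs of identical values, as in the first example above.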

Attributes

inishape             Shape of the argument array (stored for resizing).
inisize              Size of the argument array.
uniques    sequence  List of unique cluster values, as they appear in chronological order.
slices     sequence  List of the slices corresponding to each cluster of data.
starts     ndarray   Array of the indices at which the clusters start.
clustered  list      List of clustered data.

Methods

grouped_limits
grouped_slices
mark_greaterthan
markonsize

Methods

Cluster.grouped_limits()

Returns a dictionary whose keys are the unique values of self and whose values are lists of tuples (starting index, ending index), one tuple per cluster with that value.
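
The grouping can be sketched from the uniques and slices attributes. The helper name grouped_limits_sketch is hypothetical, and the sketch assumes that the ending index is the exclusive stop of each slice; the actual method may define it differently.

```python
from collections import defaultdict


def grouped_limits_sketch(uniques, slices):
    """Hypothetical helper: map each unique value to its (start, end) pairs."""
    grouped = defaultdict(list)
    for value, s in zip(uniques, slices):
        # Assumption: the end index is exclusive, like slice.stop.
        grouped[value].append((s.start, s.stop))
    return dict(grouped)


# Values and slices for A = [0, 0, 1, 2, 2, 2, 3, 4, 3, 4, 4, 4]
# clustered with an increment of 0.
uniques = [0, 1, 2, 3, 4, 3, 4]
slices = [slice(0, 2), slice(2, 3), slice(3, 6), slice(6, 7),
          slice(7, 8), slice(8, 9), slice(9, 12)]
limits = grouped_limits_sketch(uniques, slices)
```

Appending the slice itself instead of the tuple gives the structure described for grouped_slices.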

Cluster.grouped_slices()

Returns a dictionary whose keys are the unique values of self and whose values are lists of slices, one slice per cluster with that value.

See also

Cluster.grouped_limits
Returns the same grouping as tuples (starting index, ending index) instead of slices.

Cluster.mark_greaterthan(sizemin)

Shortcut for markonsize(greater_equal, sizemin). The output is therefore False for clusters whose size is at least sizemin, and True for clusters smaller than sizemin.

Parameters:

sizemin : int

Minimum size of the clusters.

See also

markonsize
Creates a mask for the clusters that do not meet a size requirement.

Cluster.markonsize(operator, sizethresh)

Creates a mask for the clusters that do not meet a size requirement. Thus, outputs False if the size requirement is met, True otherwise.

Parameters:

operator : function

Comparison operator applied to the size of each cluster and sizethresh (for example, numpy.greater_equal).

sizethresh : float

Size threshold that the clusters are compared against.
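
The masking logic can be sketched as follows. The function name markonsize_sketch is hypothetical, and the sketch assumes that the mask has one boolean per element of the input array; the shape actually returned by the method is not stated in this page.

```python
import numpy as np


def markonsize_sketch(slices, size, operator, sizethresh):
    """Hypothetical helper: True where a cluster fails the size requirement."""
    mask = np.ones(size, dtype=bool)
    for s in slices:
        # Unmask the elements of clusters that meet the requirement.
        if operator(s.stop - s.start, sizethresh):
            mask[s] = False
    return mask


# Slices for A = [0, 0, 1, 2, 2, 2, 3, 4, 3, 4, 4, 4] with an increment of 0.
slices = [slice(0, 2), slice(2, 3), slice(3, 6), slice(6, 7),
          slice(7, 8), slice(8, 9), slice(9, 12)]
# Equivalent of mark_greaterthan(2) under the assumptions above.
mask = markonsize_sketch(slices, 12, np.greater_equal, 2)
```

Clusters of a single element are marked True here, while the runs of two or more elements are left False.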