HashTable

class hirola.HashTable(max, dtype, almost_full=(0.9, 'warn'))

The raw core behind a set or a dictionary’s keys.

A hash table resembles a dictionary where its keys are the keys array but the values are an just an enumeration. It’s core API:

  • Use add to add new keys if they’ve not been already added.

  • The keys lists all keys that have been added in the order that they were added.

  • Use get to retrieve indices of keys in keys.

__init__(max, dtype, almost_full=(0.9, 'warn'))[source]
Parameters
  • max (Number) – An upper bound for the number of keys which can fit in this table. Sets the max attribute.

  • dtype (Union[dtype, generic, type, str, list, tuple]) – The data type for the table’s keys. Sets the dtype attribute.

  • almost_full – The handling of almost full hash tables. Sets the almost_full attribute.

The max parameter is silently normalised to int and clipped to a minimum of 1 if it is less than 1.

add(keys)[source]

Add keys to the table.

Any key which is already in keys is not added again. Returns the index of each key in keys similarly to get.

Raises
Warns

exceptions.AlmostFull – If the table becomes nearly full and is configured to warn (set by the almost_full attribute).

Return type

ndarray

contains(keys)[source]

Check if a key or keys are in the table.

Parameters

keys – Elements to check for.

Return type

Union[bool, ndarray]

Returns

Either true or false for each key in keys.

This function is equivalent to but faster than table.get(keys) != -1. To check only one key you may also use key in table.

copy(usable=True)[source]

Deep copy this table.

Parameters

usable – If set to false and this table has called destroy, then the destroyed state is propagated to copies.

Return type

HashTable

Returns

Another HashTable with the same size, dtype and content.

destroy()[source]

Release a writeable version of keys and permanently disable this table.

Return type

ndarray

Returns

A writeable shallow copy of keys.

Modifying keys would cause an internal meltdown and is therefore blocked by setting the writeable flag to false. However, if you no longer need this table then it is safe to do as you please with the keys array. This function grants you full access to keys but blocks you from adding to or getting from this table in the future. If you want both a writeable keys array and functional use of this table then use table.keys.copy().

get(keys, default=- 1)[source]

Lookup indices of keys in keys.

Parameters
  • keys – Elements to search for.

  • default – Returned inplace of a missing key. May be any object.

Return type

ndarray

Returns

The index/indices of keys in this table’s keys. If a key is not there, returns -1 in its place.

Raises
resize(new_size, in_place=False)[source]

Copy the contents of this table into a new HashTable of a different size.

Parameters
  • new_size – The new value for max.

  • in_place – If true resize this table. Otherwise make a modified copy.

Return type

HashTable

Returns

Either a new hash table with attributes keys and length matching those from this table or this table.

Raises

ValueError – If requested size is too small (new_size < len(table)).

Changed in version 0.3.0: Add the in_place option.

property almost_full

The response to an almost full hash table. Hash tables become dramatically slower, the closer they get to being full. Hirola’s default behaviour is to warn if this happens but can be configured to ignore the warning, raise an error or automatically make a new, larger table.

This is an overloaded parameter.

  • almost_full = None:

    Disable the almost full warning entirely.

  • almost_full = (0.8, "warn"):

    Issue a hirola.exceptions.AlmostFull warning if the table reaches 80% full.

  • almost_full = (0.8, "raise"):

    Raise a hirola.exceptions.AlmostFull exception if the table reaches 80% full.

  • almost_full = (0.7, 2):

    Whenever the table reaches 70% full, double the table size.

For reference, Python’s dict grows 8-fold when two thirds full. To mimic this behaviour, set table.almost_full = (2 / 3, 8).

property dtype: numpy.dtype

The data type for the table’s keys.

Use a structured numpy.dtype to indicate that several numbers make up a single key.

Examples

  • Individual float keys: HashTable(100, np.float64).

  • Triplets of floats keys: HashTable(100, (np.float64, 3)).

  • Strings of up to 20 characters: HashTable(100, (str, 20)).

  • Mixed types records data: HashTable(100, [("firstname", str, 20), ("lastname", str, 20), ("age", int)]).

Return type

dtype

property key_size: int

The number of bytes per key.

Return type

int

property keys: numpy.ndarray

The unique elements in the table.

Unlike Python’s builtin dict, this keys is a property rather than a method. Keys must be immutable hence this array is a readonly view. See destroy to release the writable version.

Return type

ndarray

property length: int

The number of elements currently in this table. Aliased via len(table).

Return type

int

property max: int

The maximum number of elements allowed in this table.

Adding keys to exceed this maximum will trigger a HashTableFullError. Nearly full tables will add and get much slower. For best performance, choose a maximum size which is 25-50% larger than you expect it to get.

Return type

int