cminject.utils package

A generic utility module. Contains code for parsing arguments, data visualization and analysis, high-performance interpolation, performance utilities, global configuration pub-sub, …

Submodules

cminject.utils.args module

Utility functions for handling commandline arguments.

class cminject.utils.args.SetupArgumentParser(*args, **kwargs)

Bases: argparse.ArgumentParser

An argparse.ArgumentParser subclass specifically suited to be returned by a Setup subclass’s get_parser() method. Doesn’t print that it has its own -h option (since that’s never exposed), and uses the SetupHelpFormatter to avoid printing the usage.

class cminject.utils.args.SetupHelpFormatter(prog, indent_increment=2, max_help_position=24, width=None)

Bases: argparse.MetavarTypeHelpFormatter

A help formatter for argparse.ArgumentParsers that are returned by a Setup subclass’s get_parser() method. Omits the “usage: <progname> arg1 [arg2] …” line at the start.

cminject.utils.args.distribution_description(x)

Defines a custom argparse type for a distribution description.

Parameters

x – The string value to parse.

Returns

A Distribution instance.

cminject.utils.args.natural_number(x)

Defines a custom argparse type for a natural number (excluding 0).

Parameters

x – The string value to parse.

Returns

An int value containing the natural number.

Raises

argparse.ArgumentTypeError
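As an illustration, an argparse type of this kind could look like the following minimal sketch. The name positive_int is hypothetical, and this is not the actual implementation of natural_number; it only shows the argparse convention of raising ArgumentTypeError from a type callable.

```python
import argparse


def positive_int(x):
    """Hypothetical argparse type: parse x as an integer >= 1."""
    try:
        value = int(x)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{x!r} is not an integer") from None
    if value < 1:
        raise argparse.ArgumentTypeError(f"{x!r} is not a natural number (>= 1)")
    return value


# argparse calls the type function on the raw string of each argument.
parser = argparse.ArgumentParser()
parser.add_argument("-n", type=positive_int)
args = parser.parse_args(["-n", "5"])
```

When the type callable raises ArgumentTypeError during parsing, argparse reports the message as a usage error instead of showing a traceback.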

cminject.utils.args.parse_dimension_description(s: str) Union[str, Callable[[numpy.array], Any]]

Parses dimension descriptions that can be of the following format:

  • x, y, z and vx, vy, vz refer to the first (x), second (y) and last (z) entry of the position and the velocity component, respectively

  • p<i> refers to the i-th entry of the position component

  • v<i> refers to the i-th entry of the velocity component

  • <some_name><i> refers to the i-th entry of the component called some_name

  • <some_name> refers to the component called some_name as a whole

Parameters

s – The string to parse

Returns

A Callable that returns the appropriate component from a given np.array, provided the array has a structured dtype that contains that component. The Callable raises an error when called if it cannot perform the requested access on the given np.array.
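The description format above can be sketched with a small stand-in parser. Everything here is an assumption made for illustration: the function name make_accessor, the field names position and velocity, and the choice of zero-based indices for <i>. It is not the library's implementation.

```python
import re

import numpy as np

# Assumed field names of the structured dtype; the real names may differ.
SHORTHANDS = {
    "x": ("position", 0), "y": ("position", 1), "z": ("position", -1),
    "vx": ("velocity", 0), "vy": ("velocity", 1), "vz": ("velocity", -1),
}
PREFIXES = {"p": "position", "v": "velocity"}


def make_accessor(s):
    """Return a callable that extracts the described component from a structured array."""
    if s in SHORTHANDS:
        name, index = SHORTHANDS[s]
    else:
        match = re.fullmatch(r"([A-Za-z_]+?)(\d+)?", s)
        if match is None:
            raise ValueError(f"cannot parse dimension description {s!r}")
        name, digits = match.groups()
        if digits is None:
            index = None  # no index given: the whole component is requested
        else:
            index = int(digits)  # assumed to be zero-based
            name = PREFIXES.get(name, name)

    def accessor(a):
        component = a[name]  # NumPy raises an error if the field does not exist
        return component if index is None else component[..., index]

    return accessor


dtype = np.dtype([("position", float, (3,)), ("velocity", float, (3,)), ("temperature", float)])
particles = np.zeros(2, dtype=dtype)
particles["position"] = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

z = make_accessor("z")(particles)    # last entry of each position
p1 = make_accessor("p1")(particles)  # entry 1 of each position
```

Errors surface only when the accessor is called, which matches the behavior described above: parsing succeeds for any well-formed name, and the array decides whether the access is possible.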

cminject.utils.args.parse_multiple_dimensions_description(s: str, n: Optional[Union[int, Iterable[int]]] = None) -> Tuple[Union[str, Callable[[numpy.array], Any]], int]

Parses a dimension description from a string, which is either one name of a dimension like parse_dimension_description() accepts, or multiple such names separated by commas.

Parameters
  • s – The string to parse as a dimension description.

  • n – The number of expected/allowed components in the description. Either an integer or an iterable of integers. If None, no checks are performed.

Returns

A Callable that will return the appropriate component(s) from a given np.array. If multiple components were given as ‘s’, this callable will return a corresponding k-tuple of components.

cminject.utils.cython_interpolation module

Efficient Cython implementations of interpolation code at a single point.

cminject.utils.cython_interpolation.interp2d()

Interpolates an n-dimensional vector field bilinearly based on a 2D regular data grid. Found to be much more efficient than scipy.interpolate.RegularGridInterpolator for a single 2D point.

Parameters
  • v – The data array. (nx, ny, nd)-shaped, where nx and ny are the number of points in each dimension of the regular grid, and nd is the number of dimensions that are interpolated (the size of the output vector). MUST be C-contiguous.

  • x – The x position to interpolate at.

  • y – The y position to interpolate at.

  • nd – As described in v.

  • nx – As described in v.

  • ny – As described in v.

  • out – The (nd,)-shaped np.array that the interpolated output vector is written to.

Returns

Nothing
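For reference, bilinear interpolation of a vector field can be sketched in plain NumPy as follows. This is not the Cython implementation: the function name bilinear is hypothetical, x and y are assumed to be fractional grid indices (unit grid spacing), and the result is returned rather than written into an out array.

```python
import numpy as np


def bilinear(v, x, y):
    """Bilinearly interpolate the (nx, ny, nd)-shaped array v at fractional indices (x, y)."""
    nx, ny, _ = v.shape
    # Lower-left corner of the enclosing cell, clamped so that i + 1 and j + 1 stay in range.
    i = min(max(int(np.floor(x)), 0), nx - 2)
    j = min(max(int(np.floor(y)), 0), ny - 2)
    tx, ty = x - i, y - j
    # Weighted sum of the four cell corners; each term is an (nd,)-shaped vector.
    return ((1 - tx) * (1 - ty) * v[i, j]
            + tx * (1 - ty) * v[i + 1, j]
            + (1 - tx) * ty * v[i, j + 1]
            + tx * ty * v[i + 1, j + 1])


# Test field with two output dimensions: f = i + 2j and g = i * j.
ii, jj = np.meshgrid(np.arange(3), np.arange(4), indexing="ij")
field = np.stack([ii + 2 * jj, ii * jj], axis=-1).astype(float)
result = bilinear(field, 0.5, 1.25)
```

Both test functions are reproduced exactly by bilinear interpolation, so the result at (0.5, 1.25) equals (0.5 + 2 * 1.25, 0.5 * 1.25).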

cminject.utils.cython_interpolation.interp3d()

Interpolates an n-dimensional vector field trilinearly based on a 3D regular data grid. Found to be much more efficient than scipy.interpolate.RegularGridInterpolator for a single 3D point.

Parameters
  • v – The data array. (nx, ny, nz, nd)-shaped, where nx, ny, nz are the number of points in each dimension of the regular grid, and nd is the number of dimensions that are interpolated (the size of the output vector). MUST be C-contiguous.

  • x – The x position to interpolate at.

  • y – The y position to interpolate at.

  • z – The z position to interpolate at.

  • nd – As described in v.

  • nx – As described in v.

  • ny – As described in v.

  • nz – As described in v.

  • out – The (nd,)-shaped np.array that the interpolated output vector is written to.

Returns

Nothing

cminject.utils.distributions module

  • A collection of classes that serve as samplers of random distributions.

  • Utility functions for parsing human-readable strings representing distributions, into instances of those classes.

class cminject.utils.distributions.DiracDeltaDistribution(value)

Bases: cminject.utils.distributions.Distribution

The Dirac delta distribution delta(x - value), samples of which are always the same constant value.

generate(n: int) numpy.array

Return n samples of this distribution as a NumPy array.

class cminject.utils.distributions.Distribution

Bases: abc.ABC

An abstract random 1D distribution that can generate and return a number of samples. Subclasses must implement the generate() method to be instantiable.

abstract generate(n: int) numpy.array

Return n samples of this distribution as a NumPy array.
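The subclassing contract can be sketched as follows. The Distribution class below is a local stand-in for the abstract base class, and ExponentialDistribution is a hypothetical subclass that is not part of cminject.

```python
from abc import ABC, abstractmethod

import numpy as np


class Distribution(ABC):
    """Local stand-in for the abstract base class described above."""

    @abstractmethod
    def generate(self, n: int) -> np.ndarray:
        """Return n samples of this distribution as a NumPy array."""


class ExponentialDistribution(Distribution):
    """Hypothetical subclass used only for illustration."""

    def __init__(self, scale: float, seed=None):
        self.scale = scale
        self._rng = np.random.default_rng(seed)

    def generate(self, n: int) -> np.ndarray:
        return self._rng.exponential(self.scale, n)


samples = ExponentialDistribution(scale=2.0, seed=0).generate(5)
```

Because generate() is abstract, instantiating Distribution directly raises a TypeError, while the subclass that implements it can be instantiated normally.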

class cminject.utils.distributions.GaussianDistribution(mu: float, sigma: float)

Bases: cminject.utils.distributions.Distribution

A Gaussian distribution with a fixed mean (mu) and standard deviation (sigma). See np.random.normal.

generate(n: int) numpy.array

Return n samples of this distribution as a NumPy array.

class cminject.utils.distributions.LinearDistribution(min: float, max: float)

Bases: cminject.utils.distributions.Distribution

A linear “distribution” between a minimum and a maximum value, i.e. generates a np.linspace.

generate(n: int) numpy.array

Return n samples of this distribution as a NumPy array.

class cminject.utils.distributions.NormOfDistributions(*distributions: cminject.utils.distributions.Distribution, p: int = 2)

Bases: cminject.utils.distributions.Distribution

Returns the distribution of the p-norm of samples generated by multiple other distributions. Can be used, for example, to generate the r distribution given the x and y distributions.

Akin to a generalised chi distribution that is not necessarily based on normally distributed variables.

generate(n)

Return n samples of this distribution as a NumPy array.
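The idea of taking the p-norm across samples of several distributions can be sketched with NumPy alone. The helper name norm_of_samples is hypothetical, and how the actual class combines its sub-distributions is an assumption.

```python
import numpy as np


def norm_of_samples(*samples, p=2):
    """Return the element-wise p-norm across several equally long sample arrays."""
    stacked = np.stack(samples)  # shape: (number of distributions, n)
    return np.linalg.norm(stacked, ord=p, axis=0)


rng = np.random.default_rng(seed=0)
x = rng.normal(0.0, 1.0, 1000)
y = rng.normal(0.0, 1.0, 1000)
r = norm_of_samples(x, y)  # radial distance for each (x, y) pair
```

With p = 2 and two inputs, each output sample is sqrt(x**2 + y**2), i.e. the radial coordinate r mentioned above.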

class cminject.utils.distributions.UniformDistribution(min: float, max: float)

Bases: cminject.utils.distributions.Distribution

A uniform distribution between a minimum and a maximum value. See np.random.uniform.

generate(n: int) numpy.array

Return n samples of this distribution as a NumPy array.

cminject.utils.distributions.constant

alias of cminject.utils.distributions.DiracDeltaDistribution

cminject.utils.distributions.parse_distribution(s: str) cminject.utils.distributions.Distribution

Parse a string that describes a distribution and return a Distribution object. The allowed formats are given on the left, the corresponding Distribution subclass on the right.

Additionally, the following format is allowed:

  • norm(X1[,X2[,X3[,…]]]) -> NormOfDistributions, where all Xi must match any of the formats specified here.

Parameters

s – The string to parse.

Returns

The constructed Distribution object, if parsing was successful. Otherwise raises an error.

Raises

pp.ParseException

cminject.utils.global_config module

Code for storing and distributing global configuration values, i.e., values that multiple objects may or may not be interested in, for example the numerical time-step or the number of spatial dimensions.

Objects can explicitly ask for stored values for a certain configuration key, or subscribe to updates of the values of a certain configuration key. They may raise exceptions when they notice an incompatibility of the new configuration value with their own implementation, to prevent ill-defined simulations being run.

class cminject.utils.global_config.ConfigKey(value)

Bases: enum.Enum

An enumeration defining the configuration keys that GlobalConfig subscribers can subscribe to.

NUMBER_OF_DIMENSIONS = 1

TIME_STEP = 2

class cminject.utils.global_config.ConfigSubscriber

Bases: abc.ABC

A class that can subscribe to GlobalConfig’s changes.

abstract config_change(key: cminject.utils.global_config.ConfigKey, value: Any) None

Will be called whenever the value of any subscribed key changes. Will be called once at the time of subscribing, IF the value for the subscribed key(s) is not None.

Parameters
  • key – The ConfigKey that the change occurred for.

  • value – The new value stored for the given key.

Returns

Nothing (unused).
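The subscription behavior described here can be sketched with a small stand-in store. TinyConfig and Only3D are hypothetical names, the enum and base class are redefined locally, and the real GlobalConfig API may differ; the sketch only mirrors the two documented rules, namely the immediate callback on subscribing when a value is already set, and subscribers raising on incompatible values.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any


class ConfigKey(Enum):
    NUMBER_OF_DIMENSIONS = 1
    TIME_STEP = 2


class ConfigSubscriber(ABC):
    @abstractmethod
    def config_change(self, key: ConfigKey, value: Any) -> None:
        """Called whenever the value of a subscribed key changes."""


class TinyConfig:
    """Hypothetical stand-in for the global configuration store."""

    def __init__(self):
        self._values = {}
        self._subscribers = {key: [] for key in ConfigKey}

    def set(self, key, value):
        self._values[key] = value
        for subscriber in self._subscribers[key]:
            subscriber.config_change(key, value)

    def subscribe(self, subscriber, key):
        self._subscribers[key].append(subscriber)
        value = self._values.get(key)
        if value is not None:  # notify once right away if a value is already stored
            subscriber.config_change(key, value)


class Only3D(ConfigSubscriber):
    """Hypothetical subscriber that only supports 3D simulations."""

    def __init__(self):
        self.seen = []

    def config_change(self, key, value):
        if key is ConfigKey.NUMBER_OF_DIMENSIONS and value != 3:
            raise ValueError(f"only 3 dimensions are supported, got {value}")
        self.seen.append((key, value))


config = TinyConfig()
config.set(ConfigKey.NUMBER_OF_DIMENSIONS, 3)
subscriber = Only3D()
config.subscribe(subscriber, ConfigKey.NUMBER_OF_DIMENSIONS)
```

Raising from config_change lets an object reject a configuration it cannot handle before a simulation is run with it.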

cminject.utils.interpolation module

Utility code for interpolation.

class cminject.utils.interpolation.Interp2D(grid, data)

Bases: object

A fast linear interpolator on a regular 2D grid with interpolation implemented in Cython, with a construction and call API like scipy.interpolate.RegularGridInterpolator, except that it accepts a (2,)-shaped np.array directly at call and not a 2-tuple of (x,y) coordinate arrays. Wraps cython_interpolation.pyx.

__call__(coord)

Return the interpolated data at the 2D coordinate coord.

Parameters

coord – The coordinate to interpolate at, as a (2,)-shaped np.array.

Returns

An (nd,)-shaped np.array with the interpolated results.

class cminject.utils.interpolation.Interp3D(c, data)

Bases: object

Like Interp2D, but on a 3D regular grid.

__call__(coord: numpy.array)

Return the interpolated data at the 3D coordinate coord.

Parameters

coord – The coordinate to interpolate at, as a (3,)-shaped np.array.

Returns

An (nd,)-shaped np.array with the interpolated results.

class cminject.utils.interpolation.InterpND(*args, **kwargs)

Bases: object

A simple wrapper for n-dimensional n-linear interpolation on a regular grid. Wraps scipy.interpolate.RegularGridInterpolator directly except for an additional tuple() call around the passed coordinate to interpolate at. This is to make the API coherent with Interp2D and Interp3D, which directly accept NumPy arrays, as this is faster for single-coordinate interpolation which Interp2D/Interp3D are optimized for.

__call__(t)

Call self as a function.

cminject.utils.interpolation.get_regular_grid_interpolator(grid, data)

Returns an appropriate regular grid interpolator instance based on the dimensionality. This is either an Interp2D, an Interp3D, or a RegularGridInterpolator instance, and Interp2D/Interp3D are much faster for 2D or 3D.

The returned objects follow mostly the same API as scipy’s RegularGridInterpolator, with the restriction that the coordinate argument must be a single coordinate and not a vector of coordinates.

Parameters
  • grid – The points defining the regular grid in n dimensions. Tuple/list of ndarray of float, with shapes (m1,), …, (mn,)

  • data – The data on the regular grid in n dimensions. Shape (m1, …, mn, D). Interpolated data vectors with shape (D,) will be returned by the interpolator.

Returns

An interpolator instance that can be called like a function.

cminject.utils.interpolation.split_at_inflections(a: numpy.array) Tuple[List[numpy.array], numpy.array]

Splits a np.array into multiple pieces based on where the pairwise differences of the elements change sign, i.e. at the turning points (local extrema) of a curve going through the points, which this function calls inflections. The resulting pieces are each monotonically increasing or decreasing, but not necessarily strictly (i.e. a value occurring twice in a row does not cause a split).

Useful for interpolating through a curve since multiple interpolating functions for one curve can be generated from the result.

Parameters

a – The array to split as described.

Returns

A 2-tuple of:

  • A list of arrays, each being monotonically increasing or decreasing (not necessarily strictly).

  • The indices where the splits occurred.
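A minimal NumPy sketch of this kind of splitting is shown below. The function name split_at_sign_changes is hypothetical, and the exact index at which the real function places each split is an assumption; here a new piece starts at the first element after the direction has changed.

```python
import numpy as np


def split_at_sign_changes(a):
    """Split a 1D array wherever the sign of consecutive differences flips."""
    a = np.asarray(a)
    signs = np.sign(np.diff(a))
    split_indices = []
    last_sign = 0
    for i, sign in enumerate(signs):
        if sign == 0:
            continue  # repeated values never trigger a split
        if last_sign != 0 and sign != last_sign:
            split_indices.append(i + 1)  # first element of the new piece
        last_sign = sign
    return np.split(a, split_indices), np.array(split_indices, dtype=int)


pieces, indices = split_at_sign_changes([0, 1, 2, 2, 1, 0, 1])
```

Each resulting piece is monotonic, so a separate one-dimensional interpolating function can be built per piece.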

cminject.utils.perf module

Utility functions for improving performance.

class cminject.utils.perf.cached_property(func)

Bases: object

Copied verbatim from the Python 3.8 source code. See the documentation for functools.cached_property there. It lives here because it is the only part of CMInject that would otherwise require Python 3.8, where functools.cached_property was added; copying over this self-contained implementation keeps the version requirement at >3.6.

cminject.utils.perf.numpy_method_cache(*args, **kwargs)

LRU cache (functools.lru_cache) based caching implementation for functions whose SECOND parameter is a numpy array. Mostly useful for instance methods (whose first parameter is self) with one np.array as their second argument.

Based on https://gist.github.com/Susensio/61f4fee01150caaac1e10fc5f005eb75, this is a more specialised and thus simpler and faster implementation for 1D arrays that can be converted into tuples without recursing down. More general ways might need to be derived depending on the needs.
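The underlying technique, converting a 1D array argument into a hashable tuple before handing it to functools.lru_cache, can be sketched as follows. The decorator name array_method_cache and the Field class are hypothetical, and this is not the library's implementation.

```python
import functools

import numpy as np


def array_method_cache(maxsize=128):
    """Cache a method whose second parameter is a 1D NumPy array."""
    def decorator(method):
        @functools.lru_cache(maxsize=maxsize)
        def cached(self, key):
            # key is a tuple of scalars; rebuild the array for the wrapped method.
            return method(self, np.array(key))

        @functools.wraps(method)
        def wrapper(self, array):
            # Arrays are not hashable, tuples of scalars are.
            return cached(self, tuple(array))

        return wrapper
    return decorator


class Field:
    def __init__(self):
        self.calls = 0

    @array_method_cache(maxsize=32)
    def magnitude(self, position):
        self.calls += 1
        return float(np.linalg.norm(position))


field = Field()
first = field.magnitude(np.array([3.0, 4.0]))
second = field.magnitude(np.array([3.0, 4.0]))  # served from the cache
```

Note that this simple form only works for flat arrays, since tuple() does not recurse into nested dimensions, and that the cache holds a reference to each instance it has seen.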

cminject.utils.structured_txt_hdf5 module

Code for working with (interpolation) fields stored in .txt and .hdf5 files:
  • Conversion code for txt->hdf5.

  • Code to read HDF5, for hdf5->pandas.DataFrame, and hdf5->datagrid (a tuple of coordinates and values)

Used by the tool cminject_txt-to-hdf5.

cminject.utils.structured_txt_hdf5.data_frame_to_data_grid(df: pandas.core.frame.DataFrame) Tuple[List[numpy.array], numpy.array]

Turns a DataFrame (as constructed by hdf5_to_data_frame) into a data grid that can be used with the scipy.interpolate.RegularGridInterpolator. The result can be passed on like: RegularGridInterpolator((x,y,z), data_grid).

Parameters

df – The pandas DataFrame. Must be constructed by or adhere to the format that hdf5_to_data_frame defines.

Returns

A 2-tuple like ([x, y, …], data_grid), where the first component is a list of all index arrays, and the second component is the data grid matching this index.

cminject.utils.structured_txt_hdf5.hdf5_to_data_frame(filename: str) pandas.core.frame.DataFrame

Reads in an HDF5 file (as written by txt_to_hdf5) and returns a pandas DataFrame matching it. Note that not all metadata stored in the HDF5 file is also attached to the DataFrame as of now.

Parameters

filename – The HDF5 file to read in. Must be written by or adhere to the format that txt_to_hdf5 defines.

Returns

A pandas DataFrame matching the data stored in the HDF5 file.

cminject.utils.structured_txt_hdf5.hdf5_to_data_grid(filename: str) Tuple[List[numpy.array], numpy.array]

Shortcut function to read in an HDF5 file (as written by txt_to_hdf5) and turn it into a data grid. Refer to hdf5_to_data_frame and data_frame_to_data_grid for further info; this function is defined purely in terms of the composition of the two.

Parameters

filename – The HDF5 file to read in.

Returns

A 2-tuple like ([x, y, …], data_grid), as returned by data_frame_to_data_grid.

cminject.utils.structured_txt_hdf5.txt_to_hdf5(infile_name: str, outfile_name: str, dimensions: int = 3, mirror: bool = False, mirror_antisym: Optional[List[int]] = None, attributes: Optional[Dict] = None) None

Reads a structured .txt file and writes a sparse, specially constructed HDF5 file, keeping the metadata from the original file. The general structure of the .txt input file must be as follows (example for 3D):

% <Optional meta information lines...>
% X  Y  Z  A (m)  B (s)  C ...
0.0  0.0  0.0  1.0  2.0  3.0 ...
0.1  0.0  0.0  4.0  5.0  6.0 ...
...

This matches the .txt output format of COMSOL for n-D grids, where n is given via the dimensions parameter. The number of columns after X/Y/Z is arbitrary.

All but the last line starting with % at the beginning of the file are considered to be a general textual description of the file (metadata about the file itself), and are stored. The last line starting with % is considered the header of the columns, where each column name has to be separated by at least two spaces from the other column names. A single space is considered to still be part of the same column, and the second part of the column name is - if present - considered to be the unit of the column. This will be stored as metadata too.
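The header rules above can be sketched in a few lines of plain Python. The function name split_header is hypothetical, and stripping the parentheses from the unit is an assumption; the actual converter may store units differently.

```python
import re


def split_header(lines):
    """Split leading '%' lines into description lines and (name, unit) column pairs."""
    comment_lines = []
    for line in lines:
        if not line.startswith("%"):
            break  # the first data row ends the header block
        comment_lines.append(line[1:].strip())
    if not comment_lines:
        raise ValueError("expected at least one line starting with '%'")

    *description, header = comment_lines  # the last '%' line holds the column names
    columns = []
    for column in re.split(r"\s{2,}", header):  # two or more spaces separate columns
        name, _, unit = column.partition(" ")   # a single space separates name and unit
        columns.append((name, unit.strip("()") or None))
    return description, columns


example = [
    "% Exported field",
    "% X  Y  Z  A (m)  B (s)  C",
    "0.0  0.0  0.0  1.0  2.0  3.0",
]
description, columns = split_header(example)
```

For the example input, the first comment line ends up in the description, and the columns A and B carry the units m and s while X, Y, Z and C have none.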

The constructed HDF5 file will have the following entries:

  • index: Each of these is the set of indices found for each dimension, in order of occurrence

    • x

    • y

    • z, if present in the original file

  • data: Each of these is one full row of values, in X*Y*Z order (X and Z are flipped wrt. the original txt file!)

    • <one for each column, matching the column name from the file>

The metadata (the lines starting with %) is stored as a string attribute on the HDF5 root. Columns specifying a unit (separated by one space) have this information attached as metadata on data/<…>.

Parameters
  • infile_name – The input file name. Must be a .txt file matching the format described above, otherwise the behavior will be undefined.

  • outfile_name – The output file name.

  • dimensions – The number of spatial dimensions the txt file is defined in. 3 by default, will be 2 or 3 for most cases.

  • mirror – If True, enforce symmetry around the zero-axis of the first dimension by mirroring the field around that axis. In that case, the first output quantity (column) is mirrored antisymmetrically around this axis by default. See the mirror_antisym parameter to extend this behavior to other quantities, or to turn it off.

  • mirror_antisym – Column indices of the output quantities that should change sign around the symmetry axis. Useful for, e.g., v_x/v_r. The default value is [0], i.e., assuming that the first quantity is v_x/v_r. This option only has any effect if mirror=True.

  • attributes – Optionally, a dictionary of attributes to attach to the output HDF5 file (as HDF5 attributes on the root node). Useful to store global properties of the field or anything else that tools or people who later read this file might need.

Returns

None.