topologic/notebooks
Dwayne Pryce 0bcb35c252
Distance module refactor (#25)
* Distance module refactor.

The distance module used to sit at the top level and served primarily
as a quick alias for the scipy spatial distance functions that we use
most often after generating our embeddings.

Re-implementing the documentation for these functions had no value.
The valuable part was how we used them, yet that usage was never
exposed as general functions. This refactor changes that by adding two
main functions:
- vector_distance, which takes two vectors and either a string or a
distance function and computes the distance between them. The primary
use case is the string form, so that the distance function can be
switched by a configuration value rather than a code change. It also
lets us better support mahalanobis: the comparison itself happens
vector to vector, but it first needs an inverse covariance matrix
computed from the full set (or a representative sample) of the
vertices' vectors. A curried mahalanobis function takes that inverse
covariance matrix on the first call and returns a Callable that takes
only two vectors and passes the stored matrix on to scipy, which meets
the vector-to-vector requirement of vector_distance. Magic strings are
also error prone, so you can pass the function itself and let an IDE,
mypy, or another linter catch a misspelling of euclidean (for
instance). That is not useful for configuration, but it is useful for
people who AREN'T doing configuration-based distance calculations.
- embedding_distances_for, which takes a vector and compares it against
every vector in an embedding. The embedding can be either an
EmbeddingContainer or an np.ndarray, and the function returns a
corresponding 1d np.ndarray of distances (with the same distance method
parameter and behavior as described for vector_distance). In most
circumstances the result also includes the vector's distance to itself.
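
The two functions described above might look roughly like the sketch
below. It is an illustration only, not topologic's actual code: the
names ending in _sketch, the curried_mahalanobis helper, and the
_METRICS lookup table are hypothetical, and the embedding is assumed to
be a plain 2d np.ndarray (EmbeddingContainer is not modeled). Only the
scipy.spatial.distance calls are real library functions.

```python
from typing import Callable, Union

import numpy as np
from scipy.spatial import distance

DistanceFn = Callable[[np.ndarray, np.ndarray], float]

# Hypothetical string-to-function lookup. The strings the real library
# accepts may differ; these two are only examples.
_METRICS = {
    "cosine": distance.cosine,
    "euclidean": distance.euclidean,
}


def curried_mahalanobis(inverse_covariance: np.ndarray) -> DistanceFn:
    """Capture the inverse covariance matrix once and return a
    two-argument function that forwards it to scipy on every call."""
    def _mahalanobis(first: np.ndarray, second: np.ndarray) -> float:
        return float(distance.mahalanobis(first, second, inverse_covariance))
    return _mahalanobis


def vector_distance_sketch(
    first: np.ndarray,
    second: np.ndarray,
    method: Union[str, DistanceFn] = "cosine",
) -> float:
    """Distance between two vectors; method is a string key or a callable."""
    fn = _METRICS[method] if isinstance(method, str) else method
    return float(fn(first, second))


def embedding_distances_sketch(
    vector: np.ndarray,
    embedding: np.ndarray,
    method: Union[str, DistanceFn] = "cosine",
) -> np.ndarray:
    """1d array holding the distance from vector to each row of embedding."""
    return np.array(
        [vector_distance_sketch(vector, row, method) for row in embedding]
    )


embedding = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]])

# String form: the metric could come straight from a configuration value.
by_string = embedding_distances_sketch(embedding[0], embedding, "euclidean")

# Callable form: a linter can catch a misspelled function name.
by_function = vector_distance_sketch(embedding[0], embedding[1], distance.euclidean)

# Curried mahalanobis: with an identity inverse covariance matrix it
# reduces to the euclidean distance.
mahalanobis = curried_mahalanobis(np.eye(2))
by_mahalanobis = vector_distance_sketch(embedding[0], embedding[1], mahalanobis)
```

Because the query vector here is the first row of the embedding, the
first entry of by_string is the vector's distance to itself.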

* Double imported the clustering and metric packages
2020-02-19 18:05:21 -08:00
test_data Initial commit 2020-02-10 11:30:24 -08:00
README.md Initial commit 2020-02-10 11:30:24 -08:00
bipartite.ipynb Removed metadata type registry 2020-02-10 18:31:28 -08:00
complex_io.ipynb Distance module refactor (#25) 2020-02-19 18:05:21 -08:00
embeddings.ipynb Distance module refactor (#25) 2020-02-19 18:05:21 -08:00

README.md

topologic jupyter notebooks

This folder contains a few notebooks that illustrate common uses of the topologic library.

To run them, you will need topologic installed in the same Python environment as Jupyter.
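
One way to confirm that requirement is to run a quick check in a notebook cell. This is a minimal sketch using only the standard library; `is_importable` is a hypothetical helper written for this example, not part of topologic.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if this interpreter can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Run inside a notebook cell, this reports whether the kernel's
# environment can see topologic.
if is_importable("topologic"):
    print("topologic is available in this environment")
else:
    print("topologic was not found in this environment")
```

If the check reports that topologic was not found, the notebook kernel is most likely running in a different environment from the one where topologic was installed.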

The notebooks use test data from our unit test folders, located at ../tests/test_data. Each notebook is committed with its results inline so it can be read as a tutorial, but we hope you will use them as a starting point for exploring topologic on your own.