mirror of https://github.com/microsoft/topologic.git
0bcb35c252
* Distance module refactor. The distance module was once placed at the top level, and was used primarily as a quick alias to the scipy functions that are the actual spatial distance functions that we tend to use most frequently after generating our embeddings. Re-implementing the documentation for these functions really had no value, and how we used these distance functions was really the valuable bit, yet it wasn't included as general functions. We changed that with this refactor, adding two main functions:
  - vector_distance, which takes in 2 vectors and either a string or distance function and executes a distance calculation. The primary use case for this is in the string context, mostly such that our distance functions can be toggled based on a configuration value vs. a code change. It also allows us to better support mahalanobis from a usage perspective, in that the actual comparison happens only on the vector to vector level, but it does require some initial setup in the manner of an inverse covariance matrix representing a set or representative sample of the full set of vertices' vectors. A curried mahalanobis function returns a Callable that only takes in two vectors, but uses the initial inverse covariance matrix provided at the first call to provide that to scipy, thus meeting our vector to vector calculation requirement for the vector_distance function. Also, magic strings can be error prone, so you can use the actual function itself and have an IDE or mypy or other linter catch a problem with the spelling of euclidean (for instance) if you use the functions instead of the strings (not useful in configuration, but useful for people who AREN'T doing configuration based distance calculations).
  - embedding_distances_for, which takes in the vector we're comparing against all other vectors in the embedding. The embedding can be either an EmbeddingContainer or an np.ndarray, and will return a corresponding 1d np.ndarray of distances (with the same distance method parameter and behavior as described for vector_distance). In most circumstances this will also include a distance to itself.
* Double imported the clustering and metric packages
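The behavior described in the commit message can be illustrated with a minimal sketch built directly on scipy. This is not topologic's implementation: the names `curried_mahalanobis`, `sketch_vector_distance`, `sketch_distances_to_all`, and the `_DISTANCES` lookup table are hypothetical, chosen only to show the string-or-callable dispatch and the curried mahalanobis idea. It also assumes the embedding is a plain 2d np.ndarray with one row per vertex.

```python
from typing import Callable, Union

import numpy as np
from scipy.spatial import distance

DistanceFn = Callable[[np.ndarray, np.ndarray], float]

# Hypothetical lookup table: configuration strings -> scipy distance functions.
_DISTANCES = {
    "cosine": distance.cosine,
    "euclidean": distance.euclidean,
    "cityblock": distance.cityblock,
}


def curried_mahalanobis(inverse_covariance: np.ndarray) -> DistanceFn:
    """Bind the inverse covariance once; return a two-vector distance function."""
    def _mahalanobis(u: np.ndarray, v: np.ndarray) -> float:
        return float(distance.mahalanobis(u, v, inverse_covariance))
    return _mahalanobis


def sketch_vector_distance(
    u: np.ndarray, v: np.ndarray, method: Union[str, DistanceFn] = "cosine"
) -> float:
    """Resolve a string to a function, or use the callable as given."""
    fn = _DISTANCES[method] if isinstance(method, str) else method
    return float(fn(u, v))


def sketch_distances_to_all(
    vector: np.ndarray, embedding: np.ndarray, method: Union[str, DistanceFn] = "cosine"
) -> np.ndarray:
    """Distance from `vector` to every row of `embedding`, as a 1d array."""
    return np.array([sketch_vector_distance(vector, row, method) for row in embedding])


# Made-up 2d embedding, one row per vertex.
embedding = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])

# String form: easy to toggle from a configuration value.
euclid = sketch_distances_to_all(embedding[0], embedding, "euclidean")

# Callable form: the inverse covariance is supplied once, up front.
inverse_covariance = np.linalg.inv(np.cov(embedding, rowvar=False))
maha = curried_mahalanobis(inverse_covariance)
maha_distances = sketch_distances_to_all(embedding[0], embedding, maha)
```

Because the first row of the embedding is also the query vector, both result arrays include a distance of zero from the vector to itself, which matches the note in the commit message.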

| Name |
|---|
| test_data |
| README.md |
| bipartite.ipynb |
| complex_io.ipynb |
| embeddings.ipynb |

README.md
topologic jupyter notebooks
This folder contains a few notebooks that illustrate some common usages of the topologic library.
To run them, you will need to have topologic installed in the same Python environment that jupyter is installed in.
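One quick way to confirm the environment is set up is to check that both packages are importable from the interpreter that jupyter will use. The sketch below uses only the standard library; the helper name `missing_packages` is hypothetical, and `jupyter_core` is assumed to be the importable module that a jupyter install provides.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: "jupyter_core" is the module name installed alongside jupyter.
missing = missing_packages(["topologic", "jupyter_core"])
if missing:
    print("Not importable in this environment: " + ", ".join(missing))
else:
    print("topologic and jupyter are both importable here.")
```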
These notebooks use the test data from our unit test folders, which is located at ../tests/test_data. Each notebook is committed with results inline so it can be used as a tutorial,
but we hope you will use these as a basis to explore topologic on your own.