Mirror of https://github.com/microsoft/sarplus.git
Update README.md
This commit is contained in:
Parent: 228588ad45
Commit: d97d7cfefb

README.md (44)

@@ -13,19 +13,46 @@ Features

| Users  | Items | Ratings | Training time | Environment | |
|--------|-------|---------|---------------|-------------|--|
| 2.5mio | 35k   | 100mio  | 1.3h          | Databricks, 8 workers, [Azure Standard DS3 v2](https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/) | |

# Usage

```python
import pandas as pd
from pysarplus import SARPlus

# spark dataframe with user/item/rating/optional timestamp tuples
train_df = spark.createDataFrame(
    pd.DataFrame({
        'user_id': [1, 1, 2, 3, 3],
        'item_id': [1, 2, 1, 1, 3],
        'rating': [1, 1, 1, 1, 1],
    }))

# spark dataframe with user/item tuples
test_df = spark.createDataFrame(
    pd.DataFrame({
        'user_id': [1, 3],
        'item_id': [1, 3],
        'rating': [1, 1],
    }))

model = SARPlus(spark, col_user='user_id', col_item='item_id', col_rating='rating', col_timestamp='timestamp')
model.fit(train_df, similarity_type='jaccard')

model.recommend_k_items(test_df, 'sarplus_cache', top_k=3).show()
```
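
The example assumes a SparkSession already bound to `spark` with the crossJoin property enabled (see the Databricks note below). A minimal sketch of creating such a session with plain PySpark; nothing here is sarplus-specific and the app name is arbitrary:

```python
from pyspark.sql import SparkSession

# local session with the crossJoin property SAR+ needs for the similarity matrix;
# the sarplus Scala package must additionally be on the classpath, e.g. via
# --packages as shown in the PySpark Shell section below
spark = (SparkSession.builder
         .master("local[*]")
         .appName("sarplus-example")
         .config("spark.sql.crossJoin.enabled", "true")
         .getOrCreate())
```
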
# Jupyter Notebook
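
A sketch of one way to use sarplus from a notebook: set the submit arguments before the first SparkSession is created so the Scala package is pulled in. The package coordinates are assumed from the PySpark Shell command below and may differ for your version:

```python
import os

# make pyspark fetch the sarplus Scala package when the JVM starts
# (coordinates assumed; adjust to the version you installed via pip)
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages eisber:sarplus:0.2.1 "
    "--conf spark.sql.crossJoin.enabled=true pyspark-shell"
)
```
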
# PySpark Shell

```bash
pip install pysarplus
pyspark --packages eisber:sarplus:0.2.1 --conf spark.sql.crossJoin.enabled=true
```

# Databricks

One must set the crossJoin property to enable calculation of the similarity matrix (Clusters / <Cluster> / Configuration / Spark Config)

```
spark.sql.crossJoin.enabled true
```

@@ -59,6 +86,7 @@ On [Spark](https://spark.apache.org/) one can install all 3 components (C++, Python, Scala)

3. Upload the zipped Scala package to [Spark Package](https://spark-packages.org/) through a browser. [sbt spPublish](https://github.com/databricks/sbt-spark-package) has a few [issues](https://github.com/databricks/sbt-spark-package/issues/31), so it always fails for me; a possible way to produce the zip locally is sketched after the code block below. Don't use spPublishLocal, as the packages are not created properly (names don't match up, [issue](https://github.com/databricks/sbt-spark-package/issues/17)) and furthermore fail to install if published to [Spark-Packages.org](https://spark-packages.org/).

```bash
cd scala
sbt spPublish
```
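
Since spPublish can fail, a hedged alternative for producing the zip to upload through the browser is the plugin's spDist task. This is an assumption about the sbt-spark-package setup in this repository; if it applies, the archive should appear under the scala/target directory:

```bash
cd scala
# spDist is expected to assemble the distribution zip for a manual
# upload to Spark Packages; look for the archive under target/
sbt spDist
```
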

@@ -68,7 +96,6 @@ To test the python UDF + C++ backend

```bash
cd python

python setup.py install && pytest -s tests/
```

@@ -76,7 +103,6 @@ To test the Scala formatter

```bash
cd scala
sbt test
```