Docs : Fixed minor typos (#387)
Parent: fc031a8c2c
Commit: 2d64b08600
@@ -63,7 +63,7 @@ Deepen your understanding of clustering techniques in this [Learn module](https:
 >
 > 🎓 ['Distances'](https://web.stanford.edu/class/cs345a/slides/12-clustering.pdf)
 >
-> Clusters are defined by their distance matrix, e.g. the distances between points. This distance can be measured a few ways. Euclidean clusters are defined by the average of the point values, and contain a 'centroid' or center point. Distances are thus measured by the distance to that centroid. Non-Euclidean distances refer to 'clustroids', the point closest to other points. Clustroids in turn can be defined in various ways.
+> Clusters are defined by their distance matrix, e.g. the distances between points. This distance can be measured in a few ways. Euclidean clusters are defined by the average of the point values, and contain a 'centroid' or center point. Distances are thus measured by the distance to that centroid. Non-Euclidean distances refer to 'clustroids', the point closest to other points. Clustroids in turn can be defined in various ways.
 >
 > 🎓 ['Constrained'](https://wikipedia.org/wiki/Constrained_clustering)
 >
@@ -321,7 +321,7 @@ In preparation for the next lesson, make a chart about the various clustering al

 ## Review & Self Study

-Before you apply clustering algorithms, as we have learned, it's a good idea to understand the nature of your dataset. Read more onn this topic [here](https://www.kdnuggets.com/2019/10/right-clustering-algorithm.html)
+Before you apply clustering algorithms, as we have learned, it's a good idea to understand the nature of your dataset. Read more on this topic [here](https://www.kdnuggets.com/2019/10/right-clustering-algorithm.html)

 [This helpful article](https://www.freecodecamp.org/news/8-clustering-algorithms-in-machine-learning-that-all-data-scientists-should-know/) walks you through the different ways that various clustering algorithms behave, given different data shapes.

@@ -71,7 +71,7 @@ print(blob.translate(to="fr"))

 It can be argued that TextBlob's translation is far more exact, in fact, than the 1932 French translation of the book by V. Leconte and Ch. Pressoir:

-"C'est une vérité universelle qu'un celibataire pourvu d'une belle fortune doit avoir envie de se marier, et, si peu que l'on sache de son sentiment à cet egard, lorsqu'il arrive dans une nouvelle residence, cette idée est si bien fixée dans l'esprit de ses voisins qu'ils le considèrent sur-le-champ comme la propriété légitime de l'une ou l'autre de leurs filles."
+"C'est une vérité universelle qu'un célibataire pourvu d'une belle fortune doit avoir envie de se marier, et, si peu que l'on sache de son sentiment à cet egard, lorsqu'il arrive dans une nouvelle résidence, cette idée est si bien fixée dans l'esprit de ses voisins qu'ils le considèrent sur-le-champ comme la propriété légitime de l'une ou l'autre de leurs filles."

 In this case, the translation informed by ML does a better job than the human translator who is unnecessarily putting words in the original author's mouth for 'clarity'.

@@ -1,6 +1,6 @@
 # Sentiment analysis with hotel reviews

-Now that you have a explored the dataset in detail, it's time to filter the columns and then use NLP techniques on the dataset to gain new insights about the hotels.
+Now that you have explored the dataset in detail, it's time to filter the columns and then use NLP techniques on the dataset to gain new insights about the hotels.
 ## [Pre-lecture quiz](https://white-water-09ec41f0f.azurestaticapps.net/quiz/39/)

 ### Filtering & Sentiment Analysis Operations
@@ -101,7 +101,7 @@ Clean the data just a bit more. Add columns that will be useful later, change th

 ### Tag columns

-The `Tag` columns is problematic as it is a list (in text form) stored in the column. Unfortunately the order and number of sub sections in this column are not always the same. It's hard for a human to identify the correct phrases to be interested in, because there are 515,000 rows, and 1427 hotels, and each has slightly different options a reviewer could choose. This is where NLP shines. You can scan the text and find the most common phrases, and count them.
+The `Tag` column is problematic as it is a list (in text form) stored in the column. Unfortunately the order and number of sub sections in this column are not always the same. It's hard for a human to identify the correct phrases to be interested in, because there are 515,000 rows, and 1427 hotels, and each has slightly different options a reviewer could choose. This is where NLP shines. You can scan the text and find the most common phrases, and count them.

 Unfortunately, we are not interested in single words, but multi-word phrases (e.g. *Business trip*). Running a multi-word frequency distribution algorithm on that much data (6762646 words) could take an extraordinary amount of time, but without looking at the data, it would seem that is a necessary expense. This is where exploratory data analysis comes in useful, because you've seen a sample of the tags such as `[' Business trip ', ' Solo traveler ', ' Single Room ', ' Stayed 5 nights ', ' Submitted from a mobile device ']` , you can begin to ask if it's possible to greatly reduce the processing you have to do. Luckily, it is - but first you need to follow a few steps to ascertain the tags of interest.

@@ -176,7 +176,7 @@ Removing these tags is step 1, it reduces the total number of tags to be conside
 | Stayed 9 nights | 1293 |
 | ... | ... |

-There are a huge variety of rooms, suites, studios, apartments and so on. They all mean the roughly the same thing and not relevant to you, so remove them from consideration.
+There are a huge variety of rooms, suites, studios, apartments and so on. They all mean roughly the same thing and not relevant to you, so remove them from consideration.

 | Type of room | Count |
 | ----------------------------- | ----- |

@@ -3,7 +3,7 @@
 ![Summary of reinforcement in machine learning in a sketchnote](../../sketchnotes/ml-reinforcement.png)
 > Sketchnote by [Tomomi Imura](https://www.twitter.com/girlie_mac)

-Reinforcement learning involves three important concepts: the agent, some states, and a set of actions per state. By executing an action in a specified state, the agent is given a reward. Again imagine the computer game Super Mario. You are Mario, you are in a game level, standing next to a cliff edge. Above you is a coin. You being Mario, in a game level, at a specific position ... that's your state. Moving one step to the right (an action) will take you over the edge, and that would give you a low numerical score. However, pressing the jump button would let score a point and you would stay alive. That's a positive outcome and that should award you a positive numerical score.
+Reinforcement learning involves three important concepts: the agent, some states, and a set of actions per state. By executing an action in a specified state, the agent is given a reward. Again imagine the computer game Super Mario. You are Mario, you are in a game level, standing next to a cliff edge. Above you is a coin. You being Mario, in a game level, at a specific position ... that's your state. Moving one step to the right (an action) will take you over the edge, and that would give you a low numerical score. However, pressing the jump button would let you score a point and you would stay alive. That's a positive outcome and that should award you a positive numerical score.

 By using reinforcement learning and a simulator (the game), you can learn how to play the game to maximize the reward which is staying alive and scoring as many points as possible.

@@ -192,7 +192,7 @@ Here γ is the so-called **discount factor** that determines to which extent you

 ## Learning Algorithm

-Given the equation above, we can now write pseudo-code for our leaning algorithm:
+Given the equation above, we can now write pseudo-code for our learning algorithm:

 * Initialize Q-Table Q with equal numbers for all states and actions
 * Set learning rate α ← 1
@@ -280,7 +280,7 @@ walk(m,qpolicy_strict)

 > **Task 1:** Modify the `walk` function to limit the maximum length of path by a certain number of steps (say, 100), and watch the code above return this value from time to time.

-> **Task 2:** Modify the `walk` function so that it does not go back to the places where is has already been previously. This will prevent `walk` from looping, however, the agent can still end up being "trapped" in a location from which it is unable to escape.
+> **Task 2:** Modify the `walk` function so that it does not go back to the places where it has already been previously. This will prevent `walk` from looping, however, the agent can still end up being "trapped" in a location from which it is unable to escape.

 ## Navigation
