@NickDamiano

Training: 4.45 seconds
Prediction: 4.72 seconds
Accuracy: 96%

My algorithm to identify the category for an unknown book is:

  1. Count the occurrences of each word, similar to our win-loss prework exercise, and store the counts in a hash with the word as key and its frequency as value.
  2. Sort the hash by frequency and grab the 2500 most frequently occurring words, storing them in a 2-dimensional array. The array looks something like this: [["nick", 10], ["rad", 9], ["is", 2]].
  3. Run .map on that array, grabbing only the first element of each inner array, and store the result in a variable so that we have an array of just the top words.
  4. Repeat the above steps for each set of tokens passed in during prediction.
  5. Subtract the token word list array from the known subject array. The result is an array containing all of the words that didn't match between the two.
  6. Count the elements of that differences array. The smaller the count, the more successful matches were made and the more likely the subject matched against is the correct subject.
  7. Test each differences array count against the smallest one so far. If the new count is smaller, replace the variable storing the most likely subject so far with the new subject, and replace the stored smallest count with the new count.
  8. Return the most likely subject.
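
The steps above can be sketched roughly as follows. This is a minimal illustration rather than the code in this pull request: the names `TOP_N`, `top_words`, `predict`, and `known_subjects` are hypothetical, and it assumes the tokens arrive as an array of already-split words and that each known subject's top-words array was built during training.

```ruby
# Hypothetical sketch of the steps described above; not the PR's actual code.
TOP_N = 2500 # number of top words kept per word list (assumed from the description)

# Steps 1-3: count word frequencies, sort by frequency, keep only the words.
def top_words(tokens, limit = TOP_N)
  counts = Hash.new(0)
  tokens.each { |word| counts[word] += 1 }
  # sort_by returns pairs like [["nick", 10], ["rad", 9], ["is", 2]]
  sorted_pairs = counts.sort_by { |_word, frequency| -frequency }
  sorted_pairs.first(limit).map(&:first)
end

# Steps 4-8: build the unknown book's word list, subtract it from each
# known subject's list, and keep the subject with the fewest leftovers.
# known_subjects is assumed to map subject name => array of top words.
def predict(tokens, known_subjects)
  book_words = top_words(tokens)
  best_subject = nil
  smallest_count = nil

  known_subjects.each do |subject, subject_words|
    # Array#- drops every element that also appears in book_words.
    differences = subject_words - book_words
    if smallest_count.nil? || differences.size < smallest_count
      smallest_count = differences.size
      best_subject = subject
    end
  end

  best_subject
end

known_subjects = {
  "cooking" => %w[flour oven butter sugar],
  "space"   => %w[orbit rocket planet star]
}
puts predict(%w[rocket rocket orbit planet flour], known_subjects)
```

With these toy lists, "space" leaves one unmatched word and "cooking" leaves three, so the sketch picks "space". Ties in frequency are ordered arbitrarily by `sort_by`, which only matters for words near the cutoff.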

…ly used words and compare that to the unknown books to see category. Right now I have two arrays of words and I'm going to subtract them and then see which count is the smallest. The smallest count is the one with the most matching words and the right category.
…over 6 characters with a 73% accuracy, prediction time of 5.08 seconds, and a training time of 4.75 seconds
… Changed the comparison arrays so that the known subject array top words is 2500 instead of 100. 96%