Get a Dictionary of Misspelled Words in a DataFrame
I am working on a sentiment analysis problem. I tried to use autocorrect, but because of the size of the corpus it requires a lot of computing power that I don't have access to. So I came up with a different approach: create a dictionary of {key = 'incorrect', value = 'correct'} and then manually correct all the words.
The problem is how to get that dictionary of misspelled words from the dataframe. Is this link the same as the solution to my problem? (Rather than misspelled words, should I look for OOV words?)
If not, please suggest a better method.
Code used for autocorrect:
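# Notebook cell: install the package
!pip install autocorrect
from autocorrect import spell
# Run the spell checker on every whitespace-separated token of every row (slow on a large corpus)
train['text'] = [' '.join([spell(i) for i in x.split()]) for x in train['text']]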
Answer
How many ways can you spell a word correctly? Only one.
Now, how many ways can you spell it incorrectly? Practically infinitely many.
This answers your question:
Rather than misspelled words should I look for OOV words?
- Sure, especially if your misspellings are not neologisms or common misspellings that repeat often.
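As a rough illustration of looking for OOV words, the sketch below counts every token that is missing from a reference vocabulary. The function name `find_oov_words`, the toy texts, and the toy vocabulary are assumptions made for this example; in the question's setting the texts would come from `train['text']` and the vocabulary from whatever word list you trust.

```python
import re
from collections import Counter


def find_oov_words(texts, vocabulary):
    """Count the tokens in `texts` that are not in `vocabulary`."""
    vocab = {word.lower() for word in vocabulary}
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep alphabetic tokens (apostrophes allowed).
        for token in re.findall(r"[a-z']+", text.lower()):
            if token not in vocab:
                counts[token] += 1
    return counts


# Hypothetical toy data standing in for train['text'] and a real word list.
texts = ["the movie was graet", "graet acting but the plot was borring"]
vocabulary = ["the", "movie", "was", "great", "acting", "but", "plot", "boring"]

oov = find_oov_words(texts, vocabulary)
# Most frequent OOV words first: these are the best candidates for manual correction.
candidates = oov.most_common()
```

Sorting by frequency means the manual effort goes first to the misspellings that affect the most rows.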
Now, how can you still get features from misspelled words? One way is to use the "Levenshtein distance" (or minimum edit distance): compare each misspelled word against the words in your dictionary and check whether its distance to any of them is small. That is probably what is behind the autocorrect package. You can find more information about it in this link.
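To make the edit-distance idea concrete, here is a minimal sketch of the standard dynamic-programming Levenshtein distance, plus a helper that picks the closest vocabulary word. The names `levenshtein` and `suggest` and the `max_distance` threshold are choices made for this example, and this is not necessarily how the autocorrect package works internally.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string `a` into string `b`."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def suggest(word, vocabulary, max_distance=2):
    """Return the closest vocabulary word within `max_distance`, or None."""
    best_word, best_distance = None, max_distance + 1
    for candidate in vocabulary:
        # The length difference is a lower bound on the edit distance,
        # so candidates that cannot beat the current best are skipped.
        if abs(len(candidate) - len(word)) >= best_distance:
            continue
        distance = levenshtein(word, candidate)
        if distance < best_distance:
            best_word, best_distance = candidate, distance
    return best_word


# Hypothetical toy vocabulary for illustration.
vocabulary = ["boring", "great", "plot"]
corrections = {w: suggest(w, vocabulary) for w in ["borring", "graet", "xyzzy"]}
```

Running `suggest` only on the unique OOV words, rather than on every token of every row, keeps the number of distance computations far smaller than correcting the whole corpus token by token.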
So, in short, you probably have to either discard OOV words or spend some computational resources on them, since computers cannot "guess" the intended word without doing some computation.
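Both options can be combined in one cheap pass: replace the misspellings you have a correction for and drop whatever is still out of vocabulary. The sketch below assumes a hand-built `corrections` dictionary of the {incorrect: correct} form described in the question; `clean_text` and the toy data are hypothetical names for illustration.

```python
def clean_text(text, vocabulary, corrections):
    """Apply known corrections, then discard tokens still out of vocabulary."""
    kept = []
    for token in text.lower().split():
        # A dictionary lookup is cheap compared with running a spell checker.
        token = corrections.get(token, token)
        if token in vocabulary:
            kept.append(token)
    return " ".join(kept)


# Hypothetical toy vocabulary and manually built corrections.
vocabulary = {"the", "movie", "was", "great", "boring"}
corrections = {"graet": "great"}

cleaned = clean_text("The movie was graet xqzt", vocabulary, corrections)
```

With a DataFrame, the same function could be applied row by row, for example via `train['text'].apply(lambda t: clean_text(t, vocabulary, corrections))`.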