Unfortunately, the images in this post have been lost to history. We blame WordPress, which we don’t use anymore. We recommend reading a more recent post anyway.

Often, in NLP, you need to answer the simple question: “is this a common word?” It turns out that this leaves the computer to answer a more vexing question: “What’s a word?”

Let’s talk briefly about why word frequencies are important. In many cases, you want to assign more significance to uncommon words. For example, a product review might contain the word “use” and the word “defective”, and the word “defective” carries way more information. If you’re wondering what the deal is with John Kasich, a headline that mentions “Kasich” will be much more likely to be what you’re looking for than one that merely mentions “John”.

For purposes like these, it would be nice if we could just import a Python package that could tell us whether one word was more common than another, in general, based on a wide variety of text. We looked for a while and couldn’t find it.

Wordfreq provides estimates of the frequencies of words in many languages, loading its data from efficiently-compressed data structures so it can give you word frequencies down to 1 occurrence per million without having to access an external database.
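
To make the "uncommon words carry more information" idea concrete, here is a minimal sketch. It does not use wordfreq or its data: it counts words in a tiny, made-up list of headlines (`HEADLINES` is hypothetical) and scores each word as the negative log of its relative frequency, which is one common way to weight rare words more heavily. The names `tokenize`, `relative_frequencies`, and `informativeness` are invented for this illustration, and the regex tokenizer is a deliberately crude answer to "What's a word?".

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
HEADLINES = [
    "John Kasich enters the race",
    "John Smith wins local election",
    "John Doe opens a new bakery",
    "Governor John Kasich speaks in Ohio",
    "Pope John Paul remembered",
]


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def relative_frequencies(texts):
    """Return each word's share of all tokens found in the given texts."""
    counts = Counter(token for text in texts for token in tokenize(text))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def informativeness(word, freqs):
    """Score a word as -log(frequency), so rarer words score higher.

    Words that never appear in the corpus get an infinite score.
    """
    freq = freqs.get(word.lower(), 0.0)
    return math.inf if freq == 0.0 else -math.log(freq)


FREQS = relative_frequencies(HEADLINES)

if __name__ == "__main__":
    # "Kasich" is rarer than "John" in the toy corpus, so it scores higher.
    for word in ("John", "Kasich"):
        print(word, round(informativeness(word, FREQS), 2))
```

A real application would swap the toy counts for frequencies estimated from a large and varied body of text, which is the gap the post describes.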