Classify A Noun Into Abstract Or Concrete Using NLTK Or Similar
Solution 1:
I would suggest training a classifier using pretrained word vectors.
You need two libraries: spacy for tokenizing text and extracting word vectors, and scikit-learn for machine learning:
import spacy
from sklearn.linear_model import LogisticRegression
import numpy as np
nlp = spacy.load("en_core_web_md")
Distinguishing concrete and abstract nouns is a simple task, so you can train a model with very few examples:
classes = ['concrete', 'abstract']
# todo: add more examples
train_set = [
    ['apple', 'owl', 'house'],
    ['agony', 'knowledge', 'process'],
]
X = np.stack([list(nlp(w))[0].vector for part in train_set for w in part])
y = [label for label, part in enumerate(train_set) for _ in part]
classifier = LogisticRegression(C=0.1, class_weight='balanced').fit(X, y)
When you have a trained model, you can apply it to any text:
for token in nlp("Have a seat in that chair with comfort and drink some juice to soothe your thirst."):
    if token.pos_ == 'NOUN':
        print(token, classes[classifier.predict([token.vector])[0]])
The result looks satisfying:
# seat concrete
# chair concrete
# comfort abstract
# juice concrete
# thirst abstract
You can improve the model by applying it to different nouns, spotting the errors and adding them to the training set under the correct label.
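The correction step described above can be sketched as a small helper that files a misclassified noun under its correct label. The helper name add_correction and the example words 'freedom' and 'table' are hypothetical and only illustrate the bookkeeping. After updating the lists, X and y would need to be rebuilt and the classifier refitted as in the training step.

```python
classes = ['concrete', 'abstract']
train_set = [
    ['apple', 'owl', 'house'],
    ['agony', 'knowledge', 'process'],
]

def add_correction(train_set, classes, word, correct_label):
    """File `word` under `correct_label`, removing it from any other class."""
    target = classes.index(correct_label)
    for part in train_set:
        if word in part:
            part.remove(word)
    train_set[target].append(word)

# Hypothetical nouns that the model got wrong during manual inspection.
add_correction(train_set, classes, 'freedom', 'abstract')
add_correction(train_set, classes, 'table', 'concrete')

print(train_set)
```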
Solution 2:
Try to use WordNet via NLTK and explore the hypernym tree of the words you are interested in. WordNet is a lexical database that organises words in a tree-like structure based on their abstraction level. You can use this to get more abstract versions of your target word.
For example, the following code shows that the word "chair" belongs to the category "seat", which ultimately falls under the root category "entity". The word "anger", on the other hand, belongs to the category "emotion".
from nltk.corpus import wordnet as wn
wn.synsets('chair')
wn.synset('chair.n.01').hypernyms()
# [Synset('seat.n.03')]
wn.synset('chair.n.01').root_hypernyms()
# [Synset('entity.n.01')]
wn.synsets('anger')
wn.synset('anger.n.01').hypernyms()
# [Synset('emotion.n.01')]
Look at the NLTK WordNet documentation and experiment with the hypernym trees to sort words into abstract or concrete categories. You will have to define for yourself what exactly you mean by "abstract" and "concrete", and which categories of words belong in each of these two buckets.
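One possible way to turn the hypernym tree into a label is to check which top-level synset a word's hypernym path passes through. The sketch below assumes that "physical_entity.n.01" stands for concrete and "abstraction.n.06" for abstract; that cut-off is my assumption, not something WordNet prescribes. It also only looks at the first noun sense and the first hypernym path, which is a simplification. The function names are made up for illustration.

```python
# Top-level WordNet noun synsets used as anchors (an assumed definition of
# "concrete" and "abstract"; adjust to your own needs).
CONCRETE_ROOT = "physical_entity.n.01"
ABSTRACT_ROOT = "abstraction.n.06"

def label_from_path(synset_names):
    """Label one hypernym path, given as synset names from root to leaf."""
    if CONCRETE_ROOT in synset_names:
        return "concrete"
    if ABSTRACT_ROOT in synset_names:
        return "abstract"
    return "unknown"

def classify_noun(word):
    """Label a word using only its first noun sense in WordNet."""
    # Imported here so the rest of the sketch runs without NLTK installed.
    # Requires the 'wordnet' corpus, e.g. via nltk.download('wordnet').
    from nltk.corpus import wordnet as wn
    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return "unknown"
    # A synset can have several hypernym paths; take the first for simplicity.
    path = synsets[0].hypernym_paths()[0]
    return label_from_path([s.name() for s in path])
```

Words with several senses (for example "chair" as furniture versus "chair" as a position) may need all senses checked rather than just the first.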
Solution 3:
One simple way is to maintain a dictionary of abstract nouns and, for unknown words, look at suffixes. Words with the following suffixes are generally abstract nouns:
- -ship
- -ism
- -ity
- -ness
- -age
- -acy
- -ment
- -ability, etc.
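A minimal sketch of this dictionary-plus-suffix idea could look like the following. The seed dictionary KNOWN_ABSTRACT and the function name are hypothetical; the suffixes are the ones listed above.

```python
# Suffixes from the list above; str.endswith accepts a tuple of candidates.
ABSTRACT_SUFFIXES = ("ship", "ism", "ity", "ness", "age", "acy", "ment", "ability")

# Hypothetical seed dictionary of known abstract nouns; extend as needed.
KNOWN_ABSTRACT = {"agony", "knowledge", "process"}

def is_probably_abstract(noun):
    """Return True if the noun is in the dictionary or has an abstract-looking suffix."""
    word = noun.lower()
    if word in KNOWN_ABSTRACT:
        return True
    return word.endswith(ABSTRACT_SUFFIXES)

for noun in ["friendship", "happiness", "chair", "agony"]:
    print(noun, is_probably_abstract(noun))
```

This is only a heuristic: concrete nouns such as "cabbage" or "cement" also match the suffixes, so a dictionary of exceptions would probably be needed as well.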
Solution 4:
First, tokenize the text with word_tokenize(string), then use pos_tag from nltk.
import nltk
string = "Have a seat in that chair."
words = nltk.word_tokenize(string)
# pos_tag returns a list of (word, tag) pairs
print(nltk.pos_tag(words))
This is untested, but it should work roughly like this.
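Note that POS tagging on its own only tells you which tokens are nouns; it does not separate abstract from concrete ones. A small sketch of pulling the nouns out of the tagger output, so they can be passed on to one of the other approaches, might look like this. The helper name extract_nouns is made up, and the tagged list is written by hand for illustration rather than copied from real pos_tag output. Noun tags in the Penn Treebank tagset start with "NN" (NN, NNS, NNP, NNPS).

```python
def extract_nouns(tagged_tokens):
    """Keep only the words whose Penn Treebank tag marks them as nouns."""
    return [word for word, tag in tagged_tokens if tag.startswith("NN")]

# Hand-written (word, tag) pairs in the shape that nltk.pos_tag returns.
sample = [
    ("Have", "VB"), ("a", "DT"), ("seat", "NN"), ("in", "IN"),
    ("that", "DT"), ("chair", "NN"), (".", "."),
]

nouns = extract_nouns(sample)
print(nouns)
```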