Questions tagged [stemming]

Stemming is a natural language processing technique in which words are reduced to their root form, usually by removing the suffix.
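As a rough illustration of the idea, the sketch below strips a few common English suffixes. The suffix list, the `naive_stem` name, and the `min_stem_len` parameter are all assumptions made for this example; real stemmers such as Porter or Snowball apply far more elaborate rule sets.

```python
# Hypothetical, deliberately tiny suffix list (longest suffixes first).
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when enough of the word remains to be a plausible stem.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for token in ["jumping", "jumped", "jumps", "boxes", "sing"]:
        print(token, "->", naive_stem(token))
```

The length check is a crude guard against over-stemming short words such as "sing"; it is not how any particular library decides when to stop.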

5 questions
7
votes
3 answers

Is there a good German Stemmer?

What I tried: # -*- coding: utf-8 -*- from nltk.stem.snowball import GermanStemmer st = GermanStemmer() token_groups = [(["experte", "Experte", "Experten", "Expertin", "Expertinnen"], []), (["geh", "gehe", "gehst", "geht", "gehen",…
Martin Thoma
6
votes
2 answers

Python stemmer for Georgian

I am currently working on Georgian text processing. Does anybody know of any stemmers/lemmatizers (or other NLP tools) for Georgian that I could use with Python? Thanks in advance!
1
vote
1 answer

How does Snowball Stemmer work?

I have been reading about the Snowball Stemmer and wonder how it works. Does it use rules to stem words, or does it use machine learning? I checked snowballstem.org but could not find the answer.
asmgx
1
vote
2 answers

How to resolve country and nationality entities?

I've tried stemming and lemmatization on this, but nothing has quite worked so far. How can I resolve a country name and its nationality to a single entity? For example: Canada and Canadian should just be one entity, Canada; Uganda and Ugandan should…
0
votes
1 answer

Faster preprocessing for Arabic texts

Background: I'm analyzing a relatively large text-based Arabic dataset using Python (50,000 to 70,000 text files; total size ~5 GB). I want to segment, stem, and POS-tag the dataset. I am aware of two Python libraries that can do these 3 tasks:…
Alaa