Regexp-Stemmer
Sat 17 May 2025
title: "Regexp Stemmer"
author: "Rj"
date: 2019-04-21
description: "-"
type: technical_note
draft: false
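from nltk.stem import RegexpStemmer
# Strip a trailing "ing", "s", "e", or "able"; words shorter than min=7 characters are returned unchanged
re_stemmer = RegexpStemmer("ing$|s$|e$|able$", min=7)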
words = [
"wheels",
"breaking",
"thrones",
"breakable"
]
words
['wheels', 'breaking', 'thrones', 'breakable']
result = [re_stemmer.stem(word) for word in words]
result
['wheels', 'break', 'throne', 'break']
Because min=7 tells the RegexpStemmer to leave any word shorter than 7 characters unchanged, the 6-character 'wheels' is returned as is instead of being stemmed to 'wheel'.
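To make the effect of the length threshold concrete, here is a minimal sketch that reproduces the behavior seen above using only Python's re module. The helper regexp_stem and its min_length parameter are hypothetical names introduced for illustration; this is a stand-in that follows the same rule (skip words shorter than the minimum, otherwise remove the matching suffix), not NLTK's own code.

```python
import re

# Hypothetical stand-in for the stemmer above: same pattern, same length rule.
SUFFIXES = re.compile(r"ing$|s$|e$|able$")


def regexp_stem(word, min_length=0):
    """Remove a matching suffix unless the word is shorter than min_length."""
    if len(word) < min_length:
        return word  # too short: returned unchanged
    return SUFFIXES.sub("", word)


words = ["wheels", "breaking", "thrones", "breakable"]

# With a minimum length of 7, the 6-character "wheels" is skipped.
with_min_7 = [regexp_stem(w, min_length=7) for w in words]

# With a minimum length of 6, "wheels" is long enough to be stemmed.
with_min_6 = [regexp_stem(w, min_length=6) for w in words]

print(with_min_7)  # ['wheels', 'break', 'throne', 'break']
print(with_min_6)  # ['wheel', 'break', 'throne', 'break']
```

Lowering the threshold from 7 to 6 is enough for 'wheels' to lose its trailing 's'. The trade-off is that a lower minimum also exposes other short words to suffix stripping, so the value is worth choosing with the vocabulary in mind.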
Category: textprocessing