Print-Filenames

Sat 17 May 2025

title: "Print Filenames" author: "Rj" date: 2019-04-21 description: "-" type: technical_note draft: false


from os import listdir
from os.path import isfile, join
path = '/tmp'
# keep only the entries of path that are files (subdirectories are skipped)
onlyfiles = [f for f in listdir(path) if isfile(join(path, f))]
onlyfiles
['.BBE72B41371180178E084EEAF106AED4F350939DB95D3516864A1CC62E7AE82F']
# print one filename per line
for fle in onlyfiles:
    print(fle)
.BBE72B41371180178E084EEAF106AED4F350939DB95D3516864A1CC62E7AE82F
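The listing above only looks at the top level of the directory. If files inside subdirectories should be printed too, `os.walk` can do the traversal. This is a minimal sketch under that assumption; `list_files_recursive` is a hypothetical helper name, not part of the original note.

```python
import os


def list_files_recursive(root):
    """Return every non-directory entry under root, as paths relative to root."""
    found = []
    # os.walk yields (dirpath, dirnames, filenames) for root and each subdirectory
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, root))
    return sorted(found)
```

Usage: `for fle in list_files_recursive('/tmp'): print(fle)`. Sorting is only there to make the output order predictable, since `os.walk` does not guarantee one.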



Score …

Category: textprocessing

Product-Summary-Classification-Nb

Sat 17 May 2025

title: "Text Classification - Naive Bayes - Product Summary" author: "Rj" date: 2019-04-21 description: "-" type: technical_note draft: false


# Disclaimer: some code copied from https://towardsdatascience.com/multi-class-text-classification-model-comparison-and-selection-5eb066197568
import logging
import pandas as pd
import numpy as np
from numpy import random
import gensim
import nltk
from sklearn.model_selection import train_test_split
from …
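The post itself builds on scikit-learn (the excerpt is cut off above). As a dependency-free illustration of what a multinomial Naive Bayes classifier does, here is a small sketch with add-one (Laplace) smoothing. The toy product summaries and labels are invented for the example, and `train_nb` / `predict_nb` are hypothetical names.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized documents."""
    class_counts = Counter(labels)      # number of documents per label
    word_counts = defaultdict(Counter)  # label -> word frequencies
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, alpha


def predict_nb(model, doc):
    """Return the label with the highest log posterior for doc."""
    class_counts, word_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in doc.lower().split():
            if token not in vocab:
                continue  # words never seen in training are ignored
            # add-alpha smoothed log likelihood of token given label
            count = word_counts[label][token]
            score += math.log((count + alpha) / (total_words + alpha * len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy product summaries, for illustration only
docs = [
    "fast charging usb cable",
    "durable phone charger cable",
    "soft cotton t shirt",
    "slim fit cotton shirt",
]
labels = ["electronics", "electronics", "clothing", "clothing"]
model = train_nb(docs, labels)
print(predict_nb(model, "usb charger"))  # electronics
```

Working in log space avoids underflow when many small probabilities are multiplied; scikit-learn's `MultinomialNB` follows the same idea on count vectors.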

Category: textprocessing

Regexp-Stemmer

Sat 17 May 2025

title: "Regexp Stemmer" author: "Rj" date: 2019-04-21 description: "-" type: technical_note draft: false


from nltk.stem import RegexpStemmer
re_stemmer = RegexpStemmer("ing$|s$|e$|able$", min=7)
words = [
    "wheels",
    "breaking",
    "thrones",
    "breakable"
]
words
['wheels', 'breaking', 'thrones', 'breakable']
result = [re_stemmer.stem(word) for word in words]
result
['wheels', 'break', 'throne', 'break']
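The output can be reproduced with the standard `re` module, which helps explain it: words shorter than `min` are returned untouched, and otherwise every match of the pattern is removed in a single pass. This is a sketch of that observed behaviour; `regexp_stem` is a hypothetical name, not the NLTK source.

```python
import re


def regexp_stem(word, pattern=r"ing$|s$|e$|able$", min_len=7):
    """Strip every match of pattern from word, unless word is shorter than min_len."""
    if len(word) < min_len:
        return word  # too short: "wheels" has only 6 letters
    return re.sub(pattern, "", word)


words = ["wheels", "breaking", "thrones", "breakable"]
print([regexp_stem(w) for w in words])
# ['wheels', 'break', 'throne', 'break']
```

Note that "thrones" loses only its "s": `re.sub` searches the original string once, so the "e" exposed by removing the "s" is not stripped as well.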

As the …

Category: textprocessing

Search-Text

Sat 17 May 2025

title: "Search Text" author: "Rj" date: 2019-04-21 description: "-" type: technical_note draft: false


from nltk.book import text1
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby …
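NLTK's `Text` objects come with search helpers, but the basic operation of locating a word in a token list is easy to sketch in plain Python. The sample sentence and the `find_occurrences` helper are invented for illustration.

```python
def find_occurrences(tokens, word):
    """Return the positions of every token equal to word, ignoring case."""
    target = word.lower()
    return [i for i, tok in enumerate(tokens) if tok.lower() == target]


tokens = "The sea was calm . The Sea was grey .".split()
positions = find_occurrences(tokens, "sea")
print(positions, len(positions))  # [1, 6] 2
```

Unlike a plain `tokens.count("sea")`, which is case-sensitive, this also matches "Sea".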

Category: textprocessing

Simple-Text-Processing

Sat 17 May 2025

title: "Simple Text Processing" author: "Rj" date: 2019-04-20 description: "-" type: technical_note draft: false


import re
from nltk.tokenize import word_tokenize
from collections import Counter
from nltk.corpus import stopwords
text = "The cat is in the box. The cat likes the box. The box is over the cat."
tokens = [w for …
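The excerpt stops before the counting step. A typical pipeline (lowercase, tokenize, drop stopwords, count) can be sketched with only the standard library. The tiny stopword set here is a hypothetical stand-in for NLTK's much larger English list, and `re.findall` replaces `word_tokenize` to keep the example self-contained.

```python
import re
from collections import Counter

text = "The cat is in the box. The cat likes the box. The box is over the cat."

# Hypothetical, deliberately tiny stopword list (NLTK's English list is much larger)
STOPWORDS = {"the", "is", "in", "over"}

# Lowercase and keep alphabetic runs only, which drops punctuation
tokens = re.findall(r"[a-z]+", text.lower())
counts = Counter(t for t in tokens if t not in STOPWORDS)
print(counts.most_common(2))
# [('cat', 3), ('box', 3)]
```

`most_common` lists tied counts in the order they were first seen, which is why "cat" comes before "box".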

Category: textprocessing

Snowball-Stemmer

Sat 17 May 2025

title: "Snowball Stemmer" author: "Rj" date: 2019-04-21 description: "-" type: technical_note draft: false


from nltk.stem.snowball import SnowballStemmer
words = [
    "hunting",
    "bunnies",
    "thinking"
]
words
['hunting', 'bunnies', 'thinking']
stemmer = SnowballStemmer("english")
result = [stemmer.stem(word) for word in words]
result
['hunt', 'bunni', 'think']
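The `bunni` result comes from step 1a of the Snowball English (Porter2) algorithm, which rewrites "-ies" as "-i" when more than one letter precedes it. The sketch below implements that single step only, with a simplified vowel set; it is not the full stemmer NLTK runs.

```python
VOWELS = set("aeiouy")  # simplification: Porter2 treats some y's as consonants


def step_1a(word):
    """Porter2 (Snowball English) step 1a only: plural-style suffixes."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith(("ied", "ies")):
        # "i" if more than one letter precedes the suffix, otherwise "ie"
        return word[:-2] if len(word) > 4 else word[:-1]
    if word.endswith(("us", "ss")):
        return word  # left unchanged
    if word.endswith("s"):
        stem = word[:-1]
        # drop the s only if a vowel appears earlier than the letter right before it
        if any(ch in VOWELS for ch in stem[:-1]):
            return stem
    return word


print([step_1a(w) for w in ["bunnies", "ties", "caresses", "gas", "gaps"]])
# ['bunni', 'tie', 'caress', 'gas', 'gap']
```

Because stems like `bunni` are not dictionary words, stemmer output is best treated as a matching key rather than as text to display.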

Score: 5

Category: textprocessing

Speech-2-Text

Sat 17 May 2025

title: "Speech 2 Text" author: "Rj" date: 2019-04-21 description: "-" type: technical_note draft: false


import speech_recognition as sr
def startpy():

    # obtain audio from the microphone
    r = sr.Recognizer()
    d = ''
    while d not in ('exit', 'quit'):
        with sr.Microphone() as source:
            print("Say something!")
            audio = r.listen(source)

    # recognize speech using Google …
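The `while` loop only ends if `d` is reassigned inside the loop, so listening and recognition both have to happen on every pass. The control flow can be sketched independently of the microphone; `run_until_exit` and its `next_phrase` callable are hypothetical names introduced for illustration.

```python
def run_until_exit(next_phrase, stop_words=("exit", "quit")):
    """Call next_phrase() until it yields a stop word; return the phrases heard before it.

    next_phrase is a hypothetical callable that returns recognized text,
    or None when nothing could be recognized on that attempt.
    """
    heard = []
    while True:
        phrase = next_phrase()  # listen + recognize on every pass
        if phrase is None:
            continue  # nothing recognized; try again
        phrase = phrase.strip().lower()
        if phrase in stop_words:
            return heard
        heard.append(phrase)
```

With `speech_recognition`, `next_phrase` could open `sr.Microphone()`, call `r.listen(source)`, and return `r.recognize_google(audio)`, catching `sr.UnknownValueError` to return `None`.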

Category: textprocessing

Stemmer-With-Stopwords

Sat 17 May 2025

title: "Stemmer with Stopwords" author: "Rj" date: 2019-04-21 description: "-" type: technical_note draft: false


from nltk.stem.snowball import SnowballStemmer
stemmer = SnowballStemmer("english")
print(stemmer.stem("having"))
have
stemmer2 = SnowballStemmer("english", ignore_stopwords=True)
print(stemmer2.stem("having"))
having
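"having" appears in NLTK's English stopword list, which is consistent with `stemmer2` returning it unchanged. The general pattern (skip stopwords, stem everything else) can be sketched as follows; the `strip_ing` stemmer and the three-word stopword set are hypothetical stand-ins, not NLTK's implementation.

```python
def stem_keep_stopwords(word, stem, stopwords):
    """Lowercase word, then return it unchanged if it is a stopword, else stem it."""
    lowered = word.lower()
    return lowered if lowered in stopwords else stem(lowered)


def strip_ing(word):
    """Hypothetical toy stemmer: remove a trailing 'ing'."""
    return word[:-3] if word.endswith("ing") else word


STOPWORDS = {"having", "being", "doing"}  # small hypothetical subset
print([stem_keep_stopwords(w, strip_ing, STOPWORDS) for w in ["Having", "hunting"]])
# ['having', 'hunt']
```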


Score: 5

Category: textprocessing

Stemming-And-Lemmatization

Sat 17 May 2025

title: "Stemming and Lemmatization" author: "Rj" date: 2019-04-21 description: "-" type: technical_note draft: false


  • Stemming and Lemmatization are Text Normalization techniques in Natural Language Processing that are used to prepare text, words, and documents for further text processing.

  • Text normalization is sometimes called word normalization.

  • Stemming is the process of keeping …
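The usual contrast between the two techniques is that stemming chops suffixes by rule (fast, but the result need not be a real word), while lemmatization maps a word to its dictionary form using a lexicon. The sketch below illustrates that contrast with a hypothetical four-entry lexicon and suffix list; real lemmatizers such as WordNet-based ones use a full dictionary and part-of-speech information.

```python
# Hypothetical mini lexicon: word -> dictionary form (lemma)
LEMMAS = {"studies": "study", "studying": "study", "better": "good", "ran": "run"}
# Hypothetical suffix rules, tried in order
SUFFIXES = ("ing", "ies", "es", "s")


def crude_stem(word):
    """Chop the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Look the word up in the lexicon; fall back to the word itself."""
    return LEMMAS.get(word, word)


for w in ["studies", "studying", "better", "ran"]:
    print(w, crude_stem(w), lemmatize(w))
```

The stemmer turns "studies" into "stud" and cannot relate "better" to "good" or "ran" to "run"; the lexicon lookup handles all three.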

Category: textprocessing

Text-Analysis-Cheatsheet

Sat 17 May 2025

title: "Template" author: "Rj" date: 2019-04-21 description: "-" type: technical_note draft: false


Basics

tokens

text1[0:100] - first 100 tokens

text2[5] - sixth token (indexing starts at 0)

concordance

text3.concordance('begat') - basic keyword-in-context

text1.concordance('sea', lines=100) - show 100 lines instead of the default 25

text1.concordance('sea', lines=len(text1)) - show all results (any number at least as large as the match count works)

text1.concordance …
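The `lines` argument in the entries above caps how many matches are shown. A keyword-in-context view can be sketched in plain Python to make that behaviour concrete. Here `context` is measured in tokens, which is an assumption of this sketch (NLTK's `width` parameter counts characters), and the sample sentence is invented.

```python
def concordance(tokens, word, context=3, lines=25):
    """Return up to `lines` keyword-in-context strings for word (case-insensitive)."""
    target = word.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - context):i])
            right = " ".join(tokens[i + 1:i + 1 + context])
            hits.append(f"{left} [{tok}] {right}".strip())
            if len(hits) == lines:
                break  # stop once the requested number of lines is reached
    return hits


tokens = "the sea is calm and the Sea is deep".split()
for line in concordance(tokens, "sea", context=2):
    print(line)
```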

Category: textprocessing