Print-Filenames
title: "Print Filenames"
author: "Rj"
date: 2019-04-21
description: "-"
type: technical_note
draft: false
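from os import walk
from os import listdir
from os.path import isfile, join
path = "."  # assumption: directory to scan; the excerpt does not show where path is set
# keep only regular files, skipping subdirectories
onlyfiles = [f for f in listdir(path) if isfile(join(path, f))]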
['.BBE72B41371180178E084EEAF106AED4F350939DB95D3516864A1CC62E7AE82F']
for fle in onlyfiles:
    print(fle)
.BBE72B41371180178E084EEAF106AED4F350939DB95D3516864A1CC62E7AE82F
Score …
Product-Summary-Classification-Nb
title: "Text Classification - Naive Bayes - Product Summary"
author: "Rj"
date: 2019-04-21
description: "-"
type: technical_note
draft: false
# Disclaimer: some code copied from https://towardsdatascience.com/multi-class-text-classification-model-comparison-and-selection-5eb066197568
import logging
import pandas as pd
import numpy as np
from numpy import random
import gensim
import nltk
from sklearn.model_selection import train_test_split
from …
Regexp-Stemmer
title: "Regexp Stemmer"
author: "Rj"
date: 2019-04-21
description: "-"
type: technical_note
draft: false
from nltk.stem import RegexpStemmer
re_stemmer = RegexpStemmer("ing$|s$|e$|able$", min=7)
words = [
"wheels",
"breaking",
"thrones",
"breakable"
]
['wheels', 'breaking', 'thrones', 'breakable']
result = [re_stemmer.stem(word) for word in words]
['wheels', 'break', 'throne', 'break']
As the …
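The behaviour shown in the output above can be reproduced with a minimal sketch that uses only Python's `re` module. The function name `regexp_stem` and its `min_length` parameter are hypothetical stand-ins, not NLTK's API. The sketch assumes that words shorter than the minimum length are returned untouched and that, for longer words, whatever the pattern matches is deleted.

```python
import re

def regexp_stem(word, pattern, min_length=0):
    """Hypothetical helper: strip whatever `pattern` matches from `word`.

    Words shorter than `min_length` are returned unchanged.
    """
    if len(word) < min_length:
        return word
    return re.sub(pattern, "", word)

words = ["wheels", "breaking", "thrones", "breakable"]
pattern = r"ing$|s$|e$|able$"

# "wheels" has only 6 characters, so with min_length=7 it is left as-is.
stems = [regexp_stem(word, pattern, min_length=7) for word in words]
print(stems)
```

Because every alternative in the pattern is anchored with `$`, only one suffix is removed per word: "thrones" loses its final "s" but keeps the "e" that then ends the stem.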
Search-Text
title: "Search Text"
author: "Rj"
date: 2019-04-21
description: "-"
type: technical_note
draft: false
from nltk.book import text1
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby …
Simple-Text-Processing
title: "Simple Text Processing"
author: "Rj"
date: 2019-04-20
description: "-"
type: technical_note
draft: false
import re
from nltk.tokenize import word_tokenize
from collections import Counter
from nltk.corpus import stopwords
text = ("""The cat is in the box. The cat likes the box. The box is over the cat.""")
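The excerpt stops before any processing happens, so here is a rough sketch of the kind of word count these imports point towards, written with the standard library only. It is not the post's own code: the regular expression stands in for `word_tokenize`, and `STOPWORDS` is a small hypothetical set used in place of NLTK's English stopword list.

```python
import re
from collections import Counter

text = "The cat is in the box. The cat likes the box. The box is over the cat."

# Hypothetical, hand-picked stopword set (NLTK's real list is much longer).
STOPWORDS = {"the", "is", "in", "over"}

# Lowercase the text and pull out runs of letters as tokens.
tokens = re.findall(r"[a-z]+", text.lower())

# Drop stopwords, then count what remains.
content_tokens = [t for t in tokens if t not in STOPWORDS]
counts = Counter(content_tokens)

print(counts.most_common(2))
```

Lowercasing before tokenizing means "The" and "the" are treated as the same word, which matters here because the stopword set is all lowercase.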
Snowball-Stemmer
title: "Snowball Stemmer"
author: "Rj"
date: 2019-04-21
description: "-"
type: technical_note
draft: false
from nltk.stem.snowball import SnowballStemmer
words = [
"hunting",
"bunnies",
"thinking"
]
['hunting', 'bunnies', 'thinking']
stemmer = SnowballStemmer("english")
result = [stemmer.stem(word) for word in words]
['hunt', 'bunni', 'think']
Score: 5
Speech-2-Text
title: "Speech 2 Text"
author: "Rj"
date: 2019-04-21
description: "-"
type: technical_note
draft: false
import speech_recognition as sr
def startpy():
    # obtain audio from the microphone
    r = sr.Recognizer()
    d = ''
    while d != 'exit' and d != 'quit':
        with sr.Microphone() as source:
            print("Say something!")
            audio = r.listen(source)
        # recognize speech using Google …
Stemmer-With-Stopwords
title: "Stemmer with Stopwords"
author: "Rj"
date: 2019-04-21
description: "-"
type: technical_note
draft: false
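from nltk.stem.snowball import SnowballStemmer
# default: stopwords are stemmed like any other word
stemmer = SnowballStemmer("english")
print(stemmer.stem("having"))
# ignore_stopwords=True leaves words from NLTK's stopword list unstemmed
stemmer2 = SnowballStemmer("english", ignore_stopwords=True)
print(stemmer2.stem("having"))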
Score: 5
Stemming-And-Lemmatization
title: "Stemming and Lemmatization"
author: "Rj"
date: 2019-04-21
description: "-"
type: technical_note
draft: false
- Stemming and lemmatization are text normalization techniques in Natural Language Processing, used to prepare text, words, and documents for further processing.
- Text normalization is sometimes called word normalization.
- Stemming is the process of keeping …
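The difference between the two techniques can be illustrated with a toy sketch. Everything in it is made up for illustration: `SUFFIXES`, `LEMMAS`, `toy_stem`, and `toy_lemmatize` are hypothetical names, and real stemmers and lemmatizers rely on far richer rules and dictionaries. The only assumption is the usual one: a stemmer chops suffixes by rule, while a lemmatizer looks up a dictionary form.

```python
# Toy illustration only; not how any real library implements these steps.
SUFFIXES = ("ing", "ies", "es", "s")  # hypothetical suffix list

# Hypothetical mini-dictionary mapping inflected forms to their lemmas.
LEMMAS = {"hunting": "hunt", "bunnies": "bunny", "better": "good"}

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def toy_lemmatize(word):
    """Look the word up in the mini-dictionary; fall back to the word itself."""
    return LEMMAS.get(word, word)

for word in ["hunting", "bunnies", "better"]:
    print(word, toy_stem(word), toy_lemmatize(word))
```

The rule-based stem of "bunnies" is not a real word, and "better" is left alone because no suffix applies, whereas the dictionary lookup maps both to proper base forms.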
Text-Analysis-Cheatsheet
title: "Template"
author: "Rj"
date: 2019-04-21
description: "-"
type: technical_note
draft: false
Basics
tokens
text1[0:100] - first 100 tokens
text2[5] - sixth token (indexing starts at 0)
concordance
text3.concordance('begat') - basic keyword-in-context
text1.concordance('sea', lines=100) - show 100 lines instead of the default 25
text1.concordance('sea', lines=all) - show all results
text1.concordance …