Text-2-Dictionary

Fri 14 November 2025

title: "Text 2 Dictionary" author: "Raja CSP Raman" date: 2019-04-20 description: "-" type: technical_note draft: false


import gensim
from gensim import corpora
from pprint import pprint
# How to create a dictionary from a list of sentences?
documents = [
        """More than half of survey participants also reported clicking on a headline expecting to …

Category: gensim-samples
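
As a rough illustration of what a text-to-dictionary step produces, here is a standard-library sketch that maps each distinct token to an integer id and turns a token list into a bag of words. It is not gensim's implementation: `build_token2id`, `doc2bow` and the sample sentences are hypothetical names and data chosen for this note, and gensim's `corpora.Dictionary` may assign ids in a different order.

```python
from collections import Counter


def build_token2id(documents):
    """Assign an integer id to each distinct lowercase token, in first-seen order."""
    token2id = {}
    for doc in documents:
        for token in doc.lower().split():
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id


def doc2bow(tokens, token2id):
    """Return sorted (token_id, count) pairs, ignoring tokens not in the mapping."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


# Hypothetical sample sentences, not the text used in the post.
sample_docs = ["the cat sat", "the dog sat down"]
token2id = build_token2id(sample_docs)
bow = doc2bow("the dog and the cat".split(), token2id)
print(token2id)
print(bow)
```

The unknown word "and" is dropped from the bag of words because it never appeared in the documents used to build the mapping.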

Text-Analysis-Cheatsheet

Fri 14 November 2025

title: "Template" author: "Rj" date: 2019-04-21 description: "-" type: technical_note draft: false


Basics

tokens

text1[0:100] - first 100 tokens (indices 0 to 99)

text2[5] - sixth token (indexing starts at 0)

concordance

text3.concordance('begat') - basic keyword-in-context

text1.concordance('sea', lines=100) - show other than default 25 lines

text1.concordance('sea', lines=all) - show all results

text1.concordance …
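
The slicing rules and the keyword-in-context idea above can be tried without NLTK. The sketch below uses a plain Python list of tokens. `kwic` is a hypothetical helper written for this note, not NLTK's `concordance`, and the sample sentence is made up.

```python
def kwic(tokens, word, width=2, lines=25):
    """Return up to `lines` keyword-in-context strings for `word` (case-insensitive)."""
    target = word.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append(f"{left} [{tok}] {right}".strip())
            if len(hits) == lines:
                break
    return hits


# Hypothetical sample text.
tokens = "the sea was calm and the sea was wide".split()

first_three = tokens[0:3]   # slice end is exclusive, so this holds 3 tokens
sixth = tokens[5]           # index 5 is the sixth token
matches = kwic(tokens, "sea")
print(first_three, sixth)
print(matches)
```

Passing a smaller `lines` value, for example `kwic(tokens, "sea", lines=1)`, caps the number of hits in the same way the `lines` argument does in the cheatsheet entries.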

Category: textprocessing

Text-Blob-Classifier

Fri 14 November 2025

title: "Text Blob Classifier" author: "Rj" date: 2019-04-21 description: "-" type: technical_note draft: false


from textblob.classifiers import NaiveBayesClassifier
train = [
     ('I love this sandwich.', 'pos'),
     ('this is an amazing place!', 'pos'),
     ('I feel very good about these beers.', 'pos'),
     ('this is my best work.', 'pos'),
     ("what an awesome view", 'pos'),
     ('I …

Category: textprocessing

Text-Classification-Nb

Fri 14 November 2025

title: "Text Classification - Naive Bayes - Stackoverflow Tags" author: "Rj" date: 2019-04-21 description: "-" type: technical_note draft: false


# Disclaimer: some code copied from https://towardsdatascience.com/multi-class-text-classification-model-comparison-and-selection-5eb066197568
import logging
import pandas as pd
import numpy as np
from numpy import random
import gensim
import nltk
from sklearn.model_selection import train_test_split
from …

Category: textprocessing

Text-Decompose

Fri 14 November 2025

title: "Text Decompose" author: "Rj" date: 2019-04-20 description: "List Test" type: technical_note draft: false


import requests
from bs4 import BeautifulSoup
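# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup('<p>This is a slimy text and <i> I am slimer</i></p>', 'html.parser')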
soup.i.decompose()
print(soup.text)
This is a slimy text and
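
For comparison, the same effect (dropping a tag together with its contents before extracting text) can be sketched with only the standard library. This is an illustration, not how BeautifulSoup implements `decompose()`. `TextWithoutTag` and `text_without_tag` are hypothetical names introduced here.

```python
from html.parser import HTMLParser


class TextWithoutTag(HTMLParser):
    """Collect text content while skipping everything inside one chosen tag."""

    def __init__(self, skip_tag):
        super().__init__()
        self.skip_tag = skip_tag
        self.depth = 0      # how many skip_tag elements are currently open
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.skip_tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.skip_tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside the skipped tag.
        if self.depth == 0:
            self.parts.append(data)


def text_without_tag(html, skip_tag):
    parser = TextWithoutTag(skip_tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


result = text_without_tag('<p>This is a slimy text and <i> I am slimer</i></p>', 'i')
print(repr(result))
```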


Score: 0

Category: webreader

Text-Diff

Fri 14 November 2025

title: "Text Diff" author: "Rj" date: 2019-04-20 description: "List Test" type: technical_note draft: false


import difflib
str1 = "I understand how customers do their choice. Difference"
str2 = "I understand how customers do their choice."
seq = difflib.SequenceMatcher(None, str1, str2)
d = seq.ratio()*100
d
88.65979381443299
def get_similarity(str1, str2 …
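
The 88.66 figure above can be checked by hand. `SequenceMatcher.ratio()` is documented as 2*M/T, where M is the number of matched characters and T is the combined length of both strings. The sketch below recomputes it from `get_matching_blocks()`. The variable names `matched`, `total` and `manual_ratio` are introduced here for illustration.

```python
import difflib

str1 = "I understand how customers do their choice. Difference"
str2 = "I understand how customers do their choice."

matcher = difflib.SequenceMatcher(None, str1, str2)

# M: total size of all matching blocks (the final block is a zero-length sentinel).
matched = sum(block.size for block in matcher.get_matching_blocks())
# T: combined length of both strings.
total = len(str1) + len(str2)

manual_ratio = 2.0 * matched / total
print(matched, total, manual_ratio * 100)
```

Here the whole of `str2` (43 characters) matches the start of `str1` (54 characters), so the ratio is 86/97.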

Category: basics

Text-File-2-Nltk-Text

Fri 14 November 2025

title: "Text File 2 NLTK Text" author: "Rj" date: 2019-04-21 description: "-" type: technical_note draft: false


import nltk
with open('canola.txt', 'r') as f:
    raw = f.read()
raw
'OTTAWA—The federal Liberals promised Wednesday to give Canada’s canola farmers much-needed financial aid to help lessen the impact of China’s decision …

Category: textprocessing

Text-Index-And-Slicing

Fri 14 November 2025

title: "Text Index and Slicing" author: "Rj" date: 2019-04-21 description: "-" type: technical_note draft: false


import nltk
with open('canola.txt', 'r') as f:
    raw = f.read()
raw
'OTTAWA—The federal Liberals promised Wednesday to give Canada’s canola farmers much-needed financial aid to help lessen the impact of China’s decision to …

Category: textprocessing

Text-Similarity

Fri 14 November 2025

title: "Text Similarity" author: "Raja CSP Raman" date: 2019-04-20 description: "-" type: technical_note draft: false


import nltk, string
from sklearn.feature_extraction.text import TfidfVectorizer
stemmer = nltk.stem.porter.PorterStemmer()
remove_punctuation_map = dict((ord(char), None) for char in string.punctuation)
def stem_tokens(tokens):
    return [stemmer.stem(item) for item in tokens]
'''remove …
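
A small sketch of how a translation table like `remove_punctuation_map` is typically applied: `str.translate` deletes every character whose code point maps to `None`. The helper name `strip_punctuation` and the sample sentence are assumptions made for this note. The post's own normalisation function is truncated above and may differ.

```python
import string

# Same construction as in the excerpt: map each punctuation code point to None.
remove_punctuation_map = dict((ord(char), None) for char in string.punctuation)


def strip_punctuation(text):
    """Lowercase the text and delete ASCII punctuation characters."""
    return text.lower().translate(remove_punctuation_map)


cleaned = strip_punctuation("Hello, World! It's 9 a.m.")
print(cleaned)
```

Note that `string.punctuation` covers ASCII punctuation only, so typographic quotes and dashes pass through unchanged.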

Category: basics

Text-Similarity-Finder

Fri 14 November 2025

title: "Text Similarity Finder" author: "Raja CSP Raman" date: 2019-04-20 description: "-" type: technical_note draft: false


from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity  
def findSimilarity(param1, param2):
    documents = (
        param1,
        param2
    )
    tfidf_vectorizer = TfidfVectorizer()
    tfidf_matrix = tfidf_vectorizer.fit_transform(documents)
    cosine = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix)

    print(cosine)
findSimilarity("In …
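
To make the TF-IDF plus cosine step less of a black box, here is a pure-Python sketch of the same idea. It follows what I understand to be `TfidfVectorizer`'s default settings (lowercasing, tokens of two or more word characters, smoothed IDF, L2 normalisation), but that correspondence is an assumption and small numerical differences from scikit-learn are possible. All function names are hypothetical.

```python
import math
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def tokenize(text):
    """Lowercase and keep tokens of at least two word characters."""
    return TOKEN_PATTERN.findall(text.lower())


def tfidf_vectors(documents):
    """Return one L2-normalised {term: weight} dict per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(documents)
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for c in counts:
        weights = {t: tf * idf[t] for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


def cosine(a, b):
    """Dot product of two already-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def find_similarity(text1, text2):
    v1, v2 = tfidf_vectors([text1, text2])
    return cosine(v1, v2)


score = find_similarity("the cat sat on the mat", "the cat ran on the road")
print(score)
```

Identical texts score 1.0 and texts with no shared terms score 0.0. Everything else falls in between.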

Category: basics
