Text-2-Dictionary
title: "Text 2 Dictionary"
author: "Raja CSP Raman"
date: 2019-04-20
description: "-"
type: technical_note
draft: false
import gensim
from gensim import corpora
from pprint import pprint
# How to create a dictionary from a list of sentences?
documents = [
"""More than half of survey participants also reported clicking on a headline expecting to …
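gensim's `corpora.Dictionary` maps each distinct token in a tokenized corpus to an integer id, and `doc2bow` turns a document into (id, count) pairs. The sketch below reimplements that idea in plain Python so it runs without gensim. `build_dictionary` and `to_bow` are hypothetical names. Ids follow first appearance, which may not match the ids gensim assigns. The two input sentences are toy text built from the excerpt above.

```python
from collections import Counter

def build_dictionary(tokenized_docs):
    # Hypothetical stand-in for corpora.Dictionary: give each distinct
    # token an integer id in order of first appearance.
    token2id = {}
    for doc in tokenized_docs:
        for token in doc:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id

def to_bow(token2id, doc):
    # Bag of words as sorted (token_id, count) pairs, similar in shape to
    # doc2bow; tokens missing from the dictionary are ignored.
    counts = Counter(token2id[t] for t in doc if t in token2id)
    return sorted(counts.items())

# Toy sentences built from the excerpt; lowercase + whitespace split is an
# assumed, deliberately naive tokenizer.
sentences = [
    "More than half of survey participants",
    "participants also reported clicking on a headline",
]
tokenized = [s.lower().split() for s in sentences]
token2id = build_dictionary(tokenized)
print(token2id)
print(to_bow(token2id, "half of half the survey".split()))
```

Here "half" appears twice and "the" is not in the dictionary, so the bag of words is `[(2, 2), (3, 1), (4, 1)]`. With gensim installed, the library calls would be `dictionary = corpora.Dictionary(tokenized)` and `dictionary.doc2bow(tokens)`.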
Text-Analysis-Cheatsheet
title: "Template"
author: "Rj"
date: 2019-04-21
description: "-"
type: technical_note
draft: false
Basics
tokens
text1[0:100] - first 100 tokens (indices 0 to 99)
text2[5] - sixth token (indexing starts at 0)
concordance
text3.concordance('begat') - basic keyword-in-context
text1.concordance('sea', lines=100) - show 100 lines instead of the default 25
text1.concordance('sea', lines=len(text1)) - show all results (lines expects an integer, not the built-in all)
text1.concordance …
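A concordance lists every occurrence of a word together with its surrounding context. Below is a rough plain-Python sketch of the same keyword-in-context idea. `kwic` is a hypothetical helper. Its window is measured in tokens, whereas NLTK sets the display width in characters. Matching ignores case, and the token list is toy text.

```python
def kwic(tokens, keyword, window=3, lines=25):
    # Return up to `lines` hits, each showing `window` tokens of context
    # on either side of the bracketed keyword (case-insensitive match).
    keyword = keyword.lower()
    results = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            results.append(f"{left} [{token}] {right}")
    return results[:lines]

tokens = "And Adam begat Seth and Seth begat Enos".split()
for line in kwic(tokens, "begat", window=2):
    print(line)
```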
Text-Blob-Classifier
title: "Text Blob Classifier"
author: "Rj"
date: 2019-04-21
description: "-"
type: technical_note
draft: false
from textblob.classifiers import NaiveBayesClassifier
train = [
('I love this sandwich.', 'pos'),
('this is an amazing place!', 'pos'),
('I feel very good about these beers.', 'pos'),
('this is my best work.', 'pos'),
("what an awesome view", 'pos'),
('I …
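TextBlob's `NaiveBayesClassifier` does not learn from raw sentences directly. It first turns each text into features. As far as I know, its default extractor emits one `contains(word)` boolean for every word seen in training. The plain-Python sketch below approximates that step using the first two training pairs from above. It uses a hypothetical regex tokenizer in place of TextBlob's NLTK-based one, and `vocabulary` and `extract_features` are illustrative names.

```python
import re

def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def vocabulary(train):
    # Every word that appears in any training sentence.
    return {word for text, _label in train for word in tokenize(text)}

def extract_features(document, vocab):
    # One boolean feature per vocabulary word: is it present in the document?
    tokens = set(tokenize(document))
    return {f"contains({word})": word in tokens for word in sorted(vocab)}

train = [
    ('I love this sandwich.', 'pos'),
    ('this is an amazing place!', 'pos'),
]
vocab = vocabulary(train)
features = extract_features("I love this place", vocab)
print({name: value for name, value in features.items() if value})
```

Words outside the training vocabulary produce no feature at all, so they cannot influence the classifier.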
Text-Classification-Nb
title: "Text Classification - Naive Bayes - Stackoverflow Tags"
author: "Rj"
date: 2019-04-21
description: "-"
type: technical_note
draft: false
# Disclaimer: some code copied from https://towardsdatascience.com/multi-class-text-classification-model-comparison-and-selection-5eb066197568
import logging
import pandas as pd
import numpy as np
from numpy import random
import gensim
import nltk
from sklearn.model_selection import train_test_split
from …
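The sketch below shows what the Naive Bayes step in such a tag classifier computes: a from-scratch multinomial Naive Bayes with add-one (Laplace) smoothing. It is not the code from the linked article, which uses scikit-learn. The scikit-learn counterpart is `sklearn.naive_bayes.MultinomialNB`, whose default `alpha=1.0` is the same smoothing. The class name, the tokenizer, and the five tagged questions are hypothetical toy data.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Hypothetical tokenizer; keeps '#' and '+' so tags like c# or c++ survive.
    return re.findall(r"[a-z0-9#+]+", text.lower())

class TinyMultinomialNB:
    """Minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, text, label):
        # log P(label) + sum of log P(word | label) over known words.
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[label][word] + 1) / denom)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))

texts = [
    "how to merge two dataframes in pandas",
    "pandas groupby sum column",
    "python list comprehension with condition",
    "javascript promise async await",
    "array map filter in javascript",
]
labels = ["pandas", "pandas", "python", "javascript", "javascript"]
model = TinyMultinomialNB().fit(texts, labels)
print(model.predict("filter a pandas dataframe by column"))
print(model.predict("async function in javascript"))
```

Working in log space avoids underflow when many small word probabilities are multiplied together.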
Text-Decompose
title: "Text Decompose"
author: "Rj"
date: 2019-04-20
description: "List Test"
type: technical_note
draft: false
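from bs4 import BeautifulSoup
# Name the parser explicitly; otherwise BeautifulSoup guesses one and emits a warning
soup = BeautifulSoup('<p>This is a slimy text and <i> I am slimer</i></p>', 'html.parser')
soup.i.decompose()  # remove the <i> tag together with everything inside it
print(soup.text)    # prints: This is a slimy text and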
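`decompose()` deletes a tag together with its contents, so only the text outside `<i>` remains. The standard-library sketch below reaches the same result by skipping any text inside a chosen tag while parsing. `TextWithoutTag` is a hypothetical name, and unlike BeautifulSoup, the sketch never builds a document tree.

```python
from html.parser import HTMLParser

class TextWithoutTag(HTMLParser):
    # Collects text while skipping everything inside `skip_tag`.
    def __init__(self, skip_tag):
        super().__init__()
        self.skip_tag = skip_tag
        self.depth = 0   # > 0 while inside the skipped tag
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.skip_tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.skip_tag and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.parts.append(data)

parser = TextWithoutTag("i")
parser.feed('<p>This is a slimy text and <i> I am slimer</i></p>')
parser.close()
print(repr("".join(parser.parts)))
```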
Text-Diff
title: "Text Diff"
author: "Rj"
date: 2019-04-20
description: "List Test"
type: technical_note
draft: false
import difflib

str1 = "I understand how customers do their choice. Difference"
str2 = "I understand how customers do their choice."
seq = difflib.SequenceMatcher(None, str1, str2)
def get_similarity(str1, str2 …
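`SequenceMatcher` compares two strings by finding their matching blocks. This sketch shows the two `difflib` methods most often used next, `ratio()` and `get_opcodes()`. Both are standard methods, and this is not a reconstruction of the elided `get_similarity`.

```python
import difflib

str1 = "I understand how customers do their choice. Difference"
str2 = "I understand how customers do their choice."

seq = difflib.SequenceMatcher(None, str1, str2)

# ratio() = 2 * M / T, where M is the number of matched characters and
# T is the combined length of both strings.
print(seq.ratio())

# get_opcodes() lists the edits that turn str1 into str2.
for tag, i1, i2, j1, j2 in seq.get_opcodes():
    print(tag, repr(str1[i1:i2]), "->", repr(str2[j1:j2]))
```

Here str2 is a prefix of str1, so all 43 characters of str2 match and the total length is 97. That gives 2 × 43 / 97 ≈ 0.887. The opcodes report one `equal` block followed by a `delete` of `' Difference'`.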
Text-File-2-Nltk-Text
title: "Text File 2 NLTK Text"
author: "Rj"
date: 2019-04-21
description: "-"
type: technical_note
draft: false
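# Assumes the file is UTF-8 encoded (it contains characters such as the curly apostrophe in Canada’s)
with open('canola.txt', 'r', encoding='utf-8') as f:
    raw = f.read()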
'OTTAWA—The federal Liberals promised Wednesday to give Canada’s canola farmers much-needed financial aid to help lessen the impact of China’s decision …
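Turning a file into NLTK text takes two steps: read the raw string, then tokenize it. The sketch below does both with the standard library only. It assumes UTF-8 input and uses a crude regex tokenizer as a stand-in for `nltk.word_tokenize`. To run anywhere, it writes the opening words of the excerpt to a temporary `canola.txt` first.

```python
import re
import tempfile
from pathlib import Path

# Opening words of the excerpt, used as sample input.
sample = ("OTTAWA—The federal Liberals promised Wednesday to give "
          "Canada’s canola farmers much-needed financial aid")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "canola.txt"
    path.write_text(sample, encoding="utf-8")
    # Step 1: read the whole file into one string.
    raw = path.read_text(encoding="utf-8")

# Step 2: crude regex tokenizer (a stand-in, not NLTK's): word runs that may
# be joined by a hyphen or apostrophe, or any single punctuation character.
tokens = re.findall(r"\w+(?:[-’']\w+)*|[^\w\s]", raw)
print(tokens[:6])
```

With NLTK installed, `nltk.Text(tokens)` wraps the token list and provides methods such as `concordance()`.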
Text-Index-And-Slicing
title: "Text Index and Slicing"
author: "Rj"
date: 2019-04-21
description: "-"
type: technical_note
draft: false
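# Assumes the file is UTF-8 encoded (it contains characters such as the curly apostrophe in Canada’s)
with open('canola.txt', 'r', encoding='utf-8') as f:
    raw = f.read()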
'OTTAWA—The federal Liberals promised Wednesday to give Canada’s canola farmers much-needed financial aid to help lessen the impact of China’s decision to …
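Indexing and slicing follow the same rules on the raw string (by character) and on a token list (by token). Indices start at 0, and a slice `[start:stop]` ends just before `stop`. The sketch below works on the opening words of the excerpt and uses a naive whitespace split as the tokenizer.

```python
raw = "OTTAWA—The federal Liberals promised Wednesday"

# Strings index by character; negative indices count from the end.
print(raw[0])      # 'O'
print(raw[0:6])    # 'OTTAWA'
print(raw[-9:])    # 'Wednesday'

# The same rules apply to a list of tokens.
tokens = raw.replace("—", " ").split()
print(tokens[1])          # 'The' (the second token)
print(tokens[:3])         # ['OTTAWA', 'The', 'federal']
print(len(tokens[2:5]))   # 3: indices 2, 3 and 4
```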
Text-Similarity
title: "Text Similarity"
author: "Raja CSP Raman"
date: 2019-04-20
description: "-"
type: technical_note
draft: false
import string

import nltk
from sklearn.feature_extraction.text import TfidfVectorizer

stemmer = nltk.stem.porter.PorterStemmer()
# Translation table that deletes every ASCII punctuation character
remove_punctuation_map = dict((ord(char), None) for char in string.punctuation)

def stem_tokens(tokens):
    return [stemmer.stem(item) for item in tokens]
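The snippet above prepares two normalization tools: a stemmer and a translation table that deletes punctuation. The stdlib-only sketch below shows one way to apply such a table before comparing two texts. `normalize` and `cosine` are hypothetical helpers. Stemming is omitted so the sketch runs without NLTK, and similarity is computed on raw term counts rather than TF-IDF weights.

```python
import math
import string
from collections import Counter

# Same translation-table idea as remove_punctuation_map in the note.
remove_punctuation_map = dict((ord(char), None) for char in string.punctuation)

def normalize(text):
    # Hypothetical helper: lowercase, drop ASCII punctuation, split on whitespace.
    return text.lower().translate(remove_punctuation_map).split()

def cosine(tokens_a, tokens_b):
    # Cosine similarity of raw term-count vectors (no TF-IDF weighting).
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(normalize("Hello, World!"))
print(cosine(normalize("Hello, World!"), normalize("hello world")))
print(cosine(normalize("cats purr"), normalize("dogs bark")))
```

`string.punctuation` covers ASCII punctuation only, so non-ASCII marks such as curly quotes pass through unchanged.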
Text-Similarity-Finder
title: "Text Similarity Finder"
author: "Raja CSP Raman"
date: 2019-04-20
description: "-"
type: technical_note
draft: false
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
def findSimilarity(param1, param2):
    documents = (
        param1,
        param2
    )
    tfidf_vectorizer = TfidfVectorizer()
    tfidf_matrix = tfidf_vectorizer.fit_transform(documents)
    # Compare row 0 with every row: a 1x2 array (self-similarity, then similarity to param2)
    cosine = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix)
    print(cosine)
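`findSimilarity` prints a 1×2 array: the first document compared with itself, then compared with the second document. The dependency-free sketch below performs the same computation. It follows what I understand to be `TfidfVectorizer`'s defaults: lowercasing, the `(?u)\b\w\w+\b` token pattern, smoothed idf, and L2-normalised rows. `tfidf_vectors` and `find_similarity` are hypothetical names, and the function returns the score instead of printing it.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")  # scikit-learn's default token_pattern

def tfidf_vectors(documents):
    # Raw counts weighted by smoothed idf = ln((1 + n) / (1 + df)) + 1,
    # then each document vector is scaled to unit (L2) length.
    docs = [Counter(TOKEN_RE.findall(d.lower())) for d in documents]
    n = len(docs)
    df = Counter(term for d in docs for term in d)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    vectors = []
    for d in docs:
        v = {t: c * idf[t] for t, c in d.items()}
        norm = math.sqrt(sum(x * x for x in v.values()))
        vectors.append({t: x / norm for t, x in v.items()} if norm else v)
    return vectors

def find_similarity(param1, param2):
    # Hypothetical counterpart of findSimilarity that returns the score.
    a, b = tfidf_vectors([param1, param2])
    return sum(x * b.get(t, 0.0) for t, x in a.items())

print(find_similarity("the cat sat", "the cat sat"))
print(find_similarity("the cat sat", "a dog ran"))
print(find_similarity("the cat sat on the mat", "the cat ate"))
```

Because both vectors have unit length, their dot product equals their cosine similarity, which is what `cosine_similarity` computes on the sparse matrix. Single-character words such as "a" are dropped by the default token pattern.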