Text to Vector

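import nltk
# word_tokenize needs NLTK's Punkt tokenizer data; if it is missing, run
# nltk.download('punkt') once (recent NLTK releases use 'punkt_tab' instead).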
content = "The Democrats — including more than 50 freshmen — are mindful that impeachment poses political risks that could endanger the seats of moderates and their majority, as well as strengthen Mr. Trump’s hand. "
content
'The Democrats — including more than 50 freshmen — are mindful that impeachment poses political risks that could endanger the seats of moderates and their majority, as well as strengthen Mr. Trump’s hand. '
tokens = nltk.word_tokenize(content)
tokens
['The',
 'Democrats',
 '—',
 'including',
 'more',
 'than',
 '50',
 'freshmen',
 '—',
 'are',
 'mindful',
 'that',
 'impeachment',
 'poses',
 'political',
 'risks',
 'that',
 'could',
 'endanger',
 'the',
 'seats',
 'of',
 'moderates',
 'and',
 'their',
 'majority',
 ',',
 'as',
 'well',
 'as',
 'strengthen',
 'Mr.',
 'Trump',
 '’',
 's',
 'hand',
 '.']
type(tokens)
list
Why text to vector?

To do machine learning on text, we need to transform our documents into vectors, because machine learning algorithms operate on numeric input rather than raw strings. This transformation is called feature extraction or vectorization.
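As a rough illustration of what vectorization means, here is a minimal bag-of-words sketch in plain Python. It is one simple encoding among many, not necessarily the one used later in this notebook. The helper names `build_vocabulary` and `vectorize` and the two-document corpus are made up for this example, and it splits on whitespace instead of calling `nltk.word_tokenize` to stay dependency-free.

```python
from collections import Counter

def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in sorted order."""
    vocab = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}

def vectorize(document, vocabulary):
    """Return a count vector: position i holds how often token i occurs."""
    counts = Counter(document.lower().split())
    return [counts.get(tok, 0) for tok in vocabulary]

# Hypothetical toy corpus, loosely built from words in the sentence above.
corpus = [
    "the seats of moderates",
    "the majority and the moderates",
]

vocabulary = build_vocabulary(corpus)
vectors = [vectorize(doc, vocabulary) for doc in corpus]

print(list(vocabulary))  # column order of the vectors
for doc, vec in zip(corpus, vectors):
    print(vec, "<-", doc)
```

Each document becomes a fixed-length list of integers with one position per vocabulary word, which is the kind of numeric input a machine learning model can consume. Word order is discarded in this encoding, which is the usual trade-off of a bag-of-words representation.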