Skip to main content

C'est la Z

Tag: nltk

Working with texts part 3 - word chains

At this point, we've done a fair amount of playing with text so it's time for a fun little project. We're going to generate some text "in the style" of a source text. The technique we're going to use is usually called a Markov Chain text generator. Basically a model where the next state or word is based entirely on the current state. I don't dwell on the math under the hood but in case you're interested, here are a few links: Wikipedia, Explained Visually, UC Davis Math.

Working with texts part 2 - bag of words

Following up on a previous post, we're going to continue to talk about playing with text. This time, building and working with a bag of words from a text. A bag of words is a simple language processing model where you just consider individual words in a text. What they are and how many times they occur. This is a pretty simple model but you can still have a good bit of fun with your students with it.