Make a Markov bot

This one is pretty neat. Again from the egghead.io series, it uses the rita natural language toolkit. It also uses csv-parse, because we're going to read our Twitter archive to make the bot sound like us tweeting.

First, to set up the Twitter archive, request your data from the Twitter settings page. You'll be emailed a link to download your archive. Once you have downloaded it, extract the tweets.csv file. We'll put that in its own folder, so from the root of your project:

cd src
mkdir twitter-archive

Move tweets.csv into that folder so the bot we're about to go over can read it.

Use fs to set up a read stream, pipe it through csv-parse, and log out row[5], the column of each row that the bot uses as the tweet text:

const filePath = path.join(__dirname, './twitter-archive/tweets.csv')

const tweetData =
  fs.createReadStream(filePath)
  .pipe(csvparse({
    delimiter: ','
  }))
  .on('data', row => {
    console.log(row[5])
  })

When you run this from the console, you should see the text of the tweets in your Twitter archive logged out.
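
Hard-coding index 5 works only while the archive keeps its columns in the same order. As a rough sketch, assuming the first row of tweets.csv is a header row that names a text column, the index could be looked up instead. The columnIndex helper and the sample rows are hypothetical and only mimic the arrays of strings that csv-parse emits by default.

```javascript
// Hypothetical helper: find a column's index from the CSV header row
// instead of hard-coding 5. Assumes the first row lists column names.
function columnIndex(headerRow, name) {
  const index = headerRow.indexOf(name)
  if (index === -1) {
    throw new Error(`Column "${name}" not found in header row`)
  }
  return index
}

// Made-up rows shaped like csv-parse's default output (arrays of strings).
const rows = [
  ['tweet_id', 'timestamp', 'text'],
  ['1', '2017-01-01', 'Hello world'],
  ['2', '2017-01-02', 'Second tweet']
]

const textIndex = columnIndex(rows[0], 'text')
// Skip the header row so the column name is not treated as a tweet.
const tweets = rows.slice(1).map(row => row[textIndex])
console.log(tweets)
```

Skipping the header row also keeps the column name itself out of the text that is later fed to the Markov model.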

Now clear out things like @ and RT to help with the natural language processing. We'll set up two functions, cleanText and hasNoStopWords.

cleanText tokenizes the text, delimiting it on a space ' ', filters out the stop words, then joins the tokens back together with .join(' ') and uses .trim() to remove any whitespace left at the start or end of the text.

function cleanText(text) {
  return rita.RiTa.tokenize(text, ' ')
    .filter(hasNoStopWords)
    .join(' ')
    .trim()
}

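Each token is passed to the hasNoStopWords function, which returns false for any token containing '@', 'http' or 'RT', so mentions, links and retweet markers are removed before the text is used in tweetData.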

function hasNoStopWords(token) {
  const stopwords = ['@', 'http', 'RT']
  return stopwords.every(sw => !token.includes(sw))
}
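
To see what the two functions do to a single tweet, here is a minimal sketch that swaps rita.RiTa.tokenize for a plain split(' ') so it runs without RiTa installed. That substitution, the name cleanTextSketch and the sample tweet are assumptions made for illustration only.

```javascript
// Same filter as above: reject any token containing a stop fragment.
function hasNoStopWords(token) {
  const stopwords = ['@', 'http', 'RT']
  return stopwords.every(sw => !token.includes(sw))
}

// Assumption: split(' ') stands in for rita.RiTa.tokenize(text, ' ').
function cleanTextSketch(text) {
  return text.split(' ')
    .filter(hasNoStopWords)
    .join(' ')
    .trim()
}

// Made-up tweet containing a retweet marker, a mention and a link.
const sample = 'RT @someone check this out https://example.com so cool'
console.log(cleanTextSketch(sample))
// prints "check this out so cool"
```

Because the check is a case-sensitive substring match, an ordinary word such as START is dropped as well, since it contains RT, while start is kept.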

Now that we have the data cleaned, we can build a sentence from it. Replace console.log(row[5]) with inputText = inputText + ' ' + cleanText(row[5]), where inputText is a string declared before the stream (let inputText = ''). When the stream ends, create the model with new rita.RiMarkov(3), the 3 being the number of words to take into consideration, and load the collected text with markov.loadText(inputText). Then use markov.generateSentences(1), with 1 being the number of sentences to generate. We'll also use .toString() and .substring(0, 140) to truncate the result to 140 characters.

const tweetData =
  fs.createReadStream(filePath)
  .pipe(csvparse({
    delimiter: ','
  }))
  .on('data', function (row) {
    inputText = `${inputText} ${cleanText(row[5])}`
  })
  .on('end', function(){
    const markov = new rita.RiMarkov(3)
    markov.loadText(inputText)
    const sentence = markov.generateSentences(1)
      .toString()
      .substring(0, 140)
  })
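
Note that .substring(0, 140) can cut the sentence in the middle of a word. One possible refinement, sketched here with a hypothetical truncateAtWord helper that is not part of the original bot, is to back up to the last space before the limit.

```javascript
// Hypothetical helper (not in the original bot): trim text to at most
// `limit` characters, backing up to the last space so no word is cut.
function truncateAtWord(text, limit = 140) {
  if (text.length <= limit) return text
  const cut = text.substring(0, limit)
  // If the cut landed exactly at the end of a word, keep it as is.
  if (text[limit] === ' ') return cut
  const lastSpace = cut.lastIndexOf(' ')
  // Fall back to a hard cut when there is no space to break on.
  return lastSpace > 0 ? cut.substring(0, lastSpace) : cut
}

console.log(truncateAtWord('hello world', 8))
// prints "hello"
```

In the stream's end handler this could take the place of .substring(0, 140), for example truncateAtWord(markov.generateSentences(1).toString()).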

Now we can tweet this with the bot using .post('statuses/update'..., passing in the sentence variable as the status and logging out a message when the tweet has been posted.

const tweetData =
  fs.createReadStream(filePath)
    .pipe(csvparse({
      delimiter: ','
    }))
    .on('data', row => {
      inputText = `${inputText} ${cleanText(row[5])}`
    })
    .on('end', () => {
      const markov = new rita.RiMarkov(3)
      markov.loadText(inputText)
      const sentence = markov.generateSentences(1)
        .toString()
        .substring(0, 140)
      bot.post('statuses/update', {
        status: sentence
      }, (err, data, response) => {
        if (err) {
          console.log(err)
        } else {
          console.log('Markov status tweeted!', sentence)
        }
      })
    })

If you want your sentences to stay closer to the input text, increase the number of words to consider, for example rita.RiMarkov(6). If you want the output to be closer to gibberish, lower the number.
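
To illustrate why a larger number keeps the output closer to the input, here is a minimal word-level Markov model in plain JavaScript. It is an illustration only and not RiTa's implementation. The buildModel function, the sample text, and the choice to treat n as the n-gram length (n - 1 words of context plus the word that follows) are all assumptions.

```javascript
// Illustration only, not RiTa's implementation: map every run of
// `n - 1` consecutive words to the words seen right after it.
function buildModel(words, n) {
  const model = new Map()
  for (let i = 0; i + n - 1 < words.length; i++) {
    const key = words.slice(i, i + n - 1).join(' ')
    const next = words[i + n - 1]
    if (!model.has(key)) model.set(key, [])
    model.get(key).push(next)
  }
  return model
}

// Made-up input text.
const words = 'the cat sat on the mat and the cat ran'.split(' ')

// n = 2: one word of context, so "the" can be followed by cat, mat or cat.
console.log(buildModel(words, 2).get('the'))
// n = 3: two words of context, so "the cat" can be followed by sat or ran.
console.log(buildModel(words, 3).get('the cat'))
```

The more words of context each key holds, the fewer continuations there are to choose from, so the generated text follows the source more closely. With less context there are more branches, and the output drifts toward gibberish.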

Here's the completed module:

const Twit = require('twit')
const fs = require('fs')
const csvparse = require('csv-parse')
const rita = require('rita')
const config = require('./config')
const path = require('path')

let inputText = ''

const bot = new Twit(config)

const filePath = path.join(__dirname, '../twitter-archive/tweets.csv')

const tweetData =
  fs.createReadStream(filePath)
    .pipe(csvparse({
      delimiter: ','
    }))
    .on('data', row => {
      inputText = `${inputText} ${cleanText(row[5])}`
    })
    .on('end', () => {
      const markov = new rita.RiMarkov(10)
      markov.loadText(inputText)
      const sentence = markov.generateSentences(1)
        .toString()
        .substring(0, 140)
      bot.post('statuses/update', {
        status: sentence
      }, (err, data, response) => {
        if (err) {
          console.log(err)
        } else {
          console.log('Markov status tweeted!', sentence)
        }
      })
    })

function hasNoStopWords(token) {
  const stopwords = ['@', 'http', 'RT']
  return stopwords.every(sw => !token.includes(sw))
}

function cleanText(text) {
  return rita.RiTa.tokenize(text, ' ')
    .filter(hasNoStopWords)
    .join(' ')
    .trim()
}
