text processing

Is English Kurtotic?

Posted on February 7, 2020 | 6 minutes | Erin M. Buchanan

You ever have a random text that sent your brain to work? Here’s mine today, a text from K.D., followed up with examples that lol is bimodal, while loop is positively skewed, and enter is “almost normal”. The lovely K.D. posed this question to me earlier, and I have already procrastinated a lot today, so here’s to more! First, I typed out some fonts in Word to help me figure out how to code the two important parts for this question: width and height. [Read More]

psycholinguistics fun text processing

Getting Translations with rvest and Selenium

Posted on October 7, 2019 | 4 minutes | Erin Buchanan

In this guide, I’ll go over how you can use web scraping with rvest and Selenium to get translations from Google Translate. Note: I encourage responsible scraping - I always try to do it with some space between requests. You can only do 5000 characters at a time with the free Google Translate. I will say that I tried to do this with just rvest and the predictability of the links for Google Translate - but I could not get rvest to pull the right data off the page, so here’s a slightly more difficult approach that appears to work. [Read More]

text processing rstudio how-to github guides

Gathering Text from the Web

Posted on May 7, 2018 | 6 minutes | Erin M. Buchanan

Hi everyone! I don’t really feel like working too hard today, so I decided to write a blog post about how my student Will and I used rvest to mine articles from several different news sources for a project. All the scripts and current goings-on of this project can be found on our OSF page - this project is also connected to the GitHub folder with the files. First, we picked four web sources to scrape - The New York Times, NPR, Fox News, and Breitbart - because of their known political associations, and specifically, we focused on their political sections. [Read More]

rstudio rvest text processing

Working With Messy Text

Posted on March 6, 2018 | 6 minutes | Erin Buchanan

Heyo! I am doing my best to procrastinate here on a blustery Tuesday afternoon. So, I decided to share some code I’ve put together that solves problems in R that I used to do in Perl. HTML or C++ was probably my first real language, but I love the heck out of Perl. It’s never done me wrong (unlike you, PHP). Anyways! The context of this project is that we are developing a dictionary of words to complement the work done by Jonathan Haidt and Jesse Graham - learn more. [Read More]

r rstudio text processing