text processing on Dr. Erin Buchanan

Recent content in text processing on Dr. Erin Buchanan: https://doomlab.github.io/tags/text-processing/

Is English Kurtotic?
https://doomlab.github.io/post/is-english-kurtotic/
Fri, 07 Feb 2020

Do you ever get a random text that sets your brain to work? Here’s mine from today, asking whether English is kurtotic, along with examples: “lol” is bimodal, “loop” is positively skewed, and “enter” is “almost normal.” The lovely K.D. posed this question to me earlier, and I have already procrastinated a lot today, so here’s to more! First, I typed out some text in a few fonts in Word to help me figure out how to code the two parts that matter for this question: width and height.

Getting Translations with rvest and Selenium
https://doomlab.github.io/post/getting-translations-with-rvest-and-selenium/
Mon, 07 Oct 2019

In this guide, I’ll go over how you can use web scraping with rvest and Selenium to get translations from Google Translate. Note: I encourage responsible scraping; I always try to leave some space between requests. Also, the free version of Google Translate only accepts 5000 characters at a time. I did first try to do this with rvest alone, relying on the predictable structure of Google Translate links, but I could not get rvest to pull the right data off the page. So here’s a slightly more difficult approach that appears to work.

Gathering Text from the Web
https://doomlab.github.io/post/gathering-text-from-the-web/
Mon, 07 May 2018

Hi everyone! I don’t really feel like working too hard today, so I decided to write a blog post about how my student Will and I used rvest to mine articles from several different news sources for a project.
All the scripts and the current status of this project can be found on our OSF page, which is also connected to the GitHub folder with the files. First, we picked four web sources to scrape because of their known political associations: The New York Times, NPR, Fox News, and Breitbart. Specifically, we focused on their political sections.

Working With Messy Text
https://doomlab.github.io/post/working-with-messy-text/
Tue, 06 Mar 2018

Heyo! I am doing my best to procrastinate here on a blustery Tuesday afternoon, so I decided to share some code I’ve put together that solves problems in R that I used to solve in Perl. HTML or C++ was probably my first real language, but I love the heck out of Perl. It’s never done me wrong (unlike you, PHP). Anyways! The context of this project is that we are developing a dictionary of words to complement the work done by Jonathan Haidt and Jesse Graham.
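The "Getting Translations with rvest and Selenium" post notes that the free Google Translate interface takes only 5000 characters at a time and recommends leaving space between requests. As a rough illustration only (written in Python rather than the R used in these posts, and not the method from that post), longer text could be split into pieces under that limit and handled one at a time with a pause in between. `chunk_text` and `process_politely` are hypothetical helpers, `handle` stands in for whatever code actually submits a chunk, and the two-second default pause is an arbitrary assumption.

```python
import time
from typing import Callable, List


def chunk_text(text: str, max_chars: int = 5000) -> List[str]:
    """Split text into consecutive pieces of at most max_chars characters.

    Each piece ends just after the last whitespace character in its window when
    one exists, so words are not cut in half; a window with no whitespace is
    cut hard at max_chars. Joining the pieces reproduces the original text.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Walk back to just after the last whitespace inside the window.
            cut = end
            while cut > start and not text[cut - 1].isspace():
                cut -= 1
            if cut > start:
                end = cut
        chunks.append(text[start:end])
        start = end
    return chunks


def process_politely(
    chunks: List[str], handle: Callable[[str], str], pause_s: float = 2.0
) -> List[str]:
    """Apply a caller-supplied handler to each chunk, sleeping between calls."""
    results: List[str] = []
    for i, chunk in enumerate(chunks):
        if i > 0:
            time.sleep(pause_s)  # leave some space between requests
        results.append(handle(chunk))
    return results
```

For example, `chunk_text("hello world", 8)` returns `["hello ", "world"]`, and passing real text through `process_politely(chunk_text(text), handle)` keeps every request under the limit while spacing them out. Breaking on whitespace rather than at a fixed offset is a design choice meant to avoid sending half-words to a translator.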