The tweet that inspired this post shared another tweet comparing useR (a conference about the R statistical programming language, held in Brisbane, Australia in 2018) with ICML (a machine learning conference held in Stockholm, Sweden in 2018). The chunk below gets the data for these two conferences and for the 10th International Conference on Teaching Statistics (ICOTS) in Kyoto, Japan.
Also note that the n = values were chosen by trial and error. Very sophisticated.
There also seemed to be lots of tweets under the ICOTS hashtag that were about Ethereum, which wasn’t really related to us. I’ve filtered these out.
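As a rough sketch of that filtering step, here is one way to drop the Ethereum tweets in base R. The `icots_tweets` data frame and its contents are made up for illustration, and matching on the word "ethereum" alone is an assumption about what was enough to catch the spam.

```r
# Hypothetical stand-in for the data frame of ICOTS search results
icots_tweets <- data.frame(
  text = c(
    "Great keynote at #ICOTS10 this morning",
    "Join the ICOTS token sale on Ethereum now",
    "ethereum airdrop #ICOTS",
    "Poster session starts at 2pm #ICOTS10"
  ),
  stringsAsFactors = FALSE
)

# Flag any tweet that mentions Ethereum, ignoring case
is_ethereum <- grepl("ethereum", icots_tweets$text, ignore.case = TRUE)

# Keep only the tweets that were not flagged
icots_clean <- icots_tweets[!is_ethereum, , drop = FALSE]
```

It is worth eyeballing a sample of the flagged rows before dropping them, in case a genuine conference tweet happens to mention the word.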
And you might want to look at the dates in a different timezone setting - these conferences are all in different countries and I’ve just taken the time and date created information directly from the Twitter API results with no transformation.
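If you do want local times, a minimal sketch in base R might look like the following. It assumes the created timestamps are stored as UTC, and the example timestamps, the `conference_tz` lookup and the `local_time()` helper are all hypothetical names for illustration.

```r
# Made-up example timestamps, assumed to be in UTC
created <- as.POSIXct(
  c("2018-07-10 01:30:00", "2018-07-12 23:45:00"),
  tz = "UTC"
)

# Time zone of each conference's host city (Olson/IANA names)
conference_tz <- c(
  ICOTS10  = "Asia/Tokyo",
  useR2018 = "Australia/Brisbane",
  ICML2018 = "Europe/Stockholm"
)

# Hypothetical helper: print a UTC timestamp as local wall-clock time
local_time <- function(x, tz) format(x, "%Y-%m-%d %H:%M", tz = tz)

# One character vector of local times per conference
local_times <- lapply(conference_tz, function(tz) local_time(created, tz))
```

Formatting this way only changes how the times are displayed. The underlying instants stay the same, so counts per day can shift once you group by the local date.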
Let’s take a look!
Number of tweets by conference
Can you spot which day ICOTS had a half-day and folks went off exploring Kyoto or to see the deer at Nara instead of tweeting?
What about percentage of tweets getting favourited or retweeted?
I was curious whether support/quality within the hashtags was similar. I think proportion of tweets getting retweeted or favourited might be an interesting measure of the “interestingness” of the tweets, but more likely an indicator of how supportive the group on that hashtag are.
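Here is a small sketch of how that proportion could be computed. The data are invented, and the `favoriteCount` and `retweetCount` column names are an assumption about how the tweet data frame is laid out.

```r
# Made-up tweets; column names are assumed, not taken from the real data
tweets <- data.frame(
  conference    = c(rep("ICOTS10", 4), rep("useR2018", 2), rep("ICML2018", 4)),
  favoriteCount = c(3, 0, 1, 0,  1, 0,  0, 0, 0, 0),
  retweetCount  = c(0, 0, 1, 2,  2, 0,  0, 1, 0, 0),
  stringsAsFactors = FALSE
)

# A tweet counts as "engaged" if it got at least one favourite or retweet
tweets$engaged <- tweets$favoriteCount > 0 | tweets$retweetCount > 0

# The mean of a logical vector is the proportion of TRUE values
prop_engaged <- tapply(tweets$engaged, tweets$conference, mean)
```

Treating favourites and retweets as a single yes/no flag throws away how many of each a tweet got, which seems fine for a rough measure of supportiveness.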
What better place to explore emojis from than the country that gave us the language that gave us the word! Ya follow? Emoji is a Japanese word that, if a quick Google search is to be believed, means “picture character”.
There is a table provided by the Unicode organisation that lists the emoji, and an additional one that details skin tone variations. These also provide a text description and group the emoji into wider categories, like flags, buildings, types of facial expression etc. It would be ideal to have that information in R to work with directly, so I want to scrape the tables into R. I can check whether this seems to be allowed by using the robotstxt package. There are a range of other ways to get databases like this; this just seemed easiest for this case.
Checking if we can scrape the emoji lists
I’ve removed the output, but when I ran it, the bits we wanted were okay, and the crawl delays were only for certain bots.
The paths_allowed() function will return TRUE if the path is allowed, based on the robots.txt file for the domain.
##  TRUE
##  TRUE
Both come back TRUE, so we are allowed to scrape them.
Scraping the emoji lists
Now to clean up the tables we scraped, so they are nice for us to use later.
One wrinkle is that as far as I can tell, you can’t stop the Twitter app from truncating tweets that have been retweeted when using twitteR. This seems possible in rtweet - but I couldn’t get the authentication working. More on the GitHub issues page.
So, I’m just going to try to make this work by getting all the unique tweets (“RT @username” will be in front of the retweeted ones) and then removing all the tweets that had been truncated (i.e. retweeted). If my logic is right, this should leave us with the untruncated text of all tweets, although some of them will be the retweeted version. This shouldn’t affect the emojis.
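A minimal sketch of that logic in base R follows. The example tweets are invented, and relying on a logical `truncated` column to mark the cut-off retweets is an assumption about the data, not necessarily how the original chunk did it.

```r
# Made-up tweets; the `truncated` flag is an assumed column
tweets <- data.frame(
  text = c(
    "Loving #ICOTS10 \U0001F44F",                    # original tweet
    "Loving #ICOTS10 \U0001F44F",                    # exact duplicate
    "RT @someone: A very long tweet that got cut\u2026", # truncated retweet
    "RT @someone: Short retweet \U0001F60D"          # retweet, full text
  ),
  truncated = c(FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Step 1: keep one copy of each distinct tweet text
unique_tweets <- tweets[!duplicated(tweets$text), ]

# Step 2: drop anything whose text was cut off
untruncated <- unique_tweets[!unique_tweets$truncated, ]

# Retweeted versions can still be spotted by their "RT @" prefix
untruncated$is_retweet <- grepl("^RT @", untruncated$text)
```

A short retweet whose original is also in the data stays in as a second copy under this approach, because the "RT @username" prefix makes the text differ.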
I am also going to remove the midway summary tweet that I shared with the then-current top emojis, as that would inflate the counts for those emojis.
Top 5ish emojis from three different conferences (as of 15 mins ago). Ranked 1-5ish, left to right, in ( ) means equal counts.
#ICOTS10 😍👏(😂🐱🎨🎼)
#useR2018: 📦👌(😍🌧️)
#ICML2018: 🔥😂(😁🤔🙌✈️✔️)
How many tweets (not counting retweets) use at least one emoji in this data?
So 7% of ICOTS tweets had at least one emoji in them, 11% of useR tweets and 3% of ICML tweets.
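For anyone wanting to reproduce this kind of percentage, here is a rough base R sketch. The three-emoji `emoji_ref` vector is a tiny hypothetical stand-in for the full scraped reference table, the tweets are invented, and `contains_any()` is a helper name I made up.

```r
# Tiny hypothetical subset of the emoji reference list
emoji_ref <- c("\U0001F60D", "\U0001F44F", "\U0001F4E6")  # heart eyes, clap, package

# Made-up tweets
tweets <- data.frame(
  conference = c(rep("ICOTS10", 4), rep("useR2018", 2), rep("ICML2018", 2)),
  text = c(
    "Great talk \U0001F44F\U0001F44F", "Slides are up", "Lunch time", "Room change",
    "New release \U0001F4E6", "Keynote starting",
    "Poster 42", "Workshop notes"
  ),
  stringsAsFactors = FALSE
)

# TRUE for each text that contains at least one of the patterns.
# fixed = TRUE matches the emoji as literal text, not as a regex.
contains_any <- function(text, patterns) {
  Reduce(`|`, lapply(patterns, function(p) grepl(p, text, fixed = TRUE)))
}

tweets$has_emoji <- contains_any(tweets$text, emoji_ref)

# Percentage of tweets per conference with at least one emoji
pct_with_emoji <- round(100 * tapply(tweets$has_emoji, tweets$conference, mean))
```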
Which emoji were the most popular?
This is a not very elegant function to count how many times each emoji in the emoji_ref dataframe appears in the text you’re analysing (in the form of the all columns from the emoji_spread dataset). I’m also not 100% sure the tryCatch is set up properly - it needs a tryCatch because I keep getting errors for the keycap: * emoji.
The second function doesn’t count repeats of emoji in that same tweet. I know at least one of my tweets had a gratuitous number of hand clap emojis. Will not counting repeats in a tweet change the top used emoji? str_detect is quite useful for that.
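To make the difference between the two kinds of count concrete, here is a simplified base R sketch. It is not the function used in this post; `count_emoji()` and `detect_emoji()` are hypothetical names, and it uses `gregexpr()` and `grepl()` in place of the stringr functions. Matching with `fixed = TRUE` treats the emoji as literal text, which is one possible way around errors for the keycap: * emoji if they come from the asterisk being read as a regular expression operator (that cause is my guess).

```r
# Total occurrences of one emoji across all texts, repeats included
count_emoji <- function(emoji, text) {
  hits <- gregexpr(emoji, text, fixed = TRUE)
  # gregexpr() returns -1 for "no match", so count positive positions only
  sum(vapply(hits, function(h) sum(h > 0), integer(1)))
}

# Number of texts containing the emoji at least once, repeats ignored
detect_emoji <- function(emoji, text) {
  sum(grepl(emoji, text, fixed = TRUE))
}

clap   <- "\U0001F44F"
keycap <- "*\uFE0F\u20E3"  # keycap: * is an asterisk plus two combining marks

# Made-up tweets
example_text <- c(
  paste0("So good ", clap, clap, clap),
  paste0("Nice ", clap),
  paste0("Rate it ", keycap),
  "No emoji here"
)

clap_total  <- count_emoji(clap, example_text)
clap_tweets <- detect_emoji(clap, example_text)
keycap_total <- count_emoji(keycap, example_text)
```

Here the clap emoji appears four times in total but in only two tweets, which is exactly the gap between the two rankings.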
You can sort by either ranking (count and rank include repeats in a tweet; count_single and rank_single don’t include repeats within a tweet).
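As an illustration of how the two rankings can disagree, here is a small sketch with invented counts. The column names follow the ones mentioned above, but the `emoji_counts` data frame and the choice of `ties.method = "min"` (so equal counts share a rank) are assumptions.

```r
# Made-up counts for four emoji
emoji_counts <- data.frame(
  name         = c("clapping hands", "heart eyes", "package", "fire"),
  count        = c(5, 3, 3, 1),  # repeats within a tweet included
  count_single = c(2, 3, 3, 1),  # each tweet counted at most once
  stringsAsFactors = FALSE
)

# Rank from most to least used; tied counts get the same (lowest) rank
emoji_counts$rank        <- rank(-emoji_counts$count, ties.method = "min")
emoji_counts$rank_single <- rank(-emoji_counts$count_single, ties.method = "min")

# Sort by whichever ranking you prefer
by_rank        <- emoji_counts[order(emoji_counts$rank), ]
by_rank_single <- emoji_counts[order(emoji_counts$rank_single), ]
```

In this toy data the clapping hands emoji tops the list when repeats count, but drops to third once each tweet is only counted once.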
Top #ICOTS10 emojis
Top #useR2018 emoji
Top #ICML2018 emoji
More from ICOTS
Most of my tweets from this conference are in the below thread and I also set up a list of resources to check out after the conference based on people’s recommendations.
In earlier versions of this post I calculated what proportion of the #ICOTS10 tweets were my tweets but used ALL ICOTS tweets, including all the many retweets. I tweeted 6% of the #ICOTS10 tweets including retweets (my own and others’), but 15% of the original #ICOTS10 tweets (i.e. not counting retweets). What a loudmouth!