Are some of your friends on FB Messenger taking way too long to reply to you? Do you want to know whether your friendship improves or deteriorates over time? Or do you want to know how long it takes them to reply on average? If you answered yes to any of these questions, this post is for you! Here, I’ll show you how I wrote a Python program that parses data from the FB Messenger website to extract the response times in a conversation, and how I used its output to plot and analyse a graph of response times over time in Excel.
Getting the data
First off, how do you actually get the data from FB Messenger? The most elegant way to obtain the conversation data — Facebook’s Graph API or the Messenger API — won’t work, because Facebook deprecated the Graph API functions that handle private messages (the “read_mailbox” permission and the “/inbox” endpoint were deprecated in API v2.4). The next best thing that comes to mind is to grab the source of the messenger.com website (the online version of the Messenger app) while the conversation with a specific friend is open, and extract the conversation times from that. However, when you click “view page source” on such a page, you won’t see any of the actual conversation, only many lines of script, presumably because the conversation is loaded dynamically after the page loads, which doesn’t happen when you merely view the source. So the third option, which always works, is to use the Developer Tools in Chrome (or the equivalent in your browser) to grab the HTML of the page you’re currently viewing on messenger.com and extract the data directly from there.
So, the way I get the conversation data is to open messenger.com and click on the friend whose response times I’d like to analyse — this opens a page with a URL such as https://www.messenger.com/t/JohnSmith. Then I right-click on any of the messages in the window and select “Inspect” from the drop-down menu, which opens Developer Tools. Next, I scroll up through the HTML structure in DevTools until I find a “div” element with the attribute “aria-label=’Messages’” — this div contains all the currently loaded messages and their times, and it is this element whose contents we need. To get the div’s contents, I right-click on it and select “Copy > Copy Element”.
However, before copying the whole element, note that messages on messenger.com are loaded dynamically every time you scroll up through your message window. This means that by copying the div’s contents you’re only going to get a certain number of your messages (those which have been currently downloaded), not the whole message history. To get more messages, simply scroll up through your message window and the messages will load automatically. When you’re happy with the number of messages loaded, you can copy the div’s contents as described before.
For now, save the copied div’s contents into a file, e.g. “in_john.xml”. We’re going to use this file shortly to extract message information such as the message contents, the time each message was sent, and who sent it.
Python script to extract information
We’ve got the message data in HTML format; now we need to extract information such as who sent each message, when they sent it, and what its contents were (the message contents aren’t strictly necessary to extract here, although they can be helpful for debugging). I decided to use Python for this data processing since it’s a very high-level language and easy to use. To parse the HTML from the input file I’m using the BeautifulSoup library for Python, which works well even with badly written HTML (unlike Python’s “minidom” parser). To be able to parse the HTML, download the BeautifulSoup library from their website (e.g. use this link to download v4.5.3 — the same version I used) and extract it into the same directory as your Python script. Also, rename the library directory to “bs4” so the script can import it.
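(As an aside, if you have pip available you can also install BeautifulSoup as a package rather than copying the directory manually; this is just an alternative to the manual setup described above, and a quick way to check that the import works:)

```python
# Alternative setup (assumes pip is available): install BeautifulSoup 4 as a package with
#   pip install beautifulsoup4
# and then verify that it imports and parses a trivial snippet:
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div class='_aok'>hello</div>", "html.parser")
print(soup.find("div", class_="_aok").string)  # should print: hello
```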
Without further ado, here’s the source code of the program which I’ll explain shortly:
```python
# Message Parser v1.0
# (c) 2017 Marian Longa
# for help on how to use this program, see http://marianlonga.com/analysis-of-facebook-messenger-response-times

THEIR_NAME = 'John'
MY_NAME    = 'Marian'
READ_FILE  = 'in_john.xml'
WRITE_FILE = 'out_john.csv'
PRINT_CONVERSATION = True  # default: True
PRINT_RESP_TIMES   = True  # default: True
PRINT_ERRORS       = True  # default: True
MIN_RESP_TIME = 3     # default: 3 mins
MAX_RESP_TIME = 7200  # default: 7200 mins = 5 days

from bs4 import BeautifulSoup
from datetime import datetime

# need this function to convert message=None into a string (due to some problems with unicode reading)
# TODO: need to add code to recognize replies containing emoji instead of treating them as an error
def none_to_error_str(str):
    return 'ERROR' if str is None else str

# convert FB time format into datetime
# TODO: need to add code to read messages from last week
def string_to_datetime(time):
    try:
        datetime_obj = datetime.strptime(str(datetime.now().year) + ' ' + time, '%Y %B %dth, %I:%M%p')
        return datetime_obj
    except ValueError:
        pass
    try:
        datetime_obj = datetime.strptime(str(datetime.now().year) + ' ' + time, '%Y %B %dst, %I:%M%p')
        return datetime_obj
    except ValueError:
        pass
    try:
        datetime_obj = datetime.strptime(str(datetime.now().year) + ' ' + time, '%Y %B %dnd, %I:%M%p')
        return datetime_obj
    except ValueError:
        pass
    try:
        datetime_obj = datetime.strptime(str(datetime.now().year) + ' ' + time, '%Y %B %drd, %I:%M%p')
        return datetime_obj
    except ValueError:
        pass
    try:
        datetime_obj = datetime.strptime(time, '%B %d, %Y %I:%M %p')
        return datetime_obj
    except ValueError as e:
        if PRINT_ERRORS:
            print(str(e))
        datetime_obj = datetime(2000, 1, 1, 0, 0, 0)
        return datetime_obj

# open HTML file with messages
with open(READ_FILE, 'r') as myfile:
    html = myfile.read().replace('\n', '')

# parse HTML
parsed_html = BeautifulSoup(html, 'html.parser')

# extract blocks of conversation
blocks = parsed_html.find_all('div', class_='_41ud')

# extract all messages
full_messages = []
for block in blocks:
    # get name
    block_name = block.find_all('h5', class_='_ih3')[0].string
    # get message contents
    block_messages = []
    for msg in block.find_all('div', class_='_aok'):
        block_messages.append(msg.string)
    # get message times
    block_times = []
    for time_div in block.find_all('div', class_='_3058'):
        try:
            time = time_div['data-tooltip-content']
        except KeyError:
            continue  # skipped, e.g. when the div contains a missed call instead of a proper message
        datetime_obj = string_to_datetime(time)
        block_times.append(datetime_obj)
    # add newly parsed messages to full_messages list
    for msg_time, msg_text in zip(block_times, block_messages):
        full_messages.append([msg_time, block_name, msg_text])

# print all messages
if PRINT_CONVERSATION:
    print("Messages")
    for message in full_messages:
        if message[0].year > 2000:  # condition to ignore last week's messages (not implemented yet)
            print(message[0].strftime("%Y-%m-%d %H:%M") + ";" + message[1] + ";" + none_to_error_str(message[2]))
    print('')
    print('')

# get response times
time_responses = []
for i in range(len(full_messages) - 1):
    if full_messages[i][0].year > 2000:  # condition to ignore last week's messages (not implemented yet)
        if full_messages[i][1] == MY_NAME and full_messages[i+1][1] == THEIR_NAME:
            responseTime = (full_messages[i+1][0] - full_messages[i][0]).total_seconds() / 60
            # ignore immediate responses (when both people are most likely inside an active conversation)
            # and too long responses (when it's not a response but the other person starts a new conversation after a long time)
            if responseTime >= MIN_RESP_TIME and responseTime <= MAX_RESP_TIME:
                time_responses.append([str(full_messages[i+1][0]), str(responseTime)])

# print response times
if PRINT_RESP_TIMES:
    print("Response Times (mins)")
    for time_response in time_responses:
        print(time_response[0] + ";" + time_response[1])
    print('')
    print('')

# save response times into a CSV file
write_file = open(WRITE_FILE, 'w')
for time_response in time_responses:
    write_file.write(time_response[0] + ";" + time_response[1] + "\n")
write_file.close()
print("File written successfully")
```
Firstly, on lines 5-13 I define a few constants to make the program easier to reuse. Set THEIR_NAME to the first name of the person you’re having the conversation with and MY_NAME to your own first name — these are used to determine who said what in the conversation. Then set READ_FILE to the name of the file containing the HTML source of the messages div that you saved previously, and WRITE_FILE to the name of the file where you’d like the program’s output (the response times) to be placed. Lines 9-11 control various debugging printouts but have no effect on the output file. Lastly, MIN_RESP_TIME and MAX_RESP_TIME set the minimum and maximum response times that will be recorded. (If the reply time is too long, e.g. a week, it probably means that the other person started a new conversation rather than replying to your message after a long time. And if the reply time is too short, e.g. under 3 minutes, it probably means that you’re in an active back-and-forth conversation, and near-immediate replies aren’t interesting for this analysis.)
Lines 15-16 just import the BeautifulSoup library for HTML parsing and Python’s datetime library for handling the times when messages were sent. I’ll explain the functions defined on lines 20 and 25 once they’re used later in the program.
So, we start by opening the input file with the div containing our FB messages and placing its contents into `html` (lines 61-62). Then we use the BeautifulSoup library to parse this HTML and store the result in `parsed_html`. Next, by playing with the Chrome Developer Tools on the Messenger website for a while, I discovered that all message blocks, i.e. groups of messages sent consecutively by the same person, are stored in divs with the class name `_41ud`. For example, if I send two messages, then another person sends three messages, and then I send four messages, there will be three message ‘blocks’: the first containing my two messages, the second containing their three messages, and the third containing my four messages. To get all such message blocks, all we need to do is call bs4’s `find_all` method with `div` and the class name `_41ud` as parameters, which returns a list of message blocks into the variable `blocks` (line 68).
Next, once we have the message blocks, we need to extract the sender’s name, the message contents, and the time sent for each message (lines 71-93). The output of this part of the script is a list of entries in the format [time, sender’s name, message], stored in the variable full_messages. To build it, we go through all message blocks (line 72) and for each one first extract the sender’s name (line 75) by looking for the first `h5` element with class `_ih3` in the block. Again, the fact that we need to look for this exact element was determined experimentally by inspecting the HTML with Developer Tools and noticing that the sender’s name always occurs in such an `h5` element at the start of each message block (individual messages don’t need a sender field, since all messages in a single block are sent by the same person). Next, we get all the messages in a block by looking for all divs with class `_aok` within the block and saving them into the list block_messages (lines 78-80). Finally, we extract the message times (lines 83-90). It so happens that the date/time when a message was sent is stored in the `data-tooltip-content` attribute of divs with class `_3058` (it is only shown to the user when the mouse hovers over a specific message), so we go through all divs with this class (line 84) and for each one try to get the `data-tooltip-content` attribute value (lines 85-88). This is wrapped in a try-except because the lookup can fail when the div doesn’t contain a message but, say, a missed-call notification, in which case that div is simply skipped. We then convert the extracted time from Facebook’s string representation into a datetime object using the function string_to_datetime (line 89) and append it to the list of times for the current block (line 90). Lastly, now that we’ve extracted the times, the sender’s name, and the messages for the current block, we add them all to our full_messages list containing all the message data (lines 93-94).
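To make the structure concrete, here is a small, hypothetical fragment of what one such message block might look like (the class names are the ones used in this post; the real messenger.com markup contains many more wrapper elements and may well have changed since), together with the same find_all calls the script uses:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified message block; the real markup has many more wrappers
sample_block_html = """
<div class="_41ud">
  <h5 class="_ih3">John</h5>
  <div class="_3058" data-tooltip-content="March 20th, 8:20pm">
    <div class="_aok">Hey, are you coming tonight?</div>
  </div>
  <div class="_3058" data-tooltip-content="March 20th, 8:21pm">
    <div class="_aok">Let me know!</div>
  </div>
</div>
"""

block = BeautifulSoup(sample_block_html, 'html.parser').find('div', class_='_41ud')
print(block.find_all('h5', class_='_ih3')[0].string)                                # sender's name: John
print([d.string for d in block.find_all('div', class_='_aok')])                     # the two message texts
print([d['data-tooltip-content'] for d in block.find_all('div', class_='_3058')])   # the two timestamps
```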
Now let’s get back to the function we used on line 89 — string_to_datetime. Since the message dates/times extracted above come in several different string formats, we need a function to convert them into datetime objects, which can then be used for operations like subtracting two datetimes to get a time difference. The function string_to_datetime (lines 25-58) does this conversion by checking against various time formats, for example “March 20th, 8:20pm” (lines 26-30), “March 1st, 8:20pm” (lines 32-36), “March 2nd, 8:20pm” (lines 38-42), “March 3rd, 8:20pm” (lines 44-48), and “March 05, 2015 8:20 pm” (lines 50-56). Each of these blocks of code tries to convert the string `time` into a datetime object using the function `strptime` (string parse time). If the conversion succeeds, the datetime object is returned; otherwise a ValueError is raised and subsequently ignored, and the next date format is tried. If all of them fail, the error may be printed (lines 54-55) and the datetime object is set to a default of 01/01/2000. (This happens, for example, if the parsed message is less than a week old, since I haven’t yet written code to handle those date formats.)
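As a quick sanity check, and to see why the “%dth”-style format strings work at all, here is a small example: %d consumes the day number and the “th”/“st”/“nd”/“rd” suffix is matched as a literal part of the format string, with the current year prepended because Messenger’s short format omits it (the input strings below are made up for illustration):

```python
from datetime import datetime

year = str(datetime.now().year)

# Short Messenger-style format: %d consumes "20", then the literal "th," must match
print(datetime.strptime(year + ' March 20th, 8:20pm', '%Y %B %dth, %I:%M%p'))

# Full format used for older messages (year included)
print(datetime.strptime('March 05, 2015 8:20 PM', '%B %d, %Y %I:%M %p'))

# A string that fits none of the formats raises ValueError, which the script
# catches and replaces with the sentinel date 2000-01-01
try:
    datetime.strptime('Yesterday 8:20pm', '%Y %B %dth, %I:%M%p')
except ValueError as e:
    print('unparsed:', e)
```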
Going back to the program, after all the message data has been stored in the full_messages list, we can print the whole conversation for debugging purposes (lines 98-104). Here we just go through all the messages in full_messages and, if the date was extracted successfully (i.e. the message year is greater than 2000), print the message date/time, the sender’s name, and the message itself (line 102). The function none_to_error_str is a small helper (defined on lines 20-21) that prevents errors when a message contains e.g. an emoji and can’t be parsed properly, in which case BeautifulSoup returns None instead of a string — the helper replaces None with the string “ERROR”.
Next, using the message data from full_messages, we calculate the response times (lines 107-115). The main point to realise here is that a response time is the time difference between a message I sent at some time and the consecutive message the other person sent at some later time. The program goes through all the messages (line 108) and, if they have been parsed correctly (line 109), checks whether the current message was sent by me and the next message by the other person (line 110). If so, it calculates the time difference between those two messages in minutes (line 111), and if this response time lies between the minimum and maximum response times to be recorded (line 114), it stores the date/time of the other person’s reply together with the response time in the list time_responses (line 115).
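For example (with made-up times): if I send a message at 14:00 and the reply arrives at 14:45, subtracting the two datetime objects gives a 45-minute response time, which falls inside the default 3-minute to 7200-minute window and would therefore be recorded:

```python
from datetime import datetime

mine  = datetime(2017, 3, 20, 14, 0)   # my message
reply = datetime(2017, 3, 20, 14, 45)  # their reply

response_time = (reply - mine).total_seconds() / 60
print(response_time)               # 45.0 minutes
print(3 <= response_time <= 7200)  # True -> would be recorded
```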
We can now print the response times calculated above, again for debugging purposes (lines 118-123): the program goes through the time_responses list and, for each element, prints the date/time of the reply and the corresponding response time.
Finally, we use the data from time_responses to write the reply dates/times and the corresponding response times, separated by semicolons, into a file (lines 126-129). This file can then be imported into Excel as a CSV and used to plot the response times of a given person over time.
Analysing response times in Excel
Now that we have an output CSV file containing the reply dates/times and the corresponding response times, we can import it into Excel and analyse the data. Open Excel and import the output file as a CSV file with the delimiter set to semicolon. You should get two columns, the first containing the dates and times of the replies and the second containing the response times in minutes. I prefer to add a third column with the response times in hours to make the results more readable (just divide the second column by 60). Then you can plot the response times (column 3) against time (column 1) using a scatter plot, add trend lines to your data, calculate the mean response time, and so on (if you’d rather stay in Python instead of using Excel, see the sketch after the graphs below). Here’s an example for one of my friends.
For this particular friend, I have calculated the mean response time to be around 5 hours and from the trend line you can see that the response time is relatively steady.
Here’s a graph for a different friend.
For this friend, I have calculated the mean response time to be around 1 day (around 3 days if only considering data points after 08/2016). You can also clearly see from the trend line that the average response time increases significantly over time. This suggests a possible deterioration of friendship over time (alternatively, this can also be explained by the fact that apparently the friend usually doesn’t check his messenger for a few days).
Another interesting thing you might want to do is compare the response times among your friends. Here’s a graph comparing the response times of 6 of my friends.
You can see that most of the friends have their response times steady at around 5 hours, while the response time of one of them is increasing significantly over time (this is the same friend as the one shown on the previous graph).
Zooming in, you can compare the other five people better, although the statistics here are too imprecise to draw any concrete conclusions about the friendships with these people.
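As promised above, if you’d prefer to skip Excel and stay in Python, here is a minimal sketch of the same import-and-plot step using pandas and matplotlib (neither library is used in the original script, so treat this purely as an optional alternative; it assumes the out_john.csv file produced above):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the semicolon-separated output of the message parser
df = pd.read_csv('out_john.csv', sep=';', header=None, names=['reply_time', 'resp_mins'])
df['reply_time'] = pd.to_datetime(df['reply_time'])
df['resp_hours'] = df['resp_mins'] / 60  # same "third column" as in Excel

print('mean response time: %.1f hours' % df['resp_hours'].mean())

plt.scatter(df['reply_time'], df['resp_hours'])
plt.xlabel('Date of reply')
plt.ylabel('Response time (hours)')
plt.show()
```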
So, I hope you’ve enjoyed this post and had some fun with message analysis. As a note, these graphs should be treated only as a very approximate descriptor of friendship, if that, since shorter reply times do not necessarily mean a better friendship. Also be careful how you interpret positive or negative slopes of trend lines, since the relatively large uncertainty in a line’s slope can render your conclusions invalid if the slope is small. I guess the main point of all this is just to have some fun, satisfy your curiosity about your friends’ reply times, and perhaps learn some data analysis, but don’t take it too seriously!
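If you want to put a rough number on that slope uncertainty rather than eyeballing the trend line, here is a minimal sketch using scipy’s linregress (again an optional extra, not part of the original workflow; it re-reads the same out_john.csv file):

```python
from scipy.stats import linregress
import pandas as pd

df = pd.read_csv('out_john.csv', sep=';', header=None, names=['reply_time', 'resp_mins'])
df['reply_time'] = pd.to_datetime(df['reply_time'])

# Regress response time (hours) against time expressed in days since the first reply
days = (df['reply_time'] - df['reply_time'].min()).dt.total_seconds() / 86400
hours = df['resp_mins'] / 60

fit = linregress(days, hours)
print('slope: %.3f +/- %.3f hours per day' % (fit.slope, fit.stderr))
# If the standard error is comparable to the slope itself, the apparent
# "improvement" or "deterioration" in response time is not statistically meaningful.
```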