Brian Fairlie
Courses Plus Student 126 Points
Grouping Tweet time stamps by 30 minute intervals and then plotting in line chart
I have a CSV file of 5000 tweets that I have read into Python. The data has been separated into individual columns (see the sample below). I want to group the time stamps into 30-minute intervals and then plot the grouped intervals on a line chart to show the most active times. I was looking at the datetime module, but I believe I need to format the time as well.
time
2015-04-10T18:55:42Z
2015-04-10T19:16:20Z
2015-04-10T13:29:31Z
2015-04-10T18:06:53Z
2015-04-10T15:53:10Z
2015-04-10T19:38:41Z
I hope that makes sense.
3 Answers
Brian Fairlie
Courses Plus Student 126 Points
OK, I figured out the issue. When defining 'tweets', I changed the code to
tweets = df[df['type'] == "TWITTER"]
and it runs just fine.
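The change from "tweet" to "TWITTER" suggests the type column stores the label "TWITTER"; that is an inference from the fix, not something the post states. A minimal sketch of how the labels could be checked before filtering, using a small hypothetical frame in place of the real CSV:

```python
import pandas as pd

# Hypothetical rows; the real labels come from the CSV's 'type' column.
df = pd.DataFrame({"type": ["TWITTER", "BLOG", "TWITTER", "FORUM"]})

# List the labels that actually occur, with their row counts.
print(df["type"].value_counts())

# A label that never occurs selects zero rows without raising an error,
# so a wrong label can go unnoticed until later steps misbehave.
no_match = df[df["type"] == "tweet"]
match = df[df["type"] == "TWITTER"]
print(len(no_match))  # 0
print(len(match))     # 2
```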
Chris Freeman
Treehouse Moderator 68,468 Points
There are some sweet utilities for parsing UTC (Zulu) time strings into datetime objects for comparison. There is also pytz for convenient timezone changes. Here is some sample code that you might find useful:
import datetime
# use 'sudo pip install python-dateutil'
import dateutil.parser
import pytz
# parse a tweet time into a timezone-aware datetime
tweet_time = dateutil.parser.parse('2015-04-10T18:55:42Z')
# create the start of a time window (midnight UTC on 2015-04-10)
start_time = datetime.datetime(2015, 4, 10, tzinfo=pytz.utc)
# create the end time 30 minutes later
end_time = start_time + datetime.timedelta(minutes=30)
# compare times
if tweet_time > start_time and tweet_time < end_time:
    # group tweet
    print("Tweet is in group")
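Comparing each tweet against every window gets tedious with 5000 rows. One alternative is to round each timestamp down to the start of its 30-minute window and count how many land on each window. The following is only a sketch using the standard library and the sample timestamps from the question; the helper names parse_zulu and floor_to_half_hour are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def parse_zulu(text):
    """Parse a string such as '2015-04-10T18:55:42Z' as an aware UTC datetime."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def floor_to_half_hour(ts):
    """Round a datetime down to the start of its 30-minute window."""
    return ts.replace(minute=ts.minute - ts.minute % 30, second=0, microsecond=0)


# Sample timestamps copied from the question.
stamps = [
    "2015-04-10T18:55:42Z",
    "2015-04-10T19:16:20Z",
    "2015-04-10T13:29:31Z",
    "2015-04-10T18:06:53Z",
    "2015-04-10T15:53:10Z",
    "2015-04-10T19:38:41Z",
]

# Map each window start to the number of tweets that fall inside it.
counts = Counter(floor_to_half_hour(parse_zulu(s)) for s in stamps)

for window_start in sorted(counts):
    print(window_start.strftime("%H:%M"), counts[window_start])
```

Windows with no tweets simply do not appear in counts, so a chart built from it would need the gaps filled with zeros first.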
Brian Fairlie
Courses Plus Student 126 Points
Hi Chris,
Thank you for responding! So I would still need to define tweet_time and then plot the grouped 30-minute intervals. I've started a portion that would plot the intervals on a chart:
#plots the data on a line chart
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
x = test.index.to_datetime() #map(lambda x: datetime.strftime(x, "%I:%M %p" ), test.index.to_datetime())
y = test
fig, ax = plt.subplots()
ax.plot_date(x, y, linestyle='-')
#ax.annotate('Test', (mdates.date2num(x[1]), y[1]), xytext=(15, 15),
# textcoords='offset points', arrowprops=dict(arrowstyle='-|>'))
#Break times into 30 minute intervals
ax.set_ylabel("Number of Tweets")
ax.set_xlabel("Time of Day: 30 Min Intervals")
ax.set_title("Best Time To Tweet")
fig.autofmt_xdate()
plt.show()
#Writes data to an Excel file
pandas.DataFrame(test).to_excel("besttimes.xlsx")
I would really appreciate your thoughts.
Brian Fairlie
Courses Plus Student 126 Points
Here is the current version of my entire script.
import datetime
import pandas
#Set encoding to UTF-8 if Arabic characters are expected (remove # to activate)
#import sys
#reload(sys)
#sys.setdefaultencoding("utf-8")
#Read in Sysomos file
df = pandas.read_csv('C:/Users/fairlieb/Desktop/Analytics/DAT6/DAT6-master/data/sysomos-content-2015-04-27.csv')
#Define datasets
df['time'] = pandas.to_datetime(df.time,format="%H:%M:%S")
df['counter'] = 1
tweets = df[df['type'] == "tweet"]
#df = df[df.text.str.contains("in|the|Iraq|Syria|Islamic State|states",flags = re.IGNORECASE )==True]
df.shape
import dateutil.parser
import pytz
tweets['time'] = tweets.time.apply(lambda x: x.tz_localize('UTC').tz_convert("EST"))
tweets['time'] = tweets.time.apply(lambda fulldate: str(fulldate.time()))
tweets['time'] = pandas.to_datetime(tweets.time,format="%H:%M:%S")
tweets.index = tweets['time']
test = tweets['counter'].resample("30 Min",how='sum')
test=test.sort_index()
#plots the data on a line chart
import matplotlib.pyplot as plt
x = test.index.to_datetime() #map(lambda x: datetime.strftime(x, "%I:%M %p" ), test.index.to_datetime())
y = test
fig, ax = plt.subplots()
ax.plot_date(x, y, linestyle='-')
#Break times into 30 minute intervals
#Add chart labels
ax.set_ylabel("Number of Tweets")
ax.set_xlabel("Time of Day: 30 Min Intervals")
ax.set_title("Best Time To Tweet")
fig.autofmt_xdate()
plt.show()
I receive an error message around Line 23 that reads as follows:
__main__:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_index,col_indexer] = value instead
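This warning typically appears when a column is assigned on a frame that was itself produced by filtering another frame, as tweets is here. A common way to avoid it is to take an explicit copy right after filtering. The sketch below assumes that approach and uses a small hypothetical frame in place of the Sysomos CSV; it also uses resample("30min").sum(), since the how='sum' argument was removed in later pandas releases.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: a 'type' column and ISO 8601 'time' strings.
df = pd.DataFrame({
    "type": ["TWITTER", "TWITTER", "BLOG", "TWITTER", "TWITTER"],
    "time": [
        "2015-04-10T18:55:42Z",
        "2015-04-10T18:40:20Z",
        "2015-04-10T13:29:31Z",
        "2015-04-10T18:06:53Z",
        "2015-04-10T19:38:41Z",
    ],
})

# .copy() makes 'tweets' an independent frame, so the column assignments
# below do not act on a slice of df.
tweets = df[df["type"] == "TWITTER"].copy()
tweets["time"] = pd.to_datetime(tweets["time"], utc=True)
tweets["counter"] = 1

# Count tweets per 30-minute window; windows with no tweets sum to 0.
per_window = (
    tweets.set_index("time")["counter"]
    .sort_index()
    .resample("30min")
    .sum()
)
print(per_window)
```

The resulting series is indexed by window start, so it can be passed to a plotting call in place of test from the script above.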