Work with dates
Get true date format from string data
Example, you have dates like 21/09/2022 but you would a true date format to work with it:
[code]df['TrueDate'] = pd.to_datetime(df['OriginalDate'], infer_datetime_format=True).dt.strftime('%Y-%m-%d')[/code]
Group by months
If you have a data-frame df with a field Created at containing dates like:
2021-02-05 10:34:21 UTC
You can work with it, example to group and sort by month, you can cheat doing:
[code]df_Temp = pd.DataFrame(df['Created at'].str[:7])
df_Created_count = pd.DataFrame(df_Temp.groupby(['Created at'], dropna=False).size(), columns=['Total']).sort_values(['Created at'], ascending=True).reset_index()[/code]
indeed with str[:7] we extract years and months from strings. Then a usual groupby.
Ok but it is more a pandas-way to convert your field in a true date format, in a new virtual field. Then you can extract real months and store them in a proper data-frame.
[code]df['Created'] = pd.to_datetime(df['Created at'])
df['Created'] = df['Created'].dt.to_period("M")
df_TempCreated = pd.DataFrame(df['Created'])[/code]
Then easy to group and count them.
[code]df_Created_count = pd.DataFrame(df_TempCreated.groupby(['Created'], dropna=False).size(), columns=['Total']).sort_values(['Created'], ascending=True).reset_index()[/code]
As now you use a real date format, you can display the full months with dt.strftime.
[code]df_Created_count['Created'] = df_Created_count['Created'].dt.strftime('%B %Y')[/code]
Format date
If you get date like 2022-10-24 and you would 24/10/2022:
[code]df['My date'] = pd.to_datetime(df['My date'], format='%Y-%m-%d')
df['My date'] = df['My date'].dt.strftime("%d/%m/%Y")[/code]
Format date from french format
If you get date like 11/05/2023 and you would 2023-05-11:
[code]df['My date'] = df['French date'].str.split('/').str[2] + '-' + df['French date'].str.split('/').str[1] + '-' + df['French date'].str.split('/').str[0][/code]
Get simple date (DDDDMMDD)
[code]# SIMPLE DATE
df_DataUnion['date_simple'] = df_DataUnion['date_submit'].astype(str).str[:10]
df_DataUnion['date_simple'] = df_DataUnion['date_simple'].replace({'-': ''}, regex=True)[/code]
Extract the previous month
[code]PreviousMonth = today - pd.DateOffset(months=1)
print(PreviousMonth.strftime("%Y-%m"))[/code]
Convert false date field from Excel
Excel can concatenate true date format and string in the same column, arrgh! Later when you would use it in Pandas...
I use this tip to manage these kind of malicious fields in pandas:
[code]df.loc[~df_GDC['date'].str.contains(' ', na=False), 'date'] = df['date'].astype(str).str.split(expand=True)[0]
df.loc[~df_GDC['date'].str.contains(' ', na=False), 'date'] = \
df['date'].astype(str).str.split('-', expand=True)[2] + '/' +\
df['date'].astype(str).str.split('-', expand=True)[1] + '/' +\
df['date'].astype(str).str.split('-', expand=True)[0][/code]
Eventually I put the code above in a try:... except:...
Eventually I convert all my field in string before, with astype(str)
Manage numeric date from Excel
Sometimes dates from Excel come as integer, arrgghhh!
[code]import xlrd
def read_date(date):
return xlrd.xldate.xldate_as_datetime(date, 0)
df.loc[df['My date'].astype(str).str.isdigit() == True, 'My date'] = \
pd.to_datetime(df['My date'].loc[df['My date'].astype(str).str.isdigit()].astype(int).apply(read_date), errors='coerce')[/code]
Check date format
Sometimes (especilally when it comes from Excel...) it is good to check your dates after treatments.
Below we search dates not starting with DD/MM/YYYY:
[code]df2 = df[~df['my date'].str.contains('^[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]', regex=True)]
print(tabulate(df2, headers='keys', tablefmt='psql', showindex=False))[/code]
Check if a date is in the future
There are many ways, but here converting in number:
[code]from datetime import date, datetime
today = date.today()
df_ProblemeDate = df[['My date']][pd.to_numeric(df['My date'].str.split('/').str[2] + df['My date'].str.split('/').str[1] + df['My date'].str.split('/').str[0]) > int(str(today).replace('-', ''))][/code]
Most recent date
mostRecentDate = df['Date'].max()
Most old date
mostOldDate = df['Date'].min()
Period between 2 dates
periodFromToday = today - mostOldDate
print(periodFromToday.days)
Count opening days with the same date column between the 2 following lines
Imagine each line of your dataframe is a delivery, with a Date field:
[code]import pandas as pd
from tabulate import tabulate
from datetime import date
import numpy as np
import datetime as dt
...
df['Year'] = df['Date'].str.split('/').str[2]
df['Month'] = df['Date'].str.split('/').str[1]
df['Day'] = df['Date'].str.split('/').str[0]
df['Opening days'] = ''
df.reset_index(drop=True, inplace=True)
for i, row in df.iterrows():
dateDeliveryYear = int(row['Year'])
dateDeliveryMonth = int(row['Month'])
dateDeliveryDay = int(row['Day'])
if i == 0:
df.at[i, 'Opening days'] = 'First delivery'
if i > 0:
beforeDateDeliveryYear = int(df.iloc[[i-1]]['Year'].item())
beforeDateDeliveryMonth = int(df.iloc[[i-1]]['Month'].item())
beforeDateDeliveryDay = int(df.iloc[[i-1]]['Day'].item())
start = dt.date(beforeDateDeliveryYear,beforeDateDeliveryMonth,beforeDateDeliveryDay)
end = dt.date(dateDeliveryYear,dateDeliveryMonth,dateDeliveryDay)
df.at[i, 'Opening days'] = np.busday_count(start, end)
print('\n' + u)
print(tabulate(df.head(10), headers='keys', tablefmt='psql', showindex=True))
print(df.shape[0])[/code]