Automated Email with Python

Automated email notifications and task tracking system.

This article explains how to use a Python script in conjunction with a simple Action Tracking spreadsheet to create an automated email-notification and task-tracking system for your team.

To begin, let’s set up our “ActionTracker” spreadsheet as shown below:

[Image: ActionTracker overview]

We can use the expression =IF(ISBLANK(H2)=TRUE,"Active","Closed") in our Status column to detect when a date has been entered in the “Completion Date” column. This will help our script later on.

The “Days Open” column can be calculated using =IF(ISBLANK(H2)=FALSE,H2-F2,TODAY()-F2). As your list grows, be sure to drag down your formulas.

It can be helpful to apply conditional formatting here so we can see which items are still open and overdue, and therefore which items we expect notifications to be sent about. This can be accomplished with the rule shown below, but it is not a necessary step. Again, remember to update the applicable range as your list grows.

[Image: Conditional formatting rule]
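If your layout matches the example above, with Status in column A and Days Open in column E, a custom-formula rule along the lines of =AND($A2="Active",$E2>3) applied over the data range will highlight the rows that are still open past the three-day threshold; adjust the column letters and range to suit your own sheet.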

On our “Email” tab, we’ll list our unique assignees by name and add their email addresses (separated by a comma and a space) in column B.

[Image: Email tab]

To minimize errors, we can apply a Data Validation rule to the “Assignee” column on the “ActionList” tab, selecting all of the unique names on our “Email” tab as the Source validation criteria. New assignees can easily be added to this list; just remember to update the Source range accordingly.

[Image: Data Validation settings]

Here’s a download link for the ActionTracker template.

Next, we’ll use the following Python script to send automated email notifications to our team for any actions that have been open for more than three days. The three-day threshold can easily be adjusted in the DaysOpen[i]>3 check inside the script’s main loop.

Note: In order to allow the script to access your Gmail account, make sure that your “less secure app access” setting is turned on.

import smtplib
import pandas as pd
import sys
from tkinter import Tk
from tkinter import filedialog
from email.mime.text import MIMEText

#Define email login information.
from_email="_____@gmail.com" #Replace with your email address. Must be a Gmail address.
pw="_____" #Replace with your Gmail password.

#Select file to use.
Tk().withdraw()
filepath=filedialog.askopenfilename(defaultextension='*.xlsx',
                                    filetypes=[('.xlsx files','*.xlsx'),
                                               ('All files','*.*')],
                                    initialdir=r'C:\Users')
if filepath=="." or filepath=="":
    sys.exit(0)

#Import ActionTracker
ActionTracker = pd.read_excel(filepath,sheet_name='ActionList',
                              usecols='A:E')
ActionTracker_maxrow=len(ActionTracker)
status=ActionTracker.iloc[:,0]
LineItem=ActionTracker.iloc[:,1]
action=ActionTracker.iloc[:,2]
person=ActionTracker.iloc[:,3]
DaysOpen=ActionTracker.iloc[:,4]

#Import Email Addresses by name
EmailList=pd.read_excel(filepath,sheet_name='Email',index_col=0,
                        usecols='A:B')

#Establish connection to gmail server, login.
server = smtplib.SMTP('smtp.gmail.com',587)
server.starttls()
server.login(from_email, pw)

#Initialize a list of line items with invalid email addresses
invalid_addresses=[]

#Send emails to late Action Tracker assignees
for i in range(0,ActionTracker_maxrow):
    if status[i]=='Active' and DaysOpen[i]>3:
        print('Active Line Item #'+str(LineItem[i])+': '+person[i])
        msg=MIMEText("Action Tracker Line Item #" + str(LineItem[i]) + " has been open for " +
                     str(DaysOpen[i]) + " days.\n\n" + str(action[i]) +
                     "\n\nPlease take action.",_charset="UTF-8")
        msg['Subject']="Open Action #" + str(LineItem[i])
        msg['TO']=str(EmailList.iloc[EmailList.index.get_loc(person[i]),0])
        try:
            server.sendmail(from_email, msg['TO'].split(","),
                            msg.as_string())
        except smtplib.SMTPRecipientsRefused:
            invalid_addresses.append(LineItem[i])
            print('Line Item #' + str(LineItem[i]) + 'has an invalid email address.')

if len(invalid_addresses) != 0:
    for i in range(0,len(invalid_addresses)):
        invalid_addresses[i]=str(invalid_addresses[i]).strip('\xa0')
    try:
        if len(invalid_addresses)==1:
            msg=MIMEText(str(invalid_addresses) +
                         " has an invalid email address associated with the responsible party.",
                         _charset="UTF-8")
        else:
            msg=MIMEText(str(invalid_addresses) +
                         " have invalid email addresses associated with the responsible parties.",
                         _charset="UTF-8")
        msg['Subject']='Invalid Email Addresses'
        msg['From']=from_email
        msg['To']=str(from_email)
        server.sendmail(from_email, msg['To'].split(","),
                        msg.as_string())
    except smtplib.SMTPRecipientsRefused:
        print('Invalid Email Address notification email failed.')

server.quit()

 

And that’s it. Full automation can be achieved by hard-coding the file location and using Windows Task Scheduler to execute the Python script on a schedule.
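For fully unattended runs, the file-selection dialog near the top of the script can be swapped for a hard-coded path; a minimal sketch (the path below is only a placeholder):

#Replace the Tk file dialog with a hard-coded workbook location.
#This path is a placeholder; point it at your actual ActionTracker file.
filepath = r'C:\Users\YourName\Documents\ActionTracker.xlsx'

Windows Task Scheduler can then run the script on a daily trigger with an action such as python C:\path\to\action_tracker_email.py (again, a placeholder path).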

Finding Correlations

This script normalizes a numeric dataset and finds correlations across its variables. The data can be analyzed as a whole or split into n subsets; when split, the normalization and correlations are computed separately for each subset.
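The normalization applied is a simple min-max scaling computed within each subset, i.e. (x - min) / (max - min). As a small standalone sketch of just that step (the function name and sample values here are illustrative only):

import numpy as np

def min_max_normalize(values):
    #Scale a 1-D array into the [0, 1] range using its own min and max.
    values = np.asarray(values, dtype=float)
    return (values - values.min()) / (values.max() - values.min())

print(min_max_normalize([2.0, 4.0, 6.0, 10.0]))  #[0.   0.25 0.5  1.  ]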

Input is read from a .csv file with any number of columns (as shown below). Each column must have the same number of samples, and the script assumes headers are in the first row.

[Image: Sample input .csv]
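For reference, a made-up input in that format might look like this (headers in the first row, equal-length numeric columns):

Var1,Var2,Var3
1.2,0.4,10.0
1.5,0.9,9.6
1.1,0.7,10.3
1.8,0.5,9.9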

import numpy as np

#Divides a list (or np.array) into N equal parts.
#http://stackoverflow.com/questions/4119070/how-to-divide-a-list-into-n-equal-parts-python
def slice_list(input, size):
    input_size = len(input)
    slice_size = input_size // size
    remain = input_size % size
    result = []
    iterator = iter(input)
    for i in range(size):
        result.append([])
        for j in range(slice_size):
            result[i].append(iterator.__next__())
        if remain:
            result[i].append(iterator.__next__())
            remain -= 1
    return result

#Functions below are from Data Science From Scratch by Joel Grus
def mean(x):
    return sum(x)/len(x)

def de_mean(x):
    x_bar=mean(x)
    return [x_i-x_bar for x_i in x]

def dot(v,w):
    return sum(v_i*w_i for v_i, w_i in zip(v,w))

def sum_of_squares(v):
    return dot(v,v)

def variance(x):
    n=len(x)
    deviations=de_mean(x)
    return sum_of_squares(deviations)/(n-1)

def standard_deviation(x):
    return np.sqrt(variance(x))  

def covariance(x,y):
    n=len(x)
    return dot(de_mean(x),de_mean(y))/(n-1)

def correlation(x,y):
    stdev_x=standard_deviation(x)
    stdev_y=standard_deviation(y)
    if stdev_x >0 and stdev_y>0:
        return covariance(x,y)/stdev_x/stdev_y
    else:
        return 0

#Read data from CSV
input_data=np.array(np.genfromtxt(r'C:\Users\Craig\Documents\GitHub\normalized\VariableTimeIntervalInput.csv',delimiter=",",skip_header=1))
var_headers=np.genfromtxt(r'C:\Users\Craig\Documents\GitHub\normalized\VariableTimeIntervalInput.csv',delimiter=",",dtype=str,max_rows=1)

#Determine number of samples & variables
number_of_samples=len(input_data[0:,0])
number_of_allvars=len(input_data[0,0:])

#Define number of samples (and start/end points) in full time interval
full_sample=number_of_samples
full_sample_start=0
full_sample_end=number_of_samples

#Define number of intervals to split data into
n=2
dvar_sublists={}
max_sublists=np.zeros((number_of_allvars,n))
min_sublists=np.zeros((number_of_allvars,n))
subnorm_test=np.zeros((full_sample_end, number_of_allvars+1))

#Slice variable lists
for dvar in range(0,number_of_allvars):
    dvar_sublists[dvar]=slice_list(input_data[:,dvar],n)
    for sublist in range(0,n):
        max_sublists[dvar,sublist]=np.max(dvar_sublists[dvar][sublist])
        min_sublists[dvar,sublist]=np.min(dvar_sublists[dvar][sublist])

var_interval_sublists=max_sublists-min_sublists

#Normalize each sublist.
for var in range(0, number_of_allvars):
    x_count=0
    for n_i in range(0,n):
        sublength=len(dvar_sublists[var][n_i])
        for x in range(0,sublength):
            subnorm_test[x_count,var]=(dvar_sublists[var][n_i][x]-min_sublists[var,n_i])/var_interval_sublists[var,n_i]
            subnorm_test[x_count,number_of_allvars]=n_i #Last column records which subset each row belongs to
            x_count+=1

var_sub_correlation=np.zeros((n,number_of_allvars,number_of_allvars),float)

#Check for correlation between each pair of variables within each subset
for n_i in range(0,n):
    for i in range(0,number_of_allvars):
        for j in range(0,number_of_allvars):
            starti=sum(len(dvar_sublists[i][k]) for k in range(n_i))
            endi=starti+len(dvar_sublists[i][n_i])
            startj=sum(len(dvar_sublists[j][k]) for k in range(n_i))
            endj=startj+len(dvar_sublists[j][n_i])
            var_sub_correlation[n_i,i,j]=correlation(subnorm_test[starti:endi,i],subnorm_test[startj:endj,j])

#Writes to CSV
np.savetxt(r'C:\Users\Craig\Documents\GitHub\normalized\sublists_normalized.csv',subnorm_test, delimiter=",") 

print(var_sub_correlation, 'variable correlation matrix')
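As a quick sanity check of the hand-rolled correlation helper, its output should match NumPy’s built-in corrcoef on any small example. The values below are made up, and the snippet assumes the functions and imports defined above are already loaded:

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]
print(correlation(x, y))        #Sample correlation using the helpers above
print(np.corrcoef(x, y)[0, 1])  #NumPy's Pearson correlation, for comparison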

Filenames to CSV

Python script for writing all filenames in the selected folder to ‘Filenames.csv’.


import glob
from tkinter import Tk
from tkinter import filedialog

def FileList(filepath):
    return glob.glob(str(filepath) + '*')

Tk().withdraw()     
folderpath_user=filedialog.askdirectory()

if folderpath_user.endswith('\\'):
    folderpath=folderpath_user
else:
    folderpath=str(folderpath_user)+ '\\'
    
with open('Filenames.csv', 'w') as f:
    f.write('{0},{1}\n'.format("Filepath","Filename"))    
    for file in FileList(folderpath):
        f.write('{0},{1}\n'.format(file,file.split("\\")[-1]))
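The same idea can also be written with pathlib from the standard library, which sidesteps the manual backslash handling; a rough alternative sketch:

from pathlib import Path
from tkinter import Tk
from tkinter import filedialog

Tk().withdraw()
folder = Path(filedialog.askdirectory())

with open('Filenames.csv', 'w') as f:
    f.write('{0},{1}\n'.format("Filepath", "Filename"))
    for path in folder.glob('*'):
        f.write('{0},{1}\n'.format(path, path.name))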

Unique Values from Spreadsheet

Python script for extracting all unique values from each tab of a spreadsheet.
Logs results in Output.csv in the same directory as the .py file.


import pandas as pd
import xlrd
from tkinter import Tk
from tkinter import filedialog
 
def UniqueValuesFromSpreadsheet(WorkbookPath,sheet_name):
    df=pd.read_excel(WorkbookPath,sheet_name=sheet_name,header=None,index_col=False)
    df2=pd.unique(df.values.ravel())
    return df2

Tk().withdraw()     
filename=filedialog.askopenfilename()
SheetNames=xlrd.open_workbook(filename).sheet_names()
NumSheets=len(SheetNames)
 
Results={}
 
for i in range(0,NumSheets):
    Results[SheetNames[i]] = UniqueValuesFromSpreadsheet(filename,i)
    print(SheetNames[i], Results[SheetNames[i]])
 
with open('Output.csv', 'w') as f:
    f.write('{0},{1}\n'.format("Sheet Name", "Value"))    
    for key in Results:
        for i in range(0,len(Results[key])):
            if str(Results[key][i]) != 'nan':
                f.write('{0},{1}\n'.format(key, Results[key][i]))
                print(key, Results[key][i])
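Note that recent versions of xlrd only read legacy .xls files, so for .xlsx workbooks the sheet names can be pulled through pandas itself instead; a minimal sketch reusing the filename selected above:

xls = pd.ExcelFile(filename)   #pandas chooses an appropriate engine (e.g. openpyxl) for .xlsx
SheetNames = xls.sheet_names   #Same list of tab names that xlrd provides
NumSheets = len(SheetNames)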