Create Folder Structure with a Batch Script

Generate uniform folder hierarchies using a simple batch file.

Take for example, the folder hierarchy shown below. This is our default starting point for any new project.
FolderHierarchy
Using a batch file to create directories for a new project is an easy way of making sure that we’re maintaining consistent structure.

Save the following text as a .BAT file and run it anytime a new project needs to be created. The script prompts the user for a project name and will create all directories as shown above.

@echo off
set /p project="Enter Project Name: "

MkDir "C:\Engineering\%project%\Documents\"
MkDir "C:\Engineering\%project%\Documents\Assembly Procedures\"
MkDir "C:\Engineering\%project%\Documents\Design Requirements\"
MkDir "C:\Engineering\%project%\Documents\Test Procedures\"

MkDir "C:\Engineering\%project%\Drawings\"
MkDir "C:\Engineering\%project%\Drawings\Assembly Drawings\"
MkDir "C:\Engineering\%project%\Drawings\Part Drawings\"

MkDir "C:\Engineering\%project%\3D Models\"
MkDir "C:\Engineering\%project%\3D Models\3D Printer\"

Check if file exists with VBA

UDF for checking if a file exists.

As mentioned in the title, the code below is for checking if a file exists in a certain directory. I find this function useful for monitoring if drawings have been created and are ready for production.

Public Function InFolder(Folderpath As String, Filename As String, FileExtension As String) As Boolean
'Infolder returns TRUE if a file is found in the specified folderpath

'Add backslash (\) to end of folderpath if there isn't one
If Right(Folderpath, 1) <> "\" Then
Folderpath = Folderpath & "\"
End If

'Add period (.) to front of file extension if there isn't one
If Left(FileExtension, 1) <> "." Then
FileExtension = "." & FileExtension
End If

'Create filepath from input arguments
Filepath = Folderpath & Filename & FileExtension

'Check if file exists
If Dir(Filepath) <> "" Then
InFolder = True
Else
InFolder = False
End If

End Function

You can either save this directly in your current workbook or if you’d like the function available in all workbooks, save it to a AddIn file (.xlam) in the default folder and enable the AddIn through File>Options>AddIns>Go.

If you are typing arguments in directly, then you will need quotes. However, if you are referencing other cells as input, quotes are unnecessary.

Find and Zip All .py Files

Create a Zip folder including all .py files on your computer.

This is a program for finding all .py files in a directory and zipping them into a single folder.  We can substitute any file extension (.doc, .xlsx, .dxf, etc.) that we’d like to search for, but in our example we’ll use .py

To find all of the files in a given directory, we’ll import and use functionality from the FolderHierarchy script.  Be sure to save this file in the same directory as the script below.

Next, we’ll filter our FolderHierarchy results to include just the .py files.  Then we simply loop through the filtered DataFrame and add each file to a zip folder using the zipfile module.

# -*- coding: utf-8 -*-
"""
Purpose: Copy all files of a given filetype within a folder, zip them,
         and save.

Input: 1. A folderpath to search in.
       2. File type/File Extension to search for.
Output: A zip file containing all of the files with the specified file extension
        that were folder in the specified input folder.
"""
import pandas as pd
import zipfile
import FolderHierarchy

#Define what filetype to search for
extension=".py"

#Execute 'FolderHierarchy' program (in same directory)
FolderHierarchy
results=FolderHierarchy.all_levels

#Filtering for just files with the defined file extension
found_files=pd.concat([results[level][results[level]['Path'].str.contains(extension,
                                  regex=False)] for level in results])

#Copy and zip all of the files found.
new_zip=zipfile.ZipFile('all_'+extension+ '_files'+'.zip',mode='w')

#Writing to zip (https://pymotw.com/2/zipfile/)
for file in found_files['Path']:
    new_zip.write(file,arcname=file.split('\\')[-1])
new_zip.close()

print('Found '+str(len(found_files))+' '+extension+' files.')

Capture Hard Drive Folder Structure with Python

Use glob and pandas to create a snapshot of any computers current folder structure.

Suppose we want to grab the folder structure of a computer without backing up every single file. Maybe we want to index the folder structure, maybe find all of the .py files scattered across our computer, or we want to take a look at all of the files/folders that exist on another computer.  Here we’ll capture the contents (all files and folders) of our input folder, along with the contents of every sub-folder.

 

folderhierarchy

To begin, we’ll define a few functions:

def FileOrFolder(filepath):
    if "." in filepath:
        return('File')
    else:
        return('Folder')

def StillFolders(dfcolumn):
    FolderCount=0
    for item in dfcolumn:
        if item=='Folder':
            FolderCount+=1
        else:
            pass 
    if FolderCount>0:
        return('Still Folders')
    else:
        return('No Folders')

We’ll want to continue looping through each sub-folder (and their sub-folders) until there are no more folders to look in. “FileOrFolder” identifies whether a given filepath is a File or Folder. “StillFolders” looks in a single column of a DataFrame and identifies whether or not any Folders are remaining.

def find_contents(folderpath):
    #Find contents of intial input
    contents=pd.DataFrame(glob.glob(folderpath + '*'),columns=[('Path')])
    #http://stackoverflow.com/questions/12356501/pandas-create-two-new-columns-in-a-dataframe-with-values-calculated-from-a-pre?rq=1
    contents['FileOrFolder']=contents['Path'].map(FileOrFolder)
    return contents

The “find_contents” function uses glob to find all of the contents of a given folderpath. The contents is returned as a DataFrame.

In order to find all lower-level files and folders, we’ll write a short procedure to continue identifying the contents of sub-folders while the previous “order” folder still contains folders. So the full code will look something like this:

# -*- coding: utf-8 -*-
"""
Purpose: Returns all Folders and Files in a parent folder with hierarchical order.

Input: A folderpath.
Output: An excel file with four columns
            A. Index - Integer.
            B. Path - String.
            C. FileOrFolder - String.
            D. Order - Integer.  "0" is the input folderpath.
"""
import glob
import pandas as pd
from tkinter import Tk
from tkinter import filedialog

Tk().withdraw()

def FileOrFolder(filepath):
    if "." in filepath:
        return('File')
    else:
        return('Folder')

def StillFolders(dfcolumn):
    FolderCount=0
    for item in dfcolumn:
        if item=='Folder':
            FolderCount+=1
        else:
            pass 
    if FolderCount>0:
        return('Still Folders')
    else:
        return('No Folders')

def find_contents(folderpath):
    #Find contents of intial input
    contents=pd.DataFrame(glob.glob(folderpath + '*'),columns=[('Path')])
    #http://stackoverflow.com/questions/12356501/pandas-create-two-new-columns-in-a-dataframe-with-values-calculated-from-a-pre?rq=1
    contents['FileOrFolder']=contents['Path'].map(FileOrFolder)
    return contents

folder=filedialog.askdirectory(initialdir=r'C:\\',title='Please select folder')

all_levels={}
all_levels[0]=pd.DataFrame()
all_levels[0]=find_contents(folder)
all_levels[0]['Order']=0

level=1

while StillFolders(all_levels[level-1]['FileOrFolder'])=='Still Folders':
    all_levels[level]=pd.DataFrame()  
    #http://stackoverflow.com/questions/7837722/what-is-the-most-efficient-way-to-loop-through-dataframes-with-pandas        
    for index, row in all_levels[level-1][all_levels[level-1]['FileOrFolder']=='Folder'].iterrows():               
        all_levels[level]=all_levels[level].append(find_contents(row['Path'] + '\\'),ignore_index=True)
        all_levels[level]['Order']=level
    level+=1

#Concatenate all dataframes in all_levels
combined_all_levels=pd.concat([all_levels[level] for level in all_levels])
#Save to excel on one sheet
combined_all_levels.to_excel('FolderHierarchyResults.xlsx',index_label='Index')

Our output is temporarily stored as a dictionary of DataFrames, which we then concatenate into a single DataFrame, and then finally use to_excel() to write our results into a spreadsheet.

Generating Math Tests with Python

Auto-Generate Unique Tests

This is a script for generating a bunch of unique math tests from a “Test Template” and a spreadsheet containing test inputs and problem solutions.

In our Test Template we set the layout of our test and define our test problems. Our test problems will have variable placeholders (TestID, Question ID, VarA, etc.) that we will replace with data from our “Test Data” spreadsheet.

In our excel file, we random generate values for the A, B, and C variables (using the =RANDBETWEEN() function) and clearly identify which Question, Equation, and Test ID they correspond to. In the Excel file, we’ll calculate solutions using the input data and equation listed for each entry.

Next, we can run our script. This is dependent on the docx (Note: pip install python-docx), docx2txt, re, pandas, and tkinter libraries.  Forms will pop-up prompting you for the Test Template and Test Data files.

"""
Creates unique test documents with data
taken from a DataFrame (which is populated from an excel file).

Input: Test Template (Word Document).  Test Data (Excel File)
Output: 20 Unique Tests (Test Data)
"""
#Import modules
import docx
import docx2txt
import pandas as pd
import re
from tkinter import Tk
from tkinter import filedialog

Tk().withdraw()

#Define "Test" template
template_file=filedialog.askopenfilename(title="Please select Word template")
testdata_file=filedialog.askopenfilename(title="Please select Test Data spreadsheet")

#Read file data
template_text=docx2txt.process(template_file)
testdata=pd.read_excel(testdata_file)

#Produce 20 unique tests
for i in range(20):
    new_text=template_text
    #Add data for 10 unique questions
    for j in range(10):
        #Define replacement dictionary
        #http://stackoverflow.com/questions/6116978/python-replace-multiple-strings
        rep={'QuestionID':str(testdata['Question'][i+j*20]),
             'VarA':str(testdata['VarA'][i+j*20]),
             'VarB':str(testdata['VarB'][i+j*20]),
             'VarC':str(testdata['VarC'][i+j*20])}
        rep=dict((re.escape(k),v) for k, v in rep.items())
        pattern=re.compile("|".join(rep.keys()))
    
        if j==0:
            new_text=pattern.sub(lambda m: rep[re.escape(m.group(0))],template_text,count=4)
            new_text=new_text.replace('TestID','Test #' + str(i+1))
        else:
            new_text=pattern.sub(lambda m: rep[re.escape(m.group(0))],new_text,count=4)
            
    #Create and save new test document
    test_doc=docx.Document()
    test_doc.add_paragraph(new_text)
    test_doc.save(r'C:\Users\Craig\Documents\Python Scripts\Test #'+str(i+1)+'.docx')

After the files have been selected, the script reads the Test Template text and loads the Test Data into a DataFrame. We then loop through the Test Data and produce 20 unique test documents by substituting the placeholder variables with values from the Test Data spreadsheet. Each test document is clearly labeled and we can use our original Test Data as our answer key.

Thanks to Andrew Clark for his code for replacing multiple text strings.

Reading & Writing Excel Data with Python

Using pandas to read/write data in Excel.

In this post we’re going to explore how easy it is to read and write data in Excel using Python.  There’s a few different ways to do this.  We’re going to use pandas.  The pandas DataFrame  is the main data structure that we’re going to be working with.

Reading

The sample Excel data we’ll be using is available on Tableau’s Community page.

To load a single sheet of the Excel file into Python, we’ll use the read_excel function:

import pandas as pd
sales_data=pd.read_excel(r'C:\Users\Craig\Downloads\Sample - Superstore Sales (Excel).xls')

This loads one tab of the spreadsheet (.xls, .xlsx, or .xlsm) into a DataFrame.

In fact, if we didn’t want to download the Excel file locally, we can load it into Python directly from the URL:

sales_data_fromURL=pd.read_excel('https://community.tableau.com/servlet/JiveServlet/downloadBody/1236-102-1-1149/Sample%20-%20Superstore%20Sales%20(Excel).xls')

Note that we can load specific sheets (sheetname), grab specific columns (parse_cols), and handle N/A values (na_values) by using the optional keyword arguments.

To load all of the sheets/tabs within an Excel file into Python, we can set sheetname=None:

sales_data_all=pd.read_excel(r'C:\Users\Craig\Downloads\Sample - Superstore Sales (Excel).xls', sheetname=None)

This will return a dictionary of DataFrames – one for each sheet.

Writing

Writing existing Python data to an Excel file is just as straightforward.  If our data is already a DataFrame, we can call the pd.DataFrame.to_excel(‘filename.xlsx’) function.  If not, we can just convert the data into a DataFrame and then call to_excel.

import pandas as pd
import numpy as np
df=pd.DataFrame(np.random.randn(50,50))
df.to_excel('MyDataFrame.xlsx')

This will work for the .xls, .xlsx, and .xlsm.  Pandas also writer functions such as to_csv, to_sql, to_html, and a few others.

To write data on multiple sheets, we can use the pd.ExcelWriter function as shown in the pandas documentation:

with pd.ExcelWriter('filename.xlsx') as writer:
    df1.to_excel(writer, sheet_name='Sheet1')
    df2.to_excel(writer, sheet_name='Sheet2')

Quick Data Grabs

Try experimenting with the

pd.read_clipboard() #and
pd.to_clipboard()

functions to quickly transfer data from Excel to Python and vice-versa.

Thank you, pandas, for creating and maintaining excellent documentation.