Automating excel files using Pandas

Automating excel files using PandasAkash ChandraBlockedUnblockFollowFollowingMay 28Python is increasingly becoming popular among data science and analytics enthusiasts.

The topmost reasons that come to my mind would be:easily available support for debugging, thanks to stackoverflow.

comlarge collection of open libraries to work withIn this article, I will be using Pandas to perform some basic manipulation (in this case, validating values from 2 files) and creating the final formatted excel file.

Usually, a programming language support working with a simple flat file (like .

txt, .

csv).

Strange as this may sound, but Pandas do even help in formatting the excels.

Before you start, ensure you have already imported some basic libraries -from __future__ import print_functionimport datetimeimport sysfrom os import listdirfrom os.

path import isfile, joinfrom time import gmtime, strftimeimport timeNow, its time to import Pandas and NumPy -import pandas as pdimport numpy as npLet’s start with giving certain parameters.

Notice that, we will be inputting both text file and an excel file in the example below.

You may have to deal with either or both of them.

# All inputsinp_file1 = '/path to file.

xls/'inp_sheet = 'sheet_name'inp_key = 'key'inp_var1 = 'variable to compare'inp_file2 = '/path to file.

txt/'inp_var2 = 'variable to compare'out_loc = '/path to output file.

xlsx/Before we begin, it would be helpful to add functions to perform some basic checks.

The steps below might not be needed but still helps to debug errors.

def checkFile(inp_loc): """ With input location of a file, returns warning in case of missing file Parameters- inp_loc (str): Full file path with extension""" if not(os.

path.

exists(inp_loc)): return print('Invalid location.

Please check file', inp_loc) i_quit = 1Reading the files in the snippet below.

Note that file_1 refers to an excel file and file_2 refers to a tab-separated text file.

# Check for filesi_quit = 0for file in [inp_file1, inp_file2]: checkFile(file) if i_quit: print "Ending Script" sys.

exit()# Input File 1 (excel file)f1_read = pd.

ExcelFile(inp_file1)f1_list = valf_read.

parse(inp_sheet)### Check file 1 inputsprint('Input file counts:', f1_list[key].

count())# Input File 2 (tab separated file)f2_list = pd.

read_table(inp_file2, sep=[ ], header=0, low_memory=False)### Check file 2 inputsprint('Input file counts:', f2_list[key].

count())Perform validation in 3 steps.

Merge the pandas dataframes, clean the fields (if required) and compare.

After the following step, we will have a dataframe comparing values across each key pairs.

# Left join with validation filefin_list = f1_list.

merge(f2_list, on=[key], how='left')# Remove commas from excel in numeric fieldsfin_list['inp_var1_num'] = pd.

to_numeric(fin_list['inp_var1'].

str.

replace(',', ''), errors='coerce')# Comparisonfin_list['var_compare'] = np.

where(fin_list[key].

isnull(),'', (np.

where(fin_list['inp_var1_num']==fin_list['inp_var2'], 'TRUE', 'FALSE')))fin_list = fin_list.

drop(columns='inp_var1_num')This is the final step where we will be creating the final excel output together with formatting.

writer = pd.

ExcelWriter(out_loc, engine='xlsxwriter')# Do not add a headerfin_list.

to_excel( writer ,sheet_name = 'Validation_Sheet' ,start_row = 1 ,header = False ,index = False )workbook = writer.

bookworksheet = writer.

sheets['Validation_Sheet']Using multiple formatting option in final excel sheet# Add a header formatheader_format = workbook.

add_format({ 'bold': True, 'text_wrap': True, 'valign': 'top', 'fg_color': '#D7E4BC', 'border': 1})# Light red fill with dark red text.

format1 = workbook.

add_format({'bg_color': '#FFC7CE', 'font_color': '#9C0006'})# Light yellow fill with dark yellow text.

format2 = workbook.

add_format({'bg_color': '#FFEB9C', 'font_color': '#9C6500'})# Green fill with dark green text.

format3 = workbook.

add_format({'bg_color': '#C6EFCE', 'font_color': '#006100'})# Bold formatformat4 = workbook.

add_format({'bold': 1})# Border formatformat5 = workbook.

add_format({'border': 1})Writing the headers along with final formattingfor col_num, value in enumerate(fin_list.

columns.

values): worksheet.

write(0, col_num, value, header_format)worksheet.

conditional_format(1, 0, fin_list.

shape[0], fin_list.

shape[1]-1, {'type': 'text', 'criteria': 'begins_with', 'value': 'FALSE', 'format': format1 })worksheet.

conditional_format(1, 0, fin_list.

shape[0], fin_list.

shape[1]-1, {'type': 'text', 'criteria': 'begins_with', 'value': 'TRUE', 'format': format2 })worksheet.

conditional_format(1, 0, fin_list.

shape[0], fin_list.

shape[1]-1, {'type': 'no_blanks', 'format': format5 })worksheet.

conditional_format(1, 0, fin_list.

shape[0], fin_list.

shape[1]-1, {'type': 'blanks', 'format': format5 })writer.

save()print('Final file saved at:', out_loc)Thank you all for taking your time to read through this article.

Hope this helps in automating some of your basic tasks.

Feel free to leave any feedback or comments below.

References:https://xlsxwriter.

readthedocs.

io/index.

html.

. More details

Leave a Reply