Convert raw Louisiana SWUDS output to a formatted balance sheet¶

The SWUDS output is a spreadsheet of individual industries tagged with its use category, water usage (in MGal/Day), and source type (SIC and Description). This script summarizes the statewide dataset to list the total usage by each SIC for a given year and reports the findings in a formatted MS Excel balance sheet.

The resulting balance sheet is created dynamically. It will contain a column for each SIC found in the input database, listing usage totals in appropriate locations. It will also contain a column of total supply and usage values.

This resulting balance sheet is limited by the data available: supply data are not provided, and usage data for Louisiana is solely from wells (no surface water intakes).

August 2017. John.Fay@duke.edu

In [2]:

#Import modules
import sys, os, openpyxl, xlrd
from openpyxl.utils import get_column_letter
from copy import copy
import pandas as pd
import numpy as np

In [3]:

#Get the filenames
dataDir = '../../Data'
dataFN = dataDir + os.sep + 'LouisianaWaterUse_Distribute.xlsx' #Raw output from SWUDS database query
templateFN = dataDir + os.sep + 'LA_SWUDS_BalanceSheet.xlsx'    #Template of water balance sheet
outFN = dataDir + os.sep + 'LA_SWUDS_BalanceSheet2.xlsx'        #Output file containing all data

Extracting the SWUDS data¶

This section imports and summarizes the raw SWUDS Excel data, producing a dataframe listing the total usage by each SIC for the selected years (2000, 2005, 2010). From this, the script generates a list of each unique SIC found in the dataset, which will be used to create individual columns in the output balance sheets. This dataframe is also used to identify the total usage by each category, the value of which is entered in the appropriate cell in the output balance sheet.

In [4]:

#Read the data xlsx file in as a dataframe
dfRaw = pd.read_excel(dataFN,sheetname='data1')
dfRaw.head()

Out[4]:

	County Code	SIC1	Description	Water Use Type	MP ID	MP Type	MP Name	Source Water Type	Aquifer Code	Aquifer Name2	Aquifer Group ID	Aquifer Group Name	HUC-8	HUC-12	Year	Annual Amt. (MGD)
0	1	112	Rice	ri	1173	Well	Ac-RI-112CHCT-08080201	gw	112CHCT	Chicot Aquifer	22	Coastal lowlands aquifer system S100CSLLWD - 2...	8080201.0	NaN	1979	171.8598
1	1	112	Rice	ri	1173	Well	Ac-RI-112CHCT-08080201	gw	112CHCT	Chicot Aquifer	22	Coastal lowlands aquifer system S100CSLLWD - 2...	8080201.0	NaN	1985	89.0000
2	1	112	Rice	ri	1173	Well	Ac-RI-112CHCT-08080201	gw	112CHCT	Chicot Aquifer	22	Coastal lowlands aquifer system S100CSLLWD - 2...	8080201.0	NaN	1989	30.4600
3	1	112	Rice	ri	1173	Well	Ac-RI-112CHCT-08080201	gw	112CHCT	Chicot Aquifer	22	Coastal lowlands aquifer system S100CSLLWD - 2...	8080201.0	NaN	1994	32.3040
4	1	112	Rice	ri	1173	Well	Ac-RI-112CHCT-08080201	gw	112CHCT	Chicot Aquifer	22	Coastal lowlands aquifer system S100CSLLWD - 2...	8080201.0	NaN	1999	41.3039

In [5]:

#Select only 2000, 2005, and 2010 records and list the number of data for each column
dfSelect = dfRaw[dfRaw.Year.isin([2000,2005,2010])]

In [6]:

#Create a list of all the unique SIC descriptions; these will define the columns in the resulting sheet. 
sectorNames = dfSelect.Description.unique().tolist()
sectorNames.sort()          #Sort the list alphabetically
sectorNames.append("TOTAL") #Add a total entry
#Display the names
sectorNames

Out[6]:

[u'Alkalies & Chlorine',
 u'Animal, Marine Fats & Oils',
 u'Carbon Black',
 u'Chemical Preparations',
 u'Civil, Social, and Fraternal Associations',
 u'Corretional Institutions',
 u'Cyclic-Crudes, Intermediates, Dyes & Org Pigments',
 u'Electric and Other Services Combined',
 u'Electrical Services',
 u'Engineering Services',
 u'Hotels and Motels',
 u'Industrial Inorganic Chemicals',
 u'Industrial Organic Chemicals',
 u'Marine Cargo Handling',
 u'Minerals & Earths: Ground Or Treated',
 u'Natural Gas Transmission',
 u'Natural Gas Transmission and Distribution',
 u'Nitrogenous Fertilizers',
 u'Noncommercial Research Organizations',
 u'Paper Mills',
 u'Paperboard Mills',
 u'Pesticides & Agricultural Chemicals',
 u'Petroleum Refining',
 u'Plastic Products',
 u'Plastics, Materials & Nonvulcanizable Elastomers',
 u'Primary Production of Aluminum',
 u'Products Of Petroleum & Coal',
 u'Psychiatric Hospitals',
 u'Pulp Mills',
 u'Refuse Systems',
 u'Secondary Smelting & Refining Of Nonferrous Metals',
 u'Shipbuilding & Repairing',
 u'Soft Drinks',
 u'Sugar, Cane',
 u'Sugar, Cane Refining',
 u'Synthetic Rubber (Vulcanizable Elastomers)',
 u'Water Supply',
 u'Water Transportation of Freight',
 'TOTAL']

In [7]:

#Pivot on year, listing usage (Annual Amt. MGD), by SIC1/Description
dfPivot = dfSelect.pivot_table(columns='Year',                #Creates a usage column for each year
                               index=('SIC1','Description'),  #Summarizes values by SIC    
                               values='Annual Amt. (MGD)',    #Value reported in the resulting table
                               aggfunc='sum')                 #Computes the sum of all entries for each SIC
#Reset the index to add index values back as columns
dfPivot.reset_index(inplace=True)
#Show the table
dfPivot.head()

Out[7]:

Year	SIC1	Description	2000	2005	2010
0	2061	Sugar, Cane	18.3352	12.4183	13.6363
1	2062	Sugar, Cane Refining	9.7405	2.5852	0.1233
2	2077	Animal, Marine Fats & Oils	NaN	0.0758	NaN
3	2086	Soft Drinks	0.3390	0.2410	NaN
4	2611	Pulp Mills	40.9448	38.6510	NaN

In [ ]:

#Open the balance sheet template workbook
wb = openpyxl.load_workbook(templateFN)

Functions¶

Here, two functions are created. The updateSICColumn function add appropriate formula for an individual SIC entry in the balance sheet. It also names the column using the supplied sector name. The updateTotalColumn function similarly populates cell formulae in the appropriate locations, but these formula compute totals from all preceding SIC columns.

In [ ]:

def updateSICColumn(worksheet,col,name,useGW=0):
    '''Updates the specified column in the balance sheet.
       col = the letter of the column to be updated
       name = the name of the sector
       useGW = the use from groundwater
    '''
    #Sector names
    worksheet['{0}2'.format(col)].value = name
    worksheet['{0}19'.format(col)].value = name

    ##Supply calculations
    worksheet['{0}12'.format(col)].value = '=SUM({0}9:{0}11)'.format(col)
    worksheet['{0}16'.format(col)].value = '=SUM({0}14:{0}15)'.format(col)
    worksheet['{0}17'.format(col)].value = '=SUM({0}12,{0}16)'.format(col)

    ##Use calculations
    worksheet['{0}24'.format(col)].value = '=SUM({0}21:{0}23)'.format(col)
    worksheet['{0}29'.format(col)].value = '=SUM({0}26:{0}28)'.format(col)
    worksheet['{0}34'.format(col)].value = '=SUM({0}24,{0}29)'.format(col)

    ##Totals
    worksheet['{0}36'.format(col)].value = '=({0}34-{0}17)'.format(col)
    
    ##Values
    worksheet['{0}22'.format(col)].value = useGW

In [ ]:

#Function for updating formulate for the totals column. 
def updateTotalColumn(worksheet, colNum):
    totalCol = get_column_letter(colNum)
    prevCol = get_column_letter(colNum - 1)

    #Add sum formula in the following rows
    for row in [4,5,6,7,9,10,11,12,14,15,16,17,20,21,22,23,24,26,27,28,29,31,32,32,34]:
        worksheet['{0}{1}'.format(totalCol,row)].value = '=SUM(B{0}:{1}{0})'.format(row,prevCol)

    #Add the grand total (supply - use)
    worksheet['{0}36'.format(totalCol)].value = '=({0}34-{0}17)'.format(totalCol)

Main section¶

This is the main section of the script. Here, it loops through each year and for each year it:

Gets the appropriate worksheet from the template
Updates the first column (which is preformatted in the template) with data for the first SIC
This include setting the column name and inserting the appropriate usage values
Loops through the remaining SIC entries, and:
Adds a new column to the worksheet,
Sets the style of rows in that column to match the initial column
And, as above, inserts appropriate formulae and values within the column
When the TOTAL column is reached:
A final column is added (again with matching style)
Formulate are updated using the updatTotalColumn function

In [ ]:

#Loop through each year
for year in (2000,2005,2010):
    
    #Get the sheet for the given year
    ws = wb.get_sheet_by_name(str(year))
    
    #Update the cells of the first sector column
    col = 'B'
    name = sectorNames[0]
    usage = dfPivot.loc[dfPivot.Description == sectorNames[0], year].iloc[0]
    updateSICColumn(ws, col, name, usage)
    
    #Loop through each sector, by index
    for idx in range(1,len(sectorNames)):
        
        #get the sector name at the index
        sectorName = sectorNames[idx]
        
        #get the letter of the column to create
        colLetter = get_column_letter(idx+1)

        #Add a new column and copy the style
        for src, dst in zip(ws['B:B'], ws['{0}:{0}'.format(colLetter)]):
            if src.value: dst.value = str(src.value).replace('B',colLetter)
            if src.has_style:
                dst.font = copy(src.font)
                dst.fill = copy(src.fill)
                dst.number_format = copy(src.number_format)
                dst.alignment = copy(src.alignment)
                dst.border = copy(src.border)

        #Update the value for the current sector, unless it's the TOTAL column
        if sectorName != 'TOTAL':
            usage = dfPivot.loc[dfPivot.Description == sectorName, year].iloc[0]
            updateSICColumn(ws, colLetter, sectorName, usage)

        #Otherwise, fill the column with the formulae for the Totals
        else: 
            updateTotalColumn(ws, idx+1)

In [ ]:

#Save
wb.save(outFN)