Another User Defined Function (UDF) prompted by a query at Eng-Tips.
The question concerned a set of data consisting of 13 columns, which was to be divided into two sets, based on the contents of Column 4, then the contents of columns 6 to 11 were to be summed for each row with identical contents in columns 12 and 13.
The first attempt copied the data into two arrays (based on the contents of Column 4), then did a nested loop, for each row of Array1 looping through all of Array2, and adding the contents of Array2 to the current row of Array1, when columns 12 and 13 were identical in each array. This worked, but was extremely slow, taking about 4 minutes for 50,000 rows of data.
One way to speed up the process would be to sort both arrays based on the contents of columns 12 and 13, then exit each loop when the columns no longer matched. An easier and more efficient way though is to create a dictionary object; the algorithm is:
- Copy the data into two arrays, based on the contents of Column 4
- Create a dictionary object of Array2, using a key formed from the combined text of columns 12 and 13, and an Item value of the row number.
- Whenever a later row in Array2 is found matching an existing dictionary key, the values of columns 6 to 11 are added to the matching row.
- Loop through Array1, and add the contents of columns 6 to 11 from the first matching row in Array 2, using the dictionary to identify the row number
The revised routine reduced the time for 50,000 rows from more than 4 minutes to less than 1 second!
The spreadsheet can be downloaded from SelectSum.xlsb, including full open source code.
The function SumSelectD can be used as an array function directly on the spreadsheet, or run the routine CopySumDict to use the data on sheet1, and copy the results to sheet2. Note that the subroutine needs three named ranges:
- TLD: the top left cell of the input data
- Criteria: five cells in a column, defining the selection criteria
- Results: A range of two or more cells at the top left of the output range (the subroutine automatically resizes this range to accommodate the output data).
Also note that the code includes the original, very slow, routines (SumSelect and CopySum). These will work, but may take several minutes to process a large range of data.
Finally, the spreadsheet also includes another short UDF using the scripting dictionary, taken from Daily-Dose-of-Excel. This UDF counts the number of unique items from a comma delimited list in a spreadsheet cell. The download file includes an example, and the full code is shown below:
Function F_unique(c00) Dim sn As Variant, j As Long, c01 As Long, Dict As Scripting.Dictionary sn = Split(c00, ",") Set Dict = New Scripting.Dictionary With Dict For j = 0 To UBound(sn) c01 = .Item(Trim(sn(j))) Next F_unique = .Count End With End Function
See the DDoE post for an even shorter version.
Note that to use the scripting dictionary you need to create a reference to “Microsoft Scripting Runtime” under Tools-References in the Visual Basic Editor.