The Excel Lookup functions (including VLookup, HLookup and Match) all allow for an “exact” or “closest” match on numerical data, but the closest option has a number of problems:
- The data must be sorted
- For VLookup and HLookup the data must be sorted in ascending order, but if it isn’t the function may return an incorrect result, rather than #N/A.
- For Match the data may be sorted in either ascending or descending order, but if the actual order is either unsorted or different to that indicated the function may return an incorrect result.
- The terminology used for the argument defining the match type is non-intuitive, and inconsistent between the Lookup functions and the Match function.
- The default option (ascending sort) may produce incorrect results, whereas the option for an exact match will always return either a valid result or #N/A.
- With ascending sorted data the functions will return the last value less than the lookup value, rather than the closest match.
- With descending sorted data the Match function (with Match Type = -1) will return the last value greater than the match value, rather than the closest match.
- The match will only look at data in a single column. There is no built-in function to return the closest point in 2D, 3D, or higher dimension space.
Some of these problems may be avoided by using the Round function on the lookup value, then doing an “exact” lookup, but this can also return misleading results in some circumstances, and does not handle multi-dimensional data.
To deal with all these problems I have written a Nearest() user defined function (UDF) that works on unsorted numerical data with any number of dimensions, and will return:
- The coordinates of the nearest matching point
- The row number of the nearest matching point
- The distance from the lookup point to the matching point
Optionally a “maximum error” distance may be specified, and the function will return “No match” if there is no point within this distance.
A second UDF, Dist(), returns the distance between any two multi-dimensional points.
The spreadsheet, including full open-source code, may be downloaded from: Nearest.xlsb
The screen shots below illustrate the problems with the Lookup functions, and use of the Nearest UDF.
With sorted data and equally spaced data Vlookup returns the highest value less than the lookup value, rather than the nearest match. Using the Round function on the lookup value in this case returns the correct results:
With an unsorted list VLookup returns #N/A when the lookup value is less than the first data value, but if it is greater it returns the value before the first data value greater than the lookup value:
Setting the VLookup “Range_lookup” value to FALSE (i.e. an exact lookup), returns #N/A in all cases in this example, because none of the lookup values have an exact match in the data. Rounding the lookup value to an integer returns a match in all cases in this example, but not always the closest match. The Match/Index combination with “Match_type” set to -1 (descending sorted list) returns a value when the lookup vale is less than the first value in the lookup data, but this is not necessarily the closest match.
With an array of 2D (or more) coordinates the Nearest UDF returns the coordinates that are closest to the lookup points. The function will also return the row number of the matching coordinates or the distance from the lookup point to the nearest mach.
The lookup can also be carried out with the Index and Match functions:
- Generate a list of distances from the lookup point to each of the data points.
- Find the smallest distance with the Min function
- Use Match with the exact option (Match_type = 0) to find the row number
- Use Index to find the coordinates of his point>
The Nearest UDF has a MaxErr option that requires matching data to be within a specified distance of the lookup point. Reducing this value to 0.05 with the example data returns “No match” because the closest data point is 0.054 from the lookup point: