The previous post presented a User Defined Function (UDF) using regular expressions to extract all numbers, or a selected number, from any string (or a list of strings). In this post I will look more closely at the code used to achieve this, and present a general purpose UDF allowing regular expressions to be used on strings. A spreadsheet with examples of both functions, and full open-source code, can be downloaded from RegExpres.xlsb.
First the Regular Expression object must be created and given the appropriate criteria to return all numbers (0-9), the decimal character (“.”), the exponent symbol (“E”), and the plus and minus symbols (“+-“). It is also set to return all matching strings (Global = True) and to ignore the case of alphabetic characters (IgnoreCase = True):
Const ExpString As String = "E", pmString As String = "+-" 'Initialize variables Set RE = CreateObject("VBScript.RegExp") 'Criteria for Regular Expression REPatt = "[0-9" & DecString & ExpString & pmString & "]" With RE .Pattern = REPatt .Global = True .IgnoreCase = True End With
There follows some code to convert the input range to a variant array and count the number of rows; the numbers in each string are then extracted with the line:
Set Matches = RE.Execute(NumString)
As discussed in the previous post, for a string containing a single value, that is all that is required, but if there is more than one number all the number strings will be concatenated, and must be separated. This can be done by stepping through the original string and comparing the characters with those in the Matches variable, as shown below:
'Loop Matches collection and compare with each character of original string to build strings of all numbers in the sample string strNumber = "" NumMatches = Matches.Count NumChar = Len(NumString) j = 0 k = 1 PrevMatch = False For i = 1 To NumChar MatchString = Mid(NumString, i, 1) If PrevMatch = False And UCase(MatchString) = ExpString Then j = j + 1 ElseIf PrevMatch = False And (MatchString = "+" Or MatchString = "-") And _ IsNumeric(Mid(NumString, i + 1, 1)) = False Then j = j + 1 ElseIf MatchString <> IgnoreString Then Digit = Matches(j) If Digit = MatchString Then strNumber = strNumber + Matches(j) j = j + 1 PrevMatch = True ElseIf PrevMatch = True Then If Position = 0 Then OutA(Row, k) = CDbl(strNumber) ElseIf Position = k Then OutA(Row, 1) = CDbl(strNumber) Exit For End If strNumber = "" k = k + 1 PrevMatch = False End If End If Next i If PrevMatch = True Then If Position = 0 Then OutA(Row, k) = CDbl(strNumber) ElseIf k = Position Then OutA(Row, 1) = CDbl(strNumber) End If End If If k < Position Then OutA(Row, 1) = CVErr(xlErrNA) Next Row
Regular Expressions also have many other uses, and the Reg_Exp UDF allows them to be applied quickly and easily. The Regular Expression object has the methods “List”, “Test”, and “Replace”, and the screen-shots below show the use of these methods on a list of strings:
Extract all numbers
Extract all occurrences of the string “Node”
Find strings containing “Node”
Replace “Node” with “Number”
Find all four letter strings starting with “n” and ending with “e”
For more details of setting up regular expression patterns see:
VBA RegEx: How to Evaluate and Use “Regular Expressions”
Finally note that if you want to apply these UDFs to a range they must be entered as an Array Function to return all the results.