“Yields circularity when preceded by its quotation” …

yields circularity when preceded by its quotation.

Today’s post takes us on a little circular tour of the internet, starting with:

Grumpy Old Programmer (Mike Woodhouse) who returns from a long holiday from blogging to publish a nice little routine to generate a circular optical illusion:

GOPillusion

He refers to the (very compact) code as being “golfed”, with a link to Programming Puzzles & Code Golf, where I was introduced to the idea of a computer programming “quine”: a program which, when run, reproduces its own source code in full. A very brief example in Python is shown below (line 1 is the code, the second line the output).

Python quine
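
Because the original example is a screenshot, here is a text stand-in: one well-known two-statement Python quine. It may not be the one in the screenshot. It is followed by a few lines that run it and check that what it prints matches its own source:

```python
# A classic Python quine (not necessarily the one in the original screenshot).
# %r inserts repr(s), quotes included, and %% becomes a single %, so the
# printed line is character-for-character the same as the program itself.
import contextlib
import io

QUINE = "s='s=%r;print(s%%s)';print(s%s)"

# Run the one-line program and capture everything it prints
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(QUINE)

output = buffer.getvalue().rstrip("\n")
print(QUINE)
print(output)
print(output == QUINE)  # True
```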

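But I can do better than that, using “Classic” Lotus 1-2-3 macro script. If we enter /C~{D}~ in cell B1, give it a range name starting with a backslash, say \Q, and then press Ctrl+Q, /C~{D}~ appears in cell B2. (/C starts the Copy command, the first ~ accepts the current cell as the source, {D} moves the pointer down one cell, and the second ~ completes the copy.)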

That gives us a 7-character quine. Now if we add a {D} to the end, the code copies itself down a row, then moves down to the next row, where it reads and carries out any code it finds there. This results in a second copy and a second move down, and so on until the end of the universe, which, in the case of this no-longer-brief program, is row 65536 of the spreadsheet:

quine123

All this talk of Quine and quining reminded me of Douglas Hofstadter, and a search on his name led me to xkcd 917:

Hofquine

Which completes the circular tour with a link back to this blog, wherein a work of Douglas Hofstadter is reduced not to 6 words, but to a single two-letter word:

MU.

Hope you enjoyed the trip.

Posted in Computing - general, Drawing, Newton

Extracting numbers with regular expressions

Shortly after I wrote about extracting numbers from text strings, Winston Snyder at dataprose.org wrote a detailed article about using “regular expressions” to separate text from numbers in any string. I have adapted his routine for the same purpose as in the previous post, that is, to extract a single numerical value from a text string. The regular expressions approach has two main advantages:

  • The same function can be used to extract numbers from the left, right, or middle of a text string.
  • No delimiters are required.

The only drawback is that if the text string contains more than one number, the function will concatenate them if they are integers, or return zero if both contain decimal points.
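
To see why several numbers are merged or give zero, here is a rough Python sketch. It assumes the regular expression simply deletes every character other than the digits 0-9 and the decimal point, and then converts what is left. The pattern and the name extract_num are assumptions for illustration; this is not the actual ExtractNum VBA code:

```python
import re


def extract_num(text):
    """Hypothetical Python analogue of an ExtractNum-style function.

    Assumption: a regular expression deletes every character that is not a
    digit or a decimal point, and the remainder is converted to a number.
    Negative numbers are not handled in this sketch.
    """
    digits = re.sub(r"[^0-9.]", "", text)
    try:
        return float(digits)
    except ValueError:
        # Nothing left (no digits), or two decimals merged, e.g. "1.52.5"
        return 0.0


print(extract_num("Span 12.5 m"))    # number in the middle of the text
print(extract_num("abc 12 def 34"))  # two integers are concatenated
print(extract_num("1.5 and 2.5"))    # two decimals give zero
```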

The new ExtractNum function has been added to GetNum.xlsb and Text-in2.xlsb, and is shown in use in the screenshot below:

ExtractNum Function

This function is only scratching the surface of what can be done with regular expressions. For more details and links see dataprose.org.

Posted in Excel, UDFs, VBA

Dynamic sorting with Excel, VBA, and Python

A spreadsheet with User Defined Functions (UDFs) to dynamically sort a range of data has previously been presented here and here.

I have now modified the Python version of the UDF for improved functionality, added a second Python function, and added an example of how a dynamic sort can be accomplished without programming using the Rank() function.  The revised spreadsheet, including full open-source code, can be downloaded from: Sortfunc.zip.

In Excel 2007 and later, the easiest way to sort data is to insert it as a table. The data can then be sorted simply by clicking on the header of the sort column:

py_sort2

Table sorted on Column A

 

If you need a table that will update automatically when new data is entered, however, things are not so simple. The screenshot below shows the procedure using the built-in Rank() function:

py_sort1

This procedure requires 4 dummy columns to generate the required row index values, which are used with the Index() or Offset() functions to return the data. Note also that Rank() gives equal values the same rank, so the procedure cannot deal with two or more rows with exactly equal sort values; the values are therefore adjusted by subtracting a different, very small amount from each row.
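
The steps of the worksheet method can be sketched in Python to make the logic explicit. This is a hypothetical helper (rank_sort), not the spreadsheet's formulas. It breaks ties by subtracting a tiny multiple of the row number, ranks each value as Rank() would (1 = smallest), and then looks up the row holding each rank in turn, which is the job Index() does in the worksheet:

```python
def rank_sort(values, descending=False):
    """Hypothetical sketch of the Rank()-based dynamic sort."""
    n = len(values)
    # Step 1: make every value unique by subtracting a different tiny amount
    # from each row (here: row number * 1e-9)
    adjusted = [v - i * 1e-9 for i, v in enumerate(values)]
    # Step 2: rank each adjusted value, 1 = smallest, like RANK(x, range, 1)
    ranks = [sum(1 for a in adjusted if a < x) + 1 for x in adjusted]
    if descending:
        ranks = [n + 1 - r for r in ranks]
    # Step 3: for k = 1..n, find the row with rank k and return its original value
    row_of_rank = {r: i for i, r in enumerate(ranks)}
    return [values[row_of_rank[k]] for k in range(1, n + 1)]


print(rank_sort([3, 1, 3, 2]))                   # ties handled by the offsets
print(rank_sort([3, 1, 3, 2], descending=True))
```

The tie-breaking offset must be small enough not to change the order of genuinely different values, but large enough to survive floating-point rounding: with values around 1e9 an offset of 1e-9 is lost, and the ranks would no longer be unique.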

For situations where VBA is available, the VBA UDF shown below makes the whole procedure much simpler:

py_sort3

The only disadvantage of the VBA routine is that it allows only one sort column. This has been fixed in the revised Python sort function shown below:

py_sort4

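The options range, specifying sort columns and sort directions, may be any number of columns wide. The first row holds the sort column numbers, with the primary sort column first, and the optional second row holds TRUE for a descending sort on that column.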

The code for this function is shown below:

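from operator import itemgetter

from pyxll import xl_func


@xl_func("var SortRange, var SortCol: var")
def py_Sort(SortRange, SortCol):
    # No options given: sort whole rows, comparing the first column,
    # then the second, and so on
    if SortCol is None:
        return sorted(SortRange)

    numsortrows = 0
    if isinstance(SortCol, list):
        # Options range: row 1 = sort column numbers (1-based),
        # optional row 2 = reverse flags (TRUE = descending)
        numsortcols = len(SortCol[0])
        numsortrows = len(SortCol)
    else:
        # A single value: ascending sort on that one column
        numsortcols = 1
        SortCol = [[SortCol]]

    # Python's sort is stable, so sorting on the least significant key first
    # and on the first (most significant) options column last gives a
    # correct multi-column sort
    for i in range(numsortcols - 1, -1, -1):
        x = int(SortCol[0][i]) - 1
        sortrev = False
        if numsortrows > 1 and SortCol[1][i] is not None:
            # Excel numbers arrive as floats; sorted() needs a bool here
            sortrev = bool(SortCol[1][i])
        SortRange = sorted(SortRange, key=itemgetter(x), reverse=sortrev)
    return SortRange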
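
The multi-column sort depends on Python's sort being stable: sorting first on the least important key and last on the most important one leaves ties on the later keys in the order set by the earlier passes. A standalone sketch of just that technique, without Pyxll (multi_key_sort and its (column, descending) key format are hypothetical, not part of the spreadsheet):

```python
from operator import itemgetter


def multi_key_sort(rows, keys):
    """Sort rows on several columns.

    keys: list of (column_number, descending) pairs, most significant first;
    column numbers are 1-based, as in the worksheet options range.
    """
    # Apply the least significant key first; stability preserves that order
    # among rows that tie on the more significant keys applied afterwards
    for col, descending in reversed(keys):
        rows = sorted(rows, key=itemgetter(col - 1), reverse=descending)
    return rows


data = [["b", 2], ["a", 2], ["c", 1], ["a", 1]]
# Column 2 descending, then column 1 ascending within equal column 2 values
print(multi_key_sort(data, [(2, True), (1, False)]))
```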

A second Python function has been added, using the NumPy argsort function, for improved performance where there is only one sort column, and for use in other VBA and Python routines. Note that this function returns the zero-based row offsets of the sorted list rather than the sorted data; these can be used directly with the Excel Offset() function, or with Index() after adding 1:

py_sort5

The code for this function is:

import numpy as np
from pyxll import xl_func


@xl_func("numpy_array SortRange, bool RevSort: numpy_array")
def py_ArgSort(SortRange, RevSort):
    # Zero-based row offsets that would sort each column of SortRange;
    # kind="stable" keeps tied values in their original row order
    sortind = np.argsort(SortRange, axis=0, kind="stable")
    if RevSort:
        # Descending sort: reverse the order of the offsets.
        # A missing (None) RevSort is treated as FALSE.
        sortind = sortind[::-1]
    return sortind
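
Outside Excel, the link between the argsort offsets and the worksheet functions can be checked with NumPy alone. In this sketch the data are made up; the offsets pull the rows out in sorted order, Offset() could use them as they are, Index() needs 1 added, and reversing the offsets mirrors the RevSort option:

```python
import numpy as np

# A single-column range, as a numpy_array argument would hold it (made-up data)
data = np.array([[30.0], [10.0], [20.0], [10.0]])

offsets = np.argsort(data, axis=0, kind="stable")  # zero-based row offsets
sorted_values = data[offsets[:, 0], 0]             # rows taken in sorted order
index_rows = offsets + 1                           # 1-based row numbers for Index()
descending = offsets[::-1]                         # reversed, as for RevSort = TRUE

print(offsets.ravel(), sorted_values, index_rows.ravel(), descending.ravel())
```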

For more details of using array functions, see the Using Array Functions and UDFs page.

For more details of installing and running Python from Excel, using the Pyxll add-in, see Installing Python, Scipy and Pyxll.

Posted in Excel, Link to Python, NumPy and SciPy, UDFs, VBA

Salthouse

To show a bit of Southern Hemisphere solidarity with Jeff Weir (who has been getting a bit of flak for having the temerity to discuss non-Excel matters on an Excel blog), I’m bringing forward my next Bach instalment.

This post is guaranteed 100% Excel-free.

Salthouse are a new Scottish group who have just released their first album, although the group members are all established musicians on the Scottish folk and jazz scenes. The first link is from their first concert, which looks like it was held in a rather small shoebox.

Update 28 Mar 2014: This song is based on a poem by Lord Byron, but incorporates a number of other works, as listed at Salthouse.bandcamp:

From Byron’s classic poem of 1814, mixed with a few words of prison poetry by Scottish / Australian Bushranger / Highwayman / Bankrobber James Alpin McPherson (1842-1895) finished off with 3 verses from Ewan’s pen.

Verse 1 & 2 – G.G. Byron
Verse 3 – James Alpin McPherson
Verse 4 – 6, chorus and music – Ewan MacPherson (MCPS & PRS)
‘Berneray’ – Lauren MacColl (MCPS & PRS)

lyrics

She walks in beauty, like the night,
Of cloudless climes and starry skies,
And all that’s best of dark and bright,
Meet in her troubled eyes.

Waves in every raven tress,
Softly lighten o’er her face,
Lost but lined as ever strong,
Smiles from days of goodness spent.

And it’s down, down my lovely down
And it’s down, down my lovely down
And it’s down, down my lovely down
To the darkness deep and ever old.

Never a stone will sound tonight,
Beneath my horse’s lonely tread.
His sire was of the purest race,
That ever yet was born and bred.

Was not by silver stream we met,
Nor by rolling wave unseen.
I spoke she knew my only name,
I never gave it free or loud.

Then as I neared her on the track,
Her eyes looked ever into mine,
And wild as only weather knows,
She stole my heart and I her life.

So perfect my life shall never be,
And never a love can hold for me.
Whenever I look into the dark,
Her graceful form is near me still.

And here is the full text of the Byron poem:

She Walks in Beauty
By Lord Byron (George Gordon) 1788–1824

She walks in beauty, like the night
   Of cloudless climes and starry skies;
And all that’s best of dark and bright
   Meet in her aspect and her eyes;
Thus mellowed to that tender light
   Which heaven to gaudy day denies.
One shade the more, one ray the less,
   Had half impaired the nameless grace
Which waves in every raven tress,
   Or softly lightens o’er her face;
Where thoughts serenely sweet express,
   How pure, how dear their dwelling-place.
And on that cheek, and o’er that brow,
   So soft, so calm, yet eloquent,
The smiles that win, the tints that glow,
   But tell of days in goodness spent,
A mind at peace with all below,
   A heart whose love is innocent!

Poetry Foundation

The second piece is a “Setting Sun”, with a more spacious location:

 

Posted in Bach

Transfer of arrays to/from Python with Pyxll – Part 2; Speed

Following the previous post, which looked at how different data types behave when data is transferred between Excel and Python, this post looks at ways to get the best performance.

As a benchmark I have used a short routine that:

  • Reads a range of 3 columns x 1,048,570 rows from Excel (6 fewer than the maximum number of rows in a worksheet in Excel 2007 and later).
  • Sums the contents of each row and saves the results in a single-column array.
  • Writes the sum of rows array back to Excel to a range 3 columns wide (generating 3 identical copies of the array).
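
The core of the benchmark can be sketched in plain Python and NumPy. The function names and the random test data below are assumptions for illustration, not the code used for the timings. The explicit-loop version is the style of code that Numba compiles well (decorating it with numba.njit is the usual way to try that); the vectorised version is the plain NumPy alternative:

```python
import time

import numpy as np


def sum_rows_loop(arr):
    """Explicit-loop row sums (hypothetical benchmark kernel)."""
    nrows, ncols = arr.shape
    out = np.zeros((nrows, 1))
    for i in range(nrows):
        total = 0.0
        for j in range(ncols):
            total += arr[i, j]
        out[i, 0] = total
    return out


def sum_rows_numpy(arr):
    """Vectorised equivalent, returned as a single-column array."""
    return arr.sum(axis=1).reshape(-1, 1)


if __name__ == "__main__":
    data = np.random.rand(100_000, 3)  # smaller stand-in for 3 x 1,048,570 cells
    for func in (sum_rows_loop, sum_rows_numpy):
        start = time.perf_counter()
        result = func(data)
        print(func.__name__, f"{time.perf_counter() - start:.4f} s")
    # Three identical copies of the row sums, as written back in the benchmark
    out3 = np.tile(result, (1, 3))
```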

Benchmark results were checked for the following combinations:

  1. 5 different combinations of array type (var, numpy_array, and float[]) to pass the data between Excel and Python.
  2. As 1, but using the Numba compiler.
  3. As 2, but looping the Sumrows routine 100 times

The data in the source array consists of numbers and blank cells, but no text.

For the first series of runs, the data was read from Excel to a variant array in VBA, then passed to Python via Pyxll to sum the rows. The resulting array was then returned to VBA and written back to the spreadsheet. Typical VBA code is shown below:

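Sub Timepysub()
Dim Func As String, InRName As String, InRange As Range, OutRange As String, Out As Long, TRange As String
Dim timenow As Double, timea(1 To 1, 1 To 4) As Double, RtnA As Variant

    timenow = Timer
    ' Read the run settings from named cells on the worksheet
    Func = Range("Func").Value          ' name of the Python function to run
    InRName = Range("in_range").Value   ' input data range
    OutRange = Range("Out_Range").Value ' output range
    TRange = Range("trange").Value      ' range for the timing results
    Set InRange = Range(InRName)

    Out = Range("out").Value

    ' Pass the input Range object to the Python function and collect the returned array
    RtnA = Application.Run(Func, InRange, Out)
    ' Store the first two returned values, then the elapsed time so far
    timea(1, 1) = RtnA(1, 1)
    timea(1, 2) = RtnA(2, 1)
    timea(1, 3) = Timer - timenow

    ' Write the results back to the worksheet and record the total time
    Range(OutRange).Value = RtnA
    timea(1, 4) = Timer - timenow
    Set InRange = Nothing
    ' Write the timings only when Out is 2 or more
    If Out >= 2 Then
        Range(TRange).Value = timea
    End If
End Sub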

Note that the data range is declared as a range (rather than a variant, as I would normally do when working entirely in VBA).  This is necessary to allow the full array of 1 million+ rows to be passed to Python, using “Application.Run”.

The results with the different options are shown in the screen shot below:

Benchmark results; read and write data from VBA

It can be seen that:

  • In the first series, the fastest results were obtained using a var array for both input and output.
  • The Sumrows execution time was significantly shorter using a numpy_array, but the transfer times were much longer.
  • Using the Numba compiler significantly reduced the execution time of the Sumrows function in all cases, but the effect was much greater with numpy_arrays, where the time was reduced by a factor of about 400!
  • The much greater effect of Numba when working with numpy arrays was confirmed by looping through the Sumrows function 100 times. In this case the total execution with numpy arrays was more than 6 times faster than with float arrays, and the Sumrows function itself ran over 60 times faster.

The results when reading and writing from/to the spreadsheet directly from Python are shown below:

py_arrays2-2

The execution times for this case are significantly slower than reading and writing from VBA because:

  • Transferring the data takes of the order of 2 to 3 times longer than in VBA.
  • When using numpy arrays, blank cells are read either as ‘NoneType’ or as ‘numpy.float64’ with a value of ‘nan’ (not a number). Rows containing blank cells then return either an error or an incorrect value, so the Sumrows function has to check for blank cells. This greatly slows down the function: for the runs compiled with Numba, the execution time increased by a factor of over 100!
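
The effect of a blank cell read as nan, and one way of checking for it inside a loop, can be sketched as follows. This assumes the data is already a float64 array with nan in place of blanks (an object array holding None would need cleaning first); sum_rows_checked is a hypothetical name, and np.nansum is shown as the vectorised NumPy alternative:

```python
import math

import numpy as np


def sum_rows_checked(arr):
    """Row sums that skip blank cells read as nan (loop style Numba can compile)."""
    nrows, ncols = arr.shape
    out = np.zeros((nrows, 1))
    for i in range(nrows):
        total = 0.0
        for j in range(ncols):
            value = arr[i, j]
            if not math.isnan(value):  # a blank cell arrives as nan: skip it
                total += value
        out[i, 0] = total
    return out


data = np.array([[1.0, np.nan, 2.0],
                 [3.0, 4.0, 5.0]])
print(data.sum(axis=1))         # the row with a blank sums to nan
print(sum_rows_checked(data))   # blanks skipped
print(np.nansum(data, axis=1))  # blanks skipped, vectorised
```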

The results of using numpy arrays with dtype = np.float64 and not checking for ‘nan’ are shown in the screenshot below, where any row containing a blank returns 65535, rather than the sum of the two non-blank cells. Note, however, that if the data set contains no blanks, there is a huge improvement in execution time from not checking for ‘nan’, especially when using the Numba compiler.

py_arrays2-3

In summary:

  • When transferring large amounts of data, and where use of VBA is acceptable, read and write the data in VBA and pass it to Python using either Pyxll float[] or numpy_array data types.
  • If significant numerical processing is to be carried out in Python there can be a huge speed improvement by using the Numba compiler in conjunction with numpy_array.
  • If the numerical processing is limited, the float[] data type may be significantly faster.
  • If Numba is not used then the Pyxll var data type may be the fastest (but only marginally faster than float[]).
  • If it is necessary to read and/or write from Python, and the data may contain blanks, either read the data into a Python list of lists, or use a NumPy array and clean the data (by checking for values that are not of type “float”) before carrying out any numerical processing.
  • If it is certain that there are no blank cells, read the data into a NumPy array using dtype = np.float64, and use the Numba compiler.

Posted in Arrays, Excel, Link to Python, NumPy and SciPy, VBA