The Net Takeaway: SPSS and Python, Take 2

SPSS and Python, Take 2 · 06/19/2008 03:43 PM, Analysis

I’ve posted about SPSS’s jump into Python before, and I am still not loving the experience. I’m really trying to shift my stuff from Sax Basic (aka Winwrap Basic) to Python, and it’s just a pain. SPSS has provided tons of programmer-style docs, but very little that helps you understand the best way to approach a problem with Python.

Here’s what I have learned, with the help of Raynald Levesque’s fantastic book Programming and Data Management for SPSS 16.0: A Guide for SPSS and SAS Users (direct link to the book). You’ll also want to keep the SPSS-Python Integration package.pdf and SPSS Scripting Guide.pdf (docs for the SpssClient) files open and handy, both in C:\Program Files\SPSSInc\SPSS16\help\programmability (perhaps only after installing the plugin, but that’s where they SHOULD be).

(BTW, Raynald wrote the first few versions of this book, but to be fair, it’s becoming a work with input from a collection of SPSS staff as well, so kudos to them all. When you see me refer to Raynald’s book, mentally thank the rest of the SPSS gang who keep improving it…)

Don’t wait til the last minute to do this stuff. You’ll come out a better person on the other end, but getting there will create scars. I’ll try to post things that tripped me up, but start early getting used to this new world.

Installation is a pain. You need to install Python first. SPSS doesn’t say whether the ActiveState distro will work; they ship 2.5 by default. The ActiveState distro includes a much nicer IDE (PythonWin) and lots of helpful preinstalled modules for general programming; the SPSS distro includes NumPy and SciPy, which are handy for numerical programming… but includes no real IDE, which is kind of in keeping with the SPSS spirit of minimal programmer support. When this finishes, you install the Python Plugin. This is all on your CD, or you can download the plugin from the SPSS DevCentral Python Plugin page. Yes, you may need a free account to get to this, sorry. At that point, you are ready to go; if you want, you can go to the DevCentral downloads area and update some included scripts/programs, but it might be useful to get through this post first.

You probably know that you run “syntax” commands in the Syntax window; these commands tell SPSS to open datasets, process the data, etc. If you need to loop, there is a relatively little known extension called “Macro Language” which basically uses !variables and allows you to loop a command. While handy, it’s very limited.

Beyond macros, you could use Sax Basic Script to do more advanced things, including more sophisticated branching and looping, as well as modification of the Output Viewer tree, etc.

So, syntax is still syntax, the macro language is still there, and Sax Basic is still hanging on by a thread… but SPSS is hoping to replace the last 2 with Python. Macro stuff becomes a Program, and Sax Basic stuff becomes Scripts.

Huh?

There are actually 2 types of things you can do with Python inside of SPSS: Python Programs and Python Scripts.

What is the difference? Well, a Python Program can be inline with your SPSS syntax, just like a macro. You can do the following (cribbed liberally from Raynald’s book):

Sounds like a lot, right? Well, not quite. You don’t have full access to the output tree, so massive reshuffling of output is not available. Since that’s about half my time with SPSS, it is disappointing that this is not exposed. But honestly, you will probably never write another macro again. The only thing you need macro variables for is to pass information out of the Python portion back into regular syntax (Raynald describes this technique).

These programs use the phrase “import spss” as one of their first lines, which is the library SPSS coded up to expose their functionality to Python.
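
As a rough illustration of the kind of looping that used to need a macro, here is a minimal sketch that builds one FREQUENCIES command per variable as plain strings. The variable names (SEX, AGE, INCOME) and the helper name frequencies_syntax are hypothetical; inside a BEGIN PROGRAM block you would hand each string to spss.Submit rather than print it.

```python
# Sketch: build the syntax that a macro loop would otherwise generate.
# The variable names below are hypothetical placeholders.

def frequencies_syntax(variables):
    """Return one FREQUENCIES command per variable, as plain strings."""
    commands = []
    for name in variables:
        commands.append("FREQUENCIES VARIABLES=%s /ORDER=ANALYSIS." % name)
    return commands

commands = frequencies_syntax(["SEX", "AGE", "INCOME"])
for command in commands:
    # Inside BEGIN PROGRAM ... END PROGRAM you would pass each string to
    # spss.Submit(command) instead of printing it.
    print(command)
```

Building the commands as strings first also makes it easy to eyeball them in the Log before letting SPSS actually run anything.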

Ok, so what about the Python Script? These use a different library, the “import SpssClient” library. These focus on the stuff left out above, specifically:

Scripts are very akin to SaxBasic, which lived in a separate window (File | New | Script) and was run only via the SCRIPT command (if you are on Windows). Well, same limitations here. According to the docs (PDFs and help file), these Python Scripts cannot be used inside a Syntax file ala Begin Program / End Program (we’ll talk more about this below): “Python scripts can be run from Utilities>Run Script or from the Python editor launched from SPSS (accessed from File>Open>Script).” They don’t mention the SCRIPT command, which is kind of a huge omission. I understand that it’s Windows only, but that is a huge part of the SPSS userbase.

(Note: Though it is undocumented here, some comments from SPSS folks imply that the SCRIPT command will be improved in future versions, is an acceptable way to call Python, and may even allow parameter passing in later versions)

So, why the two? If you think about it, these Scripts are the “interface/windowsy” side of the system, while the Programs are more about actual data processing. Or, you could say that the Programs are focused on the “back end”, while the Scripts modify the “front end”, the “client”. This analogy falls apart if you push it too hard, but for most cases, it works. See more at SPSS Scripting Facility > Scripting with the Python Programming Language in the SPSS Help system.

So, in short: Programs are Python in your Syntax file, ala macros. Scripts are Python that run outside of your Syntax file, either by manual calls (File Run) or SCRIPT commands. I suspect there is no real reason for this split other than the way SPSS is programmed. I can only hope they eliminate this arbitrary distinction and confusion at some point in the future. In the meantime, Programs are what you will do most of the time, and Scripts will be the way to make things pretty.

By the way, Python is a full language on its own. So, you could write all your analytic stuff in Python, using it to read and process a file, call SPSS to read the resulting file, call SPSS to analyze some stuff, call SPSS to save the output, and then finish it off in Python. SPSS would show up in the background here and there, but you would never see it. This causes no end of confusion to authors who feel they need to cover both the “SPSS calling Python scripts/programs” and “Python scripts/programs calling SPSS” situations. Here’s my simple advice on it: try everything in SPSS first, and when you are an expert in running things from the SPSS environment, then become batch-master. Why the authors of these docs don’t write their articles this way is beyond me; lots of confusion could be avoided. (Probably the same editor who forgot to mention the use of the SCRIPT command to call external Python Scripts)

Raynald gives some clever examples of using Python to create dialog boxes to let users select variables, etc. In effect you could build a mini, constrained front end on top of SPSS to run just a single analysis for students or for clients who need analytics but want the complexities hidden away.

If you do any searching, you’ll see people on the SPSS-X list whipping out some cool Python with the “import viewer” command at the top. Sorry, that won’t work with V16 as of the writing of this article; the viewer “library” (aka module) has not been updated for v16. Some of it can be rewritten to work with the “import SpssClient”, but it’s an adventure. The SPSS Developer Central Downloads section shows some other modules to play with, but not all of them are ready for V16 (like the “tables” module), so be prepared to experiment. With my V16, I had spss, SpssClient, spssdata, and spssaux modules pre-installed.

I’ll try to give some hints on how to deal with the Output Viewer down below.

It winds up looking like this… just type the below into a syntax window and fire away.
BEGIN PROGRAM PYTHON.
import spss
print "Welcome to Python in SPSS!"
END PROGRAM.

The “PYTHON” is optional, but it’s good form. All the output shows up in Log sections in the viewer. Between the Begin and End, you are using Python, which means no periods at the end of lines, upper/lower case matters, and spacing/indenting is how you make sections/blocks of code.
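
A tiny sketch of those rules, using made-up names (total, Total, message) purely for illustration: case distinguishes two variables, and the indented lines under the if and else are the blocks.

```python
# Case matters: these are two different names.
total = 1
Total = 2

# Indentation marks the block; there is no END IF and no trailing period.
if total != Total:
    message = "total and Total are different variables"
else:
    message = "same variable"

print(message)
```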

I refer you to Raynald’s great book starting on page 219 for how to start making SPSS dance from Python. The best book on Python, in my opinion, is Hetland’s Beginning Python but there are a couple of pretty good ones, as well as free tutorials online. But a book is pretty handy.

For the SPSS portion, besides Raynald’s book, look for SPSS-Python Integration package.pdf and SPSS Scripting Guide.pdf (docs for the SpssClient), both in C:\Program Files\SPSSInc\SPSS16\help\programmability.

As I said, accessing output from these Python programs is a mixed bag. You can actually turn output into data (like making a dataset out of a frequencies output, something that SAS has done for years) by “walking the XML tree” of output, but it’s confusing. This can also be done with “OMS” syntax, but I still don’t understand that stuff, and it’s been around for years. Simplification here would be VERY APPRECIATED. Also, note that you can get to the output, but you can’t really reformat or shuffle it. If it’s a pivot table, you can do some stuff to it, but if it’s not, well, the program access is limited. (Yes, I know it can be as simple as http://support.spss.com/Tech/Troubleshooting/ResSearchDetail.asp?ID=40945 but somehow, mine always seem to be more complicated).

Once you get it working, there are some cool things you can do, including using Python to make a GUI for a custom experience, or using “SPSS Extension Commands” to make new commands in syntax which call Python: In effect, you never have to deal with the Python junk, you just use your new command in syntax just like usual.

(BTW, if you run the SpssClient externally, remember to use SpssClient.Exit() before you do the .StopClient())

One of my most popular scripts is something I put on Raynald’s SPSSTools.net site, called Change The Label and Title of Last Run Procedure. It lets you change the labels and titles of pieces of output, so you don’t have a list of 20 “Frequencies”, but instead can label them “Freq of Gender filtering high income” or whatever, making the viewer tree much more usable. This is in SaxBasic, so I decided to make it work in Python.

This took some doing. The basics are the same, but I struggled with some Pythonic pieces.

The biggest problem: The script assumes you will pass it the new label as part of the script call. This works fine in Sax Basic in V16, but SPSS didn’t include a way to pass parameters into external Python scripts. I will post a workaround to this in a bit.

How does it work? Well, there are objects your script can play with. There is the SpssOutputDoc, made up of SpssOutputItems. SpssOutputItems include Pivot Tables, Headers, Charts, Text Items, Title Items, and Log Items. Each analysis creates a package of output items in the tree. I intend to walk back up the tree from the bottom and find the most recent title, and change it.

Say we run FREQUENCIES VARIABLES=SEX /ORDER=ANALYSIS. on a blank Output Viewer.

We get back, on the Tree Pane:
Frequencies
—Title
—Notes
—Active Dataset
—Statistics (which is a pivottable)
—Sex (which is a pivottable)

If we were to walk the tree with something like SCRIPT file="C:\Python25\Lib\site-packages\spss160\spss\titleheaderwalker.py". in my Syntax window, with the script below plopped in that directory, we get output which I’ve put in a table below.


# titleheaderwalker.py
import SpssClient
SpssClient.StartClient()
objOutputDoc = SpssClient.GetDesignatedOutputDoc()
objOutputItems = objOutputDoc.GetOutputItems()
for index in range(objOutputItems.Size()):
    objOutputItem = objOutputItems.GetItemAt(index)
    print "=================================================="
    print "Index = "
    print index
    print "Description = "
    print objOutputItem.GetDescription()
    print "GetType = " 
    print objOutputItem.GetType() 
    print "GetTypeString = " 
    print objOutputItem.GetTypeString() 
    print "SpecificType = "
    print objOutputItem.GetSpecificType() 
    print "SubType = "
    print objOutputItem.GetSubType()
    print "TreeLevel = "
    print objOutputItem.GetTreeLevel()
print "=================================================="
print "done!"
# SpssClient.Exit()
SpssClient.StopClient()



Index  Description     GetType  GetTypeString  SpecificType    SubType      TreeLevel
0      Output          ROOT     Blank          SpssHeaderItem  Blank        0
1      Log             LOG      Log            SpssLogItem     Blank        1
2      Frequencies     HEAD     Blank          SpssHeaderItem  Blank        1
3      Title           TITLE    Title          SpssTextItem    Blank        2
4      Notes           NOTE     Notes          SpssPivotTable  Notes        2
5      Active Dataset  TEXT     Text           SpssTextItem    Blank        2
6      Statistics      PIVOT    Table          SpssPivotTable  Statistics   2
7      Sex             PIVOT    Table          SpssPivotTable  Frequencies  2
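
The “walk back up the tree from the bottom” idea can be tried without SPSS at all. The sketch below copies a few columns of the table above into plain tuples and looks for the most recent TITLE, along with the item directly above it. The helper name find_last_title is hypothetical, and it assumes the header always sits immediately before its title, which holds for this output but is not guaranteed in general.

```python
# Rows copied from the table above: (index, description, type, tree level).
items = [
    (0, "Output", "ROOT", 0),
    (1, "Log", "LOG", 1),
    (2, "Frequencies", "HEAD", 1),
    (3, "Title", "TITLE", 2),
    (4, "Notes", "NOTE", 2),
    (5, "Active Dataset", "TEXT", 2),
    (6, "Statistics", "PIVOT", 2),
    (7, "Sex", "PIVOT", 2),
]

def find_last_title(rows):
    """Return (title_index, header_index) for the most recent title, or None."""
    # Walk from the bottom of the tree toward the top.
    for position in range(len(rows) - 1, 0, -1):
        index, description, item_type, level = rows[position]
        if item_type == "TITLE":
            # Assumption: the header is the item directly above the title.
            return index, rows[position - 1][0]
    return None

result = find_last_title(items)
print(result)
```

With the rows above, the title is item 3 and its header is item 2, so those are the two items a relabeling script would need to touch.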




So, using my previous code as a guide, we get the following:


import SpssClient
SpssClient.StartClient()
thelabel = 'New Label for Output!'
# Want to change the most recent Title, then its Heading
objOutputDoc = SpssClient.GetDesignatedOutputDoc()
objOutputItems = objOutputDoc.GetOutputItems()
# Walk back up the tree from the bottom so only the last procedure is relabeled
for index in range(objOutputItems.Size() - 1, -1, -1):
    objOutputItem = objOutputItems.GetItemAt(index)
    if objOutputItem.GetType() == SpssClient.OutputItemType.TITLE:
        # Fix the Title first
        objTitleItem = objOutputItem.GetSpecificType()
        objTitleItem.SetTextContents(thelabel)
        objOutputItem.SetDescription(thelabel)
        # Back up one for the header (assumes it sits directly above the title)
        objHeaderItem = objOutputItems.GetItemAt(index - 1)
        objHeaderItem.SetDescription(thelabel)
        break                 # Stop after the most recent title
print "done!"
# SpssClient.Exit()
SpssClient.StopClient()

Call this with the SCRIPT command from your syntax right after running a procedure, and it will change the title and heading to whatever is in that line next to “thelabel”. Useful, but only so much.

Why? You shouldn’t have to edit the python file every time you need to label something! That’s silly.

I’ll show in a later post how I worked around that problem. Anyway, compare the SaxBasic version to how I did it here, and you’ll see that they are pretty similar, give or take.

In the SPSS Help file, search for “Script Editor for the Python Programming Language”




PS: I keep giving Raynald props, but Jon Peck of SPSS is tireless, an absolute robot, when it comes to helping people on the forums and mailing lists with these kinds of problems. That man is a living, breathing SPSS processor and is a daily lifesaver. He should be knighted.

* * *

 

  1. Thanks for the review. And I agree about Jon Peck!

    It does seem that some things done easily in SAS require backflips in SPSS but maybe it is just because I’ve spent so many years using SAS.


    AnnMaria    Feb 16, 06:14 PM    #

