The Net Takeaway: Page 25


Danny Flamberg's Blog
Danny has been marketing for a while, and his articles and work reflect great understanding of data driven marketing.

Eric Peterson the Demystifier
Eric gets metrics, analytics, interactive, and the real world. His advice is worth taking...

Geeking with Greg
Greg Linden created Amazon's recommendation system, so imagine what he can write about...

Ned Batchelder's Blog
Ned just finds and writes interesting things. I don't know how he does it.

R at LoyaltyMatrix
Jim Porzak tells of his real-life use of R for marketing analysis.







More Protection from SPSS? · 12/14/2004 06:56 PM, Analysis

Hmm… I saw a note on the discussion list with concerns about getting SPSS to install with errors of “hardware key not found”, here

I usually ignore such things as misfires (“SPSS doesn’t use a dongle… others do, but not SPSS! You must be confused”), but then an SPSS employee replied on the list (thank you!) with a suggestion here

So, that user’s problem was solved… But then I realized: Yes, SPSS is using hardware dongles! My installs don’t currently, so I guess I am just lucky. And if they will be requiring registration and hardware dongles, then forget it. It’s just not worth it. There are other options out there which aren’t trying to control me while asking for my money (Thank you sir, may I have another?)

But before I jump to conclusions, let’s see what the SPSS web site says. A Google search reveals the SPSS 13 licensing page, which is a masterful attempt to spin something with no consumer benefit into a “win-win”…

“This new technology will allow us to add flexible new licensing options in the future, while helping you maintain the terms of your license agreement.”

Right, sure. At least there’s no mention of a hardware dongle. But a search for “hardware key” in the support section reveals a few recent articles:

1 SPSS 38897 No hardware key was found on Toshiba laptop running Windows XP
2 SPSS 32109 Error 2051. No hardware key was found.
3 SPSS 32111 Error 2063. The hardware key device driver was not found.
4 AMOS 32111 Error 2063. The hardware key device driver was not found.

Some of these are from April 7, 2003. So, this has been going on for a while…

Why haven’t I been dongled? Perhaps because I am a good customer? Or perhaps it’s only in university situations where they want to control usage? I don’t know, but I don’t like it.

It’s just one more straw pushing users like me to look at other solutions. Besides the testing we have in place to use more of R, we are also talking again to SAS and Statistica by Statsoft. Why not? After all, there is nothing SPSS can do that SAS or Statsoft can’t duplicate, and though SAS is onerous in its renewal fees, at least I don’t feel like they are trying to control every action I take.

SPSS will do what they think they need to do to survive as a business, and I respect that. And I like most of what SPSS is doing. But sometimes, letting customers sing your praises by loosening the reins becomes much more powerful than forcing them into line with sticks.


* * *


Desktop Search, or just where did I leave that knowledge? · 12/13/2004 02:27 PM, Tech Search

(If you are here from a search, thanks! This article is part of a series on desktop search. There have been updates, changes, all sorts of new posts, and the best way to see them all is with the “Browse your favorite category” drop down to the right. Choose “Search” and have fun, or click here.

Feel free to start with this one, it’s the first, but then read the rest for updates.)

(Yet another update: Yahoo announces a desktop search based on X1. Good idea, except X1 phones home (more details below), and some people report very heavy CPU usage during indexing. Will phoning home be part of the (presumably free) version of YDS? And will X1 continue to sell their own version with extra features, or retrench to a corporate market?

Ask Jeeves released their Desktop Search here

I’ve installed Copernic. More below… but I found a major flaw that is a dealbreaker for me. Uninstalled, reluctantly.

Yes, Google Desktop Search has been released. I talk about it a bit but I don’t love it. Blinkx has a new version which corrects many deficiencies, but still just feels middling; in addition, their smart founders have moved on, so it’s unclear where this will end up.

More links to free/open-source options added, for those who prefer to see the details of what’s running on their machines. )

Desktop search is the new hotspot. While some folks say that consumers don’t care, I disagree… everyone I’ve talked to can’t wait to find the one that works. The need for “web integration” or “index all my media as well as my info” is less important, but if someone does it well, then maybe it will take off.

While we all expect the OS to help with the search problem, Windows Search has gotten more and more deficient with each version. Even with all my hacks, I still can’t get WinXP search to search all the files I want it to. Searching for an old email in Outlook is a nightmare. Apple’s Sherlock is pretty good, and has lots of plugins, but isn’t as fast or as flexible as it could be, according to some critics. Now that Google and Yahoo have upped their mail limits, expect searching of hosted mail to be a big deal as well (and more on these players down below).

How do these work and differ?
There are two ways to search a large volume of text and data:

  1. Make an index of terms and search that
  2. Open each file and search it

Both have their advantages and disadvantages. For example, building the index means one has to, well, build and store the index. This takes time, CPU, and drive space. In addition, if the index is not well designed, there are limits to the types of searches which can be done. Also, no searches can be done until the index is created, and if it’s not kept updated, then searches return erroneous results.
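To make the index approach concrete, here is a minimal sketch of an inverted index in Python (the file names and contents are made up for illustration; real desktop indexers also handle tokenization, incremental updates, and on-disk storage):

```python
from collections import defaultdict

def build_index(docs):
    """docs: dict mapping file name -> text. Returns term -> set of names."""
    index = defaultdict(set)
    for name, text in docs.items():
        for term in text.lower().split():
            index[term].add(name)
    return index

def search(index, term):
    """Look the term up in the index instead of rescanning every file."""
    return sorted(index.get(term.lower(), set()))

docs = {
    "a.txt": "the quick brown fox",
    "b.txt": "the lazy dog",
}
index = build_index(docs)
print(search(index, "fox"))   # prints ['a.txt']
```

The one-time cost of building (and later refreshing) the index is exactly what buys the fast lookups: the trade-off described above.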

On the other hand, opening each file becomes problematic as you get more and more files. While the index can leverage all sorts of speed tricks, there are not many ways to “loop over directories recursively, open each file, search either line by line or byte by byte for pattern, print if found, repeat”.
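The brute-force alternative can be sketched in a few lines of Python (a simple recursive grep; real tools add binary-file detection, encoding handling, and so on):

```python
import os
import re

def grep_tree(root, pattern):
    """Walk the tree under root, open each file, and yield matching lines."""
    regex = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    for num, line in enumerate(fh, start=1):
                        if regex.search(line):
                            yield path, num, line.rstrip()
            except OSError:
                continue  # unreadable file: skip it and keep going
```

There is no index to build or maintain, but every search re-reads every file, so it slows down as the collection grows.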

Also, each tool searches different sets of your files. Some also search Outlook emails, some search Outlook Express emails, some search MP3 and image tags. What data each reads is up for grabs: some can only handle text, some can read Office formats, and some can handle PDF. Some read email attachments and index them too; some don’t. Some will index network drives/folders, while others only read local drives until you pay to unlock additional capability.

Some allow sophisticated search capabilities. Some allow wildcards, stemming, phrase search, boolean and/or, or even full regex. Some allow “find where word near word” and some have “relevance” searches. So, depending on your need, you may need to shell out some bucks, or accept that some clever and useful search techniques will not be available to you.

So, what tools are out there to search your drives?

Index-based Searchers

Blinkx is gradually moving up the curve from hidden to hot. They are trying to mix web and local search, and have some clever visualization. I don’t love the “web and your stuff” mix in searching; I trust my stuff but the web data needs verification (this is even more annoying with Google Desktop Search: I find it really frustrating for it to ping out to the wire when I am trying to find one of my emails). (BTW, if you want to see great visualizations, play with Kartoo; requires Flash.) It appears to handle (from their FAQ) email, .txt files, Adobe PDFs, PowerPoint presentations, Excel spreadsheets, and Word documents. Their new 2.0 includes “Smart Folders” (more details here) as a saved search/view. This has gotten lots of buzz, but no one has said to me “this is the one to get”, so I haven’t tried it yet. (BTW, Blinkx has nothing to do with BlinkPro, the bookmark tool I recommend elsewhere).

dtSearch has been doing this stuff for a while, and was one of the first to offer a desktop search. They create an index and, depending on your spend, can even create a shared index for a shared network search, like a private company search engine. This stuff is not cheap, with the entry level at $200, but it’s very powerful, designed for large collections of data (say, all of your invoices for the last 10 years). This is probably the most sophisticated tool out there, and so is probably overkill for most users.

X1. You know, I really, really want to like X1. They do lots of things well. Their unique feature is the “search while you type” approach. X1 indexes the usual stuff, and then as you type, the list of hits shrinks to include only the matches (if you remember Lotus Magellan from the 80s, this will be a welcome return of an old friend; the Magellan gang helped create this tool). It’s a nice graphic tool, with separate tabs for each of the searches you may want (email vs. files vs. contacts, etc.). Other tools can mix all “types” of files in search results, but X1 keeps them on separate tabs. It can’t handle wildcards, but forum posts imply that this is coming.

So, what bugs me? It’s really expensive, starting at $100 (though sometimes discounts knock it down to $75 or so). No support for wildcards: if you are looking for words ending in “raro”, for example, there is no "*raro" or "?raro"... all search terms must start a word. Also, it phones home at each use, primarily to check whether it’s a pirated version. As I’ve said elsewhere, I don’t pay money to be watched, and if they have to stop pirates, they should do it without having my machine send info about me back to the company. It’s in their privacy policy, so they aren’t doing anything illegal… but it still sucks. So, for this price, and with a phone-home “feature”, it’s cool, but not cool enough to put up with its problems.

Enfish has been in this game for a while as well; I beta tested a couple of early versions. The product was not only a search, but a powerful integrator of the various Outlook data types, way before MS built some of that into Office. So, for example, you could search for a name, and results would include emails to that person, from that person, their contact info (if you have it), maps, all sorts of data. It also searched the web, etc. It builds an index, and searches the usual suspects, with a file viewer built in.

It looks like they have now separated the search piece into a standalone tool, Find, for $50 and the full “integrator” for $200. When I tested it, I liked it, but it was somewhat slow to index and, because of all the graphics and integration, somewhat slower “feeling” than the other tools. The file preview feature was pretty nice, and if I recall, that same feature allowed it to search a much wider variety of file types than most of the tools I looked at. But I haven’t really used it in over a year, so try the trial yourself. If you have lots of older file formats and still want to search and view them, this may be your best choice.

Copernic Desktop has been in the “search aggregation” space for a while, with a client-side tool “Agent” which would aggregate search results from multiple engines. While I never found much use for it, some friends of mine swore by it. Well, they now have a desktop search, which works like their previous products: A free entry level, and a for-pay advanced version. The entry level looks pretty good, with an attempt to be “light” on resource use, and searches a wide variety of media types (Video, MP3, Favorites, History, etc.) These are worth looking at if you don’t already use a tool to index these; it feels rather bolted on otherwise. And yes, it indexes mail and the usual suspects in documents as well.

Its biggest flaw in my use is that it doesn’t index the entire file tree. That is, I expected it to log the location of every file, no matter what type it is, and then index the contents of files it knows how to read. Instead, it only reads files that it can index… meaning text and various Office formats, as well as (by default) .mht and .zip. Go to the Advanced Options page and you see that, under “Additional file types to index (name and properties only)”, you have to manually add the extension of each type of file you want listed. This is rather silly. There is no way to add *.* or other “list ’em all!” options. I will not be hand-typing in the hundreds of extensions that files on my drive use; if anything, I would rather have a “do not list” extension field for the few file types that I know I will not care about. This makes it a dead end for my use; if I am trying to find a file, I shouldn’t have to have had the prescience to add it to an index list.

Also annoying is that, by default, if it doesn’t have focus, it acts like it’s minimized and stops indexing until 30 seconds of non-use. Yes, this can be turned off, but it takes some hunting in the options.

Look-and-feel-wise, it’s pretty nice, with a very “Windowsy” look. (And it’s based near me in Newton, Mass., but don’t let that sway you.) It does let you index network drives if they are mapped as a drive letter, but not by the \\\\fileserver approach (this was also mentioned by some commenters on a previous version of this entry). As this is something many of the other free tools do not do, that’s a plus. But given the “only list files with certain extensions” problem, I don’t know how useful this will be.

So, in summary, for indexing a variety of “media types” in an attractive tool, Copernic has done a nice job. But by restricting the “file location” indexing only to files that a) have an extension and b) have that extension manually typed into a one-line box, this tool is a huge letdown, not allowing me to locate files that I know are currently on my drive. If I wanted that “functionality”, I would just use Windows Search, which also won’t locate files unless they have a certain extension. If someone can tell me how to get Copernic Desktop to list every file (even if it doesn’t index the contents), I’ll gladly update this entry. But until then, I won’t be using this tool. And it’s too bad; they did so many other things quite right. (BTW, Copernic just got acquired by a Canadian company, aka “the Mother of all search engines”).

DiskMeta Lite is the free version of the DiskMeta indexer. The “free” one is for non-commercial use only, and only indexes .txt, .doc, and .html. They have personal (around $50) and pro (around $100) editions. Looking at the Pro edition: they do appear to mention some extensions not often seen (like the .CHM requested by one of the commenters) as well as “morphological support of the English language”. Also, no bones about it, they clearly support “local network shared folders and network-mapped drives”. No, I haven’t tried this one yet either.

AskSam plays in the dtSearch space. A professional searching tool / freeform database, you basically “import” information not into an index, but into a database system. Then, you add additional info as you find it or create it, and it’s all searchable. This becomes a bit different from “Where did I put that file?” and more “if I am going to dump all my info somewhere, where should I put it…” Starts at $150. Similar “organizer” tools include Infoselect for $250, Zoot for $100, InfoRecall for the affordable $40, and NoteLens for the even more affordable $20.

Lookout is my current tool of choice. A beta product which went free when the two-programmer shop (two ex-Netscape guys, BTW) was acquired by Microsoft, this tool integrates into Outlook as a toolbar. It has fast search, pretty fast indexing, and a combined output of the various things it searches. It also has flaws: the output window has no right-click menu to move, delete, or otherwise act on the results (while X1 does offer this). Instead, you have to open each item to make changes. In addition, this requires the .NET Framework 1.1, which doesn’t hurt anything, but is yet another thing to install. While one guy took the money and moved on, the other has stayed with MS and is still responding to user issues on the forum. A pretty strong query language, but overall a “no-frills” product. But for the price (free!), it’s very fast, and has become my turn-to tool again and again. Recommended to try, if you are an Outlook user.

File Searchers
If you are not into indexes, there are also GREP type tools, mostly text based, but some with GUIs. These will each be “open each file” tools, so keep in mind that they can search from the moment they are installed, but speed will vary with how much junk you make them search.

Wingrep is a $30 shareware tool which searches via Regex (and soundex, cool!) and even inside Zip files.

Astrogrep is an open-source project which works pretty similarly. You can search via regex or simple wildcards, and it can read most text files, but not binary or zip files. Of course, it is free open source, and if you want to add additional features… the author welcomes it. It lacks some of the niceties of sorting output, etc., but for free, it’s pretty nice.

Agent Ransack is the free (or “lite”) version of another search tool, FileLocator Pro. While the feature list seems pretty basic, it is more powerful than it looks, it is free, and even the “pro” version is only $13, so if you like it, it’s easy to buy (as compared to the overpriced X1).

A free indexer that I completely forgot about until I got an email reminder is Wilbur, formerly commercial, now free and GPL, for Windows only at this time. It’s an indexer, can index inside zip files, and can handle PDF files. It appears to have been last updated around April 30, 2004, so it is still under active development.

Ones I haven’t tried but should include:
Avafind shareware
AppRocket shareware
Filehand is .net and used to be shareware but now is free.

Summary, Other Options

The portals are also playing in this space. Terra Lycos’ HotBot has had a desktop search tool for a while, but I’ve never tried it. MSN/MS owns Lookout, but they have also hinted that the MSN Toolbar will include desktop search in a near-term release, though some have hinted that it will only be for paying MSN customers. Ask Jeeves bought Tukaroo before they had a chance to do much more than show their product to insiders. Google has also dropped hints that their toolbar may incorporate desktop search at some point soon (though how one will calculate PageRank for my SQL query text file is beyond me).

Obviously, Google Desktop Search is now out, and everyone has written how they love it or hate it. I think it’s silly, but there are enough positive things about it that you should give it a try. Yahoo will be releasing a repackaged X1.

Most of these will be indexers, not file searchers, so look forward to a period of indexing before searching, and of course, the need to update your index so your searches are not out of date.

And, yes, of course, you can pull down numerous open source projects like Lucene and make your own search engine… but that’s really a pain. And, just to point it out, that’s basically what Lookout did, so save yourself the effort and leverage what others have created.

There are a couple of desktop searchers already put together with Lucene if you really do want the fully open-source approach. Via Jamie’s Weblog, you can look at Docco. Others have mentioned Lucene Desktop and the “command line interface” to Lucene (more of a testing tool, but you get the idea) found here: Lucli. And if you really, really want to play with the edge of technology, Beagle uses the nascent Mono project (duplicating the Microsoft .NET structure in open source). Still open source and akin to Google Desktop, it lets you host and run your own personal search engine for your mails and docs… and you can access it from anywhere if you set it up correctly.
X-Friend is another Lucene based engine, all in Java; currently free.
Baagle is an open source attempt to duplicate Google’s desktop search.
SWISH-E has also been suggested for unix/linux folks.

CollectiveCortex has a free trial but is ultimately commercial; “like X-Friend on steroids” is how one Slashdotter explained it.

So, right now, Lookout is my choice. I have tried and removed Copernic (but will put it back on if they index the entire file tree) and should try Blinkx at some point. I liked X1, but until they lower the price and remove the phone-home, it’s not an option for me and I can’t recommend it. Enfish and some of the others are nice if you have special needs, but for your average analyst, I suspect you won’t go wrong with Lookout, Copernic (with reservations), or Blinkx.

I don’t know if any of these are what I’m looking for. If nothing else, they all store their info locally which makes them great for speed, but bad for “distribution”. If I’m not at my machine, then I don’t have my knowledge. That is, if I’m searching for something on my machine, I’m already sitting at it. But if I need to look up info, I would rather have an online knowledge-base of some kind, ala a wiki.

BTW, if this stuff is interesting, you may want to check out Amit’s Blog for a different pov. In addition, the CNET gang give their 2 cents here.

PS: Windows search still sucks, but someone on Slashdot suggested this to search by content: 1. Start the Indexing Service and wait for it to index your drives. 2. Search (Win-F), and prefix your search string with “!”.
I have no idea if this really does anything. More info here and here. A whole site dedicated to this topic; I learned new stuff on almost every page:

(Impressive matrix comparing searchers… Not sure how often it will be updated.)

Comments? [20]

* * *


R doesn't want "newbies"... and that's a mistake. · 12/08/2004 03:33 PM, Analysis

The R Project for Statistical Computing team has an impressive mailing list called R-Help… but recently, the classic “we don’t want to answer boring frequent questions, we want esoteric questions” split has come. I’ve commented before on how rude this group is compared to, say, SPSSX-L, but over the last two months, things have really gotten snarky.

It starts here in November with a pointer to yet another article that, yet again, reflects the common saying among analysts: R would be great if it weren’t so stuck on making the easy hard, and the hard impossible (but still open source).

From there, the discussion meandered into “there’s a price to pay for learning powerful tools, and people should suck it up” and “stats that are too easy become misinterpreted”.

And then… a posting entitled Reasons not to answer very basic questions in a straightforward way. It yet again says that folks should read the docs, they are very detailed; read the FAQs, make sure the question hasn’t already been answered; search the list, see if the question was asked already.

Yet another post which ignores the fact that the FAQs and documentation are written by and for statistical programmers, not users. If you don’t understand the phrasing in the manuals, and the FAQs are written even more obtusely, and the list is full of rude answers saying “we aren’t going to answer this”, then how, pray tell, is the person going to get their question answered? You almost have to give “proof” that you’ve done your search: “The FAQ says blah, but that doesn’t make sense; someone asked this in 2003 and the list said it was a new feature; the manual on page 203 defines it as an edge case… OK, so I wasted my time, now can you help me?”

So, this leads to long threads of excuses as to why the utter rudeness, lack of civility, and “screw ’em if they can’t take it” approach is completely appropriate on a “high traffic” list. I completely disagree, and it embarrasses me every time I see tenured professors and 20-year analytic vets posting cringe-evoking, rude responses.

Now, in December, a new thread extends the first: Protocol for answering basic questions

More of the same complaints and excuses, requests to split into an r-basic-help and r-advanced-help, and more rhetoric.

Don’t get me wrong: I admire many of the people on the R team: they are geniuses who share their code, and many are far smarter than I can ever hope to be. But they suck as people-persons… and they do the R Project irreparable harm with this behavior.

But think of it this way: lots of smart stats guys are using R, but even they admit that some parts still suck (usability, GUI integration, large-data limitations). The super-programmers who could fix these issues haven’t ever heard of R. Why? Not popular enough. But let a few novices play with it, and they show it to others, and suddenly the super-progs say “oh, you can fix your memory problems by doing this”. Why? Because they want credit too, so a project with a larger audience is more likely to get their efforts.

Look, not everyone is egotistical; this is a broad brush. But the power games on the R-Help list, from a psychologist’s perspective, are pretty transparent. These people can be so nice in person, but on the list, the lack of professionalism is disappointing. And yes, courtesy is a part of professionalism, in my (and many others’) book.

If the only way to make sure that R grows is to split the list, then split it, and volunteers will do the right thing to help out. And let’s try to make better, more realistic docs. Putting my money where my mouth is, I guess it’s time to write my “R for people who are smart enough to use SPSS but not matrix math” book. O’Reilly, here I come.

So, I don’t want to see quotes like this (find it, it’s in one of the PDF docs on the site): “Because R is free, users have no right to expect attention, on the R-help list or elsewhere, to queries. Be grateful for whatever help is given.” Instead, how about: “We will help as best we can, but we are all volunteers. Be sure to do some of your own legwork first; things move better that way, and without some prep, you probably won’t understand the answer you get.”

And I really think it’s time for the R leadership to take a stance on whether they want a vanity project or to make something which can change the world. I think R can be one of those standards in the world of analysis, something you automatically turn to. It’s the efforts of the past which make me believe that, but it’s the efforts of the future which will make it real. So, come on, do a mitzvah, help out a newbie when you see their question, OK?

PS: Interestingly enough, people who can’t take two seconds to explain something can spend ten times as much time discussing what should become R’s mascot, in a thread starting here. Sigh.


* * *


Java classes for parsing SPSS .sav- and .por-files · 12/06/2004 03:20 PM, Analysis

(Update: Looks like the java stuff below may be off and on, so keep trying… In addition, there are some commercial options, one of which I list below.)

GPL software for accessing SPSS data files… Very helpful.

Description from Freshmeat:

SoftwareHouse SPSS Utils is a collection of Java classes for extracting data and metadata from SPSS data files. It can parse both system (.sav) and portable (.por) files, and features missing value masking and value label substitution. It can also be used as a stand-alone tool to generate an XML-ish representation of a data file, and should compile to native with gcj quite painlessly.

Also, look at SPSS Format Writer by Cluetec, which costs from US$100 and up.

* * *


Clementine Scripting Explained · 12/03/2004 11:53 AM, Analysis

Scripting in SPSS’s Clementine… You too can do it!

Yes, the entry you’ve all been waiting for. How does scripting in Clementine work? I tell you, it took some digging. The documentation for this is written in such a strange way: it talks all about various node commands and such without ever really documenting how the language works until the end of an appendix section. BTW, I am working from Clementine 7.1; I understand that 9 is about to be released, and I hope to get that up and running sooner or later.

1) How to run a script? The easiest way is from Tools | Standalone script. You can also attach a script to a stream which means that when the stream is run, so is the script. SuperNodes can also have scripts.

2) Basics of the language:

Note that there are no syntax highlighters or other “coding” features in Clementine. No debugprint or other debugging functions are available. So, no stepping or breakpoints either. You can open a file for output and log to it (see below), but that isn’t the most elegant of techniques. In general, try to find a way to do what you want with the GUI, because coding in this environment is about as painful as coding in SPSS syntax1. (Yes, that is a footnote, click on the number to read it)

Assignment is handled with set and a single = sign. Variables can be integers, strings, or “objects” which are really just Clementine things (nodes, streams, etc.). You cannot make your own objects or data structures, and there are no arrays or other indexed or grouped data structures.

Assignment is pretty generic… that is, from the simple statement description set PARAMETER = EXPRESSION, we have many things we can set:



But basically, thinking of it as name=value gets you most of where you need to be.
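For instance (a sketch; the variable names and values here are made up, though max_size is a real sample-node property):

  set minimum_age = 21
  set stream_label = "churn test"
  set :samplenode.max_size = 500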

Quoting and Special Syntax
String literals (including filenames) need to be double quoted: "druglearn.str". If necessary, you can use single quotes around strings (but filenames always need double quotes).

Also, CLEM expressions need to be double quoted (that’s rather annoying, isn’t it?) If you use quotation marks within a CLEM expression, make sure that each quotation mark is preceded by a backslash (\)—for example:

  set :node.parameter=" BP=\"HIGH\""

Parameter references such as ^mystream should be preceded with a ^ symbol

Comments are somewhat traditional: # is a single line, /* */ is for multilines.

If you need to continue a statement to a next line, you HAVE to use a /. This is similar to VB, btw.

  set :fixedfilenode.fields = [{"Age" 1 3}/
  {"Sex" 5 7} {"BP" 9 10} {"Cholesterol" 12 22}/
  {"Na" 24 25} {"K" 27 27} {"Drug" 29 32}]

Flow Control

Looping and iteration are very limited.
You can iterate across the fields in a node (i.e., in your data) with for f in_fields_at TYPE. For loops are closed with endfor.

“For” has a few other versions:

Branching is via if-then-else-endif. Unlike assignment, logical equality testing requires 2 (two) = signs: if first == 1 then
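Putting that together, a branch might look like this (a sketch; the node name and expressions are hypothetical):

  if first == 1 then
    set Drug1:selectnode.expression = "Age > 30"
  else
    set Drug1:selectnode.expression = "Age <= 30"
  endif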

As part of manipulation, the “with” construct is available. If you have multiple streams available, you can specify which one to interact with for a series of commands:

  with stream STREAM
    for I from 1 to 5
      set :selectnode.expression = 'field > ' >< (I * 10)

Most of the Clementine functions are available in the scripting language (and have to be double quoted when used!). String concatenation is really oddball here: you use >< (yes, the greater-than and less-than signs).

Manipulating Clementine

Nodes have some special issues: you refer to nodes either by their name (a good reason to name each node), or name:type if you have different types of nodes with the same name. You can leave off the name to refer to all nodes of a certain type (:neuralnetnode). Things get a bit more tricky with indirection: set n = "Drug1", and then you can refer to it as ^n. (This is handy for looping.)
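For example, indirection plus the >< concatenation operator makes loops over similarly named nodes possible (a sketch; the node names and property are hypothetical):

  for I from 1 to 3
    set n = "Sample" >< I
    set ^n.max_size = 200
  endfor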

Basically, each node is an object with properties. Almost every property in the GUI for a node is also exposed as a script property (BTW, these node properties are also called “slot parameters” by Clementine folks). As mentioned above, you set them with set Name = Value.

If you want to set multiple properties at once, use braces ({}).

  set :samplenode {
    max_size = 200
    mode = "Include"
    sample_type = "First"
  }

(Note that in this example, the script would impact EVERY samplenode on the stream)
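
To affect only a single sample node instead, use the name:type form described above; here, a hypothetical node named mysample:

  set mysample:samplenode.max_size = 200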

Besides setting properties, you can perform many actions with the Clementine collection of objects:

Like Perl, there are “special variables” that refer to the “current” object. The following words are reserved in Clementine scripting as special variables:
* node—the current node
* stream—the current stream
* model—the current model
* generated palette—the generated models palette on the Models tab of the managers window
* output—the current output
* project—the current project
So, for example:

  save stream as "C:/My Streams/Churn.str"

Not much can be done with output. SPSS has been gradually adding to its OMS (Output Management System), echoing SAS’s additions and work that Statsoft’s Statistica has had since day one. However, little of this has migrated to Clementine yet, so there are very few ways to manipulate results. Basically, each terminal node includes a read-only parameter called output that can be used to access the most recently generated object.

For Tables, you can get access to a few attributes and values in the data that was generated. For example:

  set num_rows = :tablenode.output.row_count
  set num_cols = :tablenode.output.column_count

The values within the data set underlying a particular generated object are accessible using the value command:

  set table_data = :tablenode.output
  set last_value = value table_data at num_rows num_cols

Indexing is from 1.

Creating Files
Open (create) a new file with open MODE FILENAME, where MODE is either create (creates the file if it doesn’t exist, or overwrites it if it does) or append (appends to an existing file, and generates an error if the file does not exist). This returns the file handle for the opened file, so it’s best to open the file as part of an assignment statement.

write|writeln FILE TEXT_EXPRESSION works as expected. close FILE is needed to flush any buffered output.


  set file = open create 'C:/Data/script.out'
  for I from 1 to 3
    write file 'Stream ' >< I
  endfor
  close file
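
And a quick sketch of append mode (assuming the file created above already exists, since append errors out otherwise):

  set file = open append 'C:/Data/script.out'
  writeln file 'Another run'
  close file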

CLEM in Scripts
Make sure to examine the Parameters section as well.
Pretty much every CLEM expression is available, except the @ functions, date/time functions, and bitwise operations. Also, CLEM expressions have to be in double quotes (and if you have quotes inside the expression, they need to be escaped with a backslash: \").

Examples of CLEM expressions used in scripting are:

  set :balancenode.directives = [{1.3 "Age > 60"}]
  set :fillernode.condition = "(Age > 60) and (BP = \"High\")"
  set :derivenode.formula_expr = "substring(5, 1, Drug)"
  set Flag:derivenode.flag_expr = "Drug = X"
  set :selectnode.condition = "Age >= '$P-cutoff'"
  set :derivenode.formula_expr = "Age - GLOBAL_MEAN(Age)"

Parameters and Misc
The scripting language often uses parameters to refer to variables in the current script
or at a variety of levels within Clementine.
* Local parameters refer to variables set for the current script using the var command.
* Global parameters refer to Clementine parameters set for streams, SuperNodes, and sessions.

Local parameters are just variables. They need to be predefined with the var command. If you use them to point to nodes, then you need the ^ indirection syntax:

  var my_node
  set my_node = create distributionnode
  rename ^my_node as "Distribution of Flag"

Global parameters are single quoted: '$P-Maxvalue'. These are covered as part of the CLEM function explanation in the manual.

exit current, exit current CODE, exit Clementine, and exit Clementine CODE let you exit the script or the program, with an optional return code for batch use.
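
So a batch script might end with, for example:

  exit Clementine 1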

Okay, that should be enough to get you started. For further information, the scripting chapters of the Clementine manual are a good place to look.

Soon, I’ll post some code samples for you to learn from…

Finally, one last thing. What’s missing from all this stuff? External control of Clem. That is, other than calling command lines on scripts, I can’t externally use any of the modules. SPSS is a bit better, in that I can control modules using SAXBasic, but it’s still problematic for external calls. I don’t want to embed anything, but since Clem is all in Java, one thought is that they could expose the API and class structure and let us call the jars from other languages. Imagine using Jython or Judoscript to control Clementine… But that’s also for another time, I guess. It’ll probably have a web services interface or a “cook dinner” node sooner than SPSS realizes that its customers are also partners and opens the kimono, but we can dream…

(Yes, I know that some folks have embedded SPSS and Clementine in tools… but note that almost every one of those is actually sold by SPSS… And it’s not like SAS is opening its world either… but looking at Eclipse, how it’s both a tool and a toolset, I think the world is moving that way for the best tools. And, btw: Statistica by Statsoft does document the heck out of their APIs, so some are starting to play this way. And so does R 8-). )


1 I guarantee you that the people coding the SPSS programs are not using notepad; they are using Eclipse and Visual Studio and other programming tools with the features of modern development environments. So, why can’t they extend that to those of us coding with their products? Clem and SPSS both have programming environments not much more advanced than notepad (or pico/nano for you unix folks). I say, force the SPSS programming team to code the SPSS products with notepad and a command-line compiler for a week as punishment… and I bet we would see some amazing programming enhancements in the next release of the SPSS product line.

Comments? [4]

* * *


Fedex Logo · 11/17/2004 07:32 PM, Marketing

(Updated: Great interview with the creator of this logo, Mr. Lindon Leader. Read it at The Sneeze)

The Fedex Logo is always sort of interesting because of the hidden arrow between the e and x.

According to this, it was intentional. The logo/brand work was created by Landor Associates, with their press spin here. This was all in 1994, but I can’t remember any Fedex branding prior to ‘94, so I think this was successful. (Of course, I was also in grad school, and so I never needed to send anything overnight; academia doesn’t really work that way.)

I mention this b/c Ned points out that Fedex rebranded Kinkos. I don’t love the new look, but I think the purchase was a good business idea for them.

BTW, Famous designer Paul Rand (IBM logo, UPS Shield, ABC logo, etc., etc.) reviews the logo here (and more on the next page as well).

Comments? [1]

* * *


Expanding Firefox Search Bar · 11/14/2004 03:56 PM, Tech

One of the biggest hassles in the new Mozilla Firefox 1.0 is the inability to easily change the size of the search bar.

In the past, you had to add lines like this to your userChrome.css:

/* Make the Search box flex wider (in this case, 400 pixels wide) */
#searchbar {
  -moz-box-flex: 400 !important;
}
#search-container {
  -moz-box-flex: 400 !important;
}

Well, they’ve made it a touch easier with the addition of the line

#search-bar { width: 30em !important; }

instead of the above. But for some reason (see Bug 205011), the Mozilla developers think that the average user should be editing config files instead of simply dragging to resize.

In the meantime, feel free to install the ResizeSearchBox extension to make life easier.


* * *


My one and only one election post. · 11/08/2004 02:04 PM, Trivial

What is going on with this country?

Anyway… what did I learn from this election? Well, other than deciding that I need to move to any country other than this one, I also realized something about the vaunted “power of the blogosphere”.

Like the internet bust, it’s a big joke. All these “important” blogs, the ones talking about voting, the ones quoted by op/ed people and mentioned on NPR… these had no impact on the election, in my opinion. Look at the results: 2000 had no blogs, 2004 did, with virtually indistinguishable results.

All these “influential” blogs were read by people who already had their minds made up. The blogs couldn’t get Dean any farther ahead than they could Kerry, since they weren’t introducing new ideas to new people, but just reinforcing ideas in current “net-savvy” readers.

Yes, a variety of stats and polls imply all sorts of voter turnout numbers and new constituencies… but no poll I saw pointed to blogs as their drivers. Instead, TV/Media and “the internet” as an abstract appear to be constant answers. In fact, some people were reportedly more likely to vote based on spam political emails than by blogs.

Look, blogs are important. The self-publishing model, the “home page” on steroids, is a major shift in how we communicate with each other as people. But like Kerry, we got hung up on thinking that if we all sat around and Harrummphed enough, it would change the others… who weren’t reading what we were writing. Note to self: If my blog isn’t being read, then it isn’t having impact. And if it’s being read by people who already agree with it… then it’s having no impact.

Instead, think of it like marketing. Segment your audience. Write the usual stuff to your usual elite/media-savvy/internet-weary/hip urban crowd; they love what you say and will hosanna ’til night falls. But consider also making a site which allows the other side to point out their views. I wanna know what that crazy midwest person was thinking… maybe they were right and maybe they weren’t. But our current blogging is all about one-way communication with some comments. Nowhere is the real integration of various ideas that the blogosphere pretends it’s about. Comments and “trackbacks” aren’t it.

Some folks say “Wikis shouldn’t be lumped in with blogs”. Ok, but at least wikis have multiple communication paths. I like the ideas of the Always On Network, which is a collection of bloggers with intermingling of their posts and viewpoints. Here, we see a variety of conflicting views. That model needs to happen more often, to give people a more serendipitous approach to seeing new ideas they may not have thought of before.

So, let’s not fall back and say, “Man! Blogging is cool! It galvanized the electorate and extended the constituency’s power to get their ideas explicated!”. Let’s say that we’ve got a foot in the door to communicate… but that we have to invite people in to listen… people who usually don’t. Now, that would be a powerful blogosphere.

And now back to your usual programming.


* * *


More "research" from eROI · 10/19/2004 12:02 PM, Analysis Marketing

If they make one mistake, I can excuse them... but to go ahead and repeat the same flawed study, come to the same flawed conclusion, and think that they have discovered something is frightening. Even more frightening is how many people appear to believe them.

Back in July, I pointed out the various flaws in this study... and there were a ton. Observational work can only take you so far, and picking a "best day" to email is easy to test via experimentation. And as I've pointed out previously, there is no best day, at least among my 5 years of email marketing research. Experimental manipulation, holding constant the client, type of mail, type of list, call to action, etc. etc., and changing just the day reveals that, in almost every case, there is no difference in aggregate findings across days.

Now, in an October 12 note in Direct magazine, we see again that eROI ran the same observational study, and came to the same flawed conclusion. In the past 5 months or so, no one there thought to try a controlled experiment?

Everything I said back then still applies to their update, so I encourage you to review my previous posting. I'm not trying to bash eROI... but if this is how they "analyze" data, then one has to wonder about the accuracy of any analyses or findings they do for their clients. Ok, that last was kind of snarky, and probably unfair. How about this: Judge for yourself. Run the test. Send the same mail on different days, wait a few days to aggregate findings from early responders, weekend readers, etc., and wait the same amount of time for each mail day (that is, monday to monday, tues to tues, etc, not mail each day of the week and analyze on the next monday: this truncates sunday, but gives monday 7 full days)... and see for yourself if day of the week matters for you and your specific audience, offers, and marketing.

After all, that's what really matters, right?

(Yes, I do work for an email company, and I guess eROI are competitors, though we've never directly competed for anything. And they probably have great tech, smart folks, and fun clients... so don't hold their research flaws against them. Research is only one part of the entire e-marketing experience, and they may be great at the other pieces. The only way to know is to judge for yourself.)


* * *


Link to Jason Calacanis? · 10/16/2004 02:21 AM, Tech Search

Jason really likes linking... so I've linked to him.

Of course, he is giving away a full version of X1, the "phone home" search program, but we'll see what happens... My earlier post is where I really examine it, and it's worth a read...


* * *



powered by Textpattern 4.0.4 (r1956)