The Net Takeaway: Vilno... Not for me.

Vilno... Not for me. · 11/19/2006 10:36 PM, Analysis

So this guy says that he has the “new data crunching language”, and he says to read about it at his blog where you can download the language. He’s posted in a couple of places, so I went to take a look:

http://www.my.opera.com/datahelper/blog/ is now blank. Looks like there is a tarball at http://code.google.com/p/vilno/

There are still some sparse docs at his other blogs:
www.xanga.com/datahelper and datahelper.blogspot.com.

The Xanga site shows some sample code, and the Blogspot one is more random than this blog is. Note to other folks: stick to one blog; it’s just easier for your poor readers.

So, is it any good? Well, here’s a code sample from the Xanga site:

inlist labdata ;
addgridvars float: change ;
gridfunc baseval=avg(value) by labtest patid
where (visit==-1 and value is not null) and highest date ;
change = value - baseval ;
sendoff(labdata2) labtest patid visit date value change baseval ;

Unfortunately, it’s even more obtuse than R, and that’s a sad thing to have to say.

The Opera link gives a download for Linux if you want to try it, but if this is the state of the art for “data crunching” languages, then we are still in a bad way.

Back to SPSS, I guess. Or, if you want to know more about R after seeing my ever-so-snide comment, look at my section on R.

* * *

 

  1. Well, I suppose I’d rather have bad publicity than no publicity at all. Thanks for the comment.

    It’s up to other people to decide if my language is easy to learn and easy to use, not up to me. Your answer is a resounding NO.

    But your blog entry reeks of sarcasm, and is rather inaccurate. Those 6 lines of code calculate change from baseline where the baseline data is VERY messy and VERY convoluted. You can certainly do it in SAS or SPSS, but you will struggle, and you cannot do it in just 6 lines.

    When the data is clean, the syntax is much simpler. When the data is dirty and complex, using a GUI is not the best way to go forward.

    I will port to Apple and Windows when I have a better work-area and more computers; right now I can’t port, due to financial constraints (currently looking for a job).

    For the sort of messy data preparation that my language facilitates, R is not a good tool, and the R folks say as much on the import/export section of their web site.


    Robert    Nov 28, 08:03 PM    #


  2. There is nothing misleading about my comments. You have calculated a change from baseline; the code would be the same in messy or non-messy situations (except you could kill the where).

    Your syntax reads like NewSpeak from Orwell’s 1984. Inlist? addgridvars? sendoff?

    The point is not that it’s 6 lines, or that it’s 60. It’s that it’s difficult to read and understand.

    My big push for all these languages is a simple one: treat the analyst as if they weren’t a code writer. SPSS using Python is not a bad step, but it’s still too programmy.

    This language is really difficult for a non-programmer. What we really need is a simple language for handling and transforming large files.

    This is why I complain about R. It too has great power, but they made it so difficult to access that it continues to suffer from a lack of appreciation. I think your language could suffer a similar fate, but you could change that…

    If you want to create the next great language, wrap the complexities of your code into something more Englishy… and then I think you will have something.

    Yes, feel free to include the ability to read “messy” data, though you never really define it (outliers? improperly formatted? Tree or Hierarchical format?) I’ve actually been pretty impressed with what perl and python can do with tough data, but perhaps your language encapsulates some of that flexibility; I couldn’t get deep enough into it to see.

    You have some good nuggets in the rough there, like the “rolling counter” functions such as “highest date”... But the rest of the syntax hides them.

    Consider making things easier for the analyst, and I’d be easier on the language.


    Michael Wexler    Nov 28, 08:17 PM    #


  3. You took those 6 lines out of context.

    You have not looked at the introductory documentation or beginner’s examples in the August 31 tarball. It’s clear from the description of the data problem, in the blog entry from which those 6 lines come, that this is a difficult data problem. In fact, a pharmaceutical programmer with less than a year of experience could easily make a mistake (such as using the very last available baseline date even if that date has only missing values). The intention of the blog entry was to compare VILNO code with SAS code on a difficult example.

    Those 6 lines form a data processing function: a paragraph of code that reads input datasets, transforms the data, and writes output datasets. There is only one input dataset here: labdata. There is only one output dataset: labdata2. The output dataset has 7 columns: labtest, patid, visit, date, value, change, baseval. The CHANGE column is added to the dataset by the ADDGRIDVARS statement (it’s floating-point, not integer or string). The GRIDFUNC transform adds the BASEVAL column (also floating-point, since it’s an average). The other five columns were in the input dataset to begin with.

    As I’ve noted, in later versions of Vilno, the GRIDFUNC statement will include a WHERE clause. So here the GRIDFUNC statement is 2 lines (if I could indent the 2nd line, I would, but the blogging software doesn’t let me). Of the 10 data transforms in the data processing function, the GRIDFUNC transform is the least suitable for a beginner’s example.

    For data preparation problems that are this complex, I do not believe a GUI (graphical user interface) software tool is the right tool. R (really the S language) is not a great tool for this task either. The SPSS and SAS programming languages can be used for this. I personally believe the VILNO programming language is better for complex data crunching.

    I have to confess that the problem being solved would be clearer if I had put a printout of the input dataset and output dataset in the www.xanga.com/datahelper blog entry you looked at. I’m not a blogging tool expert, and when I try to use multiple spaces to align the data columns, the blogging software won’t let me do it.

    However, for LABTEST=“HEMOGLOBIN”, and PATID=8 (patient # 8), here is what you have:
    (I use visit=0 for baseline instead of visit=-1, again so the columns line up)

    VISIT  DATE    VALUE  CHANGE  BASEVAL

    0      May-10  160    10      150
    0      May-10  163    13      150
    0      May-11  149    -1      150
    0      May-11  151    1       150
    0      May-12  nul    nul     150
    0      May-12  nul    nul     150

    1      May-16  400    250     150
    2      May-20  400    250     150
    3      May-24  500    350     150
    4      May-28  602    452     150

    I’ve omitted the LABTEST and PATID columns.
    Because May 12 has only missing values, use the average of the May 11 values.
    As I’ve noted before, Vilno version 0.85 does not have date/time functions yet.

    If you know of a better, more user-friendly solution, using a GUI or a programming language, please let me know.

    Robert


    Robert Wilkins    Dec 1, 05:50 PM    #


  4. I’ve responded privately to the author, but I wanted his comments to be seen.


    Michael Wexler    Dec 1, 06:40 PM    #
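
For readers who want the logic of the thread spelled out, here is a minimal Python sketch of the calculation Robert describes in comment 3: for each labtest and patient, take the latest baseline date that has at least one non-missing value, average the values on that date to get the baseline, and subtract it from every value. This is only an illustration of the rule as described, not Vilno's implementation and not a claim about how anyone should do it. The function name `change_from_baseline`, the dict-based rows, and the `(month, day)` tuples standing in for dates (the thread gives no year) are all assumptions made for the example; the sample rows are the HEMOGLOBIN, patient 8 data from the comment, with visit=0 as baseline.

```python
from collections import defaultdict


def change_from_baseline(rows, baseline_visit=0):
    """Add 'baseval' and 'change' to each row (hypothetical helper).

    rows: list of dicts with keys labtest, patid, visit, date, value.
    A value of None means missing; dates are assumed to be comparable.
    """
    # Collect non-missing baseline values per (labtest, patid), grouped by date.
    baseline_values = defaultdict(lambda: defaultdict(list))
    for r in rows:
        if r["visit"] == baseline_visit and r["value"] is not None:
            key = (r["labtest"], r["patid"])
            baseline_values[key][r["date"]].append(r["value"])

    # Baseline = average of the values on the latest date that has any.
    baseval = {}
    for key, by_date in baseline_values.items():
        latest = max(by_date)
        vals = by_date[latest]
        baseval[key] = sum(vals) / len(vals)

    # Attach baseval and change; change stays missing when value is missing.
    out = []
    for r in rows:
        base = baseval.get((r["labtest"], r["patid"]))
        if r["value"] is None or base is None:
            change = None
        else:
            change = r["value"] - base
        out.append({**r, "baseval": base, "change": change})
    return out


# Sample data from comment 3; dates are (month, day) tuples (an assumption).
_raw = [
    (0, (5, 10), 160), (0, (5, 10), 163),
    (0, (5, 11), 149), (0, (5, 11), 151),
    (0, (5, 12), None), (0, (5, 12), None),
    (1, (5, 16), 400), (2, (5, 20), 400),
    (3, (5, 24), 500), (4, (5, 28), 602),
]
ROWS = [
    {"labtest": "HEMOGLOBIN", "patid": 8, "visit": v, "date": d, "value": x}
    for v, d, x in _raw
]

for row in change_from_baseline(ROWS):
    print(row["visit"], row["date"], row["value"], row["change"], row["baseval"])
```

Whether this is easier to read than the six Vilno lines is exactly the argument in the comments above; it is certainly longer. The one step that is easy to get wrong is the one Robert points out: picking the last baseline date only after missing values have been filtered out.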

