The Net Takeaway: Branching Surveys...


Branching Surveys... · 04/06/2007 01:14 AM, Analysis Marketing

One of the best tricks someone showed me a few years ago was how to deal with the missing data resulting from branches (or “skips” but that’s a terrible way to phrase it) in surveys. I provide code for SPSS, but the same approach works in R and SAS.

In summary, you set all the missings to a “user saw but ignored” code, then walk the logic of the survey converting “branched over” items into a “didn’t see” code so you can treat these 2 types of missings appropriately. Didn’t see missings are fine; saw but ignored can be a potential issue in the analysis.

You start by setting all the missings to some high unused number. I’ll use 98. This will be the code for “seen but not answered by the user”. We start off pretending that every missing was a seen-but-not-answered.

RECODE q01 to Q09_7, q10 to q20, q21 to q25
    (SYSMIS = 98).

Then you walk the survey, following each branch. If the user actually did branch over an item, you recode it as another high number, such as 99. So, here, if the user answered 2 to Q01, they branch over q02 to q08. For those users, we recode the missings on q02 to q08 (which we made 98s previously) as 99, branched over. If the user branched but these items weren’t missing (i.e., weren’t converted to 98s by the first transform), then we have a logical error, denoted here by 97. This is a problem: it means the survey was either coded wrong at presentation, or the user mucked around with it. If lots of people have this, then it’s a flawed execution; if just a few, it’s probably users mucking around with the survey.

* Users who answered 2 to q01 branched over q02 to q08.
DO IF (q01=2).
   RECODE q02 to q08
    (98,99 = 99)
    (ELSE  = 97).
END IF.
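
For readers working outside SPSS, the two steps so far can be sketched in plain Python. This is only an illustration under assumptions: a respondent is a dict of item name to answer, `None` stands in for SYSMIS, and the helper names `fill_missing` and `apply_branch` are made up for this example.

```python
SEEN_NOT_ANSWERED = 98   # saw the item but left it blank
DID_NOT_SEE = 99         # branched over the item
LOGIC_ERROR = 97         # has data on an item that should have been skipped

def fill_missing(resp):
    """Step 1: pretend every missing value was seen but not answered."""
    return {item: SEEN_NOT_ANSWERED if value is None else value
            for item, value in resp.items()}

def apply_branch(resp, took_branch, skipped_items):
    """Step 2: for users who took the branch, recode the skipped items."""
    out = dict(resp)
    if took_branch:
        for item in skipped_items:
            if out[item] in (SEEN_NOT_ANSWERED, DID_NOT_SEE):
                out[item] = DID_NOT_SEE
            else:
                out[item] = LOGIC_ERROR
    return out

# Hypothetical respondent: answered 2 to q01, so q02 and q03 should be blank,
# but q03 somehow holds a value. q09 is outside the branch.
raw = {"q01": 2, "q02": None, "q03": 4, "q09": None}
step1 = fill_missing(raw)
coded = apply_branch(step1, step1["q01"] == 2, ["q02", "q03"])
print(coded)  # {'q01': 2, 'q02': 99, 'q03': 97, 'q09': 98}
```

Note that q09 stays at 98: it was never branched over, so it remains a "saw but ignored".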

This can also work if you have linked items:

* q19 and q20 are branched over only when both conditions hold.
DO IF (q18=2 and q13<>2).
   RECODE q19 to q20
    (98,99 = 99)
    (ELSE  = 97).
END IF.

(and various iterations of Q18 and Q13 combos follow.)
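
When there are many such combos, it can help to keep the branch logic in one table and loop over it. The following is a rough Python sketch of that idea, not part of the SPSS code above; the rule list, the item names in it, and the `walk_survey` helper are assumptions for illustration, and it expects that missings have already been set to 98.

```python
SEEN_NOT_ANSWERED, DID_NOT_SEE, LOGIC_ERROR = 98, 99, 97

# Each rule: (condition on the respondent, items branched over when it holds).
# These conditions and item lists are illustrative, not a real survey.
RULES = [
    (lambda r: r["q01"] == 2, ["q02", "q03"]),
    (lambda r: r["q18"] == 2 and r["q13"] != 2, ["q19", "q20"]),
]

def walk_survey(resp, rules=RULES):
    """Apply every branch rule in survey order to one respondent."""
    out = dict(resp)
    for condition, skipped in rules:
        if condition(out):
            for item in skipped:
                if out[item] in (SEEN_NOT_ANSWERED, DID_NOT_SEE):
                    out[item] = DID_NOT_SEE
                else:
                    out[item] = LOGIC_ERROR
    return out

# Hypothetical respondent: did not take the q01 branch, did take the q18/q13 one.
resp = {"q01": 1, "q02": 3, "q03": 98, "q13": 1, "q18": 2, "q19": 98, "q20": 5}
print(walk_survey(resp))
# {'q01': 1, 'q02': 3, 'q03': 98, 'q13': 1, 'q18': 2, 'q19': 99, 'q20': 97}
```

Keeping the rules in one place makes it easier to compare them line by line against the survey's branch map.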

Finally, at the end, we tell SPSS to treat these codes as missings and label the various “special missings” (as SAS liked to call them):

MISSING VALUES   q01 to Q09_7, q10 to q20, q21 to q25  (97 THRU 99).
ADD VALUE LABELS q01 to Q09_7, q10 to q20, q21 to q25
     99 'Did not see'
     98 'Did not answer'
     97 'LOGIC ERROR'.

So, what did we do? We converted the various missings into actual user ignores vs. never-saws, and we also identified places where data was present where it shouldn’t have been. This also allows you to create correct denominators for percentages and tabs, since you shouldn’t use all N for items where only a portion of the sample was exposed.
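
To make the denominator point concrete, here is a small Python sketch with made-up data. It assumes the 97/98/99 codes from this post and treats "exposed" as everyone who was not coded 99 or 97; the `exposed_shares` helper is hypothetical.

```python
from collections import Counter

SEEN_NOT_ANSWERED, DID_NOT_SEE, LOGIC_ERROR = 98, 99, 97

def exposed_shares(values):
    """Share of each code among respondents who were actually shown the item."""
    exposed = [v for v in values if v not in (DID_NOT_SEE, LOGIC_ERROR)]
    if not exposed:
        return {}
    counts = Counter(exposed)
    return {code: count / len(exposed) for code, count in counts.items()}

# Made-up item: 8 respondents, 4 of whom were branched over it.
q02 = [1, 1, 2, 98, 99, 99, 99, 99]

print(q02.count(1) / len(q02))   # 0.25 -> share of answer 1 using all N
print(exposed_shares(q02))       # {1: 0.5, 2: 0.25, 98: 0.25}
```

With all N as the base, answer 1 looks like 25%; among the people who could actually answer, it is 50%, and the 98s show up as a visible non-response rate.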

It’s important to double check your code vs. the branches you proposed in the survey. If you skip a branch (such as a nested branch) or don’t precisely duplicate the logic via these transforms, you could substantially change the interpretation you give the resulting counts.

Also, duh, don’t use numbers which could appear in the variables. If you ask people to divvy 100 points, for example, don’t use the codes 97-99 as I have since those could overlap with real values. Use 1000097-1000099, for example, and change my examples appropriately.
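
A quick way to check for this before recoding anything is to scan the raw data for the codes you plan to use. A minimal Python sketch, assuming rows are dicts with `None` for missing; the `codes_are_safe` helper and the sample data are invented for illustration.

```python
def codes_are_safe(rows, codes=(97, 98, 99)):
    """True if none of the special codes already occurs as a real value."""
    used = {value for row in rows for value in row.values() if value is not None}
    return used.isdisjoint(codes)

# Made-up "divvy 100 points" item where 98 is a legitimate answer.
points = [
    {"brand_a": 60, "brand_b": 40},
    {"brand_a": 2, "brand_b": 98},
    {"brand_a": None, "brand_b": None},
]

print(codes_are_safe(points))                               # False
print(codes_are_safe(points, (1000097, 1000098, 1000099)))  # True
```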
