The Net Takeaway: If you can't do it right, add more sample!


If you can't do it right, add more sample! · 03/22/2006 02:08 PM, Marketing

UPDATE: I’ve responded to some small criticism at a newer post ExactTarget still observing…

Original Post:

ExactTarget has some smart folks, including its founder, Chris Baggott, and the recent addition to its "strategic services" group, Morgan Stewart. And, as I've said before, I am ever more disappointed when smart people do not-so-smart things.

You all know how much I hate observational studies which attempt to create cause out of correlation. For example, see my complaints about eROI’s work at Time of Day and Observational Studies and More ‘research’ from eROI.

Well, here’s another one. ExactTarget has released yet another completely observational study, which attempts to derive truth from a non-controlled, non-randomized, non-stratified, and basically non-research oriented research report. What’s their claim to fame for this one? “In fact, this is the largest, most comprehensive study to date, including data from more than 4,000 organizations, 230,000 email campaigns and 2.7 billion email messages. The study summarizes overall open, click-through and unsubscribe rates and provides additional analyses based on day of week for sending email while examining list size and target audience.”

Ah, I see. If you feel you can't do it right by actually examining the impact of, say, manipulating the day of the week mail is sent, or examining what controlled factors impact unsubs, then add as much biased and non-randomized data as you can and hope that it all comes out in the wash. Look, larger is not always better. (In fact, they try to make some claims about the negative impact of list size as well; it's not as if they're unaware that simply shoveling out more mail doesn't make things better... but more on that later.)

Why is this such a disappointment? Because they could have really done it right. That is, with all these clients, all these mails, couldn't they have gotten just a few to control what's controllable and actually demonstrate causation? Instead, not one variable is manipulated. Not one experiment is reported on. No real research happened here at all. Oh, and let's add to the pain. The first sentence of the study says:

“ExactTarget’s 2004 groundbreaking study of which day of the week was the best day for marketers to send emails caused many to re-evaluate their common practices and employ testing to determine which day worked best for their customers.”

Hmm. So, puffery aside (e-Dialog clients knew to test this since 1999), they recognize that testing is required to really, really understand what works. 10 pages later, we see… no testing.

What else is missing? Well, no description of the sample or the population that this sample is intended to represent. For example, no description of the client distributions (industry, size of list, goal of campaigns, mailing frequencies). (Note: ExactTarget is under no obligation to reveal "confidential business information", and a detailed sample description could reveal more about their client makeup than they prefer to reveal. However, a stratified sample selection could have allowed them to say who was included without revealing what % of their client base it represents.)

Without knowing who was in this data, should I be comparing my small business mailings to this data? How about my CPG brand mailings? How about my subscription renewal mailings? Just because it's a large sample doesn't excuse the fact that it may be biased (and in fact, we know that it's biased towards the countless small and medium businesses and agencies using the ExactTarget platform and API; it's an easy-to-use and powerful platform).

No description of how data was aggregated. This is actually a pretty common mistake, and one that we need to be better about as an industry. I’ll demonstrate with a contrived example, but it gets the point across.

Lemme aggregate my open data for my mailings for the Auto industry over the past time period of interest. I have 3 mailings:

    Mailing                   Sent        Opens    Open Rate
    Screensaver follow-up     1,000       80       8%
    Service message           500         500      100%
    House list                1,000,000   30       0.003%

So, what's the average? I can average the rates: 36%. This seems a bit odd; it's not really representative of any of the mails, either overstating or understating every one. Ok, let's calc based on the sums of the underlying numbers: 610 opens/1,001,500 mails = 0.06% open rate. Well, that doesn't seem quite right either; it substantially undercounts 2 out of 3 mails! Which should I use?

Well, it all depends: these three mails went to very, very different audiences. The first was a follow-up from a screensaver download from my site, the 2nd was a service message to users in the first week of ownership, and the 3rd was to my "house list", which includes contest names, list rental acquisitions, and other poor quality names. So, should I even be averaging these at all?

And, given the variety of mailing sizes: that 1 million has some pretty heavy impact when I use the underlying numbers, but that 100% service message to only 500 people has a huge impact on the "average of the rates". Which should I be giving the "extra credit" to? This issue of mailing size is not only ignored, but it's also somewhat misinterpreted by ExactTarget; I explain below.

So, how we aggregate is HUGELY important. Yes, we need to be consistent not only in how we calc metrics (unique or gross clicks? Net mailed or gross mailed?) but also in how we combine them. The industry is starting to conform to standards on clicks and opens, but how we aggregate is still up for grabs. I encourage the ESPC and other groups to lay out standards on how to aggregate for future work.
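The two aggregation choices above can be sketched in a few lines of Python. The mailing numbers are illustrative assumptions, chosen to be consistent with the 36% and 0.06% figures in the example:

```python
# Illustrative mailing data (assumed numbers, not from any real study):
# three mailings with wildly different sizes and open rates.
mailings = [
    {"name": "screensaver follow-up", "sent": 1_000,     "opens": 80},   # 8%
    {"name": "service message",       "sent": 500,       "opens": 500},  # 100%
    {"name": "house list",            "sent": 1_000_000, "opens": 30},   # 0.003%
]

# Method 1: unweighted average of the per-mailing rates.
# The tiny 500-person send counts just as much as the million-person send.
avg_of_rates = sum(m["opens"] / m["sent"] for m in mailings) / len(mailings)

# Method 2: pooled rate from the underlying totals.
# The million-person send dominates everything else.
pooled_rate = sum(m["opens"] for m in mailings) / sum(m["sent"] for m in mailings)

print(f"average of rates: {avg_of_rates:.0%}")  # 36%
print(f"pooled rate:      {pooled_rate:.2%}")   # 0.06%
```

Neither number is "wrong"; they answer different questions, which is exactly why a report needs to say which one it used.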

No description of mime types; do they send everything multipart? (I won't even take extra points off for the fact that they fat-fingered their definition of click-through rates by copying and pasting the definition of open rates on page 9... but I digress.)

Reading the report, we see more talk of day of the week without reporting a single test. More talk of the value of segmentation without any examples of segmented content vs. a control group. These are all correlations, not causal data, and they could easily lead people to believe they are seeing cause and effect.

The conclusions they draw are great press... but of course, they are nothing special, and should be things that most mailers know. That is, larger lists tend to have lower rates than smaller lists. Hmm. Perhaps Mr. Stewart may not recall his Econ 101, but there is this concept of the law of diminishing returns (and not the "law of big numbers"; most people get this wrong). It points out that, all things being equal, increasing the size of one thing does not create a correlated increase in other things. In effect, making a list larger makes it harder to get higher rates. Is it segmentation? Well, it doesn't have to be. I can send junk to 10 random people and still get a 30% open rate; I only need 3 people! Sending junk to 1,000,000 people requires lots more action before I can get that 30%.
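The small-list effect is easy to see in a quick simulation. This is a sketch under an assumed "true" open propensity of 20% (not a figure from the study): on a 10-person list, chance alone frequently produces a 30%+ observed rate, while on a 10,000-person list it essentially never does.

```python
import random

random.seed(42)

def observed_rates(list_size, true_rate=0.20, trials=2_000):
    """Simulate the open rates you'd observe on repeated sends to lists of
    a given size, when each recipient opens with probability true_rate."""
    return [
        sum(random.random() < true_rate for _ in range(list_size)) / list_size
        for _ in range(trials)
    ]

tiny = observed_rates(10)
large = observed_rates(10_000, trials=200)

share_tiny = sum(r >= 0.30 for r in tiny) / len(tiny)
share_large = sum(r >= 0.30 for r in large) / len(large)

print(f"10-person sends showing a 30%+ rate:     {share_tiny:.1%}")   # roughly a third
print(f"10,000-person sends showing a 30%+ rate: {share_large:.1%}")  # essentially zero
```

So a small list "beating" a big list on rate, by itself, tells you almost nothing.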

What I don't see in this work is any manipulation of a segmented approach vs. a control/generic approach with similar content, offer, seasonality, call to action, etc. In fact, such testing does show major impact, but this study does nothing to support that claim, even though it winds up being true.

There continues to be a difference between full service and self-service, and this kind of work demonstrates it clearly. They picked some conclusions out of the air (good ones, ones that clients should be aspiring to) and then they look at observational data and try to cram it all together. It really is the best of intentions, and they are trying hard to make it all sound good... but it's not research.

It was bad when eROI did it, but they don’t have any research experience in their background. Morgan Stewart is better than that (see this press release for his background, including Targetbase, one of the best) and for every spark of greatness he shows in various articles, interviews and in recent work performed by ExactTarget, I wonder if he’s overruled by his marketing dept. in sending out studies which don’t really help expand our knowledge around cause and effect.

You will notice that e-Dialog does not release studies like this. We don't see the value in random observational work when it can be done correctly through experimental design. Our analytic team performs the requisite controlled testing to understand, for each client's unique business model, capabilities, and marketing goals, what the best approach would be. Yes, in other mediums, we don't often have the control we have in email, so the many studies of self-reported data are all we have. But since it can be done better, why are we falling back into old bad habits?

Look, Chris and Morgan and his team are trying to do the right thing, and they deserve credit for that. But observational studies tend to have no end of problems; see
There Are No Industry Averages! Get Over It!,
The ‘ladder’ has many rungs… to fall off of.,
Stats are Meaningless, Pt II, and even Quote of the day…. This stuff isn’t easy to do well, and it really does require some care and thought.

Is this work better than nothing? In my opinion, no. Bad research sullies what we are trying to do. But I know others disagree; the popular press loves printing anything which sounds like it's fact, even if, on closer examination, it really isn't. I suspect the next work we see out of ExactTarget will be more rigorous... and that's the one you will want to pay attention to. This one, like most observational studies, should be treated only as directional learning: half the time it's wrong, half the time it's right, but it's hard to know which half is which.

* * *


  1. It makes you wonder how a reputable, analytics-driven company like ExactTarget could produce something more gloss than substance. Perhaps they were simply constrained to using the lowest common denominator (coarsest) data for which they had permission.

    As you point out, the lack of detail about how lists were aggregated is particularly bothersome. To be useful, a study like this must describe the variability between lists (for starters). For example, when they say that Sunday is the best day for open rates, is that because it was consistently true across all lists? Or perhaps it was true just for a subset of very large lists, and the other lists were all over the board – effectively averaging into noise against the large lists’ signal.

    Showing grand-total averages like this is too broad, too rolled-up. Some of their (so-called) conclusions would have been much more interesting if they gave us an analysis of these variances.
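    The averaging-into-noise scenario described above can be made concrete with invented numbers: one huge list that happens to do better on Sunday drags the pooled "industry average" its way, even when most lists do better on Tuesday.

```python
# Invented (hypothetical) per-list open rates by send day.
lists = {
    "big list": {"size": 1_000_000, "rates": {"Sun": 0.05, "Tue": 0.03}},
    "small A":  {"size": 5_000,     "rates": {"Sun": 0.10, "Tue": 0.20}},
    "small B":  {"size": 5_000,     "rates": {"Sun": 0.12, "Tue": 0.22}},
}

total_sent = sum(l["size"] for l in lists.values())

# Pooled, size-weighted "industry average" open rate for each day.
pooled_by_day = {
    day: sum(l["size"] * l["rates"][day] for l in lists.values()) / total_sent
    for day in ("Sun", "Tue")
}

print(pooled_by_day)  # Sunday "wins" the pooled comparison...
# ...even though 2 of the 3 lists actually do better on Tuesday.
```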

    Finally, this claim made me laugh:

    “However, there is a different dynamic on the weekends that must be considered. During the summer months, those in a target audience are most likely to spend the weekend outdoors away from their computers. In the winter, subscribers may spend more time indoors catching up on email and/or preparing for the week ahead.”

    This theory probably holds water if you live in Indianapolis, but it’s the exact opposite for folks living along the Gulf (Houston, for example, the 4th largest metro population in the U.S.)
    Eric from SiteSpect    Mar 22, 11:02 PM    #
