The Net Takeaway: Page 5


Danny Flamberg's Blog
Danny has been marketing for a while, and his articles and work reflect great understanding of data driven marketing.

Eric Peterson the Demystifier
Eric gets metrics, analytics, interactive, and the real world. His advice is worth taking...

Geeking with Greg
Greg Linden created Amazon's recommendation system, so imagine what he can write about...

Ned Batchelder's Blog
Ned just finds and writes interesting things. I don't know how he does it.

R at LoyaltyMatrix
Jim Porzak tells of his real-life use of R for marketing analysis.






Infobright: The MySQL DataWarehouse · 09/30/2008 12:04 PM, Analysis

Most of the data warehouses out there that aren’t built around Oracle or one of the other biggies tend to start with PostgreSQL. There are a variety of reasons, from the more complete SQL standard support in its query engine to its early handling of fully ACID transactions. Most of the “DW appliances” have been built around heavily modified Postgres, including Netezza and Datallegro, among many others. Some have called Postgres the “open source Oracle”, and in fact, EnterpriseDB have modified Postgres to run Oracle programs directly.

But someone mentioned that I should look at Infobright, an open source data warehouse built around MySQL (now owned by Sun). MySQL is by far the most popular database system around these days, offered on every hosting system and showing up in all sorts of places. Many of the differences between PG and MyS have been ironed out with improvements in MySQL storage engines and query handling.

Infobright’s Community Edition is chock full of code and documentation on how their system works. Like the rest, it includes the usual suspects: columnar data store, compression, gridding, etc. Note that it runs only on Linux (currently, a slew of 64 bit Linuxes, soon 32 bit Ubuntu) and requires 16GB or more RAM (UPDATE: From comments below, 16GB is recommended, but it can run in less.)

If you have the space (or a spare Amazon virtual server), it might be worth checking out. Especially if you grew up on MySQL, and can make it dance and sing like Jeremy Zawodny. Ok, you don’t need to be that much of an expert. But chances are, if you’ve done any web development over the past few years, you’ve become pretty good at MySQL. Here’s a way to leverage that knowledge into a warehouse.

For more on Open Source BI, see my posts (reverse chrono order) on:

LucidDB… Open Source DB for Data Warehousing and BI

PHP and BI?

Open Source BI?


* * *


Resolver One · 09/26/2008 07:41 PM, Analysis

What if you were given a spreadsheet which was actually a front end to a set of Python calculations? So that, when you wanted something which would just, well, act like a program in the spreadsheet, you just write the Python?

Well, I haven’t seen a perfect one yet, but I am intrigued by this product, Resolver One.

From their site:

It retains the familiar table-based interface, but has been comprehensively designed to give you the power and flexibility that the traditional spreadsheet lacks. Every time you change a Resolver One spreadsheet, the software generates Python code expressing your formulae as a computer program; this code is immediately executed, and the results are displayed back to you in the spreadsheet interface. You can then edit the code, writing your own functions and algorithms, and see the results in the grid too.

It’s interesting: it blurs the line between data grid and programming language on the data. If you think about it, SPSS and other stats grids are pretty rigid: columns of data, variable names at top, rownums along the side. Then came Pivot Tables and OLAP approaches which allow flexibility in crosstabs. But the spreadsheet is really a blank canvas: throw your data anywhere, do something with it, then reference it. What was missing was a stronger link to a programming language. Sure, VBA was in Office, but it was a pain to get to and a pain to use. It was clearly aimed at people writing applications, not people who wanted to manipulate the sheet as part of an analysis.

Besides all the stuff at their site, there are some good examples of what you can really do with it at Resolver Hacks.

I’ll play with it (they have a non-commercial license, totally cool!) and update the post with my thoughts.


* * *


The Fourth Quadrant by Nassim Nicholas Taleb · 09/16/2008 11:12 AM, Analysis

An excellent read, and a must-read if you use statistics. While very focused on risk assessment during this financial crisis, its lessons are very important for every analyst.

Highly recommended, as are Mr. Taleb’s books.

THE FOURTH QUADRANT: A MAP OF THE LIMITS OF STATISTICS [9.15.08] By Nassim Nicholas Taleb. An Edge Original Essay

BTW, there is a more technical appendix, but the link is way at the bottom. It goes into the stats in more detail, and even if you aren’t a super stats god, you can see how things screw up with infrequent events.


* * *


Segmentation -- Quantivo Gets It. · 09/06/2008 10:58 PM, Analysis Marketing

Look back at my long post on What Web Analytics Is Missing, and you see segmentation all over it.

Now look at Quantivo and see the future coming to life. And it’s not from any of the web analytics guys.

It starts here. Combine behavioral info from all sources to understand customer needs and predict customer actions… and deliver as a hosted service.

We’ll look back, and wonder how the web analytics companies let this slip out of their fingers.

Quantivo are presenting at the Demo show, so we’ll see more from them soon.


* * *


What Web Analytics is Missing... · 08/13/2008 11:00 AM, Analysis

Formal Summary: If they are to survive, web analysis tools need to focus on business questions, not technical counting and simple reporting. There are 7 major areas that web analytics tools have continued to fail to deliver in over the past 13 years. These 7 areas are interdependent, and represent needs of both the advanced analyst and the basic business user. These areas reflect changes in the web and how interactive technology is used for marketing and overall business, the need for simplification of results sharing, the rise of large-scale data processing capabilities, and the simple fact that web analysis innovation has stagnated.

These 7 areas include:

In this extended post, I explain the issues, give examples, and recommend solutions.

I expect the WA world to:
a) become less generic and address specific business needs, recognizing the content structures and interactive marketing approaches that different verticals use
b) move beyond behavior tracking into a person-audience-segment centered approach
c) open up and integrate
d) keep the evolution going in areas like analytics, segmentation, customization, visualization, and other -tions.

Read on to see just what I mean.


Recently, as part of my job, I’ve reviewed the current state of affairs of the “web analytics” space. I looked at over 20 tools, from the traditional guys to startups, from BI vendors and marketing analytics guys all the way to open source. And what I’ve seen is, frankly, disappointing. I had other words, but I’m trying to keep this clean.

We have stagnated. We have gotten stuck in the weeds, and forgotten the face of our fathers (yes, from King’s Gunslinger series, I like the sound of it). We have forgotten what we set out to do when we first moved these tools past “counts of hits”.

I’m going to lay out the pieces that are missing, and some suggestions for filling in the gaps, but here’s the thing: they all hang together. I’ll try to make that clear by linking, but you need to understand: putting just one in and calling it a win is missing the point. Each of these is a different dimension to the whole problem.

And I’m not going to get hung up on measuring “engagement” or “the conversation” or AJAX or any of the other buzzwords. This post is on the more business basic stuff, the stuff people keep asking for and that I wonder why we, as a group, have so utterly failed to deliver. By basic, I don’t mean simple or beginner. I mean, the real underlying reasons that someone uses these tools, the business problems they are trying to solve while we spin around methodology.

This stuff applies to every tool in the Web Analytics world (aka WA), from the top quad (Omniture, WebTrends, Coremetrics, Google Analytics) to the latest hot-to-trot tool (CrazyEgg this week, you-name-it next week) to the variety of starter tools and open source tries. (And yes, Yahoo! IndexTools currently suffers from many of these problems too… just because Yahoo! acquired them, don’t think they get off any easier).

Yes, it’s long. Print it out if you have to.

Writing this drove me to tears. It’s just so depressing how bad we are. I don’t want to hear any more excuses about “you can’t boil the ocean” or “it’s still a young industry” or “we provide what the industry asks for” or really any of those silly weasel-outs.

(And as you read this, you tool companies, you will be saying “But wait, we offer a strong segmentation tool, we’re rolling out this or that, we’ve always had this and he just doesn’t get it”. Keep saying that. But my advice to you: don’t drink your own Kool-Aid. No question, you guys helped foster and grow this nascent industry, and you get kudos for getting us to where we are now. But for the things I discuss below, you may have things that could be bent into solving these problems… but they aren’t doing it yet. Talk to the analysts out there who are making results happen with your tool. Don’t ask them how the tool helps, ask them how much work they are doing to extract the data, fold it around, run custom work, whatever, to really get at their business issues.)

It’s not like the field is new; I’ve been working around tool deficiencies for over 10 years. We’ve already boiled all the small ponds, tidal pools, and freshwater lakes. It’s time to start on the big problems.

So, yes, tool company, I’m talking to you, be you big or small, US or not, open source or closed. And analysts, consultants, tool users, agencies, publishers, etc., we need to work together and vote with our wallets to make these wishes come true.

Listen up.

It’s time to fix it.

More Who, Less Do

Web Analytics is stuck on what people do instead of who they are. That is, we are forgetting that there is a person behind all those behaviors, and that the value of the behavioral tracking is to understand that person and meet their needs. Focusing on the behaviors instead of analyzing them to get at the person behind them is the mistake all the WA tools have made.

And is that bad? After all, I pay for behaviors (clicks) for my search ads, and I make money when people click on ads on my sites, or buy things. Shouldn’t I be focused on that? If I optimize towards behaviors, and I get paid for behaviors, why should I even worry about the soul inside the black box?

Years of marketing research (and even more years of psychology) will tell you that you can’t achieve your business goals in this world if you focus on behavior instead of the person. As a marketer or publisher, you are optimizing stimuli to drive a response, but you don’t have the luxury of a conditioning lab. You have to use everything you can to understand the person: emotional needs and fears, expectations and social norms, the whole WHO they are. Those of you who’ve been doing this a while may call this psychographics, but no matter what the name, it’s about what’s driving the person to make the behaviors that are oh so easy to track.

And it’s not the overused shortcut of “intent”. Intent is session based. Understand a person’s overall interests, needs, expectations, wants, likes, desires, fears, hopes and/or dreams and you’ve got a customer for life.

(BTW, even search advertising, the classic “behavior is all we need in a CPA world” approach, is starting to recognize that if you know how a person thinks, you can find low-cost terms which are completely relevant to your customer’s semi-unique way of thinking. Instead of trying to guess at the universe of terms, you look for groups or segments and how they communicate… and there your terms are. For those who like buzzwords, this is similar to the point of the “long tail” Zipf distribution argument: You can get higher ROI by bidding less for terms which are less “popular” but more relevant to a specific type of needs. Look to 360i and IProspect as leading players in this space, and a slew of startups.)

But our current WA tools do little to nothing to help us with this. Sure, some have some simple filtering based on some IP-based geo data. Some let you pass in variables, such as the type of ad that drove the user (he clicked on the “fear” ad, while that other guy clicked on the “low cost benefit” ad). But where is the clear thought leadership that helps guide users to understanding how important it is to collect this data, and to USE this data? Where is the technology and guidance to help me build a picture of who my visitors and customers are, instead of just what they are doing?

For example, the tool could help with data collection:

The tool could help with creation of segments:

The tool could help with analysis:

My business (every business) is predicated on providing what people want to pay for. To understand that, I need to understand them, and make sure I have the stuff they want to buy. If I don’t understand that, then I lose. I expect my WA tool to help with this, not hide it in the database somewhere.

BTW, I am not saying that behaviors are bad, nor that derivations from behaviors (like Behavioral Targeting (BT)) are bad, or that we should ignore behaviors. But we’ve gone too far in the other direction: we ignore the person, and just count the behaviors.

Behaviors alone are not the answer. How they describe people and what people think will help you predict what they will do, and will also give you hints as to what levers will be most effective in driving behaviors of interest. And isn’t that what you really wanted? You didn’t really want simple counts of behavior after all, did you…

Ok, need a sample? Here we go. Simple question. What kinds of people are visiting my blog? What tool or report will describe their interests for me? Well, there isn’t one… but let’s do what the tool guys want us to do. Let’s look at the most popular content. That would describe the interests of the visitors, right? After all, the uninteresting content will never show up, and the most interesting stuff might tell me about their interests and needs.

Here are the top 10 pages on my site in July. (Ok, so it doesn’t seem like tons of traffic, but I have a very long tail of interesting posts. No, really! Anyway, back to the example).

Actually, even that’s not true. You see, the tool I used doesn’t even have a report by visitor. It only gives me PVs. I had to customize the report to count by visitors. See what I mean? The reports default to behaviors, not people, and no report like this is in my “Visitors” menu of reports.

So, of course, it’s obvious. The people who visit my site love Ubuntu Linux, the R Statistical package, and the Nero CD burning tool. Right?

Wrong. And you can’t really use the tools to get at this. Instead, either by pulling raws, or through custom tagging, you can get a list of cookies who visited each page. From there, a set of crosstabs of co-occurrence per session, or binary-biased correlation, or binary-adjusted clustering all reveal the same thing: There are 4 different groups represented in this chart. (See also Too much Web, not enough Analysis).

Group 1 is the Ubuntu Linux guys. Group 2 is the analysis world, who reads the R and SPSS stuff. Group 3 is the Windows/PC users who need help with stuff (besides Nero, they also like some of my excel posts). Group 4 are also analysts, but they are less statistical and more marketing/databasey (hence the BI and SQL links). While there are some overlaps, these are the basics.
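For the curious, here is a rough sketch of the co-occurrence step. It is not the exact analysis I ran: the page names and cookie IDs are made up, it counts co-occurrence per cookie rather than per session, and it uses Jaccard similarity as a simple stand-in for a binary correlation. The point is only that a list of (cookie, page) pairs is enough to see which pages share an audience.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (cookie, page) records; real ones would come from raw logs
# or custom tagging. Page names and cookie IDs are invented for illustration.
visits = [
    ("c1", "ubuntu-tips"), ("c1", "ubuntu-wifi"),
    ("c2", "ubuntu-tips"), ("c2", "ubuntu-wifi"),
    ("c3", "r-intro"), ("c3", "spss-macros"),
    ("c4", "r-intro"), ("c4", "spss-macros"),
    ("c5", "nero-help"), ("c5", "excel-tricks"),
]

def pages_by_cookie(records):
    """Collect the set of pages each cookie has seen."""
    seen = defaultdict(set)
    for cookie, page in records:
        seen[cookie].add(page)
    return seen

def cooccurrence(records):
    """Count, for every pair of pages, how many cookies saw both."""
    counts = defaultdict(int)
    for pages in pages_by_cookie(records).values():
        for a, b in combinations(sorted(pages), 2):
            counts[(a, b)] += 1
    return dict(counts)

def jaccard(records, a, b):
    """Share of cookies that saw both pages among those that saw either."""
    by_cookie = pages_by_cookie(records)
    saw_a = {c for c, pages in by_cookie.items() if a in pages}
    saw_b = {c for c, pages in by_cookie.items() if b in pages}
    union = saw_a | saw_b
    return len(saw_a & saw_b) / len(union) if union else 0.0
```

Pages with high pairwise similarity belong to the same audience; pages with zero overlap do not, no matter how popular each one is on its own.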

So, there are 4 audiences I appeal to. I try to mix my posts up to appeal to each group, and every once in a while, I throw something out calculated to go beyond these groups. But this is a SEGMENT POV, not a page-popularity or even content-popularity POV.

So, Mr. Business Owner, if you aren’t thinking about the audience, the people behind the behaviors for everything you do, then you are just bit twiddling, and often coming up with the wrong answers. It’s not your fault, however; the tools force you into doing this. And that’s plain bad.

(BTW, Bob Page pointed out that the fact that people use the content oriented report to try to talk about their audience is a very common and insidiously subtle error committed by both novices and professionals. Using one set of data (in most tools, the standard “PV report”) in a different context (talking about visitors) for which it is not suited will lead to unsupportable conclusions. I agree with him, but I also blame the tool vendors for not providing the tools to answer the who question.)

Too much Web, not enough Analysis.

For a group of tools called “Web Analysis” tools, we sure do have a strange view of “analysis”.

I’m trained as a Social and Quantitative Psychologist, so I have a pretty clear definition of Analysis. It involves turning data into answers and, wait for it, “insight”. This means looking for patterns in data, and then understanding what those patterns mean in the real world, as well as why certain groups don’t follow those patterns. If you think for a moment, every problem you are trying to solve for your business will follow this same logic. You don’t need to be a psychologist to figure this out.

So we all (analysts, marketers, business owners, financial officers, bloggers) spend lots of time analyzing. Why don’t these web analysis tools help? It’s as if they were hiding under a rock for the last 40 years, and completely missed the rise of computational statistics and data mining. Yes, instead of calculating “sums of squares” by hand, it turns out you can use computers for this.

But all the current WA tools are glorified reporting tools. They just print lists of behaviors, sometimes with filters. Where are the actual summaries, where you explain in ENGLISH (or language of your choice) what is happening in the data? Why do your reports never give distributions or confidence intervals? Where are the trending and predictive expectations? Where are the exception finders and analyzers?

Or, given that there is so much data, where is the clustering and variable reduction? Where’s the part where you help us reduce our data to something useful? I expect to see not only clustering of people (see More Who, Less Do) but clustering of pages, clustering of content and items I sell, anything to help me see the patterns. Let me see co-occurrence, let me see islands, let me see what’s defining the experience I am providing my users.

I mentioned linguistic analysis. This means reading my sites (see Understand My Site). Read the content I create, the content my users create, and help me profile and understand the users creating their content and their response to mine (See More Who, Less Do). And as the “conversation” moves into 3rd parties who host aggregation, give me ways to understand that: work with Disqus, Twitter, Friendfeed, whatever ways you need to integrate things that are part of my site but occur off site (ala widgets, etc.).

An even simpler thing… how about scoring of behaviors to create “engagement scores” (or call it whatever you want)? Lots of people have scoring formulas, so give the user freedom to configure their own. And license the existing ones as a starting place.
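A minimal sketch of what "configurable" could mean here, assuming a simple weighted sum of action counts. The action names and weights are invented for illustration; they are not anyone's published formula.

```python
# Hypothetical default weights; a real tool would let each user edit these.
DEFAULT_WEIGHTS = {"pageview": 1.0, "comment": 5.0, "signup": 10.0}

def engagement_score(action_counts, weights=DEFAULT_WEIGHTS):
    """Weighted sum of a visitor's actions; unknown actions score zero."""
    return sum(weights.get(action, 0.0) * count
               for action, count in action_counts.items())
```

Passing a different `weights` dictionary is all it takes to swap in another scoring scheme, which is the freedom I am asking for.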

Though there has been some scratching of the surface, there is a minimal set of tools helping with data-driven conversion attribution across media for traffic driving. And that little bit dwarfs the big vacuum around analysis driving ad performance on my site (see Publisher-side Business Assistance). This one will get its own post at some point, but the level of analysis around attribution is completely weak. You don’t give strong tools to facilitate optimal distribution of revenue across “assists” and direct drivers of action across channels and media, nor do your tools help with media mix optimization or allocation.

BTW, you tool vendors all give some nice metrics… but you do little to allow me to create my own metrics or ratios. I have to export and stick things in spreadsheets to work around this. As you start borrowing from stats packages and from other BI tools, consider allowing custom metrics (ratios, etc.) to be first class citizens. And every metric should be available as a dimension (and vice versa, see how Excel Pivot Tables handle this very seamlessly).

And don’t get me started on how much these tools ignore the fact that a site is actually a “web”. Graph theory and network analysis, anyone? Yes, it’s more than making a pretty web chart.

You can add capabilities from open source tools like Weka, Yale/Rapid-i, or the R Project. You can license capabilities from SPSS, SAS, Kxen or Insightful or Angoss or a myriad of others. You can make up totally new algorithms if you want, reflecting the modern work of social and network analysis. You can mix and match from data mining, pattern detection, AI, evolutionary/genetic algorithms, frequentist statistics, Bayesian statistics, and even parametric statistics. Do whatever you want.

But you can’t just sit there and do nothing.

Start with the easy stuff: clustering. Help us understand how our content hangs together, how our users hang together. You can cluster content via the words themselves (see Understand My Site) or how people read them. Then start doing some predictive work: help me see how things will trend out, and start adjusting for seasonality. Help me predict how much money my ads will make for the next two months, and give me a goal to shoot for. Also let me know if I’m going below the 2 SD mark and should get worried.
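The "2 SD mark" check is about as simple as analysis gets. A sketch, assuming you have a short history of some daily or weekly metric (the function name is my own invention, and there is no seasonality adjustment here, which a real tool would need):

```python
from statistics import mean, stdev

def below_two_sd(history, latest):
    """Return (alert, floor): alert is True when latest falls more than
    two sample standard deviations below the historical mean."""
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    floor = mean(history) - 2 * stdev(history)
    return latest < floor, floor
```

Feed it last month's daily ad revenue and today's number, and you have the beginnings of an exception finder.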

Clustering, by the way, is not the same as Page Groups. Page Groups just aggregate pieces which add up to some larger unit. A cluster may be things that are related, but don’t add up to some obvious grouping. Which brings me to Page Groups: helpful idea, but poorly implemented in most tools (See Understand My Site). Good idea, but needs more help.

(At this point, I would dive into “start giving me Test Design tools so I can measure the change on my site”, but Google is giving them away now, and the rest of the large independent players have been purchased. Even with all this, they are barely on the radar of most people I talk to. Even analysts I talk to have not heard of this capability, which was surprising. So, if you are an analyst, go to Google Website Optimizer after you read this post and learn some more. Also good is Microsoft’s work, see my post MS Experimentation team makes another winner).

If you call your tool a “Web Analysis” tool, it should at least provide some analysis. What we have now is sadly lacking.

This doesn’t deserve its own section, but I gotta point it out. There is a feeling that people love pretty, shiny, interactive graphs, and so the companies call that “Visualization”. The rise of products like SAP’s Business Object’s Crystal Xcelsius is proof of that: Pretty and shiny, but practically useless. (Though there is also hope in the gentle ascent of Tableau, who actually understand what visualization is supposed to be.)

As it turns out, shiny-but-useless is actually not what the visualization field is all about (though they now have a great name for this s-b-u, “chartjunk”). First off, CLEAN UP your act: no drop shadows, pseudo 3d, or fades. Have you guys even read any Tufte or Stephen Few? Second, learn what visualization is all about. The focus is on analysis: helping you derive meaning from the data, like the analysis portion we just discussed.

So, make it easier for the user to understand what’s going on. Phrase the results in terms of the problem, not as an abstract. (See Understand My Site and Understand My Business). For each type of business focus, or for different roles, reduce the amount of stuff you are showing, and give an English paragraph about the trend. Imagine if the tool said things like “Note: your visits per UU is going up, but PVs per UU per month is declining, which means users are coming back more but visiting fewer pages in general. This change happened around May 3, 2008. Did something change on the site?” Yes, you can visually present this, but now you’ve told the user what to look for, put context around it, and turned a mere report or chart into an answer.
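To show how little it takes to get started, here is a toy sketch that turns two periods of metrics into a sentence. The metric keys and the wording are my own assumptions; a real tool would cover far more cases and would also locate the date of the change.

```python
def describe_change(previous, current):
    """Turn two periods of metrics into a one-sentence plain-English note."""
    visits_up = current["visits_per_uu"] > previous["visits_per_uu"]
    pvs_down = current["pvs_per_uu"] < previous["pvs_per_uu"]
    if visits_up and pvs_down:
        return ("Note: visits per UU are going up, but PVs per UU are declining: "
                "users are coming back more but visiting fewer pages.")
    if visits_up:
        return "Note: visits per UU are going up and PVs per UU are holding or rising."
    if pvs_down:
        return "Note: PVs per UU are declining while visits per UU are flat or falling."
    return "No notable change in visits per UU or PVs per UU."
```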

Another sub topic in analysis, here are things that should just be fixed; I’m sort of surprised that these haven’t been put to bed by now.

Any metric should also be available as a grouper, especially as a count or range. Any grouper should be aggregatable as a metric. We should be able to set counts of “groups” of actions: How many people did any 2? any 3? Any 4 of these items?
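The "any 2? any 3?" question is a one-liner once the data is shaped by person. A sketch, assuming a hypothetical mapping from user ID to the list of actions that user performed:

```python
def did_at_least(actions_by_user, items, k):
    """Count users who performed at least k distinct actions from items."""
    wanted = set(items)
    return sum(1 for actions in actions_by_user.values()
               if len(wanted & set(actions)) >= k)
```

Calling it with k = 2, 3, and 4 over the same item list gives the counts asked for above.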

Any “Action” should be available for the same things we do for pages. So, I should be able to see paths and groupings of actions, just like I can for pages. Note that most tools count actions, and let you segment by them, but don’t make them available for other analyses. At the end of the day, it’s really a “level of analysis” or data granularity issue… but the future is one where I look at sequences of events, be they PVs, “configurations” in page, etc.

In fact, the artificial separation between “actions” and pageviews has to end. PVs should die, only to be reincarnated as an “action” or “action bundle”. Dennis Mortensen of IndexTools fame suggests that page views were never what it was all about, and suggests the term “content views”: this nicely puts the emphasis on the real reason you were tracking the page or AJAX event. Now, there is still some work on level of granularity: some folks want to know about every click, others just want the click lump or “completion” action for each step of a process, and others just want to know daily counts of sales. We need to provide ways to bundle or aggregate clicks and have those bundles still be first class citizens. It’s a scalability problem, but not unsolvable.

I mentioned Page Groupings earlier; this is related. The sequences I look at may have varying levels of rollups, so I may be looking at P1 -> P2 -> Object 2 (collection of pages) -> In Page Configuration Action -> In page Submit Action. As a user, I want to be agnostic to these details (unless I need to drill down). I want to think in terms of what my site is trying to achieve for my business, not be stuck in the technical “request-response” world of javascript (See Understand My Business).

Understand My Site

These tools do a TERRIBLE job of helping me understand my site. They are agnostic to the site: all they do is count the behaviors, and do nothing to model or correlate the structure of the site. All that log data just magically is generated by, well, something, but not anything the tool tracks or uses in its analyses.

For example, which tool will spider your site looking for link patterns (“Content X has only 1 link to it. Content Y is linked from every page”)? Which tool will auto-tag pages based on meta-tags and title tags to help you cluster and group content? A few have dabbled here, but name a Web Analytics tool which prints out a site-map these days…
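The link-pattern report is not hard once a spider has produced a map of which page links to which. A sketch, assuming a hypothetical crawl result stored as a dictionary from each page to its outbound links (the crawling itself is left out):

```python
def inbound_counts(link_graph):
    """Count how many distinct other pages link to each page."""
    counts = {page: 0 for page in link_graph}
    for source, targets in link_graph.items():
        for target in set(targets):
            if target != source:
                counts[target] = counts.get(target, 0) + 1
    return counts

def link_report(link_graph):
    """Return (weakly linked pages, pages linked from every other page)."""
    counts = inbound_counts(link_graph)
    others = len(link_graph) - 1
    weak = sorted(p for p, n in counts.items() if n <= 1)
    everywhere = sorted(p for p, n in counts.items()
                        if p in link_graph and others > 0 and n >= others)
    return weak, everywhere

# Invented four-page site for illustration.
site = {
    "home": ["about", "post-a", "post-b"],
    "about": ["home"],
    "post-a": ["home", "post-b"],
    "post-b": ["home"],
}
```

Here `link_report(site)` flags "about" and "post-a" as having a single inbound link and "home" as linked from every other page.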

The WA world seems to think that we can ignore the site itself in favor of the data traces the user leaves. It’s as if I were a detective, and I was more interested in the clues outside the bank than actually looking inside at how the bank was robbed: “Hmm, the thief wore tennis shoes, very worn down… what? You arrested a guy carrying large bags of money labeled with the bank’s name and a blowtorch? How did you know that was the guy?”.

Some readers have suggested that the WA world doesn’t have to worry about this stuff, that CMS (content management systems) deal with the structure of web sites. Perhaps. But most of them do a poor job of analyzing how the content is used. Editors make some decisions on which content should be highlighted or removed in order to meet some goals (persuade, sell, entertain, inform)... shouldn’t they be provided some tools which would help them? Also, these CMS systems are just starting to provide ways for users of sites to provide content back into the system: that content needs some analysis, and the CMS usually ignores that.

Give me tools to tag and organize content or items, and provide tools to auto-tag and auto-categorize. Have the tools actually READ (content analyze) the content on my site to provide content tags. Either recognize it yourself, or give me tools to build the hierarchies of how I want to understand my site, the makeup of pages with Ajax Modules, etc.: I may not care about “home page” except as a collection of modules and their use. I may have a hierarchy clearly laid out on my site that your tool can recognize… and then it’s some hypothesis testing to see if users are using the site the way I think… or recommend changes!

Learn from the site how the presentation should be structured. Look for signup pages and “flag” them on my behalf. For forms, ask for sample data to robot the form and understand possible paths so you can build up a tree of all possibles and show which are really being used. Show the page and let me tag or select areas I want to track, so you can autogenerate pixels for me (and even add them to the site!)

And my users are commenting, putting reviews, forum posts, pictures, ratings, and tags up. Why is none of that data examined as part of the process? Saying some users comment and some don’t is about as helpful as saying “There are two kinds of people in the world: those who divide things into two and those who don’t”. (See Too much Web, not enough Analysis)

I’ve mentioned the emphasis on behavior. I have clicks, and I have urls. You need to make sure that I understand what my readers are Seeing, not just what they react to (yes, More Who, Less Do). Simply seeing ads can cause brand change… and simply reading long posts like this one can change the world. Even if you don’t click on a thing, I need to understand what you saw. For static content, easy; for dynamic, user submitted, personalized and customized content, content from other sites, well, the tools are still struggling. Better now than in the past, no question… but still a struggle to set up, report on, and understand.

Ok, it would be remiss of me, after all this venting, not to point out that some of the companies have started down this path, in part by backing into it. For example, Interwoven acquired Optimost to start making the content more fully defined and integrated to the analytics suggesting optimal combinations. Other content management groups are adding similar analytics to their apps, and some analysts have gone as far as to say that basic analytics will be integrated into all of our apps, so standalone analytics approaches, for the basics, will be a dinosaur. Perhaps, but until open source blogging and publishing systems like Drupal and Wordpress include strong integration with the tools, or until the tools are smart enough to scrape what they can without requiring technical innovation, then I think we still need a lot more improvement.

So, there is a wealth of data about how I see the world and what my site is trying to achieve. And my users are actually telling me what they think about it, both by behaviors (clicks) and actual text. Why not take advantage of it?

Understand My Business

Ok, this is a huge one, almost as big as segmentation. The WA tools basically assume that my site is, well, a site, same as every other one: it’s on the web, it has pages, images, and links. Oh, there may be minor changes in functionality of the tool based on if I appear to have e-commerce (conversion funnels) but at the end of the day, every report looks the same no matter if I measure my corporate intranet or my store. Coremetrics has a few specialized ecommerce reports (the others do as well, but I found CoreM had put slightly more thought into theirs), but even they don’t care if I’m an online grocer or a seller of baseball cards.

The tools need the ability to customize to how I run my business. You need to use the terms I use in my business (widgets, work units, RFPs, whatever). You need to understand the buying cycles my customers face, and let me customize reports by those time frames. Understand whether I keep evergreen listings and order inventory on the back end, or I list perishables which change every week. And you need to understand my goals. What is my site supposed to do for me, long term and short? This is what you need to report on, not the site behaviors themselves.

Organize what you are tracking around what I think is important. Help me help you get out of page mode, and let’s all get thinking about events and how I need to understand them and their impact on my business.

How can any one tool do this? Well, here’s a thought: the wrong answer is to not try at all. Give an interview to the user and let them self categorize. Let them customize “units” and cost factors. Let them describe basic parts of their business. Do your homework to have predefined approaches for small service industries, catalogers, ebay and yahoo storefronts. Spend some time looking for the repeated patterns, and focus on solving the user’s business problem, not their web problem.

And while you are looking at the user’s business problem, consider helping them organize their business. Give much more emphasis on traffic generation and customer acquisition. Split out “paid vs. non-paid” traffic generation, and segment this (see More Who, Less Do). Allow a variety of “dependent variables” for counts in the paid vs. non-paid. Recognize return visitors so we can segment by “pay every time they visit” vs. “paid first time, organic every other time”.
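To make that last segment concrete, here is a minimal sketch in Python. It assumes each visit can be labeled "paid" or "organic" and tied to a visitor ID; the record layout, the segment names, and the `segment_visitors` helper are all made up for illustration and do not come from any vendor's tool.

```python
from collections import defaultdict

# Hypothetical visit records: (visitor_id, visit_date, source).
# "source" is assumed to be either "paid" or "organic".
visits = [
    ("v1", "2008-09-01", "paid"),
    ("v1", "2008-09-05", "paid"),
    ("v2", "2008-09-02", "paid"),
    ("v2", "2008-09-09", "organic"),
    ("v2", "2008-09-15", "organic"),
    ("v3", "2008-09-03", "organic"),
]

def segment_visitors(visits):
    """Label each visitor by how their visits were acquired over time."""
    by_visitor = defaultdict(list)
    # ISO dates sort correctly as plain strings, so sort visits by date first.
    for visitor_id, _date, source in sorted(visits, key=lambda v: v[1]):
        by_visitor[visitor_id].append(source)

    segments = {}
    for visitor_id, sources in by_visitor.items():
        if all(s == "paid" for s in sources):
            segments[visitor_id] = "paid every visit"
        elif all(s == "organic" for s in sources):
            segments[visitor_id] = "never paid"
        elif sources[0] == "paid" and all(s == "organic" for s in sources[1:]):
            segments[visitor_id] = "paid first, organic after"
        else:
            segments[visitor_id] = "mixed"
    return segments

segments = segment_visitors(visits)
```

The hard part in practice is recognizing the return visitor at all; once that is solved, the segmentation itself is a few lines.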

I know, all you tool vendors will swear that you integrate with banners, search marketing, and email. That you track affiliate links. Yada, yada. Companies like e-Dialog continue to thrive because you vendors do such a poor job of making these tools relevant to the users that they pay others to read them so they don’t have to. I appreciate you supporting the consultant economy, but perhaps now is the time to get with the program.

How about customer service within a company? Why no reports which help us understand which products or purchases are driving repeat visits to the “help” or “support” sections of my site? There is no post-sales report section built into any of these tools. Which searches are people doing in my forum? What types of questions are they posting? HDTV returns were cut by 33% when Costco realized that no one told people they needed to have HD coming into their home independent from the TV buy. A sign placed on each TV saved them millions in returns. I expect to be able to do the same for my business.

“This is impossible: everyone’s business is too different. This is why we have domain experts and business leaders and analysts: they understand their business, and we can’t tell them how to run it!” is a common tool vendor refrain. Funny, but last time I looked, everyone in the hardware store world runs their business in very similar ways. In fact, many industries have very broad industrial structures that companies need to fit into. The ERP vendors recognized this years ago, when they developed specific functionality and bundles to address the somewhat obvious needs that each vertical has. Look at SAP, Oracle, Lawson: you can buy the basics and configure it yourself, but most everyone buys a preconfigured version to start with. While not every sale has been a success, it is undeniable that these guys are onto something with their vertical approach.

Publisher-side Business Assistance

I call it publisher, but nowadays, almost everyone is a publisher. When a storefront like eBay or Walmart is selling adspace, when anyone can put a Google or Yahoo search ad on their site, it’s a “sponsored by” world. Yet these tools do nothing to help site owners understand what ads are working for us. What ads are getting clicked? Should I move their location or try different colors? Can you help me test different ad vendors or networks to join? Can you help me understand which of my content drives “profitable” traffic (see Understand My Site)? Can you help me understand what types of users are coming so I can raise my rates (see More Who, Less Do)?

No one other than Rapt (now owned by Microsoft, good call!) and maybe Nuconomy get this. But like Understanding my Business, if I make money from ads, then you need to analyze ad placement and performance from the balance of “what works well for my advertisers” and “what makes money for me”. Those are correlated in the best cases, and if they aren’t, help me mitigate the damage.

And don’t just aim for the top of the heap; instead, consider helping tail and torso. Heads have specialized needs, but the tail and torso are the ones creating the buzz out there, and aggregate up to big bucks. Don’t focus only on the Techmeme top guys, but provide tools which allow people who make money from ads… to optimize how they make money from ads.

Publisher analytics and monetization optimization is a very underserved area… and that means there is money to be made here if it’s done right. The fact that so little money is being made here (for publisher tools) is a clue… that none of you have done it right yet.

This post is long enough; I’ll do another with a deeper dive on Publisher needs. But the point still stands: the WA world has missed the boat on this one so far, but it’s not too late to innovate.

Single-Site and Just-Sites Centric

The world is changing. I have widgets from others on my page. I have widgets I’ve created to spread my gospel. I send users between my sites, and I have partnerships with other sites. Why can’t my web tool help with this?

Working with in-page elements is always a disaster with all of these tools, from poor summarization (see Understand my site, No Analytics) to very manual tagging. Why do you not have a big push to help standardize video widget tracking? Why not work with widget sets to provide standard hooks for behavior tracking? I would expect to find 20 page tech docs for every widget tool out there telling me exactly what I should tag, how to tag, and how to “one-button” enable this tracking.

Track the stuff in my site: are people using my flickr Widget or my MyBlogLog thingy? If I make the NetTakeaway RSS Headline widget, can you help me track that as an extension of my site? Look, if the link is on my site, I don’t care how it got there, I want to know if people are clicking on it. Read the DOM, intercept the onClick event, do what you need to, but make sure I understand what people are seeing and clicking on.

How can I link data from two sites with totally different domains? You neither give me guidance nor provide reports and tools to help me understand my users and their behaviors across both sites.

I need to understand what’s happening via the dynamic content coming in from off site but displayed on mine. If a user clicks on a headline taking them off my page, tag it and store it. Run it through linguistics so we can build up an interest profile per user (see More Who, Less Do). What about the video and other aspects running on my site? How about tracking my videos, since they are promotion to my sites?

The Web 2.0 world isn’t just “pages don’t redraw any more!”. It’s about the interconnection of services to create content and functionality. The user doesn’t see that it’s a mashup of 12 different sites; they just use it. The analyst shouldn’t have to deal with that problem either.

By the way, my site is being used in a variety of ways. Besides me extending my functionality out to others via widgets, I am also being visited by mobile phones. Do they use my site differently? They must; they are seeing a different version. How about helping me with that, if by nothing more than making a segment out of them (see More Who, Less Do)? But each medium has different aspects of interaction and experience, which combine into different metrics. Instead of everyone coming up with their own, how about making some out of the box to help us understand how our modern mashups are being received?

Bob Page brought up a point I had neglected to mention: Instead of asking for standardization just for video, let’s work together to standardize all of the tagging.

So, we can all agree on certain types (broad though they may be) of events and where they play. Leave lots of room for customization, but have some lines in the sand that we can all work with. Make the tags themselves interoperable, b/c at the end of the day, all the js tags are the same. And with shared metadata, the mechanics of collection and sorting are genericized… and you can put your innovation in how to display the data and answer business questions.
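As a rough illustration of what "shared metadata" could look like, here is a small Python sketch of a common event record. No such standard exists in this post: the event type list, the field names, and the `TagEvent` class are all hypothetical, picked only to show agreed-upon types plus a free-form area for customization.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical shared vocabulary; a real standard would have to be negotiated.
STANDARD_EVENT_TYPES = {"page_view", "click", "video_play", "widget_load", "purchase"}

@dataclass
class TagEvent:
    event_type: str   # must be one of the agreed-upon types
    site: str         # where the event happened
    visitor_id: str   # who did it
    object_id: str    # the content object involved
    custom: dict = field(default_factory=dict)  # room for vendor or site extras

    def __post_init__(self):
        # The "line in the sand": reject event types outside the shared list.
        if self.event_type not in STANDARD_EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event_type}")

    def to_json(self):
        # Same wire format regardless of which tool collects the event.
        return json.dumps(asdict(self), sort_keys=True)

event = TagEvent("video_play", "nettakeaway.example", "v42", "intro-video",
                 {"player": "embedded"})
payload = event.to_json()
```

With collection reduced to something this boring, any tool could read any other tool's events, and the competition moves to the reporting layer.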

API and Integration

I have mentioned the need to link up with any data I already own, and how to help me collect more (see More Who, Less Do and Too much Web, not enough Analysis). But as we find ways to make more mashups of functionality, instead of just being big reporting tools, WA systems will have to unleash the front end from the back, and allow tools and systems to have easy read-write access to metrics. But you also need to allow alerts to flow into the systems, and allow programmatic configuration. Can I get a list of the top 10 pages or content items on my site in a widget from any of these tools? Not easily, and not always up to date. Can I load up a SQL query tool to dig into the data or export it in easy ways (easy meaning all data in ready-to-import forms) for advanced analysis? Again, not easily (though I quite liked Webtrends’ ODBC approach).
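For a sense of how little is being asked for, here is a sketch of the "top 10 pages" question answered with plain SQL from Python. It assumes the tool exposes page views as an ordinary table; the `page_views` table, its columns, and the sample rows are invented for the example, and an in-memory SQLite database stands in for whatever the vendor actually runs.

```python
import sqlite3

# In-memory stand-in for a WA tool's data store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (visitor_id TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("v1", "/home"), ("v2", "/home"), ("v3", "/home"),
     ("v1", "/pricing"), ("v2", "/pricing"), ("v1", "/about")],
)

def top_pages(conn, limit=10):
    """Return (url, views) pairs for the most viewed pages, busiest first."""
    rows = conn.execute(
        "SELECT url, COUNT(*) AS views FROM page_views "
        "GROUP BY url ORDER BY views DESC, url ASC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()

top = top_pages(conn)
```

A widget, a dashboard, or an analyst's own script could all call the same query if the data were reachable this way.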

It is possible that all the tools are built this way before the front end is bolted on… but I think most tools have a ways to go before they will expose all their APIs.

BTW, the first ones can be really techie… but later on, you will want to make sure any mashup tool or web services tool can work with your system, as well as simple scripting languages like Python. Make sure you provide ways to insert yourselves into custom dashboards that analysts may already be using. And provide ways for the front end (the site) to leverage the work. You should be providing recommendation engines (see Too much Web, not enough Analysis) and ways to leverage the knowledge in your systems live, per user (see More Who, Less Do), during their experience on the site.

Some companies, admittedly, may want to build their database around the WA silo. It’s got all their visitors in it, after all… but the WA companies have not done enough yet to open the database to make it a full centerpiece to a business… and yet, the WA vendors also don’t offer enough functionality on their own to manage many business functions.

If you try too hard to become the center of the marketing universe by forcing everything to work through you instead of offering to work with others, you may find that you actually become the center of “left-out-land”.

To stay “close” to the real center, you need to open. (Yes, that’s a bad play on words, and for that, I apologize.)

Are things really so bad in web analytics land? There are literally hundreds of tools, and more and more people start to understand the phrase “visit count”. Thousands of people around the world make their living using these systems, and thousands upon thousands see their output and make big business decisions based on what they see. The big tool names are recognizable (and some, publicly traded!) entities, and as a group, all the tool vendors have contributed to pushing web sites into making money while pleasing their users via optimization and other analyses. So, no, we aren’t at armageddon.

But it’s time to get our act together. We (analysts and tool vendors alike) continue to talk technical metrics that have little correlation with the business at hand, and we get sucked into the weeds of minutiae because the tools force us there.

Is there a pattern to my requests? You bet. Instead of making the request the center of the tool, make a person the center. A second emphasis can be about content objects. And then wrap all that in a model, a model of how various businesses work: how the content and the person come together in a commerce interaction, or a service interaction, or an entertainment experience.
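Here is a toy Python sketch of that shift, assuming each logged request already carries a visitor ID, a content object, and a kind of interaction. The log format and the `build_profiles` helper are hypothetical; the point is only that the same rows can be pivoted so the person, not the request, is the unit of analysis.

```python
from collections import defaultdict

# Hypothetical request log; field names are illustrative only.
requests = [
    {"visitor": "v1", "content": "hdtv-buying-guide", "kind": "commerce"},
    {"visitor": "v1", "content": "hdtv-returns-faq", "kind": "service"},
    {"visitor": "v2", "content": "radiohead-video", "kind": "entertainment"},
    {"visitor": "v1", "content": "hdtv-buying-guide", "kind": "commerce"},
]

def build_profiles(requests):
    """Pivot request rows into one profile per person."""
    content = defaultdict(lambda: defaultdict(int))
    interactions = defaultdict(lambda: defaultdict(int))
    for r in requests:
        content[r["visitor"]][r["content"]] += 1    # which content objects they saw
        interactions[r["visitor"]][r["kind"]] += 1  # what kind of interaction it was
    return {
        visitor: {
            "content": dict(content[visitor]),
            "interactions": dict(interactions[visitor]),
        }
        for visitor in content
    }

profiles = build_profiles(requests)
```

From here, the business model layer would decide what a visitor with two commerce touches and one service touch means for a retailer versus a publisher.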

And with this in mind, close your eyes and see what I see. Suddenly, we are talking about the few key metrics of the site’s business, and aggregating lots of little details up into patterns and simple displays. We talk the user’s language, not the language of web analysis. Oh, that’s still down there, but you have to dig a bit to get to it (or just have a different default screen).

We are talking about customers, and how they interact with my business. We are talking about what they feel and think about what I’m doing, and how I can help solve their needs. And you are providing ways for me to understand how my site is making me money, or helping my marketing, or just generally being a part of my business.

Because my business was never about the site. The site was just a part of my business. And finally, the tools start to see this, and work with me that way.

All because you addressed these 7 areas.

More Who, Less Do
Too much Web, not enough Analysis
Understand My Site
Understand My Business
Publisher-side Business Assistance
Single-Site and Just-Sites Centric
API and Integration

Ok, when I snap my fingers, you will wake up, and you will feel refreshed and happy. You will feel the world is a new and great place, and you will be energized to make some changes to make it even better.

Let this list we’ve just gone over be your guide. Time to get started.


Grateful thanks go to early reviewers including Bob Page, Dennis Mortensen, Eric Peterson, W Dave Rhee, Dylan Lewis, and Jim Sterne, though the doc may not reflect all of your wise suggestions and patience.

Comments and corrections are of course always welcomed.


* * *


Don't Miss X Change -- almost sold out.... · 08/07/2008 06:22 PM, Analysis

I have been given the honor of leading 2 huddles at Semphonic’s X Change conference. And a birdie told me that there are fewer than 10 slots left for attendees: either sign up ASAP, or miss your chance.

Every leading thinker and doer in the web analytics space will be there. This show isn’t like a usual “talking head” show; instead, it’s huddles: a nominal leader starts things off with the bigger issues, but it quickly becomes a discussion and knowledge share.

Emetrics is a great show, no question… but this show is more about the practice of web analytics; vendors are not allowed to attend. [CORRECTION: Based on Eric Peterson’s comment below, some thought leaders and senior staff of some vendors are there by invitation, but no salesmen and no pitches. BTW, Eric Peterson himself is one of the guys running the whole thing, so you know it will be good!]

You will wish you had attended this show, so give it a click, look at the people and the huddle topics… and sign up now. At the Ritz (ooohhh) in San Fran, August 17-19, 2008.


* * *


LucidDB... Open Source DB for Data Warehousing and BI · 08/05/2008 12:42 PM, Database

This looks like it could be the next big step in open source data warehouse systems. is a good summary presentation from OSCon 2008, and is the home page for the project.

From their site:

LucidDB is the first and only open-source RDBMS purpose-built entirely for data warehousing and business intelligence. It is based on architectural cornerstones such as column-store, bitmap indexing, hash join/aggregation, and page-level multiversioning. Most database systems (both proprietary and open-source) start life with a focus on transaction processing capabilities, then get analytical capabilities bolted on as an afterthought (if at all). By contrast, every component of LucidDB was designed with the requirements of flexible, high-performance data integration and sophisticated query processing in mind. Moreover, comprehensiveness within the focused scope of its architecture means simplicity for the user: no DBA required.

Rather than throwing hardware at data warehousing problems by relying on expensive clusters or specialized “appliances”, the scalability offered by LucidDB’s unique architecture allows you to achieve great performance using only a single off-the-shelf Linux or Windows server. Besides keeping costs down, this also minimizes maintenance and administration hassles.

A great blog by the main founder of the tool:

I’ll update more after I’ve had a chance to play with it.


* * *


Ed Foster has died. · 07/29/2008 12:49 PM, Tech

As reported on his site Gripe 2 Ed and reprinted below.

Ed Foster was a longtime columnist for many tech magazines and a stalwart defender of consumer rights in tech purchases. Bad DRM, obnoxious clickwrap and shrinkwrap licenses, egregious tech support policies… Ed railed against them all, and he got both tech press attention and community support to make companies do the right thing. He was always a good read, and I wish there were a million more like him.

He will be greatly missed.


Ed Foster: 1949-2008

By Jeff Foster, Section The Gripelog
Posted on Mon Jul 28, 2008 at 04:59:24 PM PDT

I’m sorry to inform you all that my dad, Ed Foster, Died on Saturday of an apparent heart attack. He was 59. He was very proud of the work he had done and the community he had built here. He was very engaged in what you all had to say, and we had many running instant message conversations about comments or emails people had sent. He was an extremely smart man, and he loved to be mentally stimulated, whether it was a good book, an interesting conversation, or one of the many comments you posted that made him look at something in a slightly different way. As you can probably tell, I’m not half the writer my dad was, and it’s very difficult for me to think of the words to say what I feel(It doesn’t help that when I get stuck on a word or phrase, I think “I know, I’ll ask my dad!”) , but I just want everybody to know that this site and the people on it meant a lot to him. I’m going to miss him. More then words could express, even if he was here to help me think of them.

Services are going to be held on Saturday. For more information, Please email me


See the comments at for kind sympathy.


* * *


SimpleScript? · 07/24/2008 05:29 PM, Tech

Whatever happened to the days when you could simply program a computer? Remember when BASIC was included with every Apple and Atari and Commodore, when you could make the computer do some simple stuff all on your own?

Where have those days gone?

What are the options for simple whip-it-out programs?

Judoscript, my previous fave, is gone. Groovy is huge and too javaish. Jython and Python are ok, but still require learning that spaces are important. Don’t even get me started about Ruby and Perl. VBScript and Javascript are easy, but there is no cross-platform and simple runtime for them. is too focused on graphics, and has few ways to access databases or do anything too intense.

GUIs are the death of every scripting language. With the need to have events and listeners, you immediately remove any ease from the process. Or, you have too much GUI and a limited back end, à la the very handy AutoIt scripting language for controlling Windows: it can move icons around the screen, open and click on toolbars, all kinds of stuff… but there is no built-in database access or other high-power stuff. Still, in all my searching, it is coming out on top for simple programming in a BASIC-like script with simple GUIs.

What do I really want?

Imagine a script with syntax like this:

Create Button {Name}
  StartLabel "Click Me"; HoverText "Activates Blender";
  If Clicked Do [Something]


Open File
Loop til File End
  Read Line
  Divide Line on Spaces into X[]
  Write Line
End Loop

Give it simple database functionality so one can “join” variables on keys, and can aggregate. It should have the ability to create tables (crosstabs and full tables) along with transpose capabilities.

Looks like pseudocode, doesn’t it? Except that’s what we really want. We often don’t want to muck with all the fancy stuff, we just want to get stuff done.
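For comparison, here is roughly what the file loop and the "join and aggregate" wish look like in Python today. This is only a sketch of the equivalent steps, not the language I'm asking for: the sample data, the helper names, and the use of in-memory text in place of real files are all assumptions made to keep the example self-contained.

```python
import io
from collections import defaultdict

def read_split(source):
    """Open File / Loop til File End / Read Line / Divide Line on Spaces."""
    return [line.split() for line in source if line.strip()]

def join_on_key(left, right):
    """'Join' two lists of [key, value] rows on their first field."""
    lookup = {row[0]: row[1] for row in right}
    return [(key, value, lookup[key]) for key, value in left if key in lookup]

def total_by(rows, group_index, value_index):
    """Aggregate: sum one numeric column, grouped by another column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_index]] += int(row[value_index])
    return dict(totals)

# In-memory text stands in for two small files (hypothetical data).
orders = read_split(io.StringIO("widget 3\ngadget 4\nwidget 2\n"))
regions = read_split(io.StringIO("widget east\ngadget west\n"))

joined = join_on_key(orders, regions)
units_by_region = total_by(joined, group_index=2, value_index=1)
```

It works, but notice how much of it is plumbing: imports, helper functions, index bookkeeping. The pseudocode above says the same thing in a handful of lines.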

Make writing a program like writing English, and suddenly, anyone can program again.

Why don’t we have something like this? Is it really all that hard? Does everything need to have object inheritance, closures/lambdas, or other super sophisticated language features? Where is a simple language which can do powerful things?

I don’t mind the Wizards with Lisp and Erlang, with map-reduce and other magic. But is there something for the rest of us to use?


* * *


Data Visualization and Radiohead · 07/16/2008 12:24 PM, Analysis

You might have heard about the new Radiohead music video. Done completely in Flash, it used 3d data and made the whole thing using desktop technologies and no cameras. You can now play with the data and the approach using Processing. See the Papervision3d blog for more detail, or go directly to the video for House of Cards hosted at Google, where you can also download code and data.

Who did this amazing thing? A pair of guys Yahoo! chose to lay off in its purge in Feb ’08. Aaron Koblin and Aaron Meyers were part of the Yahoo Design Innovation group aka “yHaus”, a team dedicated to moving the art and science of visualization forward. This team was creating amazing work… and Y! purged them all.

It’s sad when you realize what you had only after you’ve let it go. And that old adage that if you love something, let it go and if it loves you back, it will return? Well, that only works if the person wanted to go. These guys wanted to stay, Y! laid them off, and now I expect their success will be the best revenge.

I believe that the ability to display data in a way that is easy to understand and encourages the viewer to engage with it is the future. Presenters all believe that graphics should tell an immediate story (see Seth Godin’s 3 laws) and others want all data displays to be interactive and clickable requiring you to dig to see anything (see Crystal Xcelsius). The real answer is a mix of these, requiring both art and analytic science. And groups like yHaus were on this track, ahead of so many others.

Sure, there are other great design and visualization groups and tools out there. But how many have had to deal with the real deal: massive data on an almost unimaginable scale? One of the few that did is now gone, and I think we will all understand what we lost only well after it’s gone.

Keep your eyes out for other wondrous things from folks like:
S. Joy Mountford
Aaron Koblin
Doug Fritz
Ben Clemens
Daniel Massey
Aaron Meyers
Michael Chang
Juliana Yamashita
Ray Mcclure
Carrie Burgener
Jen Lau
Jenny Chowdhury
Vaibhav Bhawsar
mia (Jiyoung Yun)
Parul Vora

If I left anyone out, please put it in the comments. I also couldn’t find sites for some folks; if you know of them, put them in the comments and I’ll update.

See Yahoo Design Closed Down for more about their projects. and keep changing the number at the end.


* * *

