Deprecated: Function set_magic_quotes_runtime() is deprecated in /home/mwexler/public_html/tp/textpattern/lib/txplib_db.php on line 14
The Net Takeaway: Tuple-canoe and Tyler too...


Danny Flamberg's Blog
Danny has been marketing for a while, and his articles and work reflect great understanding of data driven marketing.

Eric Peterson the Demystifier
Eric gets metrics, analytics, interactive, and the real world. His advice is worth taking...

Geeking with Greg
Greg Linden created Amazon's recommendation system, so imagine what can write about...

Ned Batchelder's Blog
Ned just finds and writes interesting things. I don't know how he does it.

R at LoyaltyMatrix
Jim Porzak tells of his real-life use of R for marketing analysis.






Tuple-canoe and Tyler too... · 12/28/2004 06:50 PM, Database

First off, if you don’t get the terrible attempt at historical humour in the title, read the history of John Tyler, 10th president of the US.

When reading about databases, you may encounter various terms that seem like strange words for stuff you already know. Don’t waste your time looking them up; I’ve translated them for you here.

Cell=Field=Value=Data point (these are all intersections of Rows and Columns)

Yes, some people refer to the Columns as Fields, but to be honest, they’re wrong. Sorry ‘bout that.

Oh, and two other additional tips:
As seen in Speed Up SQL Server Apps by Roman Rehak, there are two common SQL queries which are poorly optimized in most database systems, and can be rewritten to help the database create a better execution plan.

1) Replace COUNT With EXISTS When Checking for Existence
A classic case is to code branches based on record existence, something like

     WHERE ShipVia = 3) > 0
  PRINT 'You cannot delete this shipper'

But if you are just looking for at least one (AtLO, or 1 or more, or however you like to phrase it), then an EXISTS clause is more effective. Basically, the database can short-circuit as soon as it finds the first row, instead of having to count them all (and throw away the value since you never really use it!).

     WHERE ShipVia = 3)
  PRINT 'You cannot delete this shipper'

EXISTS is also used in where, as in

  where exists (select 1 from t1
                      where t1.customerID=t2.customerID)

2) Be Careful When Using WHERE IN and WHERE NOT IN

If you are just doing WHERE customerID in (1, 2, 3), then things are probably fine. But if you have lots of values in the WHERE IN, then the database will often struggle. If you look at the explain plan, you may see many nested loops, which make the database work harder than it should. Instead, you might consider rewriting as an Outer Join… No, I don’t know how many “lots” is, so as Tom says, run it both ways and time it.

Replace the WHERE IN clause with OUTER JOIN if you’re using a subquery to generate a potentially large list. Doing so can improve performance significantly:

  SELECT c.*
     FROM Customers c
     LEFT OUTER JOIN Orders o
     ON o.CustomerID = c.CustomerID
     WHERE o.CustomerID IS NULL

The LEFT OUTER JOIN selects all rows from the Customer table—whether or not a customer placed any orders—and joins them with the Orders table. Then the WHERE clause filters out the rows where the columns from the Orders table have NULL values. Similar to the EXISTS thing previously, the database can “short-circuit” from having to do every possible join and will just deliver the necessary rows.

There are other tips in this article, many SQL-Server specific, but I thought these two deserved special mention as being very handy.

* * *


  Textile Help
Please note that your email will be obfuscated via entities, so its ok to put a real one if you feel like it...

powered by Textpattern 4.0.4 (r1956)