Monday, November 15, 2004

15 November 2004: Data

From the New York Times (via Slashdot):

"Fun fact: 'Wal-Mart has 460 terabytes of data stored on Teradata mainframes, at its Bentonville headquarters. To put that in perspective, the Internet has less than half as much data, according to experts.' That much information results in some interesting data-mining. Did you know hurricanes increase strawberry Pop Tarts sales 7-fold?"

And I used to get excited about one terabyte. But apparently it's true. They know what they sell, because their electronic tills are linked to a central computer that records every barcode scanned through. Gone are the days of those happy mechanical tills with the numbers that jumped up into a little window when the keys were pressed, along with the mysterious 'no sale' sign (took me years to figure out what that was about).

And they know what you buy, too. How? Because you have a 'loyalty card'. They match your 'loyalty' rewards with the stuff you like to buy, which may seem nice, but the bottom line is they want your data. Not boring stuff like your name, bank account or inside leg measurement: no, they want to know what you buy. What kind of customers do Wal-Mart (which, by the way, is Asda here in the UK), Tesco, Sainsbury et al actually have? They store the data - terabytes of the stuff - and go 'mining' in it later, looking for patterns of behaviour, trends that may give their shop a competitive edge.

Of course, they may get the wrong end of the stick: for instance, Lois from church says she uses her Tesco card mainly for buying stuff for a social project she's involved with, which means that to Tesco she only ever appears to buy doughnuts and toilet paper. Perhaps that's why they keep sending her adverts for medical products?

The point is, William Gibson was right again. In his 1993 novel 'Virtual Light', he described a possible future (sometime around 2010, perhaps) where 'Data Havens' exist. These are countries that, like the Swiss with their banking, will store anyone's data, no questions asked. (Bruce Sterling's 'Islands In The Net' covers the same topic.) They have many customers, not the least of which is the US government who, in the guise of DatAmerica, store a linked database of all such consumer information in a nice, safe place where only they can look at it.

It's happening now. The only differences are, it's legal to hold the data within the country of origin, and of course it's Wal-Mart, rather than the government, who are holding the data.

Full-blown DatAmerica? I give it ten years. Five if I can figure out how to make it myself. Well, what did you think AKT was all about? ;-)

Footnote: I think the idea that the internet contains less than half of Wal-Mart's 460 terabytes of data is utter nonsense. Citeseer itself is well over a terabyte now, and most of that space is taken up with its 716797 documents. The amount of PDF information out there alone will be vastly more than 230 terabytes, and that's before you start talking about audio and, especially, video files. A typical three-hour broadband baseball game, at 350Kbits/sec, weighs in at about half a gigabyte, meaning the archive of the 2004 MLB regular season is about 1.2 terabytes in total. I think those 'experts' could do with being stripped of their title.
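
For anyone who wants to check the baseball sums, here is a rough sketch in Python. The bitrate and game length come from the footnote above; the game count is my own assumption (30 teams playing 162 games each, two teams per game, so 2,430 games), and I am using decimal units (1 GB = 10^9 bytes). The function and variable names are just for illustration.

```python
BITS_PER_BYTE = 8


def stream_size_bytes(bitrate_bits_per_sec, duration_hours):
    """Size in bytes of a constant-bitrate stream of the given length."""
    return bitrate_bits_per_sec * duration_hours * 3600 / BITS_PER_BYTE


# Figures from the footnote: a three-hour game streamed at 350 kbit/s.
game_bytes = stream_size_bytes(350_000, 3)

# Assumption: 30 teams x 162 games each, two teams per game.
games_in_season = 30 * 162 // 2

season_bytes = game_bytes * games_in_season

print(f"one game: {game_bytes / 1e9:.2f} GB")
print(f"season:   {season_bytes / 1e12:.2f} TB")
```

Unrounded, a game comes to a shade under half a gigabyte and the season to a little over 1.1 terabytes; rounding each game up to half a gigabyte gives the 1.2 terabytes quoted above. Either way, it is a fair chunk of data from a single sports league.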
