In my previous article, I pointed out how difficult it is to get all of one’s customer data into one basket and suggested that a customer data taxonomy would be a good place to start. After all, you have to know what you have – and what you don’t have – before you can manage it.
With the ability to store more data than is rational, from systems gathering data faster than is sensible, for reasons that are more aspirational than comprehensible, one is faced with the fact that (aside from the legal liability) it is now cheaper to keep data than to decide what to delete.
Regardless of how much data you keep, you need a classification system.
Taxonomy is the process or system of describing the way in which different living things are related by putting them in groups. If we treat data like living things, we can categorize our various data sets the same way. Here are some suggestions.
Data Capture Method
In the Good Old Days (pre-Internet), we relied on focus groups, surveys, and market share reports to figure out what people might want, whether they had an intent to purchase, and what was actually purchased.
The Good New Days started with web server log files. A standard feature of transaction systems, server log files never were intended to be revelatory, aside from showing what happened just before the lights went out.
But inquiring minds figured out that they could parse the meagre contents (resource requested, client IP address, request date/time, HTTP code, files and bytes served, user agent, referrer) to infer the most popular pages and the path that an individual took to wander about one’s site.
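To make that parsing concrete, here is a minimal sketch in Python. It assumes the widely used "combined" log format; your own server may be configured differently. The sample line, the function names, and the rule of counting only successful (200) requests are illustrative assumptions, not a description of any particular analytics product.

```python
import re
from collections import Counter
from datetime import datetime

# Pattern for the common "combined" log format (an assumption; check your server config).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields from one log line, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


def most_popular_pages(lines, top_n=3):
    """Count successfully served resources and return the most requested ones."""
    counts = Counter()
    for line in lines:
        record = parse_log_line(line)
        if record is not None and record["status"] == 200:
            counts[record["resource"]] += 1
    return counts.most_common(top_n)


# Made-up sample line, for illustration only.
sample = (
    '203.0.113.7 - - [10/Oct/2016:13:55:36 -0700] '
    '"GET /products/index.html HTTP/1.1" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/5.0"'
)

record = parse_log_line(sample)
print(record["resource"], record["status"], record["ip"])
```

Path analysis would go one step further by grouping the parsed records by client IP address and sorting them by time, which is exactly where the limits of the IP address as an identifier start to show.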
Sadly, the Web was not originally built to be a transaction system, and its underlying protocol, HTTP, was therefore created as “stateless”: each request arrives with no memory of the one before. Because IP addresses are typically dynamic (handed out at the time of boot-up and/or login), one could not determine if the same computer came back to your site from one day to the next. And thus, cookies were created.
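The mechanics of a cookie are simple enough to sketch. The following Python example uses the standard library's `http.cookies` module to recognize a returning visitor, or to mint a new identifier on a first visit. The cookie name `visitor_id`, the one-year lifetime, and the function name are assumptions made for illustration; a real site would also need to think about consent, security flags, and expiry policy.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


def identify_visitor(cookie_header):
    """Return (visitor_id, set_cookie_value).

    set_cookie_value is None when the browser already sent our cookie,
    otherwise it is the value to send back in a Set-Cookie response header.
    """
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    if COOKIE_NAME in cookies:
        # Returning visitor: the browser handed our identifier back to us.
        return cookies[COOKIE_NAME].value, None

    # First visit: create an identifier and ask the browser to keep it.
    visitor_id = uuid.uuid4().hex
    response_cookie = SimpleCookie()
    response_cookie[COOKIE_NAME] = visitor_id
    response_cookie[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 365  # one year, an arbitrary choice
    response_cookie[COOKIE_NAME]["path"] = "/"
    return visitor_id, response_cookie[COOKIE_NAME].OutputString()


# First request carries no Cookie header; the second one echoes the identifier back.
first_id, set_cookie = identify_visitor(None)
second_id, nothing_to_set = identify_visitor(f"{COOKIE_NAME}={first_id}")
print(first_id == second_id)
```

The identifier itself says nothing about the person. It only lets the server tie today's requests to yesterday's, which is what turns anonymous log lines into a visitor history.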
Thus began the ongoing effort to capture and retain as much data as technically possible. Here, then, is a modest proposal for organizing data by capture technique and the attributes each technique can yield. This list is by no means exhaustive, but it’s a start.

| Capture technique | Potential attributes |
| --- | --- |
| Website log files | Minimal information but useful for error code capture. |
| Cookies | Click-throughs, pageviews, visitors’ stickiness, attribution, conversion, bounce rate, path analysis, view-throughs, abandonment, recency, frequency, log in, identity, postal address, credit card, preferences, phone number. |
| Ad network cookie dissemination | Advertising exposure. |
| Survey/focus group | Opinion, attitudes. |
| Social media | Opinion, attitudes, social graph, influence. |
| Phone App | Location, direction, speed, device orientation. |
| Wearables | General health/fitness, sleep habits, driving habits, gait. |

The stewards of each data source should be in the mix when anomalies pop up. Are the anomalies legitimate points of interest or only artifacts of the data collection and integration process? We have to treat data as one would treat the supply chain for a consumable product.
How the Data Were Shared
To understand the veracity of a specific data set, it is also important to understand each element from the customer perspective. If Amazon tells you that it knows your email address, home address, credit card number, phone number, and how much toothpaste you consume, that’s no surprise. But should Amazon happen to mention that it also knows what you are most likely to buy next and has already put it on a truck in your neighborhood, or that it is actively listening to your conversations in case you want to ask a question of “Alexa” (Amazon Echo), then you might feel differently.
Therefore, it’s important to label the data elements you have by how they were acquired from the customer point of view, rather than from the technology perspective.

| How the data were shared | Examples |
| --- | --- |
| Initiated | Membership application, preference settings |
| Published | Posted photos, posted location |
| Engaged | Cookie/Java, loyalty card |
| Unforeseen | Toll camera license plate reader, mouse movement/hover |
| Passive | CCTV camera, mobile phone location, iBeacon |


| Purchased data | Sources |
| --- | --- |
| Personally identifiable consumer data | Customer insights services (data) providers |
| Addressable audience clusters | Ad networks, memberships (social media), viewers (subscriptions) |

| Inferred data | Examples |
| --- | --- |
| Computational | Average spend, time on site, profitability, spend per visit |
| Notational | Order value, customer segment by age |
| Statistical | Credit score, lifetime value, life expectancy, attrition risk |
| Analytical | Next best offer, purchase propensity |

Data Subject Matter
Finally, the data you have might be classified by subject matter – another way customers think about information. It doesn’t matter to people if you got the bytes from a website or a phone or whether you purchased it or inferred it. What matters is what you know about them.

| Subject matter | Examples |
| --- | --- |
| Who They Are – Identity | Name, gender, age, race, address, phone, fingerprint, weight, gait, government ID. |
| What They Have Done – History | Education, career, criminal record, press exposure, publications, awards, associations, credit score, loans, divorce. |
| What They Like – Proclivities | Preferences, settings, avocations, political party, social likes, entertainment, hobbies, news feeds, browser history, brand affinity. |
| What They Have – Possessions | Income, home, cars, devices, clothing, jewelry, investments, memberships, collections, relationships. |
| What They Do – Activities | Keystrokes, gestures, eye tracking, day part, location, social posts, dining out, purchases, commute, TV viewing. |
| How They Feel – Beliefs | Religion, values, donations, skepticism/altruism, introvert/extrovert, generous/miserly, adaptive/inflexible, aggressive/passive, opinion, mood. |

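One way to put these classifications to work is to tag every data element in your inventory along each axis: how it was captured, how the customer shared it, and what it is about. The Python sketch below is only an illustration. The class name, field names, and the four catalog entries are hypothetical, and the tags assigned to each element are my assumptions about how such elements might be labeled, not a standard.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """One inventory entry, tagged along three classification axes."""
    name: str
    capture_method: str  # e.g. "Cookies", "Phone App"
    how_shared: str      # e.g. "Initiated", "Passive"
    subject: str         # e.g. "Identity", "Activities"


# Hypothetical catalog entries; the tag assignments are illustrative assumptions.
CATALOG = [
    DataElement("postal_address", "Cookies", "Initiated", "Identity"),
    DataElement("phone_location", "Phone App", "Passive", "Activities"),
    DataElement("posted_photos", "Social media", "Published", "Activities"),
    DataElement("brand_affinity", "Social media", "Published", "Proclivities"),
]


def group_by(elements, facet):
    """Group element names by one facet: capture_method, how_shared, or subject."""
    groups = defaultdict(list)
    for element in elements:
        groups[getattr(element, facet)].append(element.name)
    return dict(groups)


print(group_by(CATALOG, "subject"))
print(group_by(CATALOG, "how_shared"))
```

Grouping the same catalog by a different facet answers a different question: the technology view for the engineers, the sharing view for the privacy team, and the subject view for the customer.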
Data Protection and Data Ethics
Now that you have a complete inventory of the data you have gathered, captured, calculated, and integrated, the Powers That Be will require that you keep it safe and only use it for the specific reason that you collected it in the first place.
Could you stand up in court, raise your right hand, and swear that the above is true? Do you really know why you are collecting all of this information – specifically? Do you care? Does your company have a policy or a process to enforce a policy of how information is treated internally?
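A process to enforce such a policy can start very small. The Python sketch below assumes you have recorded, for each data element, the purposes stated when it was collected, and it refuses any use that is not on that list. The element names, purpose names, and function name are all hypothetical, and this is a sketch of the idea rather than a compliance tool.

```python
# Hypothetical policy: each data element maps to the purposes stated at collection time.
COLLECTION_PURPOSES = {
    "email_address": {"order_confirmation", "account_recovery"},
    "phone_location": {"store_locator"},
}


def is_use_permitted(element, intended_use, policy=COLLECTION_PURPOSES):
    """Return True only if the intended use matches a purpose recorded for the element.

    Elements with no recorded purpose are refused by default.
    """
    return intended_use in policy.get(element, set())


print(is_use_permitted("email_address", "order_confirmation"))
print(is_use_permitted("email_address", "ad_targeting"))
print(is_use_permitted("browser_history", "ad_targeting"))
```

Refusing by default is a deliberate design choice here: if nobody wrote down why an element was collected, the check cannot confirm that any use of it is legitimate.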
“How does it feel to be
One of the beautiful people?
Now that you know who you are,
What do you want to be?”
—John Lennon and Paul McCartney, 1967
Jim Sterne is founder of the eMetrics Summit and co-founder of the Digital Analytics Association. He has consulted to some of the world’s largest companies and lectured at MIT, Stanford, USC, Harvard, and Oxford.