A Customer Data Taxonomy

By Jim Sterne | March 3, 2016 3:00 pm

Editor’s Note: Jim Sterne is the founder of the eMetrics Summit, taking place April 3-6 in San Francisco. For more information about the event and to register, click here. Use the Data Informed code DIPAW15 for 15 percent off of your registration.

The eMetrics Summit is co-located with the Predictive Analytics World Conference. For more information about that event and to register, click here. Use the Data Informed code DIPAW15 for 15 percent off of your registration.

 

Jim Sterne, Founder, eMetrics Summit

In my previous article, I pointed out how difficult it is to get all of one’s customer data into one basket and suggested that a customer data taxonomy would be a good place to start. After all, you have to know what you have – and what you don’t have – before you can manage it.

With the ability to store more data than is rational, from systems gathering data faster than is sensible, for reasons that are more aspirational than comprehensible, one is faced with the fact that (aside from the legal liability) it is now cheaper to keep data than to decide what to delete.

Regardless of how much data you keep, you need a classification system.

Taxonomy is the process or system of describing the way in which different living things are related by putting them in groups. Treating data like living things, here are some suggestions for categorizing your various data sets.

 

Data Capture Method

In the Good Old Days (pre-Internet), we relied on focus groups, surveys, and market share reports to figure out what people might want, whether they had an intent to purchase, and what was actually purchased.

The Good New Days started with web server log files. A standard feature of transaction systems, server log files never were intended to be revelatory, aside from showing what happened just before the lights went out.

But inquiring minds figured out that they could parse the meagre contents (resource requested, client IP address, request date/time, HTTP code, files and bytes served, user agent, referrer) to infer the most popular pages and the path that an individual took to wander about one’s site.
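As a rough illustration of that kind of parsing, here is a minimal Python sketch. It assumes the server writes the widely used Combined Log Format; the article does not name a format, and the sample lines, IP addresses, and function names below are invented for the example.

```python
import re
from collections import Counter

# Assumed layout (Combined Log Format):
# ip identity user [timestamp] "method resource protocol" status bytes "referrer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.7 - - [03/Mar/2016:15:00:01 -0800] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [03/Mar/2016:15:00:09 -0800] "GET /pricing.html HTTP/1.1" 200 2048 "http://example.com/index.html" "Mozilla/5.0"',
    '198.51.100.4 - - [03/Mar/2016:15:01:12 -0800] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.4 - - [03/Mar/2016:15:01:30 -0800] "GET /missing.html HTTP/1.1" 404 312 "http://example.com/index.html" "Mozilla/5.0"',
]


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def popular_pages(lines):
    """Count successfully served resources, most requested first."""
    hits = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["status"] == "200":
            hits[entry["resource"]] += 1
    return hits.most_common()


def paths_by_ip(lines):
    """Group requested resources by client IP, in the order they were logged."""
    paths = {}
    for line in lines:
        entry = parse_line(line)
        if entry:
            paths.setdefault(entry["ip"], []).append(entry["resource"])
    return paths
```

Calling `popular_pages(SAMPLE_LOG)` ranks `/index.html` first, and `paths_by_ip(SAMPLE_LOG)` gives each address's sequence of requests. Grouping by IP address is only a rough proxy for an individual visitor.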

Sadly, the Internet itself was not originally built to be a transaction system and therefore was created as “stateless.” Because IP addresses are typically dynamic (handed out at the time of boot-up and/or login), one could not determine whether the same computer came back to your site from one day to the next. And thus, cookies were created.

Thus began the ongoing effort to capture and retain as much data as technically possible. Here, then, is a modest proposal for organizing data by capture technique and the attributes each technique can yield. This list is by no means exhaustive, but it’s a start.

Website log files: Minimal information, but useful for error code capture.
Cookies: Click-throughs, pageviews, visitors’ stickiness, attribution, conversion, bounce rate, path analysis, view-throughs, abandonment, recency, frequency, log in, identity, postal address, credit card, preferences, phone number.
Ad network cookie dissemination: Advertising exposure.
Survey/focus group: Opinion, attitudes.
Social media: Opinion, attitudes, social graph, influence.
Phone App: Location, direction, speed, device orientation.
Wearables: General health/fitness, sleep habits, driving habits, gait.

The stewards of these data sources are in the mix when anomalies pop up. Are the anomalies legitimate points of interest or only artifacts of the data collection and integration process? We have to treat data as one would treat the supply chain for a consumable product.

How the Data Were Shared

To understand the veracity of a specific data set, it is also important to understand each element from the customer perspective. If Amazon tells customers that it knows their email address, home address, credit card number, phone number, and how much toothpaste they consume, that’s no surprise. But should Amazon happen to mention that it also knows what they are most likely to buy next and has already put it on a truck in their neighborhood, or that it is actively listening to their conversations in case they want to ask a question of “Alexa” (Amazon Echo), then people might feel differently.

Therefore, it’s important to label the data elements you have by how they were acquired from the customer point of view, rather than from the technology perspective.

Provided
 Initiated: Membership application, preference settings
 Transacted: Survey, purchase
 Published: Posted photos, posted location
Observed
 Engaged: Cookie/Java, loyalty card
 Unforeseen: Toll camera license plate reader, mouse movement/hover
 Passive: CCTV camera, mobile phone location, iBeacon
Purchased
 Personally identifiable consumer data: Customer insights services (data) providers
 Addressable audience clusters: Ad networks, memberships (social media), viewers (subscriptions)
Derived
 Computational: Average spend, time on site, profitability, spend per visit
 Notational: Order value, customer segment by age
Inferred
 Statistical: Credit score, lifetime value, life expectancy, attrition risk
 Analytical: Next best offer, purchase propensity
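To make the labeling concrete, here is a small Python sketch of how the five categories above might be attached to the fields in a customer record. The field names, the `FIELD_LABELS` mapping, and both helper functions are hypothetical; only the category and subcategory names come from the list above.

```python
from enum import Enum


class Acquisition(Enum):
    """Top-level categories from the list above."""
    PROVIDED = "Provided"
    OBSERVED = "Observed"
    PURCHASED = "Purchased"
    DERIVED = "Derived"
    INFERRED = "Inferred"


# Hypothetical field names mapped to (category, subcategory).
FIELD_LABELS = {
    "preference_settings": (Acquisition.PROVIDED, "Initiated"),
    "purchase": (Acquisition.PROVIDED, "Transacted"),
    "mobile_phone_location": (Acquisition.OBSERVED, "Passive"),
    "mouse_hover": (Acquisition.OBSERVED, "Unforeseen"),
    "average_spend": (Acquisition.DERIVED, "Computational"),
    "purchase_propensity": (Acquisition.INFERRED, "Analytical"),
}


def fields_by_category(labels):
    """Group field names under each category, keeping empty categories visible."""
    grouped = {category: [] for category in Acquisition}
    for field, (category, _subcategory) in labels.items():
        grouped[category].append(field)
    return grouped


def average_spend(order_values):
    """A Derived/Computational element: computed from Provided/Transacted purchases."""
    return sum(order_values) / len(order_values) if order_values else 0.0
```

Keeping empty categories in the result is deliberate: in this sample, `fields_by_category(FIELD_LABELS)[Acquisition.PURCHASED]` is an empty list, which shows at a glance what the record does not contain.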

Data Subject Matter

Finally, the data you have might be classified by subject matter – another way customers think about information. It doesn’t matter to people if you got the bytes from a website or a phone or whether you purchased it or inferred it. What matters is what you know about them.

Who They Are – Identity: Name, gender, age, race, address, phone, fingerprint, weight, gait, government ID.
What They Have Done – History: Education, career, criminal record, press exposure, publications, awards, associations, credit score, loans, divorce.
What They Like – Proclivities: Preferences, settings, avocations, political party, social likes, entertainment, hobbies, news feeds, browser history, brand affinity.
What They Have – Possessions: Income, home, cars, devices, clothing, jewelry, investments, memberships, collections, relationships.
What They Do – Activities: Keystrokes, gestures, eye tracking, day part, location, social posts, dining out, purchases, commute, TV viewing.
How They Feel – Beliefs: Religion, values, donations, skepticism/altruism, introvert/extrovert, generous/miserly, adaptive/inflexible, aggressive/passive, opinion, mood.
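Taken together, the three classifications (capture method, how the data were shared, and subject matter) could be recorded as facets on each entry in a data inventory. The Python sketch below is one possible shape, not a prescribed schema; the `DataElement` class, the sample entries, and the `missing_subjects` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass

# The six subject-matter groups listed above.
SUBJECTS = ["Identity", "History", "Proclivities", "Possessions", "Activities", "Beliefs"]


@dataclass(frozen=True)
class DataElement:
    """One inventory entry, tagged along all three facets."""
    name: str
    capture_method: str  # e.g. "Cookies", "Phone App"
    acquisition: str     # e.g. "Provided", "Observed"
    subject: str         # one of SUBJECTS


# Hypothetical sample inventory.
INVENTORY = [
    DataElement("postal_address", "Cookies", "Provided", "Identity"),
    DataElement("phone_location", "Phone App", "Observed", "Activities"),
    DataElement("social_likes", "Social media", "Provided", "Proclivities"),
]


def missing_subjects(inventory):
    """List the subject groups for which the inventory holds nothing."""
    covered = {element.subject for element in inventory}
    return [subject for subject in SUBJECTS if subject not in covered]
```

For the sample inventory, `missing_subjects(INVENTORY)` returns `["History", "Possessions", "Beliefs"]`, a simple way to see what you don’t have as well as what you do.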

Data Protection and Data Ethics

Now that you have a complete inventory of the data you have gathered, captured, calculated, and integrated, the Powers That Be will require that you keep it safe and only use it for the specific reason that you collected it in the first place.

Could you stand up in court, raise your right hand, and swear that the above is true? Do you really know why you are collecting all of this information – specifically? Do you care? Does your company have a policy or a process to enforce a policy of how information is treated internally?
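One way a policy like that might be made checkable is to record, for each data element, the purposes it was collected for and to test every proposed use against that record. The sketch below is a bare-bones illustration in Python; the register contents, purpose names, and the `is_use_permitted` function are hypothetical and do not represent any particular regulation or product.

```python
# Hypothetical register: data element -> purposes stated at collection time.
COLLECTION_PURPOSES = {
    "email_address": {"order_confirmation", "account_recovery"},
    "phone_location": {"store_locator"},
}


def is_use_permitted(field, proposed_use, purposes=COLLECTION_PURPOSES):
    """Allow a use only if it was recorded as a purpose when the data was collected.

    Fields missing from the register are treated as having no permitted uses.
    """
    return proposed_use in purposes.get(field, set())
```

Under this register, using `email_address` for `order_confirmation` is permitted, while using `phone_location` for `ad_targeting` is not. Defaulting unknown fields to "not permitted" is a design choice: it forces someone to write down why a new element is being collected before anyone can use it.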

“How does it feel to be
One of the beautiful people?
Now that you know who you are,
What do you want to be?”
—John Lennon and Paul McCartney, 1967

Jim Sterne is founder of the eMetrics Summit and co-founder of the Digital Analytics Association. He has consulted to some of the world’s largest companies and lectured at MIT, Stanford, USC, Harvard, and Oxford.
