An Argument for a Data Science Code of Professional Conduct

August 8, 2013 3:53 pm

It’s a truism that in today’s data-driven world not all data leads to good decision-making. It’s a reality that can put both consumers of data products and data professionals themselves at risk. Inaccurate data, or data mishandled in a predictive analytics system or process, can result in business damage, such as rewarding the wrong customers with loyalty points, discounting the wrong products or extending loans to those most likely to default.

Then there’s the sanctity of confidential data, and how violating privacy practices can put data professionals themselves at risk. The U.S. National Security Agency, for instance, is now embroiled in controversy over the scope of its surveillance methods and just how much private data about American citizens the government is justified in collecting in the name of national security. For data scientists in charge of confidential information, violations can easily expose them to legal liabilities.

To prevent data abuses and minimize personal liability, Michael Walker, a managing partner at Rose Business Technologies, a Denver-based systems integration and technical services provider, believes it’s time data science was recognized as a profession. For this reason, Walker has drafted a 12-page data science code of professional conduct covering everything from what it means to be a data scientist to everyday duties. He posted the proposal in March and invited public comment. An excerpt from the draft is at the end of this article.

This isn’t the first time someone has tried creating a kind of Hippocratic Oath for analytics professionals. Earlier this year, the Institute for Operations Research and the Management Sciences (INFORMS) established a code of ethics along with the launch of its Certified Analytics Professional (CAP) designation program.

While the INFORMS code caters to its members in operations research, management science and analytics, Walker envisions more of an all-encompassing oath capable of protecting a wide variety of data scientists. In this interview with Data Informed, Walker shares his thoughts on the nature of data scientists’ work and why the profession is ready for its own code of conduct.

Data Informed: How would you define a ‘data scientist’?

Michael Walker of Rose Business Technologies

Michael Walker: Data scientists who are practicing competently are like well-trained surgeons or highly specialized lawyers. They’re using scientific methods to extract meaning from data, but it’s very different from what your garden-variety business analysts do with descriptive analytics. What data scientists concentrate on is more predictive analytics, and that’s a very different mindset.

How does the role of a data scientist differ from that of an analyst?

Walker: A lot of people call themselves data scientists who are not data scientists. There are a ton of Ph.D.s in statistics, psychology or economics who have a great deal of experience working with data but in a narrow framework. Data scientists understand machine learning, how to design and execute algorithms and how to read the data so that they’re actually getting meaning from it. That’s tricky business. You can’t just take your garden-variety business analysts and ask them to start working with these large data sets, slicing and dicing them, and expect them to be able to come up with actionable intelligence for the client to use. That’s why data science needs to become a profession.

What are the dangers of working with data?

Walker: If you’re not careful using scientific methods, you can end up with the wrong answers. You’re going to advise your clients poorly and they’re going to end up making very bad decisions. For example, a government agency can end up making bad policy decisions, or the leaders of a business can end up relying on flawed results because an analyst doesn’t know what he’s doing.

How can a code of conduct protect data scientists?

Walker: There’s a huge problem with personal data privacy issues today. There’s a lot of data out there that contains very sensitive, private information on individuals that needs to be treated very carefully. It’s wonderful if an organization like the NSA plans to use data to protect us from harm like terrorism, but there are secondary uses of this data that can be abused. Data scientists have a moral and ethical duty not to abuse this private data.

So if the government asks data scientists to find out certain information, or collect private data to target a particular group or individual, but it doesn’t have to do with protecting us from harm, data scientists should be able to say, “No, we can’t do that because we signed a code of conduct that doesn’t allow us to abuse people’s private information.” That would create a layer of protection for the data scientist so that they’re protected from the government or business asking them to abuse people’s personal data. That’s going to be a huge issue in the future.

What should a code of conduct include?

Walker: A code of professional conduct gives data scientists a guideline on what is and is not the proper use of data, and protects them in case a client or employer asks them to abuse this data. It also needs to protect clients, so that if they hire a data scientist, they know the data scientist is ethical, is following a code and has at least some ground floor of competency to practice data science. It’s an extra level of scrutiny and a higher standard that data scientists have to follow because the consequences of being wrong are just too great.

Cindy Waxer, a contributing editor who covers workforce analytics and other topics for Data Informed, is a Toronto-based freelance journalist and a contributor to publications including The Economist and MIT Technology Review. She can be reached via Twitter: @Cwaxer.

Excerpts from a Proposed Data Science Conduct Code

Michael Walker of Rose Business Technologies published a draft of his “Data Science Code of Professional Conduct” in March 2013 for public comment. The document includes sections on client relationships, dealing with confidential information, professional integrity and misconduct, and terminology. In the proposal, data science refers to “the scientific study of the creation, manipulation and transformation of data to create meaning,” and data scientist “means a professional who uses scientific methods to liberate and create meaning from raw data.”

Below is an excerpt from the section entitled “Data Science Evidence, Quality of Data and Quality of Evidence”:

A data scientist shall not knowingly:

(1) Fail to use scientific methods in performing data science.

(2) Fail to rank the quality of evidence in a reasonable and understandable manner for the client.

(3) Claim weak or uncertain evidence is strong evidence.

(4) Misuse weak or uncertain evidence to communicate a false reality or promote an illusion of understanding.

(5) Fail to rank the quality of data in a reasonable and understandable manner for the client.

(6) Claim bad or uncertain data quality is good data quality.

(7) Misuse bad or uncertain data quality to communicate a false reality or promote an illusion of understanding.

(8) Fail to disclose any and all data science results or engage in cherry-picking.

(9) Fail to attempt to replicate data science results.

(10) Fail to disclose that data science results could not be replicated.

(11) Misuse data science results to communicate a false reality or promote an illusion of understanding.

(12) Fail to disclose failed experiments or disconfirming evidence known to the data scientist to be directly adverse to the position of the client.

(13) Offer evidence that the data scientist knows to be false. If a data scientist questions the quality of data or evidence the data scientist must disclose this to the client. If a data scientist has offered material evidence and the data scientist comes to know of its falsity, the data scientist shall take reasonable remedial measures, including disclosure to the client. A data scientist may disclose and label evidence the data scientist reasonably believes is false.

(14) Cherry-pick data and data science evidence.
