It’s a truism that in today’s data-driven world not all data leads to good decision-making. It’s a reality that can put both consumers of data products and data professionals themselves at risk. Inaccurate data, or mishandled data in a predictive analytics system or process, can result in business damage, such as rewarding the wrong customers with loyalty points, discounting the wrong products and extending loans to those most likely to default.
Then there’s the sanctity of confidential data, and how violating privacy practices can put data professionals themselves at risk. The U.S. National Security Agency, for instance, is now embroiled in controversy over the scope of its surveillance methods and just how much private data about American citizens the government is justified in collecting in the name of national security. For data scientists in charge of confidential information, violations can easily expose them to legal liabilities.
To prevent data abuses and minimize personal liability, Michael Walker, a managing partner at Rose Business Technologies, a Denver-based systems integration and technical services provider, believes it’s time data science was recognized as a profession. For this reason, Walker has drafted a 12-page data science code of professional conduct covering everything from what it means to be a data scientist to everyday duties. He posted the proposal in March and invited public comment. An excerpt from the draft is at the end of this article.
This isn’t the first time someone has tried creating a kind of Hippocratic Oath for analytics professionals. Earlier this year, the Institute for Operations Research and the Management Sciences (INFORMS) established a code of ethics along with the launch of its Certified Analytics Professional (CAP) designation program.
While the INFORMS code caters to its members in operations research, management science and analytics, Walker envisions more of an all-encompassing oath capable of protecting a wide variety of data scientists. In this interview with Data Informed, Walker shares his thoughts on the nature of data scientists’ work and why the profession is ready for its own code of conduct.
Data Informed: How would you define a ‘data scientist’?
Michael Walker: Data scientists who are practicing competently are like well-trained surgeons or highly specialized lawyers. They’re using scientific methods to extract meaning from data, but that’s very different from your garden-variety business analysts, who are using descriptive analytics. What data scientists concentrate on is more predictive analytics, and that’s a very different mindset.
How does the role of a data scientist differ from that of an analyst?
Walker: A lot of people call themselves data scientists who are not data scientists. There are a ton of Ph.D.’s in statistics, psychology or economics who have a great deal of experience working with data but in a narrow framework. Data scientists understand machine learning, how to design and execute algorithms and how to read the data so that they’re actually getting meaning from it. That’s tricky business. You can’t just take your garden-variety business analyst and ask them to start working with these large data sets, slicing and dicing them, and expect them to be able to come up with actionable intelligence for the client to use. That’s why data science needs to become a profession.
What are the dangers of working with data?
Walker: If you’re not careful using scientific methods, you can end up with the wrong answers. You’re going to advise your clients poorly and they’re going to end up making very bad decisions. For example, a government agency can end up making bad policy decisions, or the leaders of a business can end up relying on flawed results because an analyst doesn’t know what he’s doing.
How can a code of conduct protect data scientists?
Walker: There’s a huge problem with personal data privacy issues today. There’s a lot of data out there that contains very sensitive, private information on individuals that needs to be treated very carefully. It’s wonderful if an organization like the NSA plans to use data to protect us from harm like terrorism, but there are secondary uses of this data that can be abused. Data scientists have a moral and ethical duty not to abuse this private data.
So if the government asks data scientists to find out certain information, or collect private data to target a particular group or individual, but it doesn’t have to do with protecting us from harm, data scientists should be able to say, “No, we can’t do that because we signed a code of conduct that doesn’t allow us to abuse people’s private information.” That would create a layer of protection for the data scientist so that they’re protected from the government or business asking them to abuse people’s personal data. That’s going to be a huge issue in the future.
What should a code of conduct include?
Walker: A code of professional conduct gives data scientists a guideline on what is and isn’t the proper use of data, and protects them in case a client or employer asks them to abuse this data. It also needs to protect clients, so that if they hire a data scientist, they know that person is ethical, is following a code and has at least some ground floor of competency to practice data science. It’s an extra level of scrutiny and a higher standard that data scientists have to follow because the consequences of being wrong are just too great.
Cindy Waxer, a contributing editor who covers workforce analytics and other topics for Data Informed, is a Toronto-based freelance journalist and a contributor to publications including The Economist and MIT Technology Review. She can be reached at firstname.lastname@example.org or via Twitter: @Cwaxer.