Big Data and analytics, especially predictive analytics, are powerful products of modern technology. They influence our lives by assigning our credit scores, protecting us from fraudulent credit card charges, showing us ads relevant to our interests and more. But these tools are not innately objective. If they are not developed carefully, they can reinforce discriminatory stereotypes and create serious disadvantages for the individuals affected.
Beware of Bias in Big Data and Analytics
Unfortunately, analytics tools are only as good as their developers, so accounting for human bias is mission-critical. There are several types of bias to consider.
Selection Bias – This occurs when subjective or non-random data is selected to represent an entire population. The problem is compounded by data quality: much of the data companies base their decisions on is low-quality, inaccurate or incomplete.
Confirmation Bias – This happens when there is a desire, conscious or not, to confirm a hypothesis. Because analysts have a lot of flexibility in how they collect, clean and process data, the risk of confirmation bias creeping into the results is significant.
Outliers – These are data points that are significantly higher or lower than typical values, or that fall far outside the natural pattern of the distribution. Outliers can skew analytical results if they are not handled properly, and they are hard to spot for people who do not work with statistics often. Some outliers are legitimate and should stay in the analysis, but they may need to be given a lower weight. This is why relying on simple averages is particularly dangerous: a single extreme value can shift the arithmetic mean dramatically, while the median barely moves.
Simpson’s Paradox – This occurs when a trend that appears within each of several groups of data weakens, disappears or even reverses when the groups are combined. It can be a particular issue in data visualization and descriptive analytics, where aggregated figures are the norm. Data scientists must evaluate trends both within groups and across the combined data to determine whether a pattern is genuine and significant, or merely an artifact of how the data was pooled.
Confounding Variables – An apparent relationship between two variables can be misleading, or turn out to be false, when a confounding variable has been omitted or overlooked. In short, a confounding variable is an ‘extra’ variable you didn’t account for that influences the variables you are studying. The risk of confounding can be reduced by random sampling, introducing control variables, testing the same subjects multiple times, and measuring specific portions of your group under controlled conditions.
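To make two of these biases concrete, here is a minimal sketch using only Python's standard library. All numbers are hypothetical, illustrative values rather than real results, and the helper names `subgroup_rate` and `pooled_rate` are our own. The first part shows how one outlier pulls the mean but not the median; the second shows Simpson's paradox, where treatment A wins in every subgroup yet loses once the subgroups are pooled.

```python
from statistics import mean, median

# --- Outliers: the mean is sensitive to extremes, the median is not ---
# Hypothetical daily order values (illustrative only); 5000 is a single outlier.
orders = [42, 38, 45, 40, 44, 39, 41, 5000]

# --- Simpson's paradox: subgroup trends that reverse when pooled ---
# Hypothetical (successes, trials) counts for two treatments, split by a
# subgroup such as case severity. Illustrative numbers, not real results.
outcomes = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def subgroup_rate(treatment: str, group: str) -> float:
    """Success rate for one treatment within a single subgroup."""
    successes, trials = outcomes[treatment][group]
    return successes / trials


def pooled_rate(treatment: str) -> float:
    """Success rate for one treatment after combining all subgroups."""
    successes = sum(s for s, _ in outcomes[treatment].values())
    trials = sum(t for _, t in outcomes[treatment].values())
    return successes / trials


if __name__ == "__main__":
    print(f"mean order:   {mean(orders):.1f}")
    print(f"median order: {median(orders):.1f}")

    # A beats B inside each subgroup...
    for group in ("small", "large"):
        print(f"{group:>6}: A={subgroup_rate('A', group):.0%}  "
              f"B={subgroup_rate('B', group):.0%}")
    # ...but B beats A once the subgroups are combined.
    print(f"pooled: A={pooled_rate('A'):.0%}  B={pooled_rate('B'):.0%}")
```

Running it prints a mean of about 661 against a median of 41.5, and shows A ahead in both subgroups (93% vs 87%, 73% vs 69%) but behind overall (78% vs 83%). The reversal happens because, in these made-up counts, A was applied mostly to the harder "large" cases, so the pooled figures compare very different case mixes.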
Get Your Algorithm Right
It may be impossible to keep bias out of predictive analytics entirely, but data scientists and business analysts can mitigate the risk. The first step is being transparent about the purpose and development of your algorithm. Transparency is not the same as accountability, but it can still help outside data scientists spot biases, outliers and anomalies that internal teams may have missed.
Because the use of predictive analytics is still relatively new, its governance is limited. There are, however, ongoing discussions about outlining best practices, sharing data sets and benchmarks, and setting ethical boundaries for algorithms. These issues are being addressed by the Partnership on AI, a collaborative effort from Amazon, IBM, Facebook, Google and Microsoft, as well as OpenAI and the UK’s Leverhulme Centre for the Future of Intelligence.
There is even a fund dedicated to the ethics and governance of these technologies: the Ethics and Governance of Artificial Intelligence Fund, backed by LinkedIn founder Reid Hoffman, the Omidyar Network, the Hewlett Foundation and others. Its goal is to ‘support interdisciplinary research to ensure AI develops in a way that is ethical, accountable, and advances the public interest.’
Humans Are Still Necessary
When it comes to ethical judgment, humans still far surpass machine learning and AI. That’s why, sometimes, even when the data suggests otherwise, it’s a good idea to let people make the final call. For instance, when Amazon rolled out same-day delivery in New York, its system determined that predominantly white neighborhoods were the best places to start. The algorithm based this decision on the concentration of Amazon Prime subscribers in certain areas, distance from distribution centers and other factors. The algorithm was not inherently racist, but its results appeared to be. Because Amazon is still run by humans, its people have the moral authority to override the data and create a more evenly distributed, nondiscriminatory rollout strategy.
“As soon as you try to represent something as complex as a neighborhood with a spreadsheet… you’ve made some generalizations and assumptions that may not be true, and they may not affect all people equally.” -Sorelle Friedler, Professor of Computer Science at Haverford College
Though there are concerns about humans losing their jobs to AI and robotics, cases like this suggest that roles built on judging public perception and fairness, such as public relations, may have better job security.
Integrate Big Data and Analytics Ethically
The discriminatory issues of AI need to be addressed, but they do not invalidate big data, analytics or the technologies built on them. In the Amazon case, the human element kicked in: the company altered its algorithm to expand same-day delivery to all of New York City and to other formerly excluded zip codes.
With the right data scientists, business analysts and ethical guidelines, big data and analytics can still help your business produce positive customer experiences and higher profits.
At 7T, we offer tools that can help your business benefit from Big Data and analytics. Our product, Sertics, is a data lake creation platform that provides effective and secure data governance. Sertics also integrates with leading predictive analytics and data visualization tools to fuel business insights and empower data-driven decision-making.