The most troubling thing about the recent Facebook and OKCupid experiments may not be the experiments themselves. It may be that the controversies surrounding their publication discourage wider emotional experimentation.
Ethical behavior implies acting in a way that cultivates the good life. Content providers and services that honestly seek to pursue this need to do more testing to see not only what sells products or engages users, but what leads toward the greater good.
Until now, this kind of research has been impractical, but the rise of tools for automatically assessing emotions — like text analytics tools and behavioral tracking — could play a role in creating a better Web. Ultimately, many of the concerns raised by these experiments may have been generated by the way in which Facebook and OKCupid notified users rather than the experiments themselves.
Michelle Meyer, Professor of Bioethics at Union Graduate College, explains:
Both Facebook and OK Cupid seriously undermined their users’ trust in the way they handled these experiments. Telling users up front exactly what researchers were going to do and why would have biased the results, thereby defeating the point of the experiments themselves. But the companies might have made a blanket statement that they conduct tests to assure and improve the quality of users’ experiences with the websites. And after the experiments were done, they could and should have disclosed everything to users.
When users feel manipulated
Although websites and online stores experiment on users all the time, experts note that users can feel differently about sites that merely sell them products compared with ones designed for self-expression and connection. Aaron Chavez, Chief Scientist at AlchemyAPI, which provides a sentiment analysis service for websites and other uses, says, “People identify themselves more closely with the content they would post on Facebook or the profile they put on a dating website than with their purchasing choices. To be manipulated on that front is a deeper invasion that is closer to who they are as a person.”
Others note that users do care about all kinds of testing; they just want to know about it. As Meyer notes,
I’d be a little cautious in assuming that people don’t care about A/B testing. The response to the Facebook experiment shows pretty clearly that most people had no idea that their online experience is usually carefully curated or that companies constantly engage in A/B and other forms of behavioral testing. And there’s a lot of variation within the category of behavioral website testing. So I think we largely don’t know how people would react to different kinds of tests on different kinds of platforms for different kinds of purposes with different kinds of effects on users.
That said, I agree that it’s unlikely that many, if any, people would care if they knew that, for instance, Google had tested 41 shades of blue to see how it affects user clicks. Companies redesign their websites and logos all the time for aesthetic and marketing purposes and the public is aware of that fact. No reasonable person thinks that color redesign, per se, is inconsistent with Google’s relationship with its users.
When colors mean more than engagement
There is no known risk to users in seeing one shade of blue as opposed to another, although Google’s analytics might suggest that particular shades make users more likely to click on the toolbar (thus generating revenue for Google). Users are likely to discount the idea that something as seemingly trivial as a shade of blue could cause them to behave differently than they otherwise would, and clicking is not often perceived as a particularly risky or otherwise problematic behavior.
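The mechanics of a test like Google’s shades-of-blue experiment are straightforward. A minimal sketch, with assumed variant values and a made-up click log (the real experiment’s assignment scheme and analysis are not public): each user is stably assigned to one variant, and click-through rates are then compared per variant.

```python
from collections import defaultdict

# Assumed example shades; the real experiment reportedly tried 41.
VARIANTS = ["#1a0dab", "#2a1dbb", "#3a2dcb"]

def assign_variant(user_id: int) -> str:
    # Stable assignment: the same user always sees the same shade.
    return VARIANTS[user_id % len(VARIANTS)]

def click_through_rates(events):
    # events: iterable of (user_id, clicked) pairs
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for user_id, did_click in events:
        variant = assign_variant(user_id)
        shown[variant] += 1
        if did_click:
            clicked[variant] += 1
    return {v: clicked[v] / shown[v] for v in shown}

# Synthetic click log, purely for illustration.
events = [(uid, uid % 7 == 0) for uid in range(1000)]
print(click_through_rates(events))
```

A production system would also need a significance test before declaring a winning shade, but the core pattern is just this: randomize, observe, compare.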
On the other hand, Meyer notes that concerns might be raised if research suggested that color preference is predictive of sensitive personality traits like narcissism or autism, and that information were then used for targeted marketing, much like Target’s infamous pregnancy prediction scores. Some users might object to Google “knowing” (or believing) these things about them when it’s not information they’ve provided, and/or to being given a different user experience on that basis.
Testing versus manipulation
One step further would be a company that manipulates the online environment in order to get inside users’ heads and cause some users to develop a sensitive trait they didn’t have before the intervention. Meyer says this is exactly how many people saw the Facebook mood experiment.
People were upset about the Facebook mood study because the media message that went viral was: “Facebook intentionally depressed users just to see what would happen.”
Meyer notes that this was a highly misleading characterization of the study and the background conditions against which it was conducted. Still, when the experiment is seen in that light, it’s easy to understand why people were upset. She says,
It’s one thing to conduct trivial A/B testing that carries no risk, or to which users consent. But on the surface, it looked like Facebook teamed up with a couple of academics to study how to cause people to become depressed (a dangerous state that increases risk of suicide, drug and alcohol abuse, and on and on) just to satisfy their curiosity and get some career-enhancing publications.
How can you be sure if you don’t test?
Theories are easy to come by, but confirming that they are true demands that they be tested. In Facebook’s case, prior research had suggested that exposure to an unremitting stream of our friends’ happy news makes us feel sad and envious, because our own lives seem less successful by comparison. But these studies were fairly small and depended on subjects self-reporting their moods.
At the same time, another body of research suggested an opposite phenomenon, “emotional contagion,” in which one person “catches” another’s mood, much as communicable diseases travel. It was plausible that this phenomenon might apply to Facebook and other online experiences, in which case exposure to others’ positive posts would tend to make users happier, the opposite of the social comparison effect suggested by the earlier studies.
Only further testing could show which hypothesis was more relevant to Facebook users. Facebook slightly adjusted the algorithm that selects items for the news feed, which is already just a tiny subset of all the items posted by friends. The results seemed to confirm the emotional contagion hypothesis. Meyer suggests that not doing the experiment carried more risks in the long run: “The alternative to attempting to determine the answer to that question is to subject millions of Facebook users to unknown emotional risks.”
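The intervention described above amounts to withholding a small fraction of posts carrying a targeted sentiment label from a treatment group’s feeds. A hedged sketch of that idea, with illustrative names and rates that are assumptions, not Facebook’s actual implementation:

```python
import random

def build_feed(posts, in_treatment: bool, target="negative",
               omit_prob=0.1, rng=random.random):
    """Return the feed for one user.

    posts: list of (text, sentiment_label) tuples already chosen
    for the feed. In the treatment group, each post whose label
    matches `target` has an `omit_prob` chance of being withheld;
    the control group sees everything unchanged.
    """
    feed = []
    for text, label in posts:
        if in_treatment and label == target and rng() < omit_prob:
            continue  # withhold a fraction of matching posts
        feed.append(text)
    return feed
```

Comparing the subsequent word usage of treatment and control users is then what lets researchers look for a contagion effect; the point is that the change per user is small and probabilistic.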
In OKCupid’s case, the company wanted to find out whether its algorithm for matching users really worked. It’s quite possible that the algorithm itself was just a placebo, which could have harmfully encouraged millions of people to form unhealthy relationships. Meyer notes,
These claims that the experiment harmed users assume that OK Cupid’s algorithm really measures compatibility in the first place as opposed to working as a placebo, where people find each other more attractive if a supposedly scientific algorithm tells them that they should find each other attractive… But the only way to know whether an algorithm (or anything else) works is to test it. It is indeed ethically problematic to accept money from consumers for a service and then fail to give them that service.
A similar dilemma arises in medicine. People want their doctors to choose their medication for them; they don’t want a computer to randomize them to one drug versus another, much less a placebo. Doctors are obligated to make medical decisions in the best interests of their patients, but that assumes there is evidence that a given treatment works. Meyer says, “A lot of medicine simply isn’t evidence-based and doctors’ practices are based on tradition or hunches.”
The danger of simple conclusions
The Facebook experiment suggested that when people are exposed to negative sentiment, they are in turn more likely to express negative sentiment. The researchers went so far as to equate a user’s increased use of negative words in their posts with a decline in mood. But just because people were expressing more negative sentiment does not necessarily mean they were less happy. As Meyer notes,
A user who increases her own use of negative words after seeing her friends do the same may have ‘caught’ their bad mood, as the researchers suggest, or she may have already been feeling sad or angry but simply more free to express those feelings once she saw more of her friends doing the same. To get a better sense of how the change to the news feed algorithm affected users’ mood, researchers would have had to give users psychology batteries designed to measure mood before and after the intervention.
The researchers also reported that people who were exposed to more positive sentiment tended to write more positive things as well. A simplistic reading of the results might suggest that one way to make people happier would be to cut out all the bad news. But what if, in the process, a user lost touch with their Aunt Edna, who happened to be going through a major health challenge?
As AlchemyAPI’s Chavez notes,
This is where you need a more complicated way of understanding what you are trying to give people with respect to sentiment. You can fairly reliably detect positive or negative sentiment now. But just because something is negative does not mean it is not something that someone wants or needs. It is important to use the tools at your disposal in a comprehensive way to make better decisions than what a single statistic shows you.
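Chavez’s point can be sketched in code: combine a polarity score with other signals rather than filtering on sentiment alone. The tiny word lists below are invented for illustration (a real system would use a trained classifier and a richer notion of personal relevance), but they show how a negative-yet-important post, like Aunt Edna’s health news, survives the filter.

```python
# Assumed toy lexicons; real sentiment analysis uses trained models.
NEGATIVE = {"sad", "sick", "worried", "awful"}
POSITIVE = {"happy", "great", "wonderful"}
IMPORTANT = {"family", "aunt", "health", "hospital"}

def polarity(text: str) -> int:
    # Crude polarity: positive word count minus negative word count.
    words = set(text.lower().split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

def should_show(text: str) -> bool:
    # Keep anything non-negative; keep negative posts only if they
    # also touch a personally important topic.
    words = set(text.lower().split())
    if polarity(text) >= 0:
        return True
    return bool(words & IMPORTANT)
```

Here the sentiment statistic is one input to the decision, not the decision itself, which is the comprehensive use of tools Chavez describes.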
Sidebar: ethical considerations for online experiments
Michelle Meyer, Professor of Bioethics at Union Graduate College, suggests companies should consider the following practices when conducting experiments on websites, mobile applications, and games:
- Does the intervention — the change in the user’s online environment that the company makes — pose any known additional risks to those users above and beyond any risks the website may already expose them to? It’s important to consider risks that are foreseeable and for which evidence exists at the time of the intervention, rather than engaging in Monday-morning quarterbacking. And it’s important to limit risk analysis to the incremental risk, if any, that testing imposes on users beyond what they’re already exposed to. If normal use of a company’s website already exposes users to risk, then that may raise ethical issues in its own right, but it’s distinct from an ethical analysis of testing.
- Does the testing raise any informational privacy issues (you can think of informational privacy as one type of risk)? Is the data about user behavior the testing produces sensitive? How does the company collect test data and what do they do with it? Does the company associate data with individual user names or are the data anonymized or aggregated? Does the company share that data with third parties?
- Absent explicit consent to particular testing, to what extent was the testing consistent with users’ reasonable understanding of their relationship to the website and the company and/or in users’ interests? Sometimes a company’s interests are aligned with those of its users, as when a more user-friendly interface leads to both increased user satisfaction and increased corporate revenue. Behavioral website testing aimed at quality assurance or improvement should be consistent with users’ reasonable expectations of most companies, and even absent explicit consent, testing that comports with reasonable user understandings and expectations may be ethically acceptable whereas using the website to test something orthogonal to the user-company relationship may not be.
- Related to, but separate from, the consent question is what level of transparency did the company provide before, during, and after the testing? Did the company omit to disclose, or actively deceive users about, the testing? Were there potentially legitimate reasons for any omission or deception (like the desire to avoid biasing the test results) or was it driven by the company’s fear that users wouldn’t understand or agree to participate, or mere lack of concern about users?
George Lawton has been infinitely fascinated yet scared about the rise of cybernetic consciousness, which he has been covering for the last twenty years for publications like IEEE Computer, Wired, and many others. He keeps wondering if there is a way all this crazy technology can bring us closer together rather than eat us. Before that, he herded cattle in Australia, sailed a Chinese junk to Antarctica, and helped build Biosphere II. You can follow him on the Web and on Twitter @glawton.