THE WORDPRESS NEWS CORE
FEATURING WORDPRESS COMMUNITY EXPERTS AND THE WP DAILY ARCHIVES

If You Wouldn’t Say It in Person, Would You Say It Online?

This question is at the center of the recent security breach involving Disqus, a widely used comment hosting service, and Gravatar, a service owned by Automattic, the company of WordPress co-founder Matt Mullenweg. Gravatar lets users keep a consistent profile picture across every website that enables it. Until recently, Disqus enabled Gravatar, which masks the email address associated with each account using an MD5 hash, an algorithm designed in 1991.
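As a rough illustration of the masking described above, here is a minimal Python sketch of how a Gravatar-style identifier is derived, assuming the publicly documented recipe of trimming whitespace, lower-casing the address, and taking its MD5 digest. The function names are hypothetical and chosen only for this example.

```python
import hashlib


def gravatar_hash(email: str) -> str:
    """Return the hex MD5 digest of a trimmed, lower-cased email address."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def gravatar_url(email: str, size: int = 80) -> str:
    """Build an avatar URL from the hash; `s` is the requested size in pixels."""
    return f"https://www.gravatar.com/avatar/{gravatar_hash(email)}?s={size}"


if __name__ == "__main__":
    # The same address always yields the same identifier, on every site.
    print(gravatar_url("  Alice@Example.com "))
```

Because there is no secret input, anyone who knows or guesses an address can compute the same identifier, which is why the hash works as a cross-site fingerprint rather than as real anonymization.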

Gravatar’s MD5 hash has been proven to be easily hackable. In December, a group of investigative journalists at Sweden’s Researchgruppen was able to de-anonymize the identities of thousands of Disqus commenters who use Gravatar. To do so, Researchgruppen requested commenter data from Disqus using Disqus’s open API protocol and wrote a script to automate the download. Included in the data they downloaded was the un-hashed email addresses of Disqus users who also used the Gravatar plugin. Researchgruppen identified targeted Disqus users by aggregating data posted from various accounts tied to their email addresses.
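To see why an unsalted hash offers little protection, consider this minimal Python sketch. It is not Researchgruppen's actual script, which is not public in this article; it only assumes that someone holds a set of harvested hashes and a list of candidate addresses, and all names and addresses are made up for illustration.

```python
import hashlib


def md5_hex(email: str) -> str:
    """Hash an address the way an unsalted, Gravatar-style scheme would."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def match_hashes(harvested_hashes, candidate_emails):
    """Map each harvested hash back to a candidate address, where one matches."""
    lookup = {md5_hex(e): e.strip().lower() for e in candidate_emails}
    return {h: lookup[h] for h in harvested_hashes if h in lookup}


# Hypothetical data: one harvested hash matches a candidate, one does not.
candidates = ["alice@example.com", "bob@example.com"]
harvested = {md5_hex("bob@example.com"), "0" * 32}

if __name__ == "__main__":
    print(match_hashes(harvested, candidates))
```

The cost is one hash per candidate address, so a leaked or purchased address list is enough to link hashes back to people.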

According to Swedish news outlet The Local, Researchgruppen’s investigative project began as an effort to identify right-wing commenters who post comments Researchgruppen deems hateful or racist. Researchgruppen sold the identities of targeted commenters, including public officials and private citizens, to the tabloid Expressen. Among the targeted commenters were members of the Sweden Democrat party, several of whom resigned due to Expressen’s publication of de-anonymized comments they had made on Gravatar-enabled sites.

Researchgruppen’s data mining did not stop there. They obtained data related to over 29 million comments as well as the identities of thousands of Disqus users. They have not made an announcement as to what they plan to do with this data. The original project was specifically targeted at right-wing policymakers and citizens in Sweden, and it remains to be seen how or if they will use the remainder of the data they obtained.

When Citizens Lack a Constitutional Right to Free Speech

Such de-anonymization of commenter data has different implications in Europe than in the US. Unlike Americans, European citizens are not protected by a constitutional right to free speech. Risks to de-anonymized commenters in Europe include getting sued for hate speech or libel, getting fired from their jobs, or getting physically harassed by political or ideological opponents. In China, authorities have used commenter data to track down and jail political dissidents.

Yet de-anonymizing user data isn’t always bad. In the case of infamous redditor Violentacrez, doxxing, or revealing the identity of anonymous commenters through hacking, revealed and put an end to an abhorrent collection of content including pornographic galleries of upskirt shots (photos taken literally “up the skirts” of unsuspecting women). At the time, reddit abdicated responsibility for posts or galleries by users such as Violentacrez, and under the First Amendment, what he posted was legal. But any reasonable person would be offended by his content, and indeed he lost his job and briefly became the internet’s enemy #1 when his identity was exposed.

Gravatar and WordPress

Disqus and Reddit aren’t the only publishing platforms vulnerable to doxxing. WordPress, along with other major tools like social publishing platform Hootsuite and collaborative software building site GitHub, uses Gravatar. In fact, Gravatar is owned by Automattic, the company of WordPress co-founder Matt Mullenweg, who says Gravatar serves over 20 billion images per day. If Gravatar users are as easily attacked as they were in the case of Researchgruppen and Disqus, all owners of Gravatar-enabled sites should be concerned, especially in countries where citizens can be persecuted for expressing controversial political opinions.

Content producers, developers, and designers know how important it is to understand our audiences. Data collected by services such as Gravatar is a valuable commodity, but data is only useful up to a certain point. Ultimately, numbers are important, but they are not a replacement for creative ingenuity. At what point should marketers depart from what numbers say and rely upon great, original creative?

No matter how one answers that question, users should always have to opt in to share their opinions or personal information, and it should always be clear whether the data they opt to share will be sold to third parties or monetized in any way. Bottom line: marketers need to find ways of understanding audiences that preserve their right to privacy.

Internationally, it is vital that social platforms educate their users about the law in the country where that user resides and possible repercussions of comments made online. Currently, there is a lack of easy-to-understand material about privacy concerns online or the legal ramifications of certain types of posts. Instead, there is a lot of paranoia, trolling, and careless posting among users, fueled by over-eager marketers who are failing their users in pursuit of data about them. We would all benefit from better user education about privacy and more effective opt-in tactics and incentives for users who choose to share opinions or personal information online.

Add your thoughts in the comments: what are some solutions for preserving user privacy while monitoring content that is libelous, illegal or potentially dangerous?


Megan Blanchard graduated from UCLA with a B.A. in Film, Television and Digital Media. She has been writing and designing content strategy for online outlets since 2009.
  • Ben

    i don’t see how anyone thinks they can get away with anything these days. i think as your title says “If You Wouldn’t Say It in Person, Would You Say It Online?” are words to live by frankly.

    • Megan Blanchard

      Hi Ben. Thanks for your comment. I agree, although in researching this, the angle of people living under oppressive regimes and using social media to express subversive political opinions made me wish that anonymous commenting were possible. Sadly for political dissidents in these countries, expressing an opinion that challenges their governments puts them in danger. This is why people everywhere must be better educated about privacy, or lack thereof on social platforms – and this responsibility lies with the creators of these social platforms and tools. For Americans who are protected by the First Amendment, my takeaway from researching this issue (and from working in and around marketing for several years) is that marketers are applying math to messaging that requires a sense of humor or humanity that analytics cannot provide. In so doing, marketers are compromising the privacy of their customers. So, to grossly simplify it, this is a call to action for marketers to become more comfortable taking risks based on great creative.

      • Ben

        definitely a thought provoking article and a complicated issue with real life implications especially for folks in less “free” countries. Not trying to say we should give up on the idea of freedom of speech that the internets surface level anonymity presents. i guess i just get a bit cynical when it comes to the notion of true anonymity on the internet. i feel like no matter how anonymous we are able to make people, the powers that be will always have a way around our “safeguards” it’s only a matter of if we the public will know about it.

    • andreasnrb

      Thing is you might not be able to say things in person due to a political climate. Also what some would deem harmless from an online nobody can be used to try and discredit said person. Also assuming that everyone knows how everything works is also not very good from a developer standpoint. WordPress core devs cannot assume that everyone knows that their email hash is published publicly and the possible consequences of doing so.

      In Sweden quotes were taken out of context etc and used to discredit people. Things that can be seen as sarcastic or ironic was used to paint a picture of racism etc.

      • Megan Blanchard

        Good point, Andrea, quotes taken out of context can be irrevocably damaging. There is a lot of irresponsibility all around in leveraging de-anonymized comments for political purposes.

  • andreasnrb

    Excellent stuff. Given the whole NSA mess and what not, it’s even more prudent that more people become aware of what the tools they use enable others to do. It’s also prudent that the tools we create do not make it so easy to track people without their consent. Users should be made aware, in an easy to understand manner, of the potential consequences of using services.
    WordPress.com does not for example include privacy information on each hosted site concerning the use of gravatar and so forth.

  • Nick Ciske

    Not sure about the focus on MD5 or Gravatar here — they are red herrings — Disqus essentially handed out a GUID (globally unique identifier) as part of their API response. Yes, that was for use with Gravatar, but it does not matter which hashing algorithm was used, or which service was involved, the issue remains the same: people (foolishly) trusted a 3rd party (Disqus) to keep them anonymous… then got outed when that wasn’t the case.

    Lesson for everyone: don’t use your real email when you want to be anonymous.

    If someone had been erroneously outed due to a collision in the (long known to be imperfect) MD5 hashing algorithm then it may have been relevant, and that’s highly improbable… but any unique id would have allowed this data harvesting to occur.

    If they had harvested web pages looking for the MD5 hash used by Gravatar, then it may have been relevant, but they do not appear to have done so (and they’d need the email in the first place to even attempt that). All the sites they harvested from appear to use Disqus… so all roads lead back to them leaking hashes and not noticing that someone was essentially scraping all their comment data via 10 simultaneous connections.

    Lesson for site owners: be careful who you trust with your users data.

    Lesson for API operators: have appropriate safeguards/limits/alerts on excessive use.

    Some fact checking:

    1. Disqus claims they never leaked actual email data (just hashes), contrary to what the author of this article claims:

    “Included in the data they downloaded was the un-hashed email addresses of Disqus users who also used the Gravatar plugin.”

    “Disqus has not been cracked. No emails were leaked by Disqus,” vice president for marketing Stephen Roy said in a statement released on Tuesday.

    2. MD5 is unsafe for *cryptographic* purposes. Not sure if Gravatar would ever be considered a cryptographic use since the method for creating hashes is published in the documentation – i.e. intentionally public. It’s simply obfuscating the email into a GUID, not encrypting it. The chance that 2 email addresses would collide (have the same hash) is very unlikely. SHA1 and other more modern hashing functions have the same issues… the solution is a long private salt… which would have to be published for gravatar to work as designed… making it pointless (it would just break backwards compatibility for no additional security).
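The lesson for API operators above (limits and alerts on excessive use) could be sketched roughly as follows. This is a minimal, in-memory sliding-window limiter in Python, not a description of anything Disqus runs; the class name, thresholds, and client identifiers are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        """Record a request and report whether it stays within the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject, and ideally raise an alert
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10.0)
    print([limiter.allow("api-key-1", now=t) for t in (0.0, 1.0, 2.0, 10.0)])
```

A real deployment would keep the counters in shared storage and alert on sustained rejections, but even a simple cap like this makes bulk harvesting slower and more visible.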

    • andreasnrb

      The Researchgruppen harvested the hashes on targeted sites using Disqus API yes. They did not harvest ordinary blogs since the sites they targeted didn’t use the standard WordPress commenting systems. If they did they would have done so as well. So this has really nothing to do with Disqus but everything to do with Gravatar. There are possible legal problems in some European countries with the use of the WordPress commenting system. If site owners are not made aware of potential issues they can’t take action. The WordPress core developers response to all this is: not our problem and we will never change.

      There is nothing saying that groups won’t harvest hashes from ordinary WordPress sites. Proof of concept has been made using the Stackoverflow site. You do not need to have the email address in order to harvest hashes from WordPress sites. That statement is just wrong. All you need is an HTML parser.

      Researchgruppen have also requested help to identify the hashes they didn’t match with their own rainbow table. It’s really easy to identify emails using the md5 hash. All you need is a little time.
      To see how easy it is to do all this: http://www.youtube.com/watch?v=fdphoc3XUF8

      • Nick Ciske

        “The WordPress core developers response to all this is: not our problem and we will never change.”

        What is that based on?

        I can see their point though: comments are inherently public, Gravatar is optional, site owners are the ones who need to make privacy decisions for their users, no one forced those users to use their email, etc.

        In the end, anyone expecting to use their real email to make anonymous or pseudonymous comments online is going to get outed by something or someone eventually.

        Also, I can put anyone’s email into a comment form, so how can you prove *I* made that comment?

        • andreasnrb

          “What is that based on?”
          They have said as much in the user submitted request to change this feature and through other channels.
          Also the WordPress core developers, those with actual power, are all employed in one way or another by the same company that owns the Gravatar service. So them changing it would be going against their employer’s wishes. Basically WordPress is “owned” and controlled by Automattic and Audrey Capital, which employ the main developers.

          “I can see their point though: comments are inherently public, Gravatar is optional, site owners are the ones who need to make privacy decisions for their users, no one forced those users to use their email, etc.”
          The feature is enabled by default. Site owners aren’t informed when they install WordPress or enable a site on wordpress.com. There aint privacy policy either on wp.com. The site owners cant make a choice since they are not aware of the issue. Not even disabling by default and make it optin is something the developers are willing to implement. There is no message next to the option regarding these issues either.

          “Also, I can put anyone’s email into a comment form, so how can you prove *I* made that comment?”
          Doesn’t always need to matter. Where there is smoke there is fire and so forth.

          • Nick Ciske

            Clearly someone has an axe to grind with Automattic?

          • andreasnrb

            You asked, I answered. Sorry for explaining the reasoning. And please return with more belittling arguments they work so well.

          • Nick Ciske

            Wasn’t trying to be belittling, just noting there was some issues you have with “Automattic the company” mixed up in what I saw as a technical/meta discussion of the larger privacy issues. Sigh. So hard to communicate tone online…

  • R7 Rocket

    Solution, locate the physical address of everyone who is part of Box 11080
    100 61 Stockholm (Researchgruppen) and Expressen.

    Post such addresses online. Don’t worry, the rule of law is dead, you just need firepower…

    http://2.bp.blogspot.com/-dkVrYV2XjMA/UNHHqZmW8AI/AAAAAAAAAos/Z5OWHahY9zc/s1600/tommorow_by_alecsystem.jpg

    http://www.nucleardarkness.org/include/nucleardarkness//images/graph/buster_vs_romeo_zoom.jpg
