{"id":482,"date":"2011-07-26T07:54:38","date_gmt":"2011-07-26T14:54:38","guid":{"rendered":"http:\/\/www.nadynerichmond.com\/blog\/?p=482"},"modified":"2011-07-25T15:51:05","modified_gmt":"2011-07-25T22:51:05","slug":"stop-the-data-abuse","status":"publish","type":"post","link":"https:\/\/www.nadynerichmond.com\/blog\/2011\/07\/26\/stop-the-data-abuse\/","title":{"rendered":"stop the data abuse!"},"content":{"rendered":"<p>As a researcher, I always get annoyed when I see the wanton abuse of data. \u00a0Mashable&#8217;s article <a href=\"http:\/\/mashable.com\/2011\/07\/14\/google-plus-male\/\">Google+ Users Are Nearly All Male<\/a> is a great example of data abuse.<\/p>\n<p>The data abuse starts by reporting data without noting anything about the methodology behind it. \u00a0Data is meaningless if you don&#8217;t know how it was collected. \u00a0For this article, Mashable reports on two different websites that claim to have analyzed Google+ user profiles. \u00a0Neither of these websites says that it has looked at all of the profiles, and neither notes what method it used to sample the profiles that it did analyze.<\/p>\n<p>The data abuse continues by ignoring the major differences in the data returned by the two sites. \u00a0One says that 86.8% of sampled profiles are male; the other says that 73.7% are male. \u00a0What explains a gap of about 13 percentage points? \u00a0I can come up with possibilities, but I don&#8217;t know if any of my potential explanations are correct. \u00a0Each possibility would tell us a very different story. \u00a0For example, if the difference is one of time (that is, one set of data was collected earlier than the other), then we&#8217;d learn something about the early-adopter curve. 
\u00a0If the difference is one of sampling method, then we might learn about the relative strengths of each of those sampling methods for this type of dataset.<\/p>\n<p>What really bothers me about this breathless repeating of such statistics is that there is no attempt at analysis. \u00a0If we accept that the current Google+ users skew male, is this any different from the usual early-adopter curve? Or the early-adopter curve for social media? Or the early-adopter curve for new Google applications? \u00a0Data without analysis is meaningless. \u00a0Reporting on the data suggests that we should care, that there is something different here. \u00a0But it appears that no one has bothered to answer such basic questions about the data.<\/p>\n<p>We can do better than this. \u00a0Let&#8217;s stop the blind reporting of data, and instead expend some effort on analyzing the data.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As a researcher, I always get annoyed when I see the wanton abuse of data. \u00a0Mashable&#8217;s article Google+ Users Are Nearly All Male is a great example of data abuse. The data abuse starts by reporting data without noting anything about the methodology behind it. 
\u00a0Data is meaningless if you don&#8217;t know how it was &hellip; <a href=\"https:\/\/www.nadynerichmond.com\/blog\/2011\/07\/26\/stop-the-data-abuse\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">stop the data abuse!<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-482","post","type-post","status-publish","format-standard","hentry","category-nadyne"],"_links":{"self":[{"href":"https:\/\/www.nadynerichmond.com\/blog\/wp-json\/wp\/v2\/posts\/482","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.nadynerichmond.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.nadynerichmond.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.nadynerichmond.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.nadynerichmond.com\/blog\/wp-json\/wp\/v2\/comments?post=482"}],"version-history":[{"count":3,"href":"https:\/\/www.nadynerichmond.com\/blog\/wp-json\/wp\/v2\/posts\/482\/revisions"}],"predecessor-version":[{"id":494,"href":"https:\/\/www.nadynerichmond.com\/blog\/wp-json\/wp\/v2\/posts\/482\/revisions\/494"}],"wp:attachment":[{"href":"https:\/\/www.nadynerichmond.com\/blog\/wp-json\/wp\/v2\/media?parent=482"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.nadynerichmond.com\/blog\/wp-json\/wp\/v2\/categories?post=482"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.nadynerichmond.com\/blog\/wp-json\/wp\/v2\/tags?post=482"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}