In Part I of this series, we established that vetting all research we editors might publish ought to be required practice. The next step is “how.” What questions might we ask of the researchers, and what answers might we expect from them? The questions recommended in Part II apply both to research conducted by outside organizations and to research conducted by your own company or publication.

Numerous areas are ripe for journalistic concern, such as a possible conflict of interest on the part of the sponsoring organization and whether the research questions are biased. These are qualitative considerations. The quantitative consideration is the set of statistical measures behind the research, which is the focus of this article.

The basic statistical measures include whether you can access the population you seek to study, the size of the resulting random sample, and the response rate. The editor and writer must ask the sponsoring organization and the researcher about the six items below. The implications of these items ought to be explained in any story about the results of research.

Moreover, a strong case can be made to NOT publish information about any research that does NOT meet minimal requirements. As stated in Part I of this article, the publishing of poorly researched information might lead to poor policy, poor decision-making and perhaps even death, in the case of medical research.

Let’s take a closer look at six fundamental measures:

  1. Population size. How many people are available from whom you can draw your random sample? If you are studying a population of editors, for example, are you truly able to choose from all of them? Do you know how many editors there are? If you are emailing a survey, do you have the email addresses of every editor? If not, is the difference statistically significant? (1)
  2. Random sample. Is each individual or element in a population chosen entirely by chance, with an equal chance of being included? Statistically, the required sample size doesn’t change much for populations larger than 20,000. (2)
  3. Margin of error. The margin of error is the amount of error, plus or minus, that you can tolerate in a survey result. The more error you tolerate, the less precise your data. Five percent is the commonly accepted choice. (3)
  4. Confidence level. This indicates how sure you can be that the survey findings, within the margin of error, can be projected onto the entire universe (population). The lower the confidence level you accept, the poorer your data. Typical choices are 90%, 95%, and 99%. (4)
    Survey responses needed for 95% confidence with ±5% margin of error
    (Numbers below assume all responses are “good,” no errors, worthy of inclusion in the study.)
    Population size    Responses needed
    10                 10
    100                80
    200                132
    300                169
    400                197
    500                218
    600                235
    700                249
    800                260
    900                270
    1,000              278
    2,000              323
    5,000              357
    10,000             370
    20,000             377
    100,000            383
    1,000,000          384
    Above sample sizes derived from an online sample size calculator at http://raosoft.com/samplesize.html

    Caveat. If you “mail a survey invite to your house list; post a link to your website, Twitter, or Facebook; rent an online access panel — that’s a convenience sample. … Convenience samples do not produce representative results. If you need to extrapolate to the target population, convenience samples aren’t going to get you there,” says a blog post on blog.verint.com. (5)

    A blog post by Vovici (now Verint Systems) says: “If you are posting a link to your survey on blogs and Twitter feeds, it will not be representative of any target population, and no number of responses is going to make it so. The information will be interesting from a qualitative standpoint, but — since it is not a random sample — it is not quantitative. In this case, whether you get one hundred, one thousand, or one million responses doesn’t matter. The information is interesting to talk about and might be fine for illustrative purposes for a blog post or a webinar, but the findings will not be useful for decision making.” (6)

  5. P (probability) value. This measure tests the validity of a claim made about a population. It is the probability of obtaining the observed sample results, or more extreme ones, if the hypothesis being tested (typically, that there is no real effect or difference) were true. Claassen says: “A P-value of .05 or less, meaning there are only 5 or fewer chances in 100 (or a 5% or less probability) that the result could have happened by chance, is regarded as low, and thus statistically significant. The higher the value, the more likely the result is due to chance, and thus not reliable.” (7)
  6. Correlation and causation. A common error of statistical interpretation is using the word “correlation” to mean “causation” or saying that correlation proves causation. This is “a logical fallacy by which two events that occur together are claimed to have a cause-and-effect relationship. The fallacy is also known as cum hoc ergo propter hoc (Latin for ‘with this, therefore because of this’) and false cause. By contrast, the fallacy post hoc ergo propter hoc requires that one event occur before the other and so may be considered a type of cum hoc fallacy.” (8) In the real world, moreover, I would say it is rare that no other variables intervene between the two correlated variables.
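The responses-needed table under item 4 can be approximated with a few lines of code. The sketch below is an assumption, not the documented internals of the Raosoft calculator: it uses the standard sample-size formula for a proportion with a finite population correction, assumes simple random sampling and the most conservative 50/50 split of answers, and rounds up. The function name and its parameters are illustrative only.

```python
import math
from statistics import NormalDist


def responses_needed(population, confidence=0.95, margin=0.05, split=0.5):
    """Estimate the responses needed from a random sample (illustrative helper).

    Assumes simple random sampling and a worst-case 50/50 answer split.
    """
    # z-score for a two-sided confidence level (about 1.96 for 95%)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively unlimited population
    n0 = z ** 2 * split * (1 - split) / margin ** 2
    # Finite population correction shrinks the requirement for small populations
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


for size in (100, 1_000, 20_000, 1_000_000):
    print(f"{size:>9,} -> {responses_needed(size)}")
```

Under these assumptions the function returns 80 for a population of 100, 278 for 1,000, 377 for 20,000, and 384 for 1,000,000. That matches the table and shows why the required sample barely grows once the population passes 20,000.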

What the editor should do

  • Find out who sponsored the research. Is there a conflict of interest?
  • Review the survey questions to help you determine bias.
  • Determine if there is a random sample from a knowable population. Is it adequate?
  • Find out whether the confidence level, margin of error, and P value are adequate.
  • Ask an independent, third-party statistician or researcher to review the methodology.
  • Ask about any correlation analysis and understand its weakness.
  • Discuss these issues with the researcher.
  • Include as much information about the methodology in your article as you can, to give readers at least a minimal way to judge validity. Define the terms you use in your description of the methodology.
  • Follow ASBPE guidelines.
  • Think carefully about publishing articles about research with poor methodology.
  • Don’t misinform your readers. For example, if you decide to publish an article about research that used a convenience sample, explain to your readers what a convenience sample is and why you are publishing the research even though it is not representative of the population being studied.
  • Don’t sin.
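For the checklist item on confidence level and margin of error, a rough check is possible when a study reports its population size and number of usable responses. The sketch below is a hedged illustration, not a substitute for a statistician's review: it assumes a true random sample and a worst-case 50/50 split of answers, and the function name and the example study are hypothetical.

```python
import math
from statistics import NormalDist


def achieved_margin(population, responses, confidence=0.95, split=0.5):
    """Margin of error implied by the responses actually received (illustrative helper)."""
    # z-score for a two-sided confidence level (about 1.96 for 95%)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of a proportion under simple random sampling
    standard_error = math.sqrt(split * (1 - split) / responses)
    # Finite population correction: sampling most of a small population reduces error
    correction = math.sqrt((population - responses) / (population - 1))
    return z * standard_error * correction


# Hypothetical study: 1,000 editors in the population, 100 usable responses
print(f"±{achieved_margin(1_000, 100):.1%}")
```

With these hypothetical numbers the margin works out to about ±9.3%, well above the commonly accepted 5%. The calculation says nothing useful about a convenience sample, because the formula only applies when respondents were chosen at random.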

References

1–4) http://www.raosoft.com/samplesize.html
5) http://blog.verint.com/convenience-samples-pros-and-cons
6) http://blog.vovici.com/Blog/bid/18119/Recommended-Sample-Size-for-Accurate-Surveys/
7) Ethics in Science Journalism, By George Claassen, University of Stellenbosch, www.pdfio.net/k-61498623.html
8) http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Correlation_does_not_imply_causation.html

About the Author: Robin Sherman is a consultant specializing in editorial development and publication design as well as a freelance editor and layout artist in the business-to-business and nonprofit publishing markets. He is a long-time member of the ethics and research committees of the American Society of Business Publication Editors, and is a former corporate director of editorial development for a large business-to-business publisher.  www.linkedin.com/in/robinshermaneditdesign