The Dos and Don’ts of Communicating with Data: The double y-axis

Samantha Tyner

Graph of the correlation between computer science doctorates and arcade revenue.

This post is a version of another posted on the author’s personal blog.

In a previous blog post, I explained in detail how to reconstruct a graph so that the information it aims to convey is clear. This post expands on that one by providing further insight into why a double y-axis can be a misleading form of data visualization.

Don’t: the double y-axis

The double y-axis is a common data visualization faux pas. This visual monster frequently rears its ugly head when the author is trying to make a statement about the relationship between changes in two variables over time. At the top of this page, for example, there is a graph showing that arcade revenue and the number of computer science PhDs awarded are highly correlated. There is, however, no evidence that arcade revenue and computer science degrees are related. This graphic was made to warn people of the dangers of these types of “spurious correlations.”

"Total revenue generated by arcades correlates with computer science doctorates awarded in US"
Data sources: U.S. Census Bureau and National Science Foundation | Image source: https://www.tylervigen.com/spurious-correlations

When we see the data in the same visual space, however, our brains immediately begin making connections. We can come up with all sorts of hypothetical justifications for why arcade revenue and computer science PhDs could be related. After all, if we picture the stereotypical arcade game player and the stereotypical doctor of computer science, they will probably look pretty similar: nerdy, male, enthusiastic about something very few people can relate to, wearing glasses because their eyes are strained from looking at screens, etc. This is the danger of the double y-axis: our minds are so powerful that they can create connections and relationships where none exist. The double y-axis just adds fuel to the fire.
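To make the point about spurious correlations concrete, here is a minimal sketch in Python. The two series are invented for illustration (they are not the arcade or PhD figures), and the helper names `pearson` and `yearly_changes` are hypothetical. The only thing the series share is that both drift upward over time.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def yearly_changes(values):
    """Difference between each value and the one before it."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Invented numbers for illustration only: two unrelated quantities
# that both happen to trend upward over ten years.
series_a = [1.00 + 0.08 * i + 0.03 * (-1) ** i for i in range(10)]
series_b = [500 + 120 * i + 40 * ((i * 7) % 3 - 1) for i in range(10)]

r_levels = pearson(series_a, series_b)
r_changes = pearson(yearly_changes(series_a), yearly_changes(series_b))

print(f"correlation of the raw values:     {r_levels:.2f}")
print(f"correlation of the yearly changes: {r_changes:.2f}")
```

In this made-up case the raw values are strongly correlated simply because both rise over time, while their year-to-year changes are only weakly correlated. Plotting two such series on a shared chart with two y-axes invites the reader to see a relationship that the shared trend alone can produce.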

A typical example

An example of the double y-axis “in the wild” can be found in this tweet from the National Weather Service of Des Moines, Iowa. The plot shows daily snowfall for the 2018-19 winter season on the left axis, which ranges from 0 to 6 inches, and cumulative snowfall for that season on the right axis, which ranges from 0 to 45 inches. Combining the two makes the daily values seem far more extreme than they actually are: a viewer who reads the daily values against the right-hand axis can easily conclude at first glance that there were daily snowfalls of over 30 inches! Although daily snowfall this extreme is not impossible in the U.S., the maximum daily snowfall shown barely exceeds 5 inches.
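The misreading can be worked out with a little arithmetic. The sketch below, in Python, takes the two axis ranges described above (0 to 6 inches and 0 to 45 inches) and asks what a value drawn against one axis appears to be when read against the other. The function name `read_against_other_axis` is hypothetical, and the 5-inch input is a rounded stand-in for the largest daily value, not a figure taken from the weather service's data.

```python
def read_against_other_axis(value, own_range, other_range):
    """Return what `value`, drawn on its own axis, looks like on the other axis.

    Both ranges are (low, high) tuples covering the same plot height.
    """
    own_low, own_high = own_range
    other_low, other_high = other_range
    # Fraction of the plot height at which the value is drawn.
    fraction = (value - own_low) / (own_high - own_low)
    # The same height, read off the other axis.
    return other_low + fraction * (other_high - other_low)


DAILY_AXIS = (0.0, 6.0)        # left axis, inches of daily snowfall
CUMULATIVE_AXIS = (0.0, 45.0)  # right axis, inches of cumulative snowfall

# Assumed example value: a daily snowfall of roughly 5 inches.
apparent = read_against_other_axis(5.0, DAILY_AXIS, CUMULATIVE_AXIS)
print(f"A 5.0-inch day reads as about {apparent:.1f} inches on the right axis")
```

A 5-inch day sits five-sixths of the way up the plot, which is the 37.5-inch mark on the cumulative axis. That is how a glance at the combined plot can suggest daily snowfall of over 30 inches.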

Do: save your reader’s time

A revised version of the weather service’s plot is shown here. There are several reasons this plot represents the data better:

Units take up the same amount of space: one inch of daily snowfall takes up the same visual space as one inch of cumulative snowfall.
Data of the same scale are grouped together: the top panel shows the cumulative snowfall for the 2018-19 season and the average cumulative snowfall from 1981-2010, while the bottom panel shows the corresponding daily measurements.
Color corresponds to shared groups: where the first graph has three colors, the revised plot only needs two colors, dark blue for the 2018-19 season and light blue for the 30-year average.

As a result, the revised plot requires less mental effort from the viewer: there are fewer axes and colors to process, and there is no need to decipher which line goes with which axis. The goal of data visualization is to communicate quickly and effectively with your reader. Do not make them work harder than absolutely necessary. Begone, dual y-axis!
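For readers who want to try the two-panel layout themselves, here is one possible sketch in Python using matplotlib. It is not the code behind the revised plot, and the snowfall numbers are invented for illustration. The names `cumulative` and `plot_panels` and the output file name are hypothetical. The panel heights are set in proportion to the y-axis ranges so that one inch takes up roughly the same height in both panels.

```python
from itertools import accumulate

# Invented daily snowfall in inches; NOT the weather service's data.
DAILY_SEASON = [0.0, 1.2, 0.0, 5.1, 0.3, 0.0, 2.4, 0.0]
DAILY_AVERAGE = [0.2, 0.3, 0.2, 0.4, 0.3, 0.2, 0.3, 0.2]


def cumulative(daily):
    """Running total of the daily values, one entry per day."""
    return list(accumulate(daily))


def plot_panels(path="snowfall_panels.png"):
    """Save a figure with cumulative (top) and daily (bottom) panels."""
    # Imported here so the helper above can be used without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    days = list(range(1, len(DAILY_SEASON) + 1))
    top_max, bottom_max = 12.0, 6.0  # y-axis limits in inches

    # Heights proportional to the ranges: an inch is equally tall in both.
    fig, (top, bottom) = plt.subplots(
        2, 1, sharex=True,
        gridspec_kw={"height_ratios": [top_max, bottom_max]},
    )
    # One color per group, reused in both panels.
    top.plot(days, cumulative(DAILY_SEASON), color="darkblue", label="This season")
    top.plot(days, cumulative(DAILY_AVERAGE), color="lightblue", label="Average")
    bottom.plot(days, DAILY_SEASON, color="darkblue")
    bottom.plot(days, DAILY_AVERAGE, color="lightblue")

    top.set_ylim(0, top_max)
    bottom.set_ylim(0, bottom_max)
    top.set_ylabel("Cumulative (in)")
    bottom.set_ylabel("Daily (in)")
    bottom.set_xlabel("Day")
    top.legend()
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_panels()` writes the figure to a PNG file. Each panel has a single y-axis, so there is never a question of which line belongs to which scale.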

If you learned something from this post and want to know how to reconstruct these graphs, see the original post on my blog.

Author: Samantha Tyner, PhD
Dr. Tyner is a 2019-20 STPF Fellow with the Office of Survey Methods Research at the Bureau of Labor Statistics. She is an applied statistician with interests in data science, data visualization, forensic science, machine learning, text mining, and network analysis. You can follow her on Twitter at @sctyner.

Disclaimer

This blog does not necessarily reflect the views of AAAS, its Council, Board of Directors, officers, or members. AAAS is not responsible for the accuracy of this material. AAAS has made this material available as a public service, but this does not constitute endorsement by the association.

Authors

Samantha Tyner

Tyner, Samantha: Fellowship 2019-2020

Sam Tyner-Monroe is a director in Manatt’s Digital and Technology group in the Firm’s Washington, D.C. office. With a deep background in data science, responsible artificial intelligence (AI), data visualization, advanced statistical modeling and machine learning, Sam guides clients through the complex business, legal and ethical challenges emerging from the rapid adoption of related technologies. She provides AI testing and evaluation for companies of all sizes that need to navigate the potential for inaccuracy, bias, misuse and other emerging risks associated with AI deployment.

Bringing a unique background as a data scientist and applied statistician within a legal and business context, Sam partners with the firm’s broader AI and industry teams to help clients evaluate AI outputs for performance, accuracy, bias and misuse; identify high-risk tools and systems requiring enhanced human oversight; and conduct adversarial testing to assess vulnerabilities. She also supports clients’ product development and IT teams in developing AI validation and compliance strategies, while ensuring adherence to consumer protection laws, data privacy frameworks and industry best practices through rigorous impact assessments, transparency measures, and qualitative and quantitative testing. Additionally, Sam works on the ethical design, development and implementation of AI and automated decision-making systems, ensuring they are transparent, accountable and aligned with legal and regulatory standards.

Prior to joining Manatt, Sam advised clients on responsible AI implementation as a managing director at one of the largest law firms in the world. She previously served as a data scientist at a legal technology company and as an AAAS Science & Technology Policy Fellow at the Bureau of Labor Statistics. She has also held research and teaching appointments at Iowa State University and contributed to data-driven public policy through the Data Science for Public Good program.

She earned her Ph.D. in Statistics from Iowa State University in 2017. She also holds an M.S. in Statistics from the same institution and a B.A. from Augustana College in Rock Island, IL, where she majored in Mathematics, Economics, and French. She is originally from the Chicago suburbs and is a die-hard Chicago Cubs fan.

Sci on the Fly
