In his book Dataclysm, Christian Rudder draws on his access to data as a founder of the popular dating site OkCupid to illustrate the kinds of trends and information that can be gleaned from large datasets. He advocates making large datasets public so that anyone can analyze them and uncover trends that could help humanity as a whole.

In the wake of the often hyperbolic media coverage of big data and privacy concerns, Rudder adopts a different and, in my opinion, more realistic view. He admits that individuals’ privacy could be endangered if data is used incorrectly, but he also points out that if certain measures are followed, individuals will not be identified. What is the harm in harnessing big data to observe overall trends? This could eventually translate into huge medical breakthroughs.

One of my favorite sections of the book is the comparison of messages sent to potential dates. I never knew, and probably never would have known, that dating websites, OkCupid specifically, track the way users compose messages. The website tracks whether users delete and rewrite or copy and paste, and whether they spend hours composing a message or casually write it in 30 seconds, as illustrated below. Rudder also correlates this data with whom users contact and how those messages are received.

This section also leads into a discussion of whom men and women find attractive. Men find women significantly younger than themselves more attractive, while women find men roughly their own age the most attractive. Men find women ages 20, 21, 22 and 23 the most attractive overall.

3rd Image

Although the techniques used and findings presented were fascinating, I believe there is an issue with the way the data and analysis were presented. Almost all of the data analyzed was pulled from dating websites, and the book assumes that this is a good representation of singles in the U.S. Many singles use online dating, but many others do not. As reported by the Pew Research Center, only 22% of singles ages 25 – 34 use online dating websites, and that is the age range in which online dating is most popular (Smith)!

4th image

Clearly, a large segment of American singles, at least 78% even in the age range where online dating is most popular, is missing from this sample. We do not know for sure that online daters differ from everyone else, but the key point is that they could. Thus, the trends identified in the book are very insightful for the online dating population, but should not be extrapolated beyond that population.

In a similar vein, one of the primary premises of the book is that data shows people’s actions regardless of what they say they would do. Indeed, users’ claims often do not match up to what they do on the dating site. However, multiple studies have illustrated that individuals behave differently online than they do in real life. Chris Larson, an interviewee, says that he is “much more likely to flirt with people, much more likely to incite disagreements on political topics and much more likely to sort of take the extremes of conversation further [online] than [he] would in person” (Statesman). Multiple other interviewees in the article echoed similar sentiments.

I think that this phenomenon raises the question: should we define online behavior or in-person behavior as “real life”? I may be in the minority, but I still consider in-person behavior the real deal. If someone will not do something in real life, does it really matter if they rate or message someone differently on a dating website? This is a fascinating application of statistical techniques, but does the data presented really help us understand dating behavior? It can explain a simulated, controlled, online environment, but all bets are off once it moves past that point. The data could be helpful to someone desperate to attract more online interest, but that person still has to conquer the real-world date.

Overall, I thought that this was a wonderful book that really opens up the possibilities of what can be done with a massive dataset. Despite some of the shortcomings discussed, I think the book is an interesting and informative read that even gives some analytical insight into some serious societal problems.

By: Sarah Wright

Professor: Vanja Djuric

Class: Cross-Media Database Marketing



Works Cited

Bertrand, Marianne. “Racial Bias in Hiring.” University of Chicago, Apr. 2003. Web. 04 Aug. 2015.

“Do People Really Behave Differently Online? It Depends on the Person.” Statesman. Statesman, 23 Sept. 2011. Web. 28 July 2015.

Preibusch, Soren. “Privacy Behaviors After Snowden.” Communications of the ACM, May 2015. Web. 28 July 2015.

Smith, Aaron, and Monica Anderson. “5 Facts about Online Dating.” Pew Research Center. Pew Research Center, 20 Apr. 2015. Web. 28 July 2015.

Wilcox, Keith, and Andrew T. Stephen. “Are Close Friends the Enemy? Online Social Networks, Self-Esteem, and Self-Control.” Journal of Consumer Research, forthcoming. Columbia Business School Research Paper No. 12-57, 22 Sept. 2012.