By Christoph Rosenthal
The German journalist Lorenz Matzat sees great prospects in the future of data-driven journalism.
“Data are the new oil,” he said.
Huge amounts of data are waiting to be discovered. Matzat and his colleagues found that out when they turned a cryptic geolocation spreadsheet into an interactive map. Parliamentarian Malte Spitz (Green party) wanted to know what telecommunication data were stored about him. He sued his provider, Deutsche Telekom, for this information. In January 2011 he forwarded it to the weekly newspaper Die Zeit. Its journalists cooperated with freelance data experts to visualize the table and demonstrate the wealth of private information one can extract from it.
The data journalists told Christoph Rosenthal how they did this.
What story does the graphic tell?
In 2008, a German federal law instructed telecommunications providers to save records of all their customers for six months: data about phone calls, text messages and Internet access. These records contained none of the specific content of the messages and calls, but they did save metadata like the time, duration and numbers called. The German parliament had to implement this law to fit European standards by enabling investigators to fight crime and terrorism. The police were supposed to be able not only to track a few suspects, but to screen every owner of a phone. The graphic demonstrates what data they were able to use.
A screenshot of the database provided by Deutsche Telekom
What sources did they use?
Deutsche Telekom had provided a cryptic spreadsheet of 35,830 records containing 30 pieces of information per stored connection. At first glance these columns of figures look worthless. “Seen individually, the pieces of data are mostly inconsequential and harmless. But taken together, they provide what investigators call a profile – a clear picture of a person’s habits and preferences, and indeed, of his or her life,” wrote Kai Biermann, an editor at Zeit Online. To explain the data, the journalists had to combine them with other sources.
Every time Spitz’ cellphone connected to the Internet (as a smartphone, it did so at least every 10 minutes), it was registered by the nearest antenna. The antenna’s coordinates and a so-called “cell ID” were stored. The exact position of the poles could be found on an official public map.
To fill the tracked movements with content, the journalists combined many kinds of public information with the corresponding data sets: Tweets, Facebook posts and news releases on Spitz’ website.
What did they do with the data?
To analyse their data, the journalists first tried to get an overview: The Zeit editor asked a freelance data journalism team called OpenDataCity to visualize the records. “It was not easy to visualize this kind of information,” said Michael Kreil, programmer at OpenDataCity. A static map would not tell the whole story: If they marked all the mentioned places on one map, highly frequented spots like the German parliament would be cluttered with hundreds of markers. An additional time slider was needed to show the tracked movements.
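The idea behind the time slider can be pictured as a simple filter: instead of drawing every record at once, only the records inside a chosen time window are shown. The following is a minimal sketch of that idea, not the team's actual tool. The function name `records_in_window`, the field names and the sample records are all made up for illustration.

```python
from datetime import datetime, timedelta

def records_in_window(records, start, length):
    """Return the records whose timestamp falls in [start, start + length)."""
    end = start + length
    return [r for r in records if start <= r["time"] < end]

# Made-up sample records; the "place" labels are illustrative only.
records = [
    {"time": datetime(2009, 9, 1, 9, 0), "place": "parliament"},
    {"time": datetime(2009, 9, 1, 9, 10), "place": "parliament"},
    {"time": datetime(2009, 9, 1, 18, 30), "place": "train station"},
    {"time": datetime(2009, 9, 2, 9, 0), "place": "parliament"},
]

# Moving the slider to 1 September, 09:00, with a one-hour window.
visible = records_in_window(
    records, datetime(2009, 9, 1, 9, 0), timedelta(hours=1)
)
```

With a window like this, a map only has to draw the handful of markers in `visible`, rather than every visit to the same spot over six months.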
Kreil and his teammates could not find a template: “I knew we had to build the fitting tool on our own,” Kreil said. “The only good way for us to dig into the data was to create our own analysis tool specifically for this kind of data.”
Here’s what they did:
√ Prepared spreadsheets in Microsoft Excel and Google Fusion Tables.
√ Matched the geolocation data with the official map of cellphone antennas.
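The matching step amounts to a lookup: each connection record carries a cell ID, and the antenna map says where that cell is. The team did this with spreadsheets, so the Python below is only a rough sketch of the same idea. The field names (`cell_id`, `timestamp`), the cell IDs and the coordinates are hypothetical and not taken from the real dataset.

```python
# Hypothetical antenna table: cell ID -> (latitude, longitude).
# The coordinates are illustrative, not taken from the real dataset.
ANTENNAS = {
    "A1": (52.5186, 13.3762),
    "B2": (52.5251, 13.3694),
}

def attach_coordinates(records, antennas):
    """Return copies of the records with lat/lon added from the antenna table.

    Records whose cell ID is unknown are kept, with lat/lon set to None,
    so that gaps in the antenna map stay visible instead of vanishing.
    """
    located = []
    for record in records:
        lat, lon = antennas.get(record["cell_id"], (None, None))
        located.append({**record, "lat": lat, "lon": lon})
    return located

# Made-up connection records for illustration.
records = [
    {"timestamp": "2009-08-31 08:10", "cell_id": "A1"},
    {"timestamp": "2009-08-31 08:20", "cell_id": "B2"},
    {"timestamp": "2009-08-31 08:30", "cell_id": "ZZ"},  # not in the antenna table
]
located = attach_coordinates(records, ANTENNAS)
```

Keeping unmatched records rather than dropping them is a deliberate choice in this sketch: a reporter checking the data would want to see which connections could not be placed on the map.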
As an experienced coder, Kreil managed to finish the first version of the tool within one extended weekend. “At this time, the application was not planned to be published, but to enable the journalists themselves to analyze the overwhelming wealth of data,” he said. The visualization showed patterns that were not obvious in the spreadsheet: The speed of the tracked movement gave hints about whether Malte Spitz was traveling by foot, by train or by plane – interesting information about a Green politician. The graphic showed his favorite places and regularities in his appointment calendar.
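How speed can hint at the means of travel is easy to sketch: take two consecutive antenna positions, compute the distance between them and divide by the time elapsed. The example below uses the standard haversine formula for great-circle distance. The function names, the speed thresholds and the sample points (approximate coordinates for Berlin and Munich) are assumptions for illustration, not the logic of Kreil's tool. Because antenna positions only approximate where the phone was, such estimates are coarse.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def speed_kmh(p1, p2):
    """Average speed between two records, or None if no time has passed."""
    hours = (p2["time"] - p1["time"]).total_seconds() / 3600
    if hours <= 0:
        return None
    return haversine_km(p1["lat"], p1["lon"], p2["lat"], p2["lon"]) / hours

def guess_mode(kmh):
    """Very rough guess; the thresholds are illustrative assumptions."""
    if kmh is None:
        return "unknown"
    if kmh < 7:
        return "on foot or stationary"
    if kmh < 300:
        return "car or train"
    return "plane"

# Made-up records: roughly Berlin, then roughly Munich five hours later.
berlin = {"time": datetime(2009, 9, 1, 8, 0), "lat": 52.52, "lon": 13.405}
munich = {"time": datetime(2009, 9, 1, 13, 0), "lat": 48.137, "lon": 11.575}
mode = guess_mode(speed_kmh(berlin, munich))
```

The same two positions only one hour apart would imply a speed well above what any train reaches, which is the kind of pattern that points to a flight.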
How did they present the results?
Initially the graphic was only a tool for the journalists. A map of one characteristic day in the politician’s life was supposed to illustrate the article. But when they saw how usable and interesting the tool was, they decided to give users access to the entire database. Users could browse through the data collected from August 2009 to February 2010. Malte Spitz gave his permission to do that. “In my view that’s what data journalism is about: giving the readers/users an environment to do their own research, follow their own interests and finally make up their own minds on an issue,” said Matzat, who conceptualized and designed the graphic. To make the user’s own investigations even more convenient, they added the raw data as a Google spreadsheet as well. In Kreil’s opinion, this transparency “is a very powerful idea for the future of journalism.”
The time slider allows users to sort the data by date
How long did they work on the project?
Matzat and Kreil estimated that they spent 80 to 100 hours on the project. They started at the end of January 2011 and finished it on Feb. 20. They published a German edition first, and followed with an English translation 20 days later. The translated version attracted international attention.
What can be learned?
The journalists credit the project’s success to the teamwork of a group of specialists. That teamwork opened up different perspectives on the task. Biermann of Zeit Online and freelancer Lorenz Matzat of OpenDataCity presented the results in a classic journalistic article and designed the graphic. Coder Michael Kreil focused on the technology. Sascha Venohr, developing editor of Zeit Online, and Tibor Bogun, head of the design department, contributed their expertise.
Was the project successful?
Directly after its publication, the application sparked a lively discussion in social media as well as in print and broadcast media.
In June, the team received two well-respected awards: the Lead Award in Gold as Germany’s Webmagazine of the Year and a Grimme Online Award. The Grimme jury said, “Data journalism is still an underdeveloped genre in Germany. Zeit Online has made a first and important contribution to its cultivation. This developing journalistic field has its very own home in the web.”
This post is published under Creative Commons BY-NC-SA 3.0 License; some quotes are translated from an interview on Philip Banse’s Medienradio.org. Photo of the Grimme Online Award by Elke Wetzig. Photo of the German Bundestag by Christoph Rosenthal.