Coding is not an absolute necessity for working in data journalism. But it can open up a whole new world of possibilities and enable you to do projects you otherwise wouldn’t be able to pull off. This chapter explains the benefits of coding for data journalists and introduces you to the two most popular programming languages in the field, R and Python.
Why Journalists Should Learn to Code:
The simple answer: it will hugely expand the horizon of what you are able to do, both in data journalism and in investigative work on the internet. Two aspects of the data journalism workflow benefit especially from coding skills. The first is data collection. Programming languages such as R or Python make it possible to systematically collect large amounts of information from websites or social media platforms. Collecting the same amount of information by hand would be extremely time-consuming and error-prone.
The second aspect that benefits hugely from coding skills is data cleaning and analysis. Programs like Excel can struggle when you are processing large amounts of data. They can also become unnecessarily complicated when you are working with several datasets at once.
Coding will enable you to collect more data faster, organize it more easily and analyze it with more sophisticated methods. You can also benefit from programs that update charts on your website automatically whenever new data becomes available. Many newsrooms around the world adopted this practice during the Covid-19 pandemic to cope with the daily influx of new data.
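To make the idea of systematic data collection more concrete, here is a minimal sketch in Python that pulls headlines out of a piece of HTML using only the standard library. The HTML snippet, the class name "headline" and the HeadlineParser class are made up for illustration; a real page would have its own structure, and you would first need to download it and check the site's terms of use.

```python
from html.parser import HTMLParser

# Hypothetical page snippet. In a real project this text would be downloaded,
# for example with urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """
<ul>
  <li class="headline">Council approves budget</li>
  <li class="headline">New bridge opens</li>
  <li class="ad">Sponsored content</li>
</ul>
"""


class HeadlineParser(HTMLParser):
    """Collects the text of every <li class="headline"> element."""

    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "li" and ("class", "headline") in attrs:
            self.in_headline = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_headline = False

    def handle_data(self, data):
        # Keep only non-empty text that sits inside a headline item
        if self.in_headline and data.strip():
            self.headlines.append(data.strip())


parser = HeadlineParser()
parser.feed(SAMPLE_HTML)
headlines = parser.headlines
print(headlines)
```

The same few lines work whether the page lists three headlines or three thousand, which is where the time saving over copying by hand comes from.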
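As a small illustration of working with several datasets at once, the following sketch combines two tables on a shared column and calculates a rate. It uses only Python's standard library; the district names and figures are invented for the example, and in practice many analysts would use a dedicated data analysis package for this step.

```python
import csv
import io

# Hypothetical data: two small CSV tables that share a "district" column.
POPULATION_CSV = """district,population
North,12000
South,8000
East,5000
"""

CASES_CSV = """district,cases
North,60
South,20
East,50
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


# Look-up table: district name -> population
population = {
    row["district"]: int(row["population"]) for row in read_rows(POPULATION_CSV)
}

# Join the second table on "district" and compute cases per 1,000 residents
rates = {}
for row in read_rows(CASES_CSV):
    district = row["district"]
    rates[district] = int(row["cases"]) * 1000 / population[district]

# Print the districts from highest to lowest rate
for district, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{district}: {rate:.1f} cases per 1,000 residents")
```

In a real project the two tables would be read from files rather than from strings, but the joining logic stays the same.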
What Is a Good Coding Language for Data Journalists?
Which programming language to learn is a question that every data journalist might answer differently. However, most of them will probably recommend R or Python. They are the most popular languages for data analytics. Most newsrooms that have a data team use at least one of them for their work.
Should Data Journalists learn R or Python?
The short answer is: you will most likely be fine with either of them. They do not differ enough for one to be considered clearly superior to the other. This short introduction to both languages and their respective strengths and weaknesses can help you decide which one is the right choice for you:
Advantages and Functionalities of R:
R was developed in the 1990s by statisticians as a tool for data analysis and statistical modeling. Most of its user base still consists of statisticians and academic researchers, and its way of working follows the logic of statistical analysis and mathematics.
From the beginning, R was set up as an open project that actively encouraged people to participate. R itself and RStudio, the most common environment it is run in, are free for everybody. You can download R from the website of the R Project; RStudio is available from the website of its developer.
This approach of community engagement has paid off, and people are constantly contributing to the expansion of R's functionality. These contributions mainly take the form of so-called “R packages”. You can think of a package as a toolbox for a programming language: it comes with a set of tools (called functions) designed to carry out specific tasks. Usually it is also accompanied by a manual (the so-called package documentation) that explains how to use the toolbox. This huge number of packages is one of R's biggest advantages, because it offers tailored programming solutions for a wide variety of data and topics.
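The toolbox idea is not unique to R. As a rough analogy, this short Python sketch loads one toolbox (the statistics module that ships with Python) and uses two of its tools. The rent figures are invented for illustration.

```python
import statistics  # the "toolbox"

# Hypothetical data: monthly rents in one neighborhood, including one outlier
monthly_rents = [850, 920, 780, 2400, 890]

# Two "tools" (functions) from the toolbox
mean_rent = statistics.mean(monthly_rents)
median_rent = statistics.median(monthly_rents)

print(f"Mean rent: {mean_rent}")
print(f"Median rent: {median_rent}")
```

In R the pattern looks similar: you load a package once and then call its functions by name.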
Advantages of R for data journalism:
- lots of packages to tackle specific problems
- cloud-based version of RStudio, so multiple people can work on the same project
- packages for sophisticated visualizations that can be directly exported
- big and active online community
Disadvantages of R for data journalism:
- hard to learn, especially for those without any prior experience in programming or statistics
- too many packages to choose from and hard to determine which one is the best
- limited to statistics and data analysis in its applicability (although this is beginning to change)
Advantages and Functionalities of Python for Data Journalism:
While R specializes in statistics, Python can be used for a lot of different purposes. Python doesn’t have as many packages designed specifically for problem solving within data analytics, but in exchange you can also use it to build computer games or machine learning algorithms. In its syntax and functionalities it is closer to other programming languages such as C++ or Java. That is also the reason why interacting with these languages works better in Python.
Due to its wider applicability, Python has a larger community and is especially popular among AI developers. This wider applicability comes with a downside: there are several so-called environments in which you can run Python (we recommend Spyder for beginners) and different so-called “distributions” that come with varying preinstalled packages.
Advantages of Python for Data Journalism:
- wider applicability beyond data analysis (machine learning, applications)
- easier to learn for beginners
- works better with other languages
- popular for algorithms, AI and machine learning
Disadvantages of Python for Data Journalism:
- fewer specialized packages
- less sophisticated visualizations
- hard to figure out which environment and which packages to start with, number of options can be confusing for beginners
As you can see, which language is the best choice for you depends on personal preferences and task-specific requirements. From personal experience and general knowledge we can say: compared to R, Python seems to be easier to learn for those who have no previous experience in programming or statistics. If you are a statistics geek anyway, you might be happier with R. To learn more about the two languages and their advantages and disadvantages, these articles by DataCamp and datajournalism.com provide more details.
Learn Coding for Data Journalism:
Did you know that most data journalists are self-taught? Whilst there are academic programs and classes you could pay for, there are also many options to learn R or Python without paying a lot of money. Here is only a small overview:
Codecademy and DataCamp are online learning platforms that both offer introductory courses for R. If you are more of a bookish person and have already learned some R basics, “R for Data Science” is a great book to expand your knowledge. It offers an introduction to the “Tidyverse”, the collection of packages most often used by data analysts.
For more resources or video tutorials on specific programs, you can also head over to YouTube.
As soon as you have learnt the basics of either language, it is mostly learning by doing. When you work on actual projects you will run into problems all the time, and by looking for solutions to these hiccups you will constantly expand your knowledge.