Digital Scholarship@Leiden

Introduction to Python: (programming) lessons learnt

The second Python programming course from CDS has again become fully booked. The course covers the basics of Python in a supported hands-on environment. Ben explains what's covered in this popular course and how to find out about future courses.

In October and November 2019, the Leiden University Libraries’ Centre for Digital Scholarship (CDS) organised a three-part workshop for the first time: Introduction to Programming with Python. Although CDS staff have used Python in workshops before, those workshops were aimed at a specific audience or did not discuss programming in depth. This was the first time the CDS organised a course with programming at its core and, even though it was not promoted heavily, we easily reached the maximum capacity of 20 participants.

We based the materials on those used by Peter Verhaar in the MA programme Book and Digital Media Studies. Peter and I, both digital scholarship librarians and software engineers, taught the enthusiastic group consisting of research and teaching staff.

Why a Python course?

Research, regardless of discipline, is increasingly helped by computers, and programming is the way to make computers do what we want. Some academics even argue that every PhD student should learn to program, because it expands their analysis toolbox beyond the applications they happen to stumble upon or are told to use. Moreover, understanding the basics of software internals should help them understand how common tools work and what their limitations are.

During conversations with researchers, we learned that many of them are interested in learning how to program, but that they don’t know where to start. As supporting the use of digital data is one of the core areas of expertise of the Centre for Digital Scholarship, we hoped that by setting up this course we could:

  • check that there was indeed interest in learning to program;
  • introduce a first group of researchers to working with the Python programming language; and
  • learn what researchers would want to know – what topics, how much depth – about programming.

Python is widely used to teach programming and to create software, both inside and outside academia. When you understand the basics, you can write and run Python programs pretty quickly. And there are many packages available for lots of types of problems, so that you often don’t have to start from scratch.

What was covered in the course?

There were three sessions of three hours each, on three consecutive Friday mornings. We tried to cover the basics of programming, or what many people would consider data science nowadays:

  • Python language basics (variables, data types, flow control, functions);
  • getting data through APIs and Web scraping;
  • working with tabular data with Pandas, and visualisation with Matplotlib.

We did not cover machine learning, a major topic in data science, as it first requires a good understanding of working with data.
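To give an impression of the first topic in the list above, here is a minimal sketch that combines a variable, a few data types, flow control and a function. It is not taken from the course materials: the data, the threshold and the names `word_counts` and `classify_length` are made up for illustration.

```python
# Hypothetical example data: word counts of three imaginary texts.
# A dictionary maps each title (a string) to a count (an integer).
word_counts = {"text_a": 1200, "text_b": 450, "text_c": 3100}


def classify_length(count, threshold=1000):
    """Return 'long' if count is above the threshold, otherwise 'short'."""
    if count > threshold:
        return "long"
    return "short"


# Flow control: loop over the dictionary and label every text.
labels = {}
for title, count in word_counts.items():
    labels[title] = classify_length(count)

print(labels)
```

Each line can be run and changed on its own, for example by trying a different threshold, which fits the hands-on style of teaching.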

Peter Verhaar teaches regular expressions in the Introduction to Programming with Python classroom.

We used Carpentries-style lessons consisting of learning through hands-on experience. We explained code snippets, running them live as well, and provided some exercises for participants to play with. When one person was in front of the room, the other acted as a helper, going around to help with small issues.

Responses from participants

Our participants came from all over the University: PhD students, assistant professors and support staff, from the Humanities, Social Sciences, Governance and Global Affairs, and Science faculties, as well as colleagues from the Centre for Innovation, Library and Administration, and Central Services. Some people mentioned they hoped to collect data through APIs or Web scraping after the course, or to use Python for data analysis outside of spreadsheet applications. Others wanted to understand enough of the language to discuss the possibilities of software applications more comfortably with the developers who do the programming.

It was clear from the outset that the people who signed up for the course ultimately aimed to apply what they had learned in their research or in their teaching. The fact that most participants also teach may have helped: it was a joy to teach this group.

Although we have not yet done a full survey among our participants, they were happy to provide constructive feedback in a short discussion at the end of the last session. Together with our own notes about potential improvements to the course, this feedback has already been taken into account in the design of the next course.

Some participants found that we tried to cover quite a lot of topics in a short time, leaving little time for practising during the sessions. One week between sessions was enough for some people, while others would have liked more time to do exercises by themselves outside the classroom.

We learnt along the way how our teaching materials and methods could be improved. In the first two sessions, we used a combination of PowerPoint slides with theory and examples, a code editor and command line for live coding demonstrations and a PDF file with exercises. All materials were available on the course website for participants to use and follow along, but at times it was quite a hassle to copy example code from the PDF into the code editor, while also following the explanations in the presentation.

Jupyter notebooks were helpful in teaching, as explanations and code can be shared together in a notebook and the Jupyter environment also runs your code. That meant we – teachers and students – no longer needed to switch between slides, code and exercises.

While the course was already quite ambitious from the outset, we also learnt that there is always more to teach. Some people indicated they would like to attend follow-up courses, although we don’t have plans for these yet. Some topics probably should have been discussed in more depth. During the class on data acquisition, for example, it became clear that a number of participants were unfamiliar with data formats such as HTML, XML or JSON. A large portion of today's Web content is structured according to these formats. In hindsight, we should have given a brief explanation of HTML before demonstrating how to Web-scrape data from it.
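As a rough illustration of the formats mentioned above, the sketch below reads a small piece of JSON and extracts the links from a small piece of HTML, using only Python's standard library. The sample data and the `LinkCollector` class are invented for this example, and the course itself may well have used other packages for this.

```python
import json
from html.parser import HTMLParser

# JSON: nested keys and values, here a made-up record about a book.
json_text = '{"title": "Example book", "year": 1999}'
record = json.loads(json_text)  # becomes a Python dictionary

# HTML: content wrapped in tags; links sit in the href attribute of <a> tags.
html_text = (
    '<ul><li><a href="/a">First</a></li>'
    '<li><a href="/b">Second</a></li></ul>'
)


class LinkCollector(HTMLParser):
    """Hypothetical helper that collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


collector = LinkCollector()
collector.feed(html_text)

print(record["title"], record["year"])
print(collector.links)
```

Seeing the tags and attributes in the raw HTML first makes it easier to understand what a Web scraper is actually looking for.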

Another suggestion was to organise separate classes on working with textual data and on tabular (“numbers”) data. This is something we will consider for the future.

How can people find out about the next course?

Since first advertising this Python course, we have received various questions from students and from academic and non-academic staff about when there would be another. Clearly there is plenty more interest in programming.

For academic staff, we’ve planned the next course for March 6, March 20 and April 3, 2020. Only a couple of weeks after it was announced, it was fully booked with a waiting list, so we know we need to plan more courses in future.

If you’re interested in registering for a free place, please watch out for announcements on our @CDSLeiden Twitter feed, or stay tuned for further opportunities via the CDS calendar.