Delivering data management training remotely

May 07, 2020 • 14 min read

The Centre for Digital Scholarship normally delivers training in face-to-face sessions on campus but in April we delivered our first remote training session and addressed some of the new challenges that researchers face whilst working from home.

Varieties of data management training

The University Libraries’ Centre for Digital Scholarship (CDS) delivers training on data management planning in both generic training sessions open to all, and in faculty-specific training sessions to groups of PhD candidates in the first six months of their research project.

In both types of data management training, we introduce the concepts of good data management, discuss and demonstrate the challenges a researcher will face whilst managing their data before, during and after their research, and show how the solutions they decide upon for managing their data can be written into a data management plan.

What makes the faculty-specific training different is that it is designed and delivered in collaboration with representatives of an institute: often one or more senior researchers. It is tailored to the relevant processes and challenges for the discipline and respects the protocols that have been agreed in the institute. The training is usually delivered in a faculty space across two or three sessions during a period of six to eight weeks.

Psychology-specific training

The Institute of Psychology were one of the first institutes at Leiden to request this type of training, and CDS staff have been working with Sander Nieuwenhuis, Professor of Cognitive Neuroscience of Decision Making, to deliver two courses per year since 2016. The data management protocol of the Institute of Psychology requires the following:

Submission of a data management plan early on in the research project;
Mandatory training for PhD candidates;
Archiving of a publication package in DataverseNL for each published article.

The training is designed to promote interaction and exchange of best practices and consists of two sessions. In the first session, we introduce data management planning – what it is, and why it’s needed, and participants do hands-on exercises on the reproducibility of research, anonymizing datasets, and drawing a data life cycle indicating the steps and challenges associated with their own project. A brief presentation of the Leiden Institute of Psychology's data management plan template prepares the participants to write their own plan and discuss it with their supervisor before submitting it to us for review approximately six weeks later. During the second session, each participant briefly presents their research and their plan, and comments on any challenges they encountered. We then give some feedback during the session (followed by a written commentary afterwards) and encourage discussion of any interesting features of the plan.

Design of remote training session

We were preparing to deliver the Psychology training in April 2020 for the eighth time, so we were confident that the content of the course was relevant to the participants’ needs and worked well within the time constraints. Then, from March 16th, staff and students at the university and all schools were sent home, and it was unclear for a couple of weeks what that would mean for researchers and for staff in CDS: what we could do at home, how much time we would have for work, and how we could continue activities that normally involved in-person meetings.

We discussed with Sander what to do with the first slot of the next training programme scheduled for April 3rd, 2020 for fourteen registered candidates. We decided that because the institute’s protocol had made our course mandatory, and because PhD candidates were facing new data management challenges as a result of working from home, it was important to try to deliver some kind of training to them. The question was how?

“Proberen is leren” (“trying is learning”), as we say in Dutch.

The university’s website for remote teaching suggested Kaltura LiveRoom as a virtual teaching environment. A colleague was using it and recommended it, and luckily, the Leiden University Centre for Innovation and ICLON were due to host a live webinar to demonstrate how it works. It looked good for what we needed so that gave us a place to ‘meet’ with the participants.

We next had to think about the agenda. We could stick with the time-slot that had been scheduled but we knew that we could not deliver our usual four-hour course online in the same way as we would in person: it would not be possible for everyone to maintain the same levels of concentration or to interact in the same way. We reviewed the original content and decided to remove some less-essential parts, reduce the number of exercises, add extra breaks to the session, and include some time for people to work alone and then come back with questions.

The new agenda therefore looked as follows, starting at 10:00 and going on until 13:45 (with a long break for lunch):

Introduction	Welcome and explanation of the virtual ‘classroom’. Get-to-know everyone. Research data methods, types and stages and what this means for data management. Refer to COVID-19 data management challenges. What is FAIR data?
Data underlying publications	Exercise
Break around 11:00 – 11:15
Research data management at the faculty of Social and Behavioural Sciences	Introduction to the faculty’s data archiving guidelines and Dataverse.
Open Science	How does good data management contribute to open science? Open Science Community Leiden.
Break around 12:00 – 13:00	Read through the Data Management Template and prepare questions
GDPR and research data	Dealing with personal data, ethical and privacy issues.
Data management planning	How to write your data management plan.
Closing and assignment	Preparation for the next session on May 15th, 2020.

How we delivered the session

Part 1: the introduction

The university had just disseminated a "Code of Conduct" for online class participation which we’d passed on to the participants before the day of the session. We used a few PowerPoint slides to indicate what to do when they arrived in the virtual classroom, how to use the tools to ‘raise’ their hand if they wanted to speak, ask questions via the chat box or send questions to us privately, and how to leave the room and re-join during breaks and at the end of the session.

Because we wanted to limit potential complications from everyone speaking to introduce themselves, we decided to try out PresentersWall, which lets people respond to questions and then presents a summary of responses. This way we learned, for instance, that very few people had been in a Kaltura LiveRoom before, what types of data they collect (survey and interview data, MRI and neural data, and biosamples, among others), and that they all deal with personal data in one way or another.

We asked the following questions to get to know everyone in the group:

How are you feeling in one word?
Have you been in a Kaltura LiveRoom before?
What types of data will you be collecting or creating? (list of options)
What other type of data are you collecting? Open text box.
Will you use data previously collected?
Does your project already have a Data Management Plan?
Will you collect personal data?
Do you have any question that you would like an answer to today?

We then invited those with specific questions to explain their question a little bit more and we let them know when and how we’d address that point. Then we presented a new set of slides to introduce and explain data management in general and the concept of FAIR data. To make this more concrete and timely we decided to start with an example of how COVID-19 data can be properly managed according to the guidelines developed by the Virus Outbreak Data Network (VODAN) in which the CDS is involved.

Part 2: the exercise

For our first exercise we distributed two recently published Psychology papers to participants, and asked them to try to find the data related to the findings presented in each paper and to judge whether the data were suitable for reuse. Normally we get them to work in pairs and present a summary of their findings to the group, and we use more than two sample papers. Next time, when we’re more confident, we might use the breakout-room facility in Kaltura for this specific exercise.

On announcing the re-start of the class after 20 minutes, it took a little encouragement to get the discussion started, but once it had, there were some valuable comments and contributions.

Part 3: local protocols and open science

After a 15-minute break we re-convened. Sander presented the web pages of the faculty, where information about the data management protocol can be found, and gave a live demonstration of DataverseNL using screen-share. Luckily, this worked beautifully, though Sander reported feeling a little anxious wondering if everyone was following him back in the room.

We then presented some slides on behalf of one of the founding members of the Open Science Community Leiden (OSCL), Anna van ‘t Weer, who normally drops into the training session in person.

Part 4: To do during lunch

In preparation for the final part of the training session, we asked all the participants to do a little work during the lunch break. At the start of the session we’d created a shared cloud folder using SURFdrive for the Institute of Psychology's data management plan template and an accompanying document containing tips for completing the plan. We asked everyone to take a look during lunch and prepare any questions they had about writing their plan.

The participants, however, weren’t the only ones to work during the lunch break. We quickly produced some new slides to address some of the questions raised at the beginning of the session. Luckily we were able to re-use and adapt slides from other training sessions.

Part 5: Preparation for writing a data management plan and closing the session

After a long lunch break (which we’re glad we scheduled!) everyone promptly re-joined us at 1pm for the final part of the training session.

Because a large majority of researchers in the Faculty of Social and Behavioural Sciences deal with personal or sensitive data, we first introduced the European GDPR and how to deal with personal data with regards to writing the data management plan. This had also been one of the questions raised by a participant during the introduction.

We then addressed some of the other questions that had arisen: about metadata and what to put in a README file. For some reason, we had difficulty loading into the LiveRoom the new slides we had prepared on these topics during lunch. As a result we had to present them using the shared-screen feature and, as Sander had reported, it is much more difficult presenting in this way than presenting slides whilst you are still in the LiveRoom.

For time's sake, we had to leave out our usual presentation of all the sections of the data management plan template. None of the participants raised any questions at this point so we do have some concerns that there will be many unanswered questions when they start to look at it in more detail. However, because everyone is getting more used to virtual communication and we are making ourselves more easily available for questions, we hope that some of them will get in touch with us before the next session.

We finished the session with some specific advice on the tools and information being made available to Leiden researchers on conducting remote research during this period. We emphasized that the data management plan should include their plans for this period of remote work, and that as the situation evolves, they can produce a revised plan to reflect the changed situation. In this way, the different versions of their data management plan will capture this evolution. We hope everyone left feeling confident about writing their data management plan.

At the end of the session, we thanked our participants for their patience whenever our inexperience showed, and for their understanding in receiving this training in a form that was perhaps less rich than it would normally have been. Most of our participants wrote a note to thank us before leaving the room, and expressed gratitude that we had delivered the course at all. We stayed behind in the LiveRoom in case anyone had a final question they wanted to ask – and one person did.

Plans for the next session of this training course

The second session in this training programme is scheduled for May 15th, 2020. This is when the PhD candidates present their data management plan to the group. It’s an opportunity to ask questions and share best practice examples, as well as receive feedback on the decisions that have been described in the plan.

Normally, we split the larger group into two groups of about eight participants, which promotes better interaction. This time, anticipating the challenges that virtual interaction presents, we will split into three sub-groups which we can then accommodate in three one-hour sessions with a fifteen-minute break between each one. We hope this makes the session less daunting for the participants.

To encourage better discussion, we decided this time to ask participants’ permission to share their data management plan with the other participants in their sub-group in a shared SURFdrive folder a week before the session. This way, everyone can prepare questions or comments in advance, and we hope the session will then be more participative.

We’ve asked all participants to submit three slides before May 15th, which we will then load into Kaltura LiveRoom and have ready on the day. We hope this slightly-adapted session design will go smoothly, and contribute to a successful closure of the programme.

In summary: what we learnt for future virtual training events

This is a perfect opportunity to try out new ways of delivering training – everyone understands that things are new and might not work as planned so be daring and test some things out!
Don’t underestimate how tiring it will be:

- Maintaining concentration as a presenter is more difficult when you can’t see the people listening to you so limit each presentation slot to around 20 minutes.

- Start simple and break the session into small blocks. Not only do you reduce the risk of losing a large part of the programme because something went wrong, but you can switch between presenters frequently.

- Having at least two moderators is essential: whilst one person is talking you need to have at least one person available to respond to, or deal with, written questions and comments and to notice if anyone puts up their hand. We found this worked quite well because we were both familiar with the presentations and knew when there would be a good moment to interrupt.

- Schedule in more regular, and longer, breaks than you would normally.

Prepare to be flexible:

- We faced some difficult decisions during planning on what to leave out so that we could fit the programme to the virtual environment and the time available.

- We had to make some small adaptations to our plans during the session based on a feel for how things were going.

- Prepare as much as you can in advance, and rely on screen-sharing during the session if you need to.

- Expect there to be at least some minor technical difficulties that will take time away from your planned agenda. We followed the Kaltura recommendations to encourage participants to close their tabs and other apps, and plug into their power supply to avoid losing processing power or battery power during the session. Luckily, we didn’t experience any serious sound or image problems!

You need to work harder to get good interaction:

- To prevent technical issues, we disabled all the cameras during the presentations, but being unable to see everyone’s faces and look people in the eye makes it less interactive.

- Try to check regularly that participants understand, or offer different opportunities to ask questions or provide follow-up later. There is no chance to speak to everyone individually or informally so it’s difficult to get a sense of whether they are confident in their understanding.

- Using the PresentersWall poll as a starter was an efficient way to collect information but made it all a bit impersonal, especially as the participants did not all know each other.

- Use the technical features available to you in creative ways. We asked everyone to ‘raise their hand’ (virtually of course!) to show us they were back after the break.

Overall, the session went really well and we were really pleased with it. As a result, we feel confident about delivering more of our training sessions virtually in the future. We've already put our experience into practice for our first virtual version of the workshop “How to write a DMP” and we're looking forward to the next workshop for which there are many researchers already registered.

Delivering virtual training is exhausting, but exhilarating!

A note of thanks

We thank Sander and the participants for the lively session we had all together.

The Kaltura LiveRoom platform was ideal for this training session and is really straightforward with lots of online guides and tips. It proved to be very simple to use – a huge relief when you have a pile of other worries! Kaltura LiveRoom was originally only available for student teaching, so we’re very grateful that the Coronavirus crisis team made a swift decision allowing us to use it for training PhD candidates.

Stefan de Jong and Giulio Menna, from the University Libraries Kaltura support team, responded incredibly quickly to get us set up and started and to resolve our teething problems. Thanks a million to you both!

Digital Scholarship@Leiden

Joanne Yeomans

Fieke Schoots
