Protocols, pilots and practical guidelines: 2nd Data Management Network meeting
The Data Management Network had its 2nd convening event on 11th November 2019. The main topic of discussion was institute data management protocols. We also heard updates on progress from various interest groups.
The Leiden University Data Management Network had its second convening event on 11th November 2019. This time members were welcomed by Wouter Kool to the Archaeology Faculty’s Van Steenis building.
At the last convening event in June we talked about the agenda for future meetings and settled on data management ‘protocols’ as the topic for this meeting. Some institutes have already defined protocols to ensure that their researchers follow the Leiden data management regulations; others are thinking about how to do this but have yet to make a start.
We therefore invited a couple of Leiden research institutes to explain how they developed their protocol and to describe its details, and this formed the main item on the agenda.
Image: extract from the Leiden University Research Data Management Regulations
Data management protocols at Leiden
We began with presentations from two institutes that have been developing data management protocols.
First we heard from Jos Damen from the African Studies Centre. The Centre is still in the process of agreeing a protocol, and a revised draft will soon be returned to the institute’s Researchers’ Assembly for agreement. The Researchers’ Assembly consists of around twenty-five senior and junior researchers.
To produce the first draft, they did some background research: they contacted several other Dutch universities and looked at the protocols of different faculties, as well as the national guidelines for data archiving in the Social Sciences. They then drafted a protocol aligned with the Leiden University Data Management Regulations, asked the Centre for Digital Scholarship and the Data Protection Officer for input, and presented the draft to the Researchers’ Assembly for comment.
The protocol has to cover research from a range of funding sources: most research is funded via Leiden University itself, about a third of all grants come from NWO, and a small number of projects are commercially funded.
It has been difficult to strike a balance between prescribing what researchers must do and offering them flexibility to make their own choices. The authors therefore expect that, when they present the revised version, they will need to explain to researchers why certain decisions have been taken.
Fieke Schoots presented slides on behalf of the Psychology Institute about its protocol; these slides are available to network members through the shared SURFdrive folder.
The Psychology protocol is closely aligned with the national guidelines for data archiving in the Social Sciences, which were approved by the Deans of the Social Sciences Faculties in April 2018. The protocol states that all data underlying a publication should be archived in the institute’s Dataverse. In addition, PhD candidates follow a compulsory training course and submit a data management plan to the Research Committee within six months of starting their PhD. For each publication, researchers from the institute are obliged to archive a publication package, including datasets, in DataverseNL within one month. According to the instructions, the publication package must contain seven elements:
- the preprint,
- instructions and materials for carrying out the data collection,
- raw data files,
- computer code for processing the data,
- processed data files,
- a “readme” file explaining the contents of the data set,
- the ethics approval.
The default setting in DataverseNL is for data to be open, but this can be changed for a particular dataset so that its files are available ‘on request’. Only anonymized data may be deposited in DataverseNL, so some data is retained within the university.
Finally, Fieke showed a nice slide by one of the PhD participants in the latest Psychology data management course, in which she likened her research to a tree with many branches. In the illustration, properly managed datasets provide the sound roots from which her research can grow and bloom.
A very useful discussion followed about some of the issues and questions that had been raised, and this resulted in the sharing of some tips.
Tips for discussing protocols with researchers
- Focus on data management in the broad sense rather than on ‘open’ data. Discuss the reasons for good data management, and talk about open scholarship as a process that includes improving documentation. Good documentation allows research to be authenticated, but it also allows data to be re-used at a later date, even by the same researcher or within the same research group.
- When it comes to sharing data, think about which data can be shared with others to help their understanding or to enable a publication to be reproduced. Not all data has to be shared, although funders increasingly expect data generated through their funding to be made available for others to use on completion of the preliminary research. This does not mean that the data needs to be open, but rather that it is FAIR (findable, accessible, possibly only on request, interoperable and reusable) and usually available for at least ten years.
- When discussing sensitive data, remind researchers that when people put themselves at risk to help research, researchers have an obligation to look after the data by following good data management practices and also, where possible, to maximize its value.
- Make a note of the areas that researchers (and data management supporters) are struggling with, for example anonymization, and bring them back to the Data Management Network so that we can develop help and advice together.
- If you need explanations or counter-arguments to address any objections, ask others in the Data Management Network for tips and tricks.
- Raising awareness is the first step before a protocol can be agreed; training and archiving solutions then need to be introduced to implement the protocol.
- Responsibility for compliance with the regulations lies with the Scientific Director; however, no-one has yet looked at monitoring. We discussed the possibility of checking that every publication links to a dataset via a DOI. This would give indicative statistics on compliance and could be linked to incentives.
We hope that as more and more institutes develop protocols we can make these available to all Network members.
Pilots and storage space
Programme Manager Jacko Koster also took the opportunity to bring everyone up to date on the data management implementation programme and on some pilots that are currently running. You can read more about the iRODS/YODA pilot in our blog entry of 31 October.
He also introduced the concept of a new pricing framework for storage space, which is currently under discussion. It relates the cost of space to the level of compliance with a set of minimum criteria: the more compliant the dataset, the lower the cost.
Jacko’s slides will also be shared with the Network members through the shared SURFdrive folder.
Practical guidelines to help data management supporters
Finally, members of the network shared progress on some of the guidelines they are developing to help data management supporters across the university, and discussed their confidence in dealing with questions about managing personal data.
(1) Fieldwork Interest Group
This interest group aims to exchange experiences and tips with regard to fieldwork. Both researchers and research support staff are welcome to join, and Mareike Boom asked anyone with useful links or ideas to send them to her.
The aim is to develop a practical and accessible guide for Leiden University researchers on the data management aspects of data collection abroad. The group has brainstormed the types of questions that arise and will now work on identifying ways to answer them and bringing the answers together into a guide.
If you would like to get involved in any way, contact Mareike Boom, Law Faculty. The group’s next meeting will take place before the end of the year.
(2) Managing MRI data
Researchers who create MRI data need practical data management solutions that do not yet exist. The aim of this group is therefore to create a decision tree for anyone who has MRI data and wants to share it. Around twenty people from different departments came to the initial ‘hackathon’ to brainstorm, and there is a lot of interest. To keep things moving and keep the workload manageable, the group will focus on Leiden University for now. There is, however, interest in this topic across the country, so the results may later be taken to LCRDM for broader discussion. Any questions can be directed to Dorien Huijser, Developmental and Educational Psychology Unit.
(3) Dealing with questions about personal data
Questions about managing personal data sometimes come to data management supporters, and we don’t always have the answers. We are also sometimes unsure where to direct the person: to their own institute’s Privacy Officer, to the central Data Protection Officer, or to ISSC. We hope some of these uncertainties will be clarified when the Leiden Research Support service structure begins to be implemented. In the meantime, Michelle van den Berk from the Centre for Digital Scholarship gave a short pitch asking Network members whether they also had difficulty knowing how to answer such questions.
Image: summary of the Mentimeter survey during the Data Management Network meeting
A ‘Mentimeter’ survey of the Network members present showed that, in their experience, researchers are frequently unsure how to handle personal data, so Network members receive a relatively large number of questions on the subject. Network members do not feel completely comfortable answering such questions, but most did feel that they knew where to direct them.
Following discussion, it was concluded that no new Interest Group was needed to tackle this issue. It would, however, be useful for the Network to hear more about how questions on handling and managing personal data are answered, and about whether tip-sheets, checklists or other materials are available, or could be developed, to help data management supporters with these types of questions. This could therefore be a topic for the Data Protection Officer and the Network to discuss at a future meeting.
A new series of meetings will be scheduled for 2020, with a mix of shorter, regular topic-focused meetings and longer networking meetings, perhaps just twice a year. Network members identified many topics that are still to be explored.