Skills, Role & Career Structure of Data Scientists & Curators: Assessment of Current Practice & Future Needs
This study was commissioned by JISC to specifically address two recommendations from the report by Liz Lyon on data management in the UK (Lyon, 2007). The main aim of the project was to examine and make recommendations on the role and career development of data scientists and the associated supply of specialist data curation skills to the research community.
The nomenclature that currently prevails is inexact and can lead to misunderstanding about the different data-related roles that exist. In section 3.1 we attempt to reconcile the definitions offered by authoritative organisations with the practical experience of people working in the field. We distinguish four roles: data creator, data scientist, data manager and data librarian. We define them in brief as follows:
- Data Creator: Researchers with domain expertise who produce data. These people may have a high level of expertise in handling, manipulating and using data.
- Data Scientist: People who work where the research is carried out (or, in the case of data centre personnel, in close collaboration with the creators of the data) and who may be involved in creative enquiry and analysis, in enabling others to work with digital data, and in developments in database technology.
- Data Manager: Computer scientists, information technologists or information scientists who take responsibility for computing facilities, storage, continuing access and preservation of data.
- Data Librarian: People originating from the library community who are trained in, and specialise in, the curation, preservation and archiving of data.
In practice, these terms are not yet used consistently in the data community, and the demarcation between roles may be blurred. It will take time for a clear terminology to become general currency.
Data science is now a topic of international attention, with developments under way in the USA, Canada, Australia, the UK and elsewhere in Europe. It is notable that the vision in all these places is for data science to be organised and developed on a national basis rather than through piecemeal approaches to the issues.
Researchers in general are becoming much more aware of the issues that data-based research raises. Some already possess considerable skills in handling and managing data (so-called ‘native data scientists’), but even those less experienced in this regard show an interest in learning more. In the absence of a data scientist in their circle, they turn to the institutional IT services or the library for assistance and advice. Some UK universities are now beginning to offer taught master’s courses in data management, which may help to raise the general level of data skills. Data centres have for some time trained data scientists in the expectation that they will eventually move on to other jobs, thereby diffusing data skills into the research community; increasing numbers of researchers with postgraduate training in data-related matters should have the same effect.
Data scientists have usually ended up in their role by accident rather than by design, though this is changing as more data science posts are created. They typically come to the role either as domain experts who have acquired specialist data skills in the course of their careers, or as computer scientists who have acquired domain knowledge over time. Most data scientists currently in post say they have learned their skills on the job, because of the lack of proper training opportunities and the cost (in time and money) of attending suitable events. Although until recently there was no tight specification of qualifications, the trend is now increasingly to require postgraduate training in informatics. In practice, data scientists need a wide range of skills: domain expertise and computing skills are prerequisites, but ‘people skills’ are also valued, since a major part of the role is translating the needs and practices of researchers for the computing experts (whom we have defined as data managers) and, to some extent, vice versa.
There is no defined career structure for data scientists, and this is a major problem that must be resolved if the UK research community is to be properly supplied with data skills. Data scientists may be in tenured jobs in universities and data centres, or they may be employed on short-term research contracts. Those in tenured roles in universities may be on a variety of career grades, from technical through service or academic-related grades to full academic grades. There is no consistency across the system at present. The lack of job security makes it harder to attract and retain data scientists, and demand currently far outstrips the supply of skilled people. A further source of disaffection among data scientists (and would-be data scientists) is a feeling of being undervalued, which stems from the lack of professionalisation of their role and of a formal, organised career structure.
People in data science roles face a continuing challenge in keeping their skills up to date. Data practice is moving very quickly, and they need to stay abreast of both general developments and those specific to their field. In some disciplines international workshops help with this, but even there they are not always sufficient. Data scientists favour continuing professional development in the form of regular short courses on specific topics that are ‘of the moment’, and hope that such a system will become an accepted part of their role.
Views are divided on whether there is value in extending data skills teaching within the undergraduate curriculum. Many people consider this advantageous, and data scientists themselves think that the earlier basic data skills are instilled in future researchers the better. However, many of those who teach undergraduate programmes say that the programmes are already full enough without specific data skills modules. They also point out that in disciplines where data handling skills are most pressing, the undergraduate curriculum already includes relevant elements (such as teaching how to construct and use simple relational databases). It seems likely that further data skills training will become part of undergraduate education over time, in ways appropriate to each discipline.
The role of the library in data-intensive research is important, and a strategic repositioning of the library with respect to research support is now appropriate. We see three main potential roles for the library: increasing data awareness amongst researchers; providing archiving and preservation services for data within the institution through institutional repositories; and developing a new professional strand of practice in the form of data librarianship. In the US, progress is already visible as the library community responds to the demands of the data deluge and organises formal education in data archiving and preservation through library schools. Similar developments are beginning in the UK. However, there are not yet enough specialised data librarians: in the UK there are thought to be just five at present, a situation that needs to change quickly. One reason for the shortage parallels the situation for data scientists: there is no recognised career path. Attracting well-qualified candidates (that is, people already qualified in a specific domain, so that an understanding of that domain’s data structures and uses can be taken as given) is also difficult at present. In the US, which is further along this path than the UK, a lack of suitable internships for trainee data librarians has also been identified as a factor hampering training in the profession; this may yet prove to be an issue here too.
Recommendations regarding data skills development in research domains (RD)
Major research funders in the UK should work with universities and research institutes to define properly and to formalise the role of data scientists, and to develop the means by which the work of data scientists can be recognised and remunerated.
These same bodies should work together to create the conditions that support data science, foster its study and encourage professionalisation of the role.
JISC and other organisations that commission original research should take forward a study (or studies) covering the following issues:
- A description of the role played by data scientists and the value of the contribution they make to research
- Examples of data science careers
- The development of a set of good practices for data science
The relevant bodies (HEFCE and the research councils) should consider the establishment and funding of a network of trainers with the skills to deliver short postgraduate training courses to researchers covering the fundamentals of data management, thus building basic data science skills into the research process. Some of the research councils have laid the foundations for this with their requirements for a data plan in grant applications.
The research councils and other research funders should consider whether, as part of the grant application and award process, they should require at least one member of the project team to be nominated as the project’s data scientist. This person should be required to attend a short course covering the fundamentals of data science and management. Research councils should also consider to what extent accreditation of suitable courses and proof of attendance are necessary.
Recommendations regarding data skills development in research libraries (RL)
The research library community in the UK should work with universities and research institutes to define properly and to formalise the role of data librarians, and to develop a curriculum that ensures a suitable supply of librarians skilled in data handling.
JISC should consider supporting the development of the International Data curation Education Action (IDEA) working group. This group is well-placed to play an important advisory role in the development of appropriate curricula for future data librarians, particularly those coming through the library and information science route.
Recommendations regarding data skills development in general (RG)
Because a number of organisations are already active in the data area, there is potential to exploit synergies in data skills training. It is recommended that a study scope this potential, looking in particular at the activities of the UK Data Archive, universities or research groups where data science is advanced, library schools, the Digital Curation Centre and IDEA (the International Data curation Education Alliance). The study might also look internationally at initiatives in the US, Canada and Australia.