We have now all become armchair data scientists

Honing the ACTUAL AI skills we need post COVID-19

It’s 2021 and the BBC is writing articles that include the sentence “Version 3.9 of NHS Covid-19 takes advantage of a filtering process called Unscented Kalman Smoother, to screen out suspect readings”. We have now all become armchair data scientists, whether we meant to or not.

The World Economic Forum said that “While it was a microscopic invader rather than the rise of the robots that led to the current collapse of the labor market, it has become clear that the fallout of the pandemic will accelerate digitization and automation across a range of industries and sectors. This calls for new investments and mechanisms for upskilling and reskilling, for both deeply human skills as well as digital skills.” The WEF also predicted that demand would increase this year by 16% for data and AI skills, and by 12% for engineering and cloud computing skills.

A new cohort of data scientists

Importantly, the people most affected, and for whom investment in data literacy matters most, aren’t young people (I’m drawing an arbitrary line at university students and below). COVID-19 has widened skills gaps not only between generations but between people only a few years apart. Those over the age of 35, who did not have a digital-by-default education but still have the bulk of their career ahead of them, have emerged as the group most likely to diversify their skills and embrace new learning in data science and AI. Balancing education with a workforce whose jobs will be the most affected by, if not replaced by, digitisation and automation will be key post-COVID.

Colleagues and friends working at universities have said that, despite the turmoil, the number of applications for health data science programmes was way up this year. We’ve also seen an influx of data scientists, engineers and technical architects join One HealthTech from non-health fields, many of whom cite a desire to contribute something of value to the pandemic response through their technical skills (adtech is poorer for it now!).

You’re never too old to code

With the democratisation of access to learning and development opportunities, from Massive Open Online Courses (MOOCs) like Coursera or Lynda, and online technical curricula like DataCamp and Codecademy, to in-person but free-to-access courses like CodeFirst:Girls (including a course in Cambridge), the main barrier is now the privilege of time to invest and the motivation to crack on.

But with the democratisation of access to learning, as well as to the tools to develop “AI” (or really, off-the-shelf models with a line or two of code), we’ve seen a boom in “drag-and-drop” data scientists.

In going from a self-taught drag-and-dropper to working with real clinical data at scale, and being exposed to what it could look like to implement this in practice, I have come to learn that those late nights of teaching myself how to generate a Random Forest were time poorly spent. Anyone who has had any touch point with clinical data, be it seeing a clinician input the data or actually analysing it, will know that clinical data is the poster child for the widely known issue that real-world data analysis is 90% data cleaning and wrangling.

But if COVID has shown us anything about data, it is that seemingly basic tasks can have disastrous ramifications if you use the wrong tools or your process isn’t right. Every team has its own assumptions and methods of choice, resulting in vastly different predictions when it comes to cases, mortality or R. Furthermore, regularly shipping insights from data to those who need them is a path full of potholes.

So yes, AI skills, be they technical or not, remain really important for the future resilience of professions, particularly for those well into their careers. But one thing COVID-19 has taught us about AI is that the first step is not an anthropomorphised robot pointing deliberately into the middle distance; the second step is not a whizz-bang neural network regardless of whether the problem needs one; and the third step is not slides with great graphics and no underlying information on the assumptions or caveats required when interpreting the results.

What has the pandemic taught us?

COVID has made data nerds of many of us who weren’t before. A deeper appreciation of the complexity of real-world data and its analysis will hopefully lead budding AI enthusiasts to focus more on the fundamentals: coding is important, but those getting into data science need to understand why certain models are chosen, how much data quality underpins it all, and just how important clear communication of statistics is.

About Maxine

Maxine is a member of the Health Innovation East Board and a Research Associate working between The Alan Turing Institute and The Health Foundation. Her work applies machine learning to large NHS datasets. Maxine is also the co-founder of One HealthTech, a global, volunteer-led, grassroots community that supports and promotes under-represented groups in health innovation. She has worked across a range of organisations, including the World Health Organisation on artificial intelligence policy, L’Oreal’s scientific team, and technology strategy at Roche Diagnostics. She is part of a number of communities and committees, including the World Economic Forum’s Global Shapers, the Churchill Fellowship and the British Computer Society, and she previously sat on the DeepMind Health Independent Review Board.