The statistical model is undervalued as a machine whose features can, like any other machine's, be subdued to intentions. This piece explores capital's focus on predictive modelling as a disciplinary tool.
The characterising feeling of our time seems to be that no-one has any true idea of what is happening or about to happen. In his 2016 documentary HyperNormalisation, the film-maker Adam Curtis depicts a world in which the neoliberal political elite, in response to the rising complexity of our world (and their policy failures), retreat from reality into a comforting dream world. The statistical models used to make financial and political decisions at the highest level, replete with hegemonic capitalist economic assumptions, are collapsing around us in spectacular fashion. The 2008 worldwide economic crisis, which very few economists predicted, resulted in the suffering of millions of people under austerity, yet no-one has ever been held accountable for causing it. Shortly afterwards came polling models that predicted, to the relief of the “sensible”, yet no less neoliberal, politicians, that Donald Trump and Brexit were interesting political phenomena but never likely to happen. A similar dystopian view is espoused by James Bridle in his book New Dark Age. He describes how the increasing use and interconnectedness of technology (which often relies on statistical models) has not led to an improvement in our ability to understand the world: “The abundance of information and the plurality of worldviews now accessible to us through the internet are not producing a coherent consensus reality, but one riven by fundamentalist insistence on simplistic narratives, conspiracy theories, and post-factual politics”.
Yet, for all the understandable disdain for the misuse and misapplication of statistical models, and for the rarely questioned assumption that all of our problems can be solved with further computation, there are models whose results tell us things we trust and would do well to listen to. We have, in the worst-case scenario, 12 years left until our climate is irreparably damaged and a cascade of events begins that will make our planet far less habitable. What, then, are the relevant differences between the models of our planet’s climate and the models used by financiers, pollsters, or, increasingly, the state? How do the mathematical properties of these models reflect the ideology behind the goals they are deployed to achieve?
The answer, to some extent, lies in a greater appreciation of statistical models as machinery: what they are used for, as well as the form in which they are used, reveals the beliefs and goals of those who deploy them. A statistical model is the mathematical representation of a set of assumptions about the data you are analysing. I might collect data on the number of hours that a child spends studying and their corresponding result on a test. In general, we would expect children who study more to do better on the test, but there may also be very clever children who do not study at all and still do well. In my statistical model I try to estimate the relationship between studying and the score achieved, e.g. that each additional hour spent studying yields two more marks on average. An algorithm is a set of steps followed by a computer to achieve a task; quite often these steps use the results of statistical models to govern decision-making. For example, the algorithm behind the AlphaGo computer program that defeated the world’s number one ranked Go player uses very complex statistical models to seek out moves and strategies with the highest probabilities of winning.
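As a minimal sketch of what “fitting a statistical model” means in this example, the snippet below fits a straight line to some invented study-hours data; the fitted slope is the model’s estimate of the marks gained per extra hour of study. The numbers are purely illustrative.

```python
import numpy as np

# Invented data: hours studied and the corresponding test score (illustrative only)
hours = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
score = np.array([48, 52, 55, 57, 61, 60, 65, 66, 70])

# Fit a simple linear model: score = intercept + slope * hours
slope, intercept = np.polyfit(hours, score, deg=1)

print(f"Estimated effect: {slope:.1f} extra marks per hour studied")
print(f"Estimated score with no study: {intercept:.1f} marks")
```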
Marx, in his analysis of how machines can be used as a disciplinary tool to replace and de-skill workers, leaves open a reading in which machines are inherently neutral until their deployment against workers by the capitalist class. On the other hand, David Harvey argues that, since advancing capitalist production is a motivating factor in the invention of more efficient machines, it is likely that machines have been purposefully developed in ways that help extract surplus value from workers and maintain the conditions for capitalist accumulation. Early in its history, following its invention in the 1940s, the microwave was rebranded from a high-tech gadget for men into a kitchen aid that would save women time when cooking meals for their husbands. Marxist social reproduction theory understands this as capitalism developing in such a way as to ensure the social relations in the population that allow the capitalist mode of production to continue. In short, the microwave was marketed to women as a way of making them more productive at feeding and caring for their families; capitalism requires this because it needs a fresh supply of healthy workers from which to extract surplus value.
Statistical models seemingly inhabit a similar role: they replace workers through automation, help to extract maximum surplus value by running quickly and continually, and are also thought to have no inherent ideology. When U.S. Congresswoman Alexandria Ocasio-Cortez stated in an interview that algorithms reproduce racial inequalities, conservative commentators were quick to point out that algorithms “are driven by math” and therefore cannot be racist. As with machines made for factory-based production, it is very likely that certain statistical methods and modelling approaches enjoy their popularity because of their ability to directly leverage further surplus value for capitalists. Which statistical properties of models are prioritised as a result, and how this alters our understanding of the world, shows how capitalist logic will continue to reproduce itself as algorithms rise to prominence throughout society. This is especially so when the types of models championed in the private sector cross over into being used on socio-economic problems (the very kind generated by capitalism).
I argue that, in the name of efficiency, capitalists and sympathetic technocrats have come to favour statistical models that aim to predict the outcomes of events over models that aim to explain how and why those events occur. I will do my best to explain, for a non-mathematical audience, how these two methodological approaches differ and how both types of model would ideally be used, with an example from my own scientific field, mathematical modelling of infectious disease. We can then turn to the issues that arise when models built for prediction are used in a solutionist way by local authorities, focusing specifically on local councils using socio-economic data to predict child abuse and neglect. These models are often criticised for reproducing the subconscious biases of the people who create them, but I argue that this criticism puts too much emphasis on locating biases and oppression solely within individuals. Instead we should directly criticise models for not acknowledging how the structure of society itself contributes to class, race and gender-based oppressions. Finally, I try to clarify the demands that the left should be making: a renewed emphasis on models that explain, putting front and centre again an understanding of the world in which the capitalist mode of production is the primary causal agent in the material conditions of people’s lives.
Explanatory and predictive modelling
When people talk about how algorithms “rule our world”, they are usually describing an implementation of statistical models in a computer system. Our interactions with these models are becoming more frequent: models recommend further products as we shop online, help us identify potential romantic partners, and generate news articles from press releases for us to read. Statistical models themselves can broadly be separated into two groups depending on the role they are intended to perform – whether they primarily aim to explain something or primarily aim to predict something. The difference between predictive and explanatory models is nuanced, but both types of model play important roles in the majority of scientific fields, especially those that are themselves building mathematical models of the physical and biological worlds around us.
As the name suggests, explanatory models are used to try to understand causal relationships and associations between events. The most obvious examples of explanatory models involve testing hypotheses about events that we can observe in the physical world. To return to the example of climate change, we can test hypotheses about what is causing sea temperatures to rise by building models that explain sea temperature in terms of collections of other things that we can measure. We can then collect data on what we think is causing the rise, say carbon dioxide emissions, and see how well our assumptions (which are explicit in the mathematics of the model we construct) correspond to the data we have collected. We can use a rigorous framework to test whether our CO2 model explains sea temperature better than random guesswork, or better than a competing model that uses different data to explain sea temperature changes. Much of science involves incrementally improving and shoring up hypotheses by collecting or generating additional data and adjusting the assumptions in statistical models to better correspond to what we observe.
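To make this concrete, here is a minimal sketch of comparing two candidate explanations against the same data. The table of yearly observations, its column names (`sea_temp`, `co2`, `solar_activity`) and the numbers in it are invented for illustration; the point is only that each model encodes a different hypothesis and a standard criterion (here AIC) lets us judge which explains the observations better.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical yearly observations (in practice these would be real measurements)
df = pd.DataFrame({
    "sea_temp":       [14.1, 14.2, 14.2, 14.4, 14.5, 14.6, 14.8, 14.9],
    "co2":            [340, 345, 352, 360, 369, 378, 388, 399],
    "solar_activity": [1365.6, 1365.9, 1366.1, 1365.8, 1366.0, 1365.7, 1366.2, 1365.9],
})

# Hypothesis 1: sea temperature is explained by CO2 concentration
co2_model = smf.ols("sea_temp ~ co2", data=df).fit()

# Hypothesis 2: a competing explanation using solar activity instead
solar_model = smf.ols("sea_temp ~ solar_activity", data=df).fit()

# Lower AIC indicates the model that better balances fit against complexity
print("CO2 model AIC:  ", round(co2_model.aic, 1))
print("Solar model AIC:", round(solar_model.aic, 1))
```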
In contrast, predictive models are concerned only with making the most accurate predictions. They differ from explanatory models in that they are constructed by a process that aims to minimise prediction error and is not concerned with the actual mathematical structure of the model or with what data is used to make the prediction. If I ran an online retailer and wanted an algorithm to generate product suggestions, I could make a model that predicts the best products to show you based on the likelihood of you buying them. To maximise my profit I would construct the model by comparing how well many different bits of data (such as your purchasing history or keywords from your Google searches) perform at predicting the best products to recommend, and I would pick those bits of data because they predict well, not because they help me understand why you are more likely to buy one thing after another. Sometimes having no underlying hypothesis about the causality between what you are predicting and what you are using to predict it can lead predictions astray. In complex “black box” models, where you cannot discern what properties of the data the model is using to make its decisions, it can be very hard to understand why a model is making incorrect predictions. A good example is the algorithm that chooses substitute products for your online grocery order when things are out of stock, such as replacing a particular brand of dog food with cat food by the same brand. The algorithm does not understand what the products are or why dog food cannot be replaced with cat food; it is merely guessing what might be a good replacement based on the products having similar names, being bought together frequently, and so on. However, this does not mean that predictive models are “less correct” than explanatory models; an explanatory model built around a hypothesis about the data can also give very bad results if that hypothesis is incorrect.
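A sketch of this purely predictive logic, using an invented shopping dataset and made-up predictor names: candidate sets of predictors are scored only by cross-validated prediction accuracy, and whichever predicts best is kept, with no regard to why it works.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500

# Invented customer data: each column is a candidate predictor of "will buy product B"
X_all = {
    "bought_product_A": rng.integers(0, 2, n),
    "searched_keyword": rng.integers(0, 2, n),
    "days_since_visit": rng.integers(0, 30, n),
}
# Invented outcome, loosely driven by the first two predictors
y = ((X_all["bought_product_A"] + X_all["searched_keyword"] + rng.normal(0, 1, n)) > 1.5).astype(int)

# Score each candidate set of predictors purely by how well it predicts
candidate_sets = [
    ["bought_product_A"],
    ["days_since_visit"],
    ["bought_product_A", "searched_keyword"],
]
for features in candidate_sets:
    X = np.column_stack([X_all[f] for f in features])
    acc = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
    print(features, f"cross-validated accuracy: {acc:.2f}")
```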
In my own field of malaria epidemiology, we use both explanatory and predictive models to achieve the overall goal of understanding, controlling and hopefully eradicating malaria as a public health concern. We use explanatory models to test hypotheses about things that influence malaria transmission, such as recent high temperatures or heavy rainfall being related to subsequent rises in malaria disease burden. Statistical relationships in the data we collect (recorded hospital cases and weather data from satellites) are then related back to what we know about biological processes in the real world, i.e. that heavy rainfall provides ample breeding grounds for the mosquitoes that transmit malaria. More available breeding sites lead to more mosquitoes being born, more mosquito bites on humans, and eventually more malaria transmission. More recently, predictive models have been used to produce detailed maps of malaria risk that national malaria control programmes use to allocate resources effectively. Predictive models are “fitted” using optimisation procedures that try to find the bits of data and the model structure that give the most accurate predictions. The predictive models behind these maps are fitted using large datasets collected by the US government, called Malaria Indicator Surveys. These surveys are carried out in most African countries every few years and record very detailed socio-economic information, as well as whether people tested positive for malaria. The fitted predictive model is then used to “fill in the gaps” for the areas where there is no data on malaria transmission.
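A heavily simplified sketch of the “fill in the gaps” idea follows, assuming an invented set of surveyed locations with two covariates and a measured prevalence: a flexible predictive model is fitted to the surveyed locations and then asked to predict prevalence at locations where no survey was done. The covariates, numbers, and choice of a random forest are assumptions for illustration only, not the actual models behind published malaria maps.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Covariates for surveyed locations: rainfall (mm), travel time to nearest city (minutes)
surveyed_X = rng.uniform([0, 10], [300, 600], size=(200, 2))
# Invented measured prevalence, loosely increasing with rainfall and remoteness
surveyed_prev = np.clip(
    0.001 * surveyed_X[:, 0] + 0.0004 * surveyed_X[:, 1] + rng.normal(0, 0.05, 200), 0, 1
)

# Fit a flexible predictive model to the surveyed locations
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(surveyed_X, surveyed_prev)

# Predict prevalence at unsurveyed locations to "fill in the gaps" of the map
unsurveyed_X = rng.uniform([0, 10], [300, 600], size=(5, 2))
print(model.predict(unsurveyed_X))
```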
These maps are undoubtedly useful from the perspective of policy-makers in national malaria control programmes in African countries, but the predictive model is less useful for advancing our understanding of the causal mechanisms of malaria risk. One of the socio-economic variables used to predict malaria prevalence in a location is how accessible the nearest cities are. In general, there is no reason relating to the biology of the parasite or the mosquito that explains why there is less malaria in places with shorter travel times to nearby cities. Rather, greater accessibility to nearby cities is associated with “household wealth, educational attainment, and healthcare utilization”, which do have an impact on malaria transmission. The scientific community can explore how greater household wealth usually entails better quality housing, which mosquitoes cannot get inside to bite and infect people; it is experiments, hypothesis forming and explanatory models that achieve this understanding of how household wealth can play a causal role in the risk of malaria. In this case, we would say that the malaria map model is “confounded” by household wealth, which is linked to both accessibility to the nearest city and the level of malaria. Confounders are factors that causally affect both the data we are using to build our model and the phenomenon that we are trying to explain or predict. Richer households have better access to nearby cities, as well as less malaria because of better access to healthcare and preventative methods. As you might imagine, practically all socio-economic variables that can be measured are confounded in some way, making the causal associations between them very hard to identify.
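The sketch below simulates this confounding structure with invented numbers: household wealth causally drives both better access to the nearest city and lower malaria risk, while access itself has no direct effect on malaria at all; access nonetheless correlates strongly with malaria and so looks like a useful predictor.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Confounder: household wealth (arbitrary units)
wealth = rng.normal(0, 1, n)

# Wealth causes better access to the nearest city (higher = easier access) ...
access = 0.8 * wealth + rng.normal(0, 0.6, n)

# ... and wealth causes lower malaria risk; access has NO direct effect here
risk = 1 / (1 + np.exp(2.0 * wealth))
malaria = rng.random(n) < risk

# Yet access is strongly (negatively) correlated with malaria, so a predictive
# model would happily use it, despite it playing no causal role
print("Correlation between access and malaria:", round(np.corrcoef(access, malaria)[0, 1], 2))
```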
Another example from malaria is that household chicken ownership is often associated with lower prevalence of malaria. There are two competing hypotheses here: one is that wealth is again acting as a confounder, because wealthier households are more likely to own chickens and less likely to have malaria. Alternatively, it has been proposed that odours released by chickens are off-putting to mosquitoes, which sometimes stops them from biting people in households that own chickens. Researchers can compare these competing hypotheses by running experiments and using explanatory models. Do people sitting near chickens receive fewer bites when outdoors at night under experimental conditions? Can we identify the chemical compound released by chickens that might cause their mosquito-repellent effect? Answering these types of question is vitally important: if my policy recommendation is to hand out chickens to all households in Sub-Saharan Africa and it turns out that chicken ownership is merely a confounder for household wealth, then all of the poor households that I have just given a chicken to will see no benefit in terms of malaria protection.
This section has fleshed out explanatory and predictive modelling approaches for non-statisticians, the most important distinction being that the two approaches have different priorities when it comes to identifying causal mechanisms and confounders. Companies using predictive models to maximise sales don’t necessarily need to know or care about identifying confounders. Scientists are often more interested in trying to understand the ultimate causes of phenomena in the biological or physical systems they study, so they are more concerned about confounders. Neither type of modelling is “correct”; it depends on your motives. We can now start to explore the context in which local authorities have begun to use predictive models and some of the issues that this raises.
Many local authorities have begun to use predictive models when planning policy. For scientists, such as those making the malaria maps described above, using a predictive model is a conscious decision to use a hypothesis-free model to achieve optimal predictive accuracy for specific practical ends. Local authorities, on the other hand, seem to derive their justification for using predictive models from the private sector, since neoliberal dogma holds that local authorities should be run as if they were companies in order to achieve the best outcomes. As a result, they have begun to deploy predictive models, with no explicit consideration of causality or confounding, to tackle various social problems.
In September 2018, The Guardian newspaper broke the story that at least five local authorities were developing, or had already implemented, predictive models that use local authority data to predict the risk of child abuse in a household. The sorts of data considered for inclusion in these predictive models include “school attendance and exclusion data, housing association repairs and arrears data, and police records on antisocial behaviour and domestic violence”. The model can then be used to trawl through council records and flag up families to refer to the government’s Troubled Families programme. The result of being referred to the Troubled Families programme varies by local authority, but in general it entails more intensive and co-ordinated interaction with the family by several council services. The programme was launched in 2011 to specifically target families with multiple, often interrelated social issues such as “crime, anti-social behaviour, truancy, unemployment, mental health problems and domestic abuse”. The rationale of the Troubled Families programme was that the initial money spent on targeting and helping these families would be less than the amount it would have cost social services to deal with the consequences of their social issues had they not been “turned around”. The predictive model speeds up the process of identifying families that are likely to meet two of the six eligibility criteria for referral to the Troubled Families scheme: a computer searches the council database and produces a list of families likely to be eligible far faster than a person going through cases manually. Xantura, the data analytics company that developed the predictive model used by Thurrock and Hackney councils, advertise the use of their predictive model as a way to “maximise PBR [payment-by-result] payments”. Councils are given £1,000 per year for each family signed up to the scheme and a further £800 for families that the authority reckons have “achieved significant and sustained progress” or have “had an adult move into continuous employment”.
Cost savings are a top priority for nearly all local authorities, given the relentless ideological budget cuts imposed by the Conservative government. The Revenue Support Grant, a major source of funding for local councils, has shrunk from £9.9bn in 2015 to £2.2bn in 2019. In response, local councils have imposed their own extreme reductions in the scale and diversity of local services provided, as well as drastically reducing staff numbers. Previously, identifying families for referral would have been the job of caseworkers (or would have come from suggestions by external voluntary groups). Statistical models now take on the classic role of machinery in replacing workers by greatly increasing the productivity of searching for families eligible for referral (although Xantura say that the final decision is always made by a human). It is hard to know whether humans and the Xantura model would refer the same families, since these results are not publicly available. The desired outcome for the council of making the most accurate referrals is corrupted by the incentive to receive as much funding as possible: if a statistical model is cheaper to purchase than employing caseworkers, then councils will purchase the model. Xantura describe themselves as “a leading provider of data sharing and advanced analytics to the public sector”; they specialise in providing software that councils can use to analyse council-held data and generate a list of households for further investigation. Advanced analytics deals with fitting and optimising complex predictive models. Xantura will have had to train their predictive model using existing data, most likely data that the council holds on previous cases and the corresponding employment, truancy and offending records of family members. Fitting the model to this data would determine the most useful predictors of abuse risk, so that potential referrals could be flagged from new data in the future.
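Purely as an illustration of what “training a predictive model on existing case data” involves, and emphatically not a description of Xantura’s actual system, here is a minimal sketch. The table of historical records, its column names and the recorded outcome are all invented; a classifier is fitted to past cases and then used to score new households.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Invented historical council records; column names are hypothetical
past_cases = pd.DataFrame({
    "school_absences": [2, 30, 5, 45, 1, 25, 8, 40],
    "rent_arrears":    [0, 1, 0, 1, 0, 1, 0, 1],
    "asb_reports":     [0, 3, 1, 5, 0, 2, 0, 4],
    "referred":        [0, 1, 0, 1, 0, 1, 0, 1],   # past referral decision
})

X = past_cases.drop(columns="referred")
y = past_cases["referred"]

# Fit a predictive model to the historical referral decisions
model = GradientBoostingClassifier().fit(X, y)

# Score new households pulled from current council records
new_households = pd.DataFrame({
    "school_absences": [3, 35],
    "rent_arrears":    [0, 1],
    "asb_reports":     [1, 3],
})
print(model.predict_proba(new_households)[:, 1])  # predicted probability of referral
```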
Criticising bias in algorithms
Criticism of the use of predictive models by local authorities in the way outlined above usually emerges from the political left in the form of concerns about human biases making their way into the model development process. These biases then affect the subsequent predictions and the real-world implications of using the model, often leading to further discrimination against oppressed groups. For example, see a recent audit of a predictive model used to select candidates for interview, which predicted future job performance from the application forms that candidates filled in. It turned out that two of the strongest variables chosen during the model fitting process were being called ‘Jared’ and playing lacrosse. What has happened is that rich, confident men who played lacrosse at their public school would often score well on performance reviews (often unrelated to their actual performance, given that a subconscious bias towards rating men more highly occurs in many other scenarios). When the model was fitted, it detected this pattern in the data and reflected it in its predictions. The subconscious human bias present in the data used to build the model led to a reinforcement of that bias, by preferring rich, male applicants.
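A minimal simulation, with invented data, of how a bias in the training labels is reproduced by the fitted model: performance reviews are inflated for one group regardless of actual ability, and a model trained on those reviews duly learns to prefer that group.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 2_000

# True ability is identically distributed across both groups
ability = rng.normal(0, 1, n)
group = rng.integers(0, 2, n)            # 1 = e.g. male lacrosse players, 0 = everyone else

# Biased training labels: reviewers inflate scores for group 1 regardless of ability
review_score = ability + 0.7 * group + rng.normal(0, 0.5, n)

# A model fitted to the biased reviews learns a large positive weight on group membership
X = np.column_stack([ability, group])
model = LinearRegression().fit(X, review_score)
print("Weight on ability:         ", round(model.coef_[0], 2))
print("Weight on group membership:", round(model.coef_[1], 2))
```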
In the hiring algorithm scenario described above, had anyone looked briefly at the Jared and lacrosse variables they would probably have recognised that these were not a good basis on which to make interview decisions. However, accounting for confounders in statistical models is not always so easy. Biases present in predictive models can derive from two different mechanisms. Firstly, subconscious biases present in human decisions about data collection, model construction, and the framing of the problem that the model is being used to solve can introduce biases which then solidify in the mathematical structure of the model, reproducing the bias in the model’s predictions. This is the case with the “Jared” model above, where the data used to fit the model carried subconscious biases because people tend to rate the performance of men more highly than that of women. The finished model, by seeking to copy patterns in its training data as accurately as possible, reproduced this bias in its own predictions. Secondly, the structures of oppression in society can act as confounders, leading the model to make predictions that are biased even if the technical training process of the model remains untarnished by subconscious human biases. This is best explained with an example: in a dystopian future, if someone created an algorithm that predicted the likelihood of young men in the US going to prison, it would undoubtedly use the fact that a young man is black to assign a higher probability of going to prison. The data used to train the model would reflect the fact that 33% of the US prison population is black even though only 12% of the American population is. This data is not warped by the subconscious biases of the people who recorded it. Rather, young black men in America live in material conditions that push them towards crime and are then punished for it disproportionately by a racist justice system. In this case, it is clear that the bias in the predictive model emerges from the causal role of the structures of oppression in society.
We can explore this line of thought in more detail for the model used by local authorities to predict child abuse and neglect. There is evidence to suggest that the model is heavily confounded by class, and the corresponding criticism of using a predictive model in this way should reflect that. The Joseph Rowntree Foundation, reviewing the evidence for an association between poverty and child abuse and neglect, found that there was evidence of bias in the data collection of child protection systems, but that “this is insufficient to explain or undermine the core association between poverty and the prevalence of childhood abuse and neglect”. Instead, explanatory models of child abuse and neglect propose that household poverty plays a causal role both directly, where families do not have enough money to be able to care for their children, and indirectly, where poverty causes parental stress and environmental conditions that put children at risk of abuse and neglect. While it is unethical to perform randomised controlled studies of the relationship between poverty and childhood abuse and neglect (since it would potentially expose children to higher risk), these causal poverty models are supported by experimental studies that observed a lower prevalence of childhood abuse and neglect in families whose incomes were raised. Class is likely a confounder in this case, then, influencing both the prevalence of childhood abuse and neglect and the socio-economic data used to predict it.
If class does play a causal role in the prevalence of child abuse and neglect, then its inclusion in the predictive model is not merely a matter of the “human bias during model development” criticism outlined above. In the Guardian article revealing the use of predictive models by local authorities, Virginia Eubanks, an associate professor researching how automated systems can reinforce inequality, is quoted as saying: “We talk about them as if no human decision has been made, and it’s purely objective, but it’s human all the way through, with assumptions about what a safe family looks like”. The human decisions that Professor Eubanks describes can enter the model in the two ways set out above: they lead to biased data which influences the model, or they lead to unbiased data that looks biased because it accurately reflects structural inequality in society. Taking Professor Eubanks’ criticism further would involve stressing the need to understand how poverty can play a causal role in childhood abuse and neglect. This means that anyone serious about tackling child abuse would need to consider alleviating the poverty that puts children at risk. Instead, what we have now is local authorities using poverty to predict childhood abuse and neglect and touting this as a “technological solution”, combined with a national government that has plunged half a million children into poverty. A more substantive critique of predicting child abuse from social data is that predictive models should not be used in this way without a complementary understanding of the causal mechanisms of the social issue being predicted (of the kind that would be provided by explanatory modelling). By relying only on predictive modelling, the causal role of class oppression is shorn away from the state’s approach to tackling the problems that such a structure brings about.
The political motivation for ignoring causation
Unsurprisingly, favouring an approach to solving social problems that ignores the causal role of class is politically beneficial for the Conservative party and for capitalism more widely. If you conceive of crime as the product of wrong decisions stemming from people’s bad character, rather than of the material conditions in which they live, then it makes sense for your approach to solving crime to be to address each instance of it as efficiently as possible. As a result, we spend time and resources reacting to crime, or predicting where it will occur, rather than trying to understand its causes and eventually trying to build a world in which it no longer happens.
Predictive models and machine learning methods have risen to prominence through their use by analytics companies, who apply them to consumer data from commercial companies to pry out insights into how those companies can make more money. From a commercial point of view, it is not important to understand the processes behind why a customer who has just viewed product A should be shown a link to product B, just that doing so is predicted to maximise your profit. When the state takes this approach to social problems, it directly applies capitalist logic to them and rejects the task of understanding the structure of reality. Marx warned that the capitalist process of production, in which workers purchase all their necessities on the market and perform increasingly specialised jobs, reduced the overall knowledge of workers. A similar process is at work here. People who should be using their knowledge of society to shape it are reduced to finding hypothesis-free correlations between local authority performance metrics. This is the erosion of knowledge about how the conditions in which we live play a causal role in how we live our lives, packaged and sold to us as a technological improvement. The fact that class, race and gender are selected as variables again and again in predictive models of social phenomena demonstrates that socialists are correct to assert that society is structured to reproduce and reinforce these inequalities. We should be pushing for a greater understanding and appreciation of how these power structures cause social problems, because then tackling those problems would entail addressing class inequality and dismantling capitalism. Ignoring causality, predicting, sorting and categorising our lives with ever greater speed just brings forward what Jameson calls “sheer theme and content”, and all that is solid continues to melt into air.
In summary, we should be more precise in our understanding of how social inequalities enter into (and become entrenched in) statistical models. So far, we have mostly relied on a person-based understanding, where subconscious bias enters during data collection and model construction. We should instead favour a structure-based approach, where biases in data and models reflect the power structures inherent in global capitalism. We should also attend to another distinction: between employing predictive and explanatory models, which, as I have tried to show, have different goals. Predictive models are deployed as a technological fix to improve our response to social issues; however, the way they are deployed promotes the idea that we should be using poverty to predict social problems rather than understanding it as a cause of social problems (which is necessary if you envision a world where we eventually overcome those problems). For Labour councils, while the implications of the widespread adoption of predictive modelling may not be immediately clear, they are clearly intolerable for a party that wants to fundamentally redress the balance of power. What we are fighting for on the left, as the use of predictive modelling permeates more and more aspects of our lives, is still an understanding of the world in which class, race, gender and so on shape the outcomes and conditions of our lives. The tools through which capitalism tries to obscure this understanding may have increased in technological complexity, but their deployment to protect class interests remains the same.