Cyberleagle: Safety by design or systems for content moderation?

The Online Safety Act Network (OSAN) recently published a 10-point plan to amend the Online Safety Act. The plan includes:

“Insert a definition of safety by design into the Act to make clear to Ofcom and services what Parliament intended”.

From a technical drafting perspective clarification might be welcome. The Act says that it “seeks to secure that regulated services are safe by design”. That wording was added at the last minute to a clause that describes the overall purpose of the legislation but contains no definition of ‘safe’ or ‘safe by design’. I have discussed elsewhere the undoubted difficulties in interpreting what is now Section 1 of the Act.

Of course, before we can craft a definition of safety by design we have to know what it is intended to mean. For myself, I have always regarded much of the theory underlying safety by design as fragile, at least within the context of the Online Safety Act. But putting those doubts on one side, I did think that I had a reasonable idea of what safety by design was intended to be about.

Now I am not so sure.

To recap, this is how I thought safety by design was meant to apply to regulation of online platforms:

Safety by design requires safety to be considered at the design stage, not as an afterthought.

Safety by design should then be applied iteratively via periodic risk assessments, incorporating feedback learned during operation of the service.

Safety by design focuses on platform systems and processes, identifying and addressing those that create or exacerbate a risk of harm (however defined).

Safety by design is not, or at least not primarily, about systems for content moderation.

Safety by design favours non-content-specific, systems-focused measures.

Safety by design is not about automated content detection and filtering.

Safety by design proponents have long criticised the Online Safety Act for being too content-focused. Rather than more and better content moderation, platforms should have to design safety into their systems and processes from the outset. This, so the theory goes, would result in less harm (however that might be conceptualised) occurring on platforms and less need for ex-post content moderation.

There have been variations on these themes: for instance, that a systems focus can include friction measures targeted at specific kinds of content, but which stop short of requiring removal; or that measures can focus on harm arising from certain kinds of content without focusing specifically on the content itself.

Nevertheless, as a general proposition I understood safety by design (a.k.a. ‘systems and processes’) to be about addressing from the outset the design of risk-creating system features, combined with a preference for non-content-related or content-agnostic measures over content-specific measures.

If that is right, safety by design has two elements. It articulates a general approach to safety but is also exclusionary: systems for content moderation (or at least automated filtering systems) are not a safety by design measure. The UK government, it should be said, has taken the opposite view. It regards automated content filtering as a safety by design measure. That is the most obvious difference of view that, if I am right in my understanding of safety by design, would have to be resolved in crafting a statutory definition.

To the extent that safety by design proponents embrace systems for content moderation, that has tended to be as a fall-back for where safety by design measures have not squeezed harm out of the system.

Thus Professor Lorna Woods’ October 2024 paper for OSAN, Safety by Design, although allowing for the possibility of ex post measures as a residual measure, differentiated that from a primary focus on design choices:

“At the moment, content moderation seems to be in tension with the design features that are influencing the creation of the content in the first place, making moderation a harder job. So a ‘by design’ approach is a necessary precondition for ensuring that other ex post responses have a chance of success.

While a “by design” approach is important, it is not sufficient on its own; there will be a need to keep reviewing design choices and updating them, as well as perhaps considering ex post measures to deal with residual issues that cannot be designed out, even if the incidence of such issues has been reduced.”

She distinguished safety by design from techno-solutionism:

“Designing for safety (or some other societal value) does not equate to techno-solutionism (or techno-optimism); the reliance on a “magic box” to solve society’s woes or provide a quick fix. Rather, what it acknowledges is that each technology may have weaknesses and disadvantages, as well as benefits. Further, the design may embody the social values and interests of its creators. A product (or some of its features) may be part of the problem. The objective of “safety by design” is – like product safety – to reduce the tendency of a given feature or service to create or exacerbate such issues.” (emphasis added)

One might think that automated content filtering is the paradigm example of regulatory techno-solutionism. Indeed, as to the Online Safety Act itself, Professor Woods noted its emphasis on systems for content moderation:

“What is rather more explicit in the [Online Safety Act] safety duties is the focus on filtering and moderation, which may have a design element (i.e. the tools are made available within the system and designed to work with the system) but seem more ex post in the way they work.”

Elsewhere Professor Woods has included reactive content take-down systems within safety by design, but as the “last port of call”. (Introducing the Systems Approach and the Statutory Duty of Care (chapter in Perspectives on Platform Regulation, Nomos, 2021).)

We can find other examples of safety by design proponents expressing concern about the Online Safety Act’s focus on systems for content moderation.

Carnegie UK was OSAN’s online safety policy predecessor and, through the work of Professor Woods and William Perrin, was the originator of the proposal for a statutory duty of care. Carnegie UK said in its June 2019 submission to the Online Harms White Paper consultation:

“Worryingly, there are references to proactive action in relation to a number of forms of content (and not just the very severe child sexual abuse and exploitation and terrorist content) which in the light of the emphasis in the codes could be taken to mean a requirement for upload filtering and general monitoring to support that.”

Demos’ submission to the draft Online Safety Bill Committee in September 2021 identified as a primary risk:

“A focus on regulation and moderation of content rather than platform systems which affect the risk of harm arising from that content” (emphasis in original)

and said:

“Although the Bill sets out a systems-based approach, there is a focus on reducing harm through content takedown measures, measuring the incidence of harms online and a focus on enforcing terms and conditions. ... we are concerned that in implementation this will turn into a ‘content-based approach’ by proxy, by prioritising the regulation of content moderation systems above other systems and design changes.”

Demos’ April 2022 position paper on the Online Safety Bill argued that:

“The Bill treats a ‘systems’ approach as meaning a ‘systems for dealing with content’ approach…”

The Demos position paper also expressed particular concern about the “strong risk of infringing on either privacy or freedom of expression” in Ofcom’s ability to require use of proactive content moderation technology.

The 5Rights Foundation’s response to Ofcom’s final Illegal Harms Code of Practice in December 2024 said:

“The legislation has a clear objective that services are made “safe by design” but the majority of Ofcom’s proposed measures are not designed to prevent harm occurring in the first place – instead focusing on content moderation and reporting tools. While greater requirements on governance and accountability are welcome, this in itself will not ensure safety by design.”

If content-focused measures, or at least automated filtering, are not a variety of safety by design then a definition of safety by design for insertion in the Act could be expected to exclude measures of that kind; albeit how it could do so when the Act specifically contemplates the imposition of automated content detection and filtering is a conundrum.

But since the government officially regards automated content filtering as a safety by design measure, it would seem highly unlikely that a definition contradicting that could find its way into the Act.

Ofcom’s Online Safety Act implementation

With the closing of Ofcom’s Summer 2025 consultation on additional safety measures, we can assess how far the Code of Practice measures recommended or proposed by Ofcom to date are – and are not – focused on systems for content moderation.

The consultation in fact provides a dual opportunity: to analyse Ofcom’s existing and proposed measures from a safety by design perspective, and to look at how safety by design proponents have reacted to Ofcom’s newest proposals for automated content filtering (in Ofcom terminology, ‘proactive technologies’).

Non-content, reactive content-related and proactive content-related

How far have non-content safety by design principles found expression in Ofcom’s implementation of the Online Safety Act?

Regardless of whether there is overlap between systems measures and content moderation measures, we can still conceive of functionality-oriented measures that do not require the platform to make judgements about content, nor involve directly limiting dissemination of content at all. A friction measure such as a warning ‘Did you mean to post without reading the linked article?’ would be an example of such a non-content measure.

Thus we can break down the measures so far recommended or proposed by Ofcom into non-content and content-related. The latter can be further divided into reactive and proactive.

In total, across the Illegal Content, Protection of Children and draft Additional Measures Codes for U2U services, there are (on my reckoning) 73 non-content measures, 27 reactive content-related measures and 12 proactive content-related measures. For illegal content, most of the proactive measures are contained in the Additional Measures consultation and are based on content detection and filtering technology of various kinds.

However, a closer look at the 73 non-content measures reveals that 50 of them are administrative, procedural or concerned with information provision: appointing an accountable individual, preparing various written documents, training, complaints and appeals procedures, publishing user support materials and so on. Whilst those are aspects of wider systems design, non-content measures addressed to features and functionality are of more immediate interest.

That leaves 23 non-content measures: 11 in the Illegal Content codes, all of which relate to children (in two cases only partially), and 12 in the Protection of Children codes.

Most of the 23 non-content measures concern technical functionality of the platform. The measures are limited (as required by the Act) to UK users and relate to:

Implementing an age-assurance process (ICU B1, PCU B1)
Use of highly effective age assurance (HEAA) (PCU B2 to B7) (Age assurance does of course indirectly affect the content available to users who are not verified as over-18, as the result of content-related measures predicated on age assurance.)
Safety defaults for child users concerning connection lists, account recommendations and direct messaging (ICU F1)
Removal of five kinds of functionality from child-user livestreams (ICU F3)
Options for user account blocking, disabling comments (for child users, or in some circumstances all registered users) (ICU J1, ICU J2)
Enabling children to give negative feedback on content recommender systems (PCU E3)
Providing information to children, when they restrict content or interactions with other accounts, as to the effect of doing so and further options available (PCU F2)
Options for user blocking and muting, disabling comments (users not determined to be adults by use of HEAA) (PCU J1, PCU J2)
Positive consent to group chat invitations (users not determined to be adults by use of HEAA) (PCU J3)

These examples illustrate that non-content measures are feasible, albeit some of those measures are, at least in part, precursors to content-related measures. Most obviously, age assurance underpins not only some of the non-content measures listed above, but also measures about content that should be hidden from under-18s.

Generally, it is striking how many of Ofcom’s non-content functionality measures are concerned with denying functionality to under-18s, or in relation to interactions with them.

As to content-based measures, the Additional Measures consultation marks a decided shift towards automated content detection. Should these be welcomed as a version of safety by design, deprecated as systems for content moderation, or regarded as a means of addressing residual issues that cannot be designed out?

Safety by design or ex-post?

OSAN’s cross-cutting response to Ofcom’s Additional Measures consultation takes issue with Ofcom’s description of some content-related measures, including proactive technology, as being ‘safety by design’:

“While some of the proposed measures - including automated content moderation (para 1.51) and livestreaming (p27) - are framed by Ofcom as being “safer by design”, these are primarily about ex-post mitigations for harmful content (reporting content, or relying on user action after harm has occurred) or introducing a form of safety tech (proactive tech measures) rather than embedding safe design at the level of systems and processes. There is still no understanding of what good service redesign should look like to ensure a more holistic orientation towards safety.” (emphasis added)

However, OSAN’s companion detailed response to the Additional Measures Consultation characterises Ofcom’s proactive technology proposals as safety by design:

“We broadly support the move towards requiring proactive technology as a safety-by-design approach to user safety”.

The detailed response (but not the cross-cutting response) would therefore seem to endorse the government’s view of safety by design.

OSAN also suggested that Ofcom’s principles-based proactive technology proposals could be extended to include intimate image abuse.

Recommender systems

The Demos Digital submission endorsed Ofcom’s proposed content-specific approach to recommender systems:

“The Demos Digital team agrees with Ofcom’s proposal to exclude illegal content from recommender systems until the content has been reviewed by content moderation teams.”

After pointing out that “Automated content identification tools are known to struggle with reliability and bias”, Demos Digital then suggested improvements including:

“Because of these risks of inconsistency, Ofcom should provide specific guidance for platforms’ responsible use of automated content identification tools, including: transparency reporting; quality control standards for automated identification systems, including bias, reliability and accuracy; impact assessments for evaluating the automated systems; and model parameters for identifying illegal content. We believe this would alleviate some of the risks of automated content identification systems – such as inconsistencies, inaccuracies, and bias – which could result in the over-exclusion of legal content, or under-exclusion of illegal content.”

At the level of principle it is difficult to see how this reflects a systems-based approach, other than in the sense of systems for moderating content.

Parenthetically, even if a tendency to bias could be alleviated, there is still the insoluble problem that automated content identification tools do not have access to off-platform contextual information that can affect legality of the user content in question.

In its comments on recommender systems OSAN supports limitations on the reach of “content that is harmful in nature”, if accompanied by freedom of expression safeguards such as explanations of how the systems work in practice, and notification of creators when their content is affected so as to allow them to use complaints and appeals processes.

Live-streaming

For live-streaming, OSAN has suggested some concrete ways in which Ofcom’s proposed Additional Measures could go further: building in a delay to livestreaming and turning off livestreaming by default for under-18s or under-16s. It describes these as safety by design measures:

“15. Ofcom’s proposals focus on responding to harm after it occurs and content moderation rather than preventing it in the first place. There is no requirement for live-feed delays, which are standard practice in traditional broadcasting, to prevent harmful or illegal content from being aired in real time. Safety-by-design means including proactive measures such as time-delay buffers and real-time risk assessment. There is plenty of guidance available to broadcasters on this topic.” (emphasis added)

However, it then describes them as ex-post measures:

“17. More broadly, we would recommend that Ofcom consider a greater array of ex-post features - e.g. borrowing from broadcasting good practice and building more delay into a live stream as a feature.” (emphasis added)

Is time delay an example of safety by design or an ex-post feature? The distinction would not necessarily matter much, were it not for the fact that a statutory definition of safety by design is proposed. But either way, although a time delay is of itself a non-content measure, its purpose is to enable the platform to make judgements about the content being live-streamed and (if thought necessary) to shut down the stream. OSAN describes that as real-time risk assessment. In the context of the Act, those would have to be judgements about illegality or (for child-accessible streams) content harmful to children.

For children, OSAN contemplates a non-content-related measure: turning live-streaming off by default for children, whether under-16 or under-18. It also observes that “A strong understanding of safety-by-design would mean that where livestreaming cannot be delivered safely it shouldn’t be in place.”

Finally, OSAN cites Ofcom’s proposed limitation on livestream screen capture and recording for under-18s (part of ICU F3) as an example of friction.

Safety by design in context

As implementation of the Online Safety Act has progressed, it is perhaps not surprising if it has become less clear how safety by design should translate into concrete measures. The theory of online safety by design, founded on the notion of risk-creating features, was formulated in the context of a range of services and harms that differed greatly from those in scope of the Online Safety Act. The range of services within the Act is far broader and the kinds of harm are much more specific.

In July 2018 Woods and Perrin, working with Carnegie UK, proposed a:

“Virtuous circle of harm reduction on social media. Repeat this cycle in perpetuity or until behaviours have fundamentally changed and harm is designed out.” (Harm Reduction in Social Media, 17 July 2018)

As to kinds of services, the proposal was aimed at around 10 social media companies each with over 1 million users. By January 2019, after discussion with various stakeholders, the authors had decided to extend the proposal to cover ‘social media and other internet platforms’ regardless of size. Now the Act covers an estimated 25,000 UK services (100,000 or more worldwide), 80% of which are micro-businesses (fewer than 10 employees).

On the face of it the underlying premise of the harm reduction cycle seems to be that what a user does on a platform is primarily the result of its design. However, the authors of the proposal say that their argument is not that we are ‘pathetic dots’ in the face of engineered determinism, but that the architecture of the platform nudges us towards certain behaviour (Woods and Perrin, Online harm reduction - a statutory duty of care and a regulator, April 2019).

Even if it can be said that algorithmically driven social media platforms nudge us towards certain behaviour, how would that apply outside that specific milieu, for instance to plain vanilla discussion forums? And if, even on those large social media platforms, design only nudges rather than determines user behaviour, how far can harm really be designed out of the system?

As to kinds of harms, the safety by design theory is premised on platforms being risk creators. We always then have to ask, risk of what? In the context of the Online Safety Act that means connecting a given feature to a created or exacerbated risk of one of the specific kinds of criminality in scope of the Act, or of specific kinds of content harmful to children.

Within the context of the Act, the theory has never been easy to render into concrete expression:

If the idea is that a user’s decision to post, say, an illegal offer to ferry illegal immigrants across the Channel is down to the design of the platform, that seems implausible.

If the idea is that platform design can prevent such content being encountered, but without descending into content moderation and filtering, how is that to be done? Similarly if the concern is to prevent specific kinds of content being repeated or stimulated.

If it means that recommender algorithms could be designed in ways that lessen the likelihood of their disseminating illegal content, it would have to be explained how that can be achieved without trespassing into content filtering.

If the idea is that platform functionality can be designed to make it harder or slower to post, share or comment on user content generally, or to impose volume limits (a ‘circuit-breaker’), that would fit the theory. However, that kind of friction measure would necessarily strike against desirable and undesirable content alike, raising human rights proportionality issues.

If the idea is that some functionalities should be banned, that would fit a version of the theory that holds that some functionalities cannot be designed safely. But the more general purpose the functionality in question, the greater the impact on legitimate content and the greater the human rights challenge.

If the idea is that harm to children can be prevented by platform design which, for instance, reduces opportunities for adults to contact children, that would fit the theory.

If no connection can be found between a given technical or business model feature of a platform and a risk of a user deciding to behave illegally in a particular way, then the regulator will look somewhere other than those design features to counter illegality: to other design features or, failing that, to systems for moderation.

Professor Woods has suggested that designers should ask themselves: ‘What happens when the bad people get hold of this feature?’ (Introducing the Systems Approach and the Statutory Duty of Care, ibid.) However, that question could be asked of any general purpose functionality, risk-creating or not. On the face of it the question is about possible uses, not whether the feature in question creates or exacerbates a risk of a particular illegal or harmful use. It could be asked of the very act of providing a forum to which users can post. If we are not careful, we rapidly fall into the trap of characterising speech as a risk, not a fundamental right.

It is telling that Ofcom adopted that same approach in its statutory Risk Register: rather than attempt to identify functionalities that inherently create or exacerbate risk of illegality or content harmful to children, it sought to identify features that are used by malefactors as well as by law-abiding users: correlation rather than causation. That led it to list as risk factors general purpose functionality such as the ability to create hyperlinks.

If safety by design turns out to be a poor fit with much of the Online Safety Act, it should be acknowledged that the originators of the safety by design theory never wanted illegality to be the touchstone in the first place. Professor Woods said:

“These categories of harm should be identified by reference to their impact on the victim, not by reference to whether the speech might be considered illegal or not.” (Introducing the Systems Approach and the Statutory Duty of Care, ibid.)

That risks a leap from the frying pan (attributing risk of illegal behaviour to a platform feature) into the fire (pursuing nebulous and subjective kinds of harm). That aside, it would be no surprise if the theory turns out not to map easily on to the Act. It is one thing to say that, for instance, chasing ‘Likes’ trains users to produce ‘response-creating content’ (Introducing the Systems Approach and the Statutory Duty of Care, ibid.). It is something else to show that a feature creates a risk of a user committing a specific criminal offence.

It may not be fanciful to think that something has got lost along the way from the 10 or so large social media platforms that the Carnegie UK authors had in mind for their original 2018 proposals, to the broad variety of 100,000 UK and overseas services in scope of the Online Safety Act. If, in essence, the theory was always really about large social media companies, their curation and engagement algorithms and their data-driven business models, it would not be a shock to find that it turns out to have little or no application beyond that.

For platforms where user agency is the predominant factor, and design decisions cannot realistically be regarded as likely to increase or decrease the likelihood of illegality or relevant content harm, logic would suggest that issues that cannot be designed out would most likely be at the forefront, not residual. A fruitless quest for specific illegality- or harm-inducing features could then easily result in a theoretical focus on systems and processes lapsing into systems for content moderation, thence to proactive content filtering technologies.

As to a statutory definition of safety by design, if systems for content moderation, including automated content filtering, are now to some extent embraced as an aspect of safety by design, it is difficult to see how a corresponding statutory definition could place meaningful limits on the kinds of concrete measures contemplated. It would also seem to have moved a very long way from the original conception of safety by design.

If the reality is that we do not have a clear idea of how safety by design is meant to translate into concrete regulatory measures within the context of the Act, that would not be a good starting point for crafting a statutory definition.

The alternative, of course, is that I have always had safety by design wrong and that Parliament knew exactly what it intended in Section 1. If so, mea culpa.

Cyberleagle

Monday, 23 February 2026
