The Chaos of Retractions
Retractions are often in name only, and not realized consistently or predictably online
In the print era, retractions might have been more effective than they are today.
Because print archives were so inaccessible, the ones used most often (something I learned through market research) tended to be personal and carefully curated, rather than pushed into the public sphere the way they are now. In this scenario, a retraction probably landed where intended, and stuck. That is, when an article was retracted, the community would know it, personal archives might be purged or marked, citation patterns would adapt, and the event would shift the perceptions and awareness of those in the field, which is the goal.
In the Digital Age, retracting an article doesn't work the same way, if it works at all, as we'll discuss. In fact, it can even backfire with contrarian humans, and the weaknesses of current approaches make the era of more centralized information processing quite problematic.
A recent paper explored the prevalence of articles on Sci-Hub that have been retracted but carry no retraction notice on the versions hosted by this pirate site.
The authors found that 84.83% of retracted articles available via Sci-Hub do not mention their retraction status. More worrisome, the number of retracted articles marked as "retracted" in Sci-Hub has been declining over the years, while both the total number of retracted articles and the number left unlabeled have been increasing.
Sci-Hub is getting worse at marking retracted articles.
But this is really just the tip of a long-growing and dirty iceberg.
In 2012, Phil Davis published an analysis identifying numerous sources of persistent copies of, or links to, retracted articles that carried no notice of the retraction decision:
- Institutional repositories
- Mendeley
- PMC
- Commercial web sites
- Advocacy web sites
- Educational web sites
Davis wrote, “. . . decentralized access to scientific articles may come with the cost of promoting incorrect, invalid, or untrustworthy science.”
During the peak of the Covid-19 pandemic (and to this day, from what I can tell), Covid conspiratorialists have weaponized online archives to cite retracted studies in ways that make it difficult or impossible to know a paper was retracted, especially for their audiences, who are unlikely to question the assertions or seek out retraction notices. As the authors of one 2020 study of the phenomenon argued:
. . . archived web resources from the Internet Archive’s Wayback Machine and subsequent screenshots contribute to the COVID-19 “misinfodemic” in platforms.
They also found that bad-faith human actors and bots were selectively archiving pages to provide a basis for downstream misinformation, and that relying on Internet Archive copies and screenshots of archived content allowed retracted content to circulate longer on social media platforms, because these approaches stymie automated moderation.
And the hits keep on coming. A study of retractions in the public health literature published just last year showed that out of 2,841 records of retracted publications:
. . . less than half indicated that the article had been retracted. Less than 5% of publications were identified as retracted through all resources through which they were available. Within single resources, if and how retracted publications were identified varied. Retraction notices were frequently incomplete, with no notices meeting all the criteria.
Even a cursory look at links to retracted articles turns up evidence of further chaos:
- Even the lists of retracted articles on Retraction Watch link to versions that show no sign a retraction has occurred
  - Item #23 on their list of retracted Covid-19 articles links to a version of the listed article on the Internet Archive (IA) where there is no indication a retraction has occurred
    - This makes it clear that the IA holds copies of articles that do not reflect retraction decisions
  - Item #26 has an actual link (hover over it to see), which resolves to the journal's home page; if you search for and find the retraction notice, its reference to the retracted article uses the same link and also resolves to the home page
    - Because the search engine does not surface the retracted article, it is very difficult to find
    - You can find it by URL on the IA, sans retraction notice
- I didn’t go through every link on this list at Retraction Watch, as these examples popped up immediately with just a little sampling of the links.
Perhaps the most vexing aspect of retractions relative to conspiracies is that the act of retracting an article can itself become fodder for conspiracy theorists: "they" don't want you to believe this.
- A way to recognize a conspiracy theory is when everything is always about the conspiracy theory, with no other explanation possible
- Or is that just what “they” want you to think?
But now we get to the modern technological problem of unmarked versions of retracted papers in LLMs and other AI-leaning technologies. With a decentralized approach to the scientific literature, a centralized approach to AI and LLM development, and an incentive for AI companies to avoid licensing and expert labor costs around content ingestion, a perfect storm may be gurgling in the guts of these vaunted AI tools, which may one day barf up retracted science at just the wrong time.
Where do we go from here? It's hard to say, but what I'll call "retraction infrastructure" is woefully lacking, and very print-centric. There is no real consistency in how publishers handle retractions on the technological front. Some publishers put up deflector URLs to steer users away. Some leave the materials in the open but heavily watermarked: perhaps not the HTML, where retraction notices are often too subtle, but PDFs usually get painted. And while the Retraction Watch database is a good step, it seems to hold the seed of an idea for something more robust.
Retracting an article was never a perfect solution, but technologies, decentralized storage systems, pirate sites, and mirrored instances have all made retraction far too chaotic to be a reliable check on science we think needs to be retracted, for whatever reason.