Wasteback Machine: a method for quantitative measurement of the archived web

Authors

D. Mahoney

DOI:

https://doi.org/10.47989/ir31iConf64185

Keywords:

Wayback Machine, Internet archives, Sustainable web design, Sustainability, Climate change

Abstract

Introduction. Web archives are traditionally viewed as repositories of cultural memory, yet they have also been theorised as computational sources for quantitative, longitudinal analysis of the web. This paper examines their potential for measuring the structure and environmental impact of web pages and, in doing so, demonstrates their broader applicability to web analytics research.

Method. We introduce Wasteback Machine, an open-source, extensible framework that operationalises the analytical potential of web archives. It enables reproducible, scalable measurement of page size and composition through programmatic access, structured resource extraction and mechanisms to mitigate distortions introduced during archiving and replay.
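
As an illustration of what programmatic access and measurement of composition can look like, the following is a minimal sketch and not the Wasteback Machine implementation itself. It builds a query for the Internet Archive's Wayback CDX Server API (cited in the references), parses the JSON response, and sums recorded sizes by MIME type. The function names are hypothetical. The sketch also assumes that the CDX `length` field, which reports the size of the stored record, is an acceptable approximation of resource size; it is not the same as the size transferred to a browser.

```python
from collections import defaultdict
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, start, end):
    """Return a CDX query URL listing successful captures of `url`."""
    params = {
        "url": url,
        "from": start,                  # e.g. "2010" or "20100101"
        "to": end,
        "output": "json",               # JSON array, header row first
        "filter": "statuscode:200",     # skip redirects and errors
        "fl": "timestamp,original,mimetype,length",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(rows):
    """Turn CDX JSON output (header row first) into a list of dicts."""
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def bytes_by_mimetype(captures):
    """Sum the recorded length of each capture, grouped by MIME type."""
    totals = defaultdict(int)
    for capture in captures:
        length = capture.get("length", "")
        if length.isdigit():            # non-numeric values mark a missing length
            totals[capture["mimetype"]] += int(length)
    return dict(totals)
```

The query URL can be fetched with any HTTP client and the decoded JSON passed to `parse_cdx_rows`. The sketch aggregates whichever capture rows it is given; a CDX query on its own lists captures of a URL and does not enumerate the subresources of a page, so measuring full page composition would still require resource extraction from the archived HTML.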

Analysis. The method is demonstrated through a case study of the United Nations Climate Change (UNFCCC) homepage, in which longitudinal analyses capture temporal dynamics in page size and the evolution of page composition. By situating web content within socio-technical and infrastructural contexts, the approach allows consistent comparison over time while accounting for archival limitations.

Results. Findings reveal trends in page growth, complexity and cumulative digital resource use. Despite their fragmentary nature, web archives provide sufficient fidelity to reconstruct historical practices and estimate relative environmental impacts.
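
To make the idea of relative impact concrete, the sketch below indexes page size to the earliest available year and applies a linear bytes-to-emissions model. Both functions are hypothetical illustrations rather than the paper's method, and `grams_per_byte` is a placeholder factor, not a published coefficient. Under the assumption that this factor is constant, the ratio of estimated emissions between two snapshots equals the ratio of their sizes, so relative comparisons do not depend on the factor chosen.

```python
def relative_growth(sizes_by_year):
    """Index each year's page size to the earliest year (baseline = 1.0)."""
    if not sizes_by_year:
        return {}
    baseline = sizes_by_year[min(sizes_by_year)]
    if baseline <= 0:
        raise ValueError("baseline size must be positive")
    return {year: size / baseline for year, size in sorted(sizes_by_year.items())}


def estimated_emissions(size_bytes, grams_per_byte):
    """Linear model: emissions scale with bytes by a constant placeholder factor."""
    return size_bytes * grams_per_byte
```

Absolute estimates from such a model are only as reliable as the chosen factor, which is why the sketch treats it as an explicit parameter.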

Conclusion. Wasteback Machine demonstrates that web archives function as computational infrastructures, enabling rigorous, evidence-based investigation of web evolution and the environmental footprint of digital content.

Published

2026-03-20

How to Cite

Mahoney, D. (2026). Wasteback Machine: a method for quantitative measurement of the archived web. Information Research: an International Electronic Journal, 31(iConf), 448–464. https://doi.org/10.47989/ir31iConf64185

Section

Conference proceedings
