Wasteback Machine: a method for quantitative measurement of the archived web
DOI: https://doi.org/10.47989/ir31iConf64185
Keywords: Wayback Machine, Internet archives, Sustainable web design, Sustainability, Climate change
Abstract
Introduction. Web archives are traditionally viewed as repositories of cultural memory, yet they have also been theorised as computational sources for quantitative, longitudinal analysis of the web. This paper examines their potential for mapping the structure and environmental impact of web pages, and demonstrates their broader applicability to web analytics research.
Method. We introduce Wasteback Machine, an open-source, extensible framework that operationalises the analytical potential of web archives. It enables reproducible, scalable measurement of page size and composition through programmatic access, structured resource extraction and mechanisms to mitigate distortions introduced during archiving and replay.
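The abstract does not show what programmatic access looks like in practice. Below is a minimal sketch that assumes the Internet Archive's public Wayback CDX Server API and uses only Python's standard library. The function names `build_cdx_query`, `parse_cdx_json` and `raw_capture_url` are hypothetical and are not Wasteback Machine's own interface. The sketch lists at most one successful HTML capture per month. It also builds replay URLs with the `id_` modifier, which asks the Wayback Machine for the archived bytes without the injected toolbar and rewritten links. This is one plausible way to reduce replay distortion when measuring page size.

```python
import json
from urllib.parse import urlencode

# Public Wayback CDX Server endpoint (Internet Archive).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url: str, start: str, end: str) -> str:
    """Return a CDX query URL for successful HTML captures of `url`.

    `start` and `end` are Wayback timestamps (YYYY, YYYYMM, ... up to 14 digits).
    `collapse=timestamp:6` keeps one capture per distinct YYYYMM prefix,
    i.e. at most one capture per month.
    """
    params = {
        "url": url,
        "from": start,
        "to": end,
        "output": "json",
        # Multiple `filter` parameters are all applied by the CDX server.
        "filter": ["statuscode:200", "mimetype:text/html"],
        "collapse": "timestamp:6",
    }
    return CDX_ENDPOINT + "?" + urlencode(params, doseq=True)


def parse_cdx_json(body: str) -> list[dict]:
    """Turn CDX `output=json` rows into dicts keyed by the header row."""
    rows = json.loads(body)
    if not rows:  # an empty result set is returned as []
        return []
    header, *records = rows
    # Note: the CDX `length` field is the size of the compressed archive
    # record, not the page's transfer size, so it is not used as page weight.
    return [dict(zip(header, record)) for record in records]


def raw_capture_url(timestamp: str, original: str) -> str:
    """Replay URL for the unmodified archived bytes (`id_` modifier)."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"


if __name__ == "__main__":
    # Illustrative response shape only; not real capture data.
    sample = (
        '[["urlkey","timestamp","original","mimetype","statuscode","digest","length"],'
        '["org,example)/","20150101000000","https://example.org/","text/html","200","AAAA","1234"]]'
    )
    for rec in parse_cdx_json(sample):
        print(raw_capture_url(rec["timestamp"], rec["original"]))
```

The sketch keeps network access out of the parsing logic, so the query builder and parser can be tested offline. Fetching each `id_` URL and its embedded resources would be a separate step.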
Analysis. The method is demonstrated through a case study of the United Nations Climate Change (UNFCCC) homepage, using longitudinal analyses to capture how the page's size and composition evolve over time. By situating web content within socio-technical and infrastructural contexts, the approach allows consistent comparison over time while accounting for archival limitations.
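The following sketch makes the idea of compositional evolution concrete. It groups the resources recorded for each capture into coarse categories by MIME type, then reports total bytes and per-category shares over time. The category rules, the `(mime_type, size_in_bytes)` input shape and the function names are hypothetical illustrations, not the framework's actual data model. In practice, resources that failed to replay would also need to be flagged rather than silently counted.

```python
from collections import defaultdict

# Hypothetical MIME-prefix rules; the first matching prefix wins.
CATEGORY_RULES = [
    ("text/html", "html"),
    ("text/css", "css"),
    ("text/javascript", "javascript"),
    ("application/javascript", "javascript"),
    ("application/x-javascript", "javascript"),
    ("image/", "image"),
    ("font/", "font"),
    ("video/", "media"),
    ("audio/", "media"),
]


def categorise(mime_type: str) -> str:
    """Map a Content-Type value (parameters ignored) to a coarse category."""
    mime = mime_type.split(";")[0].strip().lower()
    for prefix, category in CATEGORY_RULES:
        if mime.startswith(prefix):
            return category
    return "other"


def composition(resources: list[tuple[str, int]]) -> dict[str, int]:
    """Sum byte sizes per category for the resources of one capture."""
    totals: dict[str, int] = defaultdict(int)
    for mime_type, size in resources:
        totals[categorise(mime_type)] += size
    return dict(totals)


def composition_over_time(captures: dict[str, list[tuple[str, int]]]) -> list[dict]:
    """One row per capture, in timestamp order, with total bytes and shares."""
    rows = []
    for timestamp in sorted(captures):  # fixed-width 14-digit timestamps sort chronologically
        by_category = composition(captures[timestamp])
        total = sum(by_category.values())
        shares = {c: b / total for c, b in by_category.items()} if total else {}
        rows.append({"timestamp": timestamp, "total_bytes": total, "shares": shares})
    return rows
```

Sorting the capture keys as strings gives chronological order only because full Wayback timestamps are fixed-width 14-digit strings. Mixed-precision keys would need normalising first.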
Results. Findings reveal trends in page growth, complexity and cumulative digital resource use. Despite their fragmentary nature, web archives provide sufficient fidelity to reconstruct historical practices and estimate relative environmental impacts.
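The abstract reports relative rather than absolute environmental impacts. One common way to turn page weight into an emissions estimate is a linear, data-transfer-based model. In that model, emissions per view equal bytes transferred times an energy intensity (kWh per GB) times a grid carbon intensity (gCO2e per kWh). The sketch below assumes such a model. The coefficient values in its usage block are placeholders, not the figures used in the paper. The view counts for cumulative use are an assumed input that web archives do not record. All names are hypothetical.

```python
from dataclasses import dataclass

BYTES_PER_GB = 1_000_000_000  # decimal gigabyte


@dataclass(frozen=True)
class IntensityAssumptions:
    """Model coefficients; the values are assumptions supplied by the analyst."""
    kwh_per_gb: float      # energy per GB transferred
    g_co2e_per_kwh: float  # carbon intensity of electricity


def emissions_per_view(page_bytes: int, a: IntensityAssumptions) -> float:
    """Grams CO2e for one view of a page weighing `page_bytes` (linear model)."""
    return page_bytes / BYTES_PER_GB * a.kwh_per_gb * a.g_co2e_per_kwh


def cumulative_bytes(weights: list[int], views: list[int]) -> int:
    """Total bytes served across periods: sum of page weight x assumed views."""
    if len(weights) != len(views):
        raise ValueError("need one view count per sampled page weight")
    return sum(w * v for w, v in zip(weights, views))


def relative_to_baseline(weights: list[int]) -> list[float]:
    """Page weight of each capture relative to the first capture."""
    if not weights or weights[0] == 0:
        raise ValueError("baseline page weight must be non-zero")
    return [w / weights[0] for w in weights]


if __name__ == "__main__":
    # Placeholder coefficients for illustration only.
    placeholder = IntensityAssumptions(kwh_per_gb=1.0, g_co2e_per_kwh=100.0)
    print(emissions_per_view(2_000_000_000, placeholder))
    print(relative_to_baseline([500_000, 1_000_000, 1_500_000]))
```

Because the model is linear, the emissions ratio between two captures equals the ratio of their page weights whenever both use the same coefficients. Relative trends can therefore be read from `relative_to_baseline` without committing to particular intensity values. If grid intensity is allowed to vary by year, that cancellation no longer holds.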
Conclusion. Wasteback Machine demonstrates that web archives function as computational infrastructures, enabling rigorous, evidence-based investigation of web evolution and the environmental footprint of digital content.
License
Copyright (c) 2026 David Mahoney

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
