The manager of Arquivo.pt, Daniel Gomes, reveals some of the steps taken to accomplish the mission of "preserving the information published on the Web for scientific and academic purposes".
Arquivo.pt makes it possible to search and access web pages that have been archived since 1996. The infrastructure is managed by the FCCN Unit of the Foundation for Science and Technology (FCT) and focuses on preserving information published on the Web for scientific and academic purposes.
This connection to the world of research is also illustrated by the presence of Arquivo.pt in the Registry of Research Data Repositories and by its use by international researchers as a source of open data. For this reason, Arquivo.pt has been developing several activities to identify online data related to Research & Development (R&D) projects, in order to preserve them systematically.
One of the key ways to achieve this goal is the digital preservation of R&D project websites. These web pages are increasingly used to provide important scientific information that complements the published literature (e.g. datasets, documentation or software).
However, online information about R&D projects has not been exhaustively documented. For example, the website addresses of the projects funded under the 7th Framework Programme (FP7), made available through the European Union Open Data Portal (EU Open Data Portal), are missing for 92% of the projects. In this context, Arquivo.pt has already automatically identified and preserved more than 52 million files (7 TB) originating from 53,993 sites of R&D projects funded by the European Union since FP4 (1994).
This is also a priority as far as Portuguese research projects are concerned: in total, 600,721 files (72 GB), collected from 7,956 sites related to projects funded by the Foundation for Science and Technology, have been preserved.
Other forms of preservation
Since 2020, online information about projects funded by FCT has been documented in their progress and final reports. The goal is for this information to be systematically preserved. Arquivo.pt has also carried out special collections aimed at preserving national scientific information available online that is cited in open access scientific publications (RCAAP) and scientific curricula (Science Vitae).
In addition, the Arquivo.pt Memorial service has preserved websites of events, projects or scientific portals that are no longer updated, such as Degois.pt. The websites of Research and Development units are also periodically collected for preservation. These activities mainly aim to keep the references to online resources in peer-reviewed publications and academic CVs valid.
Training for preservation is another strategic path within the scope of this mission. Arquivo.pt has been providing a training program that prepares trainees to publish open data online (so that it can be preserved); to preserve data from their online research sources (and self-preserve the derived scientific results that are published online); to search, access and reuse historical data from the web; and to automatically process large volumes of historical data preserved on the web (through Application Programming Interfaces, or APIs).
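As an illustration of what automated processing through an API can look like, the sketch below builds a full-text search request and extracts a few fields from a response. It is only a minimal sketch under stated assumptions: the endpoint `https://arquivo.pt/textsearch`, the parameters `q`, `from`, `to` and `maxItems`, and the response fields `response_items`, `tstamp`, `originalURL` and `linkToArchive` are not described in this article and should be checked against the official Arquivo.pt API documentation. The helper names `build_search_url` and `summarize_items` are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the full-text search API; verify against the official docs.
TEXT_SEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def build_search_url(query, year_from=None, year_to=None, max_items=10):
    """Build a search URL, optionally restricted to a range of years.

    Parameter names (q, from, to, maxItems) and the 14-digit timestamp
    format (YYYYMMDDHHMMSS) are assumptions made for this sketch.
    """
    params = {"q": query, "maxItems": max_items}
    if year_from is not None:
        params["from"] = f"{year_from}0101000000"  # start of the first year
    if year_to is not None:
        params["to"] = f"{year_to}1231235959"  # end of the last year
    return f"{TEXT_SEARCH_ENDPOINT}?{urlencode(params)}"


def summarize_items(payload):
    """Return (timestamp, original URL, archived URL) tuples from a parsed
    JSON response. Field names are assumed; missing fields become None."""
    items = payload.get("response_items", [])
    return [
        (item.get("tstamp"), item.get("originalURL"), item.get("linkToArchive"))
        for item in items
    ]
```

To run a real query, the URL returned by `build_search_url` could be fetched with `urllib.request.urlopen` and decoded with `json.load` before being passed to `summarize_items`. Keeping the network call separate makes the two helpers easy to check offline.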
Preservation and innovation
Likewise, Arquivo.pt has contributed to the production of open access datasets and software. All the software that supports the Arquivo.pt service and the research experiments carried out is available through a GitHub account. Arquivo.pt also provides valuable open data for research, such as historical records of collections, temporal search by text and image (unique in the world) and data preserved since 1996 through proactive web harvesting and the integration of historical collections.
Finally, it is important to highlight the role of the Arquivo.pt Prize, an award that, since 2017, has distinguished works that use the open data preserved by Arquivo.pt. Throughout its three editions, this award has already supported about a dozen innovative projects with diverse scopes and objects: applications, platforms, browser extensions, academic work and scientific research are some examples of the different uses of the data preserved by Arquivo.pt. As a condition of the regulation, these works are made available in open access.