The value of research data resides not only in its content but also in how it is made available to users. Research data is often presented interactively through a web application whose design may represent years of work by researchers. Therefore, preserving the application's functionality becomes as important as preserving the data itself. However, preserving web applications, which are commonly...
Assigning globally unique persistent identifiers (GUPIs) to datasets aims to improve their accessibility and to simplify how they are referenced and reused. However, as repositories receive more, and increasingly complex, data, attesting to the identity of datasets attached to persistent identifiers over time is becoming more challenging. This is due to the nature of scientific research data, which is generated...
A dataset from the field of High Performance Computing (HPC) was curated with a focus on facilitating its reuse and appealing to a broader audience beyond HPC specialists. At an early stage in the research project, the curators gathered requirements from prospective users of the dataset, focusing on how, and for which research projects, they would reuse the data. Users' needs informed which curation...
Data management entails a continuum of tasks to develop sustainable and reusable collections throughout their lifecycle. Large collections with complex data formats and structures may require what we define as "multitasking data management," involving a combination of manual and automated iterative tasks. When conducted in a desktop computing environment by curators, these tasks can be labor-intensive...
We present a case of archival analysis using a combination of data mining methods. The team of researchers, composed of archivists and computer scientists, used a collection of declassified Department of State Cables as a case study. The methods implemented included Support Vector Machines (SVM) and Association Rule Mining. Combined in an analysis workflow, the results of the different methods allowed...
At the forefront of big data in the Humanities, collections management can directly impact collections access and reuse. However, for curators who use traditional data management methods for tasks such as distinguishing redundant records from relevant and related ones, even a small increase in data volume can significantly increase their workload. In this paper, we present preliminary work aimed at assisting curators...
To make decisions about the long-term preservation of and access to large digital collections, archivists gather information such as the collections' contents, their organizational structure, and their file format composition. To date, the process of analyzing a collection, from data gathering through exploratory analysis to final conclusions, has largely been conducted using pen-and-paper methods. To...
Large document collections containing multiple topics can be overwhelming to understand, requiring significant time and effort from librarians and archivists to develop access points. Efficient computational methods can aid this process by uncovering groups of documents that can be described for access. We investigate the use of density-based clustering with document segmentation to identify points of...
High-resolution display environments, consisting of many individual displays arrayed to form a single visible surface, are commonly used to present large-scale data. Using these displays often involves a control paradigm in which interactions become cumbersome and unintuitive. By combining high-resolution displays with multi-touch and gesture-interactive hardware, researchers can explore data more naturally,...