Conference proceedings front matter may contain various advertisements, welcome messages, committee or program information, and other miscellaneous conference information. This may in some cases also include the cover art, table of contents, copyright statements, title-page or half title-pages, blank pages, venue maps or other general information relating to the conference that was part of the...
In October 1991 the National Science Foundation (NSF) sponsored a workshop to examine the role of the Information Retrieval research community in the emerging environment of the Internet, high-performance text processing capabilities and ever-increasing volumes of digitized documents. Ed Fox, Michael Lesk and Michael McGill drafted a White Paper, calling for a National Electronic Science, Engineering,...
In 2016, the Digital Public Library of America is celebrating the third year of its cultural heritage metadata aggregator service. Since its launch, the DPLA collection has grown to represent over 13 million objects and over 1900 institutions, from small historical societies to large research libraries. With onramps, or hubs, in over 20 states, DPLA is well on its way to complete the coverage map by the...
Museum libraries came late to the digitization party — primarily because of perceived copyright issues. Since 2010 the three libraries of the New York Art Resources Consortium (NYARC) have embarked on a series of niche, boutique digitization projects, pushing the boundaries of fair use, but they have also embraced the born-digital, establishing a program to capture art-history-rich websites and to...
Bias in the retrieval of documents can directly influence the information access of a digital library. In the worst case, systematic favoritism for a certain type of document can render other parts of the collection invisible to users. This potential bias can be evaluated by measuring the retrievability for all documents in a collection. Previous evaluations have been performed on TREC collections...
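The retrievability evaluation this abstract refers to can be sketched as follows. The input format and the rank cutoff are illustrative assumptions, not details from the paper: each document's score simply counts the queries for which it surfaces in the top results, so documents with a score of zero are effectively invisible to users of the retrieval system.

```python
def retrievability(result_lists, cutoff=10):
    """Compute a simple rank-cutoff retrievability score r(d) per document.

    `result_lists` maps each query to its ranked list of document ids
    (an assumed input format). r(d) is the number of queries for which
    document d appears within the top-`cutoff` results; documents that
    never appear get no entry, i.e. r(d) = 0.
    """
    scores = {}
    for ranked_docs in result_lists.values():
        for doc in ranked_docs[:cutoff]:
            scores[doc] = scores.get(doc, 0) + 1
    return scores
```

Summarizing the distribution of these scores (e.g. with a Gini coefficient) then quantifies how unevenly the system exposes the collection.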
Wikipedia is the result of a collaborative effort aiming to represent human knowledge and to make it accessible for everyone. As such it contains lots of contemporary as well as history-related information. This research looks into historical data available in Wikipedia to explore its various time-related characteristics. In particular, we study Wikipedia articles on historical persons. Our analysis...
As Wikipedia has become the largest repository of human knowledge, quality measurement of its articles has received a lot of attention during the last decade. Most research efforts have focused on classifying the quality of Wikipedia articles using various feature sets. However, so far, no “golden feature set” has been proposed. In this paper, we present a novel approach for classifying Wikipedia articles by analysing...
While off-the-shelf OCR systems work well on many modern documents, the heterogeneity of early prints provides a significant challenge. To achieve good recognition quality, existing software must be “trained” specifically to each particular corpus. This is a tedious process that involves significant user effort. In this paper we demonstrate a system that generically replaces a common part of the training...
The HathiTrust Research Center (HTRC) is engaged in the development of tools that will give scholars the ability to analyze the HathiTrust digital library's 14-million-volume corpus. A cornerstone of the HTRC's digital infrastructure is the workset — a kind of scholar-built research collection intended for use with the HTRC's analytics platform. Because more than 66% of the digital corpus is subject...
A dataset from the field of High Performance Computing (HPC) was curated with a focus on facilitating its reuse and appealing to a broader audience beyond HPC specialists. At an early stage in the research project, the curators gathered requirements from prospective users of the dataset, focusing on how and for which research projects they would reuse the data. Users' needs informed which curation...
We present the results of an experiment which indicate that automated alignment of electronic learning objects to educational standards may be more feasible than previously implied. We highlight some important deficiencies in existing alignment systems and formulate suggestions for improved future ones. We consider how the changing substance of newer educational standards, a multi-faceted view of...
This paper focuses on the follow-up actions triggered by college students' mobile searches, studied in an uncontrolled experiment in which 30 participants took part over fifteen days. We collected mobile phone usage data with an app called AWARE and combined it with structured diaries and interviews to perform a quantitative and qualitative study. The results showed that there were three categories of follow-up...
The Memento protocol provides a uniform approach to query individual web archives. Soon after its emergence, Memento Aggregator infrastructure was introduced that supports querying across multiple archives simultaneously. An Aggregator generates a response by issuing the respective Memento request against each of the distributed archives it covers. As the number of archives grows, it becomes increasingly...
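The per-archive querying the abstract describes can be sketched in terms of the Memento protocol (RFC 7089): for each archive's TimeGate, the aggregator issues a request for the original resource URI with an `Accept-Datetime` header, then merges the responses. The helper below only builds those requests; the endpoint URLs are hypothetical, and a real aggregator would issue them concurrently.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def build_timegate_requests(timegates, uri_r, when):
    """Build one Memento TimeGate request per archive.

    `timegates` is a list of TimeGate base URLs (hypothetical examples,
    not real endpoints), `uri_r` is the original resource URI, and
    `when` is a timezone-aware datetime. Returns (url, headers) pairs:
    a GET to <timegate>/<URI-R> with an RFC 7089 Accept-Datetime header.
    """
    headers = {"Accept-Datetime": format_datetime(when, usegmt=True)}
    return [(tg.rstrip("/") + "/" + uri_r, headers) for tg in timegates]
```

As the number of archives grows, issuing every request for every lookup becomes costly — which is the scaling problem the paper takes up.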
The Web has been around and maturing for 25 years. The popular websites of today have undergone vast changes during this period, with a few being there almost since the beginning and many new ones becoming popular over the years. This makes it worthwhile to take a look at how these sites have evolved and what they might tell us about the future of the Web. We therefore embarked on a longitudinal study...
Web archives are a valuable resource for researchers of various disciplines. However, to use them as a scholarly source, researchers require a tool that provides efficient access to Web archive data for extraction and derivation of smaller datasets. Besides efficient access we identify five other objectives based on practical researcher needs such as ease of use, extensibility and reusability. Towards...
Most existing digital libraries use traditional lexically-based retrieval techniques. For established systems, completely replacing, or even making significant changes to the document retrieval mechanism (document analysis, indexing strategy, query processing and query interface) would require major technological effort, and would most likely be disruptive. In this paper, we describe ways to use the...
Web archiving initiatives around the world capture ephemeral web content to preserve our collective digital memory. In this paper, we describe initial experiences in providing an exploratory search interface to web archives for humanities scholars and social scientists. We describe our initial implementation and discuss our findings in terms of desiderata for such a system. It is clear that the standard...
Any preservation effort must begin with an assessment of what content to preserve, and web archiving is no different. There have historically been two answers to the question “what should we archive?” The Internet Archive's broad entire-web crawls have been supplemented by narrower domain- or topic-specific collections gathered by numerous libraries. We can characterize this as content selection and...
Academics have relied heavily on search engines to identify and locate research manuscripts that are related to their research areas. Many of the early information retrieval systems and technologies were developed while catering for librarians to help them sift through books and proceedings, followed by recent online academic search engines such as Google Scholar and Microsoft Academic Search. In...
Data-driven approaches have become increasingly popular as a means for analyzing transaction logs from web search engines and digital libraries, for example using cluster analysis to identify common patterns of search and navigation behavior. However, steps must be taken to ensure that results are reliable and repeatable. Although clustering patterns of user interaction behavior has been previously...
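Before transaction-log sessions can be clustered, they are typically turned into feature vectors; one common choice is a per-session count of each action type. The sketch below shows that step under an assumed log format (a list of `(session_id, action)` pairs) — the actual features and clustering method used in the paper may differ.

```python
from collections import defaultdict

def session_features(log):
    """Turn a transaction log into per-session action-count vectors.

    `log` is a list of (session_id, action) pairs — an illustrative
    format, not a real log schema. Returns the sorted action vocabulary
    and a dict mapping each session id to its count vector, which can
    then be fed to a clustering algorithm (e.g. k-means).
    """
    actions = sorted({action for _, action in log})
    index = {action: i for i, action in enumerate(actions)}
    vectors = defaultdict(lambda: [0] * len(actions))
    for session_id, action in log:
        vectors[session_id][index[action]] += 1
    return actions, dict(vectors)
```

Reliability checks of the kind the abstract calls for would then rerun the clustering over such vectors with varied seeds or samples and compare the resulting partitions.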