Who is asking? Humans and machines experience a different scholarly web

Abstract

Libraries and archives are motivated to capture and archive scholarly resources on the web. However, the dynamic nature of the web, combined with frequent changes on the side of scholarly publishing platforms, forces crawling engineers to continuously update their archiving frameworks. In this paper we report on a comparative study that investigates how scholarly publishers respond to common HTTP requests resembling the typical behavior of both machines, such as web crawlers, and humans. Our findings confirm that the scholarly web responds differently to machine behavior on the one hand and human behavior on the other. This work aims to inform crawling engineers and archivists tasked with capturing the scholarly web of these differences and to help guide them toward appropriate tools.

Details

Creators
Harihar Shankar; Lyudmila Balakireva; Martin Klein
Publication Type
paper
License
CC BY 4.0 International
Download
445685 bytes
