Although marketing researchers are increasingly using web data, the idiosyncratic and sometimes insidious challenges in its collection have received limited attention. How can researchers ensure that the datasets generated via web scraping and APIs are valid?
This Journal of Marketing study proposes a novel framework that highlights how addressing validity concerns requires the joint consideration of idiosyncratic technical and legal/ethical questions. The framework covers the broad spectrum of validity concerns arising from the automatic collection of web data for academic use along the three stages of collecting web data: selecting data sources, designing the data collection, and extracting the data.
Interested researchers can access the database developed for this review on the authors’ companion website at https://web-scraping.org/. This website also features additional useful resources and tutorials for collecting web data via web scraping and APIs.
Featured speakers: Johannes Boegershausen (Erasmus University), Hannes Datta (Tilburg University), and Abhishek Borah (INSEAD)
Full Journal of Marketing article: https://doi.org/10.1177/00222429221100750
A Scholarly Insight for this study is also available.