Fields of Gold: Scraping Web Data for Marketing Insights

Johannes Boegershausen, Hannes Datta, Abhishek Borah, and Andrew T. Stephen

JM Insights in the Classroom

Teaching Insight:

While marketing researchers increasingly employ web data, the idiosyncratic and sometimes insidious challenges in collecting it have received limited attention. How can researchers ensure that the datasets they generate via web scraping and APIs are valid? A new article in the Journal of Marketing proposes a methodological framework showing that addressing validity concerns requires jointly considering technical and legal/ethical questions. The framework covers the full range of validity concerns that arise when web data are collected automatically for academic use, organized along the three stages of collecting web data: selecting data sources, designing the data collection, and extracting the data.

Related Marketing Courses:
Digital Marketing; Marketing Analytics; Marketing Research; Social Media Marketing

Full Citation:
Boegershausen, Johannes, Hannes Datta, Abhishek Borah, and Andrew T. Stephen (2022), “Fields of Gold: Scraping Web Data for Marketing Insights,” Journal of Marketing, 86 (5), 1–20. doi:10.1177/00222429221100750

Abstract:
Marketing scholars increasingly use web scraping and Application Programming Interfaces (APIs) to collect data from the internet. Yet, despite the widespread use of such web data, the idiosyncratic and sometimes insidious challenges in its collection have received limited attention. How can researchers ensure that the datasets generated via web scraping and APIs are valid? While existing resources emphasize technical details of extracting web data, the authors propose a novel methodological framework focused on enhancing its validity. In particular, the framework highlights how addressing validity concerns requires the joint consideration of idiosyncratic technical and legal/ethical questions along the three stages of collecting web data: selecting data sources, designing the data collection, and extracting the data. The authors further review more than 300 articles using web data published in the top five marketing journals and offer a typology of how web data has advanced marketing thought. The article concludes with directions for future research to identify promising web data sources and embrace novel approaches for using web data to capture and describe evolving marketplace realities.

Special thanks to Demi Oba, Ph.D. candidate at Duke University, for support in working with authors on submissions to this program.

Johannes Boegershausen is Assistant Professor of Marketing, Erasmus University, The Netherlands.

Hannes Datta is Associate Professor of Marketing, Tilburg University, The Netherlands.

Abhishek Borah is Assistant Professor of Marketing, University of Washington.