Survey vs Scraped Data: Comparing Time Series Properties of Web and Survey Vacancy Data

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Pablo De Pedraza
  • Stefano Visintin
  • Kea Tijdens
  • Gábor Kismihók

External Research Organisations

  • University of Amsterdam
  • German National Library of Science and Technology (TIB)

Details

Original language: English
Journal: IZA Journal of Labor Economics
Volume: 8
Issue number: 1
Publication status: Published - 13 Sept 2019
Externally published: Yes

Abstract

This paper studies the relationship between a vacancy population obtained from web crawling and vacancies in the economy inferred by a National Statistics Office (NSO) using a traditional method. We compare the time series properties of samples obtained between 2007 and 2014 by Statistics Netherlands and by a web scraping company. We find that the web and NSO vacancy data present similar time series properties, suggesting that both time series are generated by the same underlying phenomenon: the real number of new vacancies in the economy. We conclude that, in our case study, web-sourced data are able to capture aggregate economic activity in the labor market.

Keywords

    data collection, Labor demand, statistical inference, time series, vacancies, web crawling

Cite this

Survey vs Scraped Data: Comparing Time Series Properties of Web and Survey Vacancy Data. / De Pedraza, Pablo; Visintin, Stefano; Tijdens, Kea et al.
In: IZA Journal of Labor Economics, Vol. 8, No. 1, 13.09.2019.

De Pedraza, P, Visintin, S, Tijdens, K & Kismihók, G 2019, 'Survey vs Scraped Data: Comparing Time Series Properties of Web and Survey Vacancy Data', IZA Journal of Labor Economics, vol. 8, no. 1. https://doi.org/10.2478/izajole-2019-0004
De Pedraza, P., Visintin, S., Tijdens, K., & Kismihók, G. (2019). Survey vs Scraped Data: Comparing Time Series Properties of Web and Survey Vacancy Data. IZA Journal of Labor Economics, 8(1). https://doi.org/10.2478/izajole-2019-0004
De Pedraza P, Visintin S, Tijdens K, Kismihók G. Survey vs Scraped Data: Comparing Time Series Properties of Web and Survey Vacancy Data. IZA Journal of Labor Economics. 2019 Sept 13;8(1). doi: 10.2478/izajole-2019-0004
De Pedraza, Pablo ; Visintin, Stefano ; Tijdens, Kea et al. / Survey vs Scraped Data : Comparing Time Series Properties of Web and Survey Vacancy Data. In: IZA Journal of Labor Economics. 2019 ; Vol. 8, No. 1.
@article{9d0b48a088c04861badb496fefb6061c,
title = "Survey vs Scraped Data: Comparing Time Series Properties of Web and Survey Vacancy Data",
abstract = "This paper studies the relationship between a vacancy population obtained from web crawling and vacancies in the economy inferred by a National Statistics Office (NSO) using a traditional method. We compare the time series properties of samples obtained between 2007 and 2014 by Statistics Netherlands and by a web scraping company. We find that the web and NSO vacancy data present similar time series properties, suggesting that both time series are generated by the same underlying phenomenon: the real number of new vacancies in the economy. We conclude that, in our case study, web-sourced data are able to capture aggregate economic activity in the labor market.",
keywords = "data collection, Labor demand, statistical inference, time series, vacancies, web crawling",
author = "{De Pedraza}, Pablo and Stefano Visintin and Kea Tijdens and G{\'a}bor Kismih{\'o}k",
note = "Funding: The authors acknowledge the financial contribution of the SERISS project (H2020 No: 654221).",
year = "2019",
month = sep,
day = "13",
doi = "10.2478/izajole-2019-0004",
language = "English",
volume = "8",
number = "1",
journal = "IZA Journal of Labor Economics",
}

TY  - JOUR
T1  - Survey vs Scraped Data
T2  - Comparing Time Series Properties of Web and Survey Vacancy Data
AU  - De Pedraza, Pablo
AU  - Visintin, Stefano
AU  - Tijdens, Kea
AU  - Kismihók, Gábor
N1  - Funding: The authors acknowledge the financial contribution of the SERISS project (H2020 No: 654221).
PY  - 2019/9/13
Y1  - 2019/9/13
AB  - This paper studies the relationship between a vacancy population obtained from web crawling and vacancies in the economy inferred by a National Statistics Office (NSO) using a traditional method. We compare the time series properties of samples obtained between 2007 and 2014 by Statistics Netherlands and by a web scraping company. We find that the web and NSO vacancy data present similar time series properties, suggesting that both time series are generated by the same underlying phenomenon: the real number of new vacancies in the economy. We conclude that, in our case study, web-sourced data are able to capture aggregate economic activity in the labor market.
KW  - data collection
KW  - Labor demand
KW  - statistical inference
KW  - time series
KW  - vacancies
KW  - web crawling
UR  - http://www.scopus.com/inward/record.url?scp=85072704016&partnerID=8YFLogxK
U2  - 10.2478/izajole-2019-0004
DO  - 10.2478/izajole-2019-0004
M3  - Article
AN  - SCOPUS:85072704016
VL  - 8
JO  - IZA Journal of Labor Economics
JF  - IZA Journal of Labor Economics
IS  - 1
ER  -