Survey vs Scraped Data: Comparing Time Series Properties of Web and Survey Vacancy Data

Publication: Contribution to journal › Article › Research › Peer-reviewed

Authors

  • Pablo De Pedraza
  • Stefano Visintin
  • Kea Tijdens
  • Gábor Kismihók

External organisations

  • Universiteit van Amsterdam (UvA)
  • Technische Informationsbibliothek (TIB) Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek

Details

Original language: English
Journal: IZA Journal of Labor Economics
Volume: 8
Issue number: 1
Publication status: Published - 13 Sept. 2019
Externally published: Yes

Abstract

This paper studies the relationship between a vacancy population obtained from web crawling and vacancies in the economy inferred by a National Statistics Office (NSO) using a traditional method. We compare the time series properties of samples obtained between 2007 and 2014 by Statistics Netherlands and by a web scraping company. We find that the web and NSO vacancy data present similar time series properties, suggesting that both time series are generated by the same underlying phenomenon: the real number of new vacancies in the economy. We conclude that, in our case study, web-sourced data are able to capture aggregate economic activity in the labor market.

Cite this

Survey vs Scraped Data: Comparing Time Series Properties of Web and Survey Vacancy Data. / De Pedraza, Pablo; Visintin, Stefano; Tijdens, Kea et al.
In: IZA Journal of Labor Economics, Vol. 8, No. 1, 13.09.2019.

De Pedraza, P, Visintin, S, Tijdens, K & Kismihók, G 2019, 'Survey vs Scraped Data: Comparing Time Series Properties of Web and Survey Vacancy Data', IZA Journal of Labor Economics, vol. 8, no. 1. https://doi.org/10.2478/izajole-2019-0004
De Pedraza, P., Visintin, S., Tijdens, K., & Kismihók, G. (2019). Survey vs Scraped Data: Comparing Time Series Properties of Web and Survey Vacancy Data. IZA Journal of Labor Economics, 8(1). https://doi.org/10.2478/izajole-2019-0004
De Pedraza P, Visintin S, Tijdens K, Kismihók G. Survey vs Scraped Data: Comparing Time Series Properties of Web and Survey Vacancy Data. IZA Journal of Labor Economics. 2019 Sep 13;8(1). doi: 10.2478/izajole-2019-0004
De Pedraza, Pablo; Visintin, Stefano; Tijdens, Kea et al. / Survey vs Scraped Data: Comparing Time Series Properties of Web and Survey Vacancy Data. In: IZA Journal of Labor Economics. 2019; Vol. 8, No. 1.
@article{9d0b48a088c04861badb496fefb6061c,
title = "Survey vs Scraped Data: Comparing Time Series Properties of Web and Survey Vacancy Data",
abstract = "This paper studies the relationship between a vacancy population obtained from web crawling and vacancies in the economy inferred by a National Statistics Office (NSO) using a traditional method. We compare the time series properties of samples obtained between 2007 and 2014 by Statistics Netherlands and by a web scraping company. We find that the web and NSO vacancy data present similar time series properties, suggesting that both time series are generated by the same underlying phenomenon: the real number of new vacancies in the economy. We conclude that, in our case study, web-sourced data are able to capture aggregate economic activity in the labor market.",
keywords = "data collection, Labor demand, statistical inference, time series, vacancies, web crawling",
author = "{De Pedraza}, Pablo and Stefano Visintin and Kea Tijdens and G{\'a}bor Kismih{\'o}k",
note = "Funding: The authors acknowledge the financial contribution of the SERISS project (H2020 No: 654221).",
year = "2019",
month = sep,
day = "13",
doi = "10.2478/izajole-2019-0004",
language = "English",
volume = "8",
number = "1",

}

TY - JOUR
T1 - Survey vs Scraped Data
T2 - Comparing Time Series Properties of Web and Survey Vacancy Data
AU - De Pedraza, Pablo
AU - Visintin, Stefano
AU - Tijdens, Kea
AU - Kismihók, Gábor
N1 - Funding The authors acknowledge the financial contribution of the SERISS project (H2020 No: 654221).
PY - 2019/9/13
Y1 - 2019/9/13
AB - This paper studies the relationship between a vacancy population obtained from web crawling and vacancies in the economy inferred by a National Statistics Office (NSO) using a traditional method. We compare the time series properties of samples obtained between 2007 and 2014 by Statistics Netherlands and by a web scraping company. We find that the web and NSO vacancy data present similar time series properties, suggesting that both time series are generated by the same underlying phenomenon: the real number of new vacancies in the economy. We conclude that, in our case study, web-sourced data are able to capture aggregate economic activity in the labor market.
KW - data collection
KW - Labor demand
KW - statistical inference
KW - time series
KW - vacancies
KW - web crawling
UR - http://www.scopus.com/inward/record.url?scp=85072704016&partnerID=8YFLogxK
U2 - 10.2478/izajole-2019-0004
DO - 10.2478/izajole-2019-0004
M3 - Article
AN - SCOPUS:85072704016
VL - 8
JO - IZA Journal of Labor Economics
JF - IZA Journal of Labor Economics
IS - 1
ER -