Introduction

The following packages makes use of the SPARQL1 language queries in order to query the data available through the statistics.gov.scot portal2. In effect, the app works as a wrapper for SPARQL queries providing a suite of common functionalities relevant to how the Scottish public data is being utilises.

Workflow

The proposed workflow looks at obtaining life expectancy data for the Glasgow City. The workflow focuses on the following steps:

  1. Data Selection: Identification of the relevant data
  2. Geography Selection: Verifying name of geography for which we want to source data
  3. Data Sourcing: Querying statistics.gov.scot SPARQL endpoint
  4. Pre-processing and Analysis: Fixing data types and removing redundant columns

1. Data Selection

The package comes with an embedded data sets containing list of data and geographies available through the statistics.gov.scot endpoint. To see the list of available data sets refer to the data("available_data_sets") objects. The available_data_sets contains subject column that can be used to identify relevant data, as shown below.

data("available_data_sets")
subset.data.frame(x = available_data_sets, 
                  subset = grepl(pattern = "expectancy", x = subject))
Available Life Expectancy Data
dataset_value subject
119 http://statistics.gov.scot/data/healthy-life-expectancy healthy-life-expectancy
120 http://statistics.gov.scot/data/healthy-life-expectancy-deprived healthy-life-expectancy-deprived
155 http://statistics.gov.scot/data/life-expectancy life-expectancy

3. Data Sourcing

Most of the data sourcing work is done through get_geography_data() function. Following from the example below the function can be applied as follows

4. Pre-processing and Analysis

The pre-processing of the results already took place and was achieved by passing pre_process_results = TRUE to get_geography_data() function. The results can be also pre-processed independently by calling pre_process_data() on a returned object. The provided object is lends itself well for analysis.

Conclusion

For the sake of efficiency the objects data("available_data_sets") and data("standard_geography_code_register") are provided in the package. Those objects are not updated too frequently and providing them as statics data sets makes sense from the efficiency perspective. It is however possible to source live the mentioned objects. This is discussed in vignette("support-functions").


  1. Detailed information on the SPARQL is available through W3C pages.

  2. The implementation follows part of the Open Data Strategy implemented by the Scottish Government that can be accessed through the Scottish Government pages.