Presumed Open Data (POD)
|Funding mechanism|Network Innovation Allowance (NIA)|
|Duration|Jan 2020 - Jun 2021|
|Research area|Customer and Stakeholder Focus|
The project has two objectives:
1. Maximise the visibility of data. The data hub will make data discoverable and searchable, and will provide visibility through standardised metadata.
2. Maximise the value of data. The data hub will make data understandable by employing common structures and interfaces.
A large amount of useful data is published about Distribution Network Operators (DNOs) through mandatory reports, innovation trials and consumer tools. However, datasets are often published on standalone webpages with limited descriptions. This makes it very difficult for both incumbents and innovators to discover, search and understand datasets.
Data should be easy to find and accompanied by the information needed to understand its content. Reducing the barriers to accessing data will attract innovators who can create operational efficiencies, develop new business models and define new value propositions. Increasing the speed at which new markets can be developed may also attract more investment into the energy system, because investors will be better able to understand the risks and opportunities up front.
Publishing data can raise issues in the privacy, security, consumer impact and commercial domains, but these can be mitigated using anonymisation, aggregation, redaction or the introduction of noise. If issues cannot be resolved through these techniques, it may be appropriate to limit rights (public data) or limit access (shared data), so that key parties can safely use the data to create value while consumers remain protected.
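To make two of these mitigation techniques concrete, the sketch below aggregates smart-meter-style readings to area level, suppressing (redacting) any area with too few customers, and shows how Laplace noise could be added to a published value. It is a minimal illustration under stated assumptions, not the project's method: the `(customer_id, area, kwh)` record layout, the `MIN_GROUP_SIZE = 5` threshold and the function names are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical threshold: areas with fewer distinct customers than this are
# suppressed, because their total could reveal an individual's consumption.
MIN_GROUP_SIZE = 5


def aggregate_by_area(readings, min_group_size=MIN_GROUP_SIZE):
    """Sum per-customer kWh into per-area totals, redacting small groups.

    readings: iterable of (customer_id, area, kwh) tuples (assumed layout).
    Returns {area: total_kwh}, with None for suppressed areas.
    """
    totals = defaultdict(float)
    customers = defaultdict(set)
    for customer_id, area, kwh in readings:
        totals[area] += kwh
        customers[area].add(customer_id)  # count distinct customers, not rows
    return {
        area: round(total, 3) if len(customers[area]) >= min_group_size else None
        for area, total in totals.items()
    }


def add_laplace_noise(value, scale, rng):
    """Return value plus Laplace(0, scale) noise.

    The difference of two independent exponential draws with mean `scale`
    is Laplace-distributed with that scale.
    """
    return value + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


# Example: six customers on feeder_A, one on feeder_B (hypothetical data).
readings = [(f"c{i}", "feeder_A", 1.5) for i in range(6)] + [("c99", "feeder_B", 4.0)]
print(aggregate_by_area(readings))  # {'feeder_A': 9.0, 'feeder_B': None}
print(add_laplace_noise(9.0, scale=0.5, rng=random.Random(42)))
```

Suppressing small groups avoids publishing a total that effectively describes one or two customers; the noise scale is a tuning choice that trades accuracy for protection.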
The project will use Western Power Distribution (WPD) data as a worked example of how a DNO could implement the recommendations of the Energy Data Task Force Digitalised Energy System Report, in the following stages:
1. Data Discovery and Classification
- Identify data that is currently being collected and held by us.
- Review available metadata standards and apply the most appropriate one to the data catalogue.
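A catalogue entry with standardised metadata is what makes a dataset discoverable and searchable. The sketch below assumes a small record type whose fields are loosely modelled on DCAT/Dublin Core terms (title, description, keywords, publisher, licence) together with a simple keyword search. The field names, default values and example datasets are illustrative assumptions, since the metadata standard actually selected in stage 1 is not stated here.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Catalogue entry; field names loosely follow DCAT/Dublin Core (assumed)."""
    identifier: str
    title: str
    description: str
    keywords: list[str] = field(default_factory=list)
    publisher: str = "WPD"
    licence: str = "unassigned"  # set once openness is assessed in stage 3
    openness: str = "Closed"     # ODI Data Spectrum level, closed until reviewed


def search(catalogue, query):
    """Return records whose title, description or keywords contain every query term."""
    terms = query.lower().split()
    results = []
    for record in catalogue:
        text = " ".join([record.title, record.description, *record.keywords]).lower()
        if all(term in text for term in terms):
            results.append(record)
    return results


# Hypothetical catalogue entries for illustration only.
catalogue = [
    DatasetRecord("lv-feeder-load", "LV feeder load profiles",
                  "Half-hourly aggregated load per low-voltage feeder", ["load", "LV"]),
    DatasetRecord("substation-headroom", "Primary substation headroom",
                  "Estimated spare capacity at primary substations", ["capacity"]),
]
print([r.identifier for r in search(catalogue, "feeder load")])  # ['lv-feeder-load']
```

A real hub would likely use a search index rather than substring matching, but the principle is the same: search quality depends on consistent, well-populated metadata fields.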
2. Use Case Development
An iterative process whereby our data users and the Energy Systems Catapult (ESC) will develop compelling use cases for the data identified in stage 1. These use cases will then be taken to two or three half-day consultation workshops that will be open to third parties (e.g. the ENA Data Group, Councils, Community Groups, Academics, I/C generation/demand site developers, suppliers and energy service companies) and WPD data owners/users. The workshops aim to:
- provide an overview of the project;
- further develop the established use cases so they optimally benefit third parties; and
- establish the extent to which preliminary processing and context are required to make data accessible to parties with fewer resources for such activity.
The use cases will be assessed and prioritised against a range of criteria (Net Zero impact, customer goals, innovation impact, etc.). The resulting priorities will in turn be used to rank the importance of our data and drive the creation of a data value assessment.
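The prioritisation step can be expressed as a weighted scoring model. In this sketch each use case is scored on the criteria named above, a weighted sum gives its priority, and each dataset inherits the summed priority of the use cases that depend on it. The weights, the 1 to 5 scale, the example use cases and the sum-over-use-cases rule are all assumptions for illustration; the project may weight or combine criteria differently.

```python
# Hypothetical weights for the criteria named in the text (they sum to 1).
WEIGHTS = {"net_zero": 0.4, "customer": 0.35, "innovation": 0.25}


def use_case_priority(scores):
    """Weighted sum of criterion scores, each assumed to be on a 1-5 scale."""
    return sum(weight * scores[criterion] for criterion, weight in WEIGHTS.items())


def rank_datasets(use_cases):
    """Rank datasets by the summed priority of the use cases that need them."""
    value = {}
    for case in use_cases.values():
        priority = use_case_priority(case["scores"])
        for dataset in case["datasets"]:
            value[dataset] = value.get(dataset, 0.0) + priority
    return sorted(value.items(), key=lambda item: item[1], reverse=True)


# Hypothetical use cases and scores.
use_cases = {
    "ev_charger_siting": {
        "scores": {"net_zero": 5, "customer": 4, "innovation": 3},
        "datasets": ["substation_headroom", "lv_feeder_load"],
    },
    "community_energy_planning": {
        "scores": {"net_zero": 4, "customer": 5, "innovation": 2},
        "datasets": ["substation_headroom"],
    },
}
for dataset, total in rank_datasets(use_cases):
    print(f"{dataset}: {total:.2f}")
```

Summing rewards datasets that support many use cases; taking the maximum instead would favour datasets behind a single high-impact use case.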
3. Data Openness Assessment & Processing
Review the sensitivity issues related to each dataset (such as consumer privacy, negative consumer impact, security, and commercial sensitivity) and develop a generic methodology that can be used by all LNOs to classify the openness of each dataset in accordance with the Open Data Institute’s Data Spectrum:
- Open: Data is made available for all to use, modify and distribute with no restrictions
- Public: Data is made publicly available but with some restrictions on usage
- Shared: Data is made available to a limited group of participants possibly with some restrictions on usage
- Closed: Data is only available within a single organisation.
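One way to make the openness methodology generic is to record, for each sensitivity domain, the least restrictive Data Spectrum level that is still safe after mitigation, and then classify the dataset at the most restrictive of those levels. The sketch below assumes exactly that rule; the domain names and the fail-closed default for unassessed domains are illustrative choices, not a published methodology.

```python
from enum import IntEnum


class Openness(IntEnum):
    """ODI Data Spectrum levels, ordered from least to most restrictive."""
    OPEN = 0     # anyone can use, modify and distribute
    PUBLIC = 1   # publicly available, with usage restrictions (limit rights)
    SHARED = 2   # limited group of participants (limit access)
    CLOSED = 3   # single organisation only


# Sensitivity domains taken from the review described in the text.
DOMAINS = ("privacy", "consumer_impact", "security", "commercial")


def classify(residual_levels):
    """Return the dataset's openness level.

    residual_levels maps each domain to the least restrictive level that is
    still safe *after* mitigation (anonymisation, aggregation, redaction,
    noise). The dataset takes the most restrictive of these. Domains that
    have not been assessed default to CLOSED, so an incomplete review never
    leads to over-sharing.
    """
    return max(Openness(residual_levels.get(d, Openness.CLOSED)) for d in DOMAINS)


# Hypothetical assessment: aggregation resolved the privacy issue, but the
# data remains commercially sensitive, so usage rights are limited.
assessment = {
    "privacy": Openness.OPEN,
    "consumer_impact": Openness.OPEN,
    "security": Openness.OPEN,
    "commercial": Openness.PUBLIC,
}
print(classify(assessment).name)  # PUBLIC
```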
The datasets required to facilitate the use cases identified in stage 2 will then go through:
- Data quality upscaling to rectify issues with incomplete datasets that could render them unfit for use by third parties.
- Openness upscaling to see whether minor changes (aggregation, redaction, or adding noise) can make datasets less sensitive.
- The preliminary processing identified in stage 2, to promote third-party accessibility.
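Data quality upscaling for time-series data often starts with gaps. As an illustration, the sketch below linearly interpolates short runs of missing readings and flags longer runs (and gaps at either end of the series) rather than inventing values for them. Linear interpolation, the `None` marker for missing values and the `max_gap=2` threshold are assumptions chosen for simplicity; the appropriate correction depends on the dataset.

```python
def fill_short_gaps(series, max_gap=2):
    """Linearly interpolate runs of up to max_gap missing values (None).

    Longer runs, and gaps at either end of the series, are left as None and
    returned as (start, end) index pairs so they can be flagged for review.
    """
    filled = list(series)
    unresolved = []
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # first non-missing index after the gap, or n
        gap = end - start
        if start == 0 or end == n or gap > max_gap:
            unresolved.append((start, end - 1))
            continue
        left, right = filled[start - 1], filled[end]
        step = (right - left) / (gap + 1)
        for k in range(gap):
            filled[start + k] = left + step * (k + 1)
    return filled, unresolved


readings = [1.0, None, 3.0, None, None, None, 7.0, None]
filled, unresolved = fill_short_gaps(readings)
print(filled)      # [1.0, 2.0, 3.0, None, None, None, 7.0, None]
print(unresolved)  # [(3, 5), (7, 7)]
```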
If data required to facilitate use cases is not currently collected by WPD, the method and format in which it would be collected will go through the same review process.
4. Hub Development
Development of recommendations for a public-facing data hub to be hosted online where:
- all data is stored in one central location;
- appropriate means of access (registration/verification of identity) are required for datasets that can be considered Public or Shared;
- data can be easily downloaded upon necessary verification; and
- stakeholders can register to be notified when new datasets are published.
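The access rules and notifications listed above can be sketched as follows. The sketch assumes that Open data needs no login, that Public data requires a verified account, that Shared data additionally requires membership of a named sharing group, and that Closed data is never served. The `User` fields, group names and the in-memory `Notifier` (which returns messages rather than sending email) are hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class Openness(IntEnum):
    """ODI Data Spectrum levels, least to most restrictive."""
    OPEN = 0
    PUBLIC = 1
    SHARED = 2
    CLOSED = 3


@dataclass
class User:
    name: str
    verified: bool = False  # identity verified during registration (hypothetical flag)
    sharing_groups: set[str] = field(default_factory=set)  # hypothetical sharing agreements


def can_download(user: Optional[User], level: Openness, group: Optional[str] = None) -> bool:
    """Decide whether a (possibly anonymous) user may download a dataset."""
    if level == Openness.OPEN:
        return True  # no registration needed
    if user is None or not user.verified:
        return False  # Public and Shared data require a verified account
    if level == Openness.PUBLIC:
        return True
    if level == Openness.SHARED:
        return group is not None and group in user.sharing_groups
    return False  # Closed data is never served by the hub


class Notifier:
    """In-memory stand-in for 'notify me when new datasets are published'."""

    def __init__(self) -> None:
        self.subscribers: list[str] = []

    def subscribe(self, email: str) -> None:
        if email not in self.subscribers:
            self.subscribers.append(email)

    def publish(self, dataset_id: str) -> list[tuple[str, str]]:
        # A real hub would send emails; here we return (recipient, dataset) pairs.
        return [(email, dataset_id) for email in self.subscribers]
```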
Hub development will look to exploit existing techniques for automating data correction, meta-tagging, and the flagging of data issues when new data is uploaded.
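Automated upload checks could start with simple rule-based validation plus automatically derived metadata. This sketch assumes uploads arrive as a list of row dictionaries with an ISO 8601 `timestamp` column; the specific checks (missing columns, proportion of missing values, negative values, duplicate timestamps), the 5% threshold and the derived tags are illustrative assumptions rather than the hub's actual rules.

```python
from datetime import datetime


def validate_and_tag(rows, required_columns, non_negative=(), max_missing_fraction=0.05):
    """Flag common issues in an uploaded table and derive simple metadata tags.

    rows: list of dicts, each assumed to carry an ISO 8601 'timestamp' value.
    Returns (issues, tags).
    """
    issues = []
    columns = set().union(*(row.keys() for row in rows))
    for col in required_columns:
        if col not in columns:
            issues.append(f"missing column: {col}")
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        missing = sum(v is None for v in values)
        if missing / len(rows) > max_missing_fraction:
            issues.append(f"{col}: {missing}/{len(rows)} values missing")
        if col in non_negative and any(v is not None and v < 0 for v in values):
            issues.append(f"{col}: negative values")
    stamps = [datetime.fromisoformat(r["timestamp"]) for r in rows if r.get("timestamp")]
    if len(stamps) != len(set(stamps)):
        issues.append("timestamp: duplicate entries")
    # Metadata that can be derived without human input (meta-tagging).
    tags = {
        "row_count": len(rows),
        "columns": sorted(columns),
        "temporal_coverage": (min(stamps).isoformat(), max(stamps).isoformat()) if stamps else None,
    }
    return issues, tags


rows = [
    {"timestamp": "2020-01-01T00:00", "kwh": 1.2},
    {"timestamp": "2020-01-01T00:30", "kwh": -0.4},
    {"timestamp": "2020-01-01T00:30", "kwh": None},
]
issues, tags = validate_and_tag(rows, ["timestamp", "kwh", "feeder_id"], non_negative=["kwh"])
print(issues)
print(tags)
```

Flagged issues could be shown to the uploader before publication, while the derived tags pre-populate the catalogue entry.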
5. Data Science Challenge
This stage will include publicly launching the proactive data publication roadmap and the data hub implementation phase. In addition, a specific data use case will be selected and launched as a data science challenge to drive engagement with our data and deliver immediate value.