With the explosion in the volume of data that organizations collect and generate, many have come to recognize that certain types of data matter more than others. Master data is among the most critical.

Reference data, also known as master data in companies or as high-value datasets (HVD) in the public sector, are trusted data shared within an ecosystem.

Centralized in a single repository, they support the day-to-day activity of a company, a community or a country. Their main characteristics are stability over time, reliability, uniqueness, quality and freshness. Easily accessible, they can be combined with other information and are part of the organization's information assets.

In this article, we review the different types of reference data found in the enterprise and in the public sector.

 

Company reference data

Company reference data is the essential information about customers, suppliers, partners, employees, products and so on. Managing it in a central repository provides a single, authoritative view that supports the proper functioning of the company.

In an environment marked by volume, velocity and variety, reference data bring consistency and rationalization to the information exchanged between communicating information systems. They also strengthen partner and supplier relationships, since each party's business software can interact and collaborate with the others.

They facilitate value creation through reuse, including combination with other datasets in real time. This interoperability around reference data facilitates high-volume data exchange between applications via APIs (Application Programming Interfaces) or web services, two common data exchange mechanisms.

 

MDM, the reference data tool

Master Data Management (MDM) is therefore a priority for companies that want information to be distributed and shared consistently.

MDM also refers to the set of governance processes used to build a consolidated repository of reference data with integrity and quality. As the repository is built, these essential data are cleaned, deduplicated, enriched and then regularly updated, which helps the company stay one step ahead of the competition.

Reference data also cut across business processes. Third-party data (customer or supplier), for example, are organized around the national legal identifier, the company name, the address of the head office, the telephone number, the APE code, the legal form, the intra-community VAT number and so on.
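The cleaning and deduplication steps can be illustrated with a minimal Python sketch, assuming third-party records are keyed by their national legal identifier. The `ThirdParty` class, its field names and the merge rule (the first record wins, later records only fill empty fields) are illustrative assumptions, not the behavior of any particular MDM product.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ThirdParty:
    """Hypothetical third-party master record; field names are illustrative."""
    legal_id: str                               # national legal identifier
    name: str                                   # company name
    vat_number: Optional[str] = None            # intra-community VAT number
    head_office_address: Optional[str] = None
    phone: Optional[str] = None


def clean(record: ThirdParty) -> ThirdParty:
    """Normalize formatting so that equivalent values compare equal."""
    return ThirdParty(
        legal_id="".join(ch for ch in record.legal_id if ch.isalnum()).upper(),
        name=" ".join(record.name.split()).upper(),
        vat_number=(record.vat_number.replace(" ", "").upper()
                    if record.vat_number else None),
        head_office_address=(" ".join(record.head_office_address.split())
                             if record.head_office_address else None),
        phone=record.phone.replace(" ", "") if record.phone else None,
    )


def deduplicate(records: list[ThirdParty]) -> list[ThirdParty]:
    """Merge records that share a legal identifier.

    Assumed rule: the first record seen wins, and later duplicates
    only fill in fields that are still empty.
    """
    merged: dict[str, ThirdParty] = {}
    for raw in records:
        rec = clean(raw)
        existing = merged.get(rec.legal_id)
        if existing is None:
            merged[rec.legal_id] = rec
            continue
        for field in ("vat_number", "head_office_address", "phone"):
            if getattr(existing, field) is None:
                setattr(existing, field, getattr(rec, field))
    return list(merged.values())
```

In practice, matching often has to cope with missing or mistyped identifiers, which calls for fuzzier comparison rules than the exact-key match assumed here.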

Thus, by definition, reference data must be consistent, complete, up-to-date, of good quality and correct, in order to achieve a performance objective. For example: deliver the right product to the right customer, in the right quantity, at the right price, at the right place and with the right invoice, within a short time.
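Two of these criteria, completeness and freshness, lend themselves to automated checks. The following Python sketch shows one possible form, assuming each record is a dictionary with a `last_updated` date. The required fields and the one-year freshness threshold are arbitrary assumptions chosen for illustration.

```python
from __future__ import annotations

from datetime import date, timedelta

# Assumed rules: which fields are mandatory and how old a record may be.
REQUIRED_FIELDS = ("legal_id", "name", "address")
MAX_AGE = timedelta(days=365)


def quality_issues(record: dict, today: date) -> list[str]:
    """Return a list of completeness and freshness problems for one record."""
    issues = []
    # Completeness: every required field must be present and non-blank.
    for field in REQUIRED_FIELDS:
        if not str(record.get(field) or "").strip():
            issues.append(f"missing {field}")
    # Freshness: the record must have been updated recently enough.
    last_updated = record.get("last_updated")
    if last_updated is None:
        issues.append("no update date")
    elif today - last_updated > MAX_AGE:
        issues.append("stale")
    return issues
```

A record that returns an empty list passes these two checks only; consistency and correctness need rules specific to each business domain.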

 

Reference data - a competitive issue

Centralized and shared within the organization, reference data are key to companies' competitiveness and agility. Without this trusted data, the company faces significant risks in day-to-day management and decision-making. Companies therefore need up-to-date, relevant, traceable and historized reference data that can be cross-referenced and shared across business applications, both internally and externally. The objective is to predict behaviors or anticipate the design of new offers, with the aim of satisfying all stakeholders, including customers and shareholders.

Initializing a master data repository is the first major step in good data governance.

The question to ask during this identification is: what are the essential, most critical and most relevant data for my company? Reference data are not the same for every company. They depend on the size, the market and the sector in which the company operates.

 

European public reference data or "high-value datasets"

Directive (EU) 2019/1024 on open data and the re-use of public sector information (PSI) was adopted on June 20, 2019. It encourages EU member states to make certain public sector "high-value datasets" available as open data.

"Open" means free to reuse, with minimal legal restrictions, in a machine-readable format, via APIs and, where relevant, bulk download.

The six thematic categories of high-value datasets defined by the European Directive are:

  • Geospatial (e.g., road network, river configuration, elevation and landform representation, etc.)
  • Earth observation and environment (improving environmental management, supporting the understanding and mitigation of climate change effects...)
  • Meteorological (air temperature, partial pressure of water vapor in the air, wind speed, global solar radiation, rainfall ...)
  • Statistics (population, trade and services, agriculture, fishing...)
  • Companies and company ownership
  • Mobility (static: stops, timetables, fares, accessibility for people with disabilities... Dynamic: real-time timetables, information on disruptions...)

Making these reference data available, in combination with other datasets, is vital for generating innovative services, creating opportunities, realizing the data's full potential, and improving transparency and dynamism in the economy.

 

French state reference datasets

Within the framework of the directive, the French government has identified nine fundamental reference datasets and made them available to redistributors to facilitate their reuse. These datasets have a strong economic and social impact, so their availability and quality are critical to their use and redistribution.

Thus, the Sirene database and the National Address Database (BAN) are part of the reference data managed by the public data service. This state service was created by Article 14 of the law for a digital Republic and is managed by the Etalab mission. It aims to:

  • Make data available in order to facilitate their re-use; each producer of public service data must publish its commitments on the conditions of availability in documentation that is subject to frequent updates
  • Simplify administrative procedures for the French by systematizing the sharing of data between administrations, thus avoiding repeated requests for the same document
  • Improve the effectiveness of public policies by strengthening data-driven management
  • Stimulate innovation and the creation of new services of general and private interest

 

Conclusion

In the digital era, the governance of reference data is a major strategic issue for both companies and public organizations, in a context of internationalization. Whether private or public, this data is used at the core of the information system as well as in the organizations' peripheral applications.

They constitute a single reference point that can be linked to and combined with other data. Through their uniqueness, they create value and enrich transactional data.

Finally, good management of reference data improves the actions carried out by each department of the organization, in a context where many regulations and laws require greater transparency and justification.