Chapter 2 Introduction

An observatory is a location used for observing terrestrial or celestial events. The European Commission and the Council of Europe support numerous data observatories to foster research and development and evidence-based policymaking. We are creating automated observatories following the best practices of reproducible research.

  • A private observatory is a data integration system that automatically collects external information, processes it, and joins it with internal data resources. We offer this to business, scientific research, think tank, NGO and journalism partners.

  • A collaborative observatory is a data integration system that maps the data resources of collaborating institutions and can exploit their synergies by automatically combining their data, once all involved parties have authorized it. We offer this to consortia of various entities. Daniel’s CEEMID project, developed since 2014, is a good example of a collaborative observatory.

  • A public observatory is a collaborative observatory that intends to make at least some of its data assets available as open data. We offer this for the European Union and its Consortia.

2.1 Evidence-based, Open Policy Analysis

In the last two decades, governments and researchers have placed a growing emphasis on the value of evidence-based policy. However, while the evidence generated through research to inform policy has become more rigorous and transparent, policy analysis, the process of contextualizing evidence to inform specific policy decisions, remains opaque.

We believe that a modern data observatory must improve how evidence is created and used in policy reports, and must pass on the efficiency gains from increased reproducibility and automation. Therefore, we pledge that music.dataobservatory.eu will comply with the Open Policy Analysis standards developed by the Berkeley Initiative for Transparency in the Social Sciences and the Center for Effective Global Action. These standards are applied by the World Bank.

Reproducible research is a scientific concept that can be applied to a wide range of professional activities, for example, reproducible finance in the investment process or reproducible impact assessment in policy consulting. Building on computational reproducibility, we believe that the following principles should be followed.

  • Reviewability means that our application’s results can be assessed and judged by our users’ experts, or by experts they trust. We support reviewability with full transparency: we publish the software code that created the indicators, our methodology, and a statistical description of each indicator that is refreshed automatically every day when it receives new data or corrections from the original source.

  • Reproducibility means that we provide data products and tools that allow the exact duplication of our results during assessments. This makes sure that all logical steps can be verified. Reproducibility also ensures that there is no lock-in to our applications: you can always choose a different data and software vendor, or compare our results with theirs.

  • Confirmability means that the findings of our applications lead to the same professional results as other available software and information. Our data products use the open-source statistical programming language R. We provide details about our algorithms and methodology so that our results can be confirmed in SPSS, Stata, or sometimes even Excel.

  • Auditability means that our data and software are archived in a way that lets external auditors later review, reproduce and confirm our findings. This is a stricter form of data retention than most organizations apply, because we archive not only the results but every computational step, as if your colleagues saved not only every step in Excel but also their keystrokes. Auditability is a requirement in accounting, and we are extending this approach to all the quantitative work of a professional organization in an advisory or consulting capacity.

  • Reviewable findings: The descriptions of the methods can be independently assessed, and the results judged credible. In our view, this is a fundamental requirement for all professional applications. CEEMID’s music data is used in judicial procedures to settle royalty disputes, and in grant and policy design. We believe that the future European Music Observatory should aim for the same bar, making its data and research products open to challenge by the scientific public, courts, and professional peers.

  • Replicable findings: We present our findings and provide tools so that our users, auditors, or external authorities can duplicate our results.

  • Confirmable findings: The main conclusions of the research can be obtained independently without our software, because we describe the algorithms and methodology in detail in supplementary materials. We believe that other organizations, analysts, and statisticians must be able to reach the same findings with their own methods and software. This avoids lock-in and allows independent cross-examination.

  • Auditable findings: Sufficient records (including data and software) have been archived so that the research can be defended later if necessary or differences between independent confirmations resolved. The archive might be private, as with traditional laboratory notebooks. See Open collaboration with academia, auditors, and industry.
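The auditability principle above asks for archiving every computational step, not only the results. The minimal Python sketch below illustrates the idea under stated assumptions: the `AuditLog` class, its field names, and the use of SHA-256 fingerprints of JSON-serialised inputs and outputs are hypothetical choices for illustration, not CEEMID's own archiving tool (CEEMID's data products are written in R). An auditor who receives the archived log can replay a step and check that it reproduces the recorded output hash.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(obj):
    """Return a SHA-256 hash of a JSON-serialisable object (keys sorted for stability)."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class AuditLog:
    """Hypothetical step log recording what went into and came out of each step."""

    def __init__(self):
        self.entries = []

    def run(self, name, func, data, **params):
        """Apply func(data, **params) and archive hashes of its input and output."""
        result = func(data, **params)
        self.entries.append({
            "step": name,
            "params": params,
            "input_hash": fingerprint(data),
            "output_hash": fingerprint(result),
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        return result

    def verify(self, index, func, data):
        """Replay a logged step and check that it reproduces the archived hashes."""
        entry = self.entries[index]
        if fingerprint(data) != entry["input_hash"]:
            return False
        return fingerprint(func(data, **entry["params"])) == entry["output_hash"]

    def to_jsonl(self):
        """Serialise the log, one JSON object per line, ready for archiving."""
        return "\n".join(json.dumps(entry, sort_keys=True) for entry in self.entries)


def scale(values, factor):
    """Example processing step: multiply every value by a constant factor."""
    return [v * factor for v in values]


log = AuditLog()
raw = [1.5, 2.0, 3.25]
scaled = log.run("multiply_by_1000", scale, raw, factor=1000)
print(scaled)                     # [1500.0, 2000.0, 3250.0]
print(log.verify(0, scale, raw))  # True
```

The parameters are stored verbatim rather than only hashed, so that any logged step can be replayed later; the timestamp is kept for the record but deliberately excluded from the verification.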

These computational requirements call for a data workflow that relies on further principles.

  • Record retention: all aspects of reproducibility require a high level of standardized documentation, which in turn requires standardized metadata, metadata structures, taxonomies, and vocabularies.

  • Best available information / data universe: the quality of the findings, and the success of their confirmation and auditing, improve with better data and facts.

  • Data validations: The quality of the findings greatly depends on the factual inputs. Even a fully reproducible workflow will likely reach wrong conclusions if it is fed erroneous data or faulty information, and such inputs make confirmation and auditing impossible. Especially when organizations use large and heterogeneous data sources, even small errors, such as erroneous currency conversions or the accidental misuse of decimals or units, can produce results that will not pass confirmation or auditing.

  • Indicators that have been used with all known royalty valuation methods (PwC 2008), for both authors’ and neighbouring rights, and that fulfil the IFRS fair value standards incorporated in EU law and recent EU jurisprudence (InfoCuria 2014, 2017).

  • Indicators that can be used for calculating damages, or for calculating the size of the value gap (Daniel Antal 2019a, 2019c).

  • Indicators that quantify the development needs of musicians, and that can set objective grant aims and support grant evaluations (Antal 2015).

  • Intelligent, AI-based applications, including machine learning, to predict the best scheduling or the likely audience.

  • Understanding how music is taxed, how music contributes to local and national GDP, and how music creates jobs directly, indirectly and through induced effects (Daniel Antal 2019b).

  • Providing detailed comparisons of music audiences among countries.

  • Measuring export success on streaming platforms, and preparing better targeting tools.
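The data validation principle above can be made concrete with automated checks that run before any indicator is published. The Python sketch below is illustrative only: the `Observation` record, the unit codes such as `EUR_THS`, and the factor-of-ten threshold are assumptions, and real validation rules would be tuned to each indicator. It flags unit mismatches, negative values, and year-on-year jumps large enough to suggest a misplaced decimal or a thousand/million mix-up.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    geo: str     # country code, e.g. "SK"
    year: int
    value: float
    unit: str    # hypothetical unit code, e.g. "EUR_THS" for thousand euros


def validate(observations, expected_unit, max_ratio=10.0):
    """Return a list of human-readable problems found in the observations.

    Illustrative checks, not an exhaustive rule set:
    * every observation carries the expected unit code;
    * no value is negative;
    * within a country, year-on-year changes stay below a factor of max_ratio,
      which catches typical misplaced decimals or thousand/million mix-ups.
    """
    problems = []
    by_geo = {}
    for obs in observations:
        if obs.unit != expected_unit:
            problems.append(f"{obs.geo} {obs.year}: unit {obs.unit}, expected {expected_unit}")
        if obs.value < 0:
            problems.append(f"{obs.geo} {obs.year}: negative value {obs.value}")
        by_geo.setdefault(obs.geo, []).append(obs)

    # Compare consecutive years within each country.
    for geo, series in by_geo.items():
        series.sort(key=lambda o: o.year)
        for prev, curr in zip(series, series[1:]):
            if prev.value > 0 and curr.value > 0:
                ratio = max(curr.value / prev.value, prev.value / curr.value)
                if ratio > max_ratio:
                    problems.append(
                        f"{geo} {prev.year}->{curr.year}: value changed by a factor of {ratio:.0f}"
                    )
    return problems


# Invented example: the 2019 figure was reported in million euros by mistake.
data = [
    Observation("SK", 2017, 1200.0, "EUR_THS"),
    Observation("SK", 2018, 1250.0, "EUR_THS"),
    Observation("SK", 2019, 1.3, "EUR_MIO"),
    Observation("CZ", 2019, 3400.0, "EUR_THS"),
]
for problem in validate(data, "EUR_THS"):
    print(problem)
```

In this example both the unit check and the jump check fire on the same record, which helps a reviewer see the likely cause of the error rather than just its symptom.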

2.2 History

dataobservatory.eu grew out of a collaborative observatory, CEEMID, and its open-source, open data-based automation technology. CEEMID aims to transfer thousands of indicators, together with the verifiable, open-source software that creates them, to the European Music Observatory, to give Europe-wide access to timely, reliable, actionable statistics and indicators for the music industry, policymakers and music professionals. (Read more about our data coverage.)

Over six years, CEEMID has become a logical starting point for the planned European Music Observatory, because it is a pan-European music data integration system based on open data and open-source software, using the best statistics, data science and AI practices. CEEMID has created thousands of high-value, hard music industry indicators using open data sources, industry data sources, surveys and the APIs of various other relevant data sources. (Read more about our data coverage and our pan-European geographical coverage.) In this pilot project, we aim to transfer 50-250 of CEEMID’s more than 2000 indicators to a Pilot Music Observatory.

We believe that this could be a very logical continuation of the work of CEEMID, which came into existence in 2014 with the same purpose in the less data-rich countries of the EU. Our work was also presented as a good example of evidence-based policymaking at the CCS Ecosystems: FLIPPING THE ODDS Conference, a two-day high-level stakeholder event jointly organized by the Goethe-Institut and the DG Education, Youth, Sport and Culture of the European Commission with the Creative FLIP project. (See a brief summary of the presentation and our use case, the reproducible research document Central & Eastern European Music Industry Report 2020.)

Our start-up (codename: Satellite Report; our true name will be revealed a few days after the registration of the company) and CEEMID are inviting former CEEMID partners and other interested parties to build a system of Creative Observatories. While we hope to keep serving their individual needs, as CEEMID has served many creative organizations in the last 7 years, we believe that creating “data republics” among non-competing creative organizations can create much value for all of them.

We would like to form a consortium to build an Exploratory Music Observatory and offer it to the European Commission as a foundation of the European Music Observatory. We hope that once a tender is called for the European Music Observatory, we will be in pole position to win it with our concept. Even if we do not win the tender, we believe that our Exploratory Music Observatory will be used in the final European Music Observatory, because we will offer the best value for money.

2.3 European Music Observatory

We believe that the European Music Observatory, like CEEMID, must rely on open-source statistical software written in the R statistical language, and it must be funded on the principle of open collaboration with the industry, public authorities and academia.

In our vision, the European Music Observatory should be based on open data and open-source software, developed in open collaboration with the industry, statisticians and academia, using the best statistics, data science and AI practices. It would use many data sources about the audience, the creators of music, music works and recordings, the global circulation of music, and its economy. CEEMID has created thousands of high-value, hard music industry indicators by integrating open data sources, industry data sources, surveys and the APIs of various other relevant data sources.


CEEMID aims to transfer thousands of reproducible and verifiable indicators, together with the open-source statistical software that creates them, to the European Music Observatory, to give Europe-wide access to timely, reliable, actionable statistics and indicators for the music industry, policymakers and music professionals.

2.3.1 Data Integration Principle

Instead of creating expensive and unproven new data assets, we believe that the European Music Observatory should rely on proven industry data assets, and it should put efforts into making the existing data well-documented, validated, and easy to build upon in a statistically ‘tidy’ format that allows quick automated data joins.
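To illustrate what a statistically tidy format buys, here is a minimal Python sketch using only the standard library. It assumes two hypothetical source tables in wide layout (one column per year) with invented numbers and indicator names. Each is reshaped into tidy rows, one observation per row keyed by country (`geo`) and year (`time`), after which joining on those keys and deriving a new indicator becomes mechanical.

```python
from collections import defaultdict

# Two hypothetical source tables in "wide" layout: one row per country and
# one column per year. Indicator names and numbers are invented.
streaming_revenue_eur_ths = {
    "SK": {"2018": 5000, "2019": 6000},
    "CZ": {"2018": 15000, "2019": 18000},
}
population_ths = {
    "SK": {"2018": 5000, "2019": 5000},
    "CZ": {"2018": 10000, "2019": 10000},
}


def to_tidy(wide, indicator):
    """Reshape {geo: {year: value}} into tidy rows, one observation per row."""
    return [
        {"geo": geo, "time": int(year), "indicator": indicator, "value": value}
        for geo, years in wide.items()
        for year, value in years.items()
    ]


def join_on_keys(*tidy_tables, keys=("geo", "time")):
    """Combine tidy tables into one record per key combination."""
    joined = defaultdict(dict)
    for table in tidy_tables:
        for row in table:
            key = tuple(row[k] for k in keys)
            joined[key][row["indicator"]] = row["value"]
    return dict(joined)


tidy = join_on_keys(
    to_tidy(streaming_revenue_eur_ths, "streaming_revenue_eur_ths"),
    to_tidy(population_ths, "population_ths"),
)

# Derived indicator: streaming revenue per inhabitant in euros (the thousands cancel out).
# Keys where either input is missing are skipped rather than guessed.
needed = ("streaming_revenue_eur_ths", "population_ths")
per_capita_eur = {
    key: row["streaming_revenue_eur_ths"] / row["population_ths"]
    for key, row in tidy.items()
    if all(name in row for name in needed)
}
print(per_capita_eur[("SK", 2019)])  # 1.2
```

Because every source ends up in the same (geo, time, indicator, value) shape, adding a third source requires only another `to_tidy` call, not a new bespoke join.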

We believe that more insights can be gained from joining existing, known, proven data assets than from increasing the size of new ones. For example, both CISAC and IFPI help authors’ and neighbouring rights societies with data assets to fulfil their obligations to their members and regulators, bearing in mind the often restrictive conditions set in the jurisprudence (InfoCuria 2013). However, the increased activity of licensees and competition authorities has significantly raised the burden of proof required to justify collectively managed royalties and private copying compensation (InfoCuria 2014, 2017). Collective management organizations must be able to professionally join each other’s data, together with data about the market demand and macroeconomic conditions of the entire Single Market, with its many currencies and reporting standards. CEEMID provides them with hundreds of indicators that comply with this jurisprudence, and automates correct currency and unit conversion, data processing and other tasks for which most CMOs lack data science competences.
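Correct currency conversion, as mentioned above, is largely a matter of never guessing. The Python sketch below assumes a hypothetical table of annual exchange rates expressed as units of national currency per euro, with invented values. It converts amounts with `decimal.Decimal` to avoid binary floating-point rounding in monetary figures, and it raises an error when a rate is missing instead of silently dropping or mis-converting the observation.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical annual average exchange rates, as units of national currency
# per 1 EUR. The values are invented; a real workflow would load them from an
# official statistical source and archive the version used.
EUR_RATES = {
    ("HUF", 2019): Decimal("300"),
    ("CZK", 2019): Decimal("25"),
}
CENT = Decimal("0.01")


def to_eur(amount, currency, year, rates=EUR_RATES):
    """Convert an amount in national currency to euros, rounded to cents.

    Raises ValueError when no rate is known, instead of guessing or dropping the value.
    """
    amount = Decimal(str(amount))  # str() avoids importing binary float artefacts
    if currency == "EUR":
        return amount.quantize(CENT, rounding=ROUND_HALF_UP)
    try:
        rate = rates[(currency, year)]
    except KeyError:
        raise ValueError(f"no {currency}/EUR rate for {year}") from None
    return (amount / rate).quantize(CENT, rounding=ROUND_HALF_UP)


print(to_eur(600000, "HUF", 2019))  # 2000.00
print(to_eur(1000, "CZK", 2019))    # 40.00
```

Failing loudly on a missing rate is the design choice that matters here: a skipped or defaulted conversion is exactly the kind of small error that later fails confirmation or auditing.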

  • We would like to introduce the achievements of CEEMID in integrating numerous data sources of the European music industry, i.e., building programmatic interfaces that, based on a thorough understanding of the data and accompanying metadata, turn them into more advanced data products and services.

  • We would like to invite owners and managers of known, high-quality data resources to provide at least a minimal, valuable sample to the Demo Music Observatory and to elaborate on the conditions of providing more data for the future European Music Observatory.

2.3.2 Open Data

CEEMID was built on the regulatory framework of the Directive on open data and the re-use of public sector information, which provides a common legal framework for a European market for government-held data (public sector information). This framework is built around two key pillars of the internal market: transparency and fair competition. In our view, these principles should apply to the European Music Observatory, too. (See more details in the plans of the Demo Music Observatory for open data sources.)

Open data is not free data. It is usually collected for a public mandate, such as measuring the inflation level or the social conditions in the European countries, and it is processed for that public purpose. The raw data is valuable for scientific or business uses, but re-using it requires significant investment in data processing.

Our more general retroharmonize and more specific eurobarometer software packages allow us to create statistical indicators from the usually unpublished, unused questionnaire data of Eurobarometer, and even to create comparative indicators with Latin America, Africa and the Arab world.

Sections 3.2.2.1 Ownership of CD players and 3.2.2.1 Ownership of smartphones are two examples of using questionnaire data that was not originally processed for the purposes of a music observatory, with open data and our open-source software.
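Indicators such as the ownership of smartphones are computed from respondent-level questionnaire data, where each answer must be weighted so that the sample represents the population. The Python sketch below shows a weighted share per country. The field names (`geo`, `owns_smartphone`, `weight`) and values are invented for illustration, and the code does not reproduce the API of the retroharmonize or eurobarometer packages.

```python
from collections import defaultdict

# Hypothetical respondent-level records. Field names and values are invented;
# real Eurobarometer files use their own variable codes and weighting variables.
respondents = [
    {"geo": "SK", "owns_smartphone": 1, "weight": 1.2},
    {"geo": "SK", "owns_smartphone": 0, "weight": 0.8},
    {"geo": "SK", "owns_smartphone": 1, "weight": 1.0},
    {"geo": "SK", "owns_smartphone": None, "weight": 0.9},  # e.g. "don't know"
    {"geo": "CZ", "owns_smartphone": 1, "weight": 0.5},
    {"geo": "CZ", "owns_smartphone": 0, "weight": 1.5},
]


def weighted_share(records, variable, group="geo", weight="weight"):
    """Weighted share of respondents answering 1, per group; missing answers are skipped."""
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for record in records:
        answer = record.get(variable)
        if answer is None:
            continue
        numerator[record[group]] += record[weight] * answer
        denominator[record[group]] += record[weight]
    return {g: numerator[g] / denominator[g] for g in denominator if denominator[g] > 0}


shares = weighted_share(respondents, "owns_smartphone")
print({g: round(s, 3) for g, s in shares.items()})  # {'SK': 0.733, 'CZ': 0.25}
```

Skipping missing answers changes the denominator, so the share refers to respondents who gave a valid answer; an observatory would need to document that choice in the indicator's metadata.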

2.3.3 Open Source Code

Open-source software (OSS) is a type of computer software in which source code is released under a license in which the copyright holder grants users the rights to study, change, and distribute the software to anyone and for any purpose. Open-source software is often developed in a collaborative public manner, and is a prominent example of open collaboration.

Generally, the use of open-source software, including the open-source R language and its packages (libraries), in national statistical offices is encouraged by four important considerations:

  1. lower cost,

  2. higher level of security,

  3. no vendor ‘lock in’ and

  4. better data quality, as all data manipulations can be reviewed by expert statisticians and programmers.

We believe that there are so few data scientists in the music domain that only open collaboration can guarantee adequate data quality.¹

We believe that the European Music Observatory, like CEEMID, must rely on open-source statistical software written in the R statistical language, which is widely used for producing official statistics (Templ and Todorov 2016), and it must be funded on the principle of open collaboration with the industry, public authorities and academia.

We believe that most of the software code producing the indicators of the future European Music Observatory must be made public. However, open-source software can be released under many licenses, each with advantages and disadvantages. For example, certain licenses are more favourable for fostering commercial music tech development, while others are better at integrating more scientific results.

  • Parallel to the creation of the Demo Music Observatory, we will make some critical elements of the source code that produced our indicators available for peer review.

  • We will find an appropriate, long-term sustainable licensing policy for software developed for the Demo Music Observatory and the future European Music Observatory, striking a proper balance between validation transparency, fostering music tech innovation, and keeping costs manageable.

2.3.4 Indicator Design

The Demo Music Observatory will follow the Eurostat guidelines Towards a Harmonised Methodology for Statistical Indicators (Eurostat 2014; Kotzeva et al. 2017) to create high-quality, validated indicators that receive appropriate feedback from users, i.e., music businesses, their trade associations and policymakers.

Because music is often a very local business, where artists often have a local or regional fan base and are helped by local policies, we will show how our rich data assets can be produced at regional and city level, following the best practices and guidelines set out by Eurostat and OECD (Münnich et al. 2019).
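Regional and city-level figures often rest on small samples, which is the problem small area estimation addresses. As a simplified illustration only, not the methodology of Münnich et al. (2019), the Python sketch below shows a basic composite estimator. It blends a noisy direct estimate from the local sample with a synthetic estimate borrowed from a larger area, weighting the direct estimate by gamma = MSE_syn / (MSE_syn + Var_direct), which minimises the mean squared error when the two estimates' errors are uncorrelated. The input values are invented, and in practice both variance terms must themselves be estimated.

```python
def composite_estimate(direct, direct_var, synthetic, synthetic_mse):
    """Blend a direct and a synthetic small-area estimate.

    Assuming the direct estimate is unbiased with variance direct_var, the synthetic
    estimate has mean squared error synthetic_mse, and their errors are uncorrelated,
    the weight gamma = synthetic_mse / (synthetic_mse + direct_var) minimises the
    mean squared error of gamma * direct + (1 - gamma) * synthetic.
    """
    if direct_var < 0 or synthetic_mse < 0:
        raise ValueError("variance and MSE must be non-negative")
    total = synthetic_mse + direct_var
    gamma = 0.5 if total == 0 else synthetic_mse / total  # arbitrary split if both are exact
    return gamma * direct + (1 - gamma) * synthetic, gamma


# Invented inputs: a city's share of concert-goers from a small local sample
# (noisy direct estimate) and a regional model prediction (synthetic estimate).
estimate, gamma = composite_estimate(direct=0.40, direct_var=0.003,
                                     synthetic=0.30, synthetic_mse=0.001)
print(round(estimate, 3), round(gamma, 3))  # 0.325 0.25
```

With a larger local sample, `direct_var` shrinks, gamma rises towards 1, and the estimate leans on the local data; with a very small sample it falls back on the synthetic estimate.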

References

Antal, Daniel. 2019b. “Správa o slovenskom hudobnom priemysle.” https://doi.org/10.17605/OSF.IO/V3BE9.

Antal, Daniel. 2019c. “The Competition of Unlicensed, Licensed and Illegal Uses on the Markets of Music and Audiovisual Works [A szabad felhasználások, a jogosított tartalmak és az illegális felhasználások versenye a zenék és audiovizuális alkotások hazai piacán].” Artisjus - not public.

Antal, Dániel. 2015. “Javaslatok a Cseh Tamás Program pályázatainak fejlesztésére. A magyar könnyűzene tartós jogdíjnövelésének lehetőségei. [Proposals for the Development of the Cseh Tamas Program Grants. The Possibilities of Long-Term Royalty Growth in Hungarian Popular Music].” manuscript.

Eurostat. 2014. Towards a Harmonised Methodology for Statistical Indicators, Part 1: Indicator Typologies and Terminologies. 2014 edition. Luxembourg: Publications Office of the European Union. https://ec.europa.eu/eurostat/web/products-manuals-and-guidelines/-/KS-GQ-14-011.

InfoCuria. 2014. “OSA – Ochranný svaz autorský pro práva k dílům hudebním o.s. v Léčebné lázně Mariánské Lázně a.s. Case C‑351/12.” http://curia.europa.eu/juris/document/document.jsf?text=&docid=150055&pageIndex=0&doclang=en&mode=lst&dir=&occ=first&part=1&cid=1996526.

InfoCuria. 2017. “Autortiesību un komunicēšanās konsultāciju aģentūra /Latvijas Autoru apvienība v Konkurences padome.” http://curia.europa.eu/juris/liste.jsf?language=en&num=C-177/16.

InfoCuria. 2013. “T-442/08 CISAC v Commission.” http://curia.europa.eu/juris/liste.jsf?num=T-442/08&language=EN.

Kotzeva, Mariana, Anton Steurer, Nicola Massarelli, and Mariana Popova, eds. 2017. Towards a Harmonised Methodology for Statistical Indicators, Part 2: Communicating Through Indicators. 2017 edition. Luxembourg: Publications Office of the European Union. https://ec.europa.eu/eurostat/web/products-manuals-and-guidelines/-/KS-GQ-17-001.

Münnich, Ralf, Juan Pablo Burgard, Florian Ertz, Simon Lenau, Julia Manecke, and Harolf Merkle. 2019. Small Area Estimation for City Statistics and Other Functional Geographies. 2019 edition. Luxembourg: Publications Office of the European Union. https://ec.europa.eu/eurostat/web/products-statistical-working-papers/-/KS-TC-19-006.

PwC. 2008. “Valuing the Use of Recorded Music.” IFPI PricewaterhouseCoopers. http://www.ifpi.org/content/library/valuing_the_use_of_recorded_music.pdf.

Templ, Matthias, and Valentin Todorov. 2016. “The Software Environment R for Official Statistics and Survey Methodology.” Austrian Journal of Statistics 45 (March): 97–124. https://doi.org/10.17713/ajs.v45i1.100.


  1. More information about the R statistical language can be found in our CEEMID documentation: https://documentation.ceemid.eu/index.php?title=R_(programming_language)