Chapter 1 Demo Music Observatory

This is a demonstration of a modern data observatory, a permanent observation point for social and economic data. It is a proof of concept for the best practices of research automation. (If you are not familiar with Europe’s data observatories, take a look at a few of them in our annex.)

Currently the website is updated about once a day. Each time it is refreshed, the entire website in HTML and the downloadable data catalogue in PDF and EPUB format are re-created with new data tables, the latest citation information and visualizations. Unfortunately, during these daily updates the website goes blank for a few minutes. We are working on a solution to avoid this inconvenience and to schedule updates for late night hours (in Europe).

  • A demonstration and proof of concept that a modern, European data observatory can be in large part automated, and adhere to the highest standards of statistical disclosure, reproducible research and open policy analysis (see 2.1 Evidence-based, Open Policy Analysis).

  • This observatory is going to be fully automatic by 10 September 2020. This website builds itself automatically, and over the next weeks it will be filled with data that refreshes itself daily from its source and is tested, validated, and made downloadable in high quality.

  • Applying the highest standards of open collaboration and open policy analysis, we make the entire source code of our data creation open source, and will submit it around 10 September 2020 for peer review as statistical software. Gradually the entire code for creating this mini-observatory's indicators will be published. Our critical software components, such as those for creating historical sub-national (provincial, regional) statistical comparisons, for creating gross value added, employment and tax multipliers for all EU countries, or our tools to harmonize survey data with European or global surveys, are not only open source, but also either peer-reviewed or under review.

  • We are presenting some exclusive indicators that were compiled from pan-European statistical questionnaires, originally not intended for the music industry (see Ownership of CD players and Ownership of smartphones as examples).

  • We will present some indicators only as visualizations, because we have no right to publish the underlying data, but most of the indicators will be available in automatically, daily refreshed tables, with daily refreshed bibliographical citation files on this website for all four pillars of the planned European Music Observatory.

  • Furthermore, their authoritative copy will be refreshed upon each change on figshare, where IVIR currently stores all its research data. Each table will receive a DOI, and whenever new data comes in (automatically), a new version will be sent automatically to figshare, and a new version DOI will be retrieved. This also means that our demo observatory will automatically generate and update a few dozen low-level, but still valid, scientific data publications.

  • CEEMID, originally starting out from Central Europe (see more in the Appendix), has grown over 7 years into a truly pan-European music and creative industry observatory, creating about 2000 hard indicators on how music is produced, sold and priced, and on musicians, companies and audiences. We would like to find partners to continue and open this up as a public data observatory via grants and industry partnerships.

  • We believe that this could be a very logical continuation of the work of CEEMID, which came into existence in 2014 with the same purpose in the less data-rich countries of the EU. Our work (see Central European Music Industry Report 2020) was also presented as a good example of evidence-based policy making at the CCS Ecosystems: FLIPPING THE ODDS Conference, a two-day high-level stakeholder event jointly organized by the Goethe-Institut and the DG Education, Youth, Sport and Culture of the European Commission with the Creative FLIP project. We would like to find a way into the Creative FLIP program with a grant application.

  • We believe that this observatory will provide a useful proof of concept for a future European Music Observatory, and will find many contributors among the earlier CEEMID partners and beyond.

  • Parallel to this observatory, we will create an experimental observatory on the entire creative, cultural and copyright-based industry sector. This parallel observatory will rely mainly on the same data resources, and on the same technology and program code, as this music observatory. They will be jointly developed as parts of our Creative Observatories programme, because from a data science point of view, music is just a very interesting case of the broader creative, cultural and copyright-based sectors (CCS).

  • We are applying with our partners to implement the grant CALL FOR PROPOSALS EAC/S14/2019 Pilot Project - Measuring the Cultural and Creative Sectors in the EU (see the call). We believe that we would have an outstanding application if we created not just yet another ‘study’, but a practical, experimental statistical observatory where all the ideas and suggestions within the project can be tried out, opened up to the statistical and user community, and documented. (This grant call is a call within the same program where CEEMID was presented as a good case.)

  • From the point that we submit our application, a minimal demo version of our observatories will produce about 50 statistical indicators daily. These will be placed in a scientific repository with a DOI, and will be fully downloadable in Excel and CSV formats, with downloadable, full citation references and high-quality charts. They will also offer some minimal interactive analysis.

  • In our grant proposal, apart from delivering the deliverables, we commit that all experimental ideas for new or better statistics will always be available to users and to the statistical community in two formats: as downloadable and visualized open data, and as a statistical algorithm with source code. We believe that, if selected, we would offer unprecedented transparency and ease of stakeholder engagement throughout Europe, especially if COVID prevents big public events. The development of our grant deliverables, in their successive versions, will be published daily, as if we were the data warehouse of Eurostat or the ECB itself.

  • Our start-up, Reprex B.V. (short introduction in the Annex), has applied to the Artificial Intelligence Validation Lab of Yes!Delft, which is considered to be the second-best university-backed high-tech startup incubator program in the world. This is a very competitive program, and all good wishes and support letters are welcome, as we believe that we could get real help there to make our products viable in the long term and available for our partners. We will also introduce new members of the founding team very soon.

1.1 Data Sources

1.1.1 Open Data

The data retrieval is done by the musicobservatory R package in the case of public data sources. We will treat private data sources with the same care, but obviously do not publish sensitive access code or data.

There is no free lunch. Using these data sources avoids the data collection cost, but processing datasets originally created for governmental or scientific use requires a very significant investment. Our statistical software that helps in this process has been developed over years.

  • Since 2014, CEEMID has been creating thousands of music and audiovisual industry related indicators from raw data collected for other purposes, such as inflation measurement or public policy assessment in the EU, using our proprietary software and some open source software.


  • fully open source and available for review or modification on GitHub with full source; contributions and pull requests are welcome and will be credited.

  • goes through unit-testing, i.e. automated validation of indicator results

  • the software code will be peer-reviewed;

  • fully documented on

Beware that this is not yet working code; the relevant CEEMID code will be documented and published by 10 September 2020.

This means that the software code that produces the indicators will be fully available to cultural statisticians, data scientists of music organizations, researchers and all interested parties. This is the software code that will, for example, check every day whether Eurostat has published any new tables, data points or corrections in its data warehouse that are relevant for the observatory.
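The daily check described above can be illustrated with a minimal sketch. It is not the code of the musicobservatory package: it assumes that a freshly downloaded table is available as a list of records, and the names `table_fingerprint`, `refresh` and `catalogue` are hypothetical. The idea is to store a fingerprint of each table and to record a new version only when the fingerprint changes.

```python
import hashlib
import json


def table_fingerprint(rows):
    """Return a stable SHA-256 hash of a table given as a list of dicts."""
    # Sort rows and keys so that the same data always yields the same hash,
    # regardless of the order in which the source delivers it.
    canonical = json.dumps(
        sorted(rows, key=lambda r: json.dumps(r, sort_keys=True)),
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def refresh(catalogue, table_id, new_rows):
    """Record a new version only when the data changed; return True if it did."""
    new_hash = table_fingerprint(new_rows)
    entry = catalogue.get(table_id)
    if entry is not None and entry["hash"] == new_hash:
        return False  # nothing new was published, keep the current version
    version = 1 if entry is None else entry["version"] + 1
    catalogue[table_id] = {"hash": new_hash, "version": version, "rows": new_rows}
    return True
```

In such a design, a return value of `True` would be the trigger for re-running validation and for sending a new version to the data repository; an unchanged table costs only one hash comparison per day.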

1.1.2 Industry & Partner Data

CEEMID has worked with all parts of the music industry: record labels and distributors, their collective management organizations, performers, composers, publishers and their collective management, granting authorities, trade associations, export offices. We helped them professionally join and integrate public data, our data and data confidential to them. This means that we have a detailed data map of the industry and its main organizations in recordings, publishing and live music.

We are not a data vendor or re-selling organization. We believe that our know-how and added value lie in integrating many data sources into more complete information and in providing state-of-the-art predictions, forecasts, valuation and other professional uses of the data. We will never publish either data maps or data from these sources, but we will seek partnership with industry organizations to make some of their data visible or available for the observatory, in a form, at a frequency and under terms comfortable for them.

In this demonstration, we will show some visualizations of publicly available data from various industry associations. We do not have the right to re-publish the data, but we have the know-how to enhance, improve and join these data into more useful information. Our users who have legal access to the databases of CISAC, IFPI, GESAC or other industry associations can ask to join these private sources with our observatory for analysis. We provide this service only for the benefit of legal users of such data.

1.1.3 Proprietary Data

CEEMID and its successor, Reprex B.V. (short introduction in the Annex), have been collecting data from primary sources, mainly via harmonized surveys, for 7 years. CEEMID has been surveying music professionals and film professionals for 7 years, and has conducted probably the most Cultural Access & Participation Surveys in Europe.

Probably the most comprehensive and fully reproducible report that CEEMID has produced is the Central European Music Industry Report 2020, which was presented as a best case for evidence-based policy design in the cultural and creative sectors at the CCS Ecosystems: FLIPPING THE ODDS Conference, a two-day high-level stakeholder event jointly organized by the Goethe-Institut and the DG Education, Youth, Sport and Culture of the European Commission within the Creative FLIP programme.

Some of this data is the asset of CEEMID and we will release indicators from those data assets. Some of them belong to CEEMID partners and we will seek their permission to release examples and seek funding to make the relevant, high-value data open in the future.

Our proprietary statistical software retroharmonize is currently under peer review. It will be fully open source, and it will make it possible to integrate well-designed audience or musician surveys with existing European (and African, Latin-American) policy, statistical and other surveys.
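To give an idea of what integrating surveys involves, here is a minimal sketch of the basic harmonization step. It does not use or reproduce the retroharmonize interface; the survey names, answer codes and the functions `harmonize` and `pooled_share` are all hypothetical. Two surveys ask the same question but code the answers differently, and a crosswalk translates both into one shared set of labels before pooling.

```python
# Hypothetical crosswalks: each survey's own answer codes mapped to shared labels.
CROSSWALKS = {
    "survey_a": {1: "yes", 2: "no", 9: "declined"},
    "survey_b": {"Yes": "yes", "No": "no", "DK": "declined"},
}


def harmonize(survey_id, responses, crosswalks=CROSSWALKS, unknown="unmapped"):
    """Translate one survey's raw answer codes into the shared label set."""
    mapping = crosswalks[survey_id]
    # Codes absent from the crosswalk are flagged, not silently dropped,
    # so that coding errors stay visible in the harmonized data.
    return [mapping.get(code, unknown) for code in responses]


def pooled_share(harmonized, label="yes"):
    """Share of a label among valid (yes/no) answers across pooled surveys."""
    valid = [x for x in harmonized if x in ("yes", "no")]
    return valid.count(label) / len(valid) if valid else None
```

For example, `harmonize("survey_a", [1, 2, 9])` and `harmonize("survey_b", ["Yes", "DK"])` return lists with the same labels, so the two can be concatenated and passed to `pooled_share`. Real harmonization also has to deal with weights, differing question wording and several kinds of missing answers, which this sketch leaves out.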

We will also publish retrospectively harmonized data from individual responses to pan-European surveys concerning music, willingness to pay and the use of entertainment technology. An early version of this work already created some unique indicators for our work with IVIR colleagues, "Open Access is not a Panacea, even if it’s Radical – an Empirical Study on the Role of Shadow Libraries in Closing the Inequality of Knowledge Access" (see Metadata: Regional Eurostat Variables For Understanding Piracy Of Books).