We should aim to have the best possible data for analyses of APC payments, which means the data should be as complete as possible. In some cases, contributors can, for pragmatic reasons, only contribute parts of their data (e.g. payments to a particular publisher) and have to postpone contributing the rest. This biases any view of the 'biggest recipients of APC payments'. Examples: MPDL has been adding more and more publishers to its data (https://github.com/OpenAPC/openapc-de/commits/master/data/mpg), while for LMU München there’s only Springer/BMC data (https://github.com/OpenAPC/openapc-de/tree/master/data/lmu). This picture shows the huge impact this has on visualizing the APC recipients (German universities to publishers data, with LMU shown in turquoise):

While these issues might improve over time, there’s another concern I’d like to add to the picture: institutions deliberately choosing to hold back parts of their data. See e.g. this statement from U of Leipzig (cc: @vielera):
Data on hybrid OA and APC above 2.000 EUR are not included (exceptions due to currency exchange rates). (https://github.com/OpenAPC/openapc-de/blob/0901f552ca166c5cba8e702c9e6807a443b39f19/data/unileipzig/README.md)
We all know that we can only gather part of the APC payments; there is a huge gray area. But if the data is available to a data provider, then I’d really suggest that it be provided to the Open APC initiative as completely as possible. There might be problems or concerns I did not see, but maybe we can at least agree on the general goal?
Maybe we can have a discussion here and/or during the workshop in April? What do you think?