EPA’s proposed regulation more or less banning the Agency’s use of so-called “secret science” has received a lot of attention, much of it negative. What has largely been missed is the positive impact that this rule might have on open science generally.

The proposed regulation can be found here, along with supporting documents and comments to date. Comments are presently due by May 30, 2018, but many people are requesting that the present 30-day comment period be extended to 90 days. Such requests are fairly typical for major rules like this one.

The most common criticism is that the rule would bar EPA from using health studies that include data on individuals. Such data often cannot be shared because of privacy laws, yet these studies can be very important. Proprietary business data raises a similar issue.

But in fact the proposed rule allows for these studies in two different ways. First, it allows for what is called “masking” of data. If the data is properly structured, masking technologies let the computer readily remove or replace the sensitive information. Second, in extreme cases the EPA Administrator can simply exempt the study from the regulations.

Leaving the regulatory issues aside, consider the positive aspects for open science. The EPA rule is likely to finally establish specific standards for openness. Moreover, these standards could set a precedent for other Federal Agencies, possibly for other governments, or even for scientific journals. Open science is a major issue throughout the global scientific community. In other words, this relatively small action by EPA is potentially a very big pilot project for the whole world.

The Federal Government already has rules about sharing data that is developed in federally funded research. These rules are part of what is called “Public Access,” a program which began in 2013. Every federal science agency has implemented a Public Access Plan mandating that research data be shared.

EPA’s proposed rule extends Public Access in a big way. It basically extends the access and availability requirements of the Public Access Program from research that is federally funded to research that is federally used. In fact, EPA specifically cites its Public Access Plan as a supporting document for this new regulation. The researcher is required to provide access to everything technical that is involved in producing the research result.

But with any big groundbreaking project come big challenges. The present proposal is pretty vague when it comes to saying what is actually required. It reads as though the concept of replicability were already well defined, which it most certainly is not.

This is a common problem with groundbreaking new laws and regulations. They use language that is clear in its way but has no operational definition. Working out what these new rules mean is then a complex and difficult matter.

I have been studying this messy phenomenon for almost 50 years, beginning with the National Environmental Policy Act (NEPA), signed into law in 1970. NEPA required all Federal Agencies to prepare Environmental Impact Statements for their major projects. But it did not say what these statements should look like or how to prepare them, so it took several years of confusion to work these questions out. (I eventually developed a diagnostic system of 126 different regulatory confusions, which anyone can use.)

EPA’s open science rule has the same broad impact and the same degree of vagueness as NEPA did. A great deal of work will have to be done before we know what this new rule actually requires in practice. Some of this hard work can be done by EPA in formulating its final rule, but much of it is probably going to be done by the scientific community.

At some point EPA is going to have to say which research can be considered and which cannot. This is when the rules get very specific.

First, EPA has to figure out what “using” a given research result even means. For example, proposed major rules are typically accompanied by a voluminous Technical Support Document. It may cite hundreds, or even thousands, of research journal articles, each of which reports on a given project. Does each of these projects have to meet the availability and replicability standard? Or is regulatory usage confined to just a few key studies?

Second, what does availability mean? For example, does the researcher have to document their data, or just provide it? What about the decisions made as the research progressed, which can be numerous? Does each of these have to be explained? How thoroughly does the software have to be documented? Here the danger is that the availability requirement might become too burdensome.

But assuming that these deep challenges finally get worked out, consider what this regulation does. In the environmental field a lot of research is done with federal policy in mind, so this is potentially a very broad mandate. It in effect creates the new category of “EPA-usable research.” Many researchers (or their institutions) are likely to want their work to be EPA usable, even if EPA does not fund it. They will then adopt usability practices from the beginning, which may be a new way of doing research. This is exactly what the open science movement is calling for.

To sum up, this regulation is a big extension of Public Access. It is also a big step forward for open science. But it will be a big job for EPA and the research community to work out.