Doing the same thing over and over and expecting different results might be the definition of insanity, but doing the same thing over and over and getting the same result is the bread and butter of science. That’s how you know that a new idea is good, that a theory is correct, that a process or product works.
And artificial intelligence, even as it turbocharges a lot of scientific research, is posing a threat to that standard of proof. For all their power, AI and machine-learning methods are often inscrutable, and that opacity is feeding a “reproducibility crisis” in science, because research conducted with them often can’t be repeated.
"The question is, ‘Can we really trust the discoveries that are currently being made using machine-learning techniques applied to large data sets?’" asked Rice University statistician Genevera Allen, who recently addressed the crisis at the 2019 Annual Meeting of the American Association for the Advancement of Science (AAAS). Allen said findings generated by machine learning will have to be checked and rechecked until a new generation of AI and machine-learning systems can do it on their own.
“Often these studies are not found out to be inaccurate until there's another real big dataset that someone applies these techniques to and says, ‘Oh my goodness, the results of these two studies don't overlap,’” Allen told BBC News. “There is general recognition of a reproducibility crisis in science right now. I would venture to argue that a huge part of that does come from the use of machine-learning techniques in science.”
Behind the Curtain
Machine-learning systems learn from the data they analyze rather than following a specific set of instructions, and they often operate as predictive models. In fact, that’s the appeal of a lot of machine-learning systems, whether they’re designed for predictive policing, vehicle maintenance, traffic patterns or healthcare outcomes. But looking into the future involves a few blind spots.
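For readers who want to see that pattern concretely, here is a minimal sketch of “learn from the data, then predict,” using the scikit-learn library and a synthetic dataset. The data and model choice are illustrative assumptions, not a description of any system mentioned in this story.

```python
# Minimal sketch of the "learn from data, then predict" pattern.
# The synthetic dataset and the choice of logistic regression are
# illustrative assumptions, not any specific system from this article.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# A synthetic labeled dataset: the "experience" the model learns from.
X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# No hand-written rules: the model fits its parameters to the training data...
model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)

# ...and then makes predictions about data it has never seen.
print("Held-out accuracy:", model.score(X_test, y_test))
```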
In science, AI technologies like deep learning and machine learning have the power to take discovery where no human can, because they can explore data sets and possibilities beyond the reach of the human brain. One example is drug research, where, at the molecular level, there are more possible drug molecules (the number is estimated to be 10^60) than there are atoms in the solar system. An AI system can sort through all the possibilities; human researchers can’t. And for drug researchers, the appeal is more than number-crunching; it’s also AI’s ability to follow unexpected threads.
“Maybe it will go in a different direction that a human wouldn’t go in,” Angel Guzman-Perez, a drug researcher at Amgen who is collaborating with MIT AI researcher Regina Barzilay’s drug discovery group, told MIT Technology Review. “It thinks differently.”
And therein lies the rub. An AI system currently can’t explain in human terms how its complex set of algorithms reached a conclusion, which is a barrier to government agencies’ plans for human-machine teaming, among other uses. That kind of teaming depends on developing trust, which can be in short supply in high-stakes situations.
A number of research projects are attempting to solve the problem, including efforts at the research organization OpenAI and two Defense Advanced Research Projects Agency (DARPA) programs, Explainable AI (XAI) and Competency-Aware Machine Learning (CAML), which are working to get machines to use natural language to describe their processes. A breakthrough in XAI and similar research could also help solve science’s reproducibility crisis.
Another part of the problem is that AI and machine-learning systems are designed to project certainty. “A lot of these techniques are designed to always make a prediction,” Allen said. “They never come back with, ‘I don't know,’ or, ‘I didn't discover anything,’ because they aren't made to.”
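Here is a small, hypothetical sketch of what Allen is describing, using synthetic data and an arbitrary 0.9 confidence cutoff: a standard classifier hands back a label for every input, and it only gains the ability to say “I don’t know” if the researcher explicitly bolts an abstention rule on top.

```python
# Sketch of Allen's point: an off-the-shelf classifier returns a label for
# every input, confident or not. The abstention rule below is an assumed,
# illustrative add-on, not part of any system discussed in the article.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data with deliberate label noise, so some cases are genuinely unclear.
X, y = make_classification(n_samples=2_000, n_features=20, flip_y=0.2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)

# predict() yields a label for every row; it never declines to answer.
labels = model.predict(X_test)

# An added rule: report -1 ("don't know") whenever the model's own
# probability estimate falls below an arbitrary 0.9 cutoff.
proba = model.predict_proba(X_test).max(axis=1)
cautious = np.where(proba >= 0.9, labels, -1)
print("Fraction of test cases abstained on:", np.mean(cautious == -1).round(2))
```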
She offered an example from cancer research that involved studying clusters of patients with similar genomic profiles. A machine-learning system might confidently identify a group of similar patients when, in fact, the grouping is well supported for only a few of its members, leading to inconsistent, unreliable findings and research that then has to be redone.
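A toy illustration of that pitfall, not drawn from her group’s actual analyses: a common clustering algorithm such as k-means assigns every sample to a cluster even when most of the data is noise, and only an added diagnostic, such as per-sample silhouette scores, shows how weakly supported many of those assignments are.

```python
# Illustration of the clustering pitfall Allen describes: k-means assigns
# every sample to a cluster whether or not the grouping is well supported.
# The synthetic "profiles" and the silhouette check are assumptions for
# illustration, not her group's actual method.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples

rng = np.random.default_rng(0)
# 30 samples with real structure plus 70 samples of pure noise.
structured = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(30, 2))
noise = rng.normal(loc=0.0, scale=3.0, size=(70, 2))
profiles = np.vstack([structured, noise])

# k-means happily returns 4 clusters for everything; it has no way
# to answer "I don't know."
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(profiles)

# Per-sample silhouette scores expose which memberships are actually solid.
scores = silhouette_samples(profiles, labels)
print("Samples with weak cluster support:", int((scores < 0.25).sum()), "of", len(scores))
```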
Can AI Be the Solution, Too?
AI isn’t the only field with a reproducibility crisis, of course. One also exists in the social and behavioral sciences, which the Defense Department, for instance, uses to “design plans, guide investments, assess outcomes and build models of human social systems and behaviors as they relate to national security challenges in the human domain.” Results in these fields vary widely in how reliably they can be replicated, which can have real-world consequences for the military. So DARPA, in yet another program, is looking to solve the problem with AI.
The agency’s Systematizing Confidence in Open Research and Evidence (SCORE) program aims to develop automated tools that assign “confidence scores” to social and behavioral science research results and claims, rating how likely each is to reproduce, with accuracy matching the best human expert methods.
“For the last eight years, the research community has been scrutinizing the reproducibility of its findings and the quality of its research practices,” said Brian Nosek, executive director of the Center for Open Science, which signed a three-year, $7.6 million cooperative agreement with DARPA to work on the project. “Now, that learning is being translated into opportunities to improve research practices to accelerate the pace of discovery.”
If SCORE’s system can explain how it came up with its confidence scores, that could be a step (along with explainable AI and other research) toward solving science’s unfolding reproducibility crisis.