Experimentation is a valuable tool in computer science. Random experimentation, however, is a danger. This post examines current research practices and casts doubt on their benefits. Inspired by the French writer Rousseau (1972) and his idea that society corrupts the individual, I am convinced that the experimental community corrupts the researcher.
The scientific method has evolved over the centuries, and philosophers have played a distinguished role in that evolution by questioning the beliefs used to guide discoveries and by proposing challenging new ways of thinking, sometimes without providing any answers, for the sheer pleasure of asking questions (Russell 1997). This passion and this awakened mind are missing from today's computer science community. Empiricists in particular write large numbers of plain technical reports that merely trace experiments, oblivious to the beauty of the essay and the excitement of sharing outlandish ideas. Reading the literature has often become a mechanical exercise of skimming and scanning for the numbers that show by how many decimals the proposed techniques outperform their competitors. Such results sustain a kind of research based on mere parameter tuning of established algorithms.
The purpose of this viewpoint is to show the disenchantment suffered by would-be researchers, and even by tenured ones, and to denounce the proliferation of questionable practices that are killing innovation.
We first review the effects of the modern obsession with publishing and the extent to which academic research has distorted experimental science. Next, we examine what calls the current methodology into question and why some approaches are simply ignored while others are inexplicably revived.
The perversion of the community
Pure research is becoming less attractive. Many research lines are abandoned because investors are more interested in applications, despite the relevance of fundamental investigation. Hence, the groups that survive do so either because their research leads in application domains or because their volume of publications is high. What is behind these numbers? Parnas (2007) makes a strong point about them and dissects every perversion of the community (authorship by pact, publication in monthly instalments, tailor-made conferences) that has encouraged superficial research by overly large groups, repetition, insignificant studies, half-baked ideas, and so on. Publish or perish has wreaked havoc on day-to-day research, at least in the halls of European academia.
Eventually, impact factors, the h-index, and the g-index became fallacious indicators of good research and good researchers, and they fired up the paper factory. Fresh Ph.D. students turn into overburdened junk writers as soon as they learn that their careers will be measured by these statistics. The pressure is intensified by supervisors, assessed by the same yardstick, who need to keep their CVs up to date or repay colleagues in a chain of favours that promotes quantity over substance.
This compulsive publishing has flooded conferences and journals with so many papers that it is becoming difficult to track innovative ideas. The more one reads, the more one bumps into similar attempts and déjà vu, which slow down learning and discourage further reading. Showing off the abilities of standard methods to experts from non-technical domains and cherry-picking results from much wider experimentation are the most common schemes. The raison d’être of empiricism has been abused, and it now yields repeated preliminary results with no follow-up.
Experimental computer science
Experimental computer science, defined by Denning (1980) as “an apparatus to be measured, a hypothesis to be tested, and systematic analysis of the data (to see whether it supports the hypothesis)”, is pervasive in machine learning, algorithm development, and software engineering. Nevertheless, the experimental methodology has been twisted: instead of testing conjectures formulated beforehand, experiments are run to provide material from which conjectures are decided retroactively, that is, to build a posteriori theories.
Machine learning, for instance, is based on trials involving performance measures, learners, and data. This combination of elements led Langley (1988) to encourage practitioners to take up empirical testing as a process of theory formation. Competitive testing, a term coined by Hooker (1995) for the evaluation of heuristics, became the chaotic aftermath of that call. Many years later, no new learning paradigm has emerged, some progress has been made in standards, and micro-tuning of existing techniques is the research trend, a gold mine for publications. Superiority is usually claimed through a three-step procedure: select a few data sets, select some well-known learners to compare against, and draw performance conclusions supported by misapplied statistical tests. With a pessimistic but very realistic description of the scene, Demsar (2008) warned of the misuse of such experimentation. Conventional statistical tests are designed for a single comparison in isolation; they are ill-suited to multiple comparisons.
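To make the multiple-comparison problem concrete, the following Python sketch (an illustration, not a procedure taken from the works cited here) computes the probability of at least one false positive when m independent tests are each run at level α, namely 1 − (1 − α)^m; with α = 0.05 and twenty comparisons this already exceeds 60%. It also implements Holm's step-down correction, one standard way to control that family-wise error. The function names and the p-values in the usage example are hypothetical.

```python
def family_wise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def holm_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm's step-down procedure, which controls the family-wise error rate
    at level alpha. Returns, in the original order, whether each null
    hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    reject = [False] * m
    for rank, i in enumerate(order):
        # The (rank + 1)-th smallest p-value is compared with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first retained hypothesis; larger p-values are retained too
    return reject


if __name__ == "__main__":
    for m in (1, 5, 10, 20):
        print(m, round(family_wise_error_rate(0.05, m), 3))
    # Hypothetical p-values from comparing one learner against four others
    p = [0.01, 0.04, 0.03, 0.005]
    print([x <= 0.05 for x in p])  # uncorrected: every comparison looks significant
    print(holm_reject(p))          # Holm: only the two smallest p-values survive
```

With these hypothetical p-values, all four comparisons look significant at 0.05 when taken one at a time, whereas Holm's procedure retains only two of them. The independence assumption is needed only for the closed-form error rate; Holm's procedure itself remains valid under arbitrary dependence between tests.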
Hypothesis testing is useful to decide whether the apparent accuracy of a learner could be due to chance, but its power goes down as the number of data sets examined increases. It is therefore worth determining the ideal size of the test set and which problems should be included, and strengthening the testing methodology with sufficient data analysis. These are old demands, and one would expect to find them satisfied when reading papers. Yet they are hard milestones to reach, and such studies yield many negative results. Although negative results can also drive progress, the community disregards them. This forces researchers back to the classical developments. In addition, groundless rejections frustrate researchers, and that frustration shows in their own subsequent reviews. In turn, having been taught that going against the mainstream does not pay, they will unwittingly block promising ideas, frustrating new generations again.
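The question of how many data sets a comparison needs can be illustrated with the exact two-sided sign test, a simple and commonly used way to compare two learners over several data sets by counting wins and losses. The sketch below is only an illustration under stated assumptions: it treats data sets as independent trials and assumes ties have already been discarded; the function name is hypothetical.

```python
from math import comb


def sign_test_p_value(wins: int, losses: int) -> float:
    """Two-sided exact sign test. Under the null hypothesis that two learners
    are equally good, each data set is a fair coin flip, so the number of wins
    follows Binomial(n, 1/2). Ties are assumed to have been discarded."""
    n = wins + losses
    k = max(wins, losses)
    # P(X >= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # p-value of a clean sweep (winning on every data set) for 4 to 7 data sets
    for n in range(4, 8):
        print(n, sign_test_p_value(n, 0))
```

Under these assumptions, a learner that wins on all five of five data sets still yields p = 0.0625, which is not significant at the 0.05 level; six clean wins are needed (p = 0.03125). Under this test, then, a handful of hand-picked data sets cannot by itself support a claim of superiority.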
Gaming the system in lieu of research
The clout of journals and reviewers, and the inertia of the scientific community as a society, have a lot to do with how incoming contributions are validated.
Current research is like politics: each faction has its own press. No matter how thorough the content, if a work submitted to a journal is not aligned with the thinking of its editorial staff, it will never get the green light. As a result, contributions focus more on pre-empting reviewers' opinions than on disseminating the work. Demsar (2008) suggests a web-to-peer review. This seemingly far-fetched idea, which appears to enable critical and fair evaluations of “correctness, interestingness, usefulness, beauty, novelty”, also shows how urgent it is to adopt other measures of productivity and recognition and to put an end to feigned rigour and biased opinions. The new peer-review process should restore the credibility of publications, and researchers should not be able to game it.
Indeed, references play a crucial role in the shallow statistics mentioned above, since everyone knows that they feed the productivity computation. Thus, self-citations, citations to friends and to the community clique, and citations to particular journals are some of the mechanisms used to climb the rankings. Citing has lost its purpose: guiding readers to the background they need to understand the paper.
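How little it takes to move these indicators can be seen by computing them. The Python sketch below follows the usual definitions: the h-index is the largest h such that h papers have at least h citations each, and the g-index is the largest g such that the g most-cited papers total at least g² citations (capped here at the number of papers, a simplifying assumption). The function names and the citation counts in the example are hypothetical.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break  # counts are descending, so no later paper can qualify
    return h


def g_index(citations: list[int]) -> int:
    """Largest g such that the g most-cited papers have at least g**2
    citations in total (capped at the number of papers in this sketch)."""
    counts = sorted(citations, reverse=True)
    g, total = 0, 0
    for rank, c in enumerate(counts, start=1):
        total += c  # cumulative citations of the top `rank` papers
        if total >= rank * rank:
            g = rank
    return g


if __name__ == "__main__":
    honest = [10, 8, 5, 4, 3, 2, 1, 0]  # hypothetical citation record
    gamed = [10, 8, 5, 5, 5, 3, 1, 0]   # same record plus four extra citations
    print(h_index(honest), g_index(honest))
    print(h_index(gamed), g_index(gamed))
```

In this made-up record, four extra citations, for instance self-citations spread over three mid-ranked papers, raise the h-index from 4 to 5 and the g-index from 5 to 6 without a single new idea.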
A reinterpreted experimental science and a deep knowledge of the system have been the means by which academic researchers satisfy demanding productivity requirements. Unfortunately, this practice is learnt by the new generation of researchers, who will mistake research for poor scientific journalism or mere scientific patter. Publications should be the recognition of mature work, and their pace should slow down to gain in quality.
Demsar, J. “On the appropriateness of statistical tests in machine learning.” Proceedings of the 3rd Workshop on Evaluation Methods for Machine Learning. 2008.
Denning, P.J. “What is experimental computer science?” Communications of the ACM 23, no. 10 (1980): 543-544.
Hooker, J.N. “Testing heuristics: We have it all wrong.” Journal of Heuristics 1, no. 1 (1995): 33-42.
Langley, P. “Machine learning as an experimental science.” Machine Learning 3, no. 1 (1988): 5-8.
Parnas, D.L. “Stop the numbers game.” Communications of the ACM 50, no. 11 (2007): 19-21.
Rousseau, J.J. Les confessions. Paris: Librairie Générale Française, 1972.
Russell, B. The problems of philosophy. New York: Oxford University Press, 1997.