A new version of the data complexity library (DCoL) in C++ is available at http://dcol.sourceforge.net/.
DCoL implements a set of measures, originally proposed by Ho and Basu (2002), that characterize the apparent complexity of data sets for supervised learning. More specifically, the implemented measures focus on the complexity of the class boundary and estimate (1) the overlap in feature values between classes, (2) the class separability, and (3) the geometry, topology, and density of manifolds. The library also includes two complementary functionalities: (4) stratified k-fold partitioning and (5) routines to transform m-class data sets (m > 2) into m two-class data sets. The source code compiles on multiple platforms (Linux, Mac OS X, and MS Windows) and can be easily configured and run from the command line.
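To make functionalities (4) and (5) concrete, the following is a minimal C++ sketch of the two ideas. It is not taken from the DCoL sources: the function names `oneVsRest` and `stratifiedFolds`, the integer label encoding, and the round-robin fold assignment are all assumptions made for illustration, and the library's own routines may work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper (not DCoL's API): relabel an m-class label vector as a
// two-class problem, with 1 for the class `positive` and 0 for all others.
// Calling it once per class yields the m two-class data sets.
std::vector<int> oneVsRest(const std::vector<int>& labels, int positive) {
    std::vector<int> binary;
    binary.reserve(labels.size());
    for (int y : labels) {
        binary.push_back(y == positive ? 1 : 0);
    }
    return binary;
}

// Hypothetical helper (not DCoL's API): assign every example to one of k
// folds so that each class is spread as evenly as possible across the folds.
// Examples are dealt round-robin within each class, in input order; a real
// implementation would normally shuffle the examples first.
std::vector<int> stratifiedFolds(const std::vector<int>& labels, int k) {
    assert(k > 0);
    std::vector<int> fold(labels.size());
    std::map<int, int> seen;  // examples seen so far, per class label
    for (std::size_t i = 0; i < labels.size(); ++i) {
        fold[i] = seen[labels[i]]++ % k;
    }
    return fold;
}
```

For the labels {0, 1, 2, 0, 1, 2, 0, 0}, `oneVsRest(labels, 0)` marks the four examples of class 0 as positive, and `stratifiedFolds(labels, 2)` places two examples of class 0 and one example each of classes 1 and 2 in every fold. Because each class starts dealing at fold 0, the earliest folds can end up slightly larger when class sizes are not multiples of k.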
Practitioners are encouraged to consider using this software in the analysis of their data. A closer look at data complexity can help them understand the performance and behavior of machine learning techniques.