Data complexity in supervised learning

My thesis, Data complexity in supervised learning: A far reaching implication, is finally available online.

This thesis takes a close look at data complexity and its role in shaping the behaviour of machine learning techniques in supervised learning, and explores the generation of synthetic data sets through complexity estimates. The work is built upon four principles that follow naturally from one another. (1) A critique of the methodologies currently used by the machine learning community to evaluate the performance of new learners sparks (2) an interest in alternative estimates based on the analysis and study of data complexity. However, both the early stage of the complexity measures and the limited availability of real-world problems for testing inspire (3) the generation of synthetic problems, which becomes the backbone of this thesis, and (4) the proposal of artificial benchmarks resembling real-world problems.

The long-term goal of this line of research is to provide practitioners with (1) guidelines for choosing the most suitable learner for a given problem and (2) a collection of benchmarks to either assess the performance of learners or test their limitations.

DCoL: New release v1.1

A new version of the data complexity library (DCoL) in C++ is available at http://dcol.sourceforge.net/.

DCoL provides the implementation of a set of measures designed to characterize the apparent complexity of data sets for supervised learning, which were originally proposed by Ho and Basu (2002). More specifically, the implemented measures focus on the complexity of the class boundary and estimate (1) the overlaps in the feature values from different classes, (2) the class separability, and (3) the geometry, topology, and density of manifolds. In addition, two other complementary functionalities, (4) stratified k-fold partitioning and (5) routines to transform m-class data sets (m > 2) into m two-class data sets, are included in the library. The source code can be compiled across multiple platforms (Linux, Mac OS X, and MS Windows) and can be easily configured and run from the command line.
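
As a rough illustration of the first family of measures, the following Python sketch computes the maximum Fisher's discriminant ratio, one of the feature-overlap measures described by Ho and Basu (2002), for a two-class problem. It is not DCoL code and does not mirror the library's interface; the function name fisher_ratio and the use of the population variance are assumptions made for this example.

```python
from statistics import mean, pvariance


def fisher_ratio(class_a, class_b):
    """Maximum Fisher's discriminant ratio over all features (two classes).

    class_a, class_b: lists of equal-length numeric feature vectors,
    one list per class. Hypothetical helper, not part of DCoL.
    """
    n_features = len(class_a[0])
    best = 0.0
    for j in range(n_features):
        a = [row[j] for row in class_a]
        b = [row[j] for row in class_b]
        gap = (mean(a) - mean(b)) ** 2          # squared distance between class means
        spread = pvariance(a) + pvariance(b)    # assumption: population variances
        if spread == 0:
            # Constant feature in both classes: perfectly separating or useless.
            ratio = float("inf") if gap > 0 else 0.0
        else:
            ratio = gap / spread
        best = max(best, ratio)
    return best


# Toy data: the first feature separates the classes, the second does not.
class_a = [[1.0, 5.0], [3.0, 6.0]]
class_b = [[7.0, 5.0], [9.0, 6.0]]
print(fisher_ratio(class_a, class_b))  # prints 18.0
```

A higher value suggests that at least one feature separates the two classes well; a value near zero suggests heavy overlap along every individual feature.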

Practitioners are encouraged to consider the use of this software in the analysis of their data. A closer reading of data complexity can help them to understand the performance of machine learning techniques and their behavior.
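
The two complementary functionalities mentioned in the library description, stratified k-fold partitioning and the decomposition of an m-class data set into m two-class data sets, can be sketched in a few lines of Python. This is a minimal illustration, not DCoL's implementation: the function names one_vs_rest and stratified_folds are hypothetical, and the folds are assigned deterministically (round-robin within each class) rather than at random, an assumption made to keep the example short.

```python
from collections import defaultdict


def one_vs_rest(labels):
    """Turn an m-class labelling into m two-class labellings.

    Returns {class: list of 0/1 labels}, where 1 marks examples of that class.
    """
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def stratified_folds(labels, k):
    """Split example indices into k folds with similar class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [[] for _ in range(k)]
    offset = 0  # carries over between classes so leftovers spread across folds
    for y in sorted(by_class):
        for pos, i in enumerate(by_class[y]):
            folds[(offset + pos) % k].append(i)
        offset += len(by_class[y])
    return folds


labels = ["a", "a", "a", "a", "b", "b", "c", "c"]
print(stratified_folds(labels, 2))  # prints [[0, 2, 4, 6], [1, 3, 5, 7]]
print(one_vs_rest(labels)["b"])     # prints [0, 0, 0, 0, 1, 1, 0, 0]
```

Each fold here receives two examples of class a and one each of classes b and c, matching the 4:2:2 proportions of the full data set.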

Universitat d’Estiu d’Andorra

After a first-rate opening in May with the talk given by Prof. Cirac, the 27th edition of the Universitat d’Estiu d’Andorra officially starts today, August 30, with a promising agenda:

6:00 pm: Equilibri climàtic del planeta Terra (Climate balance on planet Earth), presented by Josefina Castellví Piulachs, an oceanographer specializing in marine bacteriology (Barcelona).
7:30 pm: La bellesa és dins el cervell? (Is beauty in the brain?), presented by Jean-Pierre Changeux, doctor in biology and pioneer of modern neurobiology (Paris).

For five days, Andorra will offer, under the interesting title Del cosmos a l’àtom passant per la vida (From the cosmos to the atom, by way of life), a series of talks focussed on science and society.

ICPR 2010 – Contest: Extended Deadline May 26

Call for Contest Participation – Classifier domains of competence: The landscape contest (ICPR 2010)

Classifier domains of competence: The landscape contest is a research competition aimed at finding out the relation between data complexity and the performance of learners. Comparing your techniques to those of other participants on targeted-complexity problems may help enrich our understanding of the behavior of machine learning techniques and open further lines of research.

The contest will take place on August 22, during the 20th International Conference on Pattern Recognition (ICPR 2010) in Istanbul, Turkey.

We encourage everyone to participate and share your work with us! For further details about dates and submission, please see http://www.salle.url.edu/ICPR10Contest/.

ICPR 2010 – Contest

Classifier domains of competence: The landscape contest is a research competition aimed at finding out the relation between data complexity and the performance of learners. Comparing your techniques to those of other participants may help enrich our understanding of the behavior of machine learning techniques and open further lines of research. Contest participants are allowed to use any type of technique. However, we highly encourage and appreciate the use of novel algorithms.

The contest will take place on August 22, during the 20th International Conference on Pattern Recognition (ICPR 2010) in Istanbul, Turkey.

We are planning to hold a one-day workshop during ICPR 2010, so that participants will be able to present and discuss their results.

We encourage everyone to participate and share your work with us! For further details about dates and submission, please visit The landscape contest webpage.

On our way toward the engineer of the future

How to design the engineer of the future was the subject of the EF2009 workshop, which was held on November 12 at La Salle, in Barcelona.

The two keynotes, How can engineering education address the challenges of the 21st century? by Ms. Lueny Morell and The missing basics: What engineers don’t know and why they don’t know them by Prof. David E. Goldberg, were enlightening and inspiring, and an excellent starting point for professors, students, and engineers eager to take part in a new era for engineering. Both speakers told beautiful and encouraging stories about change: stories that took place in very different lands and under different circumstances, but with a common aim, implementing approaches to train the new generation of engineers, and a common villain, the education establishment.