The alkanes data set

Alessio Micheli
E-mail: micheli@di.unipi.it
http://pages.di.unipi.it/micheli/dataset/index.html

- Input domain: a collection of 150 alkanes, CnH2n+2, with up to 10 carbon atoms (1331 vertices in total). The alkanes are trees (saturated acyclic hydrocarbon molecules).
- Task: prediction of the boiling point temperature in degrees Celsius (regression).
- Target values: boiling point in degrees Celsius divided by 100.

There are 150 trees. The training and test data were obtained by 10-fold cross-validation (e.g. the first fold has 15 compounds in the test set, numbers 1, 11, 21, 31, ...; the second fold has numbers 2, 12, 22, 32, ...).

The trees are represented as strings, which a parser then transforms into input graphs. 'max arity' is the maximum fan-out of the tree vertices. The 'symbol table' shows for each item: the symbol, the fan-out, and [the vectorial label] (always the constant 1 for the alkanes, since a hydrogen-suppressed representation is used).

The data set has one row for each structure, containing: the tree, the target value, and (the ID of the compound).

The tree is given in the parentheses (symbolic) representation. Given a tree T(r) rooted at node r, its parentheses representation is:
- rep[empty] = null string
- rep[T(r)] = label(r)(rep[T(ch_1(r))],rep[T(ch_2(r))],...,rep[T(ch_k(r))])
where ch_j(r) is the j-th child of r.

For details (data, task, machine learning model, results) and references:
- [2009]: A. Micheli. Neural Network for Graphs: A Contextual Constructive Approach. IEEE Transactions on Neural Networks, Vol. 20, No. 3, pp. 498-511, March 2009. IEEE Inc. ISSN 1045-9227.

Origins:
- A.M. Bianucci, A. Micheli, A. Sperduti, A. Starita. Application of Cascade Correlation Networks for Structures to Chemistry. Applied Intelligence Journal (Kluwer Academic Publishers), Special Issue on "Neural Networks and Structured Knowledge", Vol. 12 (1/2), pp. 117-146, 2000.
- A. Micheli. Recursive Processing of Structured Domains in Machine Learning.
  PhD Thesis, Department of Computer Science, University of Pisa, TD-13/03, December 2003. Servizio Editoriale Universitario di Pisa.

Or see more recent publications for improved results.

See also:
- D. Cherqaoui and D. Villemin. Use of neural network to determine the boiling point of alkanes. J. Chem. Soc., Faraday Trans., 1994, 90(1):97-102.
- A. Micheli, F. Portera, A. Sperduti. A preliminary empirical comparison of recursive neural networks and tree kernel methods on regression tasks for tree structured domains. Neurocomputing, Vol. 64, pp. 73-92, March 2005. Elsevier B.V.
- A. Micheli, D. Sona, A. Sperduti. Contextual Processing of Structured Data by Recursive Cascade Correlation. IEEE Transactions on Neural Networks, Vol. 15, No. 6, pp. 1396-1410, November 2004.

For questions regarding the database content, the origins of the learning dataset, updates on the results for this data, and collaborations, please contact Alessio Micheli (E-mail: micheli@di.unipi.it).
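As an illustration, the parentheses representation rep[T(r)] and the 10-fold assignment described above can be handled with a short Python sketch. It is not part of the data set distribution and rests on assumptions: that a tree string follows exactly the grammar label(r)(child_1,...,child_k), that an empty child string denotes an empty subtree (kept as None so child positions are preserved), and that compounds are numbered from 1 in file order. The names Node, parse_tree, count_vertices, max_arity, fold_split and target_to_celsius are hypothetical helpers, and the example strings below are illustrative; the symbols in the actual data files may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree vertex; for alkanes each vertex is a carbon atom (hydrogens suppressed)."""
    label: str
    children: List[Optional["Node"]] = field(default_factory=list)


def _parse(s: str, pos: int) -> Tuple[Optional[Node], int]:
    # Read the label: everything up to the next structural character.
    start = pos
    while pos < len(s) and s[pos] not in "(),":
        pos += 1
    label = s[start:pos]
    if not label:
        # rep[empty] = null string: an empty child slot (assumed representation).
        return None, pos
    node = Node(label)
    if pos < len(s) and s[pos] == "(":
        pos += 1  # consume "("
        while True:
            child, pos = _parse(s, pos)
            node.children.append(child)
            if pos >= len(s):
                raise ValueError("unbalanced parentheses")
            if s[pos] == ",":
                pos += 1  # move on to the next sibling
            elif s[pos] == ")":
                pos += 1  # end of this child list
                break
            else:
                raise ValueError(f"unexpected {s[pos]!r} at position {pos}")
    return node, pos


def parse_tree(text: str) -> Optional[Node]:
    """Parse a parentheses representation into a Node tree (whitespace is ignored)."""
    s = "".join(text.split())
    node, pos = _parse(s, 0)
    if pos != len(s):
        raise ValueError(f"trailing input at position {pos}: {s[pos:]!r}")
    return node


def count_vertices(node: Optional[Node]) -> int:
    """Number of non-empty vertices (carbon atoms) in the tree."""
    if node is None:
        return 0
    return 1 + sum(count_vertices(c) for c in node.children)


def max_arity(node: Optional[Node]) -> int:
    """Maximum fan-out over all vertices, counting empty child slots."""
    if node is None:
        return 0
    return max([len(node.children)] + [max_arity(c) for c in node.children])


def fold_split(fold: int, n_compounds: int = 150, n_folds: int = 10) -> Tuple[List[int], List[int]]:
    """Return (train, test) compound numbers; compound i is tested in fold ((i - 1) % n_folds) + 1."""
    test = [i for i in range(1, n_compounds + 1) if (i - 1) % n_folds + 1 == fold]
    train = [i for i in range(1, n_compounds + 1) if (i - 1) % n_folds + 1 != fold]
    return train, test


def target_to_celsius(target: float) -> float:
    """Undo the scaling of the target values (degrees Celsius / 100)."""
    return target * 100.0
```

For example, parse_tree("C(C,C,C)") yields a root with three children, a hydrogen-suppressed isobutane skeleton with 4 vertices and max arity 3, while fold_split(1) places compounds 1, 11, 21, ..., 141 (15 compounds) in the test set and the remaining 135 in the training set. Keeping empty children as None rather than discarding them is a design choice that preserves positional information in case child positions matter for the fan-out listed in the symbol table.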