IuK2001 The 7th Annual Meeting of the IuK Initiative
Information and Communication of the Learned Societies in Germany
»Cooperative Systems«
Trier, March 11 - 14, 2001
 


“Data Mining” in Chemistry

Dr. Alexander Kos

AKos Consulting & Solutions GmbH
Rössligasse 2
CH-4125 Riehen
email: alexander.kos@akosgmbh.de
homepage: www.akosgmbh.de

Data mining means finding correlations between data that are not obvious and for which one cannot search, or has not searched, specifically. Data by themselves are meaningless; only the correlation of data creates knowledge. One way of data mining in chemistry is the clustering of chemical structure databases. This clustering or, less rigorously, “grouping” of structures can be done by direct numerical approaches, by indirect numerical approaches, or without any numerical processing by using visualization software.

We will illustrate this with three programs. “MDL’s Reagent Selector” uses, among other methods, K-Means, a non-hierarchical, distance-based clustering algorithm. PASS (Prediction of Activity Spectra of Substances) groups substances in a very indirect way, by predicting their biological activity spectra from atom-centred keys. Miner3D.excel is visualization software with which one “browses” through data sets until one finds correlations.
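
To make the idea of non-hierarchical, distance-based clustering concrete, here is a minimal K-Means sketch in Python. It is not the implementation used in MDL’s Reagent Selector, which is not described here. The bit vectors, the function names, and the choice of the first k structures as starting centroids are all assumptions made for illustration only.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: List[Sequence[float]], k: int, max_iter: int = 100) -> List[int]:
    """Assign each point to one of k clusters and return the cluster labels."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Assumption: start deterministically from the first k points.
    # Real programs usually pick the starting centroids more carefully.
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)

    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the result is stable
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    return labels


# Hypothetical 4-bit structure keys (1 = fragment present), invented for this sketch.
fingerprints = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
]

labels = k_means(fingerprints, k=2)
print(labels)
```

The method is non-hierarchical because it produces one flat partition into k groups rather than a tree of nested clusters, and the number of groups has to be chosen in advance.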

The talk will end with a demonstration of miner3d.web, a prototype that “clusters” the results of a web search, e.g. in Alta Vista. Whereas clustering is easy for chemical structures, it is difficult in other areas such as natural language. This presentation should illustrate how the problems of searching huge databases in chemistry are very closely related to any problem of searching for information. Here is an area where linguists, mathematicians, chemists, and many more meet to develop clever search engines for the Internet.

 

The IuK 2001 is organized by the German Psychological Association (DGPs) and the Institute for Psychology Information (ZPID).

Last updated: February 7, 2001 · info@zpid.de · URL: http://www.zpid.de/iuk2001/