Data Mining in Chemistry
Dr. Alexander Kos
Data mining is the search for correlations in data that are not obvious
and for which one cannot search, or has not searched, specifically. Data
by themselves are meaningless; only the correlation of data creates
knowledge. One way of mining data in chemistry is to cluster chemical
structure databases. This clustering, or, if less rigorous, grouping of
structures can be done by direct numerical approaches, by indirect
numerical approaches, or without any numerical processing at all, by
using visualization software.
We will illustrate three programs. MDL's Reagent Selector uses, among
other methods, K-Means, a non-hierarchical, distance-based clustering
algorithm. PASS (Prediction of Activity Spectra of Substances) groups
substances in a very indirect way, by predicting their biological
activity spectra from atom-centred keys. Miner3D.excel is visualization
software with which one browses through data sets until one finds
correlations.
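The K-Means idea can be illustrated with a minimal sketch of Lloyd's algorithm. It assumes that each structure has already been encoded as a binary fingerprint (a bit vector of structural keys) and that squared Euclidean distance is the distance measure; for 0/1 vectors this equals the number of differing bits. The deterministic farthest-first seeding is chosen here only to make the result reproducible. None of this is taken from MDL's Reagent Selector, and the molecule names and bit patterns are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance; for 0/1 vectors this equals the Hamming distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def farthest_first_seeds(points: Sequence[Vector], k: int) -> List[List[float]]:
    """Deterministic seeding (an assumption for this sketch): start at points[0],
    then repeatedly add the point farthest from its nearest chosen seed."""
    seeds = [list(points[0])]
    while len(seeds) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, s) for s in seeds))
        seeds.append(list(farthest))
    return seeds


def k_means(points: Sequence[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate assignment and centroid-update steps
    until no point changes cluster (or max_iter is reached)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = farthest_first_seeds(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return labels, centroids


# Hypothetical 8-bit structural keys for six made-up molecules.
fingerprints: Dict[str, List[int]] = {
    "mol_A": [1, 1, 0, 0, 1, 0, 0, 0],
    "mol_B": [1, 1, 0, 0, 1, 1, 0, 0],
    "mol_C": [1, 1, 1, 0, 1, 0, 0, 0],
    "mol_D": [0, 0, 0, 1, 0, 0, 1, 1],
    "mol_E": [0, 0, 1, 1, 0, 0, 1, 1],
    "mol_F": [0, 0, 0, 1, 0, 1, 1, 1],
}

labels, centroids = k_means(list(fingerprints.values()), k=2)
for name, label in zip(fingerprints, labels):
    print(name, "-> cluster", label)
```

With these made-up keys, mol_A to mol_C share one group of bits and mol_D to mol_F another, so the sketch places them in two separate clusters. K-Means is non-hierarchical because it yields one flat partition into k groups rather than a tree, and k has to be chosen in advance.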
The talk will end by showing miner3d.web, a prototype that clusters the
results of a web search (here, in AltaVista). Whereas clustering is
comparatively easy for chemical structures, it is difficult in other
areas such as natural language. This presentation should illustrate how
closely the problems of searching huge chemical databases are related to
the general problem of searching for information. This is an area where
linguists, mathematicians, chemists, and many others meet to develop
clever search engines for the Internet.