23 Mar 2010
New Project On Data Mining
I am happy to announce that I am starting work on my first research project for Copenhagen University. The project is about data mining on a huge collection of XML documents. Since we all like functional languages at our faculty, most of it will be implemented in them. So I hope there will soon be a lot of posts about data mining and functional languages!
March 28th, 2010 - 21:52
Python + lxml will help you out
March 29th, 2010 - 09:02
Hi Anton!
If you don’t mind, I will respond in English.
Anton suggests using Python + lxml. Well, the thing is that there is no essential difference between using one language or another. The goal of the project is a bit different. But since we all like functional programming at our faculty, I am using Haskell. At least for now =)
March 30th, 2010 - 21:54
Is speed essential when we are talking about huge amounts of data, or does it not matter in this particular case?
March 30th, 2010 - 22:11
Hmm, after a little research I found that Haskell is (in most cases) faster than Python, or at least equal =)
March 30th, 2010 - 22:15
Well, first, I don’t think that Python with lxml would be much faster in this case. I am not using pure Haskell either; I use a dedicated module for parsing. Besides, speed is not that essential right now. Also, the preprocessing scripts and the first planned analysis logic are not that hard and can easily be rewritten later on.
But anyhow, I will try Python + lxml and compare the speed. Because I trust you, Anton =))