21 April 2012
Real-world text mining using machine learning
Jan Žižka (Czech Republic, Brno)
Category: Spring 2012.
The seminar took place on 21.04.2012
Today, huge volumes of text data are available, especially on the Internet. Very often the data is unstructured, written freely by Internet users in natural language. Such data is expected to contain interesting or valuable information that can serve different goals in many application areas. Because the data is so large, processing it manually within an acceptable time is very difficult or impossible. Fortunately, modern computing makes it possible to apply sophisticated methods from artificial intelligence, in particular the family of algorithms known as machine learning. Machine learning methods applied to text mining are based on inductive learning from existing examples. The first part of the talk gives a brief introduction to some machine learning methods applied to text mining. The main problems concern appropriate preprocessing of the data, designing the mining procedure (including the selection of suitable algorithms), and interpreting the results. The second part presents some interesting results obtained from real-world data. The data consists of customer reviews expressing opinions and sentiments about hotel accommodation services all over the world. The reviews were written by hundreds of thousands of customers in many languages. The described research focused on revealing typical words and phrases in several languages, including English, Spanish, French, German, Japanese, Russian, Czech, and others.
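The abstract does not say which algorithms or preprocessing steps were used in the research, so the following is only a minimal sketch of one simple way to reveal words typical of a sentiment class from labeled examples. It assumes a bag-of-words representation and ranks words by a smoothed log-odds score; the toy reviews, the labels, and the names `tokenize` and `typical_words` are hypothetical and are not taken from the talk.

```python
import math
import re
from collections import Counter

# Hypothetical toy reviews; the hotel-review data from the talk is not reproduced here.
REVIEWS = [
    ("positive", "Friendly staff and a very clean room"),
    ("positive", "Great location, friendly staff, excellent breakfast"),
    ("positive", "Clean room and excellent location"),
    ("negative", "Dirty room and rude staff"),
    ("negative", "Noisy room, dirty bathroom, poor breakfast"),
    ("negative", "Rude staff and a noisy location"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def typical_words(reviews, target, top_n=3, smoothing=1.0):
    """Rank words by smoothed log-odds of occurring in `target` reviews versus all others."""
    in_target, in_rest = Counter(), Counter()
    for label, text in reviews:
        (in_target if label == target else in_rest).update(tokenize(text))

    vocab = set(in_target) | set(in_rest)
    # Additive smoothing keeps the score finite for words unseen in one class.
    total_target = sum(in_target.values()) + smoothing * len(vocab)
    total_rest = sum(in_rest.values()) + smoothing * len(vocab)

    scores = {
        word: math.log((in_target[word] + smoothing) / total_target)
        - math.log((in_rest[word] + smoothing) / total_rest)
        for word in vocab
    }
    # Highest score first; ties are broken alphabetically for a stable result.
    return sorted(vocab, key=lambda word: (-scores[word], word))[:top_n]


if __name__ == "__main__":
    for sentiment in ("positive", "negative"):
        print(sentiment, typical_words(REVIEWS, sentiment))
```

Because the tokenizer relies on Unicode word characters, the same sketch can be applied to space-delimited languages such as Russian or Czech without changes. A language written without spaces, such as Japanese, would need a dedicated word segmenter, which is an instance of the preprocessing problem the abstract mentions.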