External Exchange

Academic Lecture

Published: 2009-03-31    Views: 2929

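Dr. Wei Fan, an internationally renowned data mining scholar and a researcher at IBM Watson Research in the United States, will visit our university and give an academic talk at 10:00 a.m. on April 6 in Room 404, Teaching Building 7. The title and abstract of the talk and a short biography of the speaker are given below. All who are interested are warmly encouraged to attend.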

 

Thank you for your attention!

 

From Feature Construction, to Simple but Effective Inductive Modeling, towards Domain Transfer

 

Wei Fan, http://www.cs.columbia.edu/~wfan or http://www.weifan.info

 

This talk covers a "sequence" of solutions to some of the most important problems in data mining. In real-world applications, data is rarely in feature-vector format; it is normally semi-structured or unstructured. Examples include transaction sequences, social networks, network connection events, biological sequences, still images, and video. The main problem is that, in order to use most of today's inductive learning methods, one first has to come up with predictive feature vectors from these raw data. We discuss a method called Model-based Tree (MbT) that uses frequent patterns in raw data to search for highly predictive patterns from which to construct good features. This search, however, is an NP-hard problem. The proposed method uses a divide-and-conquer principle to avoid exhaustive search. It has linear scalability and can discover features that train models with accuracy higher than benchmark results on some of the most difficult problems. Many of the features constructed by MbT cannot be found by existing approaches without running into a prohibitive combinatorial explosion.

Once data is in feature-vector format, the next important question is which inductive algorithm to use. Algorithm selection is non-trivial, given how many inductive learning algorithms exist. We discuss a method called Random Decision Tree (RDT) that is remarkably simple to use and works for all three major inductive learning problems (classification, regression, and probability estimation). The main advantages of RDT are simplicity, accuracy, efficiency, a natural fit for streaming data, and robustness against sample selection bias. One of its applications, to weather forecasting, won the ICDM'06 best application paper award, and our submission using RDT to the ICDM'08 Data Mining Contest won the championship.

The third important scenario is that training and testing data may not always come from the same distribution, as one would desire. In the last part of the talk, we will discuss a few effective and novel approaches for transferring knowledge from a related but different domain into the target domain (for example, using Reuters data to make predictions on New York Times articles). The source data and data sets for some of the solutions are available from the speaker's homepage, http://www.cs.columbia.edu/~wfan

 

Bio Sketch of Wei Fan:                                          

 

Dr. Wei Fan received his PhD in Computer Science from Columbia University in 2001 and has been working at IBM T.J. Watson Research since 2000. He has published more than 60 papers in top data mining, machine learning, and database conferences, such as KDD, SDM, ICDM, ECML/PKDD, SIGMOD, VLDB, ICDE, AAAI, and ICML. Dr. Fan has served as Area Chair or Senior PC member of SIGKDD'06, SDM'08, and ICDM'08, sponsorship co-chair of SDM'09, and award committee member of ICDM'09, as well as PC member of several prestigious conferences in the area, including KDD'09/08/07/05, ICDM'07/06/05/04/03, SDM'09/07/06/05/04, CIKM'08/07/06, ECML/PKDD'07/06, ICDE'04, AAAI'07, PAKDD'09/08/07, EDBT'04, and WWW'09/08/07. He is on the advisory board of KD2U. Dr. Fan was invited to speak at ICMLA'06. He served as a US NSF panelist in 2007/08. His main research interests and experience are in various areas of data mining and database systems, such as risk analysis, high-performance computing, extremely skewed distributions, cost-sensitive learning, data streams, ensemble methods, easy-to-use nonparametric methods, graph mining, predictive feature discovery, feature selection, sample selection bias, transfer learning, novel applications, and commercial data mining systems. He is particularly interested in simple, unconventional, but effective methods for solving difficult problems. His thesis work on intrusion detection has been licensed by a start-up company since 2001. His team's submission using Random Decision Tree won the ICDM'08 Contest Crown Awards (Championship). His co-authored paper in ICDM'06, which uses "Randomized Decision Tree" to predict skewed ozone days, won the best application paper award. His co-authored paper in KDD'97 on the distributed learning system "JAM" won the runner-up best application paper award.