IS 577 Data Mining

Data mining refers to the process of exploring large datasets with the goal of uncovering interesting patterns. This process usually involves a number of tasks, such as data collection, pre-processing, and characterization; model fitting, selection, and evaluation; and classification, clustering, and prediction. Although data mining has its roots in database management, it has grown into a discipline that focuses on algorithm design (to ensure computational feasibility) and statistical modeling (to separate the signal from the noise). It draws heavily upon a variety of other disciplines, including statistics, machine learning, operations research, and information retrieval. This course will cover the major data mining concepts, principles, and techniques that every information scientist should know. Lectures will introduce and discuss the major approaches to data mining; computer lab sessions coupled with assignments will provide hands-on experience with these approaches; and term projects will offer the opportunity to use data mining in a novel way. Mathematical detail will be left to students who are so inclined.