A dramatic increase in computing power has enabled new areas of data science, often called machine learning, to develop within statistical modeling and artificial intelligence. Machine learning covers predictive and descriptive learning, and bridges theoretical and empirical ideas across disciplines. We will focus on concepts and methods for predictive learning: estimating models from data to predict unknown outcomes. Model types will include decision trees, linear models, nearest neighbor methods, and others as time permits. We will cover classification and regression using these models, as well as methods needed to handle large datasets. Lastly, we will discuss deep neural networks and other methods at the forefront of machine learning. We situate the course components in the "data science life cycle," which is part of the larger set of practices in the discovery and communication of scientific findings. The course will include lectures, readings, homework assignments, exams, and a class project. Most course activities will use Python with the Pandas library, in which students should already be proficient. Students will learn to use the scikit-learn Python library for machine learning during this course.
Prerequisite: Students should be familiar with the concepts of tabular data (tables) and data types (categorical, ordinal, continuous, etc.) and be able to work with them in Python using Pandas. One of STAT/CS/IS 107, IS 205, or INFO 407, or at least one semester of programming experience using Python and Pandas, is recommended as a prerequisite. Students should also be comfortable with basic geometry concepts such as points, lines, and distances. Restricted to students with Sophomore, Junior, or Senior standing.