Statistics, Machine Learning, and Data Mining
Syllabus for MA 322 Section 802 Spring 2026
Info
3 Credits
Mon 3:10pm - 6:00pm
Room: C415
Instructor Information
calvin_williamson@fitnyc.edu
office: B831 Science and Math
office hours: M 1-3, T 12-1, R 12-1
Description
This is an introduction to statistical techniques for machine learning and data mining. It emphasizes mathematical methods and computer applications related to automated learning for prediction, classification, knowledge discovery, and forecasting in modern data science. Special emphasis will be given to the collection, mining, and analysis of massive data sets. (G2: Mathematics) Prerequisite(s): MA 222 and mathematics proficiency (see beginning of Mathematics section)
Outcomes
Upon completion of this course students will be able to:
- Describe the concepts of machine learning and identify examples of its use in data science.
- Employ statistical software to collect data, create training and test sets, and perform predictions.
- Create regression models for predicting outcome variables in terms of predictors.
- Explain the contributions of Google in understanding web-scale data and the structure of the internet.
- Identify the characteristics of massive data sets and describe the tools needed to analyze them.
- Analyze decision tree models and display them with appropriate graphics.
- Use recommendation systems software and understand how it makes suggestions based on similarity measures.
- Perform classifications for data sets using nearest neighbor and probabilistic algorithms.
- Collect text data and use text mining software to perform sentiment analysis.
Course Materials
We will be using Google Colab and Google Sheets for all work in this course. Since these are web-based applications, there is NO OTHER SOFTWARE required for the course besides a web browser.
Topics
Regression
- Simple Regression
- Multiple Regression
- Applications
- Conjoint Analysis
Introduction to Python
- Google Colab Notebook
- Using LLM as Coding Assistant
- Calculations
- Variables
- Data Types
- Lists
- Dictionaries
- Functions
- Dataframes
- f-Strings
Introduction to Large Language Models (LLMs)
- LLM Examples
- ChatGPT, GPT-4o, Gemini, Claude
- Completions, APIs
Prompt Engineering
- Prompting
- Prompt Chaining
- Roles and Personas
- Chain of Thought
- Few-Shot and Zero-Shot Learning
Machine Learning
- Classification, Accuracy
- Training, Testing
- Decision Trees
Evaluation
Your grade will come from these parts:
- Quizzes (85%)
- In Class/Homework (15%)
Each of these parts is described in more detail below.
Quizzes
Your quiz grade will come from 5 quizzes, each covering roughly 2 or 3 weeks of material. Each quiz lasts 30-45 minutes and usually has 5 or 6 questions. Quizzes are taken with no notes, no internet, no phone, no software, and no AI tools; only pen, paper, and a calculator are allowed. Questions are a mix of multiple choice, short answer, and true/false.
In Class/Homework (1 or 2 per class)
These are credits you earn by demonstrating that you have completed assigned problems. Some come from homework assignments that you show me at the beginning of class; others come from in-class assignments that you complete during class and show me as you finish them. You earn one credit for each successfully completed assignment. You must be in attendance to earn these credits.
There is NO FINAL EXAM.
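The 85% / 15% split above can be illustrated with a short weighted-average calculation. This is only a sketch: the syllabus does not say how the 5 quizzes are combined or how In Class/Homework credits become a percentage, so the code ASSUMES the quizzes are weighted equally and that the In Class/Homework score is credits earned divided by credits available. The function `course_grade` and its parameters are hypothetical names used for illustration.

```python
# Hypothetical sketch of the 85% / 15% weighting described in this syllabus.
# ASSUMPTIONS (not stated in the syllabus): all quizzes count equally, and
# In Class/Homework is scored as credits_earned / credits_available.
QUIZ_WEIGHT = 0.85
CLASSWORK_WEIGHT = 0.15


def course_grade(quiz_scores, credits_earned, credits_available):
    """Return a course percentage (0-100) under the assumptions above.

    quiz_scores: list of quiz percentages (0-100), one per quiz.
    credits_earned, credits_available: In Class/Homework credit counts.
    """
    if not quiz_scores:
        raise ValueError("need at least one quiz score")
    if credits_available <= 0:
        raise ValueError("credits_available must be positive")
    # Equal-weight average of the quiz percentages (assumed).
    quiz_avg = sum(quiz_scores) / len(quiz_scores)
    # Fraction of available credits earned, as a percentage (assumed).
    classwork_pct = 100 * credits_earned / credits_available
    return QUIZ_WEIGHT * quiz_avg + CLASSWORK_WEIGHT * classwork_pct


print(course_grade([90, 80, 85, 95, 75], 18, 20))
```

With quiz scores of 90, 80, 85, 95, and 75 (average 85) and 18 of 20 credits (90%), this gives about 0.85 × 85 + 0.15 × 90 = 85.75.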
AI Policy
Outside of quizzes, all uses of chatbots are encouraged, and there is no restriction on their use. This applies especially to topics about large language models (ChatGPT, Gemini, Claude, etc.).