Machine Learning

This course provided an in-depth look at the field of Machine Learning, starting with how it differs from data science and how its algorithms learn from data rather than following explicitly programmed rules. I explored data handling, scaling, and practical applications like poll analysis and spam detection, which illustrated machine learning's real-world impact.


I further explored statistical learning, differentiating supervised from unsupervised learning and addressing challenges like the curse of dimensionality. K-Nearest Neighbors (KNN) taught me about classification and regression, and K-Means taught me about clustering unlabeled data.
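
The contrast between the two can be sketched in a few lines of plain Python. This is only an illustration: the toy points, the labels, the function names `knn_predict` and `kmeans`, and the hand-picked starting centroids are assumptions made for the example, not material from the course.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Supervised: label `query` by majority vote of its k nearest labeled points."""
    by_distance = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    nearest_labels = [train_labels[i] for i in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


def kmeans(points, centroids, iterations=10):
    """Unsupervised: alternate assignment and centroid-update steps."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda c: math.dist(point, centroids[c]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Toy data, made up for illustration: two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

# KNN uses the labels; K-Means sees only the points.
prediction_near_a = knn_predict(points, labels, (1.1, 1.0))
prediction_near_b = knn_predict(points, labels, (5.1, 5.0))
final_centroids, clusters = kmeans(points, centroids=[points[0], points[3]])

print(prediction_near_a, prediction_near_b)
print(final_centroids)
```

Starting the centroids at one point from each group keeps the sketch deterministic; practical K-Means implementations usually choose starting centroids randomly and rerun several times.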


The course also emphasized understanding overfitting and underfitting in models, introducing regularization methods that reduce overfitting and improve accuracy on unseen data. It also covered tree-based methods: decision trees, and ensemble methods like bagging and boosting that combine many trees to enhance prediction accuracy.
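
The effect of regularization can be shown with a very small worked example. The text above does not name the specific methods, so an L2 (ridge) penalty on a one-feature linear model is assumed here purely for illustration; the data and the function name `fit_slope` are made up.

```python
def fit_slope(xs, ys, penalty=0.0):
    """Slope w for the model y = w * x (no intercept) with an L2 penalty.

    Minimizes sum((y - w*x)^2) + penalty * w^2, whose closed-form
    solution is w = sum(x*y) / (sum(x*x) + penalty).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


# Made-up data lying roughly on the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

w_plain = fit_slope(xs, ys)                # ordinary least squares
w_ridge = fit_slope(xs, ys, penalty=10.0)  # penalty shrinks the slope toward 0

print(w_plain, w_ridge)
```

The penalty pulls the coefficient toward zero, trading a slightly worse fit on the training data for a simpler model. Whether that helps on unseen data depends on the penalty strength, which is normally chosen by validation.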


Support Vector Machines (SVM) deepened my knowledge of classification and regression, emphasizing the role of hyperplanes and kernel functions. Principal Component Analysis (PCA) introduced me to dimensionality reduction, essential for high-dimensional data analysis.
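
As a rough sketch of what PCA does, the example below reduces 2-D points to 1-D by projecting them onto the direction of greatest variance. It relies on the closed-form angle of the leading eigenvector of a 2x2 covariance matrix, so it only works in two dimensions; the data and the function name `pca_2d` are assumptions for illustration, and real use would rely on a library implementation.

```python
import math


def pca_2d(points):
    """Project 2-D points onto their first principal component."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Center the data so the covariance is measured around the mean.
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    var_x = sum(x * x for x, _ in centered) / n
    var_y = sum(y * y for _, y in centered) / n
    cov_xy = sum(x * y for x, y in centered) / n

    # Angle of the leading eigenvector of [[var_x, cov_xy], [cov_xy, var_y]].
    angle = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    direction = (math.cos(angle), math.sin(angle))

    # 1-D coordinates of each point along the principal direction.
    projected = [x * direction[0] + y * direction[1] for x, y in centered]
    variance_along = sum(p * p for p in projected) / n
    explained = variance_along / (var_x + var_y)
    return direction, projected, explained


# Made-up points lying close to the line y = x.
data = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
direction, projected, explained = pca_2d(data)

print(direction)
print(projected)
print(explained)
```

Because the points lie almost on a line, a single component keeps nearly all of the variance, which is the situation in which dropping dimensions loses little information.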


Genetic Algorithms (GA) highlighted an evolutionary approach to problem-solving, and neural networks introduced me to deep learning fundamentals. The course ended with time series forecasting, showcasing neural networks like the multilayer perceptron (MLP) and long short-term memory (LSTM) networks for predictive modeling.
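
The evolutionary loop behind a genetic algorithm can be sketched on a toy problem. Everything below is assumed for illustration: the "OneMax" task (maximize the number of 1 bits), the function names, and the parameter values are not taken from the course.

```python
import random


def fitness(individual):
    """OneMax: the more 1 bits, the fitter the individual."""
    return sum(individual)


def tournament(population, rng):
    """Selection: pick two individuals at random and keep the fitter one."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def evolve(bits=20, population_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(bits)] for _ in range(population_size)
    ]
    history = [max(fitness(ind) for ind in population)]

    for _ in range(generations):
        # Elitism: carry the current best individual over unchanged.
        children = [max(population, key=fitness)]
        while len(children) < population_size:
            mother = tournament(population, rng)
            father = tournament(population, rng)
            # Crossover: splice the two parents at a random cut point.
            cut = rng.randrange(1, bits)
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        history.append(max(fitness(ind) for ind in population))

    return history


history = evolve()
print(history[0], history[-1])
```

With elitism the best fitness never decreases from one generation to the next, so `history` shows the population improving or holding steady over time.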


Overall, this machine learning course gave me a solid understanding of the field, from fundamental principles to advanced techniques and practical applications. I learned how machine learning differs from data science, explored a range of algorithms, and gained hands-on experience. I believe the course has equipped me with the knowledge and skills needed to apply machine learning in real-world scenarios.

For one assignment in this course, I needed to work on the Jupyter notebook at: https://github.com/ruiwu1990/CSCI_4120/blob/master/Decision_tree/HW6.ipynb

My instructions were to classify the breast cancer data using RandomForestClassifier, complete the TODO sections, build a classification model, and tune its hyperparameters. My accuracy needed to be more than 0.92, and the ratio (accuracy / number of features) needed to be more than 0.45.
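
One possible shape for such a solution is sketched below. It is not the notebook's code: it assumes the data is scikit-learn's built-in breast cancer dataset, that "number of features" means the count of input columns the model uses, and that a small grid search is an acceptable way to tune hyperparameters. Since accuracy cannot exceed 1, a ratio above 0.45 allows at most two features, so the sketch keeps only the two most important ones.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: the assignment's data matches scikit-learn's built-in dataset.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 1: rank all 30 features with a first forest and keep the top two.
ranker = RandomForestClassifier(n_estimators=200, random_state=0)
ranker.fit(X_train, y_train)
top = ranker.feature_importances_.argsort()[::-1][:2]

# Step 2: tune a forest on just those two features (illustrative grid).
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train[:, top], y_train)

# Step 3: evaluate on the held-out split and compute the required ratio.
accuracy = search.score(X_test[:, top], y_test)
ratio = accuracy / len(top)

print("selected feature indices:", top)
print("best parameters:", search.best_params_)
print("test accuracy:", accuracy)
print("accuracy / number of features:", ratio)
```

Whether both thresholds are met depends on the train/test split, the chosen features, and the parameter grid, so the printed values need to be checked against 0.92 and 0.45 rather than assumed.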
