# Technical Path {#technical-learning-ml}
## Introduction
### Opening Thoughts
There are many ways to learn technical material. Paid courses, YouTube videos, blog posts, and books can all be effective, depending on your learning style. The content in the following sections tries to strike a healthy balance across all content types.
By default, all of the recommended learning content is free. Some additional resources are also mentioned that you may have to pay for. These serve as additional avenues to strengthen the concepts and techniques learned in the initial free content.
Machine learning practitioners fall into two programming language camps: those who use R and those who use Python. Both languages have their strengths and weaknesses. The learning content aims to provide resources for both R and Python.
### R or Python?
Short answer: Eventually you will have to learn both to become a successful data practitioner, but if you can only pick one, choose Python.
### Getting the Most out of Learning
[Deliberate practice](https://jamesclear.com/deliberate-practice-theory) is the best way to make learning stick and to rapidly develop your skills.
**Whenever you learn something new in the data and AI world, it's usually best to apply it immediately to a real-world project within your job or company.** By using a real-world problem to practice what you just learned, you reinforce the new knowledge in your long-term memory while at the same time driving impact in your job by solving real problems. What a bonus! Be careful about only working on "toy data sets", meaning public data that has been beaten to death by hundreds of blogs and courses. The real world of data is messy and unpredictable, so working on things related to your current job or company gets you comfortable with that uncertainty even faster.
Don't feel bad about looking things up on Bing/Google. Almost every technical person who works with computers today looks things up online every day. Software syntax takes time to learn, and some of the best engineers still don't remember all the ins and outs of a language. When in doubt, look it up online! Sites like Stack Overflow will quickly become your best friend as you try to work through issues in your code.
## Installing Software
Getting started with the right developer environment can save tons of headaches further down the road. While there are many options for which Integrated Development Environment (IDE) to use, the ones below are quickly becoming the standard for each language.
### Python
- [Coding Tools for Python Development 📃](https://docs.microsoft.com/en-us/learn/modules/install-code-tools-python-nasa/)
- [VSCode Python Setup 📹](https://www.youtube.com/playlist?list=PLo32uKohmrXt7VB91DLB-hMcDcWswE53Y)
- [Install Packages - Windows 📃](https://www.activestate.com/resources/quick-reads/python-package-installation-on-windows/)
- [Install Packages - Mac 📃](https://blog.quantinsti.com/installing-python-packages/)
### R
- [Installing R and RStudio 📃](https://rstudio-education.github.io/hopr/starting.html)
- [Install Packages 📃](https://rstudio-education.github.io/hopr/packages2.html)
### Additional Resources
- [Python vs R for Data Science: LinkedIn Learning 🏫](https://www.linkedin.com/learning-login/share?account=3322&forceAccount=false&redirect=https%3A%2F%2Fwww.linkedin.com%2Flearning%2Fpython-vs-r-for-data-science%3Ftrk%3Dshare_ent_url%26shareId%3DKEdLTyAORsGrs2BKq5uLDg%253D%253D)
## Data Analysis and Manipulation
Learning how to manipulate data outside of existing tools like Excel or Power BI quickly gives you data superpowers you never thought possible. Breaking out of the four walls of Excel and into the data universe by leveraging languages like Python and R unlocks much more potential for impact in whatever job you do. Even if you don't plan to build your own machine learning models, knowing the basics of data manipulation is an important skill to have. It builds the data foundation that machine learning rests on, should you ever want to come back and start building models.
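As a small illustration of what data manipulation in code looks like, the sketch below reproduces a simple pivot-table style summary using only the Python standard library. The records, field names, and the `total_by` helper are made up for this example; in practice a library such as pandas (covered in the resources below) would usually do this work.

```python
from collections import defaultdict

# Hypothetical expense records; in practice these might be read from a CSV export.
rows = [
    {"department": "Sales", "quarter": "Q1", "amount": 1200.0},
    {"department": "Sales", "quarter": "Q2", "amount": 1500.0},
    {"department": "IT", "quarter": "Q1", "amount": 800.0},
    {"department": "IT", "quarter": "Q2", "amount": 950.0},
]


def total_by(records, key):
    """Sum the 'amount' field for each distinct value of `key`."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["amount"]
    return dict(totals)


# The same records can be summarized along different dimensions.
totals_by_department = total_by(rows, "department")
totals_by_quarter = total_by(rows, "quarter")

print(totals_by_department)
print(totals_by_quarter)
```

The point of the sketch is that once the data lives in code, re-cutting it by another field is a one-line change rather than a new spreadsheet.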
### Python
- [Intro to Programming: Kaggle 🏫](https://www.kaggle.com/learn/intro-to-programming)
- [Intro to Python: Kaggle 🏫](https://www.kaggle.com/learn/python)
- [Intro to Pandas: Kaggle 🏫](https://www.kaggle.com/learn/pandas)
- [Intro to Data Visualization: Kaggle 🏫](https://www.kaggle.com/learn/data-visualization)
- [Python for Data Analysis 📕](https://wesmckinney.com/book/)
- [Exploratory Reports 📹](https://www.youtube.com/watch?v=-Cdv9C9hLeE)
### R
- [New to R? Start Here 📕](https://www.bigbookofr.com/new-to-r-start-here.html)
- [Basics of R 📕](https://rstudio-education.github.io/hopr/index.html)
- [Manipulating Data 📕](https://r4ds.had.co.nz/index.html) (skip "Model" chapter)
- [Automating Excel in R 📹](https://www.youtube.com/watch?v=EMSkZOF-ZG8)
- [Business Reporting in RMarkdown 📹](https://www.youtube.com/watch?v=mszKt0i4yuY)
- [Exploratory Reports 📹](https://www.youtube.com/watch?v=ssVEoj54rx4)
- [Fundamentals of Data Visualization 📕](https://clauswilke.com/dataviz/)
- [R Graphics Cookbook 📕](https://r-graphics.org/)
### Additional Resources
- [Explore and Analyze Data in Python: Microsoft 📃](https://docs.microsoft.com/en-us/learn/modules/explore-analyze-data-with-python/) - Python
- [Python for Excel 📕](https://www.oreilly.com/library/view/python-for-excel/9781492080992/) - Python
- [Python Crash Course 📕](https://www.amazon.com/Python-Crash-Course-2nd-Edition/dp/1593279280) - Python
- [Advancing Into Analytics 📕](https://www.oreilly.com/library/view/advancing-into-analytics/9781492094333/) - Python/R
- [Python and R for the Modern Data Scientist 📕](https://www.oreilly.com/library/view/python-and-r/9781492093398/) - Python/R
- [R Programming Tutorial: Learn the Basics of Statistical Computing 📹](https://www.youtube.com/watch?v=_V8eKsto3Ug&list=PLWKjhJtqVAblQe2CCWqV4Zy3LY01Z8aF1&index=9) - R
- [Data Transformation Cheat Sheet 📃](https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf) - R
- [R Basics: Harvard 🏫](https://www.classcentral.com/course/edx-data-science-r-basics-9253) - R
## Version Control
If you plan to work with others on any project that contains code, knowing version control, and specifically Git, is a must. Get up to speed with how to use Git and its most famous hosting service, GitHub. This skill opens up new opportunities to contribute to open source projects and even build your own open source software. It's also required to work on any technical team that collaborates on projects.
### High Level Overview
- [Git and GitHub for Beginners 📹](https://www.youtube.com/watch?v=RGOj5yH7evk&t=491s)
- [Git for Professionals 📹](https://www.youtube.com/watch?v=Uszj_k0DGsg)
### Python
- [VSCode GitHub Project Setup 📹](https://www.youtube.com/watch?v=e-qQDuswx2I)
### R
- [Happy Git and GitHub for the useR 📕](https://happygitwithr.com/)
- [Setup R Project from GitHub 📹](https://www.youtube.com/watch?v=F7aYV0RPyD0)
### Additional Resources
- [Contribute to an Open-Source Project on GitHub 📃](https://docs.microsoft.com/en-us/learn/modules/contribute-open-source/)
## Machine Learning Basics
Let's get our feet wet with the introductory concepts of machine learning. Learn more of the terminology, build a few models, and start to understand how the data science life cycle takes shape. This section is by no means a comprehensive view of machine learning today, but it's a good starting point.
Future sections will cover most of these topics again but in more depth. Having some repetition of terms and concepts will help reinforce the knowledge and help you understand that there are always different angles from which to attack data problems with machine learning.
### High Level Topics
- [Machine Learning Crash Course 🏫](https://ml.berkeley.edu/blog/tag/crash-course)
- [Building AI 🏫](https://buildingai.elementsofai.com/)
- [Three Things to do when Starting Out in Data Science 📹](https://www.youtube.com/watch?v=ilUbD7EoQnk)
- [Types of ML Models 📹](https://www.youtube.com/watch?v=yN7ypxC7838)
- [Gradient Descent: Step-by-Step 📹](https://www.youtube.com/watch?v=sDv4f4s2SB8)
- [ML Fundamentals: Cross Validation 📹](https://www.youtube.com/watch?v=fSytzGwwBVw)
- [Stats Fundamentals 📹](https://www.youtube.com/playlist?list=PLblh5JKOoLUK0FLuzwntyYI10UQFUhsY9)
- [Introduction to Probability for Data Science 🏫](https://probability4datascience.com/index.html)
### Python
- [Intro to Machine Learning: Kaggle 🏫](https://www.kaggle.com/learn/intro-to-machine-learning)
- [Intermediate Machine Learning: Kaggle 🏫](https://www.kaggle.com/learn/intermediate-machine-learning)
- [Feature Engineering: Kaggle 🏫](https://www.kaggle.com/learn/feature-engineering)
- [Data Cleaning: Kaggle 🏫](https://www.kaggle.com/learn/data-cleaning)
- [Scikit-Learn Crash Course 📹](https://www.youtube.com/watch?v=0B5eIE_1vpU)
### R
- [Tidy Modeling with R 📕](https://www.tmwr.org/index.html)
- [Intro to ML with Parsnip 📹](https://www.youtube.com/watch?v=2Zcwa7HPg5w&list=PLo32uKohmrXvDwyyty6pC4mcWER5tSdmO&index=10)
- [Supervised ML Case Studies in R 🏫](https://supervised-ml-course.netlify.app/)
- [Exploratory Data Analysis 📹](https://www.youtube.com/watch?v=ssVEoj54rx4)
- [Business Reporting in R with Rmarkdown 📹](https://www.youtube.com/watch?v=mszKt0i4yuY)
### Additional Resources
- [PCA Main Ideas 📹](https://www.youtube.com/watch?v=HMOI_lkzW08)
- [Introduction to Machine Learning: Microsoft 🏫](https://docs.microsoft.com/en-us/learn/modules/introduction-to-machine-learning/) - Python
- [Introduction to Machine Learning: Udemy 🏫](https://www.classcentral.com/course/udemy-introduction-to-data-science-using-python-25723) - Python
- [Machine Learning for Beginners 📕](https://github.com/microsoft/ML-For-Beginners) - Python
- [Best Python Machine Learning Libraries 📃](https://github.com/ml-tooling/best-of-ml-python) - Python
- [Analytical Skills for AI & Data Science 📕](https://learning.oreilly.com/library/view/analytical-skills-for/9781492060932/) - Python
- [Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow 📕](https://learning.oreilly.com/library/view/hands-on-machine-learning/9781492032632/) - Python
- [Avoiding Machine Learning Mistakes: LinkedIn Learning 🏫](https://www.linkedin.com/learning-login/share?account=3322&forceAccount=false&redirect=https%3A%2F%2Fwww.linkedin.com%2Flearning%2Fmistakes-to-avoid-in-machine-learning%3Ftrk%3Dshare_ent_url%26shareId%3D8%252BBs8riLTDmpKQwusvciAQ%253D%253D) - Python
- [Microsoft Approved Data Science Learning Resources 📃](https://medium.com/data-science-at-microsoft/data-science-learning-resources-193ccf6fafb) - Python/R
- [Introduction to Statistical Learning 📕](https://www.statlearning.com/) - R
- [Companion Book to Introduction to Statistical Learning 📃](https://emilhvitfeldt.github.io/ISLR-tidymodels-labs/index.html) - R
- [R Cheat Sheet 📃](https://www.business-science.io/r-cheatsheet?utm_content=buffer832d4&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer) - R
- [Practical Data Science with R 📕](https://learning.oreilly.com/library/view/practical-data-science/9781617295874/) - R
- [Modern Data Science with R 📕](https://mdsr-book.github.io/mdsr2e/) - R
## Regression {#regression}
Regression deals with predicting numerical quantities. It will quickly become your bread and butter for leveraging machine learning in finance. Understanding how to use software packages to train models and how each model works are both crucial to leveraging regression techniques to the fullest. Most of the resources here deal with examples of regression in action. Take time to soak in how these tutorials and experts approach a regression problem, how they structure their code, and the way they communicate the outputs.
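To make "predicting numerical quantities" concrete, here is a minimal sketch of the simplest regression model, a straight line fitted by ordinary least squares, together with one common evaluation metric (mean absolute error). The data and variable names are invented for illustration, and the code uses only the standard library; the tutorials below rely on packages such as scikit-learn or tidymodels instead.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data: advertising spend vs. revenue, both in arbitrary units.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(ad_spend, revenue)
predictions = [slope * x + intercept for x in ad_spend]
error = mean_absolute_error(revenue, predictions)

print(f"revenue ~ {slope:.2f} * ad_spend + {intercept:.2f} (MAE {error:.2f})")
```

Note that the error here is measured on the same data used for fitting, which flatters the model; the cross validation material in the earlier section explains how to evaluate on held-out data.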
### High Level Topics
- [Making Friends with Regression 📹](https://www.youtube.com/watch?v=WNvOtwP_yf4)
- [Evaluation Metrics for Regression Models 📃](https://www.analyticsvidhya.com/blog/2021/05/know-the-best-evaluation-metrics-for-your-regression-model/#:~:text=%20Know%20The%20Best%20Evaluation%20Metrics%20for%20Your,is%20clear%20by%20the%20name%20itself%2C...%20More%20)
### Python
- [Train and Evaluate Regression Models: Microsoft 🏫](https://docs.microsoft.com/en-us/learn/modules/train-evaluate-regression-models/)
- [Train and Understand Regression Models: Microsoft 🏫](https://docs.microsoft.com/en-us/learn/modules/understand-regression-machine-learning/)
- [E-Commerce Tutorial 📃](https://github.com/ishikkkkaaaa/ML-Projects/tree/main/1-E%20COMMERCE)
- [USA Housing Tutorial 📃](https://github.com/ishikkkkaaaa/ML-Projects/tree/main/2-USA%20housing)
### R
- [Lasso Regression Tutorial 📃](https://juliasilge.com/blog/lasso-the-office/)
- [Tune and Interpret Decision Trees Tutorial 📃](https://juliasilge.com/blog/wind-turbine/)
- [Tune Random Forests Tutorial 📃](https://juliasilge.com/blog/ikea-prices/)
- [Custom Metric Evaluation Tutorial 📃](https://juliasilge.com/blog/nyc-airbnb/)
- [Bagging Tutorial 📃](https://juliasilge.com/blog/astronaut-missions-bagging/)
- [Using Text as Features Tutorial 📃](https://juliasilge.com/blog/tate-collection/)
### How Various Models Work
- [Fitting a Line to Data: Least Squares 📹](https://www.youtube.com/watch?v=PaFPbb66DxQ)
- [Linear Models #1 📹](https://www.youtube.com/watch?v=nk2CQITm_eo)
- [Linear Models #2 📹](https://www.youtube.com/watch?v=zITIFTsivN8)
- [Regularization #1 📹](https://www.youtube.com/watch?v=Q81RR3yKn30)
- [Regularization #2 📹](https://www.youtube.com/watch?v=NGf0voTMlcs)
- [Regularization #3 📹](https://www.youtube.com/watch?v=1dKRdX9bfIo)
- [Decision Trees 📹](https://www.youtube.com/watch?v=g9c66TUylZ4)
- [Random Forest 📹](https://www.youtube.com/watch?v=J4Wdy0Wc_xQ)
- [Gradient Boost 📹](https://www.youtube.com/watch?v=3CC4N4z3GJc)
- [XGBoost 📹](https://www.youtube.com/watch?v=OtD8wVaFm6E)
### Additional Resources
- [Regression and Other Stories 📕](https://avehtari.github.io/ROS-Examples/index.html)
- [Fitting a Curve to Data 📹](https://www.youtube.com/watch?v=Vf7oJ6z2LCc)
- [Linear Regression and Gradient Descent: Stanford 📹](https://www.youtube.com/watch?v=4b4MUYve_U8)
## Time Series
Time series forecasting is a subdomain of regression, where we are trying to forecast a numerical quantity over time. Prediction over time is a separate world in machine learning, and has deep roots in more classic statistical methods.
While most regression models can be turned into a time series model by incorporating various date-based features, there are also traditional statistical models that have been used solely for time series forecasting for decades. An interesting component of time series forecasting is that it can use multivariate data as well as univariate data. For example, you could forecast sales revenue using only previous historical values of sales revenue (univariate), or add external regressors like country holidays and population size to help the forecast (multivariate). Knowing both types of models is a key component of being an expert time series practitioner.
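The idea of turning a regression model into a time series model can be sketched as a feature-building step. The example below, a rough illustration and not a prescribed method, converts a monthly series into rows with lagged values and date-based features (the univariate case) and optionally adds a holiday flag as an external regressor (the multivariate case). The `make_features` helper, the sales numbers, and the holiday date are all hypothetical.

```python
from datetime import date


def make_features(dates, values, holidays=None, n_lags=2):
    """Build one feature row per time step, starting at index n_lags.

    Each row holds date-based features and the previous `n_lags` values.
    If `holidays` is given, an `is_holiday` external regressor is added.
    """
    holiday_set = None if holidays is None else set(holidays)
    rows = []
    for i in range(n_lags, len(values)):
        month = dates[i].month
        row = {"month": month, "quarter": (month - 1) // 3 + 1}
        for lag in range(1, n_lags + 1):
            row[f"lag_{lag}"] = values[i - lag]
        if holiday_set is not None:
            row["is_holiday"] = int(dates[i] in holiday_set)
        row["target"] = values[i]
        rows.append(row)
    return rows


# Hypothetical monthly sales revenue for the first half of 2023.
dates = [date(2023, m, 1) for m in range(1, 7)]
sales = [100, 110, 105, 120, 130, 125]
holiday_dates = [date(2023, 5, 1)]  # made-up holiday used as an external regressor

univariate_rows = make_features(dates, sales)
multivariate_rows = make_features(dates, sales, holidays=holiday_dates)

print(univariate_rows[0])
print(multivariate_rows[2])
```

Rows in this shape can be passed to any of the regression models from the Regression section, with `target` as the value to predict.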
### High Level Topics
- [Time Series Forecasting with Machine Learning 📹](https://www.youtube.com/watch?v=_ZQ-lQrK9Rg)
- [Forecasting: Principles and Practice 📕](https://otexts.com/fpp3/intro.html) (Chapter 1)
### Python
- [Intro to Time Series: Kaggle 🏫](https://www.kaggle.com/learn/time-series)
- [Time Series Analysis Overview 📹](https://www.youtube.com/watch?v=cKzXOOtOXYY&t=7s)
- [Time Series Analysis Deep Dive 📹](https://www.youtube.com/playlist?list=PL-bdv-10yhrPGf9mNzQJvTc3yhhwpfOAz)
- [Sktime Documentation 📃](https://www.sktime.org/en/stable/index.html)
### R
- [Forecasting: Principles and Practice 📕](https://otexts.com/fpp3/)
- [Introduction to Modeltime: Forecasting with Tidymodels 📹](https://www.youtube.com/watch?v=-bCelif-ENY)
- [High Performance Time Series Forecasting 📹](https://www.youtube.com/watch?v=elQb4VzRINg)
- [ARIMA Forecasting in R 📹](https://www.youtube.com/watch?v=3znQUrREUC8)
- [Forecasting Multiple Time Series with Modeltime 📹](https://www.youtube.com/watch?v=6RjYIOCnRMk)
- [Plotting Time Series in R 📹](https://www.youtube.com/watch?v=Nf8FwFCJz2c)
- [Microsoft Finance Time Series Forecast Framework: finnts 📃](https://microsoft.github.io/finnts/)
### How Various Models Work
- All regression models in the [Regression](#regression) chapter can be turned into time series models
- [ARIMA 📕](https://otexts.com/fpp3/arima.html)
- [Exponential Smoothing 📕](https://otexts.com/fpp3/expsmooth.html)
### Additional Learning Resources
- [Forecasting: Theory and Practice 📃](https://www.sciencedirect.com/science/article/pii/S0169207021001758)
- [Analyzing Time Series Data 📃](https://observablehq.com/collection/@observablehq/analyzing-time-series-data)
- [Machine Learning for Time Series with Python 📹](https://www.youtube.com/watch?v=cBojo1hsHiI) - Python
- [Practical Time Series Analysis 📕](https://learning.oreilly.com/library/view/practical-time-series/9781492041641/) - Python
- [Darts Package 📃](https://unit8co.github.io/darts/index.html) - Python
- [Various Resources 📕](https://www.bigbookofr.com/time-series-analysis-and-forecasting.html) - R
- [Time Series Forecasting: Business Science University 🏫](https://university.business-science.io/p/ds4b-203-r-high-performance-time-series-forecasting) - R
## Classification
Classification models try to predict the outcome of an event, for example whether a credit card transaction is fraudulent or whether a self-driving car sees a stop sign next to the road. Usually the prediction is a binary yes or no, often accompanied by a probability score between 0 and 1, where 1 means a 100% probability of something occurring. Classification models can also predict an outcome across multiple categories or buckets, such as whether a picture of a fruit shows an apple, a pear, an orange, and so on.
Classification models are some of the most widely used machine learning techniques across industries today. Within finance there are many important implementations that range from compliance to risk management.
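The relationship between a probability score and a yes/no prediction can be shown in a few lines. The sketch below assumes a model that produces raw scores (the scores and transaction names are invented), converts them to probabilities with the logistic (sigmoid) function, and applies a decision threshold. A softmax function plays the same role when there are more than two categories.

```python
import math


def sigmoid(score):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


def classify(probability, threshold=0.5):
    """Turn a probability into a yes/no decision."""
    return probability >= threshold


def softmax(scores):
    """Convert per-category scores into probabilities that sum to 1."""
    largest = max(scores.values())  # subtracted for numerical stability
    exps = {label: math.exp(s - largest) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical raw scores from a fraud model, one per transaction.
fraud_scores = {"txn_a": -2.0, "txn_b": 0.0, "txn_c": 3.0}
fraud_probabilities = {k: sigmoid(v) for k, v in fraud_scores.items()}
fraud_flags = {k: classify(p) for k, p in fraud_probabilities.items()}

# A stricter threshold flags fewer transactions.
strict_flags = {k: classify(p, threshold=0.9) for k, p in fraud_probabilities.items()}

# Multi-class case: hypothetical scores for a picture of a fruit.
fruit_probabilities = softmax({"apple": 2.0, "pear": 0.5, "orange": 0.1})
predicted_fruit = max(fruit_probabilities, key=fruit_probabilities.get)

print(fraud_probabilities, fraud_flags, predicted_fruit)
```

Choosing the threshold is a business decision as much as a technical one; the confusion matrix and ROC resources below cover how to reason about that trade-off.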
### High Level Topics
- [Classification in Machine Learning 📹](https://www.youtube.com/watch?v=pXdum128xww) (skip tutorial at end)
- [Confusion Matrix 📹](https://www.youtube.com/watch?v=Kdsp6soqA7o)
- [Sensitivity and Specificity 📹](https://www.youtube.com/watch?v=vP06aMoz4v8)
- [ROC and AUC 📹](https://www.youtube.com/watch?v=4jRBRDbJemM)
### Python
- [Train and Evaluate Classification Models: Microsoft 🏫](https://docs.microsoft.com/en-us/learn/modules/train-evaluate-classification-models/)
- [Confusion Matrix and Class Imbalance: Microsoft 🏫](https://docs.microsoft.com/en-us/learn/modules/machine-learning-confusion-matrix/)
- [Measure and Optimize Model Performance: Microsoft 🏫](https://docs.microsoft.com/en-us/learn/modules/optimize-model-performance-roc-auc/)
- [Hyperparameter Tuning with Random Forest: Microsoft 🏫](https://docs.microsoft.com/en-us/learn/modules/machine-learning-architectures-and-hyperparameters/)
- [Titanic Tutorial 📃](https://github.com/ishikkkkaaaa/ML-Projects/tree/main/3-Titanic%20project)
- [Advertising Tutorial 📃](https://github.com/ishikkkkaaaa/ML-Projects/tree/main/4-advertising%20project(logitic%20regression))
- [Loan Tutorial 📃](https://github.com/ishikkkkaaaa/ML-Projects/tree/main/6-Loan%20project)
### R
- [Logistic Regression 📹](https://www.youtube.com/watch?v=Qi-sVE0SWFc)
- [Hotel Bookings Tutorial 📃](https://juliasilge.com/blog/hotels-recipes/)
- [Water Source Availability Tutorial 📃](https://juliasilge.com/blog/water-sources/)
- [Beach Volleyball Tutorial 📃](https://juliasilge.com/blog/xgboost-tune-volleyball/)
- [Volcano Eruptions Tutorial 📃](https://juliasilge.com/blog/multinomial-volcano-eruptions/)
- [Food Consumption Tutorial 📃](https://juliasilge.com/blog/food-hyperparameter-tune/)
### How Various Models Work
- [Logistic Regression 📹](https://www.youtube.com/watch?v=yIYKR4sgzI8)
- [K-Nearest Neighbors 📹](https://www.youtube.com/watch?v=HVXime0nQeI)
- [Decision Trees #1 📹](https://www.youtube.com/watch?v=_L39rN6gz7Y)
- [Decision Trees #2 📹](https://www.youtube.com/watch?v=wpNl-JwwplA)
- [Random Forest 📹](https://www.youtube.com/watch?v=J4Wdy0Wc_xQ)
- [Gradient Boost 📹](https://www.youtube.com/watch?v=jxuNLH5dXCs)
- [XGBoost 📹](https://www.youtube.com/watch?v=8b1JEDvenQU)
### Additional Learning Resources
- [Locally Weighted and Logistic Regression: Stanford 📹](https://www.youtube.com/watch?v=8b1JEDvenQU)
- [Supervised Machine Learning: Coursera 🏫](https://www.classcentral.com/course/supervised-learning-classification-20945) - Python
- [Classification Bootcamp: Udemy 🏫](https://www.classcentral.com/course/udemy-machine-learning-classification-38705) - Python
- [Machine Learning Classification: Coursera 🏫](https://www.classcentral.com/course/ml-classification-4219) - Python
## Unsupervised Learning
Unsupervised learning is an evolving field of machine learning, and many say it is the future of AI in general. Instead of learning from existing data with known outcomes, as supervised learning (regression and classification) does, unsupervised learning tries to discover structure in a data set on its own, without needing to know the answer ahead of time. This can be a game changer in finance when trying to segment customers into specific groups based on their purchasing behavior or when finding anomalies to flag for potential fraud or corruption.
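Customer segmentation is a good way to see what "no known answer ahead of time" means. The sketch below runs a bare-bones k-means clustering on a single feature, using only the standard library. The spend figures, starting centers, and helper names are hypothetical, and real projects would normally cluster on several features with a library implementation.

```python
def assign(values, centers):
    """Group each value with its nearest center."""
    clusters = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
        clusters[nearest].append(v)
    return clusters


def k_means_1d(values, centers, n_iter=10):
    """Alternate between assigning points and moving centers to cluster means."""
    centers = list(centers)
    for _ in range(n_iter):
        clusters = assign(values, centers)
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
    return centers, assign(values, centers)


# Hypothetical average monthly spend per customer; no labels are provided.
monthly_spend = [12, 15, 14, 90, 95, 100, 13, 97]

centers, segments = k_means_1d(monthly_spend, centers=[0.0, 50.0])

print(centers)
print(segments)
```

The algorithm is never told which customers are "low spend" or "high spend"; the two groups emerge from the data alone.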
### Python
- [Train and Evaluate Clustering Models 🏫](https://docs.microsoft.com/en-us/learn/modules/train-evaluate-cluster-models/)
- [PCA Tutorial 📃](https://github.com/ishikkkkaaaa/ML-Projects/tree/main/7-Breast%20cancer)
### R
- [PCA Tutorial 📃](https://juliasilge.com/blog/un-voting/)
- [PCA and UMAP Tutorial 📃](https://juliasilge.com/blog/cocktail-recipes-umap/)
- [Visualizing PCA in R 📹](https://www.youtube.com/watch?v=X4wsXba_tZI)
### How Various Models Work
- [K-Means Clustering 📹](https://www.youtube.com/watch?v=4b5d3muPQmA)
- [PCA 📹](https://www.youtube.com/watch?v=FgakZw6K1QQ&list=PLblh5JKOoLUIcdlgu78MnlATeyx4cEVeR&index=1)
## Natural Language Processing
Natural language processing (NLP) is all about extracting insight from unstructured data in the form of text. Our world is drowning in openly available text from Twitter, blogs, and countless documents like PDFs that could be useful within our jobs in finance. Knowing how to extract insights out of a pile of documents is a superpower worth learning about!
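A first step in most text analysis is turning free text into countable tokens. The sketch below is a deliberately simple illustration with made-up sentences and a tiny hand-picked stop word list. It lowercases the text, splits it into words, drops common filler words, and counts what remains.

```python
import re
from collections import Counter

# A tiny illustrative stop word list; real projects use much longer ones.
STOP_WORDS = {"the", "a", "and", "of", "to", "in", "is", "for", "than"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(documents, n=3):
    """Count non-stop-word tokens across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOP_WORDS)
    return counts.most_common(n)


# Hypothetical snippets of the kind found in financial commentary.
documents = [
    "Revenue grew in the quarter.",
    "Operating costs grew faster than revenue.",
    "The outlook for revenue is stable.",
]

print(top_terms(documents, n=2))
```

Word counts like these are the raw material for the tidy text and supervised text modeling approaches covered in the resources below.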
### Python
- [Explore Natural Language Processing in Azure: Microsoft 📃](https://docs.microsoft.com/en-us/learn/paths/explore-natural-language-processing/)
### R
- [Text Mining in R 📕](https://www.tidytextmining.com/)
- [Text Mining with Tidy Data Principles 📕](https://juliasilge.shinyapps.io/learntidytext/)
- [Supervised Machine Learning for Text Analysis 📕](https://smltar.com/)
### Additional Resources
- [Practical Natural Language Processing 📕](https://learning.oreilly.com/library/view/practical-natural-language/9781492054047/) - Python
- [Natural Language Processing with Python and spaCy 📕](https://learning.oreilly.com/library/view/natural-language-processing/9781098122652/) - Python
- [Applied Text Analysis 📕](https://learning.oreilly.com/library/view/applied-text-analysis/9781491963036/) - Python
- [Introduction to Natural Language Processing with PyTorch: Microsoft 📃](https://docs.microsoft.com/en-us/learn/modules/intro-natural-language-processing-pytorch/) - Python
## Deep Learning
The most rapidly evolving area of AI is deep learning, which uses a different modeling architecture called neural networks. Many of the most exciting advancements in AI over the last decade have come from training neural networks on huge data sets. Deep learning has the potential to totally change how we build any type of prediction across all types of machine learning.
### High Level Topics
- [Deep Learning Crash Course 📹](https://www.youtube.com/watch?v=VyWAvY2CF9c)
- [How Neural Networks Learn 📹](https://www.youtube.com/playlist?list=PLblh5JKOoLUIxGDQs4LFFD--41Vzf-ME1)
### Python
- [Intro to Deep Learning: Kaggle 🏫](https://www.kaggle.com/learn/intro-to-deep-learning)
- [Practical Deep Learning for Coders 🏫](https://course.fast.ai/)
- [Train and Evaluate Deep Learning Models: Microsoft 🏫](https://docs.microsoft.com/en-us/learn/modules/train-evaluate-deep-learn-models/)
- [Deep Learning in Tensorflow 🏫](https://www.udacity.com/course/intro-to-tensorflow-for-deep-learning--ud187)
- [Yann LeCun's NYU Deep Learning Course 🏫](https://cds.nyu.edu/deep-learning/)
### R
- [Deep Learning with Tidymodels, Torch, and Tabnet 📹](https://www.youtube.com/watch?v=GuboAGHDgas)
### Additional Resources
- [Andrew Ng: Deep Learning, Education, and Real-World AI 📹](https://www.youtube.com/watch?v=0jspaMLxBig)
- [Nuts and Bolts of Applying Deep Learning 📹](https://www.youtube.com/watch?v=F1ka6a13S9I)
- [History of Deep Learning 📹](https://www.youtube.com/watch?v=mTtDfKgLm54)
- [Visual Introduction to Deep Learning 📕](https://kdimensions.gumroad.com/l/visualdl)
- [Deep Learning for Coders with fastai and PyTorch 📕](https://learning.oreilly.com/library/view/deep-learning-for/9781492045519/) - Python
- [Deep Learning: Coursera 🏫](https://www.coursera.org/specializations/deep-learning#courses) - Python
- [Computer Vision Tutorial: Kaggle 🏫](https://www.kaggle.com/learn/computer-vision) - Python
- [Deep Learning with Tensorflow 📕](https://learning.oreilly.com/library/view/deep-learning-with/9781617296864/) - Python
- [Deep Learning with R 📕](https://learning.oreilly.com/library/view/deep-learning-with/9781617295546/) - R
## Model Interpretability
You will often be asked to help explain how a particular machine learning model came up with its prediction. Knowing how to leverage various interpretability frameworks helps decode the black box of these models for better adoption by non-technical business partners, and gives you a better understanding of which features have the most impact in your model.
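One widely used way to see which features matter is permutation importance: scramble one feature's values and measure how much the model's error grows. The sketch below illustrates the idea with a hand-written stand-in for a trained model and invented data. To keep the result reproducible it rotates the column by one position instead of shuffling it randomly, which is a simplification of what library implementations do.

```python
def mean_abs_error(model, rows, targets):
    """Average absolute gap between model predictions and targets."""
    return sum(abs(model(r) - t) for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, feature):
    """Increase in error after scrambling one feature's column.

    Simplification: the column is rotated by one position rather than
    randomly shuffled, so the result is deterministic.
    """
    baseline = mean_abs_error(model, rows, targets)
    column = [r[feature] for r in rows]
    rotated = column[1:] + column[:1]
    permuted_rows = [dict(r, **{feature: v}) for r, v in zip(rows, rotated)]
    return mean_abs_error(model, permuted_rows, targets) - baseline


def toy_model(row):
    """Hypothetical stand-in for a trained model; it ignores store_id."""
    return 2.0 * row["units_sold"] + 0.0 * row["store_id"]


# Invented data in which revenue depends only on units sold.
rows = [
    {"units_sold": 1.0, "store_id": 7.0},
    {"units_sold": 2.0, "store_id": 3.0},
    {"units_sold": 3.0, "store_id": 5.0},
    {"units_sold": 4.0, "store_id": 1.0},
]
targets = [2.0, 4.0, 6.0, 8.0]

units_importance = permutation_importance(toy_model, rows, targets, "units_sold")
store_importance = permutation_importance(toy_model, rows, targets, "store_id")

print(units_importance, store_importance)
```

A feature the model relies on shows a clear jump in error when scrambled, while a feature the model ignores shows none. That contrast is the kind of evidence that helps when explaining a model to business partners.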
### High Level Topics
- [Interpretable Machine Learning 📕](https://christophm.github.io/interpretable-ml-book/)
- [Intro to SHAP 📃](https://www.aidancooper.co.uk/a-non-technical-guide-to-interpreting-shap-analyses/)
### Python
- [Machine Learning Explainability: Kaggle 🏫](https://www.kaggle.com/learn/machine-learning-explainability)
- [interpretML 📃](https://github.com/interpretml/interpret)
### R
- [Explanatory Model Analysis 📕](https://ema.drwhy.ai/)
- [Model Interpretability Tutorial 📃](https://juliasilge.com/blog/wind-turbine/)
- [Partial Dependence Plots with Tidymodels and DALEX 📃](https://juliasilge.com/blog/mario-kart/)
## AI Ethics and Fairness
With great power comes great responsibility. As machine learning becomes more ingrained in our society, the ethical consequences of poorly deployed models will only increase. Make sure you are building models that help enrich a diverse and inclusive future by checking out the resources below.
### High Level Topics
- [Intro to AI Ethics: Kaggle 🏫](https://www.kaggle.com/learn/intro-to-ai-ethics)
- [Fairness and Machine Learning 📕](https://fairmlbook.org/)
- [What Happens when an Algorithm Cuts Your Healthcare 📃](https://www.theverge.com/2018/3/21/17144260/healthcare-medicaid-algorithm-arkansas-cerebral-palsy)
### Python
- [Fairlearn 📃](https://fairlearn.org/)
- [Data Ethics in Deep Learning 📹](https://course.fast.ai/videos/?lesson=5)
## Web Apps
Building user interfaces that bring machine learning models directly to end users, so they can consume them without writing code, can be a total game changer for your business partners. Thanks to some amazing packages within the data science community, you don't have to be a web developer to build applications that your users will love. Check them out below.
### Python
- [Python Web Applications with Flask 📹](https://www.youtube.com/watch?v=Qr4QMBUPxWo)
- [Build 12 Data Science Apps with Python and Streamlit 📹](https://www.youtube.com/watch?v=JwSS70SZdyM)
### R
- [A Gentle Introduction to creating R Shiny Web Apps 📹](https://www.youtube.com/watch?v=jxsKUxkiaLI)
- [Shiny Walkthrough 📹](https://www.youtube.com/watch?v=eoeLn8SyDW8)
- [Building Predictive Web Applications with R Shiny 📹](https://www.youtube.com/watch?v=oegRVT262Ig)
- [Build Interactive Data-Driven Web Apps With R Shiny 📃](https://www.freecodecamp.org/news/build-interactive-data-driven-web-apps-with-r-shiny/)
- [Engineering Production Grade Shiny Apps 📕](https://engineering-shiny.org/)
- [Outstanding User Interfaces with Shiny 📕](https://unleash-shiny.rinterface.com/index.html)
## Production on Azure
One of the harder aspects of machine learning is getting your work into a production environment where it can run at scale. This involves deploying models to run in a cloud like Microsoft Azure.
### Data Storage
- [Azure Data Lake Storage 📃](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction)
- [Azure SQL 📃](https://azure.microsoft.com/en-us/products/azure-sql/#product-overview)
### General Data Analytics
- [Azure Synapse 📃](https://azure.microsoft.com/en-us/services/synapse-analytics/?OCID=AID2200277_SEM_7a4e0c2545c71de6d91d6d59687840c2:G:s&ef_id=7a4e0c2545c71de6d91d6d59687840c2:G:s&msclkid=7a4e0c2545c71de6d91d6d59687840c2)
- [Azure Databricks 📃](https://azure.microsoft.com/en-us/services/databricks/#features)
- [Spark 📕](https://spark.apache.org/docs/latest/api/python/index.html) - Python
- [Spark 📕](https://therinspark.com/) - R
### Machine Learning
- [Intro to Azure ML 📃](https://azure.microsoft.com/en-us/services/machine-learning/#product-overview)
- [Automated Machine Learning 📃](https://docs.microsoft.com/en-us/azure/machine-learning/concept-automated-ml)
- [ML Pipelines 📃](https://docs.microsoft.com/en-us/azure/machine-learning/concept-ml-pipelines)
- [Build and operate machine learning solutions with Azure Machine Learning: Microsoft 🏫](https://docs.microsoft.com/en-us/learn/paths/build-ai-solutions-with-azure-ml-service/) - Python
### Additional Resources
- [Azure Friday 📹](https://www.azurefriday.com/)
## Life as a Data Scientist
Ready to commit to data science as a career? Check out the content below, which features interviews with working data scientists and best practices for being a great data practitioner.
### Build Models and Build Community
To-DO
### Coding Best Practices
- [Foundations for Best Practices in Machine Learning 📃](https://www.fbpml.org/home)
- [Best Practice for Writing Code Comments 📃](https://stackoverflow.blog/2021/07/05/best-practices-for-writing-code-comments/?utm_medium=email&utm_source=topic+optin&utm_campaign=awareness&utm_content=20210828+prog+nl&mkt_tok=MTA3LUZNUy0wNzAAAAF_LK1OjKlqDG9SxsVfzMCGmwO5ZelBS-fBHGWHz3FyjGzK_06soVvuT7ljzYAZYITpbG8uwSSDfuC45Z0MP5qLgyunNstZqBDys4jaWHFFSczZ)
### Getting a Job
- [Career and Community Resources 📕](https://www.bigbookofr.com/career-and-community.html)
- [Six Figure Data Scientist 📕](https://www.sixfiguredatascientist.com/book-agf883)
- [Deep Learning Interview Questions 📃](https://arxiv.org/abs/2201.00650)
### Additional Resources
- [An Old Hacker's Tips on Staying Employed 📃](https://madned.substack.com/p/an-old-hackers-tips-on-staying-employed)
- [Lessons from Data Scientists: LinkedIn Learning 🏫](https://www.linkedin.com/learning-login/share?account=3322&forceAccount=false&redirect=https%3A%2F%2Fwww.linkedin.com%2Flearning%2Flessons-from-data-scientists%3Ftrk%3Dshare_ent_url%26shareId%3Dz8kwfyJsRZypq%252FE5bEEjdQ%253D%253D)
- [Building Data Science Teams 📕](https://learning.oreilly.com/library/view/building-data-science/BLDNGDST0001/)
- [Advocate for Data Science at Your Company 📃](https://www.rstudio.com/champion)