| Learning outcome/ competencies | Building on the foundational knowledge acquired in the first semester, this course deepens students' understanding of and skills in statistical programming in Python. By the end of the module, students will be able to implement statistical procedures in Python within a professional coding environment and to contribute effectively to programming teams. A substantial share of class time is reserved for hands-on exercises in Python programming, - done in Jupyter notebooks
- in a number of real-world examples as well as
- in practical case studies similar to what one would encounter on the job,
- using, among other things, scikit-learn (sklearn), Kaggle, GridSearchCV, RandomizedSearchCV, high-dimensional datasets, and tree-based packages such as XGBoost, LightGBM and CatBoost
Topics include: - Error-Free, Well-Structured and Transparent Programming in Python
- Data access and manipulation using databases
- Advanced data visualization techniques
- Collaborative Programming
- Teamwork using version control systems like Git and platforms such as GitHub
- Statistical Techniques
- Recap of basic statistics, including descriptive statistics, statistical tests, contingency tables, and correlation
- Advanced methods like robust regression models, outlier detection, variable transformation, missing value imputation, and dimensionality reduction
- Resampling techniques, including k-fold cross-validation and the bootstrap
- Deploying models to production (beyond notebooks)
- Model monitoring in production
- Specific applications:
- Churn Prediction
- Natural Language Processing (NLP), Embeddings, Retrieval-Augmented Generation (RAG)
- Time Series Forecasting
- Introduction to Large Language Models (LLMs)
|
|---|