Foreword
In today's competitive, efficiency-oriented economy, credit scoring plays an important role, influencing many aspects of economic activity and contributing to its success at every level: from the self-employed, through micro, small, medium-sized and large companies and institutions, to multinational enterprises and even entire countries.
A profound understanding of credit scoring has become a must for everyone who is professionally involved in decision making or risk management in financial institutions.
Since its conception, credit scoring has come a long way from the statistical, probabilistic and decision-analytic approaches that characterized earlier traditional methods. In recent years, perhaps decades, the complexity of credit scoring and its related aspects has grown rapidly. Moreover, the wealth of data that can now be collected by financial institutions and their customers, as well as by various other institutions and agencies, has changed the landscape through the emergence of so-called data-driven technology, better access to a wider variety of data, increased computing power, etc. All of these have initiated a wave of novel approaches to credit scoring and have made it possible to include further aspects in the analyses, such as pricing financial services to reflect the risk profile of the individual, company or institution.
In general, as a result of the data-driven revolution, credit scoring has evolved in recent years from traditional approaches based on statistics, probability, decision analysis, etc. to new ones built on artificial intelligence, notably machine learning. This has sparked great interest in these modern approaches to credit scoring, laying the foundation for a considerable research effort that has resulted in many publications as well as practical applications.
However, adding more methods to the toolbox of a contemporary credit scoring modeler poses new challenges related to the proper procedures for developing, validating, monitoring, and finally implementing such models and approaches in practice.
This volume tackles the problems related to the above-mentioned new directions, challenges, difficulties and opportunities in modern credit scoring in an interesting, innovative and constructive fashion. "Credit scoring in context of interpretable machine learning" is a rare, extraordinary entry in the world literature that combines an explanation of theoretical concepts and approaches with contemporary scoring practices.
The text covers both classical credit scoring methods and new methodological and procedural approaches rooted in machine learning. Such a vast range of topics cannot be developed by one individual; it requires a multidisciplinary team capable of combining individual expertise, experience and skills, and sharing the courage to respond to the community's need for a monograph in this area that is both methodologically sound and reader-friendly.
The volume is, therefore, a collection of chapters written by team members with different backgrounds, expertise and experience, as well as different professional and research interests. The authors share the research philosophy and policy of the Decision Analysis and Support Unit at SGH Warsaw School of Economics, which emphasizes that at the center of any application of mathematical models there should be a clear vision of how they will be used to aid and support decision making.
The monograph starts with a presentation of the historical and organizational setting of credit scoring and a critical review of the data-related processes that are relevant for preparing credit scoring models. Afterward, mindful of the recent data-driven revolution, the exposition moves to a presentation of selected machine learning methods that can be used for credit scoring, with special emphasis on variable selection methods, which address one of the key challenges of modeling practice. The third group of topics covers the analytical tasks that are typically undertaken once a credit scoring model has been built, that is, the model's performance evaluation, model monitoring, and methods that make it possible to understand how complex machine learning models produce their forecasts. This is a crucial issue because virtually all machine learning methods function as black (or grey) boxes, and hence the results obtained may not be comprehensible to the user. Approaches to overcoming this inherent difficulty have attracted much interest in recent years, and a new field, the so-called Explainable Artificial Intelligence (XAI), has emerged. The monograph concludes with a review of key aspects of deploying credit scoring models in complex IT infrastructures, with emphasis on the most important problems related to the performance and scalability of the scoring process, as well as on the architectures and processes that can be used when implementing credit scoring models in decision engines.
The editors and the authors of the individual chapters are to be congratulated for producing a captivating and useful volume. Their contributions present well-chosen aspects of modern approaches to credit scoring to the highest academic standards, yet in a comprehensible and constructive fashion. They have focused on the new and promising data-driven approaches, notably those based on machine learning, which will certainly dominate in the years to come.
Professor Janusz Kacprzyk, Ph.D., D.Sc.
Polish Academy of Sciences
Spanish Royal Academy of Economic and Financial Sciences
Bulgarian Academy of Sciences
Finnish Society of Sciences and Letters
[[[separator]]]
Contents
Foreword
Preface
1 Background of the credit scoring
Daniel Kaszyński
1.1 History of credit scoring
1.2 Classical scorecard
1.3 Data analytics revolution and its challenges
1.4 People and processes
2 Data processing for credit scoring
Maciej Kwiatkowski
2.1 Data management
2.2 Data sources
2.3 Data quality assurance
2.4 Data pre-processing
2.5 Conclusions
3 Variable selection methods
Karol Przanowski, Sebastian Zając, Daniel Kaszyński, Łukasz Opiński
3.1 The importance of variable pre-selection
3.2 Comparison measures for the variable selection methods
3.3 Variable selection methods
3.4 Numerical experiment of variable selection
3.5 Conclusions
4 Selected machine learning methods used for credit scoring
Małgorzata Wrzosek, Daniel Kaszyński, Karol Przanowski, Sebastian Zając
4.1 Classical credit scoring models
4.2 Machine learning for credit scoring
4.3 Frameworks for model development
4.4 Numerical results of models
4.5 Conclusions
5 Sensitivity of machine learning methods to data issues
Daniel Kaszyński, Kinga Siuta, Bogumił Kamiński
5.1 Outlying observations
5.2 Missing data
5.3 Selection of the target variable
5.4 Multicollinearity problem
5.5 The Simpson's Paradox
5.6 New categories in categorical variables
5.7 Complete separation problem
5.8 Too granular categorical data
5.9 Coding ratios
5.10 Conclusions
6 Model performance evaluation and model monitoring
Daniel Kaszyński, Małgorzata Wrzosek, Kamil Cerazy
6.1 The importance of validation and monitoring
6.2 Validation and monitoring process
6.3 Qualitative methods for credit scoring models validation
6.4 Quantitative methods for credit scoring models validation
6.5 Additional validation dimensions
6.6 Conclusions
7 Model interpretability and explainability
Marcin Chlebus, Marta Kłosok, Przemysław Biecek
7.1 Shapley values and Break-down
7.2 Permutation Feature Importance
7.3 Ceteris Paribus Plot/Individual Conditional Expectation
7.4 Partial Dependence Plot
7.5 An empirical example
7.6 Instance level
7.7 Global level
7.8 Extensions of base XAI analysis
7.9 Conclusions
8 Performance considerations and platforms for scoring models
Łukasz Kraiński
8.1 Introduction to computation performance
8.2 Performance areas relevant to scoring models
8.3 Platforms for building scoring models
9 Techniques for implementing models in decision engines
Aleksander Nosarzewski
9.1 Challenges in model deployment
9.2 Methods of exporting model object
9.3 Model deployment
9.4 Good practices in MLOps
9.5 Tools useful for MLOps
Conclusions
Bibliography
List of Tables
List of Figures
Index
About the authors