
Navigating Loan Application Surge Through Strategic Automation at Zappy Financial Services


Introduction

The present business climate for organisations such as Zappy Financial Services (ZFS) is characterised by a surge in loan applications, driven by a strong digital presence. The manual loan approval procedure, however, has resulted in talent shortages, longer approval periods, and greater operational risk. The proposed programming solution addresses these issues by introducing partially automated procedures, allowing the lending team to devote more time to client engagement. This solution will increase efficiency, lower operational risk, and enable scalable expansion. Failure to adopt automation would result in ongoing talent shortages, longer approval waits, and stifled corporate growth. To properly build and apply the automation processes, the solution requires professional programming resources.

Approach

Method 1: Loan Data Automation

According to Sachan et al. (2020), the manual underwriting procedure is heavily reliant on paper. Automating the manual loan approval process would help lenders make informed decisions on the growing volume of loan applications by automating data collection and processing.

Process 1: Import and Extract the Dataset Using Python Libraries

To import the PDF and Excel files into the code, the relevant Python libraries must be installed. The tabula-py library helps convert PDF data into an editable pandas DataFrame (Tabula, 2019).

The code snippet for installing the Python libraries is as follows:

Figure 1: Code snippet of library import

(Source: Self-developed in Jupyter)
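As the original snippet is shown only as an image, an indicative version of the install and import step might look as follows (the library names come from the text; the import aliases are conventional assumptions):

    # Install the required libraries once from the command line:
    # pip install tabula-py pandas matplotlib seaborn

    import tabula                      # PDF table extraction (requires a Java runtime)
    import pandas as pd                # tabular data handling
    import matplotlib.pyplot as plt    # plotting
    import seaborn as sns              # statistical visualisation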

The data is then extracted from the PDF file ('Loans_Database_Table.pdf') and converted into a pandas DataFrame using the tabula-py module. The pandas library, meanwhile, can load the Excel file directly.

Figure 2: Extracting the PDF and Excel files

(Source: Self-developed in Jupyter)

The tabula.read_pdf() function in the above code sample reads the PDF file and returns a list of DataFrame objects. Indexing (pdf_data[0]) is used to retrieve the extracted data, which is then turned into a pandas DataFrame (loan_data). In addition, the pandas library (imported as pd) loads the Excel file through pd.read_excel().
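Since the figure is an image, a hedged sketch of this extraction step is given below. The PDF file name comes from the text, while the Excel file name is a hypothetical placeholder:

    # Read all tables from the PDF; tabula returns a list of DataFrames
    pdf_data = tabula.read_pdf('Loans_Database_Table.pdf', pages='all', multiple_tables=True)

    # Load the Excel workbook with pandas (file name is assumed)
    sales_data = pd.read_excel('Sales_Data.xlsx')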

Process 2: Retrieving the Data from the PDF and CSV File Stored in a List

The data retrieved from the PDF file must be converted into a pandas DataFrame before it can be accessed and worked with.

The following code snippet retrieves data from a PDF file saved in a list:

Figure 3: Code snippet of converting the PDF data to a DataFrame

(Source: Self-developed in Jupyter)

The extracted data from the PDF file is accessible via indexing (pdf_data[0]) and then turned into a pandas DataFrame (loan_data) using the pd.DataFrame() method in the preceding code.

The pd.read_csv() method can likewise be used to load a CSV file into a pandas DataFrame for further analysis.
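An indicative version of this retrieval step, under the same assumptions as above:

    # The first table extracted from the PDF becomes the loan DataFrame
    loan_data = pd.DataFrame(pdf_data[0])

    # If the second source were a plain CSV file, pd.read_csv() would be the loader
    # sales_data = pd.read_csv('Sales_Data.csv')   # hypothetical file name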

Process 3: Checking for Null Values

The isnull() method is used to detect the existence of null values (Pandas.isnull, 2023). The total number of null values for each column is then calculated using the sum() method.

Figure 4: Code snippet of missing values in the PDF dataset

(Source: Self-developed in Jupyter)

Figure 5: Code snippet of missing values in the CSV dataset

(Source: Self-developed in Jupyter)

 

By running these lines of code, the programme prints the count of null values for each column in the manual loan approval data from the PDF file (loan_data) and the loan data from the CSV file (sales_data).
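The null-value checks in Figures 4 and 5 plausibly reduce to the following:

    # Count missing values per column in each dataset
    print(loan_data.isnull().sum())
    print(sales_data.isnull().sum())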

Process 4: Checking for Duplicate Values

Duplicate value identification is essential for ensuring data integrity and preventing biased or incorrect analysis (Talha et al. 2019).

The code snippet for checking duplicates is as follows:

Figure 6: Code snippet of checking duplicates in the PDF dataset

(Source: Self-developed in Jupyter)

The output shows that the PDF dataset contains no duplicates.

 

Figure 7: Code snippet of checking duplicates in the CSV dataset

(Source: Self-developed in Jupyter)
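An indicative version of the duplicate checks in Figures 6 and 7:

    # Count fully duplicated rows in each dataset
    print(loan_data.duplicated().sum())
    print(sales_data.duplicated().sum())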

Process 5: Checking the PDF File and CSV File Data Types

Verifying the data types is necessary, as it determines which methods can then be applied to each column.

The following code extracts the loan_data data types:

Figure 8: Code snippet of checking PDF file dataset type

(Source: Self-developed in Jupyter)

The following code extracts the sales_data data types:

Figure 9: Code snippet of checking CSV file dataset types

(Source: Self-developed in Jupyter)

The sales_data.dtypes command is used in the given code to output the data types of the columns in the sales_data DataFrame.
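The type checks in Figures 8 and 9 likely amount to:

    # Inspect the data type of every column
    print(loan_data.dtypes)
    print(sales_data.dtypes)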

Process 6: Placing PDF File Data into a DataFrame and Merging with Other Data

The data from the PDF file is stored in a DataFrame and merged with the data from the CSV file for further analysis and modelling.

The code snippet below loads the data from the PDF file into a DataFrame and integrates it with the additional data.

Figure 10: Code snippet of merging the two datasets

(Source: Self-developed in Jupyter)

The pd.merge() method is used in the above code to merge the loan_data DataFrame (which contains data from the PDF file) with the sales_data DataFrame (which contains data from the Excel file). Using the on argument, the merge is performed on the specified common columns (Loan_ID, Gender, Married, etc.).

The merged DataFrame is assigned to the variable merged_df, which represents the combined dataset comprising information from both the PDF and Excel files.
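A minimal sketch of the merge, using the join keys named in the text (the full column list is abbreviated with "etc." in the original, so only the named keys are shown here):

    # Merge on the common identifying columns
    merged_df = pd.merge(loan_data, sales_data, on=['Loan_ID', 'Gender', 'Married'])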

Process 7: Checking for Duplicate Values in the Merged Dataset

As cited by Lee et al. (2021), data cleansing is essential for successful machine-learning projects.

The duplicated() function is used to identify duplicates (Pandas.dataframe.duplicated, 2023). The sum() command then counts the number of duplicate rows in the merged_df DataFrame.

Figure 11: Code snippet of checking duplicate values in the merged dataset

(Source: Self-developed in Jupyter)
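The check in Figure 11 plausibly reads:

    # Count duplicate rows in the combined dataset
    print(merged_df.duplicated().sum())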

Process 8: Merged Data Description

The DataFrame description offers an overview of the statistical attributes and distribution of the variables in the dataset, which is essential for EDA (Sinthong and Carey, 2019). It aids in understanding the data's central tendency, dispersion, and shape. The supplied code sample computes and displays the value descriptions of the merged_df DataFrame.

Figure 12: Code snippet describing the merged dataset

(Source: Self-developed in Jupyter)

The describe() command computes several statistical measures, such as count, mean, standard deviation, minimum, quartiles, and maximum, for each numerical column of the given DataFrame (Pandas.dataframe.describe, 2023).
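An indicative version of Figure 12:

    # Summary statistics for every numerical column
    print(merged_df.describe())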

Method 2: Descriptive Analysis

As per Brei (2020), descriptive analysis provides insights into loan application trends, applicant profiles, and approval results.

Process 1: Visualisation of All Variables

All variables in the merged_df DataFrame are visualised in the given code. The code snippet makes use of the matplotlib and seaborn libraries to generate several visualisations that aid in understanding the dataset's distributions, relationships, and trends. The code snippet for this visualisation is:

Figure 13: Code snippet of printing all variables as charts

(Source: Self-developed in Jupyter)

The merged_df.hist() method in the above code creates histograms for all variables in the DataFrame. The bins argument defines the number of histogram bins or intervals, while the figsize parameter specifies the plot size.

The histogram plot is displayed via the plt.show() method.
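Since the snippet exists only as an image, a hedged reconstruction is given below; the bins and figsize values are assumptions, and the pairplot call is inferred from the discussion that follows:

    # Histograms for every variable in the merged dataset
    merged_df.hist(bins=20, figsize=(12, 10))
    plt.show()

    # Pairwise scatterplots across the numerical variables
    sns.pairplot(merged_df)
    plt.show()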

Figure 14: Visual of all variables in a chart

(Source: Self-developed in Jupyter)

To display the pairplot, the plt.show() method is used once more.

By executing these lines of code, the programme produces visualisations that give insight into the distribution of individual variables through histograms, and into the interactions between variables through scatterplots.

Histograms aid in understanding the distribution and frequency of values within each variable by showing outliers, skewness, and trends in the data.

Process 2: Visualising Relationships Between Selected Variables

Figure 15: Code snippet of printing a visualisation of certain variables

(Source: Self-developed in Jupyter)

The following code uses sns.pairplot(), which is a built-in function of seaborn, to create a visualisation of specified variables in a merged dataset.

Figure 16: Visualisation of the Loan_amount_term, LoanAmount, CoapplicantIncome, and ApplicantIncome variables

(Source: Self-developed in Jupyter)

The pairplot displays scatterplots for each paired combination of the provided variables, revealing relationships and trends (Seaborn.pairplot, 2022). Each scatterplot depicts the relationship between two variables, indicating how they are distributed and if there are any linear or non-linear interactions.
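An indicative version of the pairplot snippet, using the column names listed in the Figure 16 caption (the exact spellings are assumed):

    # Pairwise scatterplots for the selected variables
    sns.pairplot(merged_df[['Loan_amount_term', 'LoanAmount', 'CoapplicantIncome', 'ApplicantIncome']])
    plt.show()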

Process 3: Plotting a Histogram with Loan Status Data Values

A histogram may be used to visualise the distribution of loan status data. The supplied code sample shows how to plot a histogram with loan status data values.

The code for creating a histogram with loan status data values is as follows:

Figure 17: Code snippet of visualising loan approval by gender

(Source: Self-developed in Jupyter)

Figure 18: Visualisation of loan approval by gender

(Source: Self-developed in Jupyter)

The sns.countplot() method is used to generate a histogram-like visualisation of the data (Seaborn.countplot, 2022). The x option is set to 'Loan_Status,' which represents the column in the merged_df DataFrame carrying the loan approval status information.

The plt.figure(figsize=(8, 6)) line adjusts the figure size for improved visibility and attractiveness. In this chart, the value 1 represents a male applicant and the value 2 represents a female applicant.
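A sketch of the countplot described above; the hue='Gender' argument is an assumption, made because the figure breaks loan status down by gender:

    # Bar counts of loan status, split by applicant gender
    plt.figure(figsize=(8, 6))
    sns.countplot(x='Loan_Status', hue='Gender', data=merged_df)
    plt.show()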

Process 4: Checking the Number of Approved Loans for Females

The code calculates the percentage of female applicants whose loans are granted.

Figure 19: Code snippet of calculating female loan approval

(Source: Self-developed in Jupyter)

The loc function is used in the above code to filter the merged_df DataFrame based on two conditions: 'Gender' equals 'Female' and 'Loan_Status' equals 'Y', signifying loan approval. The resultant DataFrame, female_approved, only contains entries where both conditions are met.

The len() function is then applied to the female_approved DataFrame and to the female_total DataFrame (all female applicants) to determine the number of approved loans and the proportion of female applicants whose loans were approved.
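Putting the described steps together, an indicative version of Figure 19 (using the string labels quoted in the text):

    # Female applicants whose loans were approved
    female_approved = merged_df.loc[(merged_df['Gender'] == 'Female') & (merged_df['Loan_Status'] == 'Y')]
    # All female applicants
    female_total = merged_df.loc[merged_df['Gender'] == 'Female']
    # Share of female applicants with an approved loan, as a percentage
    print(len(female_approved) / len(female_total) * 100)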

Process 5: Checking the Average Applicant Income

The provided code snippet retrieves the 'ApplicantIncome' column and calculates its mean value to compute the average applicant income in the merged_df DataFrame.

The code snippet for checking the average applicant income is as follows:

Figure 20: Code snippet of calculating average income

(Source: Self-developed in Jupyter)
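Figure 20 plausibly reduces to a single line:

    # Mean applicant income across the merged dataset
    print(merged_df['ApplicantIncome'].mean())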

Process 6: Checking the Average Income of Self-Employed Applicants

The loc() function is used to filter the DataFrame based on the condition merged_df['Self_Employed'] == 1. Indexing is then used to obtain the 'ApplicantIncome' column, and the mean() function is used to get the average salary of the 'ApplicantIncome' column.

The programme determines the average salary of self-employed applicants by running this line of code.

Figure 21: Code snippet of calculating average income of self-employed

(Source: Self-developed in Jupyter)
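An indicative version of Figure 21, mirroring the filter-then-index pattern described in the text:

    # Average income of self-employed applicants
    print(merged_df.loc[merged_df['Self_Employed'] == 1]['ApplicantIncome'].mean())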

Process 7: Checking the Income of Non-Self-Employed Applicants

The code snippet supplied filters the DataFrame based on the 'Self_Employed' column and determines the mean value of the 'ApplicantIncome' column for the filtered subset to compute the income of non-self-employed applicants in the merged_df DataFrame.

The code snippet for checking the income of non-self-employed applicants:

Figure 22: Code snippet of calculating average income of non-self-employed

(Source: Self-developed in Jupyter)

The loc() function is used in the above code to filter the DataFrame based on the condition merged_df['Self_Employed'] == 0, which chooses rows with 'Self_Employed' equal to 0 (representing non-self-employed candidates). Indexing is then used to obtain the 'ApplicantIncome' column (['ApplicantIncome']). Finally, for non-self-employed applicants, the mean() function is used to get the average value of the 'ApplicantIncome' column.

The programme determines the average salary of non-self-employed applicants by running this line of code.
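The corresponding sketch for Figure 22:

    # Average income of non-self-employed applicants
    print(merged_df.loc[merged_df['Self_Employed'] == 0]['ApplicantIncome'].mean())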

Process 8: Checking the Income of Graduate Applicants

The code snippet supplied filters the DataFrame based on the 'Graduate' column and calculates the mean value of the 'ApplicantIncome' column for the filtered subset to compute the income of graduate applicants in the merged_df DataFrame.

The code snippet for checking the income of graduate applicants:

Figure 23: Code snippet of calculating average income of graduate

(Source: Self-developed in Jupyter)

The loc function is used in the above code to filter the DataFrame based on the criteria merged_df['Graduate'] == 1, which chooses rows where 'Graduate' equals 1 (indicating graduate candidates). Indexing is then used to obtain the 'ApplicantIncome' column (['ApplicantIncome']). Finally, for the graduate applicants, the mean() function is used to get the average value of the 'ApplicantIncome' column. The programme determines the average salary of graduate candidates by running this line of code.
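The analogous sketch for Figure 23:

    # Average income of graduate applicants
    print(merged_df.loc[merged_df['Graduate'] == 1]['ApplicantIncome'].mean())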

Process 9: Checking the Loan Approval Rate of College Graduates

The code snippet supplied filters the DataFrame based on the 'Graduate' column and calculates the percentage of loan approvals within the filtered subset to estimate the loan approval rate of college graduates in the merged_df DataFrame.

The code snippet for checking the loan approval rate of college graduates:

Figure 24: Code snippet of calculating loan approval rate of graduates

(Source: Self-developed in Jupyter)

The loc function is used in the preceding code to filter the DataFrame based on two conditions: merged_df['Graduate'] == 1, which selects rows where 'Graduate' equals 1 (indicating college graduates), and merged_df['Loan_Status'] == 'Y', which selects approved applications. The approval rate is then computed as the proportion of approved loans among all graduate applicants.
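A hedged sketch of the approval-rate calculation in Figure 24, following the pattern of the earlier processes:

    # Graduate applicants, and the approved subset
    graduates = merged_df.loc[merged_df['Graduate'] == 1]
    graduates_approved = graduates.loc[graduates['Loan_Status'] == 'Y']
    # Approval rate as a percentage
    print(len(graduates_approved) / len(graduates) * 100)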