The aim of this article is to bring the reader closer to the process of creating and selecting algorithms and evaluating data, in pursuit of the intended goal: improving and accelerating decision-making.
For one of our clients, a US-based provider of online financing/credit to sellers on the Amazon platform, we developed and successfully implemented machine learning algorithms.
Our task was to streamline the existing funding process, which relied on people manually checking customer data. After an initial analysis, the task was divided into several sub-projects, including analysis of customers' financial status, churn prediction for the customer base, and prediction of each customer's sales.
Our cooperation with the client started with data verification: checking the quantity, quality, and usefulness of the data. We received about 50 thousand rows of data daily, covering the last two years. The databases were structured mainly around customer (vendor) data, spread across about 50 characteristics (variables). Analysis showed that a large part of this data could not be included: some of it was simply useless, some was inconsistent due to malfunctions in the data collection system and a migration to a new collection system, and some was redundant. After cleaning the data and selecting the useful subset, we were left with six months of customer history and about 20 characteristics. As far as usefulness is concerned, the final usability analysis is carried out when each individual functionality is planned.
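As a minimal sketch of the selection step described above, the following drops columns that are mostly missing or carry no information. The thresholds, field names, and toy rows are all hypothetical; the client's real schema and rules are not shown here.

```python
# Sketch of the column-filtering step (hypothetical thresholds and field
# names; the real pipeline and schema belong to the client).

def select_useful_columns(rows, max_missing_ratio=0.3):
    """Keep only columns that are neither mostly missing nor constant."""
    columns = rows[0].keys()
    useful = []
    for col in columns:
        values = [r[col] for r in rows]
        missing = sum(v is None for v in values)
        distinct = {v for v in values if v is not None}
        if missing / len(values) <= max_missing_ratio and len(distinct) > 1:
            useful.append(col)
    return useful

# Toy data stream: 'legacy_flag' is constant, 'old_score' is mostly missing.
rows = [
    {"sales": 120, "rating": 4.5, "legacy_flag": 1, "old_score": None},
    {"sales": 80,  "rating": 4.1, "legacy_flag": 1, "old_score": None},
    {"sales": 200, "rating": 3.9, "legacy_flag": 1, "old_score": 7},
]
print(select_useful_columns(rows))  # → ['sales', 'rating']
```

In practice this kind of filter is only a first pass; redundant or inconsistent columns still need domain review before being dropped.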
After reviewing the data, we prepared questions about the lending process and the details of cooperation with the Amazon platform. During a videoconference it turned out that even the client's own experts disagreed about the customer evaluation process. This complicated the implementation of our solutions, and it became necessary to acquire the relevant domain knowledge ourselves.
After discussions with the client, we proposed several sub-projects, among them the ones described above, and after discussing the details a subset was selected for implementation in the next period of cooperation. Whiteaster's 3-person ML team was initially involved in the project. Thanks to our efforts, after the first 3 months the model's results were better than those of the experts on the client's side. After discussing the results with the contracting company, we agreed to change the process to improve the outcome further: customers the model scores with high confidence automatically receive funding, while the others are checked manually by the client's experts.
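The revised process amounts to a simple routing rule on top of the model's output. A minimal sketch, assuming the model emits an approval probability; the threshold value here is invented, since in practice it is calibrated against the business cost of wrong decisions.

```python
# Sketch of the routing rule: high-confidence customers are funded
# automatically, the rest go to manual expert review.

AUTO_APPROVE_THRESHOLD = 0.9  # assumed value, for illustration only

def route_application(approval_probability):
    """Route a scored application to automatic funding or manual review."""
    if approval_probability >= AUTO_APPROVE_THRESHOLD:
        return "auto_fund"
    return "manual_review"

print(route_application(0.95))  # → auto_fund
print(route_application(0.60))  # → manual_review
```

The design choice here is that the model never rejects anyone on its own; uncertain cases simply fall back to the experts, which keeps the automation safe to deploy incrementally.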
The model itself was designed to take into account data such as the seller's rating on the platform, the seller's financial status, ratings of the products the seller offers, and the seasonality of their sales. Each factor was evaluated separately, after which the individual factor scores were combined by an aggregate model, which gave us a highly interpretable result.
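The two-stage structure can be sketched as follows. The per-factor scoring logic and the linear weights below are placeholders, not the client's actual models or values; the point is only to show why scoring factors separately keeps the result interpretable.

```python
# Sketch of the two-stage design: each factor is scored by its own model,
# then an aggregate model combines the factor scores.

def score_factors(seller):
    """Score each factor separately on a 0..1 scale (placeholder logic)."""
    return {
        "platform_rating": seller["rating"] / 5.0,
        "financial_status": min(seller["monthly_revenue"] / 10_000, 1.0),
        "product_rating": seller["avg_product_rating"] / 5.0,
        "seasonality": seller["off_season_sales_share"],
    }

def aggregate(factor_scores, weights):
    """Weighted combination; interpretable because per-factor scores survive."""
    return sum(weights[name] * score for name, score in factor_scores.items())

weights = {"platform_rating": 0.3, "financial_status": 0.4,
           "product_rating": 0.2, "seasonality": 0.1}
seller = {"rating": 4.5, "monthly_revenue": 8_000,
          "avg_product_rating": 4.0, "off_season_sales_share": 0.5}
scores = score_factors(seller)
print(round(aggregate(scores, weights), 3))  # → 0.8
```

Because the intermediate factor scores are preserved, an expert reviewing a decision can see which factor pulled the result down, rather than facing a single opaque number.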
After the implementation of the first improvement, together with the Client we started to consider riskier projects, for which we did not yet have enough data. Carrying them out required assigning additional people to the team. Whiteaster's ML team was enlarged to 7 people, which allowed us to extend the scope of data collection, implement models predicting each client's sales separately, add several new models, and improve the results of the existing ones.
Important elements in the implementation of machine learning solutions are:
The goal: improving the effectiveness and speed of the decision-making process for customer funding.
Data acquisition: most of the necessary data was obtained, but a large part of it could not be taken into account for various reasons. It is sometimes worth extending the acquisition process to start collecting the data that is missing.
Data analysis: good data quality and domain knowledge help to improve and accelerate the analysis. The analysis determines whether the goal is achievable with the existing data and, if so, what the data models used for the implementation will look like.
Data processing: preparing the data for training, and preparing scripts that clean the incoming data flow from production. This stage is responsible for data quality, but only to the extent the data itself allows.
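A production-stream cleaning script of the kind mentioned above might look like the following sketch: each incoming record is validated and normalized before it reaches the model. The field names and validation rules are purely illustrative.

```python
# Sketch of cleaning the production data flow: validate and normalize
# each record, dropping rows that cannot be repaired (illustrative rules).

def clean_record(raw):
    """Return a normalized record, or None if the row is unusable."""
    try:
        record = {
            "customer_id": str(raw["customer_id"]).strip(),
            "monthly_sales": float(raw["monthly_sales"]),
            "rating": float(raw["rating"]),
        }
    except (KeyError, TypeError, ValueError):
        return None  # unusable row: dropped, as in the initial clean-up
    if not 0.0 <= record["rating"] <= 5.0:
        return None
    return record

stream = [
    {"customer_id": " A17 ", "monthly_sales": "1200.5", "rating": "4.2"},
    {"customer_id": "B02", "monthly_sales": "n/a", "rating": "4.9"},
]
cleaned = [r for r in (clean_record(x) for x in stream) if r is not None]
print(len(cleaned))  # → 1
```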
Modeling: the models contain the knowledge from which predictions are obtained (in this case, regarding the safety of funding a customer). This knowledge is extracted from the data, to the extent the information hidden in the data allows. At this stage the model is trained on the data; it is important to choose the right algorithm for the problem.
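To make the training step concrete, here is a minimal logistic-regression model fitted by gradient descent on invented "safe to fund" examples. This is only an illustration of what "the model is trained on the data" means; the actual project compared more capable algorithms, and none of the data below is real.

```python
# Minimal illustration of training: logistic regression fitted by
# stochastic gradient descent on toy, invented funding examples.
import math

def train_logreg(X, y, lr=0.1, epochs=2000):
    """Fit weights + bias so sigmoid(w.x + b) approximates P(safe to fund)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            err = p - yi  # gradient of the log-loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Predicted probability that funding this customer is safe."""
    return 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b)))

# Toy features: (normalized rating, normalized revenue); label 1 = funded safely.
X = [(0.9, 0.8), (0.8, 0.9), (0.2, 0.1), (0.3, 0.2)]
y = [1, 1, 0, 0]
w, b = train_logreg(X, y)
print(predict(w, b, (0.85, 0.85)) > 0.5)  # → True
```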
Model tuning: fine-tuning the chosen algorithm and retraining the models to improve results. A whole element is dedicated to this process because it is very time-consuming, and tuning proceeds as a trial-and-error cycle.
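The trial-and-error cycle can be sketched as a simple search over candidate settings scored on held-out data. To keep the loop itself in focus, the "model" below is just a threshold rule, and all numbers are invented; real tuning would sweep an algorithm's actual hyperparameters.

```python
# Sketch of the tuning cycle: try candidate settings, score each on
# held-out data, keep the best. The "model" is a dummy threshold rule.

def accuracy(threshold, validation):
    """Score one candidate setting on held-out (score, label) pairs."""
    hits = sum((score >= threshold) == bool(label) for score, label in validation)
    return hits / len(validation)

# Invented held-out examples: (model score, true label).
validation = [(0.95, 1), (0.80, 1), (0.40, 0), (0.30, 0), (0.55, 0)]

best = max((accuracy(t, validation), t) for t in [0.3, 0.5, 0.7, 0.9])
print(best)  # → (1.0, 0.7)
```

Each pass through such a loop suggests the next candidates to try, which is exactly why tuning consumes so much time: the cycle repeats until improvements flatten out.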
Deployment: implementing the model in the production system. Depending on the scale of the problem, either a module is added that uses the model locally, or the model is hosted in the cloud.
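The "local module" variant can be sketched as serializing the trained model once and loading it when the production service starts. The model object and file name below are stand-ins; a cloud-hosted variant would call an HTTP endpoint instead of loading a file.

```python
# Sketch of local deployment: serialize the trained model once, then load
# it at service startup. Model contents and paths are illustrative only.
import os
import pickle
import tempfile

model = {"weights": [0.3, 0.4, 0.2, 0.1], "bias": -0.2}  # stand-in model

path = os.path.join(tempfile.mkdtemp(), "funding_model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)          # done once, at training time

with open(path, "rb") as f:
    loaded = pickle.load(f)        # done at production-service startup

def score(features, m=loaded):
    """Apply the loaded model to one customer's feature vector."""
    return sum(w * x for w, x in zip(m["weights"], features)) + m["bias"]

print(round(score([0.9, 0.8, 0.8, 0.5]), 2))  # → 0.6
```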