PKDD'99 Discovery Challenge

Guide to the Financial Data Set

   

Domain

Once upon a time, there was a bank offering services to private persons. The services include managing accounts, offering loans, etc. The bank wants to improve its services by finding interesting groups of clients (e.g. to differentiate between good and bad clients). The bank managers have only a vague idea of who is a good client (to whom additional services should be offered) and who is a bad client (who should be watched carefully to minimize the bank's losses). Fortunately, the bank stores data about its clients, their accounts (transactions over several months), the loans already granted, and the credit cards issued. So the bank managers hope to find some answers (and questions as well) by analyzing this data.

 

Task description

The discovery challenge task is to

Each participant can use any KDD techniques and discover as much knowledge as possible. Ideally, each approach will include (1) proposed goals, (2) details of the data mining, and (3) a demonstrated use of the results. Since the results of discovery may be unexpected, the applications may differ from those initially proposed.

 

Data description

The data about the clients and their accounts consist of the following relations:

Each account has both static characteristics (e.g. date of creation, address of the branch), given in relation "account", and dynamic characteristics (e.g. payments debited or credited, balances), given in relations "permanent order" and "transaction". Relation "client" describes characteristics of the persons who can operate the accounts. One client can have several accounts, and several clients can operate a single account; clients and accounts are linked in relation "disposition". Relations "loan" and "credit card" describe some of the services the bank offers to its clients; several credit cards can be issued to an account, but at most one loan can be granted for an account. Relation "demographic data" gives some publicly available information about the districts (e.g. the unemployment rate); additional information about the clients can be deduced from this.

 

Relation account

item         meaning                              remark
account_id   identification of the account
district_id  location of the branch
date         date of account creation             in the form YYMMDD
frequency    frequency of issuance of statements  "POPLATEK MESICNE" stands for monthly issuance
                                                  "POPLATEK TYDNE" stands for weekly issuance
                                                  "POPLATEK PO OBRATU" stands for issuance after transaction
     

Relation client

item          meaning                   remark
client_id     record identifier
birth number  identification of client  the number is in the form YYMMDD for men and
                                        YYMM+50DD for women (50 is added to the month),
                                        where YYMMDD is the date of birth
district_id   address of the client
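
The birth number encoding described above can be decoded as in the following Python sketch. The function name decode_birth_number is hypothetical, and the sketch assumes that every two-digit year belongs to the 1900s, which the guide does not state explicitly.

```python
from datetime import date

def decode_birth_number(birth_number):
    """Split a birth number into (date of birth, sex).

    Men:   YYMMDD
    Women: YYMM+50DD, i.e. the month field is increased by 50.
    Assumption (not stated in the guide): all years fall in the 1900s.
    """
    text = str(birth_number).strip().zfill(6)
    if len(text) != 6 or not text.isdigit():
        raise ValueError(f"not a six-digit birth number: {birth_number!r}")

    year = 1900 + int(text[0:2])
    month = int(text[2:4])
    day = int(text[4:6])

    if month > 50:
        # A month above 50 marks a woman; remove the offset to get the real month.
        sex = "F"
        month -= 50
    else:
        sex = "M"

    # date() raises ValueError if the decoded month or day is invalid.
    return date(year, month, day), sex
```

With a made-up value, decode_birth_number("706213") yields 13 December 1970 and "F", because the month field 62 decodes to 62 - 50 = 12.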
     

Relation disposition

item        meaning                           remark
disp_id     record identifier
client_id   identification of a client
account_id  identification of an account
type        type of disposition (owner/user)  only the owner can issue permanent orders and ask for a loan
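
To illustrate how relation "disposition" links clients and accounts, here is a minimal Python sketch. All rows and identifiers are made up, the function names are hypothetical, and the literal type values "owner" and "user" are assumed from the wording of the table above; the actual files may encode them differently.

```python
# Hypothetical rows of relation "disposition" (all identifiers are made up).
# The type values "owner"/"user" are an assumption based on the guide's wording.
DISPOSITIONS = [
    {"disp_id": 1, "client_id": 10, "account_id": 100, "type": "owner"},
    {"disp_id": 2, "client_id": 11, "account_id": 100, "type": "user"},
    {"disp_id": 3, "client_id": 10, "account_id": 101, "type": "owner"},
]

def accounts_of(client_id, dispositions):
    """Accounts a client can operate (one client can have several accounts)."""
    return sorted({d["account_id"] for d in dispositions
                   if d["client_id"] == client_id})

def clients_of(account_id, dispositions):
    """Clients who can operate an account (several clients per account)."""
    return sorted({d["client_id"] for d in dispositions
                   if d["account_id"] == account_id})

def may_request_loan(client_id, account_id, dispositions):
    """Only the owner can issue permanent orders and ask for a loan."""
    return any(d["client_id"] == client_id
               and d["account_id"] == account_id
               and d["type"] == "owner"
               for d in dispositions)
```

In this toy data, client 10 operates accounts 100 and 101, account 100 is shared by clients 10 and 11, and only client 10 may ask for a loan on account 100.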
     

Relation permanent order

item        meaning                          remark
order_id    record identifier
account_id  account the order is issued for
bank_to     bank of the recipient            each bank has a unique two-letter code
account_to  account of the recipient
amount      debited amount
K_symbol    characterization of the payment  "POJISTNE" stands for insurance payment
                                             "SIPO" stands for household payment
                                             "LEASING" stands for leasing
                                             "UVER" stands for loan payment

     

Relation Transaction

item        meaning                              remark
trans_id    record identifier
account_id  account the transaction deals with
date        date of transaction                  in the form YYMMDD
type        +/- transaction                      "PRIJEM" stands for credit
                                                 "VYDAJ" stands for withdrawal
operation   mode of transaction                  "VYBER KARTOU" stands for credit card withdrawal
                                                 "VKLAD" stands for credit in cash
                                                 "PREVOD Z UCTU" stands for collection from another bank
                                                 "VYBER" stands for withdrawal in cash
                                                 "PREVOD NA UCET" stands for remittance to another bank
amount      amount of money
balance     balance after transaction
k_symbol    characterization of the transaction  "POJISTNE" stands for insurance payment
                                                 "SLUZBY" stands for payment for statement
                                                 "UROK" stands for interest credited
                                                 "SANKC. UROK" stands for sanction interest if negative balance
                                                 "SIPO" stands for household payment
                                                 "DUCHOD" stands for old-age pension
                                                 "UVER" stands for loan payment
bank        bank of the partner                  each bank has a unique two-letter code
account     account of the partner
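
The Czech codes in the table above can be translated with simple lookup tables, as in the following Python sketch. The dictionary and function names are hypothetical, and the sketch assumes that the amount column is stored as a non-negative number whose direction is given only by the type column; the guide does not say this explicitly.

```python
# Sign implied by the "type" column: credit (+) or withdrawal (-).
TYPE_SIGN = {"PRIJEM": 1, "VYDAJ": -1}

# English meanings of the "operation" codes, as given in the guide.
OPERATION = {
    "VYBER KARTOU": "credit card withdrawal",
    "VKLAD": "credit in cash",
    "PREVOD Z UCTU": "collection from another bank",
    "VYBER": "withdrawal in cash",
    "PREVOD NA UCET": "remittance to another bank",
}

# English meanings of the "k_symbol" codes, as given in the guide.
K_SYMBOL = {
    "POJISTNE": "insurance payment",
    "SLUZBY": "payment for statement",
    "UROK": "interest credited",
    "SANKC. UROK": "sanction interest if negative balance",
    "SIPO": "household payment",
    "DUCHOD": "old-age pension",
    "UVER": "loan payment",
}

def signed_amount(trans_type, amount):
    """Return the amount as a positive credit or a negative withdrawal.

    Assumption: amounts are stored unsigned and "type" carries the sign.
    """
    try:
        sign = TYPE_SIGN[trans_type]
    except KeyError:
        raise ValueError(f"unknown transaction type: {trans_type!r}") from None
    return sign * amount
```

Rejecting unknown type codes with an error, rather than silently treating them as credits, makes unexpected values in the data visible early.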
     

Relation Loan

item        meaning                         remark
loan_id     record identifier
account_id  identification of the account
date        date when the loan was granted  in the form YYMMDD
amount      amount of money
duration    duration of the loan
payments    monthly payments
status      status of paying off the loan   'A' stands for contract finished, no problems
                                            'B' stands for contract finished, loan not paid
                                            'C' stands for running contract, OK so far
                                            'D' stands for running contract, client in debt
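
The four status codes combine two independent facts: whether the contract is finished, and whether repayment has been problematic. The following Python sketch separates them. The names LOAN_STATUS and is_problem_loan are hypothetical, and treating 'B' and 'D' as "problem" loans is one possible reading for analysis purposes, not a definition given by the guide.

```python
# Each status code split into two flags, following the table above.
LOAN_STATUS = {
    "A": {"finished": True,  "problem": False},  # contract finished, no problems
    "B": {"finished": True,  "problem": True},   # contract finished, loan not paid
    "C": {"finished": False, "problem": False},  # running contract, OK so far
    "D": {"finished": False, "problem": True},   # running contract, client in debt
}

def is_problem_loan(status):
    """True for 'B' and 'D' (an analyst's assumption, see the note above)."""
    try:
        return LOAN_STATUS[status]["problem"]
    except KeyError:
        raise ValueError(f"unknown loan status: {status!r}") from None
```

Such a flag could serve as one candidate label when trying to differentiate between good and bad clients, but other definitions are equally possible.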
     

Relation Credit card

item     meaning                    remark
card_id  record identifier
disp_id  disposition to an account
type     type of card               possible values are "junior", "classic", "gold"
issued   issue date                 in the form YYMMDD
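
Several relations in this guide store dates in the form YYMMDD, including the issue date above. A minimal Python sketch for converting such values follows. The function name parse_yymmdd is hypothetical, and the sketch assumes that every two-digit year belongs to the 1900s. This is why it builds the date explicitly instead of relying on the %y directive of strptime, which maps two-digit years below 69 to the 2000s.

```python
from datetime import date

def parse_yymmdd(value):
    """Convert a YYMMDD value (string or integer) to a date.

    Assumption (not stated in the guide): all years fall in the 1900s.
    """
    text = str(value).strip().zfill(6)
    if len(text) != 6 or not text.isdigit():
        raise ValueError(f"not a YYMMDD value: {value!r}")
    # date() raises ValueError if the month or day is out of range.
    return date(1900 + int(text[0:2]), int(text[2:4]), int(text[4:6]))
```

The same helper can be applied to the date columns of relations "account", "transaction" and "loan".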
     

Relation Demographic data

item              meaning                                          remark
A1 = district_id  district code
A2                district name
A3                region
A4                no. of inhabitants
A5                no. of municipalities with inhabitants < 499
A6                no. of municipalities with inhabitants 500-1999
A7                no. of municipalities with inhabitants 2000-9999
A8                no. of municipalities with inhabitants > 10000
A9                no. of cities
A10               ratio of urban inhabitants
A11               average salary
A12               unemployment rate '95
A13               unemployment rate '96
A14               no. of entrepreneurs per 1000 inhabitants
A15               no. of committed crimes '95
A16               no. of committed crimes '96

 

This database was prepared by Petr Berka and Marta Sochorova