SAS & CRM for Techies

viernes, 17 de agosto de 2012

Estimating volume of customers after changes in acquisition

In lots of business, tracking the customer portfolio from month to month or over the years is a key indicator of activity, so the number of customer the company has is a figure always to keep in mind.

In consolidated markets, the number of customers tend to be quite stable, with every big player keeping his slice of the pie. However, every now and then, there are events such as new commercial agreements, advertising through alternative channels or changes in sales policies that can affect the number of customers we are acquiring.

In such cases, knowing how our total customer portfolio will be affected is key to know how the business is going the evolve in following months. On the other hand, if we are able to evaluate the impact of a variation on customer acquisition on our total portfolio, we can also aid decision-making by providing a way to estimate the effect of this change.

So... ¿How can we estimate the future impact of a variation of customer acquisition on the total customer portfolio? or to state it simpler: if the number of customers we are acquiring each month is increased or decreased, ¿how will the total number of customers of the company will be affected?

Let's take a graphical look at the problem:

As we can see, in a normal business situation (Situation A) we have a stable number of customers and a stable number of new customers each month. Since the total volume is stable, it's obvious that we are also losing clients every month, otherwise the line would be sloped. Then, after a decrease (or increase) of the customer acquisition, we have an unstable situation (Situation B) in which we will start losing (or gaining) customers until we reach a new stable situation (Situation C).

Probably you have realized we have made several assumptions here, namely:
* A reasonably constant customer portfolio (ie: total customers) with A customers
* A reasonably constant customer acquisition (ie: new customers) of a customers per month

This assumptions often hold true in stable markets, nevertheless we could adapt the solution to a growing or shrinking customer portfolio. However, in order to keep the concepts focused and the maths clear, we will assume the stable situation.

As we anticipated above, this situation also imply that we have:
* A reasonably constant attrition (ie: lost customers), which can be expressed as the % of customers from the total we are losing each month. If we call this p, we will lose p*A customers each month.

So now that we have our problem key elements identified, let's do some maths!

First, let's think how we can model the evolution of our customer portfolio. From the graphics and the above reasoning, it's clear that the number of customers we have is increased each month by the number of customers we acquire and decreased by those that we lose. Therefore, if we take a month m we can say (1):

Where
* Am is the customer portflio of month m
* Am-1 is the customer portflio of the previous month
* a is the number of clients acquired
* p is the number of clients lost

As you can see, this would hold true for every situation, since every variable could vary from one month to another. However we will see that in Situation A where the assumptions are made, this expression becomes a lot simpler.

In Situation A, we know that the total number of customers is constant, so we can say Am=Am-1=A, and knowing that acquisition and attrition are considered constat, then (1) can be rewritten as:

From here, we find that p can be expressed as:

p=a/A (2)

Now let's consider the month right before the variation in customer acquisition, wich will be the starting point of our analysis, that is m=0. At that point we will still have a total customer base of A customers:

However the month right after that we will have:

And the month right after that we will have:

And and arbitrary number of months n after that, we can infer that we'll have...

Using the good old calculation for the sum of a geometric series, we can express the sum of the factors of a' as theA sum of a geometric series of n-1 terms, with a initial value of a' and a ratio of (1-p), therefore we have:

Which can be nicely rewritten as (3):

As we can see from this last expression, the customer portfolio will adapt to the sudden change in acquisition in an exponential manner. Also, we can easily see that for a sufficiently large n, the series converges to (4):

So finally we have a grasp of the appearance of the evolution of our customer portfolio:

Since we know from (2) that the original volume of our portfolio was A=a/p and in (4) we shown that the customer portfolio converges to A'=a'/p in the long run, we can easily see that the percentage by which acquisition has declined is equal to the percentage of customers we will lose from our portfolio in the long run.

However, as we can see in (3) and the graphic above, it's obvious that this loss is not inmediate, actually it would take a really long time to reach the equilibrium. You can easily figure out how many customer we will lose at a given month N by using (3).

For example, for a stable customer portfolio of 1.000.000 customers with an attrition rate of 2% where the new acquisition volume is 15.000 customers per month, we can expect a maximum loss (4) of 250.000 customers, however in the first year only 54.000 are lost.

So, in conclusion, we have seen how a variation in customer acquisition affects an otherwise stable customer portfolio, and we have shown how to calculate the maximum impact (4) and also how the number of customers of the portfolio evolves month by month (3).

jueves, 16 de agosto de 2012

Defining Oracle libraries in SAS

Accessing Oracle data bases from SAS can be made really easy if we define libraries that encapsulate the connection parameters, so we can access them in a really easy manner. If the database is already configured in your TNS file, with a name such as CRM_DWH, we can easily define a library such as this:

/* Defining access to the CRM_DWH database */ 
libname crm_dwh oracle user="user"  password="pass"  path='CRM_DWH';

/* Accessing the table clients */ 
proc sql; 
 select count(*) from crm_dwh.clients; 
quit;

In this way, we can access all tables that are visible to the user. However, in some cases we would like to access a different schema than the default one. Fortunately, we can easily specify different schemas present in the same database. Let's say we have a SALES schema in our CRM_DWH database, then we can directly access the data just adding a schema option:

/* Defining the access to SALES schema in  the CRM_DWH database */ 
libname crm_dwh oracle user="user"  password="pass"  SCHEMA=SALES  path='CRM_DWH';

/* Obtaining information from the table monthly_sales in the SALES schema */ 
proc sql; 
 select sum(selling_price) from sales.monthly_sales; 
quit;

In some cases, there could be database services that are not specified in the default TNSNAMES file, in which case we could manually define the parameters to access the Oracle database. This can easily be done by defining the connection parameters in a macro variable, and then we can use it on the library definition as in the previous examples:

/* Defining the Oracle DB service */ 
%LET TNS_CRM_DWH= " 
  (DESCRIPTION = 
    ( ADDRESS_LIST =  (ADDRESS = (PROTOCOL = TCP)(HOST = 127.0.0.1)(PORT = 1532)) ) 
    ( CONNECT_DATA =  (SERVICE_NAME = CRM_DWH ) ) 
  )";

/* Using it to define a library */ 
libname crm_dwh oracle user="user"  password="pass"  SCHEMA=SALES  path=&TNS_CRM_DWH.;

Acessing oracle databases using libraries can make your life a lot easier when working from SAS, since you can use an uniform approach to work with multiple data sources without the hassle of pass through sentences.

miércoles, 15 de agosto de 2012

SAS Dates - Macro Style!

When working with processes in SAS that depend on dates, it is pretty useful to define all the standard dates we are most likely to use, so we don't have to go through the mess of coding them all again in every process. This will make our code cleaner and easier to understand, by using macro variables like "current_month" or "first_day_of_month" instead of complex date expressions.
One way to do this is by defining macro variables that store the date values we need. In te following example, I will show you how to define various useful dates with the YYYYMMDD format (that is year/month/day as in 20120821 ):

 
/* Example 1: Defining relevant days*/

/* Get the current date, let's assume today is 21 August 2012*/
%let today_date        = %sysfunc(today());

/* Define the current day and previous day (ie. 20120621 and 20120620) */
%let current_day       = %sysfunc(putn(&today_date, yymmddn.));
%let previous_day     = %sysfunc(putn(%sysfunc(intnx(DAY,&TODAY.,-1)), yymmddn.));

/* Define the first and last day of the month (ie: 20120601 and 20120630) */
%let first_day_month = %sysfunc(putn(%sysfunc(intnx(MONTH,&TODAY.,0,B)), yymmddn.));
%let last_day_month = %sysfunc(putn(%sysfunc(intnx(MONTH,&TODAY.,0,E)), yymmddn.));

And of course we can define dates in any other format we wish, like relevant months:

/* Example 2: Defining relevant months*/

/* Define current month, 6 months ago and 12 months ago (ie: 201206, 201112 and 201106) */
%let month_current   = %sysfunc(putn(&today, yymmn.)); 
%let month_6           = %sysfunc(putn(%sysfunc(intnx(MONTH,&TODAY.,-6)), yymmn.));
%let month_12          = %sysfunc(putn(%sysfunc(intnx(MONTH,&TODAY.,-12)), yymmn.));

So, if we have process that has to extract, let's say, the number of users during the last six months from a table where we have a field "date_registered" defined as YYYYMMDD. We could easily write the solution as:

/* Example 3: How to use our fancy macros*/

proc sql;
  select count(*) from clients where date_registered >= &month_6.;
quit;

Converting SAS date formats

A typical problem when working with any kind database with temporal records, is transforming dates to a human-readable. Another typical problem arises when we work with different data sources that manage dates in very different formats, for example an Excel datasheet and a Oracle database, in which case we need to normalize the formats before we start putting our information together.

Unfortunately, dates in SAS are a bit hard to understand, and I have seen a lot of people having trouble trying to make sense of them, specially those whitout a strong technical background and who are not so familiar with complex programming structures. However if you are fluent in Perl regular expressions and C pointers, then SAS dates should be piece of cake!

In order to simplify the understanding of SAS formats, I put together a little piece of code with all the usual conversions between SAS, Oracle and Human readable formats (typically YYYYMMDD), which serves as a reference table to know how we convert from one format to another.

/* From human-readable format YYYYMMDD to SAS and Oracle */
data test_dates; 
  mydate        = '20111201'; 
  mydate_sas    =  input(mydate, yymmdd8.); 
  mydate_oracle =  dhms(mydate_sas, 0,0,0); 
run;

/* From SAS to Oracle and YYYYMMDD*/ 
data fechas; 
  mydate_sas    = today(); 
  mydate        = put(mydate_sas, yymmddn8.); 
  mydate_oracle = dhms(mydate_sas, 0,0,0); 
run;

/* From Oracle to SAS and YYYYMMDD */ 
data fechas; 
set oracle_table; 
  mydate_sas    = datepart(mydate_oracle); 
  mydate        = put(mydate_sas, yymmddn8.); 
run;

These little examples will help you identifying the function you need to convert from one format to another. Good luck with your date conversions and enjoy your new Rosetta stone for SAS!