2. Data mining by applying Classification algorithms

I applied 4 classification algorithms to bmw-training.arff in Weka:

1. J48, 2. J48 evaluated on a supplied test set, 3. the lazy IBk classifier, 4. IBk evaluated on a supplied test set. Judging by the highest percentage of correctly classified instances (true positives and true negatives vs. false positives and false negatives), I came to the conclusion that model #3 is the best: 57% correctly classified instances vs. 42% incorrectly classified ones.
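The "correctly classified" percentage used for this comparison is the share of true positives and true negatives among all instances. A minimal sketch of that calculation, assuming a two-class confusion matrix; the function name `accuracy` and the counts in the example are hypothetical and not taken from the Weka output:

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the fraction of instances that were classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Hypothetical counts for illustration only (not from the actual run).
example = accuracy(tp=30, tn=27, fp=23, fn=20)
print(f"{example:.1%} correctly classified")
```

Accuracy alone does not show which kind of error a model makes, so it is worth reading the full confusion matrix next to this number.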

3. Clustering Algorithm.

I applied SimpleKMeans in Weka to the bmw-BrowserBehavior.arff data set.

First, I wanted to differentiate buyers from non-buyers.

I chose 2 clusters, which best differentiated a group of likely buyers (48% purchases) from a group with a low percentage of buyers (24%) along each parameter. 83% of those who ended up buying a car visited the dealership (vs. 21% of non-buyers); 57% of purchasers visited the showroom, whereas almost all non-purchasers (97%) visited the showroom; 79% of purchasers were interested in the M5 vs. 8% of non-purchasers; only 30% of purchasers were interested in the 3Series vs. 97% of non-purchasers; only 33% of purchasers were interested in the Z4 vs. 65% of non-purchasers. Purchasers were more interested in financing: 73% vs. 42% of non-purchasers. Only one parameter in this segmentation, computer search, is almost the same across the two groups, and it could be argued that it is not fully under the car dealer's control.
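The percentages above are read off the cluster centroids: for a 0/1 attribute, the centroid value is simply the fraction of cluster members with that attribute set. The sketch below shows the basic k-means loop on made-up rows. It is not Weka's SimpleKMeans implementation; it assumes caller-supplied starting centroids instead of a random seed, and the three columns (dealership visit, financing, purchase) and their values are hypothetical.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]


def squared_distance(a: Row, b: Row) -> float:
    """Squared Euclidean distance between two rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_row(rows: List[Row]) -> List[float]:
    """Column-wise mean; for 0/1 columns this is the share of ones."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def k_means(rows: List[Row], initial: List[Row],
            max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Basic k-means loop starting from the given centroids."""
    centroids = [list(c) for c in initial]
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assign every row to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(row, centroids[j]))
            for row in rows
        ]
        if new_assignment == assignment:
            break  # converged: no row changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [r for r, a in zip(rows, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_row(members)
    return centroids, assignment


# Hypothetical rows: [visited dealership, financing, purchase]
rows = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 0],
]
centroids, assignment = k_means(rows, initial=[rows[0], rows[-1]])
for j, c in enumerate(centroids):
    print(j, [f"{v:.0%}" for v in c])
```

Each printed row is one cluster's profile, which is the same kind of table the two-cluster comparison is based on.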

Then, to obtain information that could help a car dealer sell the three cars, I chose 15 clusters. Yes, that sounds like too many clusters for finding out how to sell 3 cars. However, breaking the data down into 15 clusters offered information on the rate of purchases when an individual was interested in only one car, in two cars, and in all three cars. Assuming that the number associated with each car model indicates the individual's level of interest in that particular model, I worked out scenarios in which sales are more likely and less likely to happen (based on an increase in the 'Purchase' percentage).

1. No sales were associated with zero financing, although interest was indicated in all or some of the car models in those cases. This is based on combined information from clusters 6, 10, 12, 13, and 14.

The miracle of financing availability: moreover, in cluster 7, availability of financing was associated with 40% purchases, although no interest was expressed in any car model (all zeros).

This information was not pronounced at lower numbers of clusters, and was hidden at the two-cluster level.

1.1. Overall, visits to the dealer and the showroom did not seem to have an effect on the rate of purchases. We can compare clusters 1 and 5, which have a similar level of strong interest in M5>3Series, financing (=100%), and the same purchase outcome (=80%). All individuals in cluster 1 visited the dealership and the showroom, whereas in cluster 5 no one did.

2. A sale is most likely when an individual has one car in mind and financing is available. This is based on information from clusters 3 (Z4) and 11 (M5), with 100% purchases (although based on a limited number of observations, 2 for each cluster). Cluster 14 represents those who were only interested in the 3Series (5 observations), but no financing was indicated, and there were zero purchases. So it is hard to conclude how well the 3Series would sell if financing were available in this case.

3. The percentage of sales drops to >=64% when an individual is interested in two cars, with financing available. This is based on combined information from clusters 1, 5, and 9 (interest in M5>3Series, and purchases of 83-100%), and cluster 4 (interest in 3Series>Z4, and purchases of 64%). Again, these clusters have a small number of observations (4-9).

4. When an individual has all three cars under consideration, the purchase rate drops even further, to <=50% (clusters 0 and 2, with strong interest in some models), or even to zero (cluster 8, with low, equal interest in each model), with available financing = 100%. This is based on combined information from clusters 0, 2, and 8 (number of observations 4-13).
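The comparison across findings 2 to 4 groups clusters by how many car models show any interest and then looks at the purchase rate in each group. Because the clusters are small, a size-weighted (pooled) rate is one reasonable way to combine them. The sketch below illustrates that idea only: the `Cluster` fields, the "value greater than zero means interest" rule, and all numbers are assumptions made up for the example, not values from the Weka output.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Cluster(NamedTuple):
    size: int        # number of observations in the cluster
    m5: float        # centroid value of each model attribute
    series3: float
    z4: float
    purchase: float  # centroid value of the Purchase attribute


def models_of_interest(c: Cluster) -> int:
    """Count models with any interest (assumed rule: value > 0)."""
    return sum(1 for v in (c.m5, c.series3, c.z4) if v > 0)


def pooled_purchase_rate(clusters: List[Cluster]) -> Dict[int, float]:
    """Size-weighted purchase rate per number of models of interest."""
    buyers: Dict[int, float] = defaultdict(float)
    totals: Dict[int, int] = defaultdict(int)
    for c in clusters:
        k = models_of_interest(c)
        buyers[k] += c.size * c.purchase  # expected number of buyers
        totals[k] += c.size
    return {k: buyers[k] / totals[k] for k in sorted(totals)}


# Made-up centroids for illustration only.
hypothetical = [
    Cluster(size=2, m5=1.0, series3=0.0, z4=0.0, purchase=1.0),
    Cluster(size=6, m5=1.0, series3=0.5, z4=0.0, purchase=0.5),
    Cluster(size=4, m5=0.5, series3=0.0, z4=1.0, purchase=1.0),
    Cluster(size=10, m5=0.5, series3=0.5, z4=0.5, purchase=0.2),
]
rates = pooled_purchase_rate(hypothetical)
for k, rate in rates.items():
    print(f"{k} model(s) of interest: {rate:.0%} purchases")
```

Weighting by cluster size keeps a two-observation cluster with 100% purchases from counting as much as a larger cluster with a lower rate.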

http://pushpoppress.com/ourchoice/

The ProPublica map is similar to this unemployment map:

http://www.washingtonpost.com/wp-srv/special/nation/unemployment-by-county/

The 3rd one is similar to this bike share map:

http://bikes.oobrien.com/?city=washingtondc