Assignment 1

Task: Explore the data and find an interesting story. Design a static visualization to convey a story from the provided data set. Provide a one-page write-up describing the story, including the design rationale and justification for your design decisions. Describe your choices of visualization type, size, color, and other visual elements. Why and how do you think your design effectively delivers the story?

Note: You are free to transform the data and import external data. You are free to use any graphics or charting tools. You are not required to include all the data in the file.

Data set: 20 professional tennis players’ gender and game statistics (wins/losses, number of titles, prize money) for 2012, plus Australian Open 2013 prize money.

You can download the data as a CSV or Excel file on my website.

Hint: Is there a pattern or trend? What are the differences? How do you visualize money? Can you describe the players’ performance better than just showing the wins and losses or prize money?

Submission: This is an individual assignment. Your submission should include your one-page write-up and a copy of your visualization in a standard image file format (PNG, JPG, TIFF, GIF). You may zip these two files together.

The file must be named Assign1-YourFirstnameLastname (e.g. Assign1-SharonHsiao).

Deadline: Tuesday, Feb. 12, 2013, by 6:00 pm.

One thought on “Assignment 1”

  1. Katherine Rostoff 04/17/2013 at 7:27 PM

    Lab5 – Machine Learning.

    2. Data mining by applying Classification algorithms
    I ran 4 classification models on bmw-training.arff in Weka:
    1. J48, 2. J48 on a supplied test set, 3. the lazy IBk classifier, 4. IBk on a supplied test set. Based on the highest percentage of correctly classified instances, i.e. true positives and true negatives vs. false positives and false negatives, I came to the conclusion that model #3 is better: 57% correctly classified instances vs. 42% incorrectly classified ones.
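The comparison criterion above (share of correctly classified instances) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Weka's output: the function names and all confusion-matrix counts below are hypothetical and do not come from bmw-training.arff.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified instances: (TP + TN) / all instances."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no instances to evaluate")
    return (tp + tn) / total


def best_model(results):
    """Return the name of the model with the highest accuracy.

    results maps a model name to a (tp, tn, fp, fn) tuple.
    """
    return max(results, key=lambda name: accuracy(*results[name]))


# Hypothetical counts, for illustration only (not taken from the lab data).
example_results = {
    "model_a": (35, 25, 20, 20),  # accuracy 0.6
    "model_b": (40, 30, 15, 15),  # accuracy 0.7
}

if __name__ == "__main__":
    for name, counts in example_results.items():
        print(name, round(accuracy(*counts), 2))
    print("best:", best_model(example_results))
```

Accuracy alone treats both kinds of error as equally costly, so it is worth checking the false positives and false negatives separately before settling on a model.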

    3. Clustering Algorithm.
    I applied SimpleKMeans in Weka to the bmw-BrowserBehavior.arff data set.
    First, I wanted to differentiate buyers from non-buyers.
    I chose 2 clusters, which best separates the group of likely buyers (48% purchases) from the group with a low percentage of buyers (24% purchases) along each parameter. 83% of those who ended up buying a car visited the dealership (vs. 21% of non-buyers); 57% of purchasers visited the showroom, whereas almost all non-purchasers (97%) visited the showroom; 79% of purchasers were interested in the M5 vs. 8% of non-purchasers; only 30% of purchasers were interested in the 3Series vs. 97% of non-purchasers; only 33% of purchasers were interested in the Z4 vs. 65% of non-purchasers. Purchasers were more interested in financing: 73% vs. 42% of non-purchasers. Only one parameter in this segmentation is almost the same across the two groups: computer search, which arguably is not fully under the car dealer's control.
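The per-cluster percentages quoted above are the kind of figure one gets by averaging each 0/1 attribute within a cluster. A minimal sketch of that step, assuming the cluster assignments are already available (for example, exported from Weka); the function name, attribute names, and rows are hypothetical toy values, not the lab data:

```python
from collections import defaultdict


def cluster_rates(rows, labels, names):
    """Mean of each 0/1 attribute within each cluster.

    rows   -- list of equal-length lists of 0/1 values
    labels -- cluster id for each row
    names  -- attribute name for each column
    """
    sums = defaultdict(lambda: [0] * len(names))
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        counts[label] += 1
        for i, value in enumerate(row):
            sums[label][i] += value
    return {
        label: {name: sums[label][i] / counts[label] for i, name in enumerate(names)}
        for label in counts
    }


# Hypothetical toy data, for illustration only.
names = ["Dealership", "Financing", "Purchase"]
rows = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 1, 0]]
labels = [0, 0, 1, 1]

if __name__ == "__main__":
    for label, rates in sorted(cluster_rates(rows, labels, names).items()):
        print(label, rates)
```

Reading the result column by column makes it easy to spot attributes, such as computer search in the comment above, whose rate barely differs between clusters.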

    Then, to obtain information that could help a car dealer sell the three cars, I chose 15 clusters. Yes, 15 clusters sounds like a lot for finding out how to sell 3 cars. However, breaking the data down into 15 clusters showed the purchase rate when an individual was interested in only one car, in two cars, and in all three cars. Assuming that the number associated with each car model indicates an individual’s level of interest in that particular model, I worked out scenarios in which sales are more likely and less likely to happen (based on the increase in the ‘Purchase’ percentage).

    1. No sales were associated with zero financing, although interest was indicated for all or some of the car models in those cases. This is based on combined information from clusters 6, 10, 12, 13, and 14.
    The miracle of financing availability: moreover, in cluster 7, availability of financing resulted in 40% purchases, although no interest was expressed in any car model (all zeros).
    This information was not pronounced with fewer clusters, and was hidden at the two-cluster level.

    1.1. Overall, visits to the dealer and showroom did not seem to have an effect on the rate of purchases. We can compare clusters 1 and 5, which have a similar level of strong interest in M5>3Series, financing (=100%), and the same purchase outcome (=80%). All individuals in cluster 1 visited the dealership and showroom, whereas in cluster 5 no one did.

    2. A sale is most likely when an individual has one car in mind and financing is available. This is based on information from clusters 3 (Z4) and 11 (M5), with 100% purchases (although based on a limited number of observations, 2 for each cluster). Cluster 14 represents those who were only interested in the 3Series (5 observations), but there was no financing indicated, and zero purchases. So, it’s hard to conclude how well the 3Series would sell if financing were available in this case.

    3. The percentage of sales drops to >=64% when an individual is interested in two cars, with financing available. This is based on combined information from clusters 1, 5, and 9 (interest in M5>3Series, and purchase of 83-100%), and cluster 4 (interest in 3Series>Z4, and purchase of 64%). Again, these clusters have a small number of observations (4-9).

    4. When an individual has all three cars in consideration, the purchase rate drops even further, to <=50% (clusters 0 and 2, with strong interest in some models), or even to zero (cluster 8, with low, equal interest in each model), with available financing = 100%. This is based on combined information from clusters 0, 2, and 8 (number of observations 4-13).
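The reasoning in points 2 to 4 (group clusters by how many models attract any interest, then compare purchase rates) can be sketched as follows. This is a hedged illustration: the function name, the dictionary layout, and every number below are hypothetical and are not the centroids reported by Weka.

```python
def purchase_range_by_interest(centroids, models=("M5", "3Series", "Z4")):
    """Group cluster centroids by the number of models with non-zero interest.

    Returns {number_of_models: (lowest purchase rate, highest purchase rate)}.
    """
    groups = {}
    for centroid in centroids:
        n_models = sum(1 for model in models if centroid[model] > 0)
        groups.setdefault(n_models, []).append(centroid["Purchase"])
    return {n: (min(rates), max(rates)) for n, rates in sorted(groups.items())}


# Hypothetical centroids, for illustration only.
example_centroids = [
    {"M5": 1.0, "3Series": 0.0, "Z4": 0.0, "Purchase": 0.9},
    {"M5": 0.8, "3Series": 0.4, "Z4": 0.0, "Purchase": 0.7},
    {"M5": 0.0, "3Series": 0.9, "Z4": 0.3, "Purchase": 0.6},
    {"M5": 0.5, "3Series": 0.5, "Z4": 0.5, "Purchase": 0.3},
]

if __name__ == "__main__":
    for n_models, (low, high) in purchase_range_by_interest(example_centroids).items():
        print(n_models, "model(s):", low, "to", high)
```

With clusters as small as the ones described above (2 to 13 observations), the ranges are only suggestive; weighting each cluster by its number of observations would be a natural refinement.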
