Store Format for Existing Stores

Your company currently has 85 grocery stores and is planning to open 10 new stores at the beginning of the year. Currently, all stores use the same store format for selling their products. Up until now, the company has treated all stores similarly, shipping the same amount of product to each store. This is beginning to cause problems as stores are suffering from product surpluses in some product categories and shortages in others. You’ve been asked to provide analytical support to make decisions about store formats and inventory planning.

To remedy the product surpluses and shortages, the company wants to introduce different store formats. Each store format will have a different product selection in order to better match local demand. The actual building sizes will not change, just the product selection and internal layouts. The terms "formats" and "segments" will be used interchangeably throughout this project. You’ve been asked to:
⦁ Determine the optimal number of store formats based on sales data.
⦁ Sum sales data by StoreID and Year.
⦁ Use percentage sales per category per store for clustering (category sales as a percentage of total store sales).
⦁ Use only 2015 sales data.
⦁ Use a K-means clustering model.
⦁ Segment the 85 current stores into the different store formats.
⦁ Use the StoreSalesData.csv and StoreInformation.csv files.
PCA is not used in this project.
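As a rough illustration of the preparation and clustering steps listed above, the sketch below keeps only 2015 rows, sums sales per store, converts each category to a percentage of that store's total, and then runs a bare-bones K-means on the percentages. The column names (StoreID, Year, Dry_Grocery, Produce, Meat), the toy figures, and the hand-picked seed centroids are assumptions made for the example and are not taken from StoreSalesData.csv. A real analysis would normally rely on a clustering tool or library to handle initialization and to compare candidate numbers of clusters.

```python
from collections import defaultdict
import math

# Hypothetical category columns; the real file may use different names.
CATEGORIES = ["Dry_Grocery", "Produce", "Meat"]

# Toy rows standing in for StoreSalesData.csv (invented figures).
rows = [
    {"StoreID": "S0001", "Year": 2015, "Dry_Grocery": 100.0, "Produce": 50.0, "Meat": 50.0},
    {"StoreID": "S0001", "Year": 2015, "Dry_Grocery": 100.0, "Produce": 50.0, "Meat": 50.0},
    {"StoreID": "S0001", "Year": 2014, "Dry_Grocery": 10.0, "Produce": 80.0, "Meat": 10.0},
    {"StoreID": "S0002", "Year": 2015, "Dry_Grocery": 120.0, "Produce": 40.0, "Meat": 40.0},
    {"StoreID": "S0003", "Year": 2015, "Dry_Grocery": 20.0, "Produce": 60.0, "Meat": 20.0},
    {"StoreID": "S0004", "Year": 2015, "Dry_Grocery": 25.0, "Produce": 50.0, "Meat": 25.0},
]


def category_shares(rows, year):
    """Sum sales per store for one year, then express each category as a % of the store total."""
    sums = defaultdict(lambda: [0.0] * len(CATEGORIES))
    for row in rows:
        if row["Year"] != year:
            continue  # only the requested year is used
        for i, category in enumerate(CATEGORIES):
            sums[row["StoreID"]][i] += row[category]
    shares = {}
    for store, values in sums.items():
        total = sum(values)
        if total > 0:  # skip stores with no recorded sales
            shares[store] = [100.0 * value / total for value in values]
    return shares


def kmeans(points, seeds, iterations=20):
    """Plain Lloyd's algorithm: assign each point to its nearest centroid, then re-average."""
    centroids = [list(seed) for seed in seeds]  # copy so the caller's seeds are untouched
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))
            for point in points
        ]
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = [sum(column) / len(members) for column in zip(*members)]
    return labels


shares = category_shares(rows, 2015)
stores = list(shares)
points = [shares[store] for store in stores]
# Seeding with the first and third store is an arbitrary choice for this toy data.
labels = kmeans(points, seeds=[points[0], points[2]])
for store, label in zip(stores, labels):
    print(store, "-> cluster", label)
```

Clustering on percentages rather than raw sales means that a large store and a small store with the same product mix land in the same format, which matches the brief's statement that building sizes will not change.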
Task 1 Submission
⦁ What is the optimal number of store formats? How did you arrive at that number?
⦁ How many stores fall into each store format?
⦁ Based on the results of the clustering model, what is one way that the clusters differ from one another?
⦁ Please provide a map created in Tableau that shows the location of the existing stores, uses color to show cluster, and size to show total sales. Make sure to include a legend! Feel free to simply copy and paste the map into the submission template.

Task 2: Store Format for New Stores
The grocery store chain has 10 new stores opening at the beginning of the year. The company wants to determine which store format each of the new stores should have. However, we don’t have sales data for these new stores yet, so we’ll have to determine the format using each new store’s demographic data.

Task 2: Determine the Store Format for New Stores
You’ve been asked to:
⦁ Develop a model that predicts which segment a store falls into based on the demographic and socioeconomic characteristics of the population that resides in the area around each new store.
⦁ Use a 20% validation sample with Random Seed = 3 when creating samples with which to compare the accuracy of the models. Make sure to compare a decision tree, forest, and boosted model.
⦁ Use the model to predict the best store format for each of the 10 new stores.
⦁ Use the StoreDemographicData.csv file, which contains the information for the area around each store.
⦁ Note: In a real world scenario, you could use PCA to reduce the number of predictor variables. However, there is no need to do so in this project. You can leave all predictor variables in the model.
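The validation procedure described above (hold out 20% of the stores with a fixed seed, fit each candidate model on the rest, and compare accuracy on the held-out stores) can be sketched as follows. This is only an outline under stated assumptions: the two demographic features and all figures are invented, and the two classifiers are simple placeholders, not the decision tree, forest, and boosted models the brief asks for. A seed of 3 in Python will also not reproduce the split that another tool produces with the same seed.

```python
from collections import Counter
import math
import random

# Toy records: ((median_age, avg_household_size), segment). Both features are hypothetical.
records = [
    ((28.0, 3.1), "Format 1"), ((31.0, 2.9), "Format 1"), ((30.0, 3.0), "Format 1"),
    ((27.5, 3.3), "Format 1"), ((32.0, 2.8), "Format 1"),
    ((58.0, 1.6), "Format 2"), ((61.0, 1.5), "Format 2"), ((60.0, 1.4), "Format 2"),
    ((63.0, 1.7), "Format 2"), ((59.5, 1.5), "Format 2"),
]


def holdout_split(records, validation_fraction=0.2, seed=3):
    """Shuffle with a fixed seed and hold out a fraction of the records for validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def accuracy(predict, validation):
    """Fraction of validation records whose predicted segment matches the known one."""
    hits = sum(predict(features) == segment for features, segment in validation)
    return hits / len(validation)


def majority_model(train):
    """Placeholder model: always predicts the most common training segment."""
    top_segment = Counter(segment for _, segment in train).most_common(1)[0][0]
    return lambda features: top_segment


def nearest_neighbour_model(train):
    """Placeholder model: copies the segment of the closest training record."""
    def predict(features):
        closest = min(train, key=lambda record: math.dist(record[0], features))
        return closest[1]
    return predict


train, validation = holdout_split(records)
candidates = {"majority": majority_model, "nearest_neighbour": nearest_neighbour_model}
# Every candidate is fitted on the same training stores and scored on the same held-out stores.
scores = {name: accuracy(build(train), validation) for name, build in candidates.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Scoring all candidates on the same held-out stores is what makes the accuracy figures comparable; the best-scoring model would then be applied to the demographic data for the 10 new stores.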
Task 2 Submission
⦁ What methodology did you use to predict the best store format for the new stores? Why did you choose that methodology?
⦁ What are the three most important variables that help explain the relationship between demographic indicators and store formats? Please include a visualization.
⦁ Which format does each of the 10 new stores fall into? Please provide a data table.

Task 3: Forecasting
Fresh produce has a short life span, and due to increasing costs, the company wants to have an accurate monthly sales forecast.
