Project of Data Visualization (COM-480)

Student's name | SCIPER
Missipsa Annane | 423060
Lingyi Zhu | 423013
Yujia Wang | 423111

Milestone 1 • Milestone 2 • Milestone 3

Milestone 1 (20th March, 5pm)

10% of the final grade

This is a preliminary milestone to let you set up goals for your final project and assess the feasibility of your ideas. Please, fill the following sections about your project.

(max. 2000 characters per section)

Dataset

The project uses two primary datasets:

Brand Information Table: Contains basic information for 1,219 freshly made beverage brands, including Name, Logo Link, Average Price, Number of Stores, and Category.


Order Details Table: Contains 31,800 order records, featuring fields such as City, City Level, Brand, Product Type, Price, Quantity, Order Amount, Consumption Scenario, Gender, Age Group, Consumption Motive, Order Date, and Year.

Data Quality Assessment

Strengths:

  • The Brand Information Table has a clear structure with complete fields and no significant missing values.
  • The data dimensions are comprehensive, ranging from macro-level (city tiers, brand scale) to micro-level (individual motives, demographics).

Problematic

Frame the general topic of your visualization and the main axis that you want to develop.

  • What am I trying to show with my visualization?
  • Think of an overview for the project, your motivation, and the target audience.

Research Theme: Brand Landscape and Consumer Behavior in China's Freshly Made Beverage Market

Objectives & Audience

  • Target Audience: Beverage industry professionals, market researchers, and the public interested in consumer behavior.
  • Motivation: Given the intense competition in the beverage market, this analysis uses data visualization to clarify market structures and consumer preferences, providing a strategic reference for brands.

Visualization Perspectives

Dimension 1: Brand Market Landscape (1,219 Brands)

Field | Visualization Question
Average Price | What is the price distribution across different brand types? Is there clear price stratification?
Number of Stores | What is the market concentration? How large is the gap between leading brands and "long-tail" brands?
Category (Tea/Coffee/Milk) | How do coffee and tea brands differ in terms of quantity and price positioning?

Proposed Visualizations: Price distribution histograms (faceted by brand type), a Top 20 bar chart by store count, and a Price vs. Store Count scatter plot (a positioning matrix that highlights the leading brands).

Dimension 2: Consumer Behavior Analysis (31,800 Orders)

Field | Visualization Question
Price / Order Amount | What is the actual distribution of Average Transaction Value (ATV)? Does it differ from the brand's listed average price?
Scenario (Dine-in/Pick-up/Delivery) | What is the breakdown of delivery vs. dine-in vs. pick-up? Does ATV vary by scenario?
Motive (Novelty/Quality/Social) | What drives consumer purchases? Do different motives correspond to different price points?
Product Type | How popular are sub-categories like pure tea, fruit tea, and milk tea?

Proposed Visualizations: Consumption scenario pie/bar charts + grouped boxplots (ATV by scenario), stacked bar charts for motives (grouped by age or city tier), and product type word clouds or bar charts.
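The scenario questions above reduce to a group-by over the order table. The following is a minimal sketch in pandas, assuming the columns are named "Consumption Scenario" and "Order Amount" as in the field list; the six rows are toy data invented for illustration, not values from the dataset.

```python
import pandas as pd

# Toy orders; column names follow the README's field list (assumed spelling),
# and the values are made up for illustration only.
orders = pd.DataFrame({
    "Consumption Scenario": ["Delivery", "Dine-in", "Delivery",
                             "Pick-up", "Dine-in", "Delivery"],
    "Order Amount": [30.0, 50.0, 40.0, 20.0, 70.0, 50.0],
})

# Average Transaction Value (ATV) per scenario, plus the order count and
# median that a grouped boxplot would summarise.
atv = orders.groupby("Consumption Scenario")["Order Amount"].agg(
    ["count", "mean", "median"]
)

# Share of orders per scenario (the numbers behind a pie or bar chart).
share = orders["Consumption Scenario"].value_counts(normalize=True)

print(atv)
print(share)
```

The same pattern would apply to motives or product types by swapping the grouping column.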

Dimension 3: Brand Positioning vs. Actual Consumption (Cross-Table Analysis)

Analysis Question | Data Source
Is the "Average Price" in the Brand Table consistent with the "Actual Transaction Price" in the Order Table? | Brand Table + Order Table (aggregated by brand)
Do consumers in different city tiers prefer different types of brands? | Order Table + Brand Table (linked by brand type)
Are there significant differences in brand choice across different age or gender groups? | Order Table (Brand + Demographics)

Proposed Visualizations: A scatter plot comparing brand average price vs. actual transaction price (with a diagonal reference line), grouped bar charts showing brand category order shares by city tier, and faceted bar charts for the cross-analysis of Age × Gender × Brand Type.
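The listed-price vs. transaction-price comparison can be sketched as an aggregate followed by a join on brand. This is an illustrative pandas example under assumed column names ("Brand", "Average Price", "Price") with invented toy rows; it is not the project's actual pipeline.

```python
import pandas as pd

# Toy brand table and order table; names and values are illustrative only.
brands = pd.DataFrame({
    "Brand": ["A", "B", "C"],
    "Average Price": [15.0, 30.0, 20.0],
})
orders = pd.DataFrame({
    "Brand": ["A", "A", "B", "Z"],
    "Price": [14.0, 18.0, 33.0, 10.0],
})

# Mean transaction price per brand, computed from the order table.
actual = (
    orders.groupby("Brand", as_index=False)["Price"]
    .mean()
    .rename(columns={"Price": "Actual Price"})
)

# An inner join keeps only brands present in both tables.
merged = brands.merge(actual, on="Brand", how="inner")

# Positive gap: customers pay more than the listed average price.
# On the scatter plot these points sit above the diagonal reference line.
merged["Gap"] = merged["Actual Price"] - merged["Average Price"]

print(merged)
```

Because the join is inner, brands that appear in only one table (here "C" and "Z") drop out, so the resulting plot covers only the overlapping brands.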

Exploratory Data Analysis

Pre-processing

Preprocessing consisted of the following steps:

  • translated both datasets into English
  • kept the original Chinese shop names
  • kept the original Chinese city names in the consumer dataset
  • removed rows with missing or invalid values
  • removed invalid entries such as shops with Average Price = 0.00
  • sorted the drink shop dataset by Number of Stores from highest to lowest
  • kept only tea drinks, coffee, and milk drinks in the drink shop dataset
  • dropped unnecessary columns from the consumer dataset: user_id, product_id, order_date, member, and social_touch
  • converted the cleaned data to JSON
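The cleaning steps listed above could look roughly like the following pandas sketch. The column names and category labels ("Tea drinks", "Coffee", "Milk drinks") are assumptions modelled on the fields named in this README, and the rows are toy data; the translation step is omitted.

```python
import json

import pandas as pd

# Toy stand-in for the drink shop table (assumed column names, invented rows).
shops = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Category": ["Tea drinks", "Coffee", "Milk drinks", "Bakery", "Coffee"],
    "Average Price": [15.0, 0.0, 22.5, 30.0, None],
    "Number of Stores": [120, 40, 3000, 10, 75],
})

# Assumed English labels for the three categories that are kept.
KEPT_CATEGORIES = {"Tea drinks", "Coffee", "Milk drinks"}


def clean_shops(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the shop-table steps: drop bad rows, filter, and sort."""
    out = df.dropna()                                  # rows with missing values
    out = out[out["Average Price"] > 0]                # invalid 0.00 prices
    out = out[out["Category"].isin(KEPT_CATEGORIES)]   # tea, coffee, milk only
    out = out.sort_values("Number of Stores", ascending=False)
    return out.reset_index(drop=True)


cleaned = clean_shops(shops)

# Toy consumer table: drop the columns the README lists as unnecessary.
orders = pd.DataFrame({
    "user_id": [1], "product_id": [9], "order_date": ["2023-01-01"],
    "member": [True], "social_touch": [0], "Brand": ["A"], "Price": [18.0],
})
orders = orders.drop(
    columns=["user_id", "product_id", "order_date", "member", "social_touch"]
)

# force_ascii=False keeps the original Chinese names readable in the JSON.
shops_json = cleaned.to_json(orient="records", force_ascii=False)
print(json.loads(shops_json))
```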

Consumer dataset

  • Rows: 31800
  • Unique brands: 8
  • Unique cities: 16
  • Unique product types: 5

Statistic | Price | Quantity | Order Amount
Mean | 21.37 | 2.00 | 42.86
Std | 7.80 | 0.82 | 24.31
Min | 8.00 | 1.00 | 8.00
25% | 14.58 | 1.00 | 24.06
50% | 21.32 | 2.00 | 35.19
75% | 28.11 | 3.00 | 59.42
Max | 35.00 | 3.00 | 105.00

Shops dataset

  • Rows: 1219
  • Unique brands: 1218
  • Unique types: 5

Statistic | Average Price | Number of Stores
Mean | 21.35 | 161.06
Std | 15.17 | 991.55
Min | 3.00 | 1.00
25% | 13.65 | 6.00
50% | 17.00 | 19.00
75% | 22.45 | 58.50
Max | 218.00 | 25095.00

First insights

  • The consumer dataset focuses on a small set of major brands.
  • The shops dataset is much broader and covers a large number of brands.
  • Only 5 brands overlap between the two datasets, so the linkage plots should be interpreted carefully.
  • The number of stores is highly skewed, with a few brands having very large store networks compared to the rest.
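The overlap and skew observations can be checked with a few lines of standard-library Python. The sketch below uses invented brand names and store counts purely for illustration; `top_k_share` is a hypothetical helper, not part of the project code.

```python
from statistics import mean, median

# Toy data: store counts per brand in the shops table, and the brand of each
# order in the consumer table. All values are invented for illustration.
shop_stores = {"A": 25000, "B": 900, "C": 60, "D": 19, "E": 6, "F": 1}
order_brands = ["A", "A", "C", "X", "Y", "C", "X"]

# Brands present in both datasets: only these can be linked across tables.
overlap = sorted(set(shop_stores) & set(order_brands))

counts = sorted(shop_stores.values(), reverse=True)


def top_k_share(values, k):
    """Fraction of all stores held by the k largest brands.

    Expects `values` sorted in descending order.
    """
    return sum(values[:k]) / sum(values)


# A mean far above the median is a quick sign of a right-skewed distribution.
print("overlap:", overlap)
print("mean:", mean(counts), "median:", median(counts))
print("top-1 share:", round(top_k_share(counts, 1), 3))
```

The same mean-versus-median gap is visible in the shops table above, where the mean number of stores (161.06) is far larger than the median (19.00).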

Visualizations

We generated the following plots for the EDA:

  • Top 10 consumer brands by orders
  • Top 10 cities by orders
  • Top 10 brands by store count
  • Average price vs. number of stores

Related work

What others have already done with the data?

The first drink shop dataset has been previously explored in data visualization projects on the Heywhale platform (https://www.heywhale.com/mw/dataset/6595190fb96e5fc9eba7fd27/project). Existing work is relatively limited, mainly presenting basic statistics such as the top 100 drink shops by store count (via bar charts) and the distribution of average price tiers (via pie charts). As a result, the analytical perspective remains narrow, focusing primarily on basic brand-level attributes. These studies do not extend to multi-dimensional analysis of the market, such as examining consumer behavior, brand performance, or the relationships between brand characteristics and consumption data.

Moreover, prior projects rely on single-dimensional visualizations and do not integrate multiple datasets. In particular, they do not combine the consumption dataset with the drink shop brand information dataset (e.g., average price, number of stores, product types). They also overlook deeper insights related to consumer demographics (e.g., gender, age group), consumption patterns (e.g., scenarios, motivations), and the relationship between brand operation metrics (e.g., store count, pricing) and actual consumption performance (e.g., order amount, sales volume).

Why is your approach original?

Our project's originality lies in multi-dimensional analysis, cross-dataset integration, interactive visualization, and business-oriented insight mining, going beyond the existing single-dimensional work on the same dataset.

  • We carry out cross-dataset analysis by combining the drink shop brand dataset and the consumer order dataset, linked by brand and city. This integration allows us to explore relationships between brand attributes such as average price, store count, and product type, and real consumer behavior including spending, order amount, and preferences. Such combined analysis is absent in previous studies.
  • We design diverse and innovative visualizations instead of basic static charts. Our visualization system includes a word cloud where font size reflects store count, geographic heatmaps, box plots, dual-axis bar charts, scatter plots with trend lines, and stacked bar charts. The word cloud for store quantity is particularly original and intuitive compared with traditional bar charts.
  • Interactive design serves as another key contribution. Users can filter by city, brand, year, and price range, highlight specific groups, zoom into regions, and switch between metrics. This interactivity supports flexible, user-driven data exploration and significantly improves analytical depth compared with static visualizations in existing work.
  • We focus on actionable business insights rather than only descriptive statistics. Through correlation analysis, we explore meaningful questions such as how brand pricing relates to consumer spending, and who the core consumers are. These insights deliver practical value for understanding the Chinese drink shop market and exceed the scope of prior research.

What source of inspiration do you take?

Our choice of topic is inspired by reports on the global bubble tea market, which highlight its rapid growth in recent years. As the birthplace of milk tea, China has developed a highly diverse beverage culture along with a vast number of drink shops, making this phenomenon particularly distinctive and worth exploring.

Our visualization design and analytical framework are further informed by professional data visualization practices in the retail and FMCG (Fast Moving Consumer Goods) industries, as presented on mainstream visualization platforms and in business reports. The main sources of inspiration are as follows:

  • Retail brand analysis on Tableau Public: Visualization cases on Tableau Public provide valuable references for multi-indicator brand comparison. These projects often employ dual-axis bar charts to compare metrics such as sales volume and revenue across brands, as well as scatter plots to analyze relationships between operational indicators and market performance. Drawing on these approaches, we design visualizations such as brand ranking charts and store count versus revenue scatter plots.
  • Word cloud visualization in marketing and social media analysis: Word clouds are widely used in brand marketing reports and social media analytics to represent attention or popularity through variations in font size. We adapt this technique to visualize beverage brand store counts, where font size reflects the number of stores. Compared with traditional bar charts, this approach makes leading brands more visually prominent and improves readability.
  • Geographic heatmaps in urban consumption studies: Urban consumption reports published by institutions such as China’s National Bureau of Statistics and consulting firms (e.g., McKinsey and Deloitte) frequently use geographic heatmaps to illustrate regional consumption patterns. This inspires our design of city-level consumption heatmaps, enabling a clear and professional representation of spatial consumption characteristics.

Statement on Prior Dataset Exploration

This dataset has not been previously used by us in other courses or projects.

Milestone 2 (17th April, 5pm)

10% of the final grade

1. Project Goal

Our project aims to provide a comprehensive analysis of the beverage market landscape in China, focusing on brand competition, consumer behavior, and regional differences. By integrating a Shops Dataset (1,219 brands) with a granular Consumer Dataset (31,800 orders), we seek to bridge the gap between high-level market trends and personalized user experiences. The visualization will help users understand how pricing, scale, and demographics interact in one of the world's most dynamic consumer markets.

2. Core Visualization (Minimal Viable Product)

To fulfill the core requirements of the project, we will implement the following "Brand Market Landscape" components as our MVP:

  • Interactive Bubble/Logo Cloud: A central visualization showing the Top 20 drink shops. Bubble radius will represent the Store Count, allowing users to identify market leaders instantly.
  • Brand Matrix (Scatter Plot): A "Price vs. Store Count" plot to categorize brands into premium, mass-market, and niche segments.
  • Faceted Price Histograms: Distribution charts segmented by brand type (Tea, Coffee, Milk drinks) to show pricing strategies across different sectors.
  • Functional Website Skeleton: A structured multi-page layout with a navigation bar to switch between market overview and behavior analysis.
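The README does not define how the Brand Matrix separates premium, mass-market, and niche brands. One possible rule, shown here only as an assumption, is to cut both axes at their medians: above the price cut is "premium", otherwise a large store network is "mass-market" and a small one is "niche". The brand tuples are toy data and the `segment` function is hypothetical.

```python
from statistics import median

# Toy brands as (name, average price, number of stores); values are invented.
brands = [
    ("A", 12.0, 5000),
    ("B", 35.0, 300),
    ("C", 14.0, 8),
    ("D", 40.0, 5),
]

# Assumed thresholds: the median of each axis. Any other cut would work too.
price_cut = median(price for _, price, _ in brands)
store_cut = median(stores for _, _, stores in brands)


def segment(price, stores):
    """Assign a quadrant label under the assumed median-cut rule."""
    if price > price_cut:
        return "premium"
    if stores >= store_cut:
        return "mass-market"
    return "niche"


labels = {name: segment(price, stores) for name, price, stores in brands}
print(labels)
```

Because store counts are highly skewed, a log-scaled store axis or a fixed threshold may separate the segments more usefully than the median; that choice is left open here.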

3. Extra Ideas & Enhanced Features

These modular features will be added to enhance interactivity and storytelling depth:

  • Consumer Behavior Dashboard: Four synchronized charts (Pie/Donut) visualizing:
    • Scenario & Motive: Delivery vs. dine-in; Social vs. Novelty seeking.
    • Demographics: Distributions across Gender and Age-groups.
  • Regional Tier Analysis (Pyramid): A hierarchical visualization representing 1st, 2nd, and 3rd-tier cities to show brand penetration.
  • 3rd-Tier City Drill-down Map: A detailed geospatial view focusing on emerging markets (e.g., Xiangyang, Shantou) from 2019 to 2023.
  • Personalized Recommendation Tool: A decision-tree style interactive widget that suggests brands based on user-inputted demographics.
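As a rough illustration of the recommendation idea, the sketch below suggests the most-ordered brand among orders that match the user's demographics, and relaxes the match step by step when no orders fit. The field names, the fallback order, and the toy records are all assumptions; the final widget may use entirely different logic.

```python
from collections import Counter

# Toy order records; field names follow the README, values are invented.
orders = [
    {"Gender": "F", "Age Group": "18-24", "Brand": "A"},
    {"Gender": "F", "Age Group": "18-24", "Brand": "A"},
    {"Gender": "F", "Age Group": "18-24", "Brand": "B"},
    {"Gender": "M", "Age Group": "25-34", "Brand": "C"},
    {"Gender": "M", "Age Group": "18-24", "Brand": "B"},
]


def recommend(records, gender, age_group):
    """Return the most-ordered brand for the closest matching group.

    Falls back from (gender + age group) to age group only, then to all
    orders. Returns None when there are no records at all.
    """
    filters = [
        lambda o: o["Gender"] == gender and o["Age Group"] == age_group,
        lambda o: o["Age Group"] == age_group,
        lambda o: True,
    ]
    for keep in filters:
        matching = [o["Brand"] for o in records if keep(o)]
        if matching:
            return Counter(matching).most_common(1)[0][0]
    return None


print(recommend(orders, "F", "18-24"))
print(recommend(orders, "F", "25-34"))
```

Each fallback level corresponds to one branch of a small decision tree, which is what makes this kind of rule easy to present as an interactive step-by-step widget.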

4. Tools and Lectures

Tools

  • D3.js / TopoJSON: For the interactive bubble charts and geographic map visualizations.
  • Svelte/React: For managing application state and synchronized filtering across components.
  • Python (Pandas): For data cleaning and joining the 1,219 brands with 31,800 order records.

Lectures Needed

  • Past Lectures: Perception & Colors (for branding and map design), D3.js basics, and Interaction techniques.
  • Future Lectures: Advanced Interaction (for linked views) and Storytelling techniques to guide the user through the data layers.

5. Implementation Breakdown

  1. Phase 1 (Skeleton): Build the basic web structure and navigation based on the design sketches.
  2. Phase 2 (Market MVP): Implement the bubble cloud and scatter plots using the Shops Dataset.
  3. Phase 3 (Consumer Behavior): Integrate the Order Dataset to build interactive filters and demographic charts.
  4. Phase 4 (Advanced Features): Develop the City Tier pyramid and the recommendation engine; polish the UI/UX.

Prototype URL: https://com-480-data-visualization.github.io/projdataviz/

Webpage Sketch_V2

Milestone 3 (29th May, 5pm)

80% of the final grade

Late policy

  • < 24h: 80% of the grade for the milestone
  • < 48h: 70% of the grade for the milestone
