Project of Data Visualization (COM-480)

Student's name	SCIPER
Missipsa Annane	423060
Lingyi Zhu	423013
Yujia Wang	423111

Milestone 1 • Milestone 2 • Milestone 3

Milestone 1 (20th March, 5pm)

10% of the final grade

This is a preliminary milestone to let you set up goals for your final project and assess the feasibility of your ideas. Please, fill the following sections about your project.

(max. 2000 characters per section)

Dataset

The report utilizes two primary datasets:

Brand Information Table: Contains basic information for 1,219 freshly made beverage brands, including Name, Logo Link, Average Price, Number of Stores, and Category.


Order Details Table: Contains 31,800 order records, featuring fields such as City, City Level, Brand, Product Type, Price, Quantity, Order Amount, Consumption Scenario, Gender, Age Group, Consumption Motive, Order Date, and Year.

Data Quality Assessment

Strengths:

The Brand Information Table has a clear structure with complete fields and no significant missing values. 
The data dimensions are comprehensive, ranging from macro-level (city tiers, brand scale) to micro-level (individual motives, demographics).

Problematic

Frame the general topic of your visualization and the main axis that you want to develop.

What am I trying to show with my visualization?

Think of an overview for the project, your motivation, and the target audience.

Research Theme: Brand Landscape and Consumer Behavior in China's Freshly Made Beverage Market

Objectives & Audience

Target Audience: Beverage industry professionals, market researchers, and the public interested in consumer behavior.
Motivation: Given the intense competition in the beverage market, this analysis uses data visualization to clarify market structures and consumer preferences, providing a strategic reference for brands.

Visualization Perspectives

Dimension 1: Brand Market Landscape (1,219 Brands)

Field	Visualization Question
Average Price	What is the price distribution across different brand types? Is there clear price stratification?
Number of Stores	What is the market concentration? How large is the gap between leading brands and "long-tail" brands?
Category (Tea/Coffee/Milk)	How do coffee and tea brands differ in terms of quantity and price positioning?

Proposed Visualizations: Price distribution histograms (faceted by brand type), Top 20 bar chart by store count, and a Price vs. Store Count scatter plot (positioning matrix highlighting head brands).

Dimension 2: Consumer Behavior Analysis (31,800 Orders)

Field	Visualization Question
Price / Order Amount	What is the actual distribution of Average Transaction Value (ATV)? Does it differ from the brand's listed average price?
Scenario (Dine-in/Pick-up/Delivery)	What is the breakdown of delivery vs. dine-in vs. pick-up? Does ATV vary by scenario?
Motive (Novelty/Quality/Social)	What drives consumer purchases? Do different motives correspond to different price points?
Product Type	How popular are sub-categories like pure tea, fruit tea, and milk tea?

Proposed Visualizations: Consumption scenario pie/bar charts + grouped boxplots (ATV by scenario), stacked bar charts for motives (grouped by age or city tier), and product type word clouds or bar charts.

Dimension 3: Brand Positioning vs. Actual Consumption (Cross-Table Analysis)

Analysis Question	Data Source
Is the "Average Price" in the Brand Table consistent with the "Actual Transaction Price" in the Order Table?	Brand Table + Order Table (aggregated by brand)
Do consumers in different city tiers prefer different types of brands?	Order Table + Brand Table (linked by brand type)
Are there significant differences in brand choice across different age or gender groups?	Order Table (Brand + Demographics)

Proposed Visualizations: A scatter plot comparing brand average price vs. actual transaction price (with a diagonal reference line), grouped bar charts showing brand category order shares by city tier, and a faceted bar chart for the cross-analysis of Age × Gender × Brand Type.
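
The listed-versus-actual price comparison in Dimension 3 could be prepared roughly as follows. This is a minimal sketch using only the Python standard library; the sample rows, the brand names, and the helper `price_gap_by_brand` are hypothetical and only illustrate the aggregation by brand.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical listed average prices from the Brand Table (brand -> price).
brand_table = {"Brand A": 15.0, "Brand B": 28.0, "Brand C": 20.0}

# Hypothetical order rows from the Order Table.
orders = [
    {"Brand": "Brand A", "Price": 14.0},
    {"Brand": "Brand A", "Price": 18.0},
    {"Brand": "Brand B", "Price": 25.0},
    {"Brand": "Brand D", "Price": 12.0},  # brand missing from the Brand Table
]


def price_gap_by_brand(brand_table, orders):
    """Compare each brand's listed price with its mean transaction price."""
    prices = defaultdict(list)
    for order in orders:
        prices[order["Brand"]].append(order["Price"])

    result = {}
    # Only brands present in both tables can be compared.
    for brand in sorted(prices.keys() & brand_table.keys()):
        listed = brand_table[brand]
        actual = mean(prices[brand])
        result[brand] = {"listed": listed, "actual": actual, "gap": actual - listed}
    return result


gaps = price_gap_by_brand(brand_table, orders)
for brand, row in gaps.items():
    print(brand, row)
```

Each resulting pair (`listed`, `actual`) is one point in the scatter plot; a point on the diagonal means the two prices agree, and the sign of `gap` tells on which side of the diagonal the brand falls.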

Exploratory Data Analysis

Pre-processing

Preprocessing consisted of the following steps:

translated both datasets into English
kept the original Chinese shop names
kept the original Chinese city names in the consumer dataset
removed rows with missing or invalid values
removed invalid entries such as shops with Average Price = 0.00
sorted the drink shop dataset by Number of Stores from highest to lowest
kept only tea drinks, coffee, and milk drinks in the drink shop dataset
dropped unnecessary columns from the consumer dataset: user_id, product_id, order_date, member, and social_touch
converted the cleaned data to JSON
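
The cleaning steps above can be sketched as below. This is an illustrative version written with the standard library only, not the project's actual script; the sample rows, the category labels, and the helpers `clean_shops` and `clean_orders` are assumptions, while the dropped column names are the ones listed above.

```python
import json

# Hypothetical raw rows; field names follow the tables described earlier.
raw_shops = [
    {"Name": "Brand A", "Category": "Tea drinks", "Average Price": 15.0, "Number of Stores": 120},
    {"Name": "Brand B", "Category": "Coffee", "Average Price": 0.0, "Number of Stores": 40},
    {"Name": "Brand C", "Category": "Bakery", "Average Price": 22.0, "Number of Stores": 300},
    {"Name": "Brand D", "Category": "Milk drinks", "Average Price": 18.5, "Number of Stores": 900},
    {"Name": "Brand E", "Category": "Coffee", "Average Price": None, "Number of Stores": 10},
]
raw_orders = [
    {"user_id": 1, "product_id": 10, "order_date": "2023-01-05", "member": 1,
     "social_touch": 0, "City": "上海", "Brand": "Brand A", "Price": 16.0, "Quantity": 2},
    {"user_id": 2, "product_id": 11, "order_date": "2023-01-06", "member": 0,
     "social_touch": 1, "City": "上海", "Brand": "Brand D", "Price": None, "Quantity": 1},
]

# Assumed English labels for the three categories that are kept.
KEPT_CATEGORIES = {"Tea drinks", "Coffee", "Milk drinks"}
DROPPED_ORDER_COLUMNS = {"user_id", "product_id", "order_date", "member", "social_touch"}


def clean_shops(rows):
    """Drop incomplete or zero-price rows, keep three categories, sort by store count."""
    kept = [
        row for row in rows
        if all(value is not None for value in row.values())
        and row["Average Price"] > 0
        and row["Category"] in KEPT_CATEGORIES
    ]
    return sorted(kept, key=lambda row: row["Number of Stores"], reverse=True)


def clean_orders(rows):
    """Drop incomplete rows, then remove the columns that are not needed."""
    return [
        {key: value for key, value in row.items() if key not in DROPPED_ORDER_COLUMNS}
        for row in rows
        if all(value is not None for value in row.values())
    ]


shops = clean_shops(raw_shops)
orders = clean_orders(raw_orders)

# ensure_ascii=False keeps the original Chinese names readable in the JSON output.
shops_json = json.dumps(shops, ensure_ascii=False, indent=2)
orders_json = json.dumps(orders, ensure_ascii=False, indent=2)
print(orders_json)
```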

Consumer dataset

Rows: 31800
Unique brands: 8
Unique cities: 16
Unique product types: 5

Statistic	Price	Quantity	Order Amount
Mean	21.37	2.00	42.86
Std	7.80	0.82	24.31
Min	8.00	1.00	8.00
25%	14.58	1.00	24.06
50%	21.32	2.00	35.19
75%	28.11	3.00	59.42
Max	35.00	3.00	105.00

Shops dataset

Rows: 1219
Unique brands: 1218
Unique types: 5

Statistic	Average Price	Number of Stores
Mean	21.35	161.06
Std	15.17	991.55
Min	3.00	1.00
25%	13.65	6.00
50%	17.00	19.00
75%	22.45	58.50
Max	218.00	25095.00

First insights

The consumer dataset focuses on a small set of major brands.
The shops dataset is much broader and covers a large number of brands.
Only 5 brands overlap between the two datasets, so the linkage plots should be interpreted carefully.
The number of stores is highly skewed, with a few brands having very large store networks compared to the rest.
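
Two of these observations lend themselves to a quick check: which brands appear in both datasets, and how concentrated the store counts are. In the shops summary above, the mean store count (161.06) is far larger than the median (19.00), which is the usual signature of a long right tail. The sketch below uses made-up brand names and store counts; `brand_overlap` and `top_k_share` are hypothetical helpers.

```python
def brand_overlap(shop_brands, order_brands):
    """Return the brands that appear in both datasets, sorted by name."""
    return sorted(set(shop_brands) & set(order_brands))


def top_k_share(store_counts, k):
    """Share of all stores that belongs to the k largest brands."""
    ordered = sorted(store_counts, reverse=True)
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0


# Made-up values for illustration only.
shop_brands = ["Brand A", "Brand B", "Brand C", "Brand D"]
order_brands = ["Brand B", "Brand D", "Brand X"]
store_counts = [1000, 500, 300, 100, 50, 50]

shared = brand_overlap(shop_brands, order_brands)
share_top2 = top_k_share(store_counts, 2)
print(shared, share_top2)
```

Reporting the overlap list next to each linkage plot makes it explicit how few brands support the cross-table comparison.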

Visualizations

We generated the following plots for the EDA:

Related work

What others have already done with the data?

The first drink shop dataset has been previously explored in data visualization projects on the Heywhale platform (https://www.heywhale.com/mw/dataset/6595190fb96e5fc9eba7fd27/project). Existing work is relatively limited, mainly presenting basic statistics such as the top 100 drink shops by store count (via bar charts) and the distribution of average price tiers (via pie charts). As a result, the analytical perspective remains narrow, focusing primarily on basic brand-level attributes. These studies do not extend to multi-dimensional analysis of the market, such as examining consumer behavior, brand performance, or the relationships between brand characteristics and consumption data.

Moreover, prior projects rely on single-dimensional visualizations and do not integrate multiple datasets. In particular, they do not combine the consumption dataset with the drink shop brand information dataset (e.g., average price, number of stores, product types). They also overlook deeper insights related to consumer demographics (e.g., gender, age group), consumption patterns (e.g., scenarios, motivations), and the relationship between brand operation metrics (e.g., store count, pricing) and actual consumption performance (e.g., order amount, sales volume).

Why is your approach original?

Our project is original in four respects: multi-dimensional analysis, cross-dataset integration, interactive visualization, and business-oriented insight mining. Together, these go beyond the existing single-dimensional work on the same dataset.

We carry out cross-dataset analysis by combining the drink shop brand dataset and the consumer order dataset, linked by brand and city. This integration allows us to explore relationships between brand attributes such as average price, store count, and product type, and real consumer behavior including spending, order amount, and preferences. Such combined analysis is absent in previous studies.
We design diverse and innovative visualizations instead of basic static charts. Our visualization system includes a word cloud where font size reflects store count, geographic heatmaps, box plots, dual-axis bar charts, scatter plots with trend lines, and stacked bar charts. The word cloud for store quantity is particularly original and intuitive compared with traditional bar charts.
Interactive design serves as another key contribution. Users can filter by city, brand, year, and price range, highlight specific groups, zoom into regions, and switch between metrics. This interactivity supports flexible, user-driven data exploration and significantly improves analytical depth compared with static visualizations in existing work.
We focus on actionable business insights rather than only descriptive statistics. Through correlation analysis, we explore meaningful questions such as how brand pricing relates to consumer spending, and who the core consumers are. These insights deliver practical value for understanding the Chinese drink shop market and exceed the scope of prior research.

What source of inspiration do you take?

Our choice of topic is inspired by reports on the global bubble tea market, which highlight its rapid growth in recent years. As the birthplace of milk tea, China has developed a highly diverse beverage culture along with a vast number of drink shops, making this phenomenon particularly distinctive and worth exploring.

Our visualization design and analytical framework are further informed by professional data visualization practices in the retail and FMCG (Fast Moving Consumer Goods) industries, as presented on mainstream visualization platforms and in business reports. The main sources of inspiration are as follows:

Retail brand analysis on Tableau Public: Visualization cases on Tableau Public provide valuable references for multi-indicator brand comparison. These projects often employ dual-axis bar charts to compare metrics such as sales volume and revenue across brands, as well as scatter plots to analyze relationships between operational indicators and market performance. Drawing on these approaches, we design visualizations such as brand ranking charts and store count versus revenue scatter plots.
Word cloud visualization in marketing and social media analysis: Word clouds are widely used in brand marketing reports and social media analytics to represent attention or popularity through variations in font size. We adapt this technique to visualize beverage brand store counts, where font size reflects the number of stores. Compared with traditional bar charts, this approach makes leading brands more visually prominent and improves readability.
Geographic heatmaps in urban consumption studies: Urban consumption reports published by institutions such as China’s National Bureau of Statistics and consulting firms (e.g., McKinsey and Deloitte) frequently use geographic heatmaps to illustrate regional consumption patterns. This inspires our design of city-level consumption heatmaps, enabling a clear and professional representation of spatial consumption characteristics.

Statement on Prior Dataset Exploration

This dataset has not been previously used by us in other courses or projects.

Milestone 2 (17th April, 5pm)

10% of the final grade

1. Project Goal

Our project aims to provide a comprehensive analysis of the beverage market landscape in China, focusing on brand competition, consumer behavior, and regional differences. By integrating a Shops Dataset (1,219 brands) with a granular Consumer Dataset (31,800 orders), we seek to bridge the gap between high-level market trends and personalized user experiences. The visualization will help users understand how pricing, scale, and demographics interact in one of the world's most dynamic consumer markets.

2. Core Visualization (Minimal Viable Product)

To fulfill the core requirements of the project, we will implement the following "Brand Market Landscape" components as our MVP:

Interactive Bubble/Logo Cloud: A central visualization showing the Top 20 drink shops. Bubble radius will represent the Store Count, allowing users to identify market leaders instantly.
Brand Matrix (Scatter Plot): A "Price vs. Store Count" plot to categorize brands into premium, mass-market, and niche segments.
Faceted Price Histograms: Distribution charts segmented by brand type (Tea, Coffee, Milk drinks) to show pricing strategies across different sectors.
Functional Website Skeleton: A structured multi-page layout with a navigation bar to switch between market overview and behavior analysis.
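
One possible way to assign the Brand Matrix segments is a simple median split on both axes. The rule below is an assumption made for illustration (the segment names come from the list above, but the thresholds and the mapping are not specified there), and the sample brands are made up.

```python
from statistics import median


def segment_brands(brands):
    """Label each brand using median splits on price and store count.

    Assumed rule: above-median price -> "premium"; otherwise
    above-median store count -> "mass-market"; otherwise "niche".
    """
    price_cut = median(b["price"] for b in brands)
    store_cut = median(b["stores"] for b in brands)

    labels = {}
    for b in brands:
        if b["price"] > price_cut:
            labels[b["name"]] = "premium"
        elif b["stores"] > store_cut:
            labels[b["name"]] = "mass-market"
        else:
            labels[b["name"]] = "niche"
    return labels


# Made-up brands for illustration only.
brands = [
    {"name": "Brand A", "price": 12.0, "stores": 5000},
    {"name": "Brand B", "price": 35.0, "stores": 40},
    {"name": "Brand C", "price": 15.0, "stores": 20},
    {"name": "Brand D", "price": 30.0, "stores": 800},
]

segments = segment_brands(brands)
print(segments)
```

Because store counts are heavily skewed, a median cut is more robust than a mean cut here; plotting the store axis on a log scale would be a natural companion choice.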

3. Extra Ideas & Enhanced Features

These modular features will be added to enhance interactivity and storytelling depth:

Consumer Behavior Dashboard: Four synchronized charts (Pie/Donut) visualizing:
- Scenario & Motive: delivery vs. dine-in; social vs. novelty-seeking.
- Demographics: distributions across gender and age groups.
Regional Tier Analysis (Pyramid): A hierarchical visualization representing 1st, 2nd, and 3rd-tier cities to show brand penetration.
3rd-Tier City Drill-down Map: A detailed geospatial view focusing on emerging markets (e.g., Xiangyang, Shantou) from 2019 to 2023.
Personalized Recommendation Tool: A decision-tree style interactive widget that suggests brands based on user-inputted demographics.
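
As a rough idea of how the recommendation widget could work, the sketch below replaces the decision tree with a simpler frequency lookup: it suggests the most-ordered brand for a given gender and age group and falls back to the overall most-ordered brand for unseen combinations. The rows, the age-group labels, and `build_recommender` are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up order rows for illustration only.
orders = [
    {"Gender": "F", "Age Group": "18-24", "Brand": "Brand A"},
    {"Gender": "F", "Age Group": "18-24", "Brand": "Brand A"},
    {"Gender": "F", "Age Group": "18-24", "Brand": "Brand B"},
    {"Gender": "F", "Age Group": "25-34", "Brand": "Brand A"},
    {"Gender": "M", "Age Group": "25-34", "Brand": "Brand C"},
    {"Gender": "M", "Age Group": "25-34", "Brand": "Brand C"},
    {"Gender": "M", "Age Group": "18-24", "Brand": "Brand B"},
]


def build_recommender(orders):
    """Return a function that maps (gender, age group) to a suggested brand."""
    by_group = defaultdict(Counter)
    overall = Counter()
    for order in orders:
        by_group[(order["Gender"], order["Age Group"])][order["Brand"]] += 1
        overall[order["Brand"]] += 1

    def recommend(gender, age_group):
        # Fall back to overall popularity when the group has no orders.
        counts = by_group.get((gender, age_group)) or overall
        return counts.most_common(1)[0][0]

    return recommend


recommend = build_recommender(orders)
print(recommend("F", "18-24"))
print(recommend("M", "35-44"))
```

On the website, the same lookup table could be precomputed and shipped as JSON so that the widget needs no server-side logic.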

4. Tools and Lectures

Tools

D3.js / TopoJSON: For the interactive bubble charts and geographic map visualizations.
Svelte/React: For managing application state and synchronized filtering across components.
Python (Pandas): For data cleaning and joining the 1,219 brands with 31,800 order records.

Lectures Needed

Past Lectures: Perception & Colors (for branding and map design), D3.js basics, and Interaction techniques.
Future Lectures: Advanced Interaction (for linked views) and Storytelling techniques to guide the user through the data layers.

5. Implementation Breakdown

Phase 1 (Skeleton): Build the basic web structure and navigation based on the design sketches.
Phase 2 (Market MVP): Implement the bubble cloud and scatter plots using the Shops Dataset.
Phase 3 (Consumer Behavior): Integrate the Order Dataset to build interactive filters and demographic charts.
Phase 4 (Advanced Features): Develop the City Tier pyramid and the recommendation engine; polish the UI/UX.

Prototype URL: https://com-480-data-visualization.github.io/projdataviz/

Milestone 3 (29th May, 5pm)

80% of the final grade

Late policy

< 24h: 80% of the grade for the milestone
< 48h: 70% of the grade for the milestone
