Taobao is the largest e-commerce platform in China. People buy all sorts of things there, including musical instruments. However, we noticed that shopping behavior for Western instruments and Chinese instruments is very different, and that price, size, and location are all influencing factors. Beyond that, we further analyzed the purposes and qualities customers look for when searching on Taobao.
To define the topic, we first analyzed the Chinese traditional instruments dataset provided in class and did some research on music visualization. Then we brainstormed all the data we could find and the possible relationships we could build from it.
Through the process, three main topics have emerged:
After group discussion, we decided to focus on instrument usage in film scores and their composers, because there is a lot of literature about music in martial arts films, and because film, as a form of pop art, relates more closely to contemporary life than the other two topics. Our initial research question was: What is the most used Chinese traditional instrument in films, and why?
As most of the references we could find on this topic are qualitative and a ready-made dataset is hard to find, we decided to build our own dataset (Pic2) and use an online graph tool to visualize the data (Pic3). To analyze the data further, we imported the dataset into NEO4J (Pic4), a graph database with a visual interface. In the database, we can run queries to answer our research questions.
In short, there are two main findings:
Pic2. Dataset Excel (See Original)
Pic3. Graph Visualization (See Original)
Pic4. NEO4J Visualization
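As a rough illustration of the kind of question such a query answers, here is a minimal in-memory sketch in Python. It is not the query we ran in NEO4J. The row shape (film, composer, instrument), the film and composer names, and the function names are all assumptions made for this example.

```python
from collections import Counter

# Placeholder rows in the assumed shape (film, composer, instrument).
# The real spreadsheet columns are not reproduced here.
ROWS = [
    ("Film A", "Composer X", "dizi"),
    ("Film A", "Composer X", "pipa"),
    ("Film B", "Composer Y", "dizi"),
    ("Film C", "Composer X", "guzheng"),
]


def instrument_usage(rows):
    """Count how many distinct films use each instrument."""
    films_by_instrument = {}
    for film, _composer, instrument in rows:
        films_by_instrument.setdefault(instrument, set()).add(film)
    return Counter(
        {inst: len(films) for inst, films in films_by_instrument.items()}
    )


def most_used(rows, n=1):
    """Return the n instruments that appear in the most films."""
    return instrument_usage(rows).most_common(n)


if __name__ == "__main__":
    print(most_used(ROWS))
```

Counting distinct films rather than rows keeps an instrument from being over-counted when one film lists it several times.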
To further develop the research question, we interviewed some members of the potential audience. Through the interviews, we found one question that interested both the audience and ourselves: Why does a composer choose a certain instrument for a certain film scene? However, when we tried to match instruments to the corresponding film scene types, the data was limited and the analysis process was complicated. It was hard for us to find all the data needed and to analyze these films manually.
What's more, we found that the research question would be more relevant if it applied to ordinary people rather than only to professional users. So we turned to sales data on Taobao, which is a much larger dataset and can reveal the public's shopping decisions. Our final research question is therefore: Why do people choose a certain instrument on Taobao?
Before collecting data to answer our research question, we first generated some hypotheses about factors influencing an instrument purchase decision.
1) Price
The first obvious one is ‘price’. Since an instrument can be as cheap as a cup of coffee or as expensive as a house, the most practical consideration in a purchasing decision is whether you can afford it. We therefore assumed that cheaper instruments, being more affordable, would have higher sale counts.
2) Popularity
Later on, we came across a report (source) showing how popular the piano is in China, with 5.82 pianos owned per 100 households, and we started to question our first hypothesis. Since the piano is very expensive yet well known, we wondered whether popularity is another factor that influences people's buying decisions.
To understand instrument purchasing behavior, we collected data mostly from Taobao, the largest e-commerce platform in China. Through a data API service, we can search for instruments by keyword. It works the same way as typing keywords into Taobao, and automatically returns the first page of search results (normally containing 10 products) in JSON format. The extracted data contains the product name, ID number, price, and monthly sales amount. To get product information from later pages, we wrote a script to collect data on the top 50 products under each keyword. As a result, there are 2000 products in our dataset.
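The paging logic can be sketched roughly as follows. This is a sketch under stated assumptions, not our original script: `fetch_page` is a hypothetical callable standing in for the data API, and the field names in the fake response are placeholders. Only the page size of 10 and the limit of 50 products per keyword come from the description above.

```python
import math

PAGE_SIZE = 10  # products per results page, as described above
TOP_N = 50      # products kept per keyword


def collect_top_products(keyword, fetch_page, top_n=TOP_N, page_size=PAGE_SIZE):
    """Walk result pages for one keyword until top_n products are gathered.

    fetch_page(keyword, page) is a hypothetical stand-in for the data API;
    it is assumed to return a list of dicts for that page.
    """
    products = []
    pages_needed = math.ceil(top_n / page_size)
    for page in range(1, pages_needed + 1):
        items = fetch_page(keyword, page)
        if not items:  # the keyword has fewer results than expected
            break
        products.extend(items)
    return products[:top_n]


def build_dataset(keywords, fetch_page):
    """Collect the top products for every keyword into one flat list."""
    dataset = []
    for keyword in keywords:
        for item in collect_top_products(keyword, fetch_page):
            dataset.append({"keyword": keyword, **item})
    return dataset


def fake_fetch_page(keyword, page):
    """Fake API used only to exercise the loop; field names are placeholders."""
    start = (page - 1) * PAGE_SIZE
    return [
        {"id": f"{keyword}-{start + i}", "price": 100.0 + i, "monthly_sales": i}
        for i in range(PAGE_SIZE)
    ]


if __name__ == "__main__":
    rows = build_dataset(["dizi", "piano"], fake_fetch_page)
    print(len(rows))
```

Passing the fetch function in as an argument keeps the paging loop testable without network access or API credits.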
With the product ID number, we can use another API to look up more detailed information about a product, such as storehouse location, market price, discount, and so on. Due to the rate limit of the API service, looking up every ID number would cost both time and money, so we only looked up detailed information for one fifth of the products as a sample. During this process, we wrapped the code into functions, so it may be useful for later projects if anyone wants to use the same API.
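A minimal sketch of the sampling and throttled lookup might look like this. Note the assumptions: random sampling is one reasonable way to pick the one-fifth sample but is not necessarily how ours was drawn, `fetch_detail` is a hypothetical stand-in for the detail API, and the one-second delay is a placeholder rather than the service's actual limit.

```python
import random
import time


def sample_one_fifth(product_ids, seed=0):
    """Pick a reproducible random fifth of the product IDs (assumed method)."""
    ids = list(product_ids)
    if not ids:
        return []
    rng = random.Random(seed)
    k = max(1, len(ids) // 5)
    return rng.sample(ids, k)


def fetch_details(product_ids, fetch_detail, delay_seconds=1.0, sleep=time.sleep):
    """Look up each ID in turn, pausing between calls to respect a rate limit.

    fetch_detail(product_id) is a hypothetical stand-in for the detail API.
    """
    details = {}
    for i, product_id in enumerate(product_ids):
        if i > 0:
            sleep(delay_seconds)  # no pause is needed before the first call
        details[product_id] = fetch_detail(product_id)
    return details


if __name__ == "__main__":
    sample = sample_one_fifth(range(2000))
    print(len(sample))
```

Fixing the seed makes the sample repeatable, so the detailed lookups do not have to be paid for twice if the analysis is rerun.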
1) Unbalanced market share
If we compare the sales data of Chinese instruments (CI) with that of Western instruments (WI), we find the market unbalanced. WI sales are more than twice CI sales. Surprisingly, piano sales alone exceed total CI sales. Plucked instruments such as guqin, guzheng, and pipa account for a large percentage of CI sales, while keyboard instruments (piano and accordion) and bowed instruments (cello and violin) are the two biggest segments of WI.
Pic5. Sale Count
2) The cheaper, the higher the sale count?
Then we put price aside for a while and looked at the monthly sale count. Here we can see that the most profitable instruments may not be the best sellers. In CI, wind instruments such as dizi account for more than half of the sale count. In WI, while keyboard instruments still account for a relatively large portion, percussion instruments (xylophone and marimba) also sell well. Since dizi and xylophone are relatively cheap instruments, we wondered whether cheaper instruments tend to sell better.
Pic6. Sales
3) Both the cheapest and the most expensive ones sell well
We made this graph by combining a bar chart with a box plot. The bar height represents the monthly sale count, and the box plot shows the price range. For the color coding, orange represents Chinese instruments while blue stands for Western instruments. The instruments in the chart are sorted by median price, and we can see that the sale count bars cluster at the two ends. Both the cheapest and the most expensive instruments sell well, while the ones in the middle attract little interest. This polarization is interesting, and we wanted to know why.
Pic7. Sale Count & Price
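The data preparation behind a chart like this can be sketched as below: group products by instrument, compute the price range and median, total the monthly sales, and sort by median price. The rows are illustrative placeholders, not values from our dataset, and the function and field names are assumptions.

```python
from statistics import median

# Illustrative rows only: (instrument, origin, price, monthly sale count).
PRODUCTS = [
    ("dizi", "CI", 50, 900),
    ("dizi", "CI", 80, 700),
    ("guzheng", "CI", 2000, 40),
    ("guzheng", "CI", 3000, 20),
    ("piano", "WI", 20000, 300),
    ("piano", "WI", 30000, 250),
]


def summarize(products):
    """Return one row per instrument, sorted by median price (ascending)."""
    grouped = {}
    for instrument, origin, price, sales in products:
        entry = grouped.setdefault(
            instrument, {"origin": origin, "prices": [], "sales": 0}
        )
        entry["prices"].append(price)
        entry["sales"] += sales
    rows = [
        {
            "instrument": name,
            "origin": entry["origin"],  # drives the orange/blue color coding
            "min_price": min(entry["prices"]),
            "median_price": median(entry["prices"]),
            "max_price": max(entry["prices"]),
            "monthly_sales": entry["sales"],  # bar height
        }
        for name, entry in grouped.items()
    ]
    return sorted(rows, key=lambda row: row["median_price"])


if __name__ == "__main__":
    for row in summarize(PRODUCTS):
        print(row)
```

Sorting by the median rather than the mean keeps a few very expensive listings from reordering the instruments.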
4) Will popularity affect purchasing behavior?
Is it because of how popular and well known an instrument is? From the Baidu index data, we found something interesting. The top two best-selling Chinese and Western instruments (CI: dizi, pipa; WI: piano, trumpet) also show a polarizing trend. Combining the index data with the sales data, we found that the expensive best sellers, such as piano and pipa, also have a high search index. Meanwhile, the best sellers that are less popular, such as dizi and trumpet, are much cheaper.
So perhaps people buy the expensive ones because they are famous, and choose the cheap ones because they are affordable. There could be different motivations behind the purchasing behavior.
Pic8. Baidu Search Index
5) Motivations behind product names
To understand people's purchasing motivations, we did a text analysis of the product names. Because sellers on Taobao want their products shown to as many people as possible, they try hard to put every keyword customers might search for into the product name, which makes it really long. We used a keyword extraction API to find the words that occur most often in the titles. Through the analysis, some patterns emerged. Based on semantic meaning, we clustered the keywords into 5 groups (Pic9): material, purpose, geolocation, style, and user.
Pic9. Text Analysis
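The counting and grouping step can be sketched as follows, assuming the keywords have already been extracted for each title. The lookup table is hypothetical: the five group names come from the analysis above, but the keyword-to-group assignments and several of the keywords themselves are placeholders, since our clustering was done by hand.

```python
from collections import Counter

# Hypothetical lookup from keyword to semantic group (placeholder entries).
KEYWORD_GROUPS = {
    "rosewood": "material",
    "graded examination": "purpose",
    "performance": "purpose",
    "Suzhou": "geolocation",
    "hand-made": "style",
    "imported": "style",
    "beginner": "user",
    "children": "user",
}


def count_keywords(keywords_per_title):
    """Count in how many titles each keyword appears."""
    counts = Counter()
    for keywords in keywords_per_title:
        counts.update(set(keywords))  # a keyword counts once per title
    return counts


def group_counts(counts, groups=KEYWORD_GROUPS):
    """Sum keyword counts per semantic group; unknown keywords go to 'other'."""
    totals = Counter()
    for keyword, n in counts.items():
        totals[groups.get(keyword, "other")] += n
    return totals


if __name__ == "__main__":
    titles = [
        ["hand-made", "imported", "beginner"],
        ["hand-made", "performance"],
    ]
    print(group_counts(count_keywords(titles)))
```

Counting each keyword once per title prevents sellers who repeat a word several times in one title from inflating its frequency.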
Looking into purpose, we find that most categories have a balanced emphasis on performing and graded examinations, except for percussion instruments and wind instruments. These two categories have no graded examinations, so in most cases people buy them for fun or as a tool for introducing kids to music. Besides, they are cheaper than other instruments, so it is easier for people to make the buying decision.
Pic10. Instrument Purchasing Purpose
As for style preferences, we find that most WI have “hand-made” and “imported” tags to emphasize their high quality. Many CI are labeled “ancient” and “ethnic”, which suggests people associate them more with cultural meanings. There are also tags like “mini” and “small groups”, which reflect the specific needs of small groups. Interestingly, piano has a unique tag, “intelligent”. That may be because the piano is the most modern of these instruments, and many pianos now use information technology to make learning easier.
Pic11. Instrument Style Preference
Finally, we put the median price, monthly sale count, and Baidu index together to look for overall patterns. Each line represents one instrument and the color encodes its category. There are mainly two clusters. One has a low price and a low Baidu index but sells well; it consists mostly of percussion instruments and wind instruments. The other has a medium price but a very small sale count; it consists mostly of plucked instruments and bowed instruments.
From these two clusters, we find that price is a very influential factor behind purchasing behavior. But there seems to be no obvious correlation between popularity, as measured by the Baidu index, and monthly sale count. Although some instruments with a higher Baidu index sell better, there are also many counterexamples. For example, violin has a similar price and Baidu index to Chinese plucked instruments such as guzheng, but it sells far better.
Pic12. Parallel Coordinate Analysis
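One way to put a number on "no obvious correlation" is a rank correlation between the Baidu index and the monthly sale count. We judged this visually from the chart, so the following is only a sketch of how it could be checked. The function names are ours, and the sample values in the usage line are made up.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    # Made-up values: a search index and a sale count for five instruments.
    print(spearman([10, 40, 30, 20, 50], [300, 100, 500, 200, 400]))
```

A value near 0 would support the reading that popularity alone does not explain sales, while values near 1 or -1 would contradict it. Ranks are used instead of raw values because sale counts are heavily skewed.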
1) What we learned from the process
Exploring this data story was challenging but worthwhile. To conclude, there are two main things we learned from trial and error.
• First is how to evaluate a topic.
Changing our topic from films to sales halfway through cost us a lot of time. However, it also made us realize the importance of evaluating a topic before going further. We summarized three key questions to ask when deciding on a topic: 1) Can you get the data you need? 2) Can you analyze the data into the form you want? 3) Can the results arouse the audience's interest?
• Second is to view data from different perspectives.
At first, we generated many hypotheses about factors influencing people's purchasing decisions. However, in the initial data analysis, no single factor showed a direct relationship with sale counts. After being stuck for a while, we decided to review the data manually to see if we could find any insights. After that, we combined different factors to better understand purchasing decisions.
2) What we can do next
The insights we found through the data analysis are interesting but also limited. To have more impact, we can broaden our research question to: “Why do people choose certain instruments?” If anyone wants to continue the research, there are two promising directions:
• Combine offline sales data
• Combine the music consumption behavior on the content platform (e.g., Bilibili, NetEase Music, etc.)