First Monday

Fashion informatics and the network of fashion knockoffs by Lauren Copeland, Giovanni Luca Ciampaglia, and Li Zhao

Knowledge discovery techniques have a long history of application to fields of practice such as marketing and business intelligence. Fashion and other manufacturing sectors have enjoyed comparatively little attention from computer scientists. With the increasing availability of multimedia data from the Web and social media, our understanding of the fashion apparel industry could be significantly enhanced through the use of knowledge discovery methods and of large-scale datasets obtained from places such as Twitter and Instagram. Here, we are interested in one of the issues at the center of the contemporary structure and dynamics of the fashion industry: the practice of knockoffs. We combine Web scraping and network science techniques to give a preliminary characterization of how brands knock designs off each other. Such a study could be one of the first examples of an emergent field, which we refer to and define as “fashion informatics.”


Background and literature
Results and findings
Conclusions and implications
Future research




With the onslaught of technological advances within the fashion industry, a great deal of data about fashion apparel and consumer behavior is being generated online. To make sense of this massive amount of information, there is a pressing need to introduce into the apparel and textile industry a new computational toolbox including, but not limited to, big data analytics, data mining, algorithmic decision-making, machine learning, and social network analysis. These techniques and methods are crucial to the viable future of the apparel and textile retail industry. There has been very little research regarding big data and machine learning algorithms as catalysts to retail management. While professionals in the industry are beginning to grapple with the concept of big data and how it can be useful in areas such as product management and consumer analytics, no research has looked at how these strategies can be utilized to better understand sustainable and ethical issues within the apparel industry.

The necessity for the application of computational methods in the fashion industry is something we term “fashion informatics” and introduce to the apparel and textile research industry, as well as academia, as a new sector of research importance.

There is a great need for more fashion informatics within the industry, as it is just beginning to expand into real-world applications such as social network analysis, consumer analytics, and product supply chain management. Fashion informatics also needs to be better understood by academics and students going into these industries. This will require, at the very least, a better appreciation of the methods and techniques from data science, network science, computational social science, and big data analytics within the necessary apparel industry entry level skills, and an introduction of computational thinking ideas within academic curricula related to fashion apparel and design.



Background and literature

Current research

Currently, few researchers have used data-intensive computational methods within the field of fashion (Chen and Luo, 2017; Park, et al., 2016; Lin, Zhou, et al., 2015; Lin, et al., 2015; Lin, et al., 2014). To the best of our knowledge, no one has used machine learning to explore ethical issues in fashion apparel production, specifically fashion knockoffs. According to Swayne (2014), big data are the future of the fashion industry. Researchers at Penn State University (Swayne, 2014) state that “Eventually, data scientists could analyze real-time data from social media sites, such as Twitter, Pinterest and Instagram to predict fashion styles” [1]. Much attention has been paid to identifying fashion trends, or the actual product assortments being sold, by means of big data analytics. Chen and Luo (2017) most recently looked at the mining of online data to determine best-selling features of clothing items. The researchers utilized e-commerce Web sites to gather huge amounts of data in order to “1) explore and organize a large-scale clothing dataset from an online shopping website, 2) prune and extract images of best-selling products in clothing item data and user transaction history, and 3) utilize a machine learning based approach to discovering fine-grained clothing attributes as the representative and discriminative characteristics of popular clothing style elements.” [2] Through this method, Chen and Luo (2017) found that machine learning was effective in producing accurate results. Additionally, Park, et al. (2016) used data mining on Instagram to predict the most popular fashion runway models during the New York Fashion Week with remarkable accuracy.

Heng Xu (2015, 2014) found that quantitative research is lacking within the fashion industry, even though massive amounts of large-scale data are available, whether through retailer databases, online social media, or general Web libraries. Lin, et al. (2014) analyzed the text of fashion reviews and traced patterns of influence among fashion designers. They conclude that such a fashion influence network resembles a small-world network [3]. Additionally, Lin, Zhou, et al. (2015) applied text analysis to group different designers by their favorite fabrics. More recently, Wong, et al. (2016) applied computer vision techniques to collected images to analyze color choices made by designers. They found robust patterns of influence among designers. Though these researchers are taking steps to better understand the use of big data and computer techniques in the fashion industry, these methods still lack the speed, manpower, and full implementation needed to match the current needs of the fashion industry.

Current industry needs. Industry professionals are ahead of academic researchers when it comes to exploring the topic of big data analytics in retail business. The Trunk Club uses algorithms to build trunks in real time from available products in their warehouses; Macy’s is tracking consumer behavior and purchase analytics through their smart phone application; Asos and Target are monitoring their competitors online to better understand how to price a product in relation to their competitors. These large amounts of data are being provided to companies either through their own means of data collection or through private companies like WGSN and Editd. The goal of these companies is to keep track of massive amounts of data and to leverage the insights that can be drawn from them as a business offering. However, not all retail professionals are up to speed on these types of resources, and many still do not know how to use them to their advantage (Gupta, 2015; Murray, 2016; Noyes, 2014; Sharma, 2017). Additionally, there is little research applying big data or other computational techniques to ethical issues within the fashion industry. This brings us to the topic of our research: utilizing computational methods to initiate an investigation into fashion knockoffs and intellectual property rights surrounding fashion designers.

Fashion knockoffs. Fashion piracy acts, or knockoffs, are broadly defined as the intentional replication of fashion apparel designs between competing brands. They both fuel the fashion apparel industry and rob it of its creative uniqueness, and they lead to social and environmental issues within society (Dahlén, 2012). The trend known as ‘fast fashion’ has emerged worldwide in the last two decades and has puzzled scholars and industry experts as a paradox with both positive and negative consequences. According to recent estimates, the apparel and textile industry is worth over US$70 billion in the U.S. alone (Reichard, 2013). Fast fashion or street fashion brands have become a large part of these industry profits. Brands with a very fast product lifecycle are especially known to knock off designs from high-end luxury brands and become profitable for retailers in the process, creating an ethical conversation about ownership and what the necessary protection of fashion design would look like. The distinction between low- and high-end fashion is present in all aspects of the fashion industry and it is a common feature of cultural production markets.


Figure 1: Example of a fashion knockoff collected from the Web in our dataset (Left: Quiz Dress, Right: Mary Katrantzou Spring 2012 Runway; Mimi, 2012).


Contrary to mere counterfeiting, knockoffs do not necessarily resemble the original item in every detail, as seen in Figure 1. The concept of knockoffs is closely but confusingly tied to intellectual property rights that have been approached in varied ways, both internationally and nationally, with no serious definition or outcome of protection. As apparel production continues to become quicker, and wreaks more havoc on underdeveloped countries, their citizens, and the environment, there is a question of how the issue of intellectual property (IP) rights within the industry can become defined and protected. The answer to this question is also directly tied to the responsibility of retailers within the fashion industry, as they strive to become more sustainable. According to the International Anti-Counterfeiting Coalition (IACC), in 2015 the global counterfeiting trade totaled 1.77 billion U.S. dollars. The connection between knockoffs and counterfeiting is one that has had much discussion surrounding it. The idea of IP in fashion is a point of contention within both government and industry, and one that has not been fully fleshed out to create the proper protection of fashion designers and brands regarding knockoffs.

Responsible apparel

Understanding apparel and textile consumer behavior, and especially how the habits of contemporary consumers are changing, is thus an important research question in apparel and textile marketing research. This is especially true because of the significant load that mass production of apparel and textile goods has on the environment (Black, 2008). According to Challa (2012), the apparel and textile industry is considered highly polluting compared to other manufacturing industries. For example, it takes 10 times more energy to produce one ton of textiles than one ton of glass (Draper, et al., 2007). A majority of textile products have negative impact on the environment one way or another: whether it be through production, consumer use, or garment waste. Global textile consumption is equal to 30 million tons per year (Hiller Connell, 2015). This phenomenon is exacerbated by the rise of the aforementioned fast fashion brands and retailers, which rely heavily on knockoff practices to update their catalog offerings in a short amount of time.

Increasing the consumption of more environment-friendly apparel and textile products would thus be a way to benefit the environment worldwide. Indeed, some concerned manufacturers, including fast fashion ones, are producing apparel using more environmentally sustainable materials and processes. However, while production of sustainable fashion seems to become increasingly mainstream, research indicates that there are barriers to the consumption of environmentally sustainable apparel, including, but not limited to, price, awareness toward environmental issues, and purchasing attitudes. To reduce overall environmental impacts of the apparel and textile industry, and to encourage more fashion firms to adopt environmentally sustainable strategies, the purchase intention for environmentally sustainable apparel needs to increase; and therefore, it is necessary to explore mechanisms for overcoming consumption barriers. As long as there is low demand for sustainable apparel, the two chief conditions for a larger diffusion of sustainable fashion will not be satisfied. These are: (i) the strong offering of varied sustainable items, that consumers in the mainstream are known to desire, will be lacking; and (ii), there will be no financial break in the price of those items (Hiller Connell, 2010).

Intellectual property and counterfeiting. Intellectual property rights in the fashion industry are minimal, and surprisingly do not fall in line with other creative industries such as art, design, and literature, to cite a few. Equally notable, fashion industry professionals do not actively fight for IP rights as a group. Instead, they fight in small entities for legislation, even though other creative industry professionals understand that group efforts generally achieve more. This is an area where fashion industry leaders can make improvements in their business strategies to protect the rights of designers (Xiao, 2010).

The United States seems to be behind in the area of IP rights and laws (Xiao, 2010). European government and industry influentials are leading the way, and set a good example for the United States (Beltrametti, 2010). Instead of pushing forward with legislation, government leaders and industry professionals in the United States tend to lean on international IP rights and laws when making judgements. This is despite the fact that it is far easier to pass legislation in the United States than it is to pass international law across multiple countries (Xiao, 2010). As a result, the United States government has put forth little effort in protecting designer rights, due to international laws not providing the same protections within the same timeframe for designers and artists as for other areas (Raustiala and Sprigman, 2006). IP rights legislation must also include the country of origin in its jurisdiction, counting where an apparel item was designed, created, and/or produced (Xiao, 2010). These actions often occur in different countries, and not all countries have IP rights and regulations in place. This can make prosecution very difficult, leading to many fashion industry professionals in the United States limiting collaboration within legal arenas, in order to push legislation regarding intellectual property. Thus, these trends contribute to a vicious cycle in which the fashion industry struggles to understand and define intellectual property, hence making attempts to prosecute its infringements less successful (Raustiala and Sprigman, 2006).

IP rights for designers are very important, as those designers and brand owners who want to protect their designs cannot do so without legal remedies in place. However, legislation must be the result of the entire industry and its leaders working to develop appropriate legislative solutions with appropriate elected representatives and governmental bodies. As a result, some companies, representatives of brands, and designers find themselves falling into the world of knockoffs, with some even knocking off their own designs. Rather than fighting the problem, some companies have chosen to embrace it in a way that enriches their current operations. At this point it seems unlikely that the fashion industry will collectively argue for change on IP rights, unless consumers begin demanding it (Raustiala and Sprigman, 2006).

Currently, fashion designers may only be protected by trademark law, which does not protect creative designs copied and distributed under a different label, a kind of design piracy. This is seen daily through the growth of fast fashion and knockoffs. According to Beltrametti (2010), U.S. law treats design piracy and counterfeiting quite differently, despite overall similarities. Beltrametti argues that the U.S. should not consider these as separate issues, but should provide the same amount of protection. Beltrametti (2010) argued for increased protections on the basis of a variety of bills introduced in the U.S. Congress under the umbrella of the Design Piracy Prohibition Act. These bills were collectively intended to extend copyright protections to fashion designers, but Congress has not passed any of these proposals (Ellis, 2010).

Dahlén (2012) called for a need for a tangible differentiation between “fashion” and “clothing,” based on the idea that intellectual property rights are often catered towards symbolic and intangible production. Dahlén argued that “clothing” is the result of material production and not necessarily held to the same protections as “fashion.” Dahlén (2012) discussed that a state can develop IP rights and regulations based on the ways national and international IP laws work together, even when countries taking part in the process of design, creation, and production are not the same. Fauchart and von Hippel (2008) made a case for “norms-based” IP, complementary to law-based IP. These were defined as operating “entirely on the basis of implicit social norms that are held in common by members of a given community” [4]. In this case members of the fashion community could be involved in understanding the impact of fashion knockoffs.

Computational social science and big data

As social scientists in apparel, textiles, and fashion often find, quantitative analysis of subjective topics can be hard to define. Cioffi-Revilla (2017) defines the combination of social sciences and computer science as the “integrated, interdisciplinary pursuit of social inquiry with emphasis on information processing and through the medium of advanced computation” [5]. These combinations include data extraction, social network analysis, social geographic systems, modeling, and simulation (Cioffi-Revilla, 2017). Grewal, et al. (2017) identify five key areas in the future of retailing including “(1) technology and tools to facilitate decision making, (2) visual display and merchandise offer decisions, (3) consumption and engagement, (4) big data collection and usage, and (5) analytics and profitability” [6]. The topic of big data has been recognized to be potentially critical to the fashion industry (Nair, et al., 2017). Both industry professionals and researchers could benefit from the use of big data (Madsen and Stenheim, 2016) through their ability to provide greater insight than conventional datasets (Nathan, et al., 2013). As big data strategies and availability become more prevalent, there is little doubt that fashion companies and industry professionals will hire both internal staff and outside firms to collect and analyze big data specific to their needs, and to help them better understand customers.

As already mentioned, companies such as WGSN and Editd are providers of such data and companies like Target, Asos, Macy’s, and Trunk Club are already using these types of data companies to better their product performance and educate themselves and their employees on the current status of the industry, product analysis, and consumer behavior (Gupta, 2015; Murray, 2016; Noyes, 2014; Sharma, 2017). Machine learning techniques are particularly helpful when dealing with large-scale datasets. Chen and Luo (2017) conducted a study regarding a machine learning algorithm about clothing attributes. This type of research helped to gain an understanding on the possibilities of big data. Ding, et al. (2013) identified four characteristics to big data. They recognize that mixed characteristics occur when social aspects of fashion are accounted for, limiting an algorithmic understanding of fashion. Though still novel, big data techniques are becoming increasingly common in the fashion industry. Large companies, in particular, are starting to use these technologies to better connect with their customers and to manage the business. Across fields including the fashion industry, several companies report how big data analytics can change the way a given company does business. In this paper, we introduce the concept of “fashion informatics” which we define as the analysis of massive amounts of data by means of machine learning, social network analysis, and computer vision techniques targeted towards the fashion industry. Within the context of apparel, there is no doubt big data and computer science techniques could greatly affect retail business on all levels, particularly relative to IP rights among designers and brands (Kiron, et al., 2014).

There are some barriers associated with big data and fashion. One especially pressing problem is that much information can be collected without the approval of consumers, creating ethical challenges. Combining ethical challenges with the possibility of seeking out knockoffs in the industry could create potential backlash from fast fashion retailers. Knockoffs and counterfeits are very controversial, and using big data could add possible implications and reactions from industry. Other issues of IP in the industry include vanity counterfeits, overruns, condoned copies, self-copies, and high-quality counterfeits that also need to be taken into account (Hilton, et al., 2004).

Rahm (2014) stated a need to be able to properly identify which specific items are knockoffs as key to the IP debate. Big data researchers focusing on this issue could help pinpoint and better define this phenomenon, and could be crucial to advance the debate among industry professionals. Of course, big data analytics does not come without concerns, specifically from an ethical consideration. Take for example the practice of consumer personalization. Big datasets about consumer behavior have been used for profiling purposes. The idea is to cater every possible consumer experience to a specific individual. However not everyone in the industry accepts datasets for these purposes (Hilton, et al., 2004; Martin, 2015).




The aim of this study is to show how data collected from the World Wide Web and social media could contribute to a better understanding of fashion knockoffs. To do so, we leverage public domain information scraped from the Web and social media. Our goal is to show how simple analytics techniques could provide a broad quantitative sketch of the overall practice of fashion knockoffs. Even though the resulting dataset presented here is limited in size, it is amenable to analytical techniques drawn from network science. Therefore, we see this exercise as a proof of concept of the fact that more massive datasets could have a transformative impact in the world of fashion.

To collect our data, we make the following observation: blogs about fashion “copycats,” as well as social media groups, usually report instances of fashion knockoffs, and the information about brands involved is readily available. We reason that, if collected and aggregated in the proper manner, this form of collective intelligence (Bonabeau, 2009) could provide unique insight into the overall practice of fashion knockoffs within the fashion industry.

Our unit of analysis is at the level of individual fashion brands. We use Web and social media search techniques to collect an organic sample of fashion knockoffs. Our long-term goal is to create a broad spectrum of fashion knockoff detection techniques, and an initial set of labeled examples of knockoff could provide the basis for training automated methods for doing so. To gain an understanding of the overall patterns in these data, in this work we use network science techniques to visualize and inspect this preliminary seed set. In the following we describe our research goals and proposed method in greater detail.

Research goals

Despite the importance of knockoffs in the fashion world, very little research exists on quantitative measurement of knockoff practices, and in particular regarding whether the flow of knockoffs strictly follows a top-down direction from luxury brands to fast fashion brands. Traganitis, et al. (2015), Ding, et al. (2013), and Chen and Luo (2017) all provide starting points for developing algorithms to interpret such data. Evidence that high-end brands knock off low-end ones is only anecdotal, and neither the extent of this phenomenon nor the exact direction of the relationships between these brands is clear. Moreover, there is also very little research on the interplay between knockoffs, fast fashion, and online consumer preferences for sustainable fashion as a whole. Utilizing machine learning technology to quickly identify fashion knockoffs throughout the entire Web could strengthen the conversation surrounding intellectual property and fashion designers. However, reaching that point requires an understanding of methodology: researchers must understand how to collect data and train a large-scale algorithm capable of delivering such a lofty goal. Therefore, the researchers in this study aim to investigate some of these missing pieces and, from a network science perspective, to begin establishing academic research as a catalyst to industry and designers in new ways. Our research goals are thus three-fold:

  1. To understand to what extent fashion knockoffs fuel the demand for fast fashion through means of machine learning techniques.
  2. To better understand whether social media can indeed help increase the awareness toward consumption of sustainable fashion through machine learning techniques based on items procured from these types of networks.
  3. To begin to understand and apply a methodology used to visualize and automatically analyze the network of knockoff victims and offenders.

We plan to address these research goals by means of large-scale data mining from the Web and social media (Leskovec, et al., 2014). This will allow us to get a systematic picture of the practice of fashion knockoffs, fast fashion, and sustainable fashion consumption, as well as understand how successful our method is in the possible development of a fashion knockoff machine learning algorithm.




The complex relationships of innovation in the fashion industry can be investigated using a mix of network science and machine learning techniques. At scale, this will necessarily require collecting a large dataset of labeled examples of fashion knockoffs from the social media stream and the Web. No such dataset currently exists, and a substantive effort will be needed toward this end. However, to test the feasibility of our approach, we performed a pilot study on a small scale. Human coders in upper level fashion courses at a large Midwest university were assigned a set of approximately 84 of the top 100 popular brands and were asked to use Web search engines like Google or Bing and social media like Instagram to look for instances of knockoffs (FashionUnited, 2016). Coders were asked to collect any image comparing the two fashion items, and the names of the brands. Each coder was assigned a brand and was asked to collect at least 20 knockoffs of that brand’s products, regardless of whether the brand was knocking off or being knocked off by another brand. To mitigate the potential risk of drawing inaccurate inferences about individual fashion brands, the collected data were vetted to make sure that each pair of images included a genuine instance of a fashion knockoff, and to double check the correct identification of the brand name. To this end, all annotations were verified by two independent coders and by two fashion researchers (two of the authors); coding disagreements were adjudicated by the fashion researchers.

From the collected data, one can apply data mining techniques to infer latent attributes about the sample of fashion brands, such as the centrality of a given brand in the network, which can be interpreted as a measure of its prestige. Here, we use a recursive definition of the notion of prestige. This approach is at the heart of two well-known algorithms: HITS, or Hyperlink-Induced Topic Search (Kleinberg, 1999), and PageRank (Page, et al., 1999). Intuitively, the two methods assume that if a brand is knocked off considerably it means it is somehow “prestigious”; however, it matters more if a brand is knocked off by other prestigious brands, as opposed to less prestigious ones. These approaches underpin modern search engine technology, like Google Search. The main difference between HITS and PageRank is the type of latent information being estimated.

PageRank is based on the concept of random walks. A random walk is an iterative process that starts on a node; at each step, the walker moves to a randomly chosen neighbor of the node it is currently on. With a small probability, the walker may instead “teleport” to a random node, regardless of whether it is a neighbor of the current node or not. The PageRank algorithm estimates the stationary distribution of such a random walk process as the latent prestige attribute. The more often a node is visited during the process, the more “prestigious”, or important, it is.
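The random walk described above can be approximated by repeatedly redistributing scores along edges. The following is a minimal sketch under stated assumptions, not the procedure used in this study: the brand names are made up, the damping value of 0.85 (one minus the teleport probability) is a conventional choice rather than one taken from the text, and brands with no outgoing edges are assumed to spread their score uniformly.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Approximate the stationary distribution of a random walk with teleportation.

    edges: list of (source, target) pairs; the walker follows source -> target.
    damping: probability of following an edge (1 - damping is the teleport chance).
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Teleportation gives every node the same baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a node without outgoing edges spreads its score uniformly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical knockoff edges: (knocking brand, knocked brand).
edges = [("FastA", "LuxX"), ("FastB", "LuxX"), ("FastA", "LuxY"), ("LuxY", "LuxX")]
scores = pagerank(edges)
most_prestigious = max(scores, key=scores.get)
```

In this toy network the brand that is knocked off most often, and by a brand that is itself knocked off, ends up with the largest score.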

HITS is a link analysis algorithm which estimates two attributes for each node: the “authority” score and the “hub” score. These two concepts originate in the context of the early World Wide Web, where different Web pages needed to be ranked by their importance relative to a user query. Intuitively, an “authority” is a Web page whose content is somewhat valuable, and thus one would expect that many other Web pages “cite” it by linking to it. Conversely, a “hub” is a page whose content is not particularly valuable in itself, but that tends to include links to authority pages, acting as a sort of information gateway. In the context of fashion brands, the hub score captures the tendency to knock off other brands, while the authority score captures the tendency to be knocked off, and thus is a measure of influence.

Mathematically, let us consider a set of $B$ brands and a list $K$ of $N$ knockoffs, where each knockoff is denoted by an ordered pair of brands $(b_i, b_j)$, where $b_i$ represents the knocking brand (the ‘offender’) and $b_j$ is the knocked brand (the ‘victim’). This list can be re-arranged into a weighted adjacency matrix $A$ which contains $B$ rows and $B$ columns, one for each brand. The value of a generic entry $A_{ij}$ is equal to the number of times the $i$-th brand knocked off the $j$-th brand. Note that, in general, $A_{ij}$ will be different from $A_{ji}$.
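As an illustration of this re-arrangement, the sketch below builds the weighted adjacency matrix from a list of (offender, victim) pairs. The brand names and the function name `build_adjacency` are hypothetical, and sorting the brands alphabetically to fix the row and column order is an assumption made only for reproducibility.

```python
def build_adjacency(knockoffs):
    """Turn a list of (offender, victim) pairs into a weighted adjacency matrix.

    Entry matrix[i][j] counts how many times brand i knocked off brand j.
    """
    brands = sorted({brand for pair in knockoffs for brand in pair})
    index = {brand: position for position, brand in enumerate(brands)}
    size = len(brands)
    matrix = [[0] * size for _ in range(size)]
    for offender, victim in knockoffs:
        matrix[index[offender]][index[victim]] += 1
    return brands, matrix


# Hypothetical records; the first pair occurs twice, so its weight is 2.
knockoffs = [
    ("FastA", "LuxX"),
    ("FastA", "LuxX"),
    ("FastB", "LuxX"),
    ("LuxX", "FastB"),
]
brands, A = build_adjacency(knockoffs)
```

Here the entry for (FastB, LuxX) and the entry for (LuxX, FastB) are stored separately, which reflects the fact that the matrix is not symmetric in general.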

As already mentioned, the HITS algorithm computes two scores for each brand, a hub score and an authority score. Each of the two quantities is defined recursively in terms of the other, in the following way:


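$$\mathrm{auth}(b_j) = \sum_{i=1}^{B} A_{ij}\,\mathrm{hub}(b_i) \qquad (1)$$

$$\mathrm{hub}(b_i) = \sum_{j=1}^{B} A_{ij}\,\mathrm{auth}(b_j) \qquad (2)$$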


where $A_{ij}=1$ if and only if there is an edge from $i$ to $j$ and 0 otherwise. In the above equations, the only quantity that is known a priori is $A_{ij}$. To compute the authority and hub scores, one can note that if we knew the hub scores of all brands, it would be possible to compute the authority score of each brand, and vice versa. It is possible to prove mathematically that iteratively alternating the update of the hub and authority scores is guaranteed to converge to a stationary set of values, which represent the authority and hub scores (Kleinberg, 1999). In practice, for any brand $b$ we first set $\mathrm{auth}(b)$ and $\mathrm{hub}(b)$ to 1. Then we apply the above equations. To make sure the computation converges, the scores need to be suitably normalized after each iteration. We refer the reader to the original description of the HITS algorithm (Kleinberg, 1999).
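The alternating update can be sketched in a few lines. This is an illustrative implementation under assumptions and not the Gephi code used for the results: the number of iterations is fixed at 50, scores are rescaled to unit Euclidean length after each round, and the small matrix at the end is made up.

```python
import math


def normalize(values):
    """Rescale a list of scores to unit Euclidean length (left unchanged if all zero)."""
    length = math.sqrt(sum(value * value for value in values))
    return [value / length for value in values] if length > 0 else list(values)


def hits(adjacency, iterations=50):
    """Alternate authority and hub updates on a square adjacency matrix.

    adjacency[i][j] > 0 means brand i knocked off brand j.
    """
    size = len(adjacency)
    auth = [1.0] * size
    hub = [1.0] * size
    for _ in range(iterations):
        # Authority of j: sum of the hub scores of the brands that knock it off.
        auth = [sum(adjacency[i][j] * hub[i] for i in range(size)) for j in range(size)]
        # Hub of i: sum of the authority scores of the brands it knocks off.
        hub = [sum(adjacency[i][j] * auth[j] for j in range(size)) for i in range(size)]
        auth = normalize(auth)
        hub = normalize(hub)
    return auth, hub


# Hypothetical three-brand example: brands 0 and 1 both knock off brand 2.
example = [
    [0, 0, 2],
    [0, 0, 1],
    [0, 0, 0],
]
auth_scores, hub_scores = hits(example)
```

In this example only the third brand receives a non-zero authority score, while the first brand, which copies it more often, obtains the larger hub score.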

We used the implementation of the HITS algorithm provided in the Gephi package, an open source software for network analysis and visualization (Bastian, et al., 2009). In the following, we report results of this exercise.



Results and findings

Exactly 808 distinct knockoffs involving about 300 brands were collected. The knockoff data were visualized as a network, in which circles (or nodes, in the parlance of network science) represent brands, and lines (or edges) represent the knockoff relationship. Of course, a brand may knock off another brand multiple times, and it would be useful to know just how many distinct pairs of knocking-knocked brands exist. To do so, we grouped all edges by their brands, and counted the number of times a distinct pair occurred. This exercise yielded a list of 470 distinct knocking-knocked brand pairs.
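The grouping step can be illustrated with a counter over ordered brand pairs. The records below are invented for the example and do not come from the collected dataset.

```python
from collections import Counter

# Hypothetical records: one (knocking brand, knocked brand) pair per collected knockoff.
records = [
    ("FastA", "LuxX"),
    ("FastA", "LuxX"),
    ("FastB", "LuxX"),
    ("LuxX", "FastB"),
]

# Group identical ordered pairs and count how often each one occurs.
pair_counts = Counter(records)

total_knockoffs = sum(pair_counts.values())  # every collected knockoff
distinct_pairs = len(pair_counts)            # unique knocking-knocked pairs
```

Because the pairs are ordered, a knockoff from one brand to another and a knockoff in the opposite direction are counted as two different pairs.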

Of course, the total number of brands is much larger than the seed set we assigned to our coders, since coders were asked to record a knockoff as long as either the knocked or the knocking image matched their initial brand assignment. Manual inspection of the collected brands shows that our dataset covers a wide range of brands, including high fashion (e.g., Dolce & Gabbana), fast fashion (e.g., H&M), and unknown brands (e.g., Choies), indicating that coders might have included instances of mere counterfeits, rather than actual knockoffs. To mitigate this possible source of noise, we filtered our dataset to include only brands that appear in at least two knockoffs, whatever the role (i.e., knocked or knocking). This choice corresponds to taking the k-core decomposition of the network, a technique commonly used to identify the set of most important nodes in the network (Kitsak, et al., 2010). In our case, the threshold is equivalent to choosing a value of k=2. After filtering, we obtained a network with 84 brands and 258 distinct knocking-knocked pairs. We then used the aforementioned implementation of the HITS algorithm available in the Gephi software to compute the authority and hub scores. Figure 2 shows the resulting network of knockoffs.
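The filtering rule can be sketched as below. Two details are assumptions made for illustration: the filter is applied in a single pass, and a record is kept only when both of its brands meet the threshold. A full k-core computation would instead repeat the pass until no brand falls below the threshold. The records are hypothetical.

```python
from collections import Counter

def filter_min_participation(knockoffs, min_count=2):
    """Keep records whose brands each appear in at least min_count knockoffs."""
    participation = Counter()
    for offender, victim in knockoffs:
        # Count every appearance, whatever the role.
        participation[offender] += 1
        participation[victim] += 1
    keep = {brand for brand, count in participation.items() if count >= min_count}
    return [(o, v) for o, v in knockoffs if o in keep and v in keep]

# Hypothetical records (offender, victim).
records = [
    ("Brand X", "Brand Y"),
    ("Brand X", "Brand Y"),
    ("Brand Z", "Brand Y"),
    ("Brand W", "Brand X"),
]
filtered = filter_min_participation(records)
```

In this example Brand Z and Brand W each appear only once, so their records are dropped and only the two Brand X to Brand Y records remain.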


Figure 2: Fashion knockoff network. Nodes are brands, and there is a directed edge A → B if A knocked off a design from B. Thicker arrows indicate multiple knockoffs. Node size is proportional to the number of incoming edges, while text size is proportional to the number of outgoing edges. Node color encodes the HITS authority score, indicating influential brands: red-orange nodes have high authority, yellow intermediate, and green-blue low. For visualization purposes, only brands participating in at least two knockoffs (received or performed) are displayed.
Note: Larger version of Figure 2 available here.


Regarding research goal one, “To understand to what extent fashion knockoffs fuel the demand for fast fashion through means of machine learning techniques,” we found that, surprisingly, the most prestigious brands include both luxury ones, like Céline or Valentino, and more “mainstream” ones, such as Nike or Vans. The top 20 brands with the largest authority scores are reported in Table 1.


Table 1: The top 20 most influential brands in terms of the authority score of the HITS algorithm.
Brand                 Authority score
Louis Vuitton         0.29
Marc Jacobs           0.20
Calvin Klein          0.16
Yves Saint Laurent    0.16
Michael Kors          0.14
Stella McCartney      0.13
Jimmy Choo            0.12
Victoria’s Secret     0.11


Our intuition is further confirmed by inspecting the list of the top 20 brands with the largest hub score, which is reported in Table 2. The hub score should not simply be interpreted as an absolute measure of knocking activity, but rather as a measure of how well a brand links to brands with a large authority score. In other words, hubs tend to preferentially knock off designs from authorities. However, a hub could act itself as an intermediate authority for other hubs, which is the case for brands that emulate luxury items but cater to more mainstream markets, like Michael Kors and Marc Jacobs.
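To illustrate why the hub score differs from a raw count of knockoffs, the sketch below compares two hypothetical brands: one that knocks off three brands nobody else copies, and one that knocks off only two brands that are also copied by others. The network, the brand names, the unit-length normalization, and the fixed 50 iterations are all assumptions made for this example.

```python
import math
from collections import Counter

def hits_scores(edges, iterations=50):
    """Return (hub, auth) dictionaries for a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority update: add the hub score of every brand pointing here.
        auth = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub update: add the authority score of every brand pointed to.
        new_hub = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            new_hub[src] += auth[dst]
        hub = new_hub
        # Rescale both score sets to unit Euclidean length.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for key in scores:
                scores[key] /= norm
    return hub, auth

# Hypothetical edges (knocking brand, knocked brand).
edges = [
    ("Hub Y", "Authority 1"), ("Hub Y", "Authority 2"),
    ("Other 1", "Authority 1"), ("Other 1", "Authority 2"),
    ("Other 2", "Authority 1"), ("Other 2", "Authority 2"),
    ("Hub X", "Obscure 1"), ("Hub X", "Obscure 2"), ("Hub X", "Obscure 3"),
]
out_degree = Counter(src for src, _ in edges)
hub, auth = hits_scores(edges)
```

Although Hub X has more outgoing edges than Hub Y, Hub Y obtains the larger hub score, because the brands it knocks off are themselves knocked off by several others and therefore carry higher authority.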


Table 2: The top 20 brands in terms of the hub score of the HITS algorithm.
Brand                 Hub score
Forever 21            0.53
Michael Kors          0.29
Marc Jacobs           0.12
Banana Republic       0.08
Rebecca Minkoff       0.08
Dolce & Gabbana       0.07
Free People           0.06
Charlotte Russe       0.05
Ugg Australia         0.05
Jessica Simpson       0.04
American Apparel      0.04


Regarding research goal two, “Better understanding of whether social media can indeed help increase the awareness toward consumption of sustainable fashion through machine learning techniques based on items procured from these types of networks,” the findings of this study say very little about how consumers are affected. However, the study does contribute a visual representation of how brands relate to one another in terms of fashion knockoffs. This type of information would be most useful to brands, designers, and retailers seeking to understand who is knocking them off and how often. Additionally, it could help analyze how knocking retailers conduct product analysis when predicting future seasons’ trends, and which designers and brands influence their products.

Regarding the third research goal of this study, “Begin to understand and apply a methodology used to visualize and automatically analyze the network of knockoff victims and offenders,” we found that the HITS algorithm offers a useful analogy (“hubs” and “authorities”) for organizing and analyzing this type of information related to fashion knockoffs and image analysis. Authorities correspond to brands that tend to be the subject of several knockoffs, and thus are somewhat “prestigious.” Hubs, on the other hand, tend to knock off other brands, and thus are less prestigious.



Conclusions and implications

Thanks to the deluge of information produced and shared every day by billions of Internet users, the Web and social media are providing tantalizing opportunities to make sense of collective social phenomena in a new way. Fashion is an example of a social and cultural phenomenon that has puzzled and fascinated scientists for a long time, and the onslaught of new clothing and fashion technologies that are redefining the fashion industry can help us improve our understanding of such a process as a whole. In this work, we proposed to take advantage of the collective intelligence provided by Internet users, together with techniques originally developed in the context of network science, to better understand the phenomenon of fashion knockoffs. Such intelligence could, for example, assist designers, retailers, and brands in identifying and collecting information regarding knockoffs. This information could also be used to better educate consumers about intellectual theft on a large scale, and to begin an image-driven conversation about the network of knockoffs and how consumerism contributes to the problem.

Overall, our findings open a new area of research within the field of apparel, textiles, and fashion. Algorithms play a crucial role in analyzing and assessing information, but they must be tested to understand which data sources are most beneficial. Product analysis, and possibly consumer behavior trends, can be analyzed through this type of computer science research. Other types of data mining applied to fashion could be employed and need to be further explored, including spatial analysis, sentiment analysis, categorization, clustering, and network analysis.

Though network analysis is already of particular interest within social media research, this type of research could also be conducted by applying more mainstream data mining techniques to data from across the Web. Hubs and authorities have always existed and interacted within the world of fashion brands, often in complex ways. Of course, the tendency of fashion brands to knock off one another is not limited to a simple trickle-down path. It has been observed that brands at the top also engage in knocking off other brands, both one another, straight from the runways, and more mainstream brands at lower levels. Our contribution is to provide a global quantitative picture of these practices, in the hope that the key players may reflect upon them. This could serve as a powerful tool to bring more accountability to the fashion industry, for example by exposing the most egregious cases of fashion knockoffs and the brands associated with them. This means that we should also consider the full ethical implications of using data-driven tools to expose controversial practices by brands. Any data-driven account of knockoff practices by brands should be carefully validated to make sure that it is based upon accurate data. Otherwise, the risk is that of making false accusations, potentially tarnishing the image of one or more fashion brands.

As more and more AI-based technological solutions are introduced to expose controversial business practices, the risk of potential misuse of these tools should be carefully weighed against their potential benefits. Industry practices about dissemination of results and education of fashion consumers should evolve accordingly. For example, information visualizations based on the fashion knockoff networks should employ interactive techniques that allow consumers to visualize examples of the knockoffs of each individual brand, and should include proper explanation of the difference between a knockoff and a mere counterfeit item. In some cases, this information could even be helpful to consumers for guiding their own spending choices.

Finally, in this research we have proposed the term “fashion informatics,” which we believe will become useful as big data and computer science strategies are increasingly explored and utilized within the fashion industry. “Fashion informatics” is defined as the analysis of massive amounts of data by means of machine learning, social network analysis, and computer vision techniques targeted toward the fashion industry.

A better understanding of fashion knockoffs may also affect the sustainability of the fashion industry. Existing research already posits that greater knowledge about a behavior leads to more favorable attitudes and possibly a greater intent to perform that behavior (Connolly and Prothero, 2003; Brosdahl and Carpenter, 2010; Hiller Connell, 2010; Kozar and Hiller Connell, 2010; Hiller Connell, 2011; Hiller Connell and Kozar, 2012a, 2012b; Kozar and Hiller Connell, 2013; Reiter and Kozar, 2016). When consumers are aware of environmental issues associated with their behavior, they are much more likely to engage in behavior that is favorable towards the environment. As research improves our understanding of not only why consumers are using online technology, but also which platforms they choose, we will also begin to identify how they use those sites, for example to gain knowledge and to be influenced by peers.

With the emergence of a data-driven computational social science, the fashion industry will be increasingly informed by researchers who use methods predicated on the analysis of massive amounts of data by means of machine learning and computer vision techniques. This will establish a novel link between the textile, apparel, and computing communities. This discipline of fashion informatics will be chiefly concerned with recognizing processes by which consumers understand and relate to fashion knockoffs, and may ultimately encourage the consumption of a larger share of sustainable apparel. Research may ultimately contribute to a better understanding, and thus more diffused consumer knowledge, of responsible issues in the apparel and textile industry.

Big data and fashion informatics research will drive more interdisciplinary research in both education and industry. According to Chitrakon (2017), the future job market for students going into fashion includes opportunities as a 3D printing engineer, consumer psychologist, data scientist, and sustainability expert, which are four of the top five positions. Fashion informatics will contribute to a wide array of topics in the fashion industry including data ethics (understanding and protecting consumer rights), consumer purchase behavior, in-store analytics, problem solving, analysis of new trends, identification of new designers and fashion influencers, merchandise strategies (identifying real time effects), and clothing performance (body type/scanning, returns processes, smart fashion) (Hastreiter, 2016). Sherman (2017) indicated that currently automation in the fashion industry accounts for 51 percent of jobs, but by 2055 will account for 2.7 trillion in wages.




Because coders reported many unknown brands in the image results, we assume that they might have included instances of mere counterfeits, rather than actual knockoffs. Specifically, there is a need for greater education about the differences between knockoffs and counterfeits. Additionally, links to images online do not have a constant lifespan; images need to be retained or preserved for future analysis.

Coverage is another limitation of data collection. Coders were assigned brands from a non-exhaustive list of top brands. Therefore, it is possible that not all relevant knocking-knocked relations between brands may be included, especially if little known brands interact with more prestigious ones. Our visualization may miss this due to a lack of data about related knockoffs.



Future research

Dong, et al. (2016) recognize mobile involvement as important in understanding how mobile big data can be utilized. While the results of this study do not necessarily directly correlate with mobile data, the overall idea of collecting large-scale data from mobile devices will become crucial to further research.

While we do not make claims about the scalability of our methodology, it is nonetheless important to assess whether our research has the potential to identify meaningful patterns for a number of fashion brands. According to industry statistics, there are approximately 18,000 fashion designers in the U.S. (FashionUnited, 2019). Our approach is based on standard graph mining techniques and tools, such as the PageRank and HITS algorithms (which are known to scale to networks of billions of nodes and edges) and the Gephi network visualization software, which is capable of visualizing networks with up to 300,000 nodes and 1,000,000 edges (Pavlopoulos, et al., 2017). Thus, nothing prevents our proposed approach from scaling to a large number of fashion brands, both in the United States and worldwide.

Our methods depend on a large amount of data about fashion knockoffs. No such dataset currently exists. Substantial future efforts should develop this sort of resource, with information collected from the Web and social media sources like Instagram and Pinterest. Coverage should include a large number of brands, have a global scope, and be updated regularly to reflect the highly dynamic nature of the fashion industry.


About the authors

Lauren Copeland is Assistant Professor in the School of Fashion Design and Merchandising at Kent State University.
Direct comments to: lcopela6 [at] kent [dot] edu

Giovanni Luca Ciampaglia is Assistant Professor in Computer Science and Engineering at the University of South Florida.
E-mail: glc3 [at] mail [dot] usf [dot] edu

Li Zhao is Assistant Professor in the Textile and Apparel Management Department at the University of Missouri.
E-mail: zhaol1 [at] missouri [dot] edu



This work was partially funded by the Office of the Vice Provost of Research at Indiana University Bloomington through the Collaborative Research and Creative Activity Funding Award. The researchers also acknowledge support from the Indiana University Network Science Institute.



1. Swayne, 2014, paragraph 15.

2. Chen and Luo, 2017, p. 1.

3. Lin, et al., 2014, p. 1.

4. Fauchart and von Hippel, 2008, p. 187.

5. Cioffi-Revilla, 2017, p. 259.

6. Grewal, et al., 2017, p. 1.



M. Bastian, S. Heymann, and M. Jacomy, 2009. “Gephi: An open source software for exploring and manipulating networks,” Third International AAAI Conference on Weblogs and Social Media, at, accessed 20 November 2019.

S. Beltrametti, 2010. “Evaluation of the design piracy prohibition act: Is the cure worse than the disease? An analogy with counterfeiting and a comparison with the protection available in the European community,” Northwestern Journal of Technology & Intellectual Property, volume 8, number 2, pp. 147–173, and at, accessed 20 November 2019.

S. Black, 2008. Eco-chic: The fashion paradox. London: Black Dog Publishing.

E. Bonabeau, 2009. “Decisions 2.0: The power of collective intelligence,” MIT Sloan Management Review, volume 50, number 2, pp. 45–52, and at, accessed 20 November 2019.

D.J.C. Brosdahl and J.M. Carpenter, 2010. “Consumer knowledge of the environmental impacts of textile and apparel production, concern for the environment, and environmentally friendly consumption behavior,” Journal of Textile and Apparel, Technology and Management, volume 6, number 4, at, accessed 20 November 2019.

L. Challa, 2012. “Impact of textile and clothing industry on environment: Approach to eco-friendly textiles,” at, accessed 20 November 2019.

K.-T. Chen and J. Luo, 2017. “When fashion meets big data: Discriminative mining of best selling clothing features,” arXiv (22 February), at, accessed 20 November 2019.

C. Cioffi-Revilla, 2017. Introduction to computational social science: Principles and applications. New York: Springer International.
doi:, accessed 20 November 2019.

J. Connolly and A. Prothero, 2003. “Sustainable consumption: Consumption, consumers and the commodity discourse,” Consumption, Markets & Culture, volume 6, number 4, pp. 275–291.
doi:, accessed 20 November 2019.

M. Dahlén, 2012. “Copy or copyright fashion? Swedish design protection law in historical and comparative perspective,” Business History, volume 54, number 1, pp. 88–107.
doi:, accessed 20 November 2019.

G. Ding, L. Wang, and Q. Wu, 2013. “Big data analytics in future Internet of things,” arXiv (17 November), at, accessed 20 November 2019.

L. Dong, S. Chen, Y. Cheng, Z. Wu, C. Li, and H. Wu, 2016. “Measuring economic activities of China with mobile big data,” arXiv (2 August), at, accessed 20 November 2019.

S. Draper, V. Murray, and I. Weissbrod, 2007. Fashioning sustainability: A review of sustainability impacts of the clothing industry. London: Forum for the Future.

S.R. Ellis, 2010. “Copyrighting couture: An examination of fashion design protection and why the DPPA and IDPPPA are a step towards the solution to counterfeit chic,” Tennessee Law Review, volume 78, number 1, pp. 163–212.

FashionUnited, 2019. “Global fashion industry statistics — International apparel,” at, accessed 20 November 2019.

FashionUnited, 2016. “Top 100 fashion companies index,” at, accessed 20 November 2019.

E. Fauchart and E. von Hippel, 2008. “Norms-based intellectual property systems: The case of French chefs,” Organization Science, volume 19, number 2, pp. 187–201.
doi:, accessed 20 November 2019.

D. Grewal, A.L. Roggeveen, and J. Nordfält, 2017. “The future of retailing,” Journal of Retailing, volume 93, number 1, pp. 1–6.
doi:, accessed 20 November 2019.

A. Gupta, 2015. “Forecasting the fashion future: Big data comes to rescue fashion designers!” (23 February), at, accessed 20 November 2019.

N. Hastreiter, 2016. “4 ways big data is going to revolutionize the fashion industry,” at, accessed 20 November 2019.

K.Y. Hiller Connell, 2015. “Environmental impacts of apparel production, distribution, and consumption: An overview,” In: S.S. Muthu (editor). Handbook of sustainable apparel production. New York: CRC Press, pp. 41–61.

K.Y. Hiller Connell, 2010. “Internal and external barriers to eco-conscious apparel acquisition,” International Journal of Consumer Studies, volume 34, number 3, pp. 279–286.
doi:, accessed 20 November 2019.

K.Y. Hiller Connell and J.M. Kozar, 2012a. “Social normative influence: An exploratory study investigating its effectiveness in increasing engagement in sustainable apparel-purchasing behaviors,” Journal of Global Fashion Marketing, volume 3, number 4, pp. 172–179.
doi:, accessed 20 November 2019.

K.Y. Hiller Connell and J.M. Kozar, 2012b. “Sustainability knowledge and behaviors of apparel and textile undergraduates,” International Journal of Sustainability in Higher Education, volume 13, number 4, pp. 394–407.
doi:, accessed 20 November 2019.

B. Hilton, C.J. Choi, and S. Chen, 2004. “The ethics of counterfeiting in the fashion industry: Quality, credence and profit issues,” Journal of Business Ethics, volume 55, number 4, pp. 343–352.
doi:, accessed 20 November 2019.

International Anti-Counterfeiting Coalition (IACC), 2015. “Counterfeiting adds up,” at, accessed 20 November 2019.

D. Kiron, P.K. Prentice, and R.B. Ferguson, 2014. “Raising the bar with analytics,” MIT Sloan Management Review, volume 55, number 2, pp. 29–33, and at, accessed 20 November 2019.

M. Kitsak, L.K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H.E. Stanley, and H.A. Makse, 2010. “Identification of influential spreaders in complex networks,” Nature Physics, volume 6, number 11, pp. 888–893.
doi:, accessed 20 November 2019.

J.M. Kleinberg, 1999. “Authoritative sources in a hyperlinked environment,” Journal of the ACM, volume 46, number 5, pp. 604–632.
doi:, accessed 20 November 2019.

J.M. Kozar and K.Y. Hiller Connell, 2013. “Socially and environmentally responsible apparel consumption: Knowledge, attitudes, and behaviors,” Social Responsibility Journal, volume 9, number 2, pp. 315–324.
doi:, accessed 20 November 2019.

J.M. Kozar and K.Y. Hiller Connell, 2010. “Socially responsible knowledge and behaviors: Comparing upper-vs. lower-classmen,” College Student Journal, volume 44, number 2, pp. 279–293.

J. Leskovec, A. Rajaraman, and J.D. Ullman, 2014. Mining of massive datasets. Second edition. Cambridge: Cambridge University Press.

Y. Lin, Y. Zhou, and H. Xu, 2015. “Text-generated fashion influence model: An empirical study on,” HICSS ’15: Proceedings of the 2015 48th Hawaii International Conference on System Sciences, pp. 3,642–3,650.
doi:, accessed 20 November 2019.

Y. Lin, Y. Zhou, and H. Xu, 2014. “The hidden influence network in the fashion industry,” 24th Annual Workshop on Information Technologies and Systems (WITS) [Auckland, New Zealand].

Y. Lin, H. Xu, Y. Zhou, and W.-C. Lee, 2015. “Styles in the fashion social network: An analysis on,” In: N. Agarwal, K. Xu and N. Osgood (editors). Social computing, behavioral-cultural modeling, and prediction. Lecture Notes in Computer Science, number 9021. Cham, Switzerland: Springer, pp. 356–361.
doi:, accessed 20 November 2019.

D.Ø. Madsen and T. Stenheim, 2016. “Big data viewed through the lens of management fashion theory,” Cogent Business & Management, volume 3, number 1, article 1165072.
doi:, accessed 20 November 2019.

K.E. Martin, 2015. “Ethical issues in the big data industry,” MIS Quarterly Executive, volume 14, number 2, article 4, and at, accessed 20 November 2019.

Mimi, 2012. “Catwalk copy: Mary Katranzou dress for less,” Beauty & the dirt, at, accessed 20 November 2019.

S. Murray, 2016. “Data analytics is on trend with fashion houses,” Financial Times (4 October), at, accessed 20 November 2019.

H.S. Nair, S. Misra, W.J. Hornbuckle IV, R. Mishra, and A. Acharya, 2017. “Big data and marketing analytics in gaming: Combining empirical models and field experimentation,” Marketing Science, volume 36, number 5, pp. 699–725.
doi:, accessed 20 November 2019.

M. Nathan and A. Rosso with T. Gatten, P. Majmudar, and A. Mitchell, 2013. “Measuring the UK’s digital economy with big data,” National Institute of Economic and Social Research, at, accessed 20 November 2019.

K. Noyes, 2014. “What’s on trend this season for the fashion industry? Big data,” Fortune (22 September), at, accessed 20 November 2019.

L. Page, S. Brin, R. Motwani, and T. Winograd, 1999. “The PageRank citation ranking: Bringing order to the Web,” Stanford InfoLab, at, accessed 20 November 2019.

J. Park, G.L. Ciampaglia, and E. Ferrara, 2016. “Style in the age of Instagram: Predicting success within the fashion industry using social media,” CSCW ’16: Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing, pp. 64–73.
doi:, accessed 20 November 2019.

G.A. Pavlopoulos, D. Paez-Espino, N.C. Kyrpides, I. Iliopoulos, 2017. “Empirical comparison of visualization tools for larger-scale network analysis,” Advances in Bioinformatics, volume 2017, article ID 1278932.
doi:, accessed 20 November 2019.

E. Rahm, 2014. “Discovering product counterfeits in online shops: A big data integration challenge,” Journal of Data and Information Quality (JDIQ), volume 5, numbers 1–2, article number 3.
doi:, accessed 20 November 2019.

K. Raustiala and C. Sprigman, 2006. “The piracy paradox: Innovation and intellectual property in fashion design,” Virginia Law Review, volume 92, number 8, pp. 1,687–1,777.

R. Reichard, 2013. “Textiles 2013: The turnaround continues,” Textile World (29 January), at, accessed 20 November 2019.

L. Reiter and J. Kozar, 2016. “Chinese students’ knowledge of environmentally and socially sustainable apparel and sustainable purchase intentions,” International Journal of Marketing Studies, volume 8, number 3, pp. 12–21.

V. Sharma, 2017. “How big data plays an important role in fashion industry,” at, accessed 20 November 2019.

L. Sherman, 2017. “How automation is reshaping fashion,” Business of Fashion (23 January), at, accessed 20 November 2019.

M. Swayne, 2014. “Big data may be fashion industry’s next must-have accessory,” Penn State News (17 December), at, accessed 20 November 2019.

P.A. Traganitis, K. Slavakis, and G.B. Giannakis, 2015. “Sketch and validate for big data clustering,” IEEE Journal of Selected Topics in Signal Processing, volume 9, number 4, pp. 678–690.
doi:, accessed 20 November 2019.

M.Y. Wong, Y. Zhou, and H. Xu, 2016. “Big data in fashion industry: Color cycle mining from runway data,” AMCIS2016: Decision Support and Analytics (SIGDSA), at, accessed 20 November 2019.

E.Y. Xiao, 2010. “The new trend: Protecting American fashion designs through national copyright measures,” Cardozo Arts & Entertainment Law Journal, volume 28, number 2, pp. 405–430, and at, accessed 20 November 2019.


Editorial history

Received 20 February 2019; accepted 18 November 2019.

This paper is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Fashion informatics and the network of fashion knockoffs
by Lauren Copeland, Giovanni Luca Ciampaglia, and Li Zhao.
First Monday, Volume 24, Number 12 - 2 December 2019