This color analysis tool was built to support a data visual analytics method for the study of national Webs. It was initially used to analyze the changing colors of the former Yugoslav domain, .yu, during the Kosovo War. For more information please refer to:
Ben-David, A., Amram, A., & Bekkerman, R. (in press). The Colors of the National Web: Visual Data Analysis of the Historical Yugoslav Web Domain. International Journal on Digital Libraries.
The method can be readily applied to any other national Web (or Web archive), and, in fact, to any folder of images.
The following tutorial shows how to use the tool on a folder of images. As an example, we compare the color histograms of Google Images results for "Donald Trump" and "Hillary Clinton" (as of November 2016). The two demo folders are included with this tutorial.
There are two ways to run this tool:
- Use Jupyter Notebook.
- Install image-color-analysis as a package.
To run the tool in Jupyter Notebook:
- Download and install Anaconda (the Python 2.7 version).
- Download the project folder from GitHub as a ZIP file ('Download ZIP').
- Unzip the folder.
- In Anaconda, open Jupyter Notebook.
- Jupyter Notebook automatically opens your default browser and shows your file directory.
- Open the folder you downloaded and extracted.
- Open the file "image_color_analysis.ipynb".
The script in the file you just opened is divided into three sections:
- Building a collage from all the images.
- Fitting a K-Means clustering model (identifying clusters of colors in the images).
- Creating a color histogram that summarizes the color composition of the images.
Although the code is annotated, below is an explanation of each section.
In this section, we first specify the location of the images folder and load the images. Then we create a collage from all the images: the collage is as wide as the widest image and as tall as the sum of all image heights. Finally, the script creates a new image with an alpha (transparency) channel, arranges the images one below the other, and leaves any area not covered by an image (for example, beside images narrower than the widest one) transparent.
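The collage step can be sketched with Pillow as follows. This is a minimal illustration, not the notebook's actual code: the helper names `load_images` and `build_collage` and the list of accepted file extensions are hypothetical. Images are pasted left-aligned, one below the other, on a fully transparent RGBA canvas, so any area not covered by an image keeps an alpha value of 0.

```python
import os
from PIL import Image


def load_images(folder, extensions=(".png", ".jpg", ".jpeg", ".gif", ".bmp")):
    """Open every image file in `folder` (hypothetical helper; extension list is an assumption)."""
    paths = sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(extensions)
    )
    return [Image.open(path) for path in paths]


def build_collage(images):
    """Stack images vertically on a transparent RGBA canvas (a sketch)."""
    # The canvas is as wide as the widest image and as tall as all images combined.
    width = max(im.width for im in images)
    height = sum(im.height for im in images)
    # Fully transparent background: (R, G, B, A) = (0, 0, 0, 0).
    collage = Image.new("RGBA", (width, height), (0, 0, 0, 0))
    y = 0
    for im in images:
        # Paste each image below the previous one, aligned to the left edge.
        collage.paste(im.convert("RGBA"), (0, y))
        y += im.height
    return collage
```

For example, stacking a 40 × 10 image on top of a 20 × 30 image yields a 40 × 40 collage whose lower-right 20 × 30 area stays transparent.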
Click the 'run' button to start this procedure. When it finishes, it prints the time it took to build the collage (a useful indication of how long processing will take on a large corpus). The generated collage then pops up, and you can view or save it.
To identify clusters of colors in the dataset, we first need to convert the collage into a numerical representation: a color array. Each pixel has four values: Red, Green, Blue (RGB) and Alpha (the transparency channel added in the previous section to mark the empty areas). The collage is therefore represented as a matrix of total height * total width * 4 (RGB + A).
Since KMeans expects a flat list of samples rather than an image-shaped matrix, we reshape the array so that each pixel becomes one row of four values. For example, if the collage is 100 pixels high and 400 pixels wide, instead of a 100 * 400 * 4 matrix it is represented as 100 * 400 = 40,000 pixels of 4 values each (160,000 values in total).
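Assuming the collage is held as a NumPy array (for example via `np.array(collage)` on an RGBA Pillow image, which yields a height × width × 4 array), the reshaping can be sketched as below. A dummy all-zero array stands in for a real collage. Each pixel becomes one row of four values, the samples × features layout that scikit-learn's `KMeans` expects.

```python
import numpy as np

# A dummy collage array: 100 pixels high, 400 pixels wide, 4 channels (R, G, B, A).
height, width = 100, 400
collage_array = np.zeros((height, width, 4), dtype=np.uint8)

# Reshape so that each pixel becomes one row of four values.
pixels = collage_array.reshape(-1, 4)

print(collage_array.shape)  # (100, 400, 4)
print(pixels.shape)         # (40000, 4)
print(pixels.size)          # 160000 values in total
```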
Finally, we remove all transparent pixels, since they are not part of any image and should not affect the result, and we fit the model. The code specifies 5 clusters, but this number can be changed.
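A minimal sketch of removing transparent pixels and fitting the model with scikit-learn, assuming `pixels` is the N × 4 RGBA array from the previous step. The helper name `fit_color_clusters`, the threshold (alpha > 0 counts as visible), clustering on the RGB channels only, and the fixed `random_state` are all assumptions for illustration, not necessarily what the notebook does.

```python
import numpy as np
from sklearn.cluster import KMeans


def fit_color_clusters(pixels, n_clusters=5, random_state=0):
    """Drop transparent pixels from an N x 4 RGBA array and cluster the rest by color."""
    pixels = np.asarray(pixels)
    # Keep only pixels with a non-zero alpha value; alpha == 0 marks empty collage area.
    visible = pixels[pixels[:, 3] > 0]
    # Cluster on the RGB channels only (an assumption); alpha carries no color information.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    model.fit(visible[:, :3].astype(float))
    return model
```

Fixing `random_state` makes repeated runs return the same clusters, which helps when comparing two corpora such as the two demo folders.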
Click the 'run' button to start this procedure. When it finishes, it prints the time it took to fit the model. (The larger your collage and the more clusters you request, the longer it takes.)
This section calculates the proportion of pixels assigned to each color cluster and normalizes the histogram so that the proportions sum to 1. Finally, it generates an image in which each color appears as a band whose width is proportional to its share of the pixels.
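The histogram step can be sketched as follows, assuming a fitted scikit-learn `KMeans` model whose `labels_` give the cluster of each pixel and whose `cluster_centers_` give each cluster's average color. The helper names `color_proportions` and `color_bar`, the bar dimensions, and ordering the bands from the most to the least common color are illustrative choices, not the package's `colors_bar` implementation.

```python
import numpy as np
from PIL import Image


def color_proportions(labels, n_clusters):
    """Return the share of pixels in each cluster; the shares sum to 1."""
    # Count how many pixels were assigned to each cluster index 0..n_clusters-1.
    counts = np.bincount(labels, minlength=n_clusters).astype(float)
    # Normalize the histogram so that the proportions sum to 1.
    return counts / counts.sum()


def color_bar(centers, proportions, width=300, height=50):
    """Draw one band per cluster, as wide as that cluster's share of the pixels."""
    centers = np.asarray(centers, dtype=float)
    proportions = np.asarray(proportions, dtype=float)
    # Order bands from the most to the least common color (an illustrative choice).
    order = np.argsort(proportions)[::-1]
    centers, proportions = centers[order], proportions[order]
    # Cumulative band edges in pixels, so the bands tile the full width without gaps.
    edges = np.round(np.concatenate([[0.0], np.cumsum(proportions)]) * width).astype(int)
    bar = Image.new("RGB", (width, height), (255, 255, 255))
    for i, center in enumerate(centers):
        left, right = int(edges[i]), int(edges[i + 1])
        if right > left:
            # Use the cluster center's RGB values (ignoring alpha, if present) as the band color.
            color = tuple(int(round(v)) for v in center[:3])
            bar.paste(color, (left, 0, right, height))
    return bar
```

With a fitted model, `color_bar(model.cluster_centers_, color_proportions(model.labels_, model.n_clusters))` returns a Pillow image that can be shown or saved like the collage.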
Click the 'run' button to start this procedure. The resulting histogram that pops up summarizes the color composition of your corpus!
This is the line of code where you can change the name of the demo folder to your own folder of images:
folder = 'YOUR FOLDER'
This is the line of code where you can change the number of clusters (= the number of colors in the histogram):
kmeans_model = KMeans(n_clusters=5)  # replace 5 with the number of clusters you want
To install image-color-analysis as a package instead:
- Download and install Python 3.6.
- Install the package:
pip install image-color-analysis
- Create a new Python file and import the package:
from image_color_analysis import *
The package has 4 functions:
- images_collage
- k_means
- colors_bar
- analyze_folder
images_collage creates a collage of all the images in a folder. It receives a directory/folder path and returns the collage as an image (type: PIL.Image).
import image_color_analysis
folder = 'YOUR FOLDER'
image = image_color_analysis.images_collage(folder)
# display the collage
image.show()
# save the image in png format (you can choose another format as well)
image.save('collage.png')
To load the saved image later:
from PIL import Image
im = Image.open('collage.png')
k_means fits a K-Means model to a collage. It receives a collage image (PIL.Image) and the number of clusters (default = 5), and returns the fitted model (type: sklearn.cluster.KMeans).
import image_color_analysis
folder = 'YOUR FOLDER'
image = image_color_analysis.images_collage(folder)
# train model with 5 clusters (k=5)
model_5_clusters = image_color_analysis.k_means(image)
# train model with 8 clusters (k=8)
model_8_clusters = image_color_analysis.k_means(image, 8)
To save the model to a file and load it later:
import image_color_analysis
import pickle
folder = 'YOUR FOLDER'
image = image_color_analysis.images_collage(folder)
# train model with 5 clusters (k=5)
model_5_clusters = image_color_analysis.k_means(image)
# save the model
model_file_name = 'model.pkl'
with open(model_file_name, 'wb') as f:
    pickle.dump(model_5_clusters, f)
# load the model
with open(model_file_name, 'rb') as f:
    loaded_model = pickle.load(f)
colors_bar creates a color bar image from a KMeans model and saves it to a file. It receives the model (sklearn.cluster.KMeans) and a file path (the path and file name for saving the color bar image), and returns the color bar image (type: PIL.Image).
import image_color_analysis
folder = 'YOUR FOLDER'
image = image_color_analysis.images_collage(folder)
# train model with 5 clusters (k=5)
model_5_clusters = image_color_analysis.k_means(image)
color_bar_path = "new_color_bar.png"
color_bar_image = image_color_analysis.colors_bar(model_5_clusters, color_bar_path)
# display the color bar
color_bar_image.show()
analyze_folder runs all of the functions above: it builds the collage, fits the model, creates the color bar, and displays it. It receives:
- the directory/folder path of the images
- optional: the file name for saving the color bar (default = 'colors_bar.png' in the current directory)
- optional: the number of clusters for K-Means (default = 5)
import image_color_analysis
folder = 'YOUR FOLDER'
# 5 clusters; creates the file 'colors_bar.png'
image_color_analysis.analyze_folder(folder)
# 7 clusters; creates the file 'new_colors_bar.jpg'
image_color_analysis.analyze_folder(folder, 'new_colors_bar.jpg', 7)