python – Interpreting feature vector values in a clustering algorithm

I am building an image clustering algorithm that uses a Convolutional Neural Network, as part of a bigger project. The project is being developed in Google Colab, using Python and TensorFlow.

The main premise of this algorithm is that it takes a large set of enterprise logos from a database and, using VGG16 for feature extraction, forms K clusters of visually similar logos.

I already have code that runs efficiently and gives satisfactory results. It uses Principal Component Analysis for dimensionality reduction and writes each logo's feature vector to a pickle file. However, I have come across a hurdle: I now need to identify what each value in the feature vector means.

For example, this is the feature vector of a random image in my collection:

{'MIN822210.jpg': array([[1.4606382 , 0. , 0.66149026, ..., 0. , 0. , 0. ]], dtype=float32)

While I know these values (1.46, 0, 0.66, 0, 0, 0) represent the relevance of each feature in this particular image, how can I identify exactly what these features are? How can I tell which features are edges, interest points, etc., so that I can name the clusters appropriately?
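One thing worth checking first: if the stored vectors are the output of PCA, each value is a weighted mix of all the original VGG16 dimensions rather than a single named feature. The sketch below shows how to look up which original dimensions weigh most on a given reduced value. It is a minimal illustration on random stand-in data, using a plain NumPy SVD instead of whatever PCA implementation the project uses; `raw_features`, `fit_pca` and `top_loadings` are hypothetical names.

```python
import numpy as np

# Hypothetical stand-in for the raw VGG16 features: 200 images x 50 dimensions.
rng = np.random.default_rng(0)
raw_features = rng.random((200, 50))

def fit_pca(X, n_components):
    """Return (mean, components); each row of components is one principal axis."""
    mean = X.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def top_loadings(components, component_index, k=5):
    """Indices of the k original dimensions with the largest absolute weight."""
    weights = np.abs(components[component_index])
    return np.argsort(weights)[::-1][:k]

mean, components = fit_pca(raw_features, n_components=10)
reduced = (raw_features - mean) @ components.T  # shape (200, 10)

# Original feature indices that contribute most to the first reduced value.
print(top_loadings(components, component_index=0))
```

The indices returned still point at unnamed VGG16 activations, so this only narrows down where to look; it does not by itself say "edge" or "corner".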

My end goal is to have each of the K clusters carry an identifiable name (for instance, "Grayscale"), so that the "Grayscale" cluster would contain most of the grayscale logos found in the database. For this, I need to understand which features make up a "grayscale" image.
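One possible way to approach this, sketched under assumptions, is to ask which dimensions set a cluster apart from the rest of the collection: compare the cluster's mean vector with the global mean, in units of the global standard deviation. The data below is synthetic, and `distinctive_dimensions` is a hypothetical helper, not part of any library.

```python
import numpy as np

def distinctive_dimensions(features, labels, cluster_id, k=5):
    """Rank dimensions by how far the cluster mean sits from the global mean,
    measured in global standard deviations."""
    in_cluster = features[labels == cluster_id]
    global_mean = features.mean(axis=0)
    global_std = features.std(axis=0) + 1e-8  # avoid division by zero
    z = (in_cluster.mean(axis=0) - global_mean) / global_std
    order = np.argsort(np.abs(z))[::-1][:k]
    return [(int(i), float(z[i])) for i in order]

# Synthetic stand-in data: 300 vectors, 20 dimensions, 3 clusters of 100 each.
rng = np.random.default_rng(1)
features = rng.normal(size=(300, 20))
labels = np.repeat(np.arange(3), 100)
features[labels == 2, 7] += 5.0  # make dimension 7 characteristic of cluster 2

for dim, score in distinctive_dimensions(features, labels, cluster_id=2, k=3):
    print(f"dimension {dim}: z = {score:+.2f}")
```

Viewing the images that score highest and lowest on the top-ranked dimensions is then a manual way to guess what those dimensions respond to.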

Are there any tools, tricks or code to perform this analysis?
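For a property as concrete as "grayscale", one alternative that avoids interpreting the CNN features at all is to measure the property directly on the pixels and report how common it is in each cluster. The following is a rough sketch, assuming the logos are available as H x W x 3 `uint8` RGB arrays; the `tolerance` value and both function names are illustrative choices, not established thresholds or APIs.

```python
import numpy as np

def is_grayscale(image, tolerance=8):
    """True if an H x W x 3 uint8 RGB array has nearly identical channels."""
    pixels = image.astype(np.int16)  # avoid uint8 wrap-around when subtracting
    spread = pixels.max(axis=2) - pixels.min(axis=2)
    return bool(spread.mean() <= tolerance)

def grayscale_share(images, labels, cluster_id):
    """Fraction of images in a cluster that pass the grayscale check."""
    members = [img for img, lab in zip(images, labels) if lab == cluster_id]
    if not members:
        return 0.0
    return sum(is_grayscale(img) for img in members) / len(members)

# Tiny synthetic examples in place of real logos.
rng = np.random.default_rng(2)
single_channel = rng.integers(0, 256, (32, 32, 1), dtype=np.uint8)
gray = np.repeat(single_channel, 3, axis=2)  # same values in R, G and B
color = np.zeros((32, 32, 3), dtype=np.uint8)
color[..., 0] = 255  # pure red

images = [gray, gray, color]
labels = [0, 0, 1]
print(grayscale_share(images, labels, 0), grayscale_share(images, labels, 1))
```

A cluster in which most members pass the check could reasonably be labelled "Grayscale"; the same pattern would apply to other hand-measured properties such as dominant color or aspect ratio.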
