Neural Image Assessment: ranking photographs by aesthetic and technical quality

Using Deep Learning to categorize photographs

A scenic photograph near Lac Cordu — Chamonix. Source: © the author.

Extreme sports rarely offer the opportunity to stop and take a photo of the situation. One way to overcome this is to carry a portable camera and have it take photographs periodically using its timelapse function. However, this too has its pitfalls.

In such cases, after a long weekend, I often find myself with well over 9000 photos: some good, some bad, and many of the wall/river/floor in front of me. Seeing as my day job relies on using software to solve everyday problems, I thought that there must exist a method to quickly sort the desirable photographs from the undesirable ones — and there was!

This article looks at how we can use the pre-trained NIMA network to group our photos into varying bins of technical and aesthetic quality.

What is NIMA?

NIMA stands for “NIMA: Neural Image Assessment” and is a convolutional neural network (CNN) whose structure is outlined in the paper (accessible here). It consists of two CNNs, built on the MobileNet architecture, that assess technical and aesthetic quality respectively.

The power in NIMA lies in the use of an Earth Mover’s Loss function. This can be thought of as the amount of “earth” that needs to be moved to make two probability distributions equal. This is useful as it captures the inherent order of the classes: in other words, categories closer together have a higher similarity than those further apart — a property that is not captured by conventional (categorical cross-entropy) loss functions.
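As a sketch (not the repository's actual loss code), the discrete Earth Mover's Distance used in the paper can be written as the r-norm of the difference between the two cumulative distributions, with r = 2:

```python
import numpy as np

def earth_movers_distance(p, q, r=2):
    """Discrete Earth Mover's Distance between two probability
    distributions over ordered classes (r=2, as in the NIMA paper)."""
    cdf_p, cdf_q = np.cumsum(p), np.cumsum(q)
    return np.mean(np.abs(cdf_p - cdf_q) ** r) ** (1.0 / r)

# Moving probability mass to an adjacent score bin costs less than
# moving it far away -- the ordering of the classes is respected.
p    = np.array([0.0, 1.0, 0.0, 0.0, 0.0])
near = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
far  = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
print(earth_movers_distance(p, near) < earth_movers_distance(p, far))  # True
```

A categorical cross-entropy loss would penalise both mispredictions equally, since it only looks at the probability assigned to the correct class.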

The main model is conveniently packaged as a Docker image and can be installed from the GitHub repository.

Before we do this we need to install some prerequisites:

conda install -c conda-forge jq imagemagick

And of course Docker itself.

Once we have done this, we open the command line, navigate to our directory and build the Docker image.

docker build -t nima-cpu . -f Dockerfile.cpu

It is also important to make sure that docker can see the location of your image files for later applications.

To test the repo, we can run the following command from the GitHub directory.

Before running, we need to make sure the Docker daemon is running. This is usually done by opening the Docker Desktop app.

./predict \
--docker-image nima-cpu \
--base-model-name MobileNet \
--weights-file $(pwd)/models/MobileNet/weights_mobilenet_technical_0.11.hdf5 \
--image-source $(pwd)/src/tests/test_images/42039.jpg
The sample image file from the test directory.

If everything is working correctly, we should now see the following output.

"image_id": "42039",
"mean_score_prediction": 4.705008000135422

Batch Processing

Running groups of files at once is called batch-processing. This NIMA model allows us to do so by specifying a directory instead of an individual file:

--image-source $(pwd)/src/tests/test_images/

Choosing our model/weights

In addition to the technical model, we can also look at the aesthetic score by changing the weights file for the model:

--weights-file $pwd/models/MobileNet/weights_mobilenet_aesthetic_0.07.hdf5 

which produces the following result for the same image.

"image_id": "42039",
"mean_score_prediction": 5.154479526741852

We are now able to combine both scores and rank phots in increasing order of Aesthetic and Technical quality.

Ranking examples from the Documentation: (Apache 2.0)

*Converting to the JPG format

Although the model can read other formats, the model automatically appends the .jpg file extension (in lowercase), regardless of the file extension. A workaround seems to be to use ImageMagick to convert files into the correct formats

mogrify -monitor Detail -path "OuputFolderName" -format jpg *.png

In order to speed up the process we use a simple python script to read all our files, and the multiprocessing module to chunk them into 32 groups. On a Macbook Pro with 8 cores, this took 14.11 minutes for 9805 files.

The script in question can be found in the directory below.

Ranking the Photographs

Finally, all that is left is to generate the output. Although it is possible to run this directly, I used python to submit the task.

docker = os.path.abspath('../../image-quality-assessment/')
predict = docker+'/predict'
weights = docker+'/models/MobileNet/weights_mobilenet_aesthetic_0.07.hdf5'
tloc = '/tmp/photodel'
command = '''
--docker-image nima-cpu
--base-model-name MobileNet
--weights-file %s
--image-source %s
command += '> filerank.json'

The end result was that 9805 files took ~8 minutes to rank

Plotting the Files and Sorting the Results

The final step is to have a look at the distribution of our files. Here we start by looking at a Kernel Density Estimator histogram showing the ranking of our collection.

file  = 'filerank.json'
data = 'step'.join(open(file,'r').read().split('step')[1:])
import json
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_json(data)
df = df.set_index('image_id')
ax = df.hist()
Frequency of different Plots

Such a distribution can now be used to determine arbitrary niceness threshold/criteria which we will use to filter our photographs.

In this case, I have determined high-quality photos to be anything ranking past the peak of our histogram. Using this I sort my photos into categories as follows:

def sort(path):   ''' Sort files into folders based on value '''        # get the file name with no extension
name = os.path.splitext(os.path.basename(path))[0]
# extract the value from the dataframe
value = float(df.loc[name])
file = path.split('/')[-1] if value > 5.5:
shutil.copy(path, 'parsed/high/'+file)
elif value > 4.5:
shutil.copy(path, 'parsed/good/'+file)
shutil.copy(path, 'parsed/blurry/'+file)

As with all machine learning algorithms, the results are dependent on the training datasets used and must be taken with a pinch of salt. Therefore I strongly recommend taking a precautionary glance at any categories you plan to discard on the off chance something has been miscategorized.

Further-further Processing

Due to the sheer number of photographs selected, I then wish to reduce the categories further by only selecting those with people in them. This can be done using Facebook’s detectron2 algorithm (as described here). This can also be useful if your aim is to protect clients’ privacy etc.

A note on the efficiency

Once set up, the docker container is pretty performant, processing 10000 images in a matter of minutes on a MacBook pro.

ImageFiles found: 9805100%|███████████████████████████████████████████| 32/32 [01:05<00:00,  2.03s/it]1.088495131333669 minutes for  9805 files in chunks: 32

We are now able to rank photographs using the NIMA CNN Neural Image Assessment packages and group our photos in bins of increasing “quality”. This process allows us to easily sort through a large number if images, saving us the manual burden of going through each one individually.

(If you found this useful, please click the ‘clap’ button. And remember you can click it more than once! Maybe even 50 times?)

Read the full article here

Leave a Reply

Your email address will not be published.

Identifying people in photos using Python and Neural Networks

Identifying people in photos using Python and Neural Networks

Table of Contents Hide InstallationImportsPredictor ConfigurationReading the

Tracking the impact of UX Research: a framework

Tracking the impact of UX Research: a framework

Table of Contents Hide Why we should measure research (even if no one asks us

You May Also Like