
1 Introduction

Photo sharing services such as Flickr and Instagram are continuously evolving, progressively introducing new features for their ever-growing user bases. One of the most popular features is the option to apply photographic filters, which allow users to adjust the mood of their pictures in a completely automatic way. Several preset filters are available, corresponding to various image transformations, mostly related to shifts in the color distribution, variations in brightness and contrast, and the like.

In this work we investigate the problem of automatically detecting the application of photographic filters commonly used in photo sharing services. We show that it is possible to reliably distinguish between original and processed images. Moreover, we show that it is possible to identify, with very high confidence, which filter has been used. The objective of this preliminary work is twofold. On the one hand, it shows that certain kinds of distortions can be reliably identified, paving the way for future investigation of the automatic classification of processed vs. unprocessed images; on the other hand, it makes it possible to take into account the influence of photographic filters in other computer vision tasks. In fact, Chen et al. [9] showed that state-of-the-art image recognition approaches using Convolutional Neural Networks (CNNs) fail to correctly classify social media photos (especially from Instagram), where many pictures have been edited with photographic filters.

The approach we investigate in this paper is based on the use of Convolutional Neural Networks trained on a large dataset of images processed with 22 different photographic filters designed to reproduce those available on Instagram. We experimented with different architectures taken from the image recognition literature and we show how they can be adapted to achieve a very high classification rate.

The paper is organized as follows: Sect. 2 reports all the information about the photographic filters and the data used in the experimentation; Sect. 3 illustrates the classification strategy; Sect. 4 reports the results obtained and discusses their implications; Sect. 5 concludes the paper by summarizing our findings and by suggesting future directions of research.

2 Photographic Filters

In this work we consider the following 22 types of Instagram-like filters (descriptions are taken from the Instagram website):

  1. 1977: the increased exposure with a red tint gives the photograph a rosy, brighter, faded look;
  2. Amaro: adds light to an image, with the focus on the center;
  3. Apollo: lightly bleached, cyan-greenish color, some dusty texture;
  4. Brannan: increases contrast and exposure and adds a metallic tint;
  5. Earlybird: gives photographs an older look with a sepia tint and warm temperature;
  6. Gotham: produces a black and white high contrast image, with bluish undertones;
  7. Hefe: high contrast and saturation, with a similar effect to Lo-Fi but not quite as dramatic;
  8. Hudson: creates an “icy” illusion with heightened shadows, cool tint and dodged center;
  9. Inkwell: direct shift to black and white, with no extra editing;
  10. Kelvin: increases saturation and temperature to give the image a radiant “glow”;
  11. Lo-Fi: enriches color and adds strong shadows through the use of saturation and “warming” the temperature;
  12. Mayfair: applies a warm pink tone, subtle vignetting to brighten the photograph center, and a thin black border;
  13. Nashville: warms the temperature, lowers contrast and increases exposure to give a light “pink” tint, making it feel “nostalgic”;
  14. Poprocket: adds a creamy vintage and retro color effect;
  15. Rise: adds a “glow” to the image, with softer lighting of the subject;
  16. Sierra: gives a faded, softer look;
  17. Sutro: burns photo edges, increases highlights and shadows dramatically with a focus on purple and brown colors;
  18. Toaster: ages the image by “burning” the center and adds a dramatic vignette;
  19. Valencia: fades the image by increasing exposure and warming the colors, to give it an antique feel;
  20. Walden: increases exposure and adds a yellow tint;
  21. Willow: a monochromatic filter with subtle purple tones and a translucent white border;
  22. X-Pro II: increases color vibrance with a golden tint, high contrast and slight vignette added to the edges.

An example of the application of the 22 filters to an input image is reported in Fig. 1. Each filter is implemented as a sequence of basic image processing operations, such as: adjustment of color levels; adjustment of color curves (i.e. nonlinear channel transformations); brightness and contrast adjustment; addition of blur and/or noise; hue, saturation and lightness adjustment; addition of a vignette; use of a color layer (to generate a color cast); use of a gradient layer; conversion to black & white; and addition of flare. A schematic view of which basic operations are used for each filter is reported in Table 1.
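Each filter can thus be seen as a short pipeline of elementary operations. Below is a minimal sketch of such a pipeline using NumPy and Pillow; the recipe (a hypothetical warm, vignetted filter) and all numeric parameters are illustrative assumptions, not the implementation of any actual Instagram preset.

```python
import numpy as np
from PIL import Image, ImageEnhance

def warm_vintage(img: Image.Image) -> Image.Image:
    """Hypothetical filter combining several basic operations from Table 1."""
    # Brightness and contrast adjustment.
    img = ImageEnhance.Brightness(img).enhance(1.10)
    img = ImageEnhance.Contrast(img).enhance(0.90)

    arr = np.asarray(img.convert("RGB")).astype(np.float32) / 255.0

    # Nonlinear per-channel curves (gamma-like): lift red, depress blue.
    arr[..., 0] **= 0.9
    arr[..., 2] **= 1.1

    # Color cast layer: blend with a constant warm color.
    cast = np.array([1.0, 0.9, 0.7], dtype=np.float32)
    arr = 0.9 * arr + 0.1 * cast

    # Vignette: darken pixels with their distance from the image center.
    h, w = arr.shape[:2]
    y, x = np.ogrid[:h, :w]
    d = np.sqrt((x - w / 2) ** 2 + (y - h / 2) ** 2)
    arr *= (1.0 - 0.4 * (d / d.max()) ** 2)[..., None]

    return Image.fromarray((arr.clip(0.0, 1.0) * 255).astype(np.uint8))
```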

Fig. 1. Examples of the application of the 22 Instagram-like filters on one input image.

To generate a large-scale dataset, we randomly sampled 20 000 images from Places-205 [17] and applied the 22 filters to each of them, obtaining a dataset of 0.46 M images in total (original images included). The original images are randomly divided into training, validation, and test sets with ratios of 75%, 5%, and 20%, respectively.
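A minimal sketch of this generation procedure is given below; the directory layout, the helper names, and the assumption that filters is a list of 23 callables (index 0 being the identity, i.e. the unfiltered original) are illustrative, not taken from the original pipeline.

```python
import os
import random
from PIL import Image

SPLITS = {"train": 0.75, "val": 0.05, "test": 0.20}

def build_dataset(image_paths, filters, out_root, seed=0):
    # Split the *original* images first, so that all 23 variants of a
    # given photo end up in the same split and cannot leak across sets.
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_train = int(SPLITS["train"] * len(paths))
    n_val = int(SPLITS["val"] * len(paths))

    for i, path in enumerate(paths):
        split = "train" if i < n_train else "val" if i < n_train + n_val else "test"
        img = Image.open(path).convert("RGB")
        for label, apply_filter in enumerate(filters):
            out_dir = os.path.join(out_root, split, str(label))
            os.makedirs(out_dir, exist_ok=True)
            apply_filter(img).save(os.path.join(out_dir, os.path.basename(path)))
```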

Table 1. Summary of the basic image processing operations used in each of the 22 Instagram-like filters.

3 Investigated Strategy

In recent years, convolutional neural networks (CNNs) have emerged as the de facto standard for image classification. According to the deep learning paradigm, networks are composed of several layers that progressively transform the raw data into high-level information [11]. The input consists of the image pixels, and the image features are learned instead of being explicitly designed. The main drawback of CNNs is that their training requires large amounts of annotated data and computational time.

In most cases, training a network from scratch is not really necessary. In fact, it is possible to reuse a network previously trained on a different task by fine-tuning it with a relatively small amount of data. This strategy works because the features learned by the network tend to be quite general, providing information that can be exploited in various image classification domains (only the last layer needs to be adapted to the actual classification task) [1, 16].

The baseline for image classification is represented by AlexNet [10], a CNN trained on more than one million images distributed for the 2012 edition of the ImageNet Large Scale Visual Recognition Challenge [14]. Several other image classification tasks have been successfully addressed by fine-tuning AlexNet [13]. We argue that, as in other similar computer vision tasks [5, 6], simply fine-tuning a pre-trained network is not a viable solution to the problem of classifying Instagram-like filters. In fact, networks trained for object recognition tend to learn features that detect specific spatial patterns (i.e. those useful to discriminate the salient parts of the objects). For instance, the first convolutional layer usually learns to extract features that resemble Gabor filters. The network learns to be as invariant as possible with respect to variations in color, contrast, etc. In particular, such a network is expected to recognize the same objects in images that have been modified by the application of the Instagram-like filters.

To address the problem of classifying images into 23 categories (22 filters + the original image), we experimented with three different networks derived from the AlexNet, GoogLeNet, and LeNet architectures.

AlexNet is a network designed for the recognition of 1000 image categories. It includes five convolutional layers, followed by three fully-connected layers [10]. The input of the network is the image resampled to \(227\times 227\) pixels. The output of some convolutional layers is further processed by spatial max pooling, and rectified linear activations are applied to the output of both convolutional and fully-connected layers. A final softmax layer maps the activation values to a vector of 1000 probability estimates.

GoogLeNet has a very complex architecture including a large number of different layers, the majority of which perform convolutions, pooling, and rectified linear activations. Groups of convolutions form “Inception modules” that represent complex transformations of the data while requiring a relatively small number of parameters [15]. AlexNet and GoogLeNet were designed for the same classification problem and, as a result, they have the same kind of inputs and outputs (with the minor difference that GoogLeNet accepts \(224 \times 224\) input images).

LeNet was the first CNN proposed for an image classification task [12]. It was designed for the recognition of handwritten digits and includes two convolutional and two fully-connected layers. The network takes as input monochrome \(32 \times 32\) images and produces as output a vector of ten probabilities (one for each of the ten symbols in the Arabic numeral system).
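For reference, a minimal PyTorch sketch of a LeNet-style network matching this description follows; the layer widths are the classic LeNet-5 values and are an assumption here.

```python
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self, in_channels=1, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 6, kernel_size=5),  # 32x32 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                           # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),           # 14x14 -> 10x10
            nn.ReLU(),
            nn.MaxPool2d(2),                           # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

With the \(224 \times 224\) color inputs and 23 classes used in this work, in_channels becomes 3 and the flattened size grows from \(16 \cdot 5 \cdot 5\) to \(16 \cdot 53 \cdot 53\), which is what drives up the parameter count mentioned in the next paragraph.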

We adapted the three networks to our problem by resizing the last layer to 23 output units. In the case of LeNet, we also modified the input to accept \(224 \times 224\) color images (note that this significantly increases the number of parameters). We trained each network with 450 000 iterations of the stochastic gradient descent algorithm, where each iteration processes a mini-batch of 256 images. For AlexNet we also experimented with fine-tuning the standard version trained on the ImageNet data (to do so, we allowed the training procedure to update only the coefficients of the last layer).
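The two regimes, full training from scratch and last-layer fine-tuning, can be sketched as follows in PyTorch; the optimizer hyperparameters are assumptions, as the text only fixes the number of iterations and the mini-batch size.

```python
import torch
import torchvision

NUM_CLASSES = 23  # 22 filters + unfiltered originals

def make_alexnet(fine_tune_last_only: bool) -> torch.nn.Module:
    # Pretrained ImageNet weights only for the fine-tuning experiment;
    # the full-training experiment starts from random weights.
    net = torchvision.models.alexnet(
        weights="IMAGENET1K_V1" if fine_tune_last_only else None)
    # Resize the last fully connected layer to 23 output units.
    net.classifier[6] = torch.nn.Linear(net.classifier[6].in_features, NUM_CLASSES)
    if fine_tune_last_only:
        for name, param in net.named_parameters():
            param.requires_grad = name.startswith("classifier.6")
    return net

net = make_alexnet(fine_tune_last_only=False)
optimizer = torch.optim.SGD(
    (p for p in net.parameters() if p.requires_grad),
    lr=0.01, momentum=0.9, weight_decay=5e-4)  # assumed values

# Training loop skeleton: 450 000 SGD iterations, mini-batches of 256 images.
# for step, (images, labels) in enumerate(loader):  # loader yields batches of 256
#     loss = torch.nn.functional.cross_entropy(net(images), labels)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```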

4 Experimental Results

The results obtained on the test set are shown in Table 2. All three network architectures achieve high classification rates: even the simple LeNet exceeds 94% accuracy. The best performing network is AlexNet, which obtained a 99% classification rate. GoogLeNet obtained slightly worse results (97.6%), but with little more than a tenth of the parameters. As expected, fine-tuning the original AlexNet trained for object recognition leads to quite poor results (a 67.5% classification rate).

Table 2. Summary of the networks evaluated in the experiments. For each network we report the training method (fine-tuning or full training from scratch), the depth (number of learnable layers between input and output), the number of parameters, and the classification rates obtained on the test set (the percentage of cases in which the correct class is the predicted one, and that in which it is among the five classes with the highest prediction scores).

More details can be found in Table 3, which reports the confusion matrix obtained with AlexNet on the test set. The diagonal of the matrix shows that the network was able to detect the 22 filters with very high precision: for all of them, more than 98% of the images are correctly classified. The main difficulty for the network lies in detecting the absence of any filter: only 91.6% of the original images were recognized as such. They are often classified as if the Hefe, Hudson, or Mayfair filters had been applied. These filters do not include any strong variation in the color distribution and, upon human inspection, appear quite natural. Among the filters, the highest level of confusion (about 2%) occurs between the Inkwell and Willow filters, which both produce gray-level images. In all other cases, the off-diagonal entries of the confusion matrix are below 1%.

Table 3. Confusion matrix obtained on the test set by the AlexNet architecture retrained for the filter detection task. Results are reported as percentages.
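The row-normalized percentages reported in Tables 3 and 4 can be computed from the test-set predictions with a few lines of NumPy, as in the following sketch (labels are assumed to be integer class indices in the range 0-22):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=23):
    # m[t, p] counts test images of true class t predicted as class p.
    m = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    # Normalize each row to percentages of the true class, as in Table 3;
    # the diagonal then gives the per-filter recall values quoted above.
    return 100.0 * m / m.sum(axis=1, keepdims=True)
```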

The behavior of GoogLeNet is very similar, as can be seen from the confusion matrix in Table 4. Results are in general slightly worse than those obtained by AlexNet, with the exception that the confusion between the Willow and Inkwell filters rises to 10.7%. For the sake of brevity, we omit the confusion matrices of LeNet and of the fine-tuned AlexNet.

Table 4. Confusion matrix obtained on the test set by the GoogLeNet architecture retrained for the filter detection task. Results are reported as percentages.
Fig. 2. Graphical representation of the coefficients learned for the 96 convolutions in the first level of the AlexNet architecture. (Color figure online)

As we previously argued, the poor performance of the fine-tuned network can be explained by the fact that the original training forced the network to discard information about the color distribution, which can be deceiving for object recognition but is useful for the classification of the filters. Qualitative evidence of this can be obtained by analyzing the coefficients learned by the first convolutional layer, which are shown in Fig. 2. For the standard AlexNet these coefficients form Gabor-like filters able to identify local features such as edges and corners. We obtained, instead, mostly low-pass filters sensitive to specific colors (red, green, blue, purple, yellow, and cyan, among others). A few filters seem able to detect edges at particular orientations, often with opponent colors on the two sides. Only four filters have been learned for the detection of fine details.
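The visualization of Fig. 2 amounts to tiling the first-layer kernels as small RGB images. A possible sketch with PyTorch and matplotlib follows; note that the Caffe-style AlexNet used here has 96 first-layer kernels, while torchvision's variant has 64, so the grid adapts to whatever the given network provides.

```python
import matplotlib.pyplot as plt

def show_first_layer(net, cols=12):
    # First conv layer weights, shape (num_kernels, 3, k, k).
    w = net.features[0].weight.detach().cpu()
    w = (w - w.min()) / (w.max() - w.min())  # rescale to [0, 1] for display
    rows = (len(w) + cols - 1) // cols
    _, axes = plt.subplots(rows, cols, figsize=(cols, rows))
    for ax in axes.flat:
        ax.axis("off")
    for ax, kernel in zip(axes.flat, w):
        ax.imshow(kernel.permute(1, 2, 0).numpy())  # channels-last for imshow
    plt.show()
```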

5 Conclusions

In this paper we have investigated the problem of automatically detecting the application of photographic filters commonly used in photo sharing services. To this end, a total of 22 types of Instagram-like filters were considered to generate a dataset of more than 0.46 M images from the Places-205 dataset. Three different deep Convolutional Neural Networks (CNNs) have been compared: AlexNet, GoogLeNet, and LeNet. Experimental results show that it is possible to determine with high accuracy both whether one of these filters has been applied and, if so, which one. In particular, we showed that a recognition accuracy of about 99.0% can be obtained by training an AlexNet from scratch for this specific problem.

The contribution of this preliminary work is twofold: first, it shows that it is possible to reliably identify certain types of distortions, opening the way for future investigation of the automatic classification of processed vs. unprocessed images; second, it makes it possible to take into account the influence of photographic filters in other computer vision tasks [2,3,4, 7, 8].