1 Introduction

With the rapid development of computer technology and imaging equipment, the amount of image data is growing rapidly. Companies and users typically transmit image data to cloud servers and access it through retrieval. However, as third parties, cloud servers are often at risk. Therefore, enabling the image owner to store and manage images effectively, while ensuring accurate retrieval results for query users and protecting the copyright and privacy of both the image owner and the query user, has become a challenging problem.

In recent years, encrypted image retrieval has developed rapidly. Qin et al. [12] combined an improved Harris algorithm with the SURF feature, used the bag-of-words model to generate image features, and applied locality-sensitive hashing (LSH) to the feature vectors to construct an index for ciphertext image retrieval. Xia et al. [22] used the replacement method to encrypt the image, calculated local histograms from the ciphertext image blocks, clustered the histogram features, took the cluster centers as encrypted visual words, and represented each image as an encrypted feature vector through the bag-of-encrypted-words (BOEW) model. The cloud server calculates the Manhattan distance between feature vectors to measure the similarity between images. Ferreira et al. [3] proposed an encrypted-domain image retrieval scheme that can directly extract discrete wavelet transform (DWT) features from encrypted images. Cheng et al. [1] applied stream cipher encryption and scrambling encryption to the DC and AC coefficients of the JPEG image, respectively, and stored the encrypted image on the server. The server can obtain the AC coefficient histogram of the query image, and image retrieval is realized by calculating the similarity between the histograms of the encrypted query image and the database images. Liang et al. [6] proposed a multi-key encryption method in which the query features of different users are encrypted with a unique key. Using global optimization and a Gaussian distribution to generate multiple keys effectively improves key security and space utilization. Shen et al. [15] used secure multi-party computation for image feature encryption to improve image security. Ma et al. [9] extracted HSV, YUV and semantic features of ciphertext images, and then conducted adaptive multi-feature fusion to improve retrieval performance. Xu et al. [23] proposed a ciphertext image retrieval method based on homomorphic encryption: using the public key, the difference histogram of the ciphertext image features is extracted and indexed, and the retrieved image is then decrypted with the private key. Note that different image retrieval algorithms have their own strengths. Compared with a single global feature, complex local features give higher retrieval accuracy, but their computation is more complicated. Scrambling encryption is computationally simple, but its security and precision are limited. Homomorphic encryption offers high security, but its computation is complicated and its efficiency is low. In practical applications, the method should be chosen according to the specific requirements.

Feature optimization, multi-key encryption and homomorphic encryption methods mostly focus on retrieval accuracy and retrieval security. To improve the security of the entire retrieval system, Xia et al. [21] proposed a cloud-based image retrieval scheme that protects privacy and deters illegal copying of content. It uses four MPEG-7 visual descriptors, encrypts the feature vectors with the secure K-nearest neighbor (KNN) algorithm, combines locality-sensitive hashing to improve retrieval efficiency, and uses Euclidean distance to calculate similarity. In addition, considering that authorized query users may illegally copy and distribute retrieved images to unauthorized persons, this method proposes a watermark-based protocol to deter distribution, adding one additional party, a watermark certification center, to the traditional encrypted-domain image retrieval system model. The watermark certification center embeds user watermarks in the search results to protect user privacy. To a certain extent, the scheme prevents illegal copying and leakage of images.

In recent years, the protection of copyright and privacy of digital media has become a research focus. Information steganography [5, 7, 8, 13], robust watermarking [16] and other technologies [20, 24] are developing rapidly. Among them, reversible data hiding (RDH) allows the receiver not only to completely extract the hidden secret data, but also to restore the original image without any distortion, so it is widely used in different research fields. RDH can be roughly divided into difference expansion (DE), histogram shifting and their variants. Tian [18] proposed the DE method, which calculates the difference between two adjacent pixels in the image and doubles it to obtain a spare least significant bit (LSB), thereby embedding 1 bit of secret information. Based on it, Thodi and Rodriguez [17] added prediction to make better use of image correlation and reduce distortion. Ni et al. [10] proposed histogram shifting. In the histogram, a pair of peak (highest frequency) and zero (zero frequency) bins is selected as side information, and the bins between them are shifted toward the zero bin to free space for data hiding. Coltuc [2] used a semi-closed gradient adjustment predictor (GAP) and a selection technique to improve the use of image correlation. This method creates a sharper histogram and has better distortion performance. Sachnev et al. [14] used diamond prediction and sorting to select the smoothest part of the image for histogram generation, which yields lower distortion. To make better use of the different characteristics of different image regions, Li et al. [4] uniformly segmented the sequence of sorted elements to generate multiple histograms. Under certain empirical assumptions, a brute-force search determines the pair of peak bins for each histogram, which improves performance to a certain extent. 
The structure and related parameters of the above methods are determined by experience rather than by the cover content. Wang et al. [19] proposed a novel framework for multiple histogram-based reversible data hiding (MH RDH), which sorts the prediction errors, constructs multiple histograms, and allocates the embedding rate optimally among the histograms according to the payload. In this method, the problem of rate allocation among different histograms is transformed into a rate-distortion optimization problem, which is solved by an evolutionary algorithm. This method is superior to other advanced single-histogram (SH) RDH and related MH RDH methods in capacity and distortion.

To further promote the development and application of encrypted image retrieval and to protect image copyright and privacy, we propose a privacy-preserving and traitor tracking content-based image retrieval scheme in the cloud computing system. For image retrieval, the data owner encrypts the images with random pixel scrambling and transmits them to the cloud. The cloud server feeds the statistical features of the encrypted images into DenseNet to strengthen them, so that similarity computed on the enhanced features achieves high retrieval accuracy at low overhead. For privacy protection, the data owner exploits the irreversibility of the one-way hash and the properties of the exclusive OR operation to encrypt the information of the data owner and the querying user. When the query user obtains the original image with the key, the system writes the encrypted information into the image through reversible data hiding, which prevents disclosure of the information and malicious slander of the querying user. Experimental results show that our algorithm outperforms similar algorithms.

Contributions: In this article, we propose a privacy-preserving and traitor tracking content-based image retrieval scheme in cloud computing system. The contribution can be summarized as follows.

  1. A privacy-preserving and traitor tracking content-based image retrieval scheme in cloud computing is proposed. In this scheme, effective traitor tracking can be carried out with higher retrieval accuracy and lower overhead.

  2. Encrypted image retrieval: DenseNet is used in our scheme to enhance statistical features, so our retrieval accuracy reaches 1.7 times that of similar algorithms [21], and the retrieval efficiency is also higher than that of Xia et al. [21].

  3. Image copyright, privacy protection and traitor tracking: We perform one-way hash and XOR on user information and copyright information to generate watermark code to protect privacy. The reversible data hiding method is used to embed the watermark code into the image, effectively tracking the traitor. Our privacy protection algorithm only communicates between the data owner and the inquiring user, not through the cloud server, and there is no need to build a fully trusted third-party watermark center. Therefore, our algorithm is more concise and efficient.

The rest of this paper is organized as follows. Section 2 introduces related work, Section 3 describes the proposed scheme, Section 4 analyzes the experimental results, and Section 5 gives the conclusion.

2 Related works

In this section, we will briefly review the algorithms related to our work: SHA256 algorithm and reversible hiding algorithm.

2.1 SHA256 algorithm

Secure hash algorithm (SHA) is a family of cryptographic hash functions issued by the National Institute of Standards and Technology (NIST). SHA256 is one member of this family and is mainly used by government departments and enterprises to process sensitive information. For any input message shorter than \(2^{64}\) bits, SHA256 outputs a 256-bit hash value. The hash value does not reveal any plaintext information, the plaintext cannot feasibly be recovered from the hash value, and different plaintexts yield different hash values with overwhelming probability. SHA256 can be briefly summarized as follows: pad the input plaintext into a fixed format, split it into blocks, and iteratively mix each block with given constants through logical operations. The result of the final iteration is the required hash value.

The security of a hash function is evaluated mainly by its ability to resist strong collisions. Existing hash attacks include birthday attacks, differential attacks and so on. The birthday attack is a brute-force attack that exploits neither the structure of the hash function nor any algebraic weakness. When the hash space is large enough and the hash value is long enough, the function can effectively resist birthday attacks. The differential attack is one of the most effective means of breaking iterative hash functions. It analyzes how differences between plaintext inputs affect the difference of the outputs and exploits any non-uniformity in this relationship.

SHA256 has a 256-bit hash value, which can effectively resist birthday attacks. Existing differential attacks on SHA256 can only obtain partial collisions and cannot obtain a full collision, so SHA256 also effectively resists differential attacks. Therefore, the SHA256 algorithm, which is used to verify message uniqueness and data integrity, resists the existing attacks and offers extremely high security. This is also the key reason why governments and enterprises use SHA256.
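The fixed output length and the sensitivity to small input changes can be observed with a minimal sketch based on Python's standard `hashlib` module. The helper name `sha256_bits` and the sample messages are illustrative assumptions and are not part of the original scheme.

```python
import hashlib


def sha256_bits(message: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as a 256-character bit string."""
    digest = hashlib.sha256(message.encode("utf-8")).digest()
    return "".join(f"{byte:08b}" for byte in digest)


# The digest length does not depend on the input length.
short = sha256_bits("abc")
long_ = sha256_bits("abc" * 100_000)
print(len(short), len(long_))  # 256 256

# Two inputs that differ in one character give digests that differ in many bits.
a = sha256_bits("message A")
b = sha256_bits("message B")
changed = sum(x != y for x, y in zip(a, b))
print(changed)  # number of differing bits out of 256
```

Representing the digest as a bit string is only a convenience here, because the watermark code in Sect. 3 is described bit by bit.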

2.2 Multiple-histogram-based reversible data hiding (MH RDH)

Histogram shifting is a typical and commonly used class of reversible data hiding techniques. Within it, RDH based on a single histogram (SH) with multiple embedding exploits a single local feature in depth, whereas multiple histograms (MH)-based embedding makes integrated use of different features in different regions. Wang et al. [19] proposed a general framework for RDH based on multiple histogram modification, which contains three key steps: construction of multiple histograms optimized over multiple features, optimal payload rate allocation among the histograms, and multiple histogram embedding. The basic principle of the framework is summarized as follows. The first step evaluates multiple candidate features of the prediction errors (PEs) in different regions to obtain a complexity measure (CM). By optimizing the parameters of the multi-feature model, the correlation between the CM and the PEs is maximized, and the PEs are sorted under these parameters and divided into multiple classes in constant proportions. In the second step, the rate allocation problem among the histograms is transformed into a rate-distortion optimization problem. It is solved with the genetic algorithm proposed in [19], which uses specially designed low-, medium- and high-payload genes, so as to distribute a given payload optimally among the histograms and determine the optimal pair of peak and zero bins in each histogram while minimizing the embedding distortion. In the third step, an existing single-histogram multiple embedding method embeds the information into the image according to the payload allocation and the peak and zero bin pair selected for each histogram in the previous step. This scheme is superior to other advanced SH RDH and related MH RDH methods.
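To make the underlying primitive concrete, the following is a minimal sketch of classical single-histogram shifting on a flat list of 8-bit values, in the spirit of Ni et al. [10]. It is not the MH RDH framework of [19]: there is no prediction, no sorting and no rate allocation, and the function names `embed` and `extract` are illustrative assumptions.

```python
from collections import Counter


def embed(pixels, bits):
    """Embed a list of 0/1 bits by shifting the bins between the peak and a zero bin."""
    hist = Counter(pixels)
    peak = max(range(256), key=lambda v: hist[v])
    zero = next((v for v in range(peak + 1, 256) if hist[v] == 0), None)
    if zero is None:
        raise ValueError("no empty bin to the right of the peak")
    if len(bits) > hist[peak]:
        raise ValueError("payload exceeds the capacity of the peak bin")
    payload = iter(bits)
    marked = []
    for v in pixels:
        if v == peak:
            marked.append(v + next(payload, 0))  # peak stays (bit 0) or moves right (bit 1)
        elif peak < v < zero:
            marked.append(v + 1)                 # shift to free the bin next to the peak
        else:
            marked.append(v)
    return marked, peak, zero


def extract(marked, peak, zero, n_bits):
    """Recover the first n_bits payload bits and the original values."""
    bits, restored = [], []
    for v in marked:
        if v == peak:
            bits.append(0)
            restored.append(peak)
        elif v == peak + 1:
            bits.append(1)
            restored.append(peak)
        elif peak + 1 < v <= zero:
            restored.append(v - 1)               # undo the shift
        else:
            restored.append(v)
    return bits[:n_bits], restored


pixels = [3, 3, 3, 3, 4, 4, 5, 7, 2, 3]
secret = [1, 0, 1, 1]
marked, peak, zero = embed(pixels, secret)
recovered, restored = extract(marked, peak, zero, len(secret))
print(recovered == secret, restored == pixels)  # True True
```

Each value changes by at most 1, and the pair (peak, zero) is the side information the receiver needs. The multiple-histogram framework applies the same kind of shift to several prediction-error histograms and chooses a separate bin pair for each.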

3 The proposed scheme

In this section, we first describe the system model, then introduce the function models from data owners, cloud servers, and query users, and finally explain in detail the two main functions of this scheme: ciphertext image retrieval and copyright information and user information protection services.

3.1 System model

Before designing the system, we assume that the data owner is completely trusted and that the cloud server completes its computations correctly but is a risky third party interested in the image content. We also assume that the querying user is semi-trusted and will not collude with the cloud server, but may distribute the image privately to unauthorized users.

Our system has two major functions: ciphertext image retrieval, and copyright and privacy protection with traitor tracking. It involves three parties: the data owner, the cloud server and the query user. The overall system framework is shown in Fig. 1, where u is the user authorization information, U is the hash code that contains the user authorization information, c is the copyright information, C is the hash code that contains the copyright information, W is the watermark code, K is the randomly generated key for image encryption and decryption, and S is the query authorization code.

Fig. 1 Overall framework of the privacy-preserving and traitor tracking content-based image retrieval scheme in cloud computing

As Fig. 1 shows, ciphertext image retrieval can be divided into two steps. Before retrieval, the data owner owns the image data set \(M=\{m_1,m_2,\ldots ,m_n\}\), where n is the number of images in the dataset. To protect the security of the image information, the data owner first encrypts the images with the key K to generate the encrypted images \(E=\{e_1,e_2,\ldots ,e_n\}\), then constructs the feature processor O based on the encrypted images, and finally uploads the encrypted images and the feature processor to the cloud server. The cloud server uses the feature processor O to extract the features \(F_E=\{f_{e_1},f_{e_2},\ldots ,f_{e_n}\}\) of the encrypted images E.

During retrieval, the querying user sends a request containing the user authorization information to the data owner. The data owner first applies a hash algorithm to the image copyright information c and the user authorization information u to obtain the hash codes C and U, and then performs an exclusive OR operation on U and C to obtain the watermark code W. Finally, the watermark code W and the key K are mixed to generate the authorization code S, which is sent to the querying user, and the image copyright information, user authorization information, watermark code W and key K are associated with one another and stored in the data storage D. After obtaining the authorization code S, the query user separates it into the watermark code W and the key K, and uploads the query image Q to the cloud server. The cloud server extracts the feature \(f_q\) of the query image Q, calculates the similarity between \(f_q\) and the features \(F_E\), and returns the top k results of the similarity ranking to the query user. The query user decrypts the retrieved ciphertext images with the key K and, at the same time, the watermark code W is embedded into the images using the RDH algorithm.

For copyright and privacy protection and traitor tracking, the data owner collects evidence as follows: the MH RDH algorithm is used to extract the watermark code W from the leaked image, W is looked up in the data storage D, and the image copyright information, user authorization information, watermark code W and key K associated with the image are obtained. In this way, the traitor can be effectively tracked and evidence obtained.

3.2 Overview of function modules

The system functional modules are divided among the data owner, the cloud server and the query user as follows.

The following functional modules are executed by the data owner:

  1. Key generation: \(KeyGen(1^{k})\rightarrow K\). The module inputs the security parameter k and generates a random key K, which is used for encryption and decryption of the ciphertext image.

  2. Image encryption: \(ImgEnc(K,M)\rightarrow E\). The module inputs the key K and the original image M to get the encrypted image E.

  3. Feature processor construction: \(OGen(E)\rightarrow O\). The module inputs the ciphertext image E, extracts the statistical features of the ciphertext image, performs feature enhancement through the convolutional neural network, saves the model and builds a feature processor O.

  4. Encryption of copyright and user information: \(InforEmc(c,u)\rightarrow (C,U)\). The module inputs the image copyright information c and the user authorization information u and, through SHA-256 hashing, obtains the copyright information hash code C and the user information hash code U.

  5. Watermark code generation: \(WatermarkGen(C,U) \rightarrow W\). The module inputs the copyright information hash code C and the user information hash code U, performs an exclusive OR operation on U and C, and obtains a unique watermark code W containing copyright information and user information.

  6. Authorization code generation: \(AuthGen(W,K)\rightarrow S\). The module inputs the key K and the watermark code W, and mixes K and W to obtain the security authorization code S.

The following functional modules are executed by the cloud server:

  1. Feature extraction: \(FGen(E,Q,O) \rightarrow (F_E,f_q)\). The module inputs the encrypted image E and the query image Q to the feature processor O, and generates the ciphertext image feature \(F_E\) and the query image feature \(f_q\) after processing by the feature processor.

  2. Search: \(Search(F_E,f_q)\rightarrow R\). This module matches the input query image feature \(f_q\) with the ciphertext image feature library \(F_E\), and returns a similar ciphertext image set R to the querying user through similarity calculation.

The following functional modules are executed by the query user:

  1. Authorization code separation: \(AuthSep(S)\rightarrow (W,K)\). The module inputs the security authorization code S and separates S to obtain the key K and the watermark code W.

  2. Image decryption: \(Dec(K,R)\rightarrow (Img)_R\). The module inputs the key K and the similar ciphertext image R returned by the cloud server, and decrypts the similar ciphertext image to obtain a similar image \((Img)_R\).

  3. Watermark embedding: \(WatermarkEmb((Img)_R,W)\rightarrow (WatImg)_R\). This module inputs the similar image \((Img)_R\) and the watermark code W, and uses the MH RDH algorithm to embed W into \((Img)_R\) to obtain the similar image \((WatImg)_R\) containing the watermark.

These modules serve two main functions: ciphertext image retrieval, and copyright information and user information protection services. Next, we introduce these two aspects in detail.

3.3 Ciphertext image retrieval

Ciphertext image retrieval involves three parties: the cloud server, the image owner and the query user. The cloud server side includes two functional modules: feature extraction and search. The image owner side includes three functional modules: image encryption, key generation and feature processor construction. The query user only sends a query request.

3.3.1 Image encryption

In the image encryption module, this paper uses an image pixel scrambling encryption method. To protect the security of image information, the encryption process needs three different keys, which can be expressed as \(K=\{key_R,key_G,key_B\}\). For an image m of size \(l\times r\times 3\) in the original dataset M, the keys are used to generate the pseudo-random sequences \(p_R\), \(p_G\) and \(p_B\), which permute the pixels of the R, G and B channels of the image, respectively. The values of the pseudo-random sequences \(p_R\), \(p_G\) and \(p_B\) range over \([1,\ldots ,l\times r]\).

In the image encryption process, the pseudo-random sequences are used to scramble the pixels of the R, G and B channels to protect the texture information of the image, and then the R, G and B channels of the image are changed to the G, B and R channels to protect the color information of the image. Such an encryption method ensures that the statistical characteristics of the three channels remain unchanged before and after encryption. Algorithm 1 describes the image encryption algorithm.

figure a
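The encryption steps described above can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of Algorithm 1: each channel is a flat Python list of length \(l\times r\), the key-seeded `random.Random` generator stands in for the pseudo-random sequences \(p_R\), \(p_G\) and \(p_B\) (it is not cryptographically secure), indices are 0-based, and the channel change is read as storing the scrambled channels in the order (G, B, R).

```python
import random


def permutation(key, n):
    """Key-seeded pseudo-random permutation of 0..n-1 (stand-in for p_R, p_G, p_B)."""
    order = list(range(n))
    random.Random(key).shuffle(order)
    return order


def scramble(channel, key):
    """Move the pixel at position perm[i] to position i."""
    perm = permutation(key, len(channel))
    return [channel[j] for j in perm]


def unscramble(channel, key):
    """Invert scramble() by regenerating the same permutation from the key."""
    perm = permutation(key, len(channel))
    original = [0] * len(channel)
    for i, j in enumerate(perm):
        original[j] = channel[i]
    return original


def encrypt(image, keys):
    """image and keys are (R, G, B) triples; the output channel order is (G, B, R)."""
    r, g, b = image
    key_r, key_g, key_b = keys
    return scramble(g, key_g), scramble(b, key_b), scramble(r, key_r)


def decrypt(cipher, keys):
    g_s, b_s, r_s = cipher
    key_r, key_g, key_b = keys
    return unscramble(r_s, key_r), unscramble(g_s, key_g), unscramble(b_s, key_b)


# A 2 x 2 image, flattened per channel.
image = ([10, 20, 30, 40], [50, 60, 70, 80], [90, 100, 110, 120])
keys = (11, 22, 33)
cipher = encrypt(image, keys)
print(decrypt(cipher, keys) == image)         # True
print(sorted(cipher[0]) == sorted(image[1]))  # True: the G histogram is preserved
```

Because scrambling only moves pixels within a channel, each channel histogram is unchanged, which is the property the feature extraction in Sect. 3.3.2 relies on.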

3.3.2 Feature extraction and processing

In the feature extraction part, to further explore the relationship between the statistical features of similar images and improve the retrieval accuracy, we transform the statistical features and send them to the convolutional neural network for training, thereby constructing a feature processor. The model training process is shown in Fig. 2.

Fig. 2 Flowchart of feature extraction and processing

Model training consists of two steps: statistical feature extraction and processing and network model fine-tuning. The detailed process is described below.

Step 1: Read the ciphertext image E and calculate the gray-level histograms of the R, G and B channels to obtain the statistical features \(H_R\), \(H_G\) and \(H_B\) of the three channels of the ciphertext image. Each feature has a dimension of \(256\times 1\).

Step 2: Transform the features \(H_R\), \(H_G\) and \(H_B\) with dimensions of \(256\times 1\) into features \(H_{R}^{\prime }\), \(H_{G}^{\prime }\) and \(H_{B}^{\prime }\) with dimensions of \(16\times 16\).

Step 3: \(H_{R}^{\prime }\), \(H_{G}^{\prime }\) and \(H_{B}^{\prime }\) are spliced together to generate a feature \(H^{\prime }\) with a dimension of \(16\times 16\times 3\).

Step 4: To better fit the training of the convolutional neural network, zeros are added around the edges of the feature \(H^{\prime }\) with a dimension of \(16\times 16\times 3\) to obtain a feature G with a dimension of \(48\times 48\times 3\). This completes the extraction and processing of statistical features.

Step 5: Follow Step 1 to Step 4 to extract the feature set \(G_{train}\) of the training set and the feature set \(G_{test}\) of the test set.

Step 6: Use the features \(G_{train}\) and \(G_{test}\) of the training set and test set to fine-tune the DenseNet network and save the optimal DenseNet model.
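Steps 1 to 4 can be sketched in plain Python as follows. The row-major reshaping of the histogram, the centered placement of the \(16\times 16\) block inside the \(48\times 48\) plane, and the channel-first output layout (three planes rather than \(48\times 48\times 3\)) are assumptions made for illustration, and the function names are hypothetical.

```python
def channel_histogram(channel):
    """Step 1: 256-bin histogram of one channel given as a flat list of 0..255 values."""
    hist = [0] * 256
    for value in channel:
        hist[value] += 1
    return hist


def to_grid(hist, side=16):
    """Step 2: reshape the 256 x 1 histogram into a side x side grid (row-major)."""
    return [hist[i * side:(i + 1) * side] for i in range(side)]


def zero_pad(grid, target=48):
    """Step 4: surround the grid with zeros until it is target x target."""
    side = len(grid)
    before = (target - side) // 2
    after = target - side - before
    rows = [[0] * before + row + [0] * after for row in grid]
    top = [[0] * target for _ in range(before)]
    bottom = [[0] * target for _ in range(after)]
    return top + rows + bottom


def build_feature(image):
    """Steps 1-4 for an (R, G, B) triple; returns three 48 x 48 planes (Step 3 stacking)."""
    return [zero_pad(to_grid(channel_histogram(channel))) for channel in image]


image = ([10, 20, 30, 40, 10, 10], [0] * 6, [255] * 6)
feature = build_feature(image)
print(len(feature), len(feature[0]), len(feature[0][0]))  # 3 48 48
print(feature[0][16][26])  # 3: the three pixels with value 10 in the R channel
```

With centered padding, histogram bin b of a channel lands at row \(16+\lfloor b/16\rfloor \) and column \(16+(b \bmod 16)\) of its plane.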

The trained feature processor is uploaded to the cloud server, and the ciphertext image set is fed to the feature processor to get the features \(F_E\). When searching, the cloud server first changes the R, G and B channels of the query image into the G, B and R channels, and then uses the feature processor to extract the feature \(f_q\) of the query image. The Euclidean distance between each feature in \(F_E\) and the feature \(f_q\) is calculated to measure similarity. The top k results of the similarity ranking are then retrieved and returned to the query user.
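The ranking step can be sketched as a brute-force search over the feature library. The dictionary layout, the image identifiers and the two-dimensional toy features are assumptions for illustration; in the scheme itself the features come from the DenseNet-based feature processor.

```python
import math


def top_k(query_feature, database, k):
    """Return the ids of the k database features closest to the query (Euclidean distance)."""
    ranked = sorted(
        database,
        key=lambda image_id: math.dist(query_feature, database[image_id]),
    )
    return ranked[:k]


# Toy feature library F_E keyed by ciphertext image id.
database = {"e1": [0.0, 0.0], "e2": [1.0, 1.0], "e3": [5.0, 5.0]}
query = [0.9, 1.2]
result = top_k(query, database, 2)
print(result)  # ['e2', 'e1']
```

A smaller distance means a higher similarity, so the list is sorted in ascending order of distance before it is truncated to k entries.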

3.4 Copyright and privacy protection along with traitor tracking

To better protect the copyright information of the image and query user information, we designed a copyright information and user information protection algorithm using the characteristics of the one-way hash algorithm and the exclusive OR operation. It can effectively prevent image leakers from modifying the watermark information in the image to slander other users. The copyright information and user information protection algorithm flow is shown in Fig. 3.

Fig. 3 Flowchart of copyright information and user information protection algorithm

The algorithm involves two parties: the image owner and the query user. The image owner side includes three functional modules: copyright and user information encryption, watermark code generation, and authorization code generation. The query user side includes two functional modules: authorization code separation and watermark embedding. Algorithm 2 describes the protection of copyright information and user information.

figure b

For example, take Alice as the query user and Bob as the image owner. Alice sends a query request to Bob. Bob first applies the SHA-256 algorithm to Alice’s information u (Alice 12345) and his own copyright information c (Bob22222) to obtain the 256-bit user authorization information hash code U (10011111...) and image copyright information hash code C (11100101...). U and C are XORed to get the watermark code W (01111010...), and then W and the image encryption key K (01011010...) are mixed to generate the query authorization code S (0011011111001100...). The generation of the authorization code is shown in Fig. 4.

Fig. 4 Example diagram of watermark code and authorization code generation

Finally, S is sent to the query user, and at the same time the user authorization information, image copyright information and watermark code W are recorded in a codebook. After receiving S, the query user separates it to obtain W (01111010...) and K (01011010...). K is used to decrypt the ciphertext image, and the MH RDH algorithm is used to embed W (01111010...) into the digital sequence of the image to obtain the watermarked image.
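The example can be traced in code. The text does not define how W and K are "mixed"; the bit strings printed in the example are consistent with bitwise interleaving in the order K, W, K, W, ..., so the sketch below assumes that rule. The function names mirror the module names of Sect. 3.2, while the codebook layout and the use of `secrets` for key generation are illustrative assumptions.

```python
import hashlib
import secrets


def sha256_bits(text):
    """SHA-256 digest of a string as a 256-character bit string."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return "".join(f"{byte:08b}" for byte in digest)


def watermark_gen(c_hash, u_hash):
    """W = U XOR C, computed bit by bit on equal-length bit strings."""
    return "".join("1" if x != y else "0" for x, y in zip(u_hash, c_hash))


def auth_gen(w, k):
    """Assumed mixing rule: interleave the bits as K0 W0 K1 W1 ..."""
    if len(w) != len(k):
        raise ValueError("W and K must have the same length")
    return "".join(kb + wb for kb, wb in zip(k, w))


def auth_sep(s):
    """Undo the interleaving and return (W, K)."""
    return s[1::2], s[0::2]


# The 8-bit prefixes printed in the example above.
U8, C8, K8 = "10011111", "11100101", "01011010"
W8 = watermark_gen(C8, U8)
S8 = auth_gen(W8, K8)
print(W8, S8)  # 01111010 0011011111001100

# Full-length flow with a freshly generated 256-bit key.
u, c = "Alice 12345", "Bob22222"
U, C = sha256_bits(u), sha256_bits(c)
K = format(secrets.randbits(256), "0256b")
W = watermark_gen(C, U)
S = auth_gen(W, K)
codebook = {W: {"copyright": c, "user": u, "key": K}}

# Query user side: separate S. Owner side: trace an extracted watermark code.
W_user, K_user = auth_sep(S)
traced = codebook[W_user]
print(traced["user"])  # Alice 12345
```

Since W is the XOR of two hash codes, XORing W with C returns U, so the owner can check a recovered watermark code against both the copyright record and the user record in the codebook.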

The embedding process of the watermark code in the MH RDH algorithm is described as follows:

  1. First, the image is divided into a cross set and a circle set. The watermark code is expanded to the set number of payload bits and then divided into two parts, one to be embedded in the cross set and the other in the circle set.

  2. Reserve a specific area in the image for embedding the auxiliary information of the circle set.

  3. Perform prediction error calculation, multi-histogram generation, rate allocation and multiple embedding on the cross set, embedding one part of the watermark code in the cross set and obtaining the auxiliary information of the cross set.

  4. The auxiliary information of the cross set and the other part of the watermark code are combined and embedded in the circle set to obtain the auxiliary information of the circle set.

  5. Use LSB replacement to embed the auxiliary information of the circle set into the reserved area to complete the embedding.

The extraction of the watermark code and the restoration of the original image are carried out in the reverse order of embedding: first restore the marked circle set, then use the restored circle set to restore the cross set, and finally extract the watermark code. Because different information hiding, watermarking and reversible data hiding algorithms have their own advantages, we did not deeply integrate the MH RDH of [19] into the proposed scheme, so it can easily be replaced according to the actual situation.

4 Experimental results and analysis

4.1 Datasets and experimental settings

All experiments are completed in Python and Matlab R2016a under Windows 10 with an Intel(R) Core(TM) i7-9700KF CPU @ 3.60 GHz, 16.00 GB RAM and an Nvidia GeForce RTX 2080 Ti GPU.

Data sets: Experiments are performed on Corel10K (Norouzi et al. [11]), a benchmark data set for image retrieval performance testing that includes 100 categories, each containing 100 similar images. From each of the 100 categories of the Corel10K dataset, 80 images are randomly selected, giving a total of 8000 images as the training set, and the remaining 2000 images are used as the test set.

Network fine-tuning: DenseNet121 is selected as the reference network, Stochastic Gradient Descent (SGD) is used as the network optimizer, the learning rate is 0.01, the momentum is 0.9, the batch size is 64, and the number of training iterations is 200.

4.2 Evaluation criteria

Commonly used evaluation criteria for image retrieval systems include retrieval precision (Precision) and retrieval efficiency.

(1) Retrieval accuracy

Retrieval accuracy is the key to measuring the performance of the entire encrypted retrieval scheme, and its formula is shown in Eq. 1.

$$\text{Precision} = \frac{k^{\prime }}{k}, \tag{1}$$

where \(k^{\prime }\) is the number of real similar images in the search result image, and k is the number of search result images.
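Eq. 1 can be evaluated with a few lines of code. In this sketch, a retrieved image is counted as a real similar image when its category label equals that of the query, which matches the category structure of Corel10K but is an assumption about how \(k^{\prime }\) is counted. The function name and the labels are illustrative.

```python
def precision_at_k(retrieved_labels, query_label, k):
    """Precision = k' / k, where k' counts the top-k results in the query's category."""
    top = retrieved_labels[:k]
    relevant = sum(1 for label in top if label == query_label)
    return relevant / k


# Category labels of the ranked results returned for a query from the "bus" category.
labels = ["bus", "bus", "horse", "bus", "beach"]
print(precision_at_k(labels, "bus", 5))  # 0.6
print(precision_at_k(labels, "bus", 2))  # 1.0
```

Averaging this value over all query images gives the overall retrieval accuracy for a given k.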

(2) Retrieval efficiency

Retrieval efficiency mainly refers to the time consumption in the retrieval process, mainly including: index generation time, trapdoor construction time and retrieval time. In this paper, we use them to evaluate the quality of the proposed method.

4.3 Retrieval accuracy

In this experiment, we use retrieval accuracy (precision) to evaluate our ciphertext image retrieval scheme. Feature selection is a key issue for image retrieval. To mine better features from the statistical features, we process them and feed them to a convolutional neural network for feature enhancement. To better fit the input format of the convolutional neural network, we pad or enlarge the statistical features to different sizes with different methods before feeding them to the network. Because the statistical features are relatively small, only \(16\times 16\times 3\), we choose DenseNet201, which reuses deep features, for these experiments. The training set contains 8000 images and the test set contains 2000 images. The reported retrieval accuracy is the overall average. The retrieval accuracy of different scaling methods and input sizes under DenseNet201 is shown in Table 1.

Table 1 Retrieval accuracy of different scaling methods and input sizes under DenseNet 201

From Table 1, we conclude that the method of using surrounding 0 padding to enlarge the statistical features to \(48\times 48\) size has the highest retrieval accuracy. Combined with retrieval efficiency, we chose this solution.

To further optimize the feature processing scheme, we use the method of surrounding zero padding to enlarge the statistical features to \(48\times 48\) size and input them into different neural networks for retrieval. The experimental results of different networks are shown in Table 2.

Table 2 Retrieval accuracy of different networks

From Table 2, we find that DenseNet-based methods achieve higher retrieval accuracy than ResNet-based methods, and the accuracy difference between different DenseNet-based methods is small. Therefore, we chose DenseNet121 as the feature enhancement network to ensure retrieval accuracy.

For the comparison experiments, we choose the statistical feature algorithm and the similar privacy-preserving ciphertext image retrieval algorithms CSD, SCD, CLD and EHD of Xia et al. [21]. All algorithms use the same dataset and evaluation method. The retrieval accuracy of the different algorithms is shown in Fig. 5.

Fig. 5 Comparison of retrieval accuracy of different algorithms

It can be seen from Fig. 5 that the retrieval accuracy decreases as the number of returned images k increases. Compared with the pure statistical feature algorithm, the algorithm that introduces the feature processing network for feature enhancement greatly improves the retrieval accuracy. Our algorithm also achieves higher retrieval accuracy than existing privacy-preserving ciphertext image retrieval algorithms.
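As an illustration of how accuracy depends on k, the following sketch computes precision over the top-k results. It assumes the usual definition (the fraction of the k returned images that belong to the same category as the query); the function name and the category labels are hypothetical.

```python
def precision_at_k(query_label, ranked_labels, k):
    """Fraction of the top-k returned images that share the query's category."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_labels[:k]
    relevant = sum(1 for label in top_k if label == query_label)
    return relevant / k


# Category labels of the returned images, ordered from most to least similar.
ranked = ["horse", "horse", "bus", "horse", "beach"]
print(precision_at_k("horse", ranked, 2))  # 1.0
print(precision_at_k("horse", ranked, 5))  # 0.6
```

Averaging this value over all query images gives an overall precision for each k. In the toy example the value falls as k grows, because less similar images enter the result list.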

4.4 Retrieval efficiency

Retrieval efficiency is also an important indicator for evaluating a ciphertext retrieval system. As before, we first optimize the system by comparing the retrieval efficiency of different scaling methods and different network input sizes under DenseNet. We evaluate the system from two aspects: feature processing time and query time. We use 2000 query images and measure the total feature processing time and query time. The experimental results are shown in Table 3.

Table 3 Comparison of retrieval efficiency under different scaling methods and input sizes

Table 3 shows that the smaller the network input size, the shorter the feature extraction time. For a given input size, the differences in feature extraction time and retrieval efficiency among the three filling methods are relatively small. When the network input size is \(48\times 48\times 3\), the total query time is shorter.

Combined with the retrieval accuracy, we use the method of surrounding zero padding to enlarge the statistical features to \(48\times 48\times 3\) size, and then we compare the feature extraction time, query time and model training time under different networks. We use 8000 images as the training set and 2000 images as the query set. The retrieval time overhead of different networks is shown in Table 4.

Table 4 Search time consumption of different networks

From Table 4, we find that the query time and model training time of DenseNet121 are shorter than those of the other models, and its feature extraction time is shorter than that of DenseNet169 and DenseNet201. Considering both retrieval accuracy and retrieval efficiency, we choose DenseNet121 as the feature enhancement network.

For the comparison experiments, we reproduced the experiment of Xia et al. [21]. For retrieval, our algorithm uses the Euclidean distance to compute feature similarity in the cloud and returns the k most similar images. In this comparison experiment, k is 100, the cloud library contains 8000 images, 20 query images are used, and the retrieval time is averaged over multiple experiments. The retrieval time of the different algorithms is shown in Fig. 6.

Fig. 6 Comparison of retrieval time of different algorithms

It can be seen that our algorithm has a slightly higher retrieval time than Xia’s algorithm. We use the output of the global average pooling layer of DenseNet121 as the enhanced feature, which improves accuracy; however, because the feature dimension is higher, the similarity computation takes slightly longer.
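The cloud-side similarity search can be sketched as a linear scan that ranks the stored features by Euclidean distance to the query feature. This is a minimal illustration and not the authors' implementation; the function name, image identifiers and two-dimensional toy features are hypothetical.

```python
import heapq
import math


def top_k_similar(query_feature, database, k):
    """Return the ids of the k stored features closest to the query,
    ordered from most to least similar (Euclidean distance)."""
    scored = (
        (math.dist(query_feature, feature), image_id)
        for image_id, feature in database.items()
    )
    return [image_id for _, image_id in heapq.nsmallest(k, scored)]


# Toy 2-D features; the enhanced features discussed above are far longer.
database = {
    "img_a": [0.0, 0.0],
    "img_b": [1.0, 1.0],
    "img_c": [3.0, 4.0],
}
print(top_k_similar([0.9, 1.2], database, 2))  # ['img_b', 'img_a']
```

Each distance costs time proportional to the feature dimension, so a scan over N stored images costs on the order of N times the dimension. This is consistent with the observation that higher-dimensional features make the similarity computation slightly slower.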

Our algorithm directly processes the features extracted from the encrypted image, without trapdoor generation or index construction. Comparisons with the trapdoor generation time and index construction time of Xia’s algorithm are shown in Figs. 7 and 8, respectively.

Fig. 7 Comparison of trapdoor generation time consumption

Fig. 8 Comparison of index construction time consumption

Although our algorithm has slightly higher retrieval time than Xia’s algorithm, we do not need trapdoor generation and index construction, so the overall query time of our algorithm is shorter. Combined with retrieval accuracy, our retrieval efficiency is higher.

4.5 Security analysis

In terms of system security, we analyze the copyright and privacy protection of ciphertext image retrieval.

Encrypted image retrieval part: Similar to Xia et al. [21], we assume that the cloud server performs operations correctly but is curious about the image content, so the image content needs to be encrypted. The image before and after encryption is shown in Fig. 9.

Fig. 9 Comparison of the image before and after encryption: a is the original image, b is the encrypted image

We encrypt the image by pixel scrambling, using a key to scramble the pixels. The key is generated by a pseudo-random generator (PRG). Because an unpredictable PRG is secure, the content information of the image is effectively protected. As for the problem that the cloud server may infer from the query results that different images belong to the same category or are similar, this is an inherent drawback of ciphertext image retrieval algorithms that return multiple query results: efficiency and security cannot both be achieved. Therefore, this paper does not consider this problem.
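The idea of key-driven pixel scrambling can be sketched as follows. This is only an illustration under stated assumptions: the image is treated as a flat list of pixel values, the permutation is derived from the key with Python's `random.Random`, which is not a cryptographically secure generator and stands in for the PRG mentioned above, and all function names are hypothetical.

```python
import random


def make_permutation(key, n):
    """Derive a permutation of range(n) from the key.
    random.Random is NOT cryptographically secure; it is a stand-in for a secure PRG."""
    rng = random.Random(key)
    perm = list(range(n))
    rng.shuffle(perm)
    return perm


def scramble(pixels, key):
    """Encrypt by moving pixels to key-dependent positions."""
    perm = make_permutation(key, len(pixels))
    return [pixels[src] for src in perm]


def unscramble(scrambled, key):
    """Invert the scrambling using the same key."""
    perm = make_permutation(key, len(scrambled))
    original = [None] * len(scrambled)
    for dst, src in enumerate(perm):
        original[src] = scrambled[dst]
    return original


pixels = list(range(64))          # a flattened 8 x 8 toy image
encrypted = scramble(pixels, key=1234)
assert unscramble(encrypted, key=1234) == pixels
assert sorted(encrypted) == pixels  # same pixel values, different positions
```

A pure position permutation leaves the set of pixel values unchanged, which is why value statistics survive this kind of encryption while the spatial content does not.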

Copyright and privacy protection part: We use reversible data hiding technology for tracking. We first use the SHA-256 algorithm to compute the hash codes of the copyright information and the user information separately. SHA-256 is a one-way function. However, simply extracting the hash codes leaves the copyright information and the user information separate. For this reason, we XOR the hash code of the copyright information with the hash code of the user information to obtain the watermark code W. Finally, the watermark code W and a random key K are mixed to generate the authorization code S. As a result, even over multiple queries, the authorization code for each query does not expose any image information, copyright information or user information. As for attacks on the image that aim to destroy the watermark information, the watermark code embedding algorithm in our system can easily be replaced. Such algorithms are developing rapidly and come in many types, each with its own advantages, so the embedding algorithm can be chosen according to the actual situation. A sample image with the watermark code embedded using MH RDH is shown in Fig. 10.

Fig. 10 Comparison of the image before and after watermark code embedding: a is the original image, b is the watermarked image
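The construction of the watermark code W and the authorization code S described in this section can be sketched with the Python standard library. The hashing and XOR steps follow the text. How W and K are "mixed" is not specified here, so the sketch assumes a plain XOR as a stand-in; the function names and the example strings are likewise hypothetical.

```python
import hashlib
import secrets


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def watermark_code(copyright_info, user_info):
    """W = SHA-256(copyright information) XOR SHA-256(user information)."""
    h_copyright = hashlib.sha256(copyright_info.encode("utf-8")).digest()
    h_user = hashlib.sha256(user_info.encode("utf-8")).digest()
    return xor_bytes(h_copyright, h_user)


def authorization_code(w, key):
    """S from W and K. ASSUMPTION: the unspecified 'mixing' is modelled as XOR."""
    return xor_bytes(w, key)


w = watermark_code("owner: example studio", "user: example client")
k = secrets.token_bytes(32)  # fresh random key K for this query
s = authorization_code(w, k)

# XOR is its own inverse, so the key holder can recover W from S.
assert xor_bytes(s, k) == w
```

Because a fresh K is drawn for every query, two authorization codes for the same owner and user differ, while anyone holding K can still recover W and, given one of the two hash codes, obtain the other.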

4.6 Experimental details

In the copyright and privacy protection part of our system, the data owner needs to process the copyright information and user information to obtain an authorization code. In the experiment, generating the authorization code takes 48.9398 ms. Before retrieval, the cloud server needs to fine-tune the feature network. Fine-tuning takes 1207.3 s, and the model can be used for a long time once trained. On the query client side, we use the MH RDH algorithm of Wang et al. [19] to embed the watermark code in the image after recycling, where the auxiliary value is 250, the bpp is 0.005, the side value is 2, and the PSNR value is 69.2477. The image size is almost unchanged before and after watermark embedding, and the watermark code is successfully extracted in 100% of cases, which effectively protects copyright information and private information and supports traitor tracking. The watermark embedding algorithm in this paper can easily be replaced as technology develops and practical applications require, so image attacks are not considered.
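For readers unfamiliar with the PSNR figure reported above, the following sketch shows the standard definition for 8-bit images, \(10\log_{10}(255^2/\mathrm{MSE})\). It is not the evaluation code used in the experiment; the function name and the toy pixel data are hypothetical.

```python
import math


def psnr(original, marked, max_value=255):
    """Peak signal-to-noise ratio between two equal-length pixel sequences."""
    if len(original) != len(marked) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, marked)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# Toy example: 100 pixels, one of which is changed by a single grey level.
original = [100] * 100
marked = [101] + [100] * 99
print(round(psnr(original, marked), 2))
```

A higher PSNR means the marked image is closer to the original, so a low embedding rate that alters few pixels leads to a high value.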

5 Conclusions

In this paper, we propose a privacy-preserving and traitor-tracking content-based image retrieval scheme for cloud computing systems. In this scheme, random scrambling is used to encrypt the image, the DenseNet121 network is used to enhance the statistical features, and the irreversibility of the one-way hash and the reflexivity of XOR are used to design the copyright information and user information protection module, which effectively protects both. Compared with similar privacy protection algorithms [21], our algorithm not only improves retrieval accuracy by 70% but also requires neither an index nor a third-party watermark management center.