1 Introduction

Quantum-dot (QD) nanostructures grown on GaAs substrates have been used intensively because of their prominent electrical and optical properties, such as lower threshold currents, higher-temperature operation, reduced sensitivity to crystalline defects, improved stability against optical feedback, and ultrafast gain dynamics applicable to semiconductor optical amplifiers (SOAs). Each of these advantages arises from the discrete density of states and the inhomogeneously broadened gain spectra unique to the three-dimensional carrier confinement of QDs (Norman 2019).

Four-wave mixing (FWM) is a nonlinear process that results from the beating of two signals (pump and probe) with different frequencies and intensities in an optical medium. FWM is one of the most significant effects used for wavelength conversion in high-speed optical networks (Nosratpour 2019). SOAs are widely used as wavelength converters because of their high conversion efficiency. Quantum-dot semiconductor optical amplifiers (QD-SOAs) based on InAs/InGaAs are a promising nonlinear medium for efficient FWM wavelength conversion around the telecommunication wavelength of 1.3 μm (Failed 2017).

To model the QD-SOA, two sets of coupled equations must be solved: the rate equations and the wave propagation equations (Hakimian et al. 2020). Although many numerical models of FWM have been published, few of them address the FWM efficiency of QD-SOAs (Zajnulina 2017; Qasaimeh 2004; Flayyih 2017), and the models in these papers are complex. As a result, implementing them is troublesome, run times are long, and convergence is difficult to attain (Zajnulina 2017; Qasaimeh 2004; Flayyih 2017; Izadyar, et al. 2018; Hakimian 2019). The rate equations are a set of first-order ordinary differential equations (ODEs) that are solved numerically by the Runge-Kutta method (Hakimian et al. 2020). Solving the wave propagation equation at the same time, however, is complicated. For this purpose, the wave propagation equation is handled with a transfer function based on the pump/probe measurement method.

In this paper, the coherent equations for numerical modeling and FWM efficiency calculation of the QD-SOA are presented first. By applying the SLICE technique in this model, the amplifier length is divided into equal sections, and the computation then proceeds in the spatial and temporal dimensions (Hakimian 2019). Because numerical modeling of the QD-SOA suffers from complicated equations, long run times, and difficult convergence, the QD-SOA is then modeled using an artificial neural network (ANN). This choice is motivated by the high modeling capability of ANNs, their generalizability, and their remarkably shorter computation time (Hakimian et al. 2020). In the ANN model of the QD-SOA developed by Ababneh et al., the network inputs were the pump pulse energy, the pulse width, the normalized injection rate, and the frequency shift between the pump and probe signals, and the network outputs were the saturated gain and the FWM (Ababneh 2006). In 2011, an approach was demonstrated for obtaining optimal parameter values for the gain of the QD-SOA using the same ANN model and a genetic algorithm (GA) (Hakimiyan 2011).

Our ANN model is a feedforward network with one hidden layer. The network inputs are the waveguide loss, the coupling coefficient, the QD-SOA length, and the frequency detuning. We train the ANN with data extracted from the numerical model. The optimal ANN model is obtained by performing several experiments with different network structures and training algorithms. Finally, a new approach to the optimal design of the QD-SOA for a desired FWM efficiency is presented using this optimal ANN model and a GA.

The structure of this paper is as follows: Sect. 2 describes the investigated QD-SOA structure, and Sect. 2.1 presents the physical model of the QD-SOA. Sect. 3 presents the employed numerical model and the designed optimal ANN model of the QD-SOA, and proposes a new approach for optimizing the QD-SOA design using the GA. Finally, Sect. 4 concludes the paper.

2 Simulated QD-SOA structure

The simulated QD-SOA structure is similar to the one described in (Akiyama et al. 2002). As shown in Fig. 1, the device is an InAs/GaAs heterostructure operating at 1.3 μm. It consists of an n-type GaAs substrate and an active region of InAs/GaAs quantum dots on a wetting layer (WL). The device contains seven stacked layers of InAs/GaAs QDs, which are nanosize semiconductor islands with a wetting layer grown in the Stranski-Krastanow mode. The GaAs optical confinement layer is 180 nm thick. The SOA has a ridge waveguide structure that is 1 μm wide.

Fig. 1 The simulated QD-SOA structure

For the evaluated QD-SOA structure, the numbers of energy levels in the conduction band (CB) and the valence band (VB) are three and ten, respectively (Qasaimeh 2004). The energy separation between states is 60 meV in the CB and 10 meV in the VB. Moreover, the inhomogeneous broadening is 30 meV. Figure 2 illustrates the detailed band structure of the QD-SOA.

Fig. 2 The energy band diagram of the modeled QD-SOA

The nonlinear characteristics of the QD-SOA depend on the transitions in the QD semiconductor. When pump and probe signals with different frequencies and intensities are injected into the semiconductor, a nonlinear interaction occurs between the two signals. Their beating modulates the gain and the refractive index, which generates a modulated signal. This mechanism leads to the FWM effect.

2.1 Physical model of QD-SOA

The FWM characteristics of the QD-SOA can be determined by solving the rate equations of the device. The rate equations for the ground state (GS) and the ith excited state (ES), written in terms of the occupation probabilities, are given, respectively, by (Qasaimeh 2005)

$$ \frac{{df_{0}^{k} }}{dt} = \frac{{(1 - f_{0}^{k} )f_{1}^{k} }}{{\tau_{10}^{k} }} - \frac{{f_{0}^{k} (1 - f_{1}^{k} )}}{{\tau_{01}^{k} }} - \frac{{f_{0}^{n} f_{0}^{p} }}{{\tau_{0R}^{k} }} - a_{0} (f_{0}^{n} + f_{0}^{p} - 1)S_{ph} $$
(1)
$$ \frac{{df_{i}^{k} }}{dt} = \frac{{f_{i + 1}^{k} (1 - f_{i}^{k} )}}{{\tau_{i + 1,i}^{k} }} - \frac{{(1 - f_{i + 1}^{k} )f_{i}^{k} }}{{\tau_{i,i + 1}^{k} }} - \frac{{(1 - f_{i - 1}^{k} )f_{i}^{k} }}{{\tau_{i,i - 1}^{k} }} + \frac{{f_{i - 1}^{k} (1 - f_{i}^{k} )}}{{\tau_{i - 1,i}^{k} }} - \frac{{f_{i}^{k} }}{{\tau_{iR} }} $$
(2)

where the integer i (i = 0, 1, …, \(M_{k}\)) indexes the energy levels. The superscript k denotes electrons if k = n and holes if k = p (in our modeling, \(M_{n}\) = 2 and \(M_{p}\) = 9). \(f_{i}^{k}\) is the occupation probability of the ith state, \(a_{0}\) is the differential gain, and \(S_{ph}\) is the photon density. \(\tau_{i + 1,i}^{k}\) is the relaxation time from state i + 1 to state i, and \(\tau_{i,i + 1}^{k}\) is the escape time from state i to state i + 1. \(\tau_{iR}\) is the spontaneous radiative lifetime of the ith state. The rate equation for the WL is given by

$$ \frac{{df_{w}^{k} }}{dt} = \frac{I}{{e_{c} VN_{Q} }} - \frac{{f_{w}^{k} (1 - f_{M}^{k} )}}{{\tau_{wM}^{k} }} + \frac{{(1 - f_{w}^{k} )f_{M}^{k} }}{{\tau_{Mw}^{k} }} - \frac{{f_{w}^{k} }}{{\tau_{wR} }} $$
(3)

where \(f_{w}^{k}\) is the WL occupation probability at the band edge, \(\tau_{wM}^{k}\) is the relaxation time from the WL to the \(M_{k}\)th excited state, \(\tau_{Mw}^{k}\) is the escape time from the \(M_{k}\)th excited state to the WL, \(\tau_{wR}\) is the spontaneous radiative lifetime in the WL, \(e_{c}\) is the electron charge, V is the volume of the active layer, and \(N_{Q}\) is the QD density. I is the applied current, and \(S_{ph}\) is the photon density, given by (Hakimian et al. 2020)

$$ S_{ph} = S_{ph}^{pump} + S_{ph}^{probe} + S_{ph}^{conj} $$
(4)

where \(S_{ph}^{pump} ,S_{ph}^{probe} ,S_{ph}^{conj}\) are the photon densities of the pump, probe, and conjugate signals, respectively. The photon density of each signal is [4 and 6]:

$$ S_{ph}^{j} = \left| {E_{j} } \right|^{2} $$
(5)

where j = pump corresponds to the pump signal, j = probe to the probe signal, and j = conj to the conjugate signal. \(E_{j}\) is the electric field of signal j.

FWM arises in a QD-SOA when pump (\(E_{pump}\)) and probe (\(E_{probe}\)) signals are injected into the input facet of the amplifier. The beating between \(E_{pump}\) and \(E_{probe}\) modulates the carrier density and creates the conjugate signal (\(E_{conj}\)). The equation governing the electric field of each signal is (Qasaimeh 2004)

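$$ \frac{{\partial E_{j} }}{\partial z} + \frac{{\partial E_{j} }}{{v_{g} \partial t}} = \frac{{E_{j} }}{2}\left[ { - \alpha_{l} + \left( {1 - i\alpha_{H} } \right)g^{j} } \right] $$
(6)

where \(v_{g}\) is the group velocity, \(\alpha_{H}\) is the linewidth enhancement factor, \(\alpha_{l}\) is the waveguide loss, and \(g^{j}\) is the gain of the active region for signal j. The symbol i in the factor \(1 - i\alpha_{H}\) is the imaginary unit, as in Eq. (12), and the index j labels the signal.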

In terms of the forward and backward waves, the photon density of each signal is:

$$ S_{ph}^{j} = \left| {F^{j} } \right|^{2} + \left| {B^{j} } \right|^{2} $$
(7)

where j = pump corresponds to the pump signal, j = probe to the probe signal, and j = conj to the conjugate signal. \(F^{j}\) and \(B^{j}\) are the forward and backward waves of signal j. Using Eqs. (4), (5), and (7) together with the transfer matrix method, the solution of the coupled Eq. (6) can be written as:

$$ \left[ \begin{gathered} F_{\varepsilon + 1}^{j} \hfill \\ B_{\varepsilon + 1}^{j} \hfill \\ \end{gathered} \right] = M_{\varepsilon }^{j} \left[ \begin{gathered} F_{\varepsilon }^{j} \hfill \\ B_{\varepsilon }^{j} \hfill \\ \end{gathered} \right] $$
(8)

By dividing the amplifier length into UBL sections, indexed by \(\varepsilon = 1,2, \ldots ,UBL\), the matrix \(M_{\varepsilon }^{j}\) of each section is:

$$ M_{\varepsilon }^{j} = \frac{1}{{1 - r_{j\varepsilon }^{2} }}\left[ {\begin{array}{cc} {e^{{i\gamma_{j\varepsilon } L}} - r_{j\varepsilon }^{2} e^{{ - i\gamma_{j\varepsilon } L}} } & { - r_{j\varepsilon } \left( {e^{{i\gamma_{j\varepsilon } L}} - e^{{ - i\gamma_{j\varepsilon } L}} } \right)} \\ {r_{j\varepsilon } \left( {e^{{i\gamma_{j\varepsilon } L}} - e^{{ - i\gamma_{j\varepsilon } L}} } \right)} & {e^{{ - i\gamma_{j\varepsilon } L}} - r_{j\varepsilon }^{2} e^{{i\gamma_{j\varepsilon } L}} } \\ \end{array} } \right] $$
(9)
$$ \gamma_{j\varepsilon } = \sqrt {\left( {\Delta \beta_{\varepsilon }^{j} } \right)^{2} - \kappa^{2} } $$
(10)
$$ r_{j\varepsilon } = \frac{ - \kappa }{{\gamma_{j\varepsilon } + \Delta \beta_{\varepsilon }^{j} }} $$
(11)
$$ \Delta \beta_{\varepsilon }^{j} = \delta_{j} - i\frac{{g_{\varepsilon }^{j} }}{2}(1 - i\alpha_{H} ) + i\frac{{\alpha_{l} }}{2} $$
(12)

where L is the length of the amplifier, κ is the coupling coefficient, and \(\delta_{j}\) is the initial detuning. \(g_{\varepsilon }^{j}\) is the gain of the active region, given as follows:

$$ g = \sum\limits_{k = 0}^{q} {g_{densk} (f_{k}^{n} + f_{k}^{p} - 1)} = g_{dens0} (f_{0}^{n} + f_{0}^{p} - 1) + g_{dens1} (f_{1}^{n} + f_{1}^{p} - 1) + \cdots $$
(13)

where q is the number of transitions (in this model q = 3), and \(g_{densk}\) is the gain at the photon energy for the transition between identical states in the two bands. It is defined as a Gaussian function, as follows (Hakimian 2019):

$$ g_{densk} = g_{k}^{\max } \left( {\frac{{\hbar \omega_{k}^{\max } }}{\hbar \omega }} \right)\exp \left( { - \frac{1}{2}\left( {\frac{{\hbar \omega - \hbar \omega_{k}^{\max } }}{\sigma }} \right)^{2} } \right) $$
(14)

where \(g_{k}^{\max }\) is the maximum optical amplification of the k-th transition, \(\hbar \omega\) is the photon energy of the input signal, \(\hbar \omega_{k}^{\max }\) is the energy of the photon emitted by the k-th transition of the QDs, and σ is the inhomogeneous line broadening parameter. The physical parameters used in the modeling are summarized in Table 1.
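As a rough illustration of Eqs. (13) and (14), the sketch below evaluates the Gaussian line gain of each transition and sums the lines weighted by the inversion factor. All numerical values are assumptions chosen for illustration and are not the Table 1 parameters: the peak gains are hypothetical, the transition spacing of 70 meV assumes that the k-th transition connects the k-th CB and VB states, and the 30 meV broadening is used directly as σ.

```python
import math

HC_EV_UM = 1.2398  # Planck constant times speed of light, in eV*um


def line_gain(photon_ev, peak_ev, g_max, sigma_ev):
    """Eq. (14): Gaussian gain of one transition at a given photon energy."""
    x = (photon_ev - peak_ev) / sigma_ev
    return g_max * (peak_ev / photon_ev) * math.exp(-0.5 * x * x)


def total_gain(photon_ev, peaks_ev, g_maxes, f_n, f_p, sigma_ev):
    """Eq. (13): line gains weighted by the inversion factor f_n + f_p - 1."""
    return sum(
        line_gain(photon_ev, peak, g_max, sigma_ev) * (fn + fp - 1.0)
        for peak, g_max, fn, fp in zip(peaks_ev, g_maxes, f_n, f_p)
    )


# Illustrative inputs (assumptions, not the Table 1 values).
e_gs = HC_EV_UM / 1.3                          # ground-state transition at 1.3 um
peaks = [e_gs + k * 0.070 for k in range(3)]   # assumed 60 meV + 10 meV spacing
g_maxes = [14.0, 14.0, 14.0]                   # hypothetical peak gains, cm^-1
sigma = 0.030                                  # 30 meV broadening used as sigma

inverted = total_gain(e_gs, peaks, g_maxes, [1, 1, 1], [1, 1, 1], sigma)
transparent = total_gain(e_gs, peaks, g_maxes, [0.5] * 3, [0.5] * 3, sigma)
empty = total_gain(e_gs, peaks, g_maxes, [0] * 3, [0] * 3, sigma)
print(f"inverted={inverted:.3f}, transparent={transparent:.3f}, empty={empty:.3f}")
```

The three cases show the sign change of the gain: full inversion gives amplification, equal occupation of 0.5 in both bands gives transparency, and empty dots give absorption.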

Table 1 The physical parameters used in the modeling

3 Modeling of the QD-SOA

3.1 Numerical procedure

The nonlinear carrier dynamics and the pump/probe effect determine the efficiency of QD-SOAs. To model the nonlinear carrier dynamics, Eqs. (1) to (3) must be solved numerically. These equations comprise 15 coupled ODEs corresponding to all energy levels of the CB and VB. The transfer matrix must be solved simultaneously. This matrix describes the nonlinear changes in the refractive index and the gain of the optical signal as it propagates along the z direction of the amplifier waveguide over time.
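To make the numerical step concrete, the following minimal sketch integrates a reduced version of Eqs. (1) to (3) with a classical fourth-order Runge-Kutta scheme. It is not the 15-equation model of this paper: it assumes a single excited state, identical electron and hole occupations (\(f^{n} = f^{p}\)), no optical signal, and hypothetical time constants in picoseconds. The names `rk4_step`, `reduced_rates`, and `integrate` are illustrative.

```python
def rk4_step(deriv, t, y, dt):
    """Advance the state list y by one classical fourth-order Runge-Kutta step."""
    k1 = deriv(t, y)
    k2 = deriv(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k1)])
    k3 = deriv(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k2)])
    k4 = deriv(t + dt, [yi + dt * ki for yi, ki in zip(y, k3)])
    return [
        yi + dt / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


# Hypothetical time constants in ps and pump rate in 1/ps (not from Table 1).
P = {"t_w1": 3.0, "t_1w": 1000.0, "t_10": 0.16, "t_01": 10.0,
     "t_wR": 400.0, "t_1R": 400.0, "t_0R": 400.0, "pump": 0.002,
     "a0_S": 0.0}  # a0 * S_ph, set to zero here (no optical signal)


def reduced_rates(t, y, p=P):
    """Reduced Eqs. (1)-(3): states are [f_w, f_1, f_0], assuming f^n = f^p."""
    fw, f1, f0 = y
    capture = fw * (1 - f1) / p["t_w1"] - (1 - fw) * f1 / p["t_1w"]
    relax = (1 - f0) * f1 / p["t_10"] - f0 * (1 - f1) / p["t_01"]
    dfw = p["pump"] - capture - fw / p["t_wR"]
    df1 = capture - relax - f1 / p["t_1R"]
    df0 = relax - f0 * f0 / p["t_0R"] - p["a0_S"] * (2 * f0 - 1)
    return [dfw, df1, df0]


def integrate(deriv, y0, dt, n_steps):
    """Repeat RK4 steps from t = 0 and return the final state."""
    t, y = 0.0, list(y0)
    for _ in range(n_steps):
        y = rk4_step(deriv, t, y, dt)
        t += dt
    return y


steady = integrate(reduced_rates, [0.0, 0.0, 0.0], dt=0.05, n_steps=60000)
print("f_w, f_1, f_0 =", [round(v, 4) for v in steady])
```

With these assumed values the occupations settle to a steady state in which the ground state is the most populated level, as expected when relaxation is much faster than escape.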

Since the numerical modeling of FWM efficiency in both time and space dimensions is challenging, we use the SLICE technique. In this technique, we divide the amplifier length into equal sections, as shown in Fig. 3 (Hakimian 2019).

Fig. 3 SLICE technique for modeling the nonlinear characteristics of the QD-SOA
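The section-by-section propagation can be sketched with the transfer matrix of Eqs. (8) to (12). The code below is an illustrative sketch under stated assumptions: each section matrix is evaluated with the section length (the total length divided by the number of sections), the per-section gains are supplied as a plain list rather than computed from the rate equations, and all parameter values are hypothetical. The function names are not from the paper.

```python
import cmath


def section_matrix(delta, gain, alpha_h, alpha_l, kappa, length):
    """Transfer matrix of one section, following Eqs. (9)-(12)."""
    d_beta = delta - 0.5j * gain * (1 - 1j * alpha_h) + 0.5j * alpha_l  # Eq. (12)
    gamma = cmath.sqrt(d_beta ** 2 - kappa ** 2)                        # Eq. (10)
    # Either square root gives the same matrix; pick the one that keeps the
    # denominator of Eq. (11) away from zero.
    if abs(gamma + d_beta) < abs(gamma - d_beta):
        gamma = -gamma
    r = -kappa / (gamma + d_beta)                                       # Eq. (11)
    ep = cmath.exp(1j * gamma * length)
    em = cmath.exp(-1j * gamma * length)
    s = 1.0 / (1.0 - r * r)
    return ((s * (ep - r * r * em), -s * r * (ep - em)),
            (s * r * (ep - em), s * (em - r * r * ep)))


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def cascade(section_gains, delta, alpha_h, alpha_l, kappa, total_length):
    """Multiply the section matrices in propagation order, as in Eq. (8)."""
    dz = total_length / len(section_gains)
    total = ((1 + 0j, 0j), (0j, 1 + 0j))
    for g in section_gains:
        total = matmul(section_matrix(delta, g, alpha_h, alpha_l, kappa, dz), total)
    return total


# Hypothetical values in cm^-1 and cm (not from Table 1).
gains = [12.0, 11.0, 10.0, 9.0]  # per-section gain, e.g. from the rate equations
m = cascade(gains, delta=5.0, alpha_h=1.0, alpha_l=3.0, kappa=2.0, total_length=0.2)
f_out = m[0][0] * 1.0 + m[0][1] * 0.0  # forward wave for inputs F = 1, B = 0
print(abs(f_out) ** 2)
```

For a uniform gain, the cascade of many short sections reproduces the matrix of one long section, which is a convenient sanity check before letting the gain vary from section to section.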

For each section, the optical amplitude and the photon density of the pump, probe, and conjugate signals are calculated using the following initial conditions (Qasaimeh 2004):

$$ F^{conj} (0,t) \approx 0 $$
(15)
$$ F^{pump} (0,t) = \sqrt {0.1A(0,t)} $$
(16)
$$ F^{probe} (0,t) = \sqrt {0.01A(0,t)} $$
(17)
$$ A(0,t) = P_{s} \exp ( - \frac{{2t^{2} }}{{\delta_{p}^{2} }}) $$
(18)

where \(P_{s}\) is the maximum input value and \(\delta_{p}\) is the width of the input signal. For simplicity, \(P_{s}\) is taken as (Qasaimeh 2013):

$$ P_{s} = \frac{{N_{D} }}{{g_{0}^{\max } \tau_{0R} }} $$
(19)

where \(N_{D}\) is the volume density of QDs. The values calculated in each section are then used as inputs to the next section (Hakimian 2019). Finally, once the amplitudes of the probe and conjugate optical signals have been calculated up to the last section of the QD-SOA, the FWM efficiency (η) is obtained as follows

$$ \eta = \frac{{\left| {F^{conj} (t,z = L)} \right|^{2} }}{{\left| {F^{probe} (t,z = 0)} \right|^{2} }} $$
(20)
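The input conditions of Eqs. (15) to (18) and the efficiency of Eq. (20) are simple enough to write out directly. The sketch below is illustrative only: the peak value and pulse width passed to the functions are arbitrary, and the conversion to decibels is added because the ANN output in this paper is expressed in decibels.

```python
import math


def envelope(t, p_s, width):
    """Eq. (18): Gaussian input envelope A(0, t)."""
    return p_s * math.exp(-2.0 * t * t / width ** 2)


def input_fields(t, p_s, width):
    """Eqs. (15)-(17): forward amplitudes at the input facet (z = 0)."""
    a = envelope(t, p_s, width)
    return {"pump": math.sqrt(0.1 * a), "probe": math.sqrt(0.01 * a), "conj": 0.0}


def fwm_efficiency_db(f_conj_out, f_probe_in):
    """Eq. (20) expressed in decibels."""
    eta = abs(f_conj_out) ** 2 / abs(f_probe_in) ** 2
    return 10.0 * math.log10(eta)


fields = input_fields(0.0, p_s=2.0, width=1.0)    # arbitrary illustrative inputs
print(fields["pump"] ** 2 / fields["probe"] ** 2)  # pump-to-probe power ratio
print(fwm_efficiency_db(0.1, 1.0))                 # conjugate 20 dB below the probe
```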

The numerical modeling of the nonlinear characteristics of the QD-SOA is performed using the SLICE technique. The detuning frequency is the difference between the pump and the probe, as follows:

$$ f = c\left( {\delta_{pump} - \delta_{probe} } \right) $$
(21)

According to Eq. (21), the FWM efficiency is calculated for different detuning frequencies by changing \(\delta_{probe}\). The FWM characteristic as a function of detuning frequency is shown in Fig. 4. As can be seen, the FWM efficiency decreases with increasing detuning frequency. Figure 4 agrees well with other simulated data and with experimental data (Zajnulina 2017; Flayyih 2017). At zero detuning frequency, where the FWM efficiency is highest in all models, the error of our model is less than 5% relative to (Zajnulina 2017).

Fig. 4 FWM efficiency for different detuning frequencies

On a Core 2 Duo laptop with 2 GB of memory, the run time is approximately 5 min. This run time is long, so the numerical model is not a good option for computer-aided design and optimal design of QD-SOAs.

3.2 QD-SOA modeling using ANN

A simple, accurate, and fast artificial neural network (ANN) model for the nonlinear characteristics of the QD-SOA is introduced in this section. Figure 5 shows the ANN model proposed for the QD-SOA. As shown in the figure, the length of the amplifier (L) in cm, the waveguide loss (αl) in cm−1, the detuning frequency (f) in GHz, and the coupling coefficient (κ) in cm−1 are chosen as the network inputs. The output of the network is the FWM efficiency (η) of the QD-SOA in decibels.

Fig. 5 ANN model for the QD-SOA

To use the ANN as a simulator, a data set of 256 samples is extracted from the numerical model. Figure 6 shows the range and scatter of the generated data. This range was selected based on typical values for the design of QD-SOAs.

Fig. 6 The scattered range of input parameters

ANN modeling of QD-SOAs involves three main steps: training, validation, and testing. The training step adjusts the weights and bias of each neuron in the network. The validation step applies the early stopping technique to prevent overfitting. The test step determines the accuracy and validity of the designed ANN model.
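The three steps can be sketched as a data split followed by an early-stopping rule. The split fractions (70/15/15) and the patience of six epochs are assumptions made for this illustration, since the paper does not state them, and the validation losses in the usage lines are made-up numbers.

```python
import random


def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle sample indices and split them into train, validation, and test."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = int(train_frac * n_samples)
    n_val = int(val_frac * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def early_stop(val_losses, patience=6):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; the weights of best_epoch would be kept.
    """
    best, best_epoch, bad = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, bad = loss, epoch, 0
        else:
            bad += 1
            if bad >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


train, val, test = split_indices(256)           # 256 samples, as in the data set
print(len(train), len(val), len(test))
print(early_stop([5.0, 4.0, 3.0, 3.1, 3.2, 3.3], patience=3))
```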

We use a multilayer feedforward network, a common and popular ANN structure, in our modeling. A multilayer feedforward network includes an input layer, one or more hidden layers, and an output layer. Our model is a three-layer feedforward network consisting of an input layer, one hidden layer, and an output layer. According to comprehensive experiments and past research, backpropagation is the best training algorithm for these networks (Svozil 1997; Abid 2001; Lv 2018).
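A forward pass through such a three-layer network is short enough to write out. The sketch below assumes a sigmoid hidden layer and a linear output neuron (the output transfer function is not stated in the text), uses random untrained weights, and feeds a hypothetical normalized input vector; it only illustrates the structure, not the trained model.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_in, n_hidden, seed=0):
    """Random small weights for an n_in:n_hidden:1 network."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": rng.uniform(-0.5, 0.5),
    }


def forward(net, x):
    """Sigmoid hidden layer followed by a linear output neuron (assumed)."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["W1"], net["b1"])
    ]
    return sum(w * h for w, h in zip(net["W2"], hidden)) + net["b2"]


def n_parameters(n_in, n_hidden, n_out=1):
    """Number of weights plus biases of an n_in:n_hidden:n_out network."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out


net = init_network(n_in=4, n_hidden=18)
x = [0.5, 0.3, 0.1, 0.8]  # hypothetical normalized [L, alpha_l, f, kappa]
print(forward(net, x), n_parameters(4, 18))
```

The parameter count is a useful reminder of model size relative to the 256 available samples when comparing hidden-layer sizes.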

As shown in Fig. 5, a sigmoid function is used as the transfer function of the neurons in the hidden layer. To obtain an accurate ANN model, experiments are performed with various network structures and with different learning and training functions. The best model is the one with the lowest mean square error (MSE). First, the model is implemented with one neuron in the hidden layer to identify the most appropriate training function. Figure 7 shows the MSE of the training and test data with one neuron in the hidden layer for the following training functions: one-step secant (OSS), gradient descent with adaptive learning rate (GDA), gradient descent with momentum and adaptive learning rate (GDX), Levenberg-Marquardt (LM), Broyden-Fletcher-Goldfarb-Shanno quasi-Newton (BFG), conjugate gradient with Powell/Beale restarts (CGB), Fletcher-Powell conjugate gradient (CGF), Polak-Ribiére conjugate gradient (CGP), resilient backpropagation (RP), and scaled conjugate gradient (SCG).

Fig. 7 The MSE of training and test data for different training functions with one neuron in the hidden layer
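For reference, the MSE used to compare the models can be written in its standard form. The symbols N (number of samples in the set), \(\eta_{m}^{ANN}\) (network output for sample m), and \(\eta_{m}^{Num}\) (target from the numerical model) are notation assumed here rather than taken from the paper:

$$ MSE = \frac{1}{N}\sum\limits_{m = 1}^{N} {\left( {\eta_{m}^{ANN} - \eta_{m}^{Num} } \right)^{2} } $$

With the efficiency expressed in decibels, the MSE has units of dB squared.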

As shown in Fig. 7, the MSE of the training data is lowest for LM, at 1.499. We therefore continue the experiments with LM as the training function. The experiments are then repeated to determine the best learning function. As can be observed in Table 2, among the different learning functions, GDM gives the lowest MSE for both training and test data. In the structure notation n:nh:o, n is the number of input-layer neurons, which equals the number of network inputs, nh is the number of hidden-layer neurons, and o is the number of output-layer neurons. In this table, the epoch is the iteration at which training stopped.

Table 2 The results of QD-SOA modeling with different learning functions

Finally, ANN modeling of the QD-SOA with the LM training function and the GDM learning function is repeated for different numbers of neurons in the hidden layer. As shown in Fig. 8, the 4:18:1 structure is the best, with the lowest training and test errors.

Fig. 8 The results of QD-SOA modeling with different numbers of neurons

Figure 9 shows that the results of the designed ANN model agree well with the numerical model. For the proposed ANN, the MSE for the test data is 0.1, which is acceptable accuracy for this nonlinear problem.

Fig. 9 Matching the output of the ANN with the desired output for test data

Moreover, the run time of our ANN model on a 2 GHz Core 2 Duo computer with 2 GB of memory is less than 1 s, which is much shorter than the time required by the numerical method.

3.3 Design optimization of the QD-SOA for FWM wavelength conversion

After obtaining the best ANN model, we present a novel approach that uses a genetic algorithm to determine the QD-SOA parameters that achieve a desired FWM efficiency. Choosing the problem variables and constructing the fitness function are very important. The problem variables are the four influential parameters (L, αl, f, and κ), denoted p1 to p4. The cost function is defined as follows

$$ c(x) = \sum\limits_{i = 1}^{4} {w_{i} \times p_{i} } $$
(22)

where \(p_{i}\) are the variables of the algorithm and \(w_{i}\) are the weights assigned to them. The device designer weights the variables in proportion to their importance, so a more significant variable receives a higher weight. The goal is to achieve the desired FWM efficiency while minimizing the cost function. For this purpose, the fitness function is defined as follows

$$ F.F = v \times f_{1} + c(x) $$
(23)
$$ f_{1} = \left\{ {\begin{array}{ll} 1 & {{\text{for }}\eta_{Cal.} < \eta_{Des.} } \\ 0 & {{\text{for }}\eta_{Cal.} \ge \eta_{Des.} } \\ \end{array} } \right. $$
(24)

where \(\eta_{Des.}\) is the desired FWM efficiency of the QD-SOA and \(\eta_{Cal.}\) is the FWM efficiency calculated by the designed ANN model. The fitness function consists of two terms. The first term depends on the algorithm error, and the second term is the cost function. If the FWM efficiency obtained from the ANN model is greater than or equal to the desired FWM efficiency, the first term vanishes and only the cost function term remains. The constant v is the weight that sets the relative importance of the two terms of the fitness function.
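Equations (22) to (24) translate almost line by line into code. In the sketch below, `eta_model` stands for any callable that returns the FWM efficiency in decibels for a parameter vector; in this paper that role is played by the trained ANN, while the constant-valued lambdas in the usage lines are hypothetical stand-ins used only to exercise both branches of Eq. (24).

```python
def cost(params, weights):
    """Eq. (22): weighted sum of the design variables p_1..p_4."""
    return sum(w * p for w, p in zip(weights, params))


def fitness(params, weights, v, eta_desired, eta_model):
    """Eqs. (23)-(24): penalty v when the target is missed, plus the cost."""
    eta_cal = eta_model(params)
    f1 = 1.0 if eta_cal < eta_desired else 0.0
    return v * f1 + cost(params, weights)


params = [1.0, 2.0, 3.0, 4.0]      # illustrative values of [L, alpha_l, f, kappa]
weights = [5.0, 1.0, 5.0, 2.0]     # designer weights, as in the example of this paper

# Hypothetical stand-ins for the ANN: one misses the target, one meets it.
missed = fitness(params, weights, 10.0, -4.55, lambda p: -5.0)
met = fitness(params, weights, 10.0, -4.55, lambda p: -4.0)
print(missed, met)
```

Because the penalty is a step rather than a distance to the target, every design that meets the target is ranked purely by its cost.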

The best value of v is found by performing several experiments, whose results are shown in Table 3. According to Table 3, the optimal value of v is 10, for which the FWM efficiency is closer to the desired efficiency and the cost function value is lower.

Table 3 The results of QD-SOA modeling with different values of v

The best GA settings were obtained by varying the population size, fitness scaling, crossover, and selection functions. The results of these experiments are shown in Figs. 10 to 12. Figure 10 shows the cost value as a function of population size for different selection functions. As can be seen, the cost function value is minimized for the remainder selection function and a population size of 40.

Fig. 10 Cost value as a function of population size for different selection functions

The value of the cost function as a function of population size for different crossover functions is shown in Fig. 11. The cost value is minimized for the two-point crossover function and a population size of 50.

Fig. 11 Value of the cost as a function of population size under different crossover functions

The value of the cost function as a function of population size for different fitness scaling functions is shown in Fig. 12. The cost value is minimized for the rank fitness scaling function and a population size of 50.

Fig. 12 Cost as a function of population size for different scaling functions

Therefore, the best optimization of the QD-SOA is obtained with a population size of 50, the remainder selection function, the rank fitness scaling function, and two-point crossover. The results of optimizing the QD-SOA parameters for a desired FWM efficiency of −4.55 dB with weights [5 1 5 2] and v = 10 using the designed GA are shown in Table 4.

Table 4 Results of optimal design for FWM = − 4.55 dB

As can be seen, the optimal values of the four influential parameters (according to the degree of importance assigned to each parameter by the designer) obtained with the GA for the desired FWM efficiency are [2.445312 2.799823 0.001471 50.004119]. Of all the designs, the one with the lowest cost function is the optimal design.
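The optimization loop can be sketched with the operators named above: rank fitness scaling, remainder selection, and two-point crossover, with a population of 50. This is a minimal illustration and not the GA implementation used for Table 4. The 1/sqrt(rank) weighting, the Gaussian mutation, the elitism, the parameter bounds, and above all the linear function `toy_eta` that stands in for the trained ANN are assumptions introduced for this example.

```python
import math
import random


def rank_scaling(scores, n_parents):
    """Expected parent counts from ranks; the lowest score gets rank 1."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    raw = [0.0] * len(scores)
    for rank, i in enumerate(order, start=1):
        raw[i] = 1.0 / math.sqrt(rank)  # assumed 1/sqrt(rank) weighting
    total = sum(raw)
    return [n_parents * r / total for r in raw]


def remainder_selection(expectation, n_parents, rng):
    """Integer parts give guaranteed copies; fractions feed a roulette wheel."""
    parents = []
    for i, e in enumerate(expectation):
        parents.extend([i] * int(e))
    fractions = [e - int(e) for e in expectation]
    while len(parents) < n_parents:
        parents.append(rng.choices(range(len(expectation)), weights=fractions)[0])
    return parents[:n_parents]


def two_point_crossover(a, b, rng):
    """Copy parent a and replace the genes between two cut points with b's."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:]


def run_ga(fitness, bounds, pop_size=50, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        elite = pop[scores.index(min(scores))]
        history.append(min(scores))
        expect = rank_scaling(scores, 2 * pop_size)
        parents = remainder_selection(expect, 2 * pop_size, rng)
        rng.shuffle(parents)
        children = [elite]  # elitism keeps the best design unchanged
        for k in range(0, 2 * (pop_size - 1), 2):
            child = two_point_crossover(pop[parents[k]], pop[parents[k + 1]], rng)
            children.append([
                min(hi, max(lo, g + rng.gauss(0.0, 0.02 * (hi - lo))))
                for g, (lo, hi) in zip(child, bounds)
            ])
        pop = children
    return min(pop, key=fitness), history


# Hypothetical stand-in for the ANN (not the trained model): efficiency in dB.
def toy_eta(p):
    length, loss, detuning, kappa = p
    return -12.0 + 4.0 * length - 1.0 * loss - 0.02 * detuning + 0.05 * kappa


weights, v, eta_des = [5, 1, 5, 2], 10.0, -4.55


def fitness(p):
    penalty = v if toy_eta(p) < eta_des else 0.0  # Eqs. (23)-(24)
    return penalty + sum(w * x for w, x in zip(weights, p))


bounds = [(0.5, 3.0), (1.0, 5.0), (0.0, 100.0), (10.0, 100.0)]  # assumed ranges
best, history = run_ga(fitness, bounds)
print(best, fitness(best))
```

Because the best individual is carried over unchanged, the best fitness per generation never increases, which makes runs with different operator choices easy to compare.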

4 Conclusion

In this paper, we first performed a numerical simulation of the FWM characteristics of a QD-SOA based on the SLICE technique and the pump/probe measurement method. Using data collected from this numerical simulation, we proposed a feedforward artificial neural network (ANN) model that is simple and requires much less computation time than the numerical simulation. The inputs of the ANN model are the QD-SOA parameters, namely the device length, waveguide loss, detuning frequency, and coupling coefficient, and its output is the FWM efficiency. The designed ANN model, with an 18-neuron hidden layer and the backpropagation algorithm, showed very good accuracy and validity over a wide range of input and output parameters. We then proposed a new technique for determining the optimal parameters of the QD-SOA for FWM wavelength conversion using the obtained ANN model and a genetic algorithm (GA). In this technique, the QD-SOA designer expresses the degree of importance of each parameter through the cost function, and the optimal design is then obtained for the desired FWM efficiency. The best selection, crossover, and fitness scaling functions were determined by performing several experiments with the designed GA. The proposed intelligent approaches demonstrated high accuracy and very short computation times, which makes them suitable for the modeling and optimal design of QD-SOAs.