“If there is magic on this planet, it is contained in water.” Loren Eiseley, The Immense Journey (Random House, New York, 1957).

Introduction

As an integral part of living organisms, water is responsible, due to its unique hydrogen bonding capacity, for much of the characteristic structure and chemistry of biomolecules.1 Water is required for most globular proteins to fold into their native conformations, and is responsible for the formation of lipid bilayers, micelles, and vesicles. When a solute molecule is in an aqueous environment, its functional groups must interact with the inherent structural requirements of the surrounding water, and its presence can impose a structuring pattern on the adjacent solvent molecules which differs from that of pure bulk water. Such solvent structuring is of critical importance in determining the properties of aqueous solutions.2–4 In practice, it has been very difficult to directly determine the nature of this structuring. The organization of solvent water molecules around a particular solute will, in general, involve both positional and orientational correlations with the specific chemical architecture of the solute, and thus will vary in its details from one molecule to another. For biological solutes, the structuring imposed on water can be quite complex due to the complicated mix of chemical functionalities found in typical biopolymers, which frequently have polar or hydrogen bonding functional groups juxtaposed in close proximity to non-polar groups.5–9 There is a clear need for experimental techniques that can directly probe this structuring in atomic detail. Such knowledge could then be used in a number of practical applications, ranging from drug design and vaccine development to increasing the stability of proteins and other polymers. Traditionally, neutron diffraction measurements have been the most powerful experimental method for characterizing liquid structuring in aqueous solutions, but for biological solutions, the scattering data can be too complex to be usefully interpreted.
We describe here two new approaches to the interpretation of neutron diffraction data. Both use molecular dynamics (MD) simulations of identical systems, and both extend the ability of diffraction experiments to examine separately the inter- and intramolecular structure of biological solutions.

Methods

The average structure of simple liquids is traditionally discussed in terms of spherically averaged radial distribution functions g(r),10,11 defined as

$$ g(r) = \frac{1}{4\pi \rho r^{2}}\frac{dN(r)}{dr} $$
(1)

where ρ is the bulk density and g(r) represents the probability of finding another atom as a function of the distance r away from a particular atom.12,13 The radial distribution function is fundamental to the discussion of the properties of simple liquids since it can be used to calculate the thermodynamic functions of the liquid such as the free energy,10,13 and since g(r) can be determined from diffraction experiments. The problem is more complex for a polyatomic liquid like water, however. Furthermore, for polyatomic solutes, and particularly complex biological solutes, the solvent structures differently around each atom of the solute, and each has its own distribution function.14 In addition, the close proximity of these atoms in the solute molecule means that these functions are not isotropic, but rather are typically asymmetric, and that the structuring around one atom contributes to the structuring around adjacent atoms as well. Such structuring can be very complex, and averaging over all angles at each radial distance would obscure its anisotropic character.
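
As an illustration of Eq. 1, the following minimal Python sketch estimates g(r) by histogramming pair distances and then recovers a coordination number by integrating dN = 4πρr²g(r)dr. It is a sketch under stated assumptions, not the procedure used in the work described here: the cubic periodic box, the minimum-image convention, the function names, and the use of uncorrelated random points as a test system are all choices made for this example.

```python
import math
import random


def radial_distribution(coords, box, dr, r_max):
    """Histogram estimate of g(r) for points in a cubic periodic box.

    coords : list of (x, y, z) tuples; box : cube edge length;
    dr : bin width; r_max : largest distance (at most box / 2).
    Returns (bin_centres, g_values).
    """
    n = len(coords)
    rho = n / box ** 3                      # bulk number density
    nbins = int(round(r_max / dr))
    counts = [0] * nbins
    for i in range(n - 1):
        for j in range(i + 1, n):
            d2 = 0.0
            for a, b in zip(coords[i], coords[j]):
                d = a - b
                d -= box * round(d / box)   # minimum-image convention
                d2 += d * d
            k = int(math.sqrt(d2) / dr)
            if k < nbins:
                counts[k] += 2              # the pair is seen from both atoms
    centres, g = [], []
    for k in range(nbins):
        r_lo, r_hi = k * dr, (k + 1) * dr
        # Finite-bin-width version of the 4*pi*r^2*dr shell volume in Eq. 1
        shell = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)
        centres.append(r_lo + 0.5 * dr)
        g.append(counts[k] / (n * rho * shell))
    return centres, g


def coordination_number(centres, g, rho, dr, r_cut):
    """Sum dN = 4*pi*rho*r^2*g(r)*dr (Eq. 1 rearranged) over bins below r_cut."""
    return sum(4.0 * math.pi * rho * r * r * gr * dr
               for r, gr in zip(centres, g) if r < r_cut)


def random_points(n, box, seed=0):
    """Uncorrelated points (an ideal gas), for which g(r) should be close to 1."""
    rng = random.Random(seed)
    return [(rng.uniform(0, box), rng.uniform(0, box), rng.uniform(0, box))
            for _ in range(n)]
```

For uncorrelated points the histogram should fluctuate around g(r) = 1, which gives a simple check of the normalization. Note that this spherical average is exactly the quantity that hides the anisotropy discussed above.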

Computer Simulation

Computer simulations of aqueous solutions are capable of capturing the anisotropic character of this solvent structuring around complex polyatomic solutes in complete detail. Molecular mechanics studies, such as MD simulations or Monte Carlo calculations, directly model the properties of molecular systems using an assumed knowledge of the way in which the energy of the system varies with the atomic coordinates. These simulations, and particularly MD simulations, have been very useful for studying both biological molecules15–17 and liquids,12,13 and have been extensively applied to aqueous solutions.6,8,14,18–29 Such simulations have examined fully spatially resolved structuring in liquids30–32 and biological solutions,6,7 providing a direct picture of the non-uniform distribution of molecules in three dimensions and the locations of preferred positions for “first shell” solvent molecules. Measuring such structuring experimentally, however, has proven more difficult.

Neutron Scattering

Neutron diffraction measurements have generally been one of the most powerful experimental methods for probing the structure of liquid solutions. Such experiments are extremely information rich, since they produce structure factors corresponding to the correlation of every atom in the system with every other atom. In the case of complex polyatomic solutes in water, however, this information density becomes a handicap, since the correspondingly complex diffraction pattern becomes impossible to assign and interpret, and thus becomes effectively almost content free. The scattering intensities measured in these experiments are the summed scattering of all atoms in the system, so that for complex systems such as proteins or carbohydrates in water, the resulting intensity is highly averaged, representing the scattering of many different atoms, including atoms of the same element with different local environments.33 To address this problem, the technique of neutron diffraction with isotopic substitution (NDIS)34,35 was developed as a powerful method for the simplification of diffraction spectra.

A conventional neutron diffraction measurement provides only the neutron-weighted average structure factor \( F^N(Q) \),

$$ F^N(Q) = \sum\limits_{ij} \frac{c_i c_j b_i b_j}{\langle b \rangle^2} S_{ij}(Q) $$
(2)

where \( c_i \) and \( b_i \) are the atomic concentration and coherent scattering length of element i, \( \langle b \rangle \) is the average scattering length over all atoms in the sample, and \( S_{ij}(Q) \) is the partial structure factor for the element pair (i,j), which could be Fourier transformed to give the radial distribution function \( g_{ij}(r) \). Structural correlations between all pairs of elements enter into the average \( F^N(Q) \). As the number n of elements in the sample gets larger, these correlations become increasingly difficult to disentangle. The NDIS method partially resolves this difficulty34,35 by performing paired experiments in which one atom type is replaced by an isotope of the same element with a different neutron scattering length. Subtracting the structure factors for these two experiments then eliminates any correlations that do not involve the substitution-labeled atom.
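
The weighting in Eq. 2 can be sketched in a few lines of Python. This is only an illustration: the function names and the dictionary layout are invented for the example, the concentrations are assumed to be atomic fractions summing to one so that the average scattering length is Σ c_i b_i, and the sum over ij is taken over ordered pairs.

```python
def mean_scattering_length(conc, b):
    """Average scattering length <b> = sum_i c_i * b_i (c_i assumed to sum to 1)."""
    return sum(conc[e] * b[e] for e in conc)


def neutron_weighted_sf(conc, b, partials):
    """Evaluate Eq. 2 at a single Q value.

    conc, b  : dicts mapping element label -> c_i and b_i
    partials : dict mapping a sorted (i, j) tuple -> S_ij(Q) at this Q
    """
    b_mean = mean_scattering_length(conc, b)
    total = 0.0
    for i in conc:
        for j in conc:                      # ordered double sum over i and j
            s_ij = partials[tuple(sorted((i, j)))]
            total += conc[i] * conc[j] * b[i] * b[j] * s_ij
    return total / b_mean ** 2
```

A quick consistency check on the normalization: if every partial structure factor equals 1, the double sum factorizes into the square of the average scattering length and the function returns 1, whatever the (placeholder) values of c_i and b_i.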

For example, if the isotope substitution a → a′ can be carried out on a particular element a, in general the scattering length \( b_a \) of that element will change. If two measurements are then made with chemically identical samples but with different isotopes of element a, the difference of the two measured structure factors gives a more restricted average,

$$ \Delta_a(Q) = \frac{\langle b \rangle^2 F^N(Q) - \langle b^{\prime} \rangle^2 F^{\prime N}(Q)}{\langle b \rangle^2 - \langle b^{\prime} \rangle^2} = \sum\limits_j \frac{c_j b_j}{\langle b \rangle} S_{aj}(Q) $$
(3)

where the primes refer to the measurement with the sample containing the substituted isotope. Since the difference structure factor \( \Delta_a(Q) \) is an average over only the n terms involving element a, as opposed to the full number n(n + 1)/2 of terms involving all element pairs, it is generally much easier to distinguish structural features arising from the different pairs. Moreover, only pairs involving the element of interest contribute. For example, in the simple case of NaCl in water, there are ten structure factors, for O–O, H–H, O–H, Na–O, Na–H, Cl–O, Cl–H, Na–Na, Cl–Cl, and Na–Cl. If, however, one did a ³⁵Cl/³⁷Cl isotopic substitution for the Cl atoms, leaving all of the other atoms as the naturally occurring isotopes, and measured the difference in the scattering between the two solutions, all those terms that did not depend on the Cl would subtract out, leaving only four structure factors contributing to the intensity difference. This approach is known as the “first difference method.”36 Most significantly, at concentrations below ∼3 m, \( \Delta_a(Q) \) is dominated by the chlorine–oxygen and chlorine–hydrogen partial structure factors, thus providing accurate information on the hydration of this ion. The difference function can also be transformed into a real space radial distribution function summed over all pairs, \( G_a^X(r) \). If a particular peak in \( G_a^X(r) \) can be identified with a specific atom pair (a,b), the corresponding coordination number can be directly obtained from the integral over the peak.
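
The pair counting in the NaCl example, and the cancellation that underlies the first difference method, can be checked with a short Python sketch. The function names are invented for this illustration, and any concentrations or scattering lengths passed in are treated as arbitrary placeholders rather than tabulated values. The weights are the unnormalized products c_i c_j b_i b_j that multiply each distinct partial structure factor in the numerator of Eq. 3.

```python
from itertools import combinations_with_replacement


def element_pairs(elements):
    """All n(n + 1)/2 distinct element pairs, including like-like pairs."""
    return list(combinations_with_replacement(elements, 2))


def pairs_involving(elements, label):
    """The n distinct pairs that contain the isotopically substituted element."""
    return [p for p in element_pairs(elements) if label in p]


def weight_change(conc, b, b_sub, label):
    """Change in the unnormalized weight of each distinct pair when the
    scattering length of `label` is changed from b[label] to b_sub."""
    b_new = dict(b)
    b_new[label] = b_sub
    change = {}
    for i, j in element_pairs(list(conc)):
        mult = 1 if i == j else 2   # unlike pairs occur twice in the double sum
        old = mult * conc[i] * conc[j] * b[i] * b[j]
        new = mult * conc[i] * conc[j] * b_new[i] * b_new[j]
        change[(i, j)] = old - new
    return change
```

Called with the four elements O, H, Na, and Cl, `element_pairs` returns ten pairs and `pairs_involving(..., "Cl")` returns four. In `weight_change`, every pair that does not contain the substituted element has an unchanged weight, so it drops out of the difference, which is the point made in the text.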

NDIS experiments have been most useful for studying water structuring around simple solutes such as rare gases,37 mono-atomic ions,38,39 or approximately spherical molecular solutes.40 In recent years, diffraction methods have been applied to more complex molecular species, including tert-butyl alcohol,41,42 ethanol, phenol, d-glucose,43 and amino acids.44–46 Unfortunately, in the case of large asymmetric biological solutes like sugars, containing multiple atoms of the type being isotopically substituted, with each of these atoms in a different local environment, the structure factor still remains too complex to be interpreted even in a conventional NDIS experiment. This limitation has presented a critical barrier to the use of neutron diffraction experiments to describe solvent structuring around complex biological solutes. However, if an experimental sample is synthesized in which only a single atomic position in the molecule is substituted, such as is shown in Figure 1, then properly designed NDIS experiments for this labeled solute and the unlabeled, natural abundance molecule, in both H2O and D2O, can produce the structure factor and radial distribution function for atoms around just the labeled position. The next-generation detectors now available at state-of-the-art facilities such as the Institut Laue-Langevin (ILL) in Grenoble permit significantly higher resolution measurements to be made, making single-atom substitution labels practical, and thus allowing more detailed information about the structuring around individual atoms to be extracted.

Fig. 1

The use of neutron diffraction results to probe the structure of an aqueous solution of D-xylopyranose (shown as an inset in a). a Simulation data illustrating why it is difficult to extract directly the solvent structuring around a specific position in the sugar (in this case, the H4 position labeled in the corresponding NDIS experiment). The function \( g_{H_{\rm{sub}}H_{\rm{ex}}}(r) \) as calculated from MD is shown by the top black line. The subcomponents of this function due to the different types of exchangeable hydroxyl protons \( H_{\rm{ex}} \) are also shown; the contributions for \( H_{\rm{ex}} \) on water (green line), HO4 (red), HO3 (blue), HO2 (gray), and HO1 (purple) are illustrated. The inset shows the β-d-xylopyranose molecule, with the sphere indicating the deuterium-labeled H4 position, and showing the possible rotamers of the OH4 hydroxyl. b In the left panel: the structure factor \( S_{H_{\rm{sub}}H_{\rm{ex}}}(Q) \) as calculated from the MD simulations and determined from NDIS experiments. a: \( S_{H_{\rm{sub}}H_{\rm{ex}}}(Q) \) as calculated from the unconstrained simulation; 300°, 180°, and 60°: as calculated from the simulations with the H4–C4–O4–HO4 torsional angle constrained to each of the specified values. In each case, the experimental NDIS data is overlain in gray. In the right panel, the difference resulting from subtracting the NDIS data from the MD predictions. The 180° case gives a particularly poor fit over the reliable portion of the NDIS data, and can thus be excluded as a possible conformation.

In such a specialized NDIS experiment, even with only one labeled position, the scattering function can still be too complex to interpret without help in the assignment of the features. For this reason, some form of theoretical model becomes essential for understanding the information contained in the diffraction from such solutions.47 This help can come from MD simulations, because the trajectories can be examined in enough detail to determine the correlations that give rise to each feature. The computed structure factors can then be used heuristically to identify which of the possible calculated structures best matches the experimental data.

Results

The basic approach used has been to conduct realistic MD simulations of exactly the same system being studied experimentally. This method thus differs from the EPSR method of Soper and co-workers,48 since the theoretical model is a completely independent prediction of the expected scattering, with all of the relevant energetic parameters developed and validated separately, without using the neutron diffraction experiments under analysis.49–53

This approach has been used in an extensive series of NDIS experiments conducted on specifically labeled sugars54–58 and other biological solutes, along with small-angle neutron scattering experiments59 on solutions suspected of exhibiting large-scale aggregation. Each system studied was also modeled using MD simulations to provide the data necessary to interpret the scattering data. A new procedure for separating the intermolecular and intramolecular contributions in neutron scattering data has also been developed.

Using NDIS experiments on specifically synthesized sugar samples in which one particular non-exchangeable hydrogen atom is labeled by an H/D substitution, it has been possible to extract the radial distribution function for atoms around just that labeled atom.57 No previous neutron diffraction studies of complex molecular solutes have achieved such specific resolution. The experimental structure factors contain information about both the intramolecular atomic distribution and the distribution of solvent molecules. Because only one atomic position in a sugar is labeled, the experiments have inherently small contrasts, and therefore require a very intense neutron source and sensitive and stable detectors. As a result, reactors like the one at the ILL with its D4 and D20 instruments, or the facilities of the Spallation Neutron Source at Oak Ridge, or the Target Station 2 at ISIS, are required. Experiments have been conducted for a series of d-glucose and d-xylose samples specifically labeled at each of their non-exchangeable proton positions. The doubly deuterated amino acid glycine and phenols doubly labeled at H3 and H5 and at H2 and H6 have also been studied. However, an unexpected difficulty was encountered, which required the development of new analysis procedures. Even in a singly labeled NDIS experiment, carried out in both H2O and D2O to give double difference structure factors and radial distribution functions, the residual scattering data is dominated by intramolecular correlations, which can obscure the desired intermolecular structure.

Figure 1 illustrates this point most effectively with data from MD simulations for d-xylose, which, unlike the experimental data, can be resolved into individual components in complete detail. For the indicated H4-substituted case, the solvent first peak, shown in green, is swamped by the intramolecular correlations, and in general the most prominent peaks in the total radial distribution functions extracted from the experimental structure factors are intramolecular correlations. Clearly, this problem potentially limits the determination of intermolecular structuring. An approach has been developed to overcome this difficulty, which is described below. However, we have also exploited this sensitivity to intramolecular correlations to develop a new type of structural probe for molecules with conformational flexibility. For example, consider the d-xylose molecule shown in the inset of Figure 1. Clearly, some of the intramolecular distances in this molecule, such as the H4–C3 and H4–C5 correlations, are approximately fixed, apart from vibrational oscillations, relative to the labeled H4 aliphatic proton. Others, however, such as the distance from H4 to the exchangeable proton HO4, vary with conformation as the hydroxyl group rotates. This variation can thus be exploited to yield a measure of the rotameric conformation in solution. By performing a series of constrained MD simulations for each hydroxyl conformer, the resulting computed structure factors or radial distribution functions can be compared to the experimental results to determine which conformation best reproduces the experimental data. In the case of the H4-labeled xylose example, the diffraction data was found not to be consistent with a conformation with the hydroxyl proton trans to the labeled H4 atom. However, because of the near degeneracy of the two gauche positions with respect to H4, this experiment could not distinguish between these conformers.
A subsequent experiment in which the two non-exchangeable hydrogen atoms on the C5 carbon were substituted with deuterium allowed the experimental conformation to be identified as a mixture of the trans-C5 and trans-C3 conformers in the approximate ratio 75%:25%.58 Such conformational determination experiments will be referred to as conformational analysis using neutron diffraction with isotopic substitution experiments, or CANDIS. This heuristic CANDIS approach has been used to successfully determine the conformations of several groups in aqueous solution,56–58 including the exocyclic hydroxymethyl group of d-glucose.57 No previous neutron diffraction experiments have used such approaches to extract conformational information about a large solute molecule in aqueous solution, and these results serve to demonstrate and validate this newly developed technique.
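
The comparison step of such an analysis can be sketched in Python as follows. The text describes the comparison only as heuristic, so the specific measures used here are assumptions made for illustration: candidates are ranked by root-mean-square residual against the experimental curve, and a two-conformer population is estimated by a one-parameter least-squares fit. The function names are hypothetical, and all curves are assumed to be sampled on the same Q (or r) grid.

```python
import math


def rms_residual(model, expt):
    """Root-mean-square difference between a computed and a measured curve."""
    return math.sqrt(sum((m - e) ** 2 for m, e in zip(model, expt)) / len(expt))


def rank_conformers(candidates, expt):
    """Candidate labels sorted from best to worst agreement with experiment.

    candidates : dict mapping a conformer label (e.g. a constrained torsion
                 angle) to its computed curve.
    """
    return sorted(candidates,
                  key=lambda name: rms_residual(candidates[name], expt))


def two_state_fraction(curve_a, curve_b, expt):
    """Least-squares x in expt ~ x * curve_a + (1 - x) * curve_b.

    Closed form: x = <expt - b, a - b> / <a - b, a - b>, clipped to [0, 1].
    """
    num = sum((e - b) * (a - b) for a, b, e in zip(curve_a, curve_b, expt))
    den = sum((a - b) ** 2 for a, b in zip(curve_a, curve_b))
    return min(1.0, max(0.0, num / den))
```

In practice the residual would presumably be restricted to the reliable portion of the measured data, as noted in the caption of Fig. 1. Two conformers whose computed curves are nearly identical make the denominator in `two_state_fraction` small, which is the numerical counterpart of the near degeneracy described above.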

The separation of the intramolecular and intermolecular correlations in the data from a scattering experiment, even in a singly labeled NDIS experiment such as just described, is a difficult problem. However, a method has been developed that allows this separation.60 This approach involves performing experiments at different concentrations and is based on the fact that the multiplicative prefactors for the intermolecular and intramolecular contributions to the total radial distribution function have different concentration dependences.60 That is, while the intermolecular contribution varies with concentration, the intramolecular contribution remains constant. Exploiting this intramolecular coordination number concentration invariance (ICNCI), a pair of first-order NDIS experiments of natural abundance and labeled solutes in aqueous solution can be carried out at two different concentrations and used to extract information about the intermolecular radial distribution difference function around the labeled atom. MD simulations of the system under the same conditions can then be used to guide the interpretation of the peaks.

The first application of this newly developed ICNCI method for disentangling inter- and intramolecular correlations in experimental data was to pyridine in aqueous solution.60 Experiments were conducted on natural abundance and perdeuterated pyridine at two different concentrations, 1.0 and 5.0 m. It can be shown that, in practical terms, a simple weighted subtraction of two first-order difference total radial distribution functions \( G_{{{H_{\rm{non}}}}}^X(r) \) yields the ICNCI function \( \Delta G_{{{H_{\rm{non}}}}}^X(r) \) which, ignoring a structureless offset, will eliminate all intramolecular correlations,

$$ \Delta G_{H_{\rm{non}}}^X(r) = \frac{\rho_2}{c_{{H_{\rm{non}}}_2}} G_{H_{\rm{non}}}^X(r)_2 - \frac{\rho_1}{c_{{H_{\rm{non}}}_1}} G_{H_{\rm{non}}}^X(r)_1 $$
(4)

Since \( \frac{{{\rho_2}}}{{{c_{{{H_{\rm{non}}}_2}}}}}G_{{{H_{\rm{non}}}}}^X{(r)_2} \) may be back-transformed into Q space to yield \( \frac{1}{{{c_{{{H_{\rm{non}}}_2}}}}}\Delta S_{{{H_{\rm{non}}}}}^X{(Q)_2} \), and similarly for solution 1, the difference function \( \Delta G_{{{H_{\rm{non}}}}}^X(r) \) can be expressed in Q space as

$$ \Delta \Delta S_{{{H_{\rm{non}}}}}^X(Q) = \frac{1}{{{c_{{{H_{\rm{non}}}_2}}}}}\Delta S_{{{H_{\rm{non}}}}}^X{(Q)_2} - \frac{1}{{{c_{{{H_{\rm{non}}}_1}}}}}\Delta S_{{{H_{\rm{non}}}}}^X{(Q)_1} $$
(5)

where \( \Delta \Delta S_{{{H_{\rm{non}}}}}^X(Q) \) represents the experimental information that retains only those correlations that vary with concentration; that is, only the intermolecular correlations. Figure 2 displays these functions from the pyridine experiments, compared with the simulation data. As can be seen, the agreement between the experiment and simulations is quite good. Consistent with previous studies, pyridine was found to aggregate in aqueous solution, as could be seen in both the MD and the experimental data.
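
The weighted subtraction of Eq. 4 can be illustrated with a short Python sketch. The subtraction itself follows the equation directly. The toy model used to exercise it is an assumption of this example: each total function is written as a concentration-independent intramolecular shape scaled by c/ρ plus an intermolecular part, which is the form implied by Eq. 4, and the structureless offset mentioned in the text is ignored. All names and numerical values are placeholders.

```python
import math


def icnci_difference(g1, g2, rho1, rho2, c1, c2):
    """Eq. 4: (rho2 / c2) * G(r)_2 - (rho1 / c1) * G(r)_1 on a common r grid.

    g1, g2     : first-order difference functions at concentrations 1 and 2
    rho1, rho2 : number densities of the two solutions
    c1, c2     : atomic concentrations of the labeled atom in each solution
    """
    return [(rho2 / c2) * y2 - (rho1 / c1) * y1 for y1, y2 in zip(g1, g2)]


def toy_total_g(r_values, rho, c, inter):
    """Toy model (an assumption): G = (c / rho) * intra(r) + inter(r),
    with one Gaussian intramolecular peak placed arbitrarily at r = 2.2."""
    intra = [math.exp(-((r - 2.2) / 0.1) ** 2) for r in r_values]
    return [(c / rho) * a + b for a, b in zip(intra, inter)]
```

With inputs built from `toy_total_g`, the intramolecular peak cancels and only the rescaled intermolecular parts remain. With real data, both curves would first have to be placed on the same r grid.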

Fig. 2

The comparison of neutron diffraction data for pyridine in water with the results of MD simulations. Left, the function \( \Delta \Delta S_{{{H_{\rm{non}}}}}^X(Q) \); right, the function \( \Delta G_{{H{\rm{sub}}}}^X(r) \). In both, the blue curve is the prediction from MD, and the raw experimental data is shown in black, while the experimental function after removal of the Placzek effect, the application of a smoothing function, and the cropping of the data at 8 Å⁻¹ in reciprocal space, is shown in red. Note, in particular, that the peak at 7.5 Å, primarily related to the longer range aggregation of pyridine, is reproduced by the molecular dynamics.

Figure 3 illustrates density contours calculated from the associated MD simulations, demonstrating the manner in which pyridine molecules associate in aqueous solution. As can be seen from the positions of the separate density clouds for the aromatic protons and the carbon atoms, the pyridine molecules in these simulations aggregate in a “T”-shaped geometry. Water molecules tended to hydrate the pyridine by hydrogen bonding as a donor with the ring nitrogen atom, although those other first-neighbor water molecules that could not make such hydrogen bonds tended to turn their negatively charged oxygen atoms toward the weakly positive pyridine protons (Figure 3).

Fig. 3

Contours of the density of pyridine carbon atoms (in orange), hydrogen atoms (in white), and water oxygen atoms (in red) around a central pyridine molecule, as calculated from MD simulations. The contour level corresponds to 2.9× bulk density.

Conclusions

With advances in neutron sources and detector stability and sensitivity, it has become possible to conduct neutron diffraction with isotopic substitution experiments in which single atomic positions in complex biological solutes are specifically isotopically labeled. Combined with realistic computer simulations of the same systems, it becomes possible to identify and analyze specific peaks in terms of their structural origins. Such experiments can then be used to not only probe the conformation of the solute in aqueous solution (CANDIS), but also to isolate the intermolecular correlations (ICNCI). This separation of contributions to the scattering structure factors allows high-resolution neutron diffraction experiments to be used to study such questions as how solvation changes conformational equilibria, as well as how solute molecules aggregate in an aqueous environment. For example, diffraction data at different concentrations and the ICNCI method, combined with MD simulations, confirm the already known tendency of pyridine to aggregate in water, and to associate via a “T”-shaped geometry. Hopefully, many other such applications will be possible, opening up a range of new possibilities for using neutron diffraction data in the study of biological aqueous solutions.