Extending the Entity-Attribute-Value Model

Nadkarni, Prakash M.

doi:10.1007/978-0-85729-510-1_11

Prakash M. Nadkarni M.D.²

Part of the book series: Health Informatics ((HI))

985 Accesses

Abstract

In a previous chapter, I introduced the use of the Entity-Attribute-Value model for representing highly variable clinical data. EAV is appropriate for clinical data because the universe of clinical attributes that can apply to a patient are both numerous and sparse. I emphasized that the use of the EAV model is unworkable in the absence of a supporting metadata infrastructure that supports validation and presentation and, of course, code that operates on this metadata.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 99.00; Price excludes VAT (USA)

Softcover Book: USD 129.00; Price excludes VAT (USA)

Hardcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

PostGreSQL PostGreSQL documentation. Available from: www.postgresql.org/docs/. [cited 10/5/10], 2010.
World Wide Web Consortium. SPARQL query language for RDF. Available from: http://www.w3.org/TR/rdf-sparql-query/. [cited 11/1/10] 2008.
Nadkarni PM, Marenco L, Chen R, Skoufos E, Shepherd G, Miller P. Organization of heterogeneous scientific data using the EAV/CR representation. J Am Med Inform Assoc. 1999;6(6):478-493.
Article PubMed CAS Google Scholar
Nadkarni P, Marenco L Easing the transition between attribute-value databases and conventional databases for scientific data: Proceedings of the AMIA Fall Symposium, Washington, DC, Hanley & Belfus; 2001:483-487.
Google Scholar
Marenco L, Tosches N, Crasto C, Shepherd G, Miller P, Nadkarni P. Achieving evolvable web-database bioscience applications using the EAV/CR framework: recent advances. J Am Med Inform Assoc. 2003;10(5):444-453.
Article PubMed Google Scholar
Marenco L, Nadkarni P, Skoufos E, Shepherd G, Miller P. Neuronal database integration: the Senselab EAV data model: Proceedings of the AMIA Symposium; 1999:102-106.
Google Scholar
Kimball R, Caserta J. The Data Warehouse ETL Toolkit. New York: Wiley Computer Publishing; 2008.
Google Scholar
Slezak T, Branscomb E. An integrated browser for chromosome 19 physical mapping data: The Cold Spring Harbor Meeting on Genome Mapping and Sequencing. Cold Spring Harbor, New York; 1992:220.
Google Scholar
Friedl S SQL injection attacks by example Available from: http://unixwiz.net/techtips/sql-injection.html. [cited 11/5/10] 2010

Download references

Author information

Authors and Affiliations

School of Medicine, Yale University, Cedar St. 333, 06510, New Haven, Connecticut, USA
Dr. Prakash M. Nadkarni M.D.

Authors

Dr. Prakash M. Nadkarni M.D.
View author publications
You can also search for this author in PubMed Google Scholar

Appendices

Appendix 11.1: Metadata Schema Details

The text in this section comprises a detailed annotation of Fig. 11.1.

11.1.1 Meta_Classes

Class ID: Surrogate Integer Primary Key.
Class Name, Caption, User Description, Developer Description, Datetime_Created, DateTime_LastModified: Mostly self-explanatory. User Description is presented in automatically generated end-user help. Developer Description contains developer-oriented documentation. For physical tables, the Class Name is the physical table name.
Class Type: One of: View, Physical Table, EAV, Hybrid.
Row-level security required: If true, every row in a class instance is stamped with the user who created it - only this user can alter a given record, others can only look at it.
Primary Key Expression: The comma-separated list of fields that comprise the table’s primary key.
Is Autonumber Primary Key: If true, the primary key is based on an auto-numbered artificial key.
Current Version Number: An integer incremented each time the class is modified.

11.1.2 The Meta_Attributes Table

Contains one entry for every field in every table in the TrialDB schema (including itself), so as to make the database self-describing. Note that many of the columns that describe an attribute are identical in function (and therefore, have the same name, as the validation/presentation-information columns in the Parameters table of chapter 5.

The columns are described below.

Attribute_ID: Artificial Long Integer Primary Key.
Attribute Name, Caption, User Description, Developer Description, Datetime_Created, Datetime_LastModified: As for Meta_Classes.
Attribute Representation: This value is editable (and consulted by the software) only if the attribute’s parent class type is Hybrid. Can be one of: Physical Column or EAV. (If the parent class type is a Physical Table, then a representation of Physical Column is implied: if EAV, then an EAV representation is implied.)
Class Id: Foreign key into the Meta_Classes table. Refers to the Class of which the Attribute is a member.
Data type: The logical data type of the attribute (e.g., integer, decimal,string, class, etc.).
Schema_datatype: For attributes that are physical columns, the data type as defined in the DBMS (e.g., varchar(255)).
Physical_Order_Sequence_Number: For physical tables and views, the position of the column within the table.
Sequence_Number: The order in which this attribute is presented when inspecting members of the class. This may be different from physical order.
Attribute_Class: This is not null only if the Attribute’s Data Type is Class (and the attribute’s data is represented in EAV form): it records the ID of the Class to which the referenced object must belong. In such a case, data for this attribute is stored in the EAV_Objects table.
Attribute Group: A string that serves as a grouping label. Attributes with the same label are placed on a page on a multi-tabbed section of the form: the string also doubles as the label on the tab. If no string is supplied, the attribute is placed in a non-tabbed section of the form, so that it is constantly visible.
Present_In_Listview: If true, the attribute should be shown in List view.
URL Template: a string used to generate a hyperlink to an external data source: see the discussion of PubMed ID earlier. The placeholder in the template (where the value of the attribute is inserted) is indicated by a vertical bar. If the URL Template is not null, when the attribute is in display mode, it is displayed as a hyperlink.
Class_Select_Method How are instances of the class accessed- by pull-down menu or through a search form? Use the latter if the number of instances is large.
Part_Of_Search_Fields If true, this attribute is shown if the class is displayed in a pull-down or list box if a class member is to be searched for and selected.
Searchable: Is the field important enough to be commonly searched by? If so, its content is incorporated in a full-text index for that class.
Multi_Instance In our own design, this is applicable only when the data type is Object. If true, this allows an array of objects. For multi-instance attributes, we also record the integer values of Minimum Repeat Count and Maximum Repeat Count, to specify the minimum and maximum number of times this particular attribute must occur. (If not specified, the number of occurrences is indefinite.)
Is_Stored_Encrypted If true, the field is stored encrypted. Certain fields related to Personal Health Information are stored this way: they are decrypted/encrypted in the middle tier.

Choice_Set_ID, Mandatory, Default_Value, Upper_Bound, Lower_Bound, Upper_Bound_Warning, Lower_Bound_Warning, RegExp, RegExp_Error_Message, Allow_Future_Dates, Conditonal_Choice_Set, Max_String_Length, BLOB_File_Extension, Max_BLOB_Size_KB, Computed_Formula, Display_Width, Display_Height, Orientation, ReadOnly, Invisible: Validation and Presentation-related columns. See the description of these columns in the narrative accompanying Fig. 5.2 of Chap. 5. (Most of these are far more important than some of the fields described earlier: in the metadata- browsing interface, several of these would be presented in list view. I’ve just placed them at the end to avoid repetition.)

Appendix 11.2: Alternative Means of Representing EAV Data

11.2.1 Using an XML Column for Sparse Attributes

For the special case of hybrid classes, the database/XML guru Dejan Sarka, in the very instructive text “Inside Microsoft SQL Server 2008: T-SQL Programming” (Itzik Ben-Gan et al., Microsoft Press) illustrates that XML can be used as an alternative to EAV. This is not surprising: XML is a kind of attribute-value data representation that is based on structured text rather than relational tables. The high-end RDBMSs support XML as a native data type (including the ability to associate a specific XML schema with a specific column for validation purposes).

Sarka’s solution depends on first defining a temporary table with all the columns that you would consider using EAV for, associating each column with the necessary validation constraints (which are defined using your vendor’s dialect of SQL) and then using your database’s XML-schema-generating capability to define an XML equivalent of this table. Then this table is dropped, and the physical table that forms the core of your hybrid class is modified by adding an XML column, whose associated schema is then based on the one just generated.

This solution is undoubtedly much more CPU-efficient than the EAV approach for displaying the content of a single object – it is not scattered in multiple type-specific tables, but is right there in the same row of the physical table. While the contents of an XML column can be indexed, the indexes are nowhere as efficient or compact as those for traditional columns, so that in circumstances where cross-object query based on the sparse attributes would be important, it would arguably be less efficient.

However, the XML is only an alternative storage mechanism: this solution does not bypass the need to describe the sparse attributes in metadata. It does not solve the user-interface generation problem (including presenting the data in a friendly fashion and allowing its editing through forms). By the same token, without metadata, it does not address the issue of validating the user’s input and providing end-user-comprehensible error messages (as opposed to the non-informative computer-ese that would be emitted by default if the XML content failed validation).

A programming issue that you need to be aware of is that, when you are retrieving heterogeneous data using a SQL generator that considers how individual classes are represented, you will have to generate XQuery code instead of SQL. Code generation may be considerably more complicated unless you use a standard design approach for structuring your XML. Further, individual vendors’ implementations of XQuery vary with respect to which features of the XQuery 1.0 standard are/are not supported. Finally, while an implementation will allow you to access the contents of non-XML columns in your XQuery code, the functions that will let you do so will necessary be vendor-specific.

11.2.2 Microsoft SQL Server Sparse Columns

Another approach to modeling hybrid classes is Microsoft SQL Server 2008s sparse column mechanism. Columns with a basic (string, numeric, datetime) data type (e.g., numeric, varchar or datetime columns) can be designated as sparse: up to 30,000 columns per table can be created this way. Sparse columns take up zero space for NULL valuesm and only rows containing values are indexed. When listing the contents of a table containing sparse columns with a “SELECT * ” statement, all sparse columns’ contents are concatenated into a chunk of XML, where each value is sandwiched within open-and close-column-name tags.

Once again, as in the case of XML, sparse columns are only a storage mechanism. Do not delude yourself into thinking that they will save you the task of defining metadata: as a matter of fact, SQL constraints (i.e., any validation other than type checking) cannot be defined on sparse columns (simply because Microsoft says so). Unless you define such constraints in metadata, and implement them in software, using sparse columns “as is” is an invitation to trouble. In fact, I consider the current implementation of sparse columns as a poorly-conceived design trap for the naive.

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Nadkarni, P.M. (2011). Extending the Entity-Attribute-Value Model. In: Metadata-driven Software Systems in Biomedicine. Health Informatics. Springer, London. https://doi.org/10.1007/978-0-85729-510-1_11

Download citation

DOI: https://doi.org/10.1007/978-0-85729-510-1_11
Published: 29 April 2011
Publisher Name: Springer, London
Print ISBN: 978-0-85729-509-5
Online ISBN: 978-0-85729-510-1
eBook Packages: MedicineMedicine (R0)

Publish with us

Policies and ethics