Developing High Quality Data Models


Data modeling

About the Author: Matthew West spent over 20 years as a leading data modeler for Shell, where he was a key technical contributor to data modeling and data management standards and their application. Matthew was responsible for Shell's Downstream Data Model. He currently serves as the Director of Information Junction, a data architecture and analysis consultancy in the UK. Matthew is a Visiting Professor at the University of Leeds.

By the way, the first prerequisite to data model quality is simply a passion for data modeling. If you are not a fanatic about making sure that each attribute and entity type has a precise definition, that each box is properly aligned, and that each symbol is in exactly the right position, you will never produce a great model, no matter what technique or approach you use.

At the end of the article, we will discuss how to go about producing a quality data model. The first of these specifically addressed data model quality.

The first principle goes to the heart of the meaning of an entity type: an entity type in a conceptual model is a thing of significance to the organization, about which it wishes to keep information. It is the thing itself, not the thing playing a role. Unfortunately, if you scan your interview notes for nouns as candidate entity types, many of the nouns will combine references to things with the roles they play.

Figure 1 shows two common models, describing customer and vendor: Each customer may be a buyer in one or more sales orders; Each vendor may be a seller in one or more purchase orders. These models raise a question, however: what if the same person or organization is both a customer and a vendor?


Or, what if a division of your own company is a customer? Figure 2 shows an alternative. This model actually addresses two issues. First, it recognizes that we will often have relationships that apply to either a person or an organization. By defining the super-type party, we can talk about people and organizations together. Second, we have generalized the concepts of sales order and purchase order to, simply, order. This recognizes that the two kinds of orders are in fact the same thing.


The only difference is whether the viewer of the model is buying or selling. More than that, by explicitly recognizing that every order has both a buyer and a seller in it, we have also recognized that we are a party, just like all those others we deal with. This turns out to be a useful thing. Indeed, the values of our attributes can be stored just like those for everyone else. By the way, note that if you are successful in creating a model of true entity types, and have correctly assigned single-valued attributes to them, you will have by definition created a fully normalized model.
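As a rough sketch of the generalization described above, the party super-type and the single order entity type might look as follows in Python. The class and field names (`Party`, `Order`, `buyer`, `seller`, `party_id`) and the sample data are illustrative assumptions, not taken from the article's figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    """Super-type: anything that can appear as a buyer or a seller."""
    party_id: int  # hypothetical surrogate identifier
    name: str


@dataclass(frozen=True)
class Person(Party):
    pass


@dataclass(frozen=True)
class Organization(Party):
    pass


@dataclass(frozen=True)
class Order:
    """One entity type stands in for both 'sales order' and 'purchase order'."""
    order_id: int
    buyer: Party
    seller: Party


# Our own company is a party like any other.
us = Organization(1, "Our Company")
acme = Organization(2, "Acme Ltd")
pat = Person(3, "Pat")

orders = [
    Order(101, buyer=pat, seller=us),    # a sale, from our point of view
    Order(102, buyer=us, seller=acme),   # a purchase, from our point of view
    Order(103, buyer=acme, seller=us),   # Acme is also a customer
]

# "Customer" and "vendor" are roles derived from orders, not entity types.
customers = {o.buyer.name for o in orders if o.seller == us}
vendors = {o.seller.name for o in orders if o.buyer == us}

print(sorted(customers))
print(sorted(vendors))
print(sorted(customers & vendors))  # parties playing both roles
```

In this sketch a party that both buys from us and sells to us needs no special handling; it simply appears on both sides of different orders.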

To say that all attributes of an entity type are a function of the key, the whole key, and nothing but the key is to say that they are in fact attributes of this thing. By definition, a relational database consists of flat, two-dimensional tables only. Among other things, it allows you to show that the concrete things most people see are examples of more general concepts.

The informal justification for a sub-type is to illustrate the kinds of things represented by the super-type, even if there are no differences in their attributes or relationships. This can be very useful in the presentation of a model, to clarify the meaning of the super-type. Figure 3 shows examples of both of these uses. It represents the kinds of facilities that constitute an oil field.

The kinds of subsurface facilities (the stuff in an oil well) have distinct attributes, as well as specific relationships among them that could not be expressed if they were not broken out into sub-types. Specifically, each well may be composed of one or more completions, and each well may be composed of one or more wellbores. On the other hand, there seems to be no formal justification for the sub-types of surface facility.

As nearly as we can tell from the figure, the attributes of a steam generator, a dehydration plant or any other surface facility are all the same. Further analysis may in fact show different attributes for the different sub-types, but even if it did not, this would still be a reasonable way to present surface facility.

The first draft of a model is likely to contain many-to-many relationships. Indeed, even many of the one-to-many relationships, upon further reflection, turn out to be many-to-many relationships.

Normalization will require you then to define associative intersect entities to account for them.
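A minimal sketch of what such an associative entity can look like, assuming an employment association between person and organization. The `start_date` and `end_date` attributes, the helper functions and all sample data are hypothetical additions for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Person:
    person_id: int
    name: str


@dataclass(frozen=True)
class Organization:
    organization_id: int
    name: str


@dataclass(frozen=True)
class Employment:
    """Associative entity: each occurrence is of one person, in one organization."""
    person: Person
    organization: Organization
    start_date: date                 # assumed attribute
    end_date: Optional[date] = None  # assumed attribute; None means still current


ann = Person(1, "Ann")
bob = Person(2, "Bob")
oil_co = Organization(10, "Oil Co")
bank_co = Organization(20, "Bank Co")

# Many-to-many: Ann has had two employers, and Oil Co has had two employees.
employments = [
    Employment(ann, oil_co, date(2001, 1, 1), date(2005, 6, 30)),
    Employment(ann, bank_co, date(2005, 7, 1)),
    Employment(bob, oil_co, date(2003, 3, 1)),
]


def employers_of(person):
    return [e.organization.name for e in employments if e.person == person]


def employees_of(organization):
    return [e.person.name for e in employments if e.organization == organization]


print(employers_of(ann))
print(employees_of(oil_co))
```

Because the association is an entity in its own right, it has somewhere to carry attributes of its own, which a bare relationship line does not.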


Figure 4 shows employment as a relationship between person and organization. The only problem with this is that it represents considerably more company loyalty than is normally the case. Over a career a person may work for several organizations, so employment is better modeled as an associative entity type in its own right. This results in Figure 5. As it happens, over time, nearly all one-to-many relationships between reference entity types are really many-to-many relationships.

OK, if associations are to be represented as entity types, what do relationships represent? In Figure 5, above, there are two relationships between any associative entity and its reference entities. Each employment may be of one person and in one organization.

First, we draw an entity type for household, as shown in Figure 6. The attributes shown are those delivered by the market research company. This means that the model can be better represented as shown in Figure 7.

Using this structure allows you, as you could not before, to define in advance a specific list of sociological groups that are of interest. You can also now see that a car ownership is of a car model. If it were of interest, you could add a relationship to assert that each car model must be manufactured by an organization (a car company).
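The structure described above can be sketched with Python's built-in `sqlite3` module. This is only one guess at a layout: the table and column names, the single `group_id` on household, and the sample rows are all assumptions, since the figures themselves are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE sociological_group (        -- the list defined in advance
    group_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE organization (
    organization_id INTEGER PRIMARY KEY,
    name            TEXT NOT NULL
);
CREATE TABLE car_model (                 -- each car model must have a manufacturer
    car_model_id    INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    manufacturer_id INTEGER NOT NULL REFERENCES organization(organization_id)
);
CREATE TABLE household (
    household_id INTEGER PRIMARY KEY,
    group_id     INTEGER NOT NULL REFERENCES sociological_group(group_id)
);
CREATE TABLE car_ownership (             -- a car ownership is of a car model
    ownership_id INTEGER PRIMARY KEY,
    household_id INTEGER NOT NULL REFERENCES household(household_id),
    car_model_id INTEGER NOT NULL REFERENCES car_model(car_model_id)
);
""")


def add_household(household_id, group_id):
    """Return True if stored, False if the group is not in the predefined list."""
    try:
        with conn:
            conn.execute("INSERT INTO household VALUES (?, ?)",
                         (household_id, group_id))
        return True
    except sqlite3.IntegrityError:
        return False


with conn:
    conn.execute("INSERT INTO sociological_group VALUES (1, 'Young professionals')")
    conn.execute("INSERT INTO organization VALUES (1, 'Example Motors')")
    conn.execute("INSERT INTO car_model VALUES (1, 'Model A', 1)")

accepted = add_household(1, 1)    # group 1 exists
rejected = add_household(2, 99)   # group 99 was never defined

with conn:
    conn.execute(
        "INSERT INTO car_ownership (household_id, car_model_id) VALUES (1, 1)")

makers = [row[0] for row in conn.execute("""
    SELECT o.name
    FROM car_ownership co
    JOIN car_model m    ON m.car_model_id = co.car_model_id
    JOIN organization o ON o.organization_id = m.manufacturer_id
    WHERE co.household_id = 1
""")]
print(accepted, rejected, makers)
```

The foreign key from household to the group table is what makes the list "defined in advance": a household cannot be classified into a group nobody has declared.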

Income ranges can similarly be grouped into pre-defined categories. If it is not clear what kinds of categories will be of interest in the future, this model can be further generalized, as shown in Figure 8. A word of caution: Generalizing a model makes it much more flexible and robust. It insulates it against the effects of future changes to the business.

At the same time, however, it removes the model from the language of your customers. It is important to use judgment in determining how general to become. It is important to do this teaching carefully, however.


In presenting this model, begin with Figure 6 and work your way to Figures 7 and 8.

The sixth principle in the Shell paper asserts that only surrogate identifiers should be used as unique identifiers for entity types. A surrogate identifier is a system-generated number that is uniquely assigned to each row. It has no inherent meaning and is used only to distinguish one row from another. Your author respectfully disagrees with part of this one. It is the case that for reference entity types (those that are not required to be related to any other entity types), surrogate identifiers are best.

It is extremely rare to find any attributes for product, person, activity, etc. Moreover, giving in to the temptation, left over from manual identification systems, to encode all manner of information into the identifier is genuinely a bad idea. Surrogate identifiers are simpler, easier to implement, and a better guarantor of the integrity of the data.

In the case of associative entity types, however, the identity of an occurrence is, by definition, derived from the things it is associating. If the related things are cleanly identified with surrogate identifiers, there is no harm in using the relationships to those things as the identifier for the association. By definition, you cannot change the values of such a composite identifier for an occurrence and keep the same occurrence. Even though this is using a surrogate for part of the identifier, it is useful to keep the natural identifier of the entity type visible.
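One way to picture the distinction, again using Python's built-in `sqlite3` module: reference tables get meaningless surrogate keys, while the associative table is identified by its two relationships. The `offering` table, its `list_price` column and the sample rows are hypothetical and not taken from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE party (
    party_id INTEGER PRIMARY KEY,        -- surrogate: carries no meaning
    name     TEXT NOT NULL
);
CREATE TABLE product_type (
    product_type_id INTEGER PRIMARY KEY, -- surrogate: carries no meaning
    name            TEXT NOT NULL
);
CREATE TABLE offering (                  -- hypothetical associative entity
    party_id        INTEGER NOT NULL REFERENCES party(party_id),
    product_type_id INTEGER NOT NULL REFERENCES product_type(product_type_id),
    list_price      REAL,
    PRIMARY KEY (party_id, product_type_id)  -- identity comes from what it associates
);
""")

with conn:
    party_id = conn.execute(
        "INSERT INTO party (name) VALUES ('Acme Ltd')").lastrowid
    product_type_id = conn.execute(
        "INSERT INTO product_type (name) VALUES ('Widget')").lastrowid
    conn.execute("INSERT INTO offering VALUES (?, ?, ?)",
                 (party_id, product_type_id, 9.5))

# Renaming the party does not disturb the association, because the
# identifier encodes nothing about the party.
with conn:
    conn.execute("UPDATE party SET name = 'Acme Holdings' WHERE party_id = ?",
                 (party_id,))

row = conn.execute("""
    SELECT p.name, t.name, o.list_price
    FROM offering o
    JOIN party p        ON p.party_id = o.party_id
    JOIN product_type t ON t.product_type_id = o.product_type_id
""").fetchone()

# A second row for the same pair would be the same occurrence, so it is rejected.
try:
    with conn:
        conn.execute("INSERT INTO offering VALUES (?, ?, ?)",
                     (party_id, product_type_id, 12.0))
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

print(row, duplicate_accepted)
```

The composite key only suits associations where a given pair can occur once; if the same two things can be associated repeatedly, the key needs another component or a surrogate.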

Figure 9 shows such an identifier. In that case a surrogate identifier is better. By the way, this is a very gentle argument. To assign a surrogate identifier to all entity types, including associative ones, does minimal harm. To do so, however, does cloud the meaning of each entity type.

The second important dimension of data model quality is the use of adequate names for entity types, attributes, and relationships.

As cited in the description of Principle 1, above, an entity type is a thing of significance to the organization. As such, it should have a common, recognized natural language name. In Figure 9, for example, the names party, vendor, and product type are all names of real things. An entity type name should not be:.

An attribute is the definition of a kind of data about an entity type. This also should be a simple natural language term. There is some discussion in the industry about standardizing the structure of attribute names, but this is problematic. The idea of classifying attribute names is not a bad one, but it must be carried out with care.

A structure that produces names that are not simple or intuitive is counter-productive if the model is to be understood by the public at large.
