Part 03: Modeling methodologies

“Data modeling in software engineering is the process of creating a data model for an information system by applying formal data modeling techniques.”

Data modeling is a process used to define and analyze data requirements needed to support the business processes within the scope of corresponding information systems in organizations. Therefore, the process of data modeling involves professional data modellers working closely with business stakeholders, as well as potential users of the information system.

There are three different types of data models produced while progressing from requirements to the actual database to be used for the information system. The data requirements are initially recorded as a conceptual data model which is essentially a set of technology independent specifications about the data and is used to discuss initial requirements with the business stakeholders. The conceptual model is then translated into a logical data model, which documents structures of the data that can be implemented in databases. Implementation of one conceptual data model may require multiple logical data models. The last step in data modeling is transforming the logical data model to a physical data model that organizes the data into tables, and accounts for access, performance and storage details. Data modeling defines not just data elements, but also their structures and the relationships between them.

Conceptual, logical and physical schemas

In 1975 ANSI described three kinds of data-modelinstance:

  • Conceptual/ semantic schema: describes the semantics of a domain (the scope of the model). For example, it may be a model of the interest area of an organization or of an industry. This consists of entity classes, representing kinds of things of significance in the domain, and relationships assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial “language” with a scope that is limited by the scope of the model. Simply described, a conceptual schema is the first step in organizing the data requirements.
  • Logical schema: describes the structure of some domain of information. This consists of descriptions of (for example) tables, columns, object-oriented classes, and XML tags. The logical schema and conceptual schema are sometimes implemented as one and the same.[3]
  • Physical schema: describes the physical means used to store data. This is concerned with partitions, CPUs, tablespaces, and the like.

According to ANSI, this approach allows the three perspectives to be relatively independent of each other. Storage technology can change without affecting either the logical or the conceptual schema. The table/column structure can change without (necessarily) affecting the conceptual schema. In each case, of course, the structures must remain consistent across all schemas of the same data model.

Semantic/ Conceptual data model Schema
A conceptual schema is a high-level description of a business’s informational needs. It typically includes only the main concepts and the main relationships among them. Typically this is a first-cut model, with insufficient detail to build an actual database.This level describes the structure of the whole database for a group of users. The conceptual model is also known as the data model as data model can be used to describe the conceptual schema when a database system is implemented. It hides the internal details of physical storage and targets on describing entities, datatype, relationships and constraints.

A conceptual schema or conceptual data model is a map of concepts and their relationships used for databases. This describes the semantics of an organization and represents a series of assertions about its nature. Specifically, it describes the things of significance to an organization (entity classes), about which it is inclined to collect information, and characteristics of (attributes) and associations between pairs of those things of significance (relationships).

Because a conceptual schema represents the semantics of an organization, and not a database design, it may exist on various levels of abstraction. The original ANSI four-schema architecture began with the set of external schemas that each represent one person’s view of the world around him or her. These are consolidated into a single conceptual schema that is the superset of all of those external views. A data model can be as concrete as each person’s perspective, but this tends to make it inflexible. If that person’s world changes, the model must change. Conceptual data models take a more abstract perspective, identifying the fundamental things, of which the things an individual deals with are just examples.

The model does allow for what is called inheritance in object oriented terms. The set of instances of an entity class may be subdivided into entity classes in their own right. Thus, each instance of a sub-type entity class is also an instance of the entity class’s super-type. Each instance of the super-type entity class, then is also an instance of one of the sub-type entity classes.

Logical data model schema
A logical schema is a data model of a specific problem domain expressed in terms of a particular data management technology.

Without being specific to a particular database management product, it is in terms of either relational tables and columns, object-oriented classes, or XML tags. This is as opposed to a conceptual data model, which describes the semantics of an organization without reference to technology, or a physical data model, which describes the particular physical mechanisms used to capture data in a storage medium.

The next step in creating a database, after the logical schema is produced, is to create the physical schema.

Physical data model schema
Physical schema is a term used in data management to describe how data is to be represented and stored (files, indices, et al.) in secondary storage using a particular database management system (DBMS) (e.g., Oracle RDBMS, Sybase SQL Server, etc.).

In the ANSI/SPARC Architecture three schema approach, the internal schema is the view of data that involved data management technology. This is as opposed to an external schema that reflects an individual’s view of the data, or the conceptual schema that is the integration of a set of external schemas.

Subsequently the internal schema was recognized to have two parts:

The logical schema was the way data were represented to conform to the constraints of a particular approach to database management. At that time the choices were hierarchical and network. Describing the logical schema, however, still did not describe how physically data would be stored on disk drives. That is the domain of the physical schema. Now logical schemas describe data in terms of relational tables and columns, object-oriented classes, and XML tags.

A single set of tables, for example, can be implemented in numerous ways, up to and including an architecture where table rows are maintained on computers in different countries.


About Tobias Riedner
Tobias Riedner foundet in 2015. He works and worked as innovator, consultant, analyst and educator in the fields of business intelligence and data warehousing. He learned a lot from the best consultants, managers und educators in the past and shares his knowledge worldwide. He works for a steady growing traditional company which is a leader in industry 4.0.