BA371 — Database Concepts
- Relational database: developed by E.F. Codd (IBM) in 1970s.
- Founded on relational algebra.
- Dominant database model in business (Note: NoSQL databases are up and coming).
- Relation (≠ relationship):
- Relations (tables in an RDBMS) represent types of 'things' or entities; basic representations of concepts.
- Rows (or records or tuples) represent instances of those entities.
- Columns (or fields) represent characteristics (attributes) of the entity.
- Columns represent only a single attribute.
- Cells (row-column combinations) hold single values of a single instance of an entity.
- All rows in a relation/table describe the same type of entity (different from a spreadsheet).
- Rows are (must be) unique within the table.
- Column names are (must be) unique within the table.
- !!! Row and column order are irrelevant !!!
- What can we store in a cell (row-column combination)?
- Character: ASCII
(American Standard Code for Information Interchange) or Unicode characters.
- String (n chars).
- Number (integer, float, size in digits).
- Date or date/time.
- Manufacturer-provided date types:
- e.g., Oracle standard date: 7 bytes: century(1), year(2), month(3), day(4), hour(5), minute(6), second(7).
- You understand how this works?
- Why is this representation wasteful?
- Microsoft SQL Server:
- Datetime: 8 bytes (accuracy: 3/100 second).
- Smalldatetime: 4 bytes (accuracy: 1 minute).
- Microsoft Access: 8 bytes (accuracy: 1 second).
- General computing principle at work here: detail comes at a cost (of memory and/or storage and/or speed).
- Always(!!) consider that cost: e.g., assume the following (near) real time system:
- 500 data sources with 1-second accuracy; e.g., a building control system.
- 1 data point per source per minute.
- = 500 x 60 = 30,000 data points per hour.
- = 500 x 60 x 24 = 720,000 data points per day.
- = 500 x 60 x 24 x 365 = 262,800,000 data points per year.
- = 262,800,000 * (Oracle) 7 bytes = 1,839,600,000; 1.8 GB of date/timestamps
- Does not include the observed values!!
- Size roughly doubles when a DB index is added.
- Takes RDBMS time to find data.
Santander's (Spain - 2013): network of 12,000 sensors
- Toyota (2018): working on collecting data from cars every 200 ms. —> 7M datapoints per car per day.
- IoT: 'Internet of Things.'
- Internet of Aircraft Things:
"Pratt & Whitney's Geared Turbo Fan (GTF) engine, which is fitted with 5,000 sensors that generate up to 10 GB of data per second."
- Problem: Can we do better? i.e., can
we think of a different date/time model that still is accurate to the second yet saves lots of space? At what cost?
- Julian date.
- Bytes (BLOB: Binary Large OBject / LBO (Large Binary Object); series of bytes; e.g., image, text, memory, etc.).
- Can contain any series of bytes.
- No structure for the BLOB/LBO is assumed other than it being a series of bytes.
- Problem: When is this useful?
- Problem: When is this less useful? (e.g., when storing data in binary files; e.g., images).
- Object-oriented data models:
- Data are organized hierarchically; e.g., a person has a name, gender and address.
- Employees are persons, but also have a social security number and benefits.
- Customers are persons, but also have a credit limit.
- Relational — OO comparison:
|table ||lists or array|
|table row ||object|
|table column ||attribute/variable|
- Became en vogue during the object-oriented programming (OOP) boom (second half of the 1980s, early 1990s).
- Most vendors offer a version.
- Most OOP languages / DBMS offer conversion methods.
- XML databases:
- Data are also organized hierarchically:
<street>>1000 Some Street</street>
Although XML is very common, XML databases are less common.
Since XML is text based as well as verbose, it is an effective but not very efficient data storage medium.
XML is mostly used for data description and API data transfer.
Will discuss and practice later in the course.
Logical/Conceptual vs. Physical model:
- Nonrelational, non-hierarchical.
- Arbitrarily complex data structures.
- Invented to bypass relational integrity, normality and cost when scaled up big.
- No data representation standards; no (standard) query language.
- No integrity constraints.
- Will discuss and practice later in the course.
- Logical/Conceptual: the way we express the organization of the database; e.g., tables, rows, columns.
- Physical: the way data are stored on our machines; e.g., indexed binary files; distribution of table data over different disks, etc.