Docsity
Docsity

Prepare for your exams
Prepare for your exams

Study with the several resources on Docsity


Earn points to download
Earn points to download

Earn points by helping other students or get them with a premium plan


Guidelines and tips
Guidelines and tips

Database System Design: File Organization and Indexing, Slides of Introduction to Database Management Systems

An in-depth analysis of various record organization and indexing techniques used in database management systems. It covers topics such as fixed length records, variable length records, field separation alternatives, and indexing. The document also discusses the advantages and disadvantages of each technique and their impact on file access methods.

Typology: Slides

2011/2012

Uploaded on 02/13/2012

richiey
richiey 🇨🇦

32 documents

1 / 43

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
CS 448/648
1
System Structure Revisited
Naïve users Application
programmers
Casual users Database
administrator
Forms Application
Front ends DML Interface CLI DDL
Indexes System
Catalog
Data
files
DDL
Compiler
Disk Space Manager
Buffer Manager
File & Access Methods
Query Evaluation Engine
SQL Commands
Recovery Manager
Transaction
&
Lock
Manager
DBMS
2
nSome DBMS component indicates it wants to read
record R
nFile Manager
lDoes security check
lUses access structures to determine the page it is on
lAsks the buffer manager to find that page
nBuffer Manager
lChecks to see if the page is already in the buffer
lIf so, gives the buffer address to the requestor
lIf not, allocates a buffer frame
lAsks the Disk Manager to get the page
nDisk Manager
lDetermines the physical address(es) of the page
lAsks the disk controller to get the appropriate block of data
from the physical address
nDisk controller instructs disk driver to do dirty job
Disk Access Process
(Overly Simplifed)
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21
pf22
pf23
pf24
pf25
pf26
pf27
pf28
pf29
pf2a
pf2b

Partial preview of the text

Download Database System Design: File Organization and Indexing and more Slides Introduction to Database Management Systems in PDF only on Docsity!

1

Naïve users^ System Structure Revisited Application

Casual users (^) programmers Databaseadministrator Forms ApplicationFront ends DML Interface CLI DDL

Indexes (^) CatalogSystem Datafiles

Compiler^ DDL

Disk Space Manager

Buffer Manager

File & Access Methods

Query Evaluation Engine

SQL Commands

Recovery Manager

Transaction& ManagerLock

DBMS

2

n Some DBMS component indicates it wants to read record R n File Manager

l Does security check

l Uses access structures to determine the page it is on

l Asks the buffer manager to find that page

n Buffer Manager

l Checks to see if the page is already in the buffer

l If so, gives the buffer address to the requestor

l If not, allocates a buffer frame

l Asks the Disk Manager to get the page

n Disk Manager

l Determines the physical address(es) of the page

l Asks the disk controller to get the appropriate block of data

from the physical address

n Disk controller instructs disk driver to do dirty job

Disk Access Process

(Overly Simplifed)

3

Storage Hierarchy

Cache

Main Memory

Virtual Memory

File System

Tertiary Storage

Programs

DBMS

Capacity vs Cost & Speed

Secondary Storage

2-5 ns Registers

3-10 ns

80-400 ns

5,000,000 ns

4

Disks

n Direct access storage devices (DASD)

l It is possible to access data directly as well as sequentially l Accessing data has a lower overhead than serial devices (e.g., tapes)

n Types

l Fixed- head hard disks l Removable hard disks l Floppy disks

7

Disk Packs

cylinders

tracks

8

n Disk Access Time = Seek time + Rotational delay +

transfer time

n Seek time: time to move access arm to correct cylinder

l T (seek) = S + Const ∗ N , where S = Initial time, N = # of cylinders moved

n Rotational delay: time to find correct place on track

l on average: half a revolution

n Transfer time: time to move data

l (# of bytes transferred)÷(# of bytes of track ) ∗ rotation time

Cost of Disk Access

9

Improving Access Time

n Organize data by cylinders

n Use multiple disks - use mirroring

l RAID

n More intelligent disk scheduling

l Elevator algorithm

n Prefetching and buffering

10

Buffer Management

n Goal: reduce the number of disk accesses n Reside in RAM n Buffer strategies

l multiple buffering

l buffer pooling

Program data area

Program data area

I/O Buffer I/O Buffer I/O Buffer I/O Buffer

Disk

Disk

13

Field Separation Alternatives

n Fixed length fields

l A given field (e.g., NAME) is the same size for all records

l Easy and fast reading but wastes space

320587 Joe Smith SC 95 3 184923 Kathy Lee EN 92 3 249793 Albert Chan SC 94 3 n Length indicator at the beginning of each field

l Also wastes space (at least 1 byte per field)

l You have to know the length before you store

63205879Joe Smith2SC 61849238Kathy Li2EN 624979311Albert Chan2SC

14

Field Separation Alternatives

n Separate fields with delimeters

l Use white space characters (blank, new line, tab)

l Easy to read, uses one byte per field, have to be careful in the choice of

the delimeter

| 320587 | Joe Smith | SC | 95 | 3 | | 184923 | Kathy Li | EN | 92 |3| | 249793 | Albert Chan | SC | 94 | 3 | n Use keywords

l Each field has a keyword that indicates what the field is

l Self describing but high space overhead

ID= 320587 NAME= Joe Smith FACULTY= SC DEG= 92 YEAR= 3 ID= 184923 NAME= Kathy Li FACULTY= EN DEG= 92 YEAR= ID= 249793 NAME= Albert Chan FACULTY= SC DEG= 94 YEAR= 3

15

Record Organization Alternatives

n Fixed length records

l All records are the same length

320587 Joe Smith SC 95 3 184923 Kathy Lee EN 92 3 249793 Albert Chan SC 94 3

l The number and size of fields in each record may be variable

| 320587 | Joe Smith | SC | 95 | 3 | Padding | 184923 | Kathy Li | EN | 92 |3| Padding | 249793 | Albert Chan | SC | 94 | 3 | Padding

16

Record Organization Alternatives

n Variable Length Records

l Fixed number of fields

‡ Count the fields to detect the end of record

l Length field at the beginning

‡ Put the length of each record in front of it

‡ You have to buffer the record before writing

24 (^320587) Li | EN | 92 | Joe Smith | 3 26 249793 | SC || Albert Chan 95 | 3 23 (^184923) | SC | Kathy | 94 | 3

19

Accessing a File

n Sequential access

l Scan the file l Useful when file is small or most (all) of the file needs to be searched l Complexity O( n ) where n is the number of disk reads l Block records to reduce n l Block size should match physical disk organization

‡ multiples of sector size

n Direct access

l Based on relative record number (RRN) l Record-based file systems can jump to the record directly l Stream-based systems calculate byte offset = RRN * record length

20

Search Problem

Find a record with a given key value

n Sequential search: O( n )

n Binary search: O(log n )

l the file must be sorted l how to maintain the sorting order?

‡ deleting, insertion

l variable length records

n Sorting

l RAM sort: read the whole file into RAM, sort it, and then write it back to disk l Keysort: read the keys into RAM, sort keys in RAM and then rearrange records according to sorted keys l Index

21

Keysorting

320587 Joe Smith SC 95 3

184923 Kathy Lee EN 92 3

249793 Albert Chan SC 94 3

Before sorting RRN

320587 Joe Smith SC 95 3

184923 Kathy Lee EN 92 3

249793 Albert Chan SC 94 3

After sorting

Problem: Now the physical file has to be rearranged

22

Indexing

n A tool used to find things

l book index, student record indexes l A function from keys to addresses

n A record consisting of two fields

l key: on which the index is searched l reference: location of data record associated with the key

n Advantages

l smaller size of the index file makes RAM index possible l binary search from files of variable length records l rearrange keys without moving records l multiple indexes

‡ primary and secondary

25

Primary Index on Unordered Files

St. Id. Name Major Yr. 10567 15973 96256 29579 11589 84920 34596 75623

J. Doe CS 3 M. Smith P. Wright B. Zimmer T. Allen S. Allen T. Atkins J. Wong

CS

ME

BS

BA

CS

ME

BA

26

Operations

n Record addition

l Append the record to the end; Insert a record to the appropriate place in the index l Requires reorganization of the index

n Record deletion

l Delete the physical record using any feasible technique l Delete the index record and reorganize

n Record updates

l If key field is affected, treat as delete/add l If key field is not affected, no problem.

27

Primary Index on Ordered Files

n Physical records may be kept ordered on the

primary key

n The index is ordered but only one index record for

each block

n Reduces the index requirement, enabling binary

search over the values (without having to read all

of the file to perform binary search).

28

Primary Index on Ordered Files

10567 J. Doe CS 3

15973 M. Smith CS 3

11589 T. Allen BA 2

29579 B. Zimmer BS 1 34596 T. Atkins ME 4 75623 J. Wong BA 3 84920 S. Allen CS 4 96256 P. Wright ME 2

31

Secondary Index

St. Id. Name Major Yr. 10567 15973 96256 29579 11589 84920 34596 75623

J. Doe CS 3 M. Smith P. Wright B. Zimmer T. Allen S. Allen T. Atkins J. Wong

CS ME BS BA CS ME BA

3 2 1 2 4 4 3 10567 11589 15973 (^2957934596) 75623 (^8492096256)

CS

BA

ME

BS

3

1

4

2

32

Problems With Inverted Files

∂ Inverted file indexes cannot be maintained in main

memory as the database sizes and number of

keywords increase.

∑ Binary search over indexes that are stored in

secondary storage is expensive.

l Too many seeks and I/Os.

∏ Maintaining the indexes in sorted key order is

difficult and expensive.

33

B-Trees

n A B-tree of order m is a paged multi- way search tree such that

l Each page contains a maximum of m -1 keys

l Each page, except the root, contains at least

l Root has at least 2 descedants unless it is the only node

l A non-leaf page with k keys has k +1 descendants

l All the leaves appear at the same level

n Build the tree bottom- up to maintain balance

l Split & promotion for overflow during insertion

l Redistribution & concatenation for underflow during deletion

 m 2 −^1 keys

34

B-Tree Structure

P 1 K 1 D 1 P 2 K 2 D 2 P 3 K 3 D 3 P 4

-1 -1 -1 -

pointertree data pointer

Ki Kj Kr^ leafnode

K 2 < X < K 3

X

37

Insertion Example

(c) Insert A

(d) Insert M (e) Insert P, I, B, W

C D S

(a)

0

2 S

A C D T

2

0 1

D

A C I T

S

B M P W

2

0 3 1

Promotion from left

C D T

0 1 S

S

(b) Insert T

T M

A C M T

0 3 1 D

D S^2

38

Insertion Example

(f) Insert N

(g) Insert G, U, R

D

I P

N

A B C M T W

S 0 3 4 1

2

D

G P

N

A B C I M R T U W

S 0 3 4 1

2

39

Insertion Example

(g) Insert K

D

G P

N

A B C I M R T U W

S 0 3 4 1

2 K

A B C G I K M

0 3 5 P R T U W

4 1

D N S 2 ?

40

Insertion Example

(g) Insert K

0

S

B C G I K M

3 5 P R T U W

4 1

D N

2 6 S

A

S