Teradata Wiki: Teradata Indexes

Showing posts with label Teradata Indexes. Show all posts

Join Index

A Join Index is an optional index which may be created by a User. Join indexes provide additional processing efficiencies:

Eliminate base table access
Eliminate aggregate processing
Reduce joins
Eliminate redistributions
Eliminate Summary processing

The following are wide variety of Join Index names. We will discuss each in clear detail.

Single Table Join Index
Multi-Table Join Index
Multi-Table Compressed Join Index
Aggregate Join Index
Sparse Join Index
Global Join Index
Hash Index

A Single-Table Join Index duplicates a single table, but changes the Primary Index. Users will only query the base table, but the Parsing Engine will use the Join Index.

TeradataWiki-Teradata Single-Table Join Index

The reason to create a Single-Table Join Index is so joins can be performed faster because no Redistributions or Duplication needs to occur.

A Multi-Table Join index is a Join Index that involves two or more tables.
Facilitates join operations by possibly eliminating join processing or by reducing/eliminating join data redistribution.

TeradataWiki-Teradata Multi-Table Join index

A Compressed Join Index is designed to save space by not REPEATING the repeating values

TeradataWiki-Teradata Compressed Join Index

An Aggregate Join Index will allow tracking of the Aggregates SUM and COUNT on any table

TeradataWiki-Teradata Aggregate Join Index

A Sparse Join Index is a Join Index that doesn‘t use every row because it has a WHERE Clause. This is done to save space and time.

Hash Indexes are used similar to a Join Index, but Hash Indexes are maintained in AMP-Local tables and used to quickly find certain key columns in a base table.

Join Index Details:

Max 64 columns per table per Join Index.
BLOB and CLOB types cannot be defined.
Triggers with Join Indexes are allowed V2R6.
After Restoring a Table, Drop and Recreate the Join Index.
Automatically updated as table changes.
Fast load/Multi load wont load with them.
can have NUPI and NUSI.
Collect statistics on Primary and Secondary.

Partitioned Primary Index

Partitioned Primary Index (PPI) is an indexing mechanism in Teradata Database.

PPI is used to improve performance for large tables when you submit queries that specify a range constraint.
PPI allows you to reduce the number of rows to be processed by using partition elimination.
PPI will increase performance for incremental data loads, deletes, and data access when working with large tables with range constraints

Lets take Order_Table, where we have both January and February dates in column Order_Date.

The Order_Table spread across the AMPs.Notice that January and February dates are mixed on every AMP in what is a random order. This is because the Primary Index is Order_Number.

When we apply Range Query, that means it uses the keyword BETWEEN.

The BETWEEN keyword in Teradata means find everything in the range BETWEEN this date and this other date. We had no indexes on Order_Date so it is obvious the PE will command the AMPs to do a Full Table Scan. To avoid full table scan, we will Partition the table.

After Partitioned Table,
The example of AMPs on the top of the page. This table is not partitioned.

The example of AMPs on the bottom of the page. This table is partitioned.

Each AMP always sorts its rows by the Row-ID in order to do a Binary Search on Primary Index queries.

Notice that the rows on an AMP don‘t change AMPs because the table is partitioned. Remember it is the Primary Index alone that will determine which AMP gets a row. If the table is partitioned then the AMP will sort its rows by the partition.

Now we are running our Range Query on our Partitioned Table,each AMP only reads from one partition. The Parsing Engine will not to do full table scan. It instructs the AMPs to each read from their January Partition. You Partition a Table when you CREATE the Table.

A Partitioned Table is designed to eliminate a Full Table Scan, especially on Range Queries.

Types of partitioning:

RANGE_N Partitioning

Below is the example for RANGE_N Partition by day.
CREATE TABLE ORDER_TABLE

(

ORDER_NO INTEGER NOT NULL,

CUST_NO INTERGER,

ORDER_DATE DATE,

ORDER_TOTAL DECIMAL(10,2)

)

PRIMARY INDEX(ORDER_NO)

PARTITION BY RANGE_N

(ORDER_DATE BETWEEN

DATE '2012-01-01' AND DATE '2012-12-31'

EACH INTERVAL '7' DAY);

Case_N Partitioning
CREATE TABLE ORDER_TABLE

(

ORDER_NO INTEGER NOT NULL,

CUST_NO INTERGER,

ORDER_DATE DATE,

ORDER_TOTAL DECIMAL(10,2)

)

PRIMARY INDEX(ORDER_NO)

PARTITION BY CASE_N

(ORDER_TOTAL < 1000,

ORDER_TOTAL < 2000,

ORDER_TOTAL < 5000,

ORDER_TOTAL < 10000,

ORDER_TOTAL < 20000,

NO CASE, UNKNOWN);

The UNKNOWN Partition is for an Order_Total with a NULL value. The NO CASE Partition is for partitions that did not meet the CASE criteria.

For example, if an Order_Total is greater than 20,000 it wouldn‘t fall into any of the partitions so it goes to the NO CASE partition.

Multi-Level Partitioning:
You can have up to 15 levels of partitions within partitions.

CREATE TABLE ORDER_TABLE

(

ORDER_NO INTEGER NOT NULL,

CUST_NO INTERGER,

ORDER_DATE DATE,

ORDER_TOTAL DECIMAL(10,2)

)

PRIMARY INDEX(ORDER_NO)

PARTITION BY (RANGE_N

(ORDER_DATE BETWEEN

DATE '2012-01-01' AND DATE '2012-12-31'

EACH INTERVAL '1' DAY)

CASE_N (ORDER_TOTAL < 5000,

ORDER_TOTAL < 10000,

ORDER_TOTAL < 15000,

ORDER_TOTAL < 20000,

NO CASE, UNKNOWN));

Character Based Partitioning(New Feature V13.10) :
There are four new data types available for Character Based PPI. They are CHAR, VARCHAR, GRAPHIC, and VARGRAPHIC.

Example for RANGE Based Character PPI

CREATE TABLE EMP_TBL

(

EMP_NO INTEGER NOT NULL,

DEPT_NO INTEGER,

FIRST_NAME CHAR(20),

LAST_NAME VARCHAR(20),

SALARY DECIMAL(10,2),

) PRIMARY INDEX(EMP_NO)

PARTITION BY RANGE_N

(LAST NAME BETWEEN ( 'A ','B ','C ','D ','E ','F ','G ','H ',

'I ','J ','K ','L ','M ','N ','O ','P ','Q ','R ','S ','T ',

'U ','V ','W ','X ','Y ','Z ' AND 'ZZ',UNKNOWN));

Example for CASE Based Character PPI

CREATE TABLE PRODUCT_TBL

(PRODUCT_ID INTEGER NOT NULL

,PRODUCT_NAME CHAR(30)

,PRODUCT_COST DECIMAL(10,2)

,PRODUCT_DESCRIPTION VARCHAR(100)

)PRIMARY INDEX(PRODUCT_ID)

PARTITION BY CASE_N

(PRODUCT_NAME < 'Apples'

PRODUCT_NAME < 'Bananas'

PRODUCT_NAME < 'Cantaloupe'

PRODUCT_NAME < 'Grapes'

PRODUCT_NAME < 'Lettuce'

PRODUCT_NAME < 'Mangos'

PRODUCT_NAME >='Mangos' and <='Tomatoes');

Ex-Query: Find all Products between Apples and Grapes?

Ans: SELECT * FROM PRODUCT_TBL WHERE PRODUCT_NAME BETWEEN 'Apples' and 'Grapes';

Partitioning Rules:

A table can have up to 65,535 Partitions.
Partitioning never determines which AMP gets row.
Partitioning determines how an AMP will sort the row on its own.
Table can have up to 15 levels of partitions.
A table cannot have an UPI as the Primary Index if the Partition table does not include PI.
Total 3 forms of Partitioning Simple, RANGE and CASE.

Advantages of partitioned tables:

They provide efficient searches by using partition elimination at the various levels or combination of levels.
They reduce the I/O for range constraint queries
They take advantage of dynamic partition elimination
They provide multiple access paths to the data, and an MLPPI provides even more partition elimination and more partitioning expression choices, (i.e., you can use last name or some other value that is more readily available to query on.)
The Primary Index may be either a UPI or a NUPI; a NUPI allows local joins to other similar entities
Row hash locks are used for SELECT with equality conditions on the PI columns.
Partitioned tables allow for fast deletes of data in a partition.
They allow for range queries without having to use a secondary index.
Specific partitions maybe archived or deleted.
May be created on Volatile tables; global temp tables, base tables, and non-compressed join indexes.
May replace a Value Ordered NUSI for access.

Disadvantages of partitioned tables:
•Rows in a partitioned table are 2 bytes longer.
•Access via the Primary Index may take longer.
•Full table joins to a NPPI table with the same PI may take longer.

Secondary Index

Before going to learn about Secondary Index, i would like to suggest you learn Primary Index.

A Secondary Index (SI) is an alternate data access path. It allows you to access the data without having to do a full-table scan.

You can drop and recreate secondary indexes dynamically, as they are needed.Secondary Indexes are stored in separate subtables that requires additional disk space and maintenance which is handled automatically by the system.

The entire purpose for the Secondary Index Subtable will be to point back to the real row in the base table via the Row-ID.

Secondary Index Rules

Rule 1: Secondary Indexes are optional.

Rule 2: Secondary Index values can be unique or non-unique.

Rule 3: Secondary Index values can be NULL.

Rule 4: Secondary Index values can be modified.

Rule 5: Secondary Indexes can be changed.

Rule 6: A Secondary Index has a limit of 64 columns.

Like Primary Indexes,Secondary Indexes also two Types (USI or NUSI).

A Unique Secondary Index (USI) serves two purposes.

Enforces uniqueness on a column or group of columns. The database will check USIs to see if the values are unique.
Speeds up access to a row (data retrieval speed).

When A USI is created Teradata will immediately build a secondary index subtable on each AMP.

Each AMP will then hash the secondary index value for each of their rows in the base table.

In the Below diagram,each AMP hashes the Emp_no column for all employee rows they hold.

The output of the Emp_no hash will utilize the hash map to point to a specific AMP and that AMP will hold the secondary index subtable row for the secondary index value.

That means the subtable row will hold the base table Row-ID and Teradata will then find the base row immediately.

Syntax to create a Unique Secondary Index is:

CREATE UNIQUE INDEX (Column/Columns) ON .;

A Non-Unique Secondary Index (NUSI) is usually specified to prevent full-table scans, in which every row of a table is read.

When A NUSI is created Teradata will immediately build a secondary index subtable on each AMP.

Each AMP will hold the secondary index values for their rows in the base table only. In our example, each AMP holds the name column for all employee rows in the base table on their AMP (AMP local).

Each AMP Local name will have the Base Table Row-ID (pointer)so the AMP can retrieve it quickly if needed. If an AMP contains duplicate first names, only one subtable row for that name is built with multiple Base Row-IDs.

Syntax to create a Non-Unique Secondary Index is:

CREATE INDEX (Column/Columns) ON .;

There can be up to 32 Secondary Indexes on a table

An USI is always a Two-AMP operation so it is almost as fast as a Primary Index, but a NUSI is an All-AMP operation, but not a Full Table Scan.

Data Access With USI:

The SQL is submitted, specifying a USI (in this case, a customer number of 56).
The hashing algorithm calculates a row hash value (in this case, 602).
The hash map points to the AMP containing the subtable row corresponding to the row hash value (in this case, AMP 2).
The subtable indicates where the base row resides (in this case, row 778 on AMP 4).
The message goes back over the BYNET to the AMP with the row and the AMP accesses the data row (in this case, AMP 4).
The row is sent over the BYNET to the PE, and the PE sends the answer set on to the client application.
As shown in the example above, accessing data with a USI is typically a two-AMP operation. However, it is possible that the subtable row and base table row could end up being stored on the same AMP, because both are hashed separately. If both were on the same AMP, the USI request would be a one-AMP operation.

Data Access With NUSI:

The SQL is submitted, specifying a NUSI (in this case, a last name of "Smith").
The hashing algorithm calculates a row hash value for the NUSI (in this case, 567).
All AMPs are activated to find the hash value of the NUSI in their index subtables. The AMPs whose subtables contain that value become the participating AMPs in this request (in this case, AMP1 and AMP2). The other AMPs discard the message.
Each participating AMP locates the row IDs (row hash value plus uniqueness value) of the base rows corresponding to the hash value (in this case, the base rows corresponding to hash value 567 are 640, 222, and 115).
The participating AMPs access the base table rows, which are located on the same AMP as the NUSI subtable (in this case, one row from AMP 1 and two rows from AMP 2).
The qualifying rows are sent over the BYNET to the PE, and the PE sends the answer set on to the client application (in this case, three qualifying rows are returned).

-----*-----

Social Icons

Pages

Facebook fan

Recent Updates

Interview Questions

Links

Total Pageviews

Join Index

Partitioned Primary Index

Secondary Index

Quick Links

Utilities

SQL