What are the advantages of partitioned tables?
1)They provide efficient searches by using partition elimination at the various levels or combinations of levels (see the sketch after this list).
2)They reduce the I/O for range-constraint queries.
3)They take advantage of dynamic partition elimination.
4)They provide multiple access paths to the data, and an MLPPI provides even more partition elimination and more partitioning expression choices (e.g., you can partition on last name or some other value that is more readily available to query on).
5)The Primary Index may be either a UPI or a NUPI; a NUPI allows local joins to other similarly partitioned entities.
6)Row hash locks are used for SELECTs with equality conditions on the PI columns.
7)Partitioned tables allow for fast deletes of the data in a partition.
8)They allow for range queries without having to use a secondary index.
9)Specific partitions may be archived or deleted.
10)They may be created on volatile tables, global temporary tables, base tables, and non-compressed join indexes.
11)They may replace a Value-Ordered NUSI for access.
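A minimal sketch of a single-level RANGE_N partitioned table; the table name, columns, and date range are hypothetical. The query below touches only one month's partition thanks to partition elimination:

CREATE TABLE Orders
( order_id   INTEGER NOT NULL,
  cust_id    INTEGER,
  order_date DATE NOT NULL
)
PRIMARY INDEX (order_id)
PARTITION BY RANGE_N (order_date BETWEEN DATE '2023-01-01'
                                     AND DATE '2024-12-31'
                                     EACH INTERVAL '1' MONTH);

-- Partition elimination: only the January 2024 partition is scanned
SELECT order_id, cust_id
FROM Orders
WHERE order_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31';

Note that the PI here is a NUPI; it could not be declared UNIQUE because order_date, the partitioning column, is not part of the PI (see rule 5 below).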
=========================================================================
What are the Partitioning Rules:
1)A table can have up to 65,535 partitions.
2)Partitioning never determines which AMP gets a row; the row hash of the Primary Index does that.
3)Partitioning determines how each AMP sorts and stores its own rows (by partition number first, then row hash).
4)A table can have up to 15 levels of partitioning.
5)A table cannot have a UPI as the Primary Index unless all partitioning columns are included in the Primary Index.
6)There are three forms of partitioning: simple, RANGE_N, and CASE_N (a CASE_N sketch follows this list).
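An illustrative CASE_N sketch; the table and salary boundaries are hypothetical. Rows matching none of the conditions land in the NO CASE partition, and NULL salaries land in UNKNOWN:

CREATE TABLE Emp_Part
( emp_id INTEGER NOT NULL,
  salary DECIMAL(10,2)
)
PRIMARY INDEX (emp_id)
PARTITION BY CASE_N (
  salary <  30000,
  salary <  60000,
  salary < 100000,
  NO CASE, UNKNOWN);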
=========================================================================
What are the Disadvantages of partitioned tables?
1)Rows in a partitioned table are 2 bytes longer.
2)Access via the Primary Index may take longer, because without the partitioning value the row hash must be probed in every non-eliminated partition.
3)Full-table joins to an NPPI table with the same PI may take longer.
=========================================================================
Difference between SET and MULTISET tables?
A SET table does not allow duplicate rows, while a MULTISET table allows them. For a SET table, a UPI or a USI can be created on a column to enforce uniqueness and enhance performance. For a MULTISET table, a NUPI or NUSI can be created on a column to enhance performance (a minimal DDL sketch follows the characteristics lists below).
Characteristics of SET tables:
Logically correct Implementation
Automatic duplicate row elimination
Characteristics of MULTISET tables:
No duplicate row check overhead
Data integrity becomes a user responsibility
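A minimal sketch of the two definitions; the table names and column are hypothetical, and the only syntactic difference is the keyword before TABLE:

CREATE SET TABLE Customer_Set
( cust_id INTEGER NOT NULL )
PRIMARY INDEX (cust_id);      -- duplicate rows are checked for and rejected

CREATE MULTISET TABLE Customer_Multi
( cust_id INTEGER NOT NULL )
PRIMARY INDEX (cust_id);      -- duplicate rows are accepted; integrity is the user's responsibility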
=========================================================================
What are the Advantages and Disadvantages of a multi-column PI?
As many as 64 columns may be included in an index. Multi-column Primary Indexes may provide better distribution.
Advantage
*More columns = more uniqueness
*Distinct value increases.
*Rows per value decreases.
*Selectivity increases.
Disadvantage
*More columns = less usability
*The PI can only be used when values for all PI columns are provided in the SQL statement (see the sketch after this list).
*Partial values do not generate a useful hash result.
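A sketch with a hypothetical two-column PI; the first query can use the PI for single-AMP access, the second cannot, because a partial value does not produce a useful hash:

CREATE TABLE Sales
( store_id  INTEGER NOT NULL,
  sale_date DATE NOT NULL,
  amount    DECIMAL(10,2)
)
PRIMARY INDEX (store_id, sale_date);

-- PI access: values for both PI columns are supplied
SELECT * FROM Sales WHERE store_id = 42 AND sale_date = DATE '2024-01-15';

-- No PI access: partial value, so this becomes a full-table scan
SELECT * FROM Sales WHERE store_id = 42;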
=========================================================================
What is Value List Compression? How would you implement compression?
Value List Compression cannot compress an existing column, but it can be used to add a new compressed column to a table.
Use the ALTER TABLE statement to compress columns and reduce the number of I/O operations. Consider the following:
*Set the column default value to the most frequent value.
*Compress to the default value.
*This is especially useful for sparsely populated columns.
*Overhead is not high.
*The I/O savings correlates to the percentage of data compressed out of a row.
Value List Compression (VLC) provides the Teradata database with the capacity to support multiple-value compression for fixed-width columns. When you specify a value or list of values, the system suppresses any data matching a compress value from the row. This saves disk space. A smaller physical row size results in fewer data blocks, fewer I/Os, and improved overall performance (a DDL sketch follows the list below).
Because VLC allows you to specify a list of compress values for a column, the system suppresses data when one of the specified values exists in the column. Up to 255 distinct values (plus NULL) may be compressed per fixed-width column.
VLC improves performance as follows:
*Reduces the I/O required for scanning tables when the tables have compressible values in their columns.
*Reduces disk space because rows are smaller.
*Permits joins to lookup tables to be eliminated.
*Improves data loading because more rows may fit into one data block after compression is applied.
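A hedged sketch of implementing VLC; the table, columns, and value lists are hypothetical. Values in the COMPRESS list (and NULL) occupy no space in the row:

CREATE TABLE Customer
( cust_id    INTEGER NOT NULL,
  state_code CHAR(2) COMPRESS ('CA', 'NY', 'TX'),
  status     CHAR(1) COMPRESS 'A'
)
PRIMARY INDEX (cust_id);

-- Adding a new compressed column to an existing table:
ALTER TABLE Customer ADD gender CHAR(1) COMPRESS ('M', 'F');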
=========================================================================
What have you done to resolve a running-out-of-spool problem?
To resolve a running-out-of-spool problem:
1) Check whether the columns involved in the WHERE clause and join conditions are index columns.
2) Check whether statistics have been collected on those index columns; if not, collect stats on them.
3) Check MaxSpool and PeakSpool from DBC.DiskSpaceV on a per-AMP basis and check for data skew.
4) Check for an improper or missing join condition; an accidental product join can explode spool usage.
5) If the problem still occurs despite these steps, explicitly increase the spool space allocation (hedged sketches of these checks follow).
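Hedged sketches of checks 2 and 3; the user, table, and column names are hypothetical, and on older releases the view may be DBC.DiskSpace rather than DBC.DiskSpaceV:

-- Per-AMP spool for one user; skew shows up as one Vproc far above the rest
SELECT Vproc, MaxSpool, PeakSpool
FROM DBC.DiskSpaceV
WHERE DatabaseName = 'load_user'
ORDER BY PeakSpool DESC;

-- Collect statistics on a join/index column if they are missing
COLLECT STATISTICS ON Sales COLUMN (store_id);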
=========================================================================
What are the limitations of FastLoad?
1)No Secondary Indexes are allowed on the target table: UPIs and NUPIs are what distribute rows evenly across the AMPs, while secondary indexes are stored in a subtable, often on a different AMP from the data row, and FastLoad cannot maintain them.
2)No Referential Integrity is allowed: referential integrity defined on a table would require additional system checking to enforce the constraints.
3)No Triggers are allowed at load time: FastLoad is focused on loading data at high speed, so triggers are not allowed.
4)Duplicate rows (in MULTISET tables) are not supported: MULTISET tables allow duplicate rows, and FastLoad can load into them, but the duplicate rows are discarded.
5)No AMPs may go down (i.e., go offline) while FastLoad is processing: a down AMP must be repaired before the load process can be restarted.
6)No more than one data type conversion is allowed per column: data type conversions cause high resource utilization on the system (a minimal script sketch follows this list).
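A minimal FastLoad script sketch; the TDPID, credentials, database, table, and file names are all assumed. Note the two error tables named on BEGIN LOADING, which correspond to the two error-table types covered later in this post:

LOGON tdpid/load_user,load_pass;
DATABASE mydb;
SET RECORD VARTEXT ",";
BEGIN LOADING mydb.Orders
   ERRORFILES mydb.Orders_err1, mydb.Orders_err2;
DEFINE order_id   (VARCHAR(10)),
       cust_id    (VARCHAR(10)),
       order_date (VARCHAR(10))
FILE = orders.csv;
INSERT INTO mydb.Orders VALUES (:order_id, :cust_id, :order_date);
END LOADING;
LOGOFF;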
=========================================================================
What are the limitations of MultiLoad?
1)Unique Secondary Indexes are not supported on a target table: like FastLoad, MultiLoad does not support Unique Secondary Indexes (USIs). Unlike FastLoad, it does support Non-Unique Secondary Indexes (NUSIs), because the NUSI subtable row is on the same AMP as the data row.
2)Referential Integrity is not supported: referential integrity defined on a table would require additional system checking to enforce the constraints.
3)Triggers are not supported at load time: disable all triggers before running MultiLoad.
4)No concatenation of input files is allowed: it could impact a restart if the files were concatenated in a different sequence or data was deleted between runs.
5)No Join Indexes: all join indexes must be dropped before running a MultiLoad and recreated after the load completes.
6)Will not process aggregates, arithmetic functions, or exponentiation: if you need data conversions or math, you may be better off using an INMOD to prepare the data prior to loading it.
=========================================================================
Explain the phases in MultiLoad?
MultiLoad has five IMPORT phases (a minimal script sketch follows the list):
Phase 1: Preliminary Phase: the basic setup phase, used for several preliminary set-up activities required for a successful data load.
Phase 2: DML Transaction Phase: all the SQL Data Manipulation Language (DML) statements are sent to the Teradata database, since MultiLoad supports multiple DML functions.
Phase 3: Acquisition Phase: once setup completes, the PE's plan is stored on each AMP, the table headers are locked, and the actual input data is stored in the work tables.
Phase 4: Application Phase: all DML operations are applied to the target tables.
Phase 5: Cleanup Phase: table locks are released and all intermediate work tables are dropped.
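A minimal MultiLoad IMPORT-task sketch; all names (log table, TDPID, credentials, layout, file) are assumed. The .DML statements feed the DML Transaction Phase, and .IMPORT drives the Acquisition and Application phases:

.LOGTABLE mydb.Orders_restartlog;
.LOGON tdpid/load_user,load_pass;
.BEGIN MLOAD TABLES mydb.Orders;
.LAYOUT order_layout;
.FIELD order_id * VARCHAR(10);
.FIELD cust_id  * VARCHAR(10);
.DML LABEL ins_orders;
INSERT INTO mydb.Orders (order_id, cust_id)
VALUES (:order_id, :cust_id);
.IMPORT INFILE orders.csv
   FORMAT VARTEXT ','
   LAYOUT order_layout
   APPLY ins_orders;
.END MLOAD;
.LOGOFF;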
=========================================================================
What are the limitations of TPump?
- No concatenation of input data files is allowed.
- TPump will not process aggregates, arithmetic functions, or exponentiation.
- The use of the SELECT function is not allowed.
- No more than four IMPORT commands may be used in a single load task.
- Dates before 1900 or after 1999 must be represented by the yyyy format for the year portion of the date, not the default format of yy.
- On some network-attached systems, the maximum file size when using TPump is 2 GB.
- TPump performance will be diminished if Access Logging is used.
(A rough script sketch follows this list.)
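A rough TPump sketch under assumed names; TPump scripting closely mirrors MultiLoad, with throttling options such as SESSIONS, PACK, and RATE on .BEGIN LOAD:

.LOGTABLE mydb.Orders_tlog;
.LOGON tdpid/load_user,load_pass;
.BEGIN LOAD
   SESSIONS 4
   PACK 20
   RATE 500
   ERRORTABLE mydb.Orders_terr;
.LAYOUT order_layout;
.FIELD order_id * VARCHAR(10);
.DML LABEL ins_orders;
INSERT INTO mydb.Orders (order_id) VALUES (:order_id);
.IMPORT INFILE orders.csv
   FORMAT VARTEXT ','
   LAYOUT order_layout
   APPLY ins_orders;
.END LOAD;
.LOGOFF;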
=========================================================================
How many types of error tables are there in FastLoad?
Each FastLoad requires two error tables. They are created automatically during the run and are populated only if errors occur during the load.
The first error table is for any translation errors or constraint violations.
For example, a column is defined as INTEGER but the data coming from the source is in CHAR format, i.e., wrong data.
The second error table is for errors caused by duplicate values for Unique Primary Indexes (inspection sketches follow).
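Assuming the error tables were named as in the FastLoad sketch above: the first table records the error code and the offending field, while the second has the same structure as the target table and holds the duplicate-UPI rows:

-- Translation/constraint errors
SELECT ErrorCode, ErrorFieldName
FROM mydb.Orders_err1;

-- Rows rejected for duplicate UPI values
SELECT COUNT(*)
FROM mydb.Orders_err2;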
=========================================================================
What are Quantiles?
Quantiles are used to divide a number of rows into a number of partitions of roughly equal size. The most common quantile is the percentile which is based on a value of 100. It is also not uncommon to have quartiles (based on 4), tertiles (based on 3) or deciles (based on 10). Notice that by default, both the quantile column and the quantile value itself will be output in ascending sequence.
Aggregations may not be combined with OLAP functions, thus a derived or temporary table is sometimes necessary to produce the aggregation before the function is applied. The standard form of the quantile function is:
QUANTILE (quantile_constant, sortlist)
SELECT employee_number, salary_amount,
QUANTILE (100, salary_amount)
FROM employee
QUALIFY QUANTILE (100, salary_amount) < 25;
=========================================================================
What is WIDTH_BUCKET?
The WIDTH_BUCKET function assigns partition numbers to values associated with a column or expression, based on a specified range and based on the number of partitions desired.
Like many OLAP and aggregate functions, WIDTH_BUCKET is used to extract statistical information from sample populations.
The format of the function is as follows:
WIDTH_BUCKET (column_expr, lower_bound, upper_bound, partition_count)
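A sketch against the employee table used in the QUANTILE example above, spreading salaries across 9 equal-width buckets between 10,000 and 100,000; values below the range land in bucket 0 and values at or above it land in bucket 10 (partition_count + 1):

SELECT salary_amount,
       WIDTH_BUCKET (salary_amount, 10000, 100000, 9) AS bucket_num
FROM employee;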
=========================================================================
What is the RANDOM function?
The RANDOM function may be used to generate a random number within a specified range.
RANDOM (lower_limit, upper_limit) returns a random number between the lower and upper limits, inclusive. Both limits must be specified; otherwise, a random number between 0 and approximately 4 billion is generated.
SEL DEPARTMENT_NUMBER,RANDOM(1,9) FROM EMPLOYEE;
Limitations:
RANDOM is non-ANSI standard
RANDOM may be used in a SELECT list or a WHERE clause, but not both
RANDOM may be used in Updating, Inserting or Deleting rows
RANDOM may not be used with aggregate or OLAP functions
RANDOM cannot be referenced by numeric position in a GROUP BY or ORDER BY clause
=========================================================================