The teradata SAMPLE function is used to generate samples of data from a table or view.
It can be done in two ways.
It can be done in two ways.
- SAMPLE n - Will results a sample of n rows.
- If the number n is greater than the number of rows in the table, the sample will consist of the number of rows in the table.
SAMPLE n - where n is a decimal value less than 1.00 and greater than .00
The SAMPLE function allows sampling of data based on:
The SAMPLE function allows sampling of data based on:
- A percentage of a table.
- An actual number of rows
Example 1:
SELECT Employee_number
FROM Employee
SAMPLE 10;
SELECT Employee_number
FROM Employee
SAMPLE 10;
Result
Employee_number
---------------
1003
1023
1004
1015
801
1065
1022
1002
1018
1005
Example 2:
SELECT Employee_number
FROM Employee
SAMPLE .25
ORDER BY 1;
In the above example, 25% of the rows of the employee table are to be returned. The employee table has 26 rows.
26 * .25 = 6.50 = 7 rows in the sample
Result:
Employee_number
---------------
1003
1023
1001
1054
1004
1015
1012
Note:
Using SAMPLEID
SAMPLEID is an extension for Sample to get multiple set in a single query.
The SAMPLEID may be selected, used for ordering, or used as a column in a new table.
Example 1:
Lets consider department table has 9 rows and get three samples from the department table, one with 25% of the rows, another with 25% and a third with 50%.
SELECT department_number
,sampleid
FROM department
SAMPLE .25, .25, .50
ORDER BY sampleid;
Result:
department_number SampleId
----------------- -----------
301 1
403 1
402 2
201 2
100 3
501 3
302 3
401 3
600 3
Note that all 9 of the 9 rows of the department table are returned.
This is due to the individual calculations.
Consider the following calculations
9 *.25 = 2.25 = 2
9 *.25 = 2.25 = 2
9 *.50 = 4.50 = 5
-----
9
Example 2
Get three samples from the department table, one with 27% of the rows, another with 35% and a third with 2%.
SELECT department_number
,SAMPLEID
FROM department
SAMPLE .27, .35, .02
ORDER BY SAMPLEID;
Result
department_number SampleId
----------------- -----------
402 1
403 1
100 2
302 2
401 2
Note:
Example 3
Get three samples from the department table, one with 3 rows, another with 5 and a third with 8.
SELECT department_number
,sampleid
FROM department
SAMPLE 3, 5, 8
ORDER BY sampleid;
*** Query completed. 9 rows found. 2 columns returned.
*** Warning: 7473 Requested sample is larger than table rows.
Result
department_number SampleId
----------------- -----------
501 1
402 1
403 1
100 2
302 2
301 2
401 2
201 2
600 3
Note:
---------------
1003
1023
1004
1015
801
1065
1022
1002
1018
1005
Example 2:
SELECT Employee_number
FROM Employee
SAMPLE .25
ORDER BY 1;
In the above example, 25% of the rows of the employee table are to be returned. The employee table has 26 rows.
26 * .25 = 6.50 = 7 rows in the sample
Result:
Employee_number
---------------
1003
1023
1001
1054
1004
1015
1012
Note:
- 7 rows out of 26 are returned.
- Fractional results greater than .4999 generate an added row.
- 25% of 26 = 6.5 which rounds to 7.
Using SAMPLEID
SAMPLEID is an extension for Sample to get multiple set in a single query.
The SAMPLEID may be selected, used for ordering, or used as a column in a new table.
Example 1:
Lets consider department table has 9 rows and get three samples from the department table, one with 25% of the rows, another with 25% and a third with 50%.
SELECT department_number
,sampleid
FROM department
SAMPLE .25, .25, .50
ORDER BY sampleid;
Result:
department_number SampleId
----------------- -----------
301 1
403 1
402 2
201 2
100 3
501 3
302 3
401 3
600 3
Note that all 9 of the 9 rows of the department table are returned.
This is due to the individual calculations.
Consider the following calculations
9 *.25 = 2.25 = 2
9 *.25 = 2.25 = 2
9 *.50 = 4.50 = 5
-----
9
Example 2
Get three samples from the department table, one with 27% of the rows, another with 35% and a third with 2%.
SELECT department_number
,SAMPLEID
FROM department
SAMPLE .27, .35, .02
ORDER BY SAMPLEID;
Result
department_number SampleId
----------------- -----------
402 1
403 1
100 2
302 2
401 2
Note:
- The first two samples are able to return rows.
- The last sample is too small to return a row.
Example 3
Get three samples from the department table, one with 3 rows, another with 5 and a third with 8.
SELECT department_number
,sampleid
FROM department
SAMPLE 3, 5, 8
ORDER BY sampleid;
*** Query completed. 9 rows found. 2 columns returned.
*** Warning: 7473 Requested sample is larger than table rows.
Result
department_number SampleId
----------------- -----------
501 1
402 1
403 1
100 2
302 2
301 2
401 2
201 2
600 3
Note:
- Because the rows are not repeated to different sample sets, the supply of rows is exhausted before the third set can be completed.
- This results in a warning that there were not enough rows to populate all samples as requested.
- This warning is seen in the BTEQ environment, but not in ODBC.
- The third sample gets the only remaining row.