Teradata Wiki: Teradata Architecture


Architecture

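Teradata relies on three main architectural components: the Parsing Engine (PE), the BYNET, and the Access Module Processor (AMP). Each AMP also controls its own disks.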

Teradata Architecture

Parsing Engine (PE)
The Parsing Engine (PE) interprets SQL requests, receives input records, and passes data. It sends its messages through the BYNET to the AMPs.

BYNETs
The BYNET acts as the message-passing layer. It decides which AMP(s) should receive each message.

Access Module Processor (AMP)
The AMP is a virtual processor (vproc) designed to manage a portion of the entire database.
It performs all database management functions such as sorting, aggregating, and formatting data.
The AMP receives data from the PE, formats rows, and distributes them to the disk storage units it controls. The AMP also retrieves the rows requested by the Parsing Engine.

Disks
Disks are disk drives associated with an AMP that store the data rows.

VProcs

Teradata's Parsing Engines (PEs) and Access Module Processors (AMPs) are virtual processors, called VProcs. Each AMP and PE lives inside the memory of a node, and each node holds anywhere between 25 and 35 VProcs.

Think of a node as a giant personal computer: for example, one with 4 Intel processors that work as if there were 8, and up to 16 GB of memory.

The VProcs are loaded into the node's memory, and the node is then connected via the BYNET to all the other nodes. At that point the node is part of the Teradata warehouse.
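The relationship between nodes and VProcs can be modeled in a few lines. The following is a minimal sketch, not Teradata code. The node count (4) and the per-node split (28 AMPs plus 2 PEs, i.e. 30 VProcs, inside the 25 to 35 range above) are hypothetical numbers chosen for illustration. The `Node` and `build_system` names are also invented.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node: its memory hosts AMP and PE VProcs (hypothetical model)."""
    name: str
    amps: list[str] = field(default_factory=list)
    pes: list[str] = field(default_factory=list)


def build_system(num_nodes: int, amps_per_node: int, pes_per_node: int) -> list[Node]:
    """Load the same number of AMP and PE VProcs into every node's memory."""
    nodes = []
    amp_id = pe_id = 0
    for n in range(num_nodes):
        node = Node(name=f"node{n}")
        for _ in range(amps_per_node):
            node.amps.append(f"AMP{amp_id}")  # AMP numbers are unique system-wide
            amp_id += 1
        for _ in range(pes_per_node):
            node.pes.append(f"PE{pe_id}")
            pe_id += 1
        nodes.append(node)
    return nodes


# Hypothetical configuration: 4 nodes joined by the BYNET, 30 VProcs per node.
system = build_system(num_nodes=4, amps_per_node=28, pes_per_node=2)
total_amps = sum(len(n.amps) for n in system)
total_vprocs = sum(len(n.amps) + len(n.pes) for n in system)
print(total_amps, total_vprocs)  # 112 120
```

Once the nodes are connected, the system's degree of parallelism is the total AMP count across all nodes, not the count inside any single node.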

BYNET

The BYNET gets its name from the banyan tree, which continually puts down new roots and can keep growing indefinitely. Likewise, the BYNET scales as the Teradata system grows in size.

All communication between PEs and AMPs is done via the BYNET.
When the PE dispatches the steps for the AMPs to perform, it places them onto the BYNET. The messages are then routed to the appropriate AMP(s).
Each AMP or PE can use one BYNET to send messages while simultaneously accepting messages on the other BYNET. Either BYNET can be used to send a message or to receive one.
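"Routed to the appropriate AMP(s)" covers two cases. A step can go point-to-point to the one AMP that owns the data, or it can be broadcast to every AMP. The toy class below illustrates that routing decision only. It is a hypothetical sketch: the `Bynet` class, its inbox dictionary and the `send` signature are invented and do not reflect the real BYNET protocol.

```python
from collections import defaultdict
from typing import Optional


class Bynet:
    """Toy message layer between a PE and the AMPs (not the real BYNET protocol)."""

    def __init__(self, num_amps: int):
        self.num_amps = num_amps
        self.inboxes = defaultdict(list)  # AMP number -> steps it has received

    def send(self, step: str, target_amp: Optional[int] = None) -> list[int]:
        """Point-to-point when target_amp is given; otherwise broadcast to all AMPs."""
        if target_amp is not None:
            targets = [target_amp]
        else:
            targets = list(range(self.num_amps))
        for amp in targets:
            self.inboxes[amp].append(step)
        return targets


bynet = Bynet(num_amps=4)
bynet.send("fetch row with key 42", target_amp=2)  # only AMP 2 owns that row
bynet.send("scan the whole table")                 # every AMP holds a share of the rows
```

Point-to-point delivery keeps single-row work from occupying every AMP. A broadcast is what lets all AMPs work on one request at the same time.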


The BYNET has several unique features:

Scalable: As you add more nodes to the system, the overall network bandwidth scales linearly. This linear scalability means you can increase system size without a performance penalty, and sometimes even gain performance.

High performance: An MPP system typically has two BYNET networks (BYNET 0 and BYNET 1). Because both networks in a system are active, the system benefits from having full use of the aggregate bandwidth of both the networks.

Fault tolerant: Each network has multiple connection paths. If the BYNET detects an unusable path in either network, it will automatically reconfigure that network so all messages avoid the unusable path. Additionally, in the rare case that BYNET 0 cannot be reconfigured, hardware on BYNET 0 is disabled and messages are re-routed to BYNET 1.

Load balanced: Traffic is automatically and dynamically distributed between both BYNETs.
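The load-balancing and fault-tolerance features can be pictured with a small model. In the sketch below, traffic alternates between BYNET 0 and BYNET 1, and a network marked as down is skipped. Treat it as an assumption-laden illustration. Real BYNET balancing and reconfiguration happen in the interconnect hardware and software, and the simple round-robin policy and the `DualBynet` name are invented here.

```python
import itertools


class DualBynet:
    """Toy model of BYNET 0 and BYNET 1: alternate traffic, skip a failed network."""

    def __init__(self):
        self.up = {0: True, 1: True}   # is each network currently usable?
        self.sent = {0: 0, 1: 0}       # messages carried by each network
        self._next = itertools.cycle([0, 1])  # hypothetical round-robin policy

    def send(self, message: str) -> int:
        """Return the network (0 or 1) that carried the message."""
        for _ in range(2):  # try each network at most once
            net = next(self._next)
            if self.up[net]:
                self.sent[net] += 1
                return net
        raise RuntimeError("both BYNETs are down")


nets = DualBynet()
print([nets.send(f"m{i}") for i in range(4)])  # [0, 1, 0, 1]
nets.up[0] = False                             # BYNET 0 hardware disabled
print([nets.send("x") for _ in range(3)])      # [1, 1, 1]: all traffic re-routed to BYNET 1
```

While both networks are up, each one carries half the traffic, so the aggregate bandwidth of both is used. When one fails, the other carries all the traffic.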

AMP

The PE passes the plan to the AMPs over the BYNET. The AMPs then retrieve the rows they own from their disks and pass them back to the PE over the BYNET.
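The round trip just described has three parts. The PE sends a step out, each AMP works only on the rows on its own disk, and the PE gathers the answers. The sketch below models this with a thread pool. It rests on several assumptions: `amp_disks` is hypothetical in-memory data standing in for each AMP's disk, and `amp_execute` and `pe_run` are invented names. The threads only illustrate the fan-out and fan-in shape, not real parallel I/O.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each AMP owns a disjoint slice of an employee table.
amp_disks = {
    0: [{"id": 1, "dept": 10}, {"id": 5, "dept": 20}],
    1: [{"id": 2, "dept": 10}],
    2: [{"id": 3, "dept": 30}, {"id": 6, "dept": 10}],
    3: [{"id": 4, "dept": 20}],
}


def amp_execute(amp: int, dept: int) -> list[dict]:
    """One AMP applies the plan step to the rows on its own disk only."""
    return [row for row in amp_disks[amp] if row["dept"] == dept]


def pe_run(dept: int) -> list[dict]:
    """The PE sends the step to every AMP at once, then merges the returned rows."""
    with ThreadPoolExecutor(max_workers=len(amp_disks)) as pool:
        partials = list(pool.map(lambda amp: amp_execute(amp, dept), amp_disks))
    rows = [row for part in partials for row in part]
    return sorted(rows, key=lambda r: r["id"])


print([r["id"] for r in pe_run(10)])  # [1, 2, 6]
```

No AMP ever reads another AMP's rows. The only shared step is the PE merging the partial answers.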

When a table is first created, each AMP creates a table header on its disk. Even though the table is empty, every AMP at least knows the table name, the columns in the table, and any indexes on the table.

When the table is loaded, each AMP receives the rows of that table that it, and only it, owns. The AMP places those rows inside data blocks where they can easily be retrieved.

Each AMP now owns its own table header for the table, as well as the data blocks that hold its rows for that table.
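The create-then-load sequence above can be sketched in Python. Every AMP records the header, then each row lands on exactly one AMP. Teradata decides ownership by hashing the row's primary index value. This sketch substitutes `zlib.crc32` modulo the AMP count for Teradata's own hashing algorithm and hash map, so it is a stand-in, not the real mapping. The `Amp`, `create_table`, `load` and `owning_amp` names are hypothetical, and the "data blocks" are just Python lists.

```python
import zlib

NUM_AMPS = 4


def owning_amp(pi_value, num_amps: int = NUM_AMPS) -> int:
    """Stand-in for Teradata's hashing: CRC32 of the primary index value, mod AMP count."""
    return zlib.crc32(str(pi_value).encode("utf-8")) % num_amps


class Amp:
    def __init__(self, number: int):
        self.number = number
        self.headers = {}  # table name -> table header (columns, primary index)
        self.blocks = {}   # table name -> rows this AMP, and only this AMP, owns


amps = [Amp(i) for i in range(NUM_AMPS)]


def create_table(name: str, columns: list[str], primary_index: str) -> None:
    """Every AMP writes a table header, even though the table is still empty."""
    for amp in amps:
        amp.headers[name] = {"columns": columns, "primary_index": primary_index}
        amp.blocks[name] = []


def load(name: str, rows: list[dict]) -> None:
    """Send each row to the single AMP that owns its primary index value."""
    pi = amps[0].headers[name]["primary_index"]
    for row in rows:
        amps[owning_amp(row[pi])].blocks[name].append(row)


create_table("employee", ["emp_no", "name"], "emp_no")
load("employee", [{"emp_no": n, "name": f"emp{n}"} for n in range(100)])
print([len(a.blocks["employee"]) for a in amps])  # per-AMP row counts
```

The hash is deterministic, so the same primary index value always maps to the same AMP. That property is what lets a later lookup by that value go straight to one AMP.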


Teradata takes every table and spreads its rows across all the AMPs in the system. That design gave birth to Teradata's parallel processing: every AMP works on its own share of every table at the same time.

The first picture below shows a layout that never happens; the second picture shows exactly the design behind Teradata.

Teradata NEVER lays out data like this

Teradata lays out data like this!
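The difference between the two layouts can be quantified. For a full-table scan, the elapsed work is roughly set by the busiest AMP. The sketch below compares two layouts for a hypothetical 10,000-row table on 8 AMPs. In one, the whole table sits on a single AMP. In the other, the rows are spread by hashing. CRC32 is again only a stand-in for Teradata's hashing, and the function names are illustrative.

```python
import zlib
from collections import Counter

NUM_AMPS = 8
KEYS = range(10_000)  # primary index values of a hypothetical 10,000-row table


def whole_table_on_one_amp(num_rows: int, num_amps: int) -> list[int]:
    """The layout Teradata never uses: every row of the table sits on AMP 0."""
    return [num_rows] + [0] * (num_amps - 1)


def rows_spread_by_hash(keys, num_amps: int) -> list[int]:
    """Teradata-style layout; CRC32 is only a stand-in for Teradata's hashing."""
    counts = Counter(zlib.crc32(str(k).encode("utf-8")) % num_amps for k in keys)
    return [counts[amp] for amp in range(num_amps)]


one_amp = whole_table_on_one_amp(len(KEYS), NUM_AMPS)
spread = rows_spread_by_hash(KEYS, NUM_AMPS)
# A full-table scan finishes only when the busiest AMP finishes,
# so the largest per-AMP row count approximates the elapsed work.
print("one AMP:", max(one_amp), "spread:", max(spread))
```

With the single-AMP layout, one AMP scans all 10,000 rows while seven sit idle. With the spread layout, each AMP scans only its own share.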

Parsing Engine

A Parsing Engine (PE) is a virtual processor (vproc). It is made up of the following software components:

Session Control
Parser
Optimizer
Dispatcher


Each PE can support a maximum of 120 sessions.

The Session Control component verifies the request for session authorization (user names and passwords), and either allows or disallows the request.
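Two facts above lend themselves to a short sketch: the 120-session ceiling per PE and the logon check. The first gives simple capacity arithmetic, ceil(sessions / 120). The second accepts or rejects a user name and password. The `SessionControl` class and its in-memory user table are hypothetical stand-ins, not how Teradata stores or checks credentials. The unsalted SHA-256 digest is for illustration only.

```python
import hashlib
import hmac
import math

MAX_SESSIONS_PER_PE = 120  # limit stated in the text above


def pes_needed(concurrent_sessions: int) -> int:
    """Minimum number of PEs required to host this many concurrent sessions."""
    return math.ceil(concurrent_sessions / MAX_SESSIONS_PER_PE)


def digest(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


class SessionControl:
    """Toy logon check for one PE; the user table is a hypothetical stand-in."""

    def __init__(self, users: dict[str, str]):
        self.users = users  # user name -> SHA-256 hex digest of the password
        self.sessions = 0

    def logon(self, user: str, password: str) -> bool:
        if self.sessions >= MAX_SESSIONS_PER_PE:
            return False  # this PE already hosts its maximum number of sessions
        expected = self.users.get(user)
        if expected is None or not hmac.compare_digest(expected, digest(password)):
            return False  # unknown user or wrong password: disallow the request
        self.sessions += 1
        return True


print(pes_needed(1000))  # 9
```

For example, 1,000 concurrent sessions need at least 9 PEs, because 8 PEs top out at 960 sessions.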

The Parser does the following:

Interprets the SQL statement received from the application.
Verifies SQL requests for the proper syntax and evaluates them semantically.
Consults the Data Dictionary to ensure that all objects exist and that the user has authority to access them.
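The Parser's three duties can be illustrated on a deliberately tiny grammar: a syntax check, a check that the objects exist, and an access-rights check. The sketch handles only `SELECT col[, col] FROM table` and stops at the first failure. The in-memory `DATA_DICTIONARY`, its `select_users` field, and the `parse` function are all hypothetical. The real Data Dictionary and parser are far richer.

```python
import re

# Hypothetical Data Dictionary contents: tables, their columns, and who may read them.
DATA_DICTIONARY = {
    "employee": {"columns": {"emp_no", "name", "dept"}, "select_users": {"alice"}},
}

SELECT_RE = re.compile(r"^\s*SELECT\s+(.+?)\s+FROM\s+(\w+)\s*;?\s*$", re.IGNORECASE)


def parse(sql: str, user: str) -> dict:
    # 1. Syntax: does the request match the (tiny) supported grammar?
    m = SELECT_RE.match(sql)
    if not m:
        raise SyntaxError("unsupported or malformed statement")
    columns = [c.strip().lower() for c in m.group(1).split(",")]
    table = m.group(2).lower()
    # 2. Semantics: do the referenced objects exist in the Data Dictionary?
    entry = DATA_DICTIONARY.get(table)
    if entry is None:
        raise LookupError(f"table {table} does not exist")
    missing = [c for c in columns if c not in entry["columns"]]
    if missing:
        raise LookupError(f"unknown column(s): {missing}")
    # 3. Authority: may this user access the table?
    if user not in entry["select_users"]:
        raise PermissionError(f"{user} lacks SELECT on {table}")
    return {"table": table, "columns": columns}


print(parse("SELECT emp_no, name FROM Employee;", "alice"))
```

Only a request that passes all three checks moves on to the Optimizer.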

The Optimizer is cost-based and develops the least expensive plan (in terms of time) to return the requested response set. The optimizer must know about the system configuration, the available units of parallelism (AMPs and PEs), and data demographics. The Teradata Optimizer is robust and intelligent: it is parallel-aware, uses full look-ahead capability, and enables Teradata to handle multiple complex, ad hoc queries efficiently.
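"Parallel-aware and cost-based" can be made concrete with a toy cost model. All AMPs scan in parallel, so a full table scan costs roughly rows divided by AMPs. An equality predicate on the primary index can be answered by a single AMP. The cost units, the two candidate plans and the function names are hypothetical. They are meant only to show why the optimizer needs to know the AMP count, not how Teradata actually costs plans.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    cost: float  # hypothetical units: rows touched by the busiest AMP


def candidate_plans(table_rows: int, num_amps: int, has_pi_equality: bool) -> list[Plan]:
    """Parallel-aware costing: all AMPs scan at once, so scan cost is rows per AMP."""
    plans = [Plan("all-AMP full table scan", table_rows / num_amps)]
    if has_pi_equality:
        # A primary-index equality predicate can be answered by a single AMP.
        plans.append(Plan("single-AMP primary index access", 1.0))
    return plans


def choose_plan(table_rows: int, num_amps: int, has_pi_equality: bool) -> Plan:
    """Pick the least expensive candidate."""
    return min(candidate_plans(table_rows, num_amps, has_pi_equality), key=lambda p: p.cost)


print(choose_plan(1_000_000, 100, has_pi_equality=True).name)   # single-AMP primary index access
print(choose_plan(1_000_000, 100, has_pi_equality=False).cost)  # 10000.0
```

Doubling the AMP count halves the estimated scan cost in this model. That is why the optimizer needs to know the system configuration.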

The Dispatcher controls the sequence in which the steps are executed and passes the steps received from the optimizer onto the BYNET for execution by the AMPs.

After the AMPs process the steps, the PE receives their responses over the BYNET.

The Dispatcher builds a response message and sends the message back to the user.
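The Dispatcher's role is to order the steps, hand each one to the AMPs, and assemble the final response. The sketch below models it, with a step starting only after every AMP has finished the previous one. The per-AMP data, the two lambda steps and the response dictionary's fields are hypothetical. The AMPs are also run one after another here for simplicity, whereas real AMPs execute a step concurrently.

```python
from typing import Callable

# Hypothetical per-AMP data; each step maps an AMP's rows to new rows.
amp_rows = {0: [3, 8], 1: [5], 2: [1, 9, 4]}

Step = Callable[[list[int]], list[int]]


def dispatch(steps: list[Step]) -> dict:
    """Run steps strictly in order; step N+1 starts only after all AMPs finish step N."""
    state = {amp: list(rows) for amp, rows in amp_rows.items()}
    for step in steps:
        # Hand the step to every AMP (serially in this sketch) and collect all responses.
        state = {amp: step(rows) for amp, rows in state.items()}
    # Build the response message from the AMPs' final results.
    answer = sorted(v for rows in state.values() for v in rows)
    return {"status": "ok", "rows": answer, "row_count": len(answer)}


steps = [
    lambda rows: [r for r in rows if r > 2],  # step 1: filter
    lambda rows: [r * 10 for r in rows],      # step 2: transform
]
print(dispatch(steps))  # {'status': 'ok', 'rows': [30, 40, 50, 80, 90], 'row_count': 5}
```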
