Computer architecture: SUPERSCALAR

Superscalar

Figure: Simple superscalar pipeline. By fetching and dispatching two instructions at a time, a maximum of two instructions per cycle can be completed.
Figure: Processor board of a Cray T3E parallel computer with four superscalar Alpha processors.
A superscalar CPU architecture implements a form of parallelism called instruction-level parallelism within a single processor. It thereby allows faster CPU throughput than would otherwise be possible at the same clock rate. A superscalar processor executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to redundant functional units on the processor. Each functional unit is not a separate CPU core but an execution resource within a single CPU, such as an arithmetic logic unit, a bit shifter, or a multiplier.
While a superscalar CPU is typically also pipelined, superscalar execution and pipelining are two different performance enhancement techniques. It is theoretically possible to have a non-pipelined superscalar CPU or a pipelined non-superscalar CPU.
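To make the distinction concrete, the following is a minimal sketch of an idealized cycle-count model. It assumes no stalls and no dependencies between instructions; the function name `total_cycles` and its parameters (`stages`, `width`, `pipelined`) are illustrative choices, not taken from any real simulator.

```python
import math


def total_cycles(n_instr: int, stages: int, width: int, pipelined: bool) -> int:
    """Idealized cycle count for n_instr independent instructions.

    stages    -- cycles each instruction needs from fetch to completion
    width     -- instructions issued together (1 = scalar, >1 = superscalar)
    pipelined -- whether a new issue group may start every cycle
    Assumes no stalls, hazards, or cache misses.
    """
    groups = math.ceil(n_instr / width)  # number of issue groups
    if groups == 0:
        return 0
    if pipelined:
        # The first group takes `stages` cycles; each later group
        # finishes one cycle after the previous one.
        return stages + groups - 1
    # Without pipelining, a group must finish before the next one starts.
    return stages * groups


if __name__ == "__main__":
    for width in (1, 2):
        for pipelined in (False, True):
            print(f"width={width} pipelined={pipelined}: "
                  f"{total_cycles(100, 5, width, pipelined)} cycles")
```

Under these assumptions the two techniques contribute independently: widening the issue width reduces the number of issue groups, while pipelining lets successive groups overlap.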
The superscalar technique is traditionally associated with several identifying characteristics. Note that these are applied within a given CPU core.
• Instructions are issued from a sequential instruction stream
• CPU hardware dynamically checks for data dependencies between instructions at run time (versus software checking at compile time)
• Multiple instructions are accepted per clock cycle

HISTORY

Seymour Cray's CDC 6600 from 1965 is often mentioned as the first superscalar design. The Intel i960CA (1988) and the AMD 29000-series 29050 (1990) microprocessors were the first commercial single-chip superscalar microprocessors. RISC CPUs like these brought the superscalar concept to microcomputers because the RISC design results in a simple core, allowing straightforward instruction dispatch and the inclusion of multiple functional units (such as ALUs) on a single CPU within the constrained design rules of the time. This was the reason that RISC designs were faster than CISC designs through the 1980s and into the 1990s.
Except for CPUs used in some battery-powered devices, essentially all general-purpose CPUs developed since about 1998 are superscalar. Beginning with the "P6" (Pentium Pro and Pentium II) implementation, Intel's x86 architecture microprocessors have implemented a CISC instruction set on a superscalar RISC microarchitecture. Complex instructions are internally translated to a RISC-like "micro-ops" instruction set, allowing the processor to take advantage of the higher-performance underlying microarchitecture while remaining compatible with earlier Intel processors.

FROM SCALAR TO SUPERSCALAR

The simplest processors are scalar processors. Each instruction executed by a scalar processor typically manipulates one or two data items at a time. By contrast, each instruction executed by a vector processor operates simultaneously on many data items. An analogy is the difference between scalar and vector arithmetic. A superscalar processor is a mixture of the two. Each instruction processes one data item, but there are multiple redundant functional units within each CPU, so multiple instructions can process separate data items concurrently.
Superscalar CPU design emphasizes improving the instruction dispatcher's accuracy and allowing it to keep the multiple functional units in use at all times. This has become increasingly important as the number of units has increased. While early superscalar CPUs would have two ALUs and a single FPU, a modern design such as the PowerPC 970 includes four ALUs, two FPUs, and two SIMD units. If the dispatcher is ineffective at keeping all of these units fed with instructions, the performance of the system will suffer.
A superscalar processor usually sustains an execution rate in excess of one instruction per machine cycle. But merely processing multiple instructions concurrently does not make an architecture superscalar, since pipelined, multiprocessor, or multi-core architectures also achieve that, but with different methods.
In a superscalar CPU the dispatcher reads instructions from memory and decides which ones can be run in parallel, dispatching them to redundant functional units contained inside a single CPU. Therefore a superscalar processor can be envisioned as having multiple parallel pipelines, each of which is processing instructions simultaneously from a single instruction thread.
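The dispatcher's decision can be sketched as a small in-order model. This is a simplified illustration, not how any particular CPU implements dispatch: the `Instr` type and `dispatch_in_order` function are hypothetical names, every instruction is assumed to finish in one cycle, and an issue group is closed as soon as the next instruction reads or overwrites a register written earlier in the same group.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical three-operand instruction: dest = f(srcs)."""
    dest: str
    srcs: Tuple[str, ...]


def dispatch_in_order(program: List[Instr], width: int = 2) -> List[List[Instr]]:
    """Group consecutive instructions into per-cycle issue groups.

    Assumptions: in-order issue, single-cycle latency, and all operands
    are read at issue time (so write-after-read within a group is harmless).
    """
    groups: List[List[Instr]] = []
    i = 0
    while i < len(program):
        group = [program[i]]
        written = {program[i].dest}  # registers written by this group so far
        i += 1
        while i < len(program) and len(group) < width:
            nxt = program[i]
            reads_pending = bool(written & set(nxt.srcs))  # needs a result not yet produced
            rewrites = nxt.dest in written                 # would overwrite a pending result
            if reads_pending or rewrites:
                break  # must wait for the next cycle
            group.append(nxt)
            written.add(nxt.dest)
            i += 1
        groups.append(group)
    return groups


if __name__ == "__main__":
    program = [
        Instr("a", ("b", "c")),
        Instr("d", ("e", "f")),
        Instr("g", ("a", "d")),
        Instr("h", ("g", "b")),
    ]
    for cycle, group in enumerate(dispatch_in_order(program), start=1):
        print(cycle, [ins.dest for ins in group])
```

In this example the first two instructions issue together, while the last two each wait a cycle because they consume a result produced immediately before them.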

LIMITATIONS

The performance improvement available from superscalar techniques is limited by two key areas:
1. The degree of intrinsic parallelism in the instruction stream, i.e. the limited amount of instruction-level parallelism, and
2. The complexity and time cost of the dispatcher and associated dependency checking logic.
Existing binary executable programs have varying degrees of intrinsic parallelism. In some cases instructions are not dependent on each other and can be executed simultaneously. In other cases they are inter-dependent: one instruction impacts either resources or results of the other. The instructions a = b + c; d = e + f can be run in parallel because none of the results depend on other calculations. However, the instructions a = b + c; d = a + f might not be runnable in parallel, depending on the order in which the instructions complete while they move through the units.
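The two instruction pairs above can be checked mechanically. The sketch below uses the standard hazard names RAW (read after write), WAR (write after read), and WAW (write after write), which do not appear in the text above; the `hazards` function and the `(dest, sources)` tuple encoding are assumptions made for illustration.

```python
from typing import Set, Tuple

# Hypothetical encoding: (destination variable, set of source variables).
Instruction = Tuple[str, Set[str]]


def hazards(first: Instruction, second: Instruction) -> Set[str]:
    """Return the data hazards between two instructions in program order."""
    dest1, srcs1 = first
    dest2, srcs2 = second
    found = set()
    if dest1 in srcs2:
        found.add("RAW")  # second reads what first writes (true dependency)
    if dest2 in srcs1:
        found.add("WAR")  # second overwrites an input of first
    if dest1 == dest2:
        found.add("WAW")  # both write the same destination
    return found


if __name__ == "__main__":
    # a = b + c; d = e + f  -> independent
    print(hazards(("a", {"b", "c"}), ("d", {"e", "f"})))
    # a = b + c; d = a + f  -> d needs the new value of a
    print(hazards(("a", {"b", "c"}), ("d", {"a", "f"})))
```

An empty result means the pair can be issued together; a RAW hazard means the second instruction must wait for, or be forwarded, the result of the first.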
When the number of simultaneously issued instructions increases, the cost of dependency checking increases extremely rapidly. This is exacerbated by the need to check dependencies at run time and at the CPU's clock rate. This cost includes additional logic gates required to implement the checks, and time delays through those gates. Research shows the gate cost in some cases may be $n^k$ gates, and the delay cost $k^2 \log n$, where $n$ is the number of instructions in the processor's instruction set, and $k$ is the number of simultaneously dispatched instructions. In mathematics, this is called a combinatoric problem involving permutations.
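As a rough numerical illustration of the cost estimates above, the sketch below evaluates n^k and k^2 log n for a few dispatch widths. The logarithm base is not stated in the text, so base 2 is assumed here, and the instruction-set size of 256 is an arbitrary example value; the results show growth trends only, not real gate counts.

```python
import math


def gate_cost(n: int, k: int) -> int:
    """Gate-count estimate n^k (n = instruction-set size, k = dispatch width)."""
    return n ** k


def delay_cost(n: int, k: int) -> float:
    """Delay estimate k^2 * log n, with log base 2 assumed."""
    return k ** 2 * math.log2(n)


if __name__ == "__main__":
    n = 256  # arbitrary example instruction-set size
    for k in range(1, 7):
        print(f"k={k}: gates ~ {gate_cost(n, k):.3e}, delay ~ {delay_cost(n, k):.0f}")
```

The gate estimate grows exponentially in k while the delay estimate grows quadratically, which is why widening dispatch becomes expensive so quickly.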

Even though the instruction stream may contain no inter-instruction dependencies, a superscalar CPU must nonetheless check for that possibility, since there is no assurance otherwise and failure to detect a dependency would produce incorrect results.
No matter how advanced the semiconductor process or how fast the switching speed, this places a practical limit on how many instructions can be simultaneously dispatched. While process advances will allow ever greater numbers of functional units (e.g., ALUs), the burden of checking instruction dependencies grows so rapidly that the achievable superscalar dispatch limit is fairly small, likely on the order of five to six simultaneously dispatched instructions.
However, even given infinitely fast dependency checking logic on an otherwise conventional superscalar CPU, if the instruction stream itself has many dependencies, this would also limit the possible speedup. Thus the degree of intrinsic parallelism in the code stream forms a second limitation.
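One way to estimate this second limit is to compare the number of instructions with the length of the longest chain of true dependencies. The sketch below is an assumption-laden model: every instruction takes one cycle, only read-after-write dependencies are counted, and the machine is taken to have unlimited functional units. The names `critical_path_length` and `ideal_speedup` are illustrative.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: (destination variable, set of source variables).
Instruction = Tuple[str, Set[str]]


def critical_path_length(program: List[Instruction]) -> int:
    """Length of the longest chain of true (read-after-write) dependencies."""
    ready_at: Dict[str, int] = {}  # variable -> cycle in which its value is produced
    longest = 0
    for dest, sources in program:
        # An instruction can run one cycle after its latest-produced input.
        level = 1 + max((ready_at.get(s, 0) for s in sources), default=0)
        ready_at[dest] = level
        longest = max(longest, level)
    return longest


def ideal_speedup(program: List[Instruction]) -> float:
    """Upper bound on speedup over one-instruction-per-cycle execution."""
    if not program:
        return 1.0
    return len(program) / critical_path_length(program)


if __name__ == "__main__":
    independent = [("a", {"b", "c"}), ("d", {"e", "f"})]
    chained = [("a", {"b", "c"}), ("d", {"a", "f"}), ("g", {"d", "a"})]
    print(ideal_speedup(independent))
    print(ideal_speedup(chained))
```

For the fully chained program the bound is 1.0, so no amount of extra hardware helps; for the independent pair it is 2.0.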

ALTERNATIVES

Collectively, these two limits drive investigation into alternative architectural performance improvements such as Very Long Instruction Word (VLIW), Explicitly Parallel Instruction Computing (EPIC), simultaneous multithreading (SMT), and multi-core processors.
With VLIW, the burdensome task of dependency checking by hardware logic at run time is removed and delegated to the compiler. Explicitly Parallel Instruction Computing (EPIC) is like VLIW, with extra cache prefetching instructions.
Simultaneous multithreading, often abbreviated as SMT, is a technique for improving the overall efficiency of superscalar CPUs. SMT permits multiple independent threads of execution to better utilize the resources provided by modern processor architectures.
Superscalar processors differ from multi-core processors in that the redundant functional units are not entire processors. A single processor is composed of finer-grained functional units such as the ALU, integer multiplier, integer shifter, floating point unit, etc. There may be multiple versions of each functional unit to enable execution of many instructions in parallel. This differs from a multicore CPU that concurrently processes instructions from multiple threads, one thread per core. It also differs from a pipelined CPU, where multiple instructions can concurrently be in various stages of execution, assembly-line fashion.
The various alternative techniques are not mutually exclusive: they can be (and frequently are) combined in a single processor. Thus a multicore CPU is possible where each core is an independent processor containing multiple parallel pipelines, each pipeline being superscalar. Some processors also include vector capability.

Saturday, 06 December 2008
