The data flow task in SSIS (SQL Server Integration Services) sends data in series of buffers. How much data does one buffer hold? This is bounded by DefaultBufferMaxRows and DefaultBufferMaxSize, two Data Flow properties. They have default values of 10,000 and 10,485,760 (10 MB), respectively. That means, one buffer will contain either 10,000 rows or 10 MB of data, whichever is less.
You can adjust these two properties based on your scenario. Setting them to a higher value can boost performance, but only as long as all buffers fit in memory. In other words, no swapping please!
Implement Parallel Execution in SSIS
SQL Server Integration Services (SSIS) allows parallel execution in two different ways. These are controlled by two properties as outlined below.
The first one is MaxConcurrentExecutables, a property of the package. It defines how many tasks (executables) can run simultaneously. It defaults to -1 which is translated to the number of processors plus 2. Please note that if your box has hyperthreading turned on, it is the logical processor rather than the physically present processor that is counted.
Suppose we have a package with 3 Data Flow Tasks. Each task has 10 flows in the form of “OLE DB Source -> SQL Server Destination”.
Set MaxConcurrentExecutables to 3, then all 3 Data Flow Tasks will run simultaneously.
Now whether all 10 flows in each individual Data Flow Task get started concurrently is a different story. This is controlled by the second property: EngineThreads.
The EngineThreads is a property of the Data Flow Task that defines how many work threads the scheduler will create and run in parallel. Its default value is 5.
Again let’s use the above example.
If we set EngineThreads to 10 on all 3 Data Flow Tasks, then all the 30 flows will start off at once.
One thing we want to be clear about EngineThreads is that it governs both source threads (for source components) and work threads (for transformation and destination components). Source threads and work threads are both engine threads created by the Data Flow’s scheduler. So in the above example, a value of 10 for Engine Threads means up to 10 source threads and 10 work threads.
Multi tasking is a double-edge sword. In SSIS, we don’t affinitize the threads that we create to any of the processors. So if the number of threads exceeds the number of available processors, you might end up hurting throughput due to an excessive amount of context switches. Be cautious!
Set up OLE DB source to read from View efficiently
OLE DB source adapter is one of the most commonly used components in SSIS data flow task. In this article, we will discuss a very important performance observation about this adapter.
Use “SQL Command” to pull data from a view
OLE DB source adapter can be set up to work in either “SQL command” data access mode or “Table or view” data access mode. See the figure below for where to set Data Access Mode in OLE DB Source Editor.
In most cases, whether you use “SQL command” data access mode or “Table or view” data access mode does not make any difference performance wise. But when you are setting up OLE DB source adapter to read data from a view, the performance difference can be huge. In such an occasion, we suggest you set data access mode to “SQL command” and specify the command as, for example, “SELECT * FROM view_name”. In our in-house testing, we have seen “SQL command” data access mode runs about 17 times faster than “Table or view” data access mode in certain scenario.
If you are interested in knowing the cause for such a difference, read on.
In “Table or view” access mode, the OLE DB source adapter calls OpenRowset to get column metadata at Validate phase. The data returned from OpenRowset include more than just column metadata. Thus the adapter issues “SET ROWCOUNT 1” statement to retrieve column metadata information. “SET ROWCOUNT 1” causes an inefficient execution plan (i.e. Nested Loop) to be cached and later used in the subsequent Execute phase.
In “SQL command” access mode, the OLE DB source adapter calls “sp_prepare” to get column metadata at Validate phase, and “sp_execute” at Execute phase. The execution plan used at Execute phase is Hash Match which is more efficient than Nested Loop.
This article has described a performance tip when setting up OLE DB source adapter to read data from a view. We suggest user to set OLE DB source adapter in “SQL command” data access mode for better performance in such a case. We hope you will find this tip useful and helpful in your package design.
Set BLOBTempStoragePath and BufferTempStoragePath to Fast Drives
BLOBTempStoragePath and BufferTempStoragePath are two properties on Data Flow Task. They define where on the hard disk(s) to page buffer data to. BLOBs (text, next, image) are written to BLOBTempStoragePath. All other buffer data will be swapped to BufferTempStoragePath.
Now what could cause a buffer to swap? There are two possible causes. The first one is when a memory allocation fails. The second one is when Windows signals the low memory resource notification event. Both will trigger SSIS to reduce its working set. SSIS does so by moving buffer data to disk.
When BLOBTempStoragePath is not set, the paths as defined by the system variables TEMP and TMP will be used. The same rule applies to BufferTempStoragePath.
For better performance, we recommend both BLOBTempStoragePath and BufferTempStoragePath point to fast drives. We also suggest that the drives for BufferTempStoragePath and BlobTempStoragePath be on separate spindles in order to maximize I/O throughput.
Something about SSIS Performance Counters
SQL Server Integration Services provide a set of performance counters. Among them the following few are helpful when you tune or debug your package:
- Buffers in use
- Flat buffers in use
- Private buffers in use
- Buffers spooled
- Rows read
- Rows written
“Buffers in use”, “Flat buffers in use” and “Private buffers in use” are useful to discover leaks. During package execution time, you will see these counters fluctuating. But once the package finishes execution, their values should return to the same value as what they were before the execution. Otherwise, buffers are leaked. In occasions like that, please contact Microsoft PSS.
“Buffers spooled” has an initial value of 0. When it goes above 0, it indicates that the engine has started memory swapping. In this case to set Data Flow Task properties BLOBTempStoragePath and BufferTempStoragePath appropriately for maximal I/O bandwidth.
“Rows read” and “Rows written” show how many rows the entire Data Flow has processed. They give you an overall idea about the execution progress.