September 2008


All files located in %temp% and %tmp% folders of the Controller, VuGen, and LoadGenerator machines can be safely removed after a script has been saved, the scenario run, and the results collated, analyzed, and saved.

 

The following are temporary files created by LoadRunner:

 

The Brr_xxxx folder contains raw data collected by the Controller during runtime. These data will be collated and copied to the result folder after the scenario completes and are safe to delete if collation completes.

 

The lr5tmpdirLNL.xxx folder contains runtime information of the load test and is safe to delete.

 

The unq1Ln.xxx folder is another runtime information folder; it contains information about monitors and is safe to delete.

 

Res is the default result folder; move it to a different location for safekeeping. You can also specify the result folder by selecting Result -> Result Settings in the Controller.

 

Aes_log.txt, drv_log.txt, ResmonLog.txt, ResmonLogBrief.txt, and LoadRunner_agent_startupYAR.xxx.log are all log files and are needed only for debugging purposes in case of a problem.

 

Noname*, vuac*, vxml*, and unq* are all script-related files and can be removed once the script has been saved.

 

JET1858.tmp is an MS Jet temp file used during the collation process. It is safe to delete once collation finishes.

There is a simple way to check whether the load test we conducted exercised the system to a realistic extent.

 

The factors we have to capture are:

 

Hits per second

Average response time

Think time

Number of users simulated

 

Substitute the gathered data into the formula:

Total number of users executed = (Response Time + Think Time) * Requests per second

 

Note:

When calculating with the formula, the result should be equal to or greater than the total number of users executed.

 

From this we can find out how many users the servers can handle. If the result falls below the total number of users executed, the servers are not capable of serving that many users.
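
The check described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample measurements are made up, and times are assumed to be in seconds.

```python
def users_supported(response_time_s: float, think_time_s: float,
                    requests_per_s: float) -> float:
    """Users implied by the measured load: (response + think) * requests/s."""
    return (response_time_s + think_time_s) * requests_per_s


def load_is_realistic(users_executed: int, response_time_s: float,
                      think_time_s: float, requests_per_s: float) -> bool:
    """True when the computed value is at least the number of users run."""
    computed = users_supported(response_time_s, think_time_s, requests_per_s)
    return computed >= users_executed


# Hypothetical measurements: 2 s response time, 8 s think time.
print(users_supported(2.0, 8.0, 12.5))          # 125.0
print(load_is_realistic(100, 2.0, 8.0, 12.5))   # True
print(load_is_realistic(100, 2.0, 8.0, 8.0))    # False
```

In the last call the servers delivered only 8 requests per second, which corresponds to 80 users, so a run with 100 users was more than they could serve.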

 

The page size can also be verified using:

 

Page size = Throughput / Hits per second.
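
As a small worked example of the page size formula, assuming throughput is reported in bytes per second (the function name and the numbers are illustrative only):

```python
def average_page_size(throughput_bytes_per_s: float,
                      hits_per_s: float) -> float:
    """Average bytes returned per hit: throughput / hits per second."""
    if hits_per_s <= 0:
        raise ValueError("hits_per_s must be positive")
    return throughput_bytes_per_s / hits_per_s


# Hypothetical run: 250,000 bytes/s of throughput at 50 hits/s.
print(average_page_size(250_000, 50))  # 5000.0
```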

Abstract
This chapter discusses the most important counters to monitor (in both Windows NT and SQL Server), considers time intervals, and recommends a long-term strategy for monitoring performance. A table summarizes which counters to consider for particular problems.



PERFORMANCE MONITOR 

Performance Monitor collects data about different counters, such as memory use. Performance Monitor can show you data in graphical format in real time, or you can save the data to log files. Pay particular attention to the discussion of log files in this section, because you will also use log files for long-term performance monitoring, which we will discuss in detail later. Working with a log file can be difficult to learn on your own because the options are not intuitive and are hidden on different screens.

You can choose between two Performance Monitors: one in the Administrative Tools group and one in the SQL Server group. They are the same basic program, but you need to run the one in the SQL Server group because it automatically loads the SQL Server-related counters. You run this version of the program with the following command: Perfmon.exe C:\Mssql\Binn\Sqlctrs.pmc, where the .pmc file is the Performance Monitor counter file that contains the SQL counters. You can write applications that provide your own counters, and you can modify the new system stored procedures called sp_user_counter1 through sp_user_counter10 and track them, too.

When you run the program from the SQL Server group, the window appears. Because you started the program that includes the set of SQL counters, five counters appear at the bottom of the window when Performance Monitor starts. The five counters are

·     Cache Hit Ratio

·     I/O — Transactions per second

·     I/O — Page Reads per second

·     I/O Single Page Writes per second

·     User Connections

These counters will be explained in more detail later, but first, let’s learn how to navigate in Performance Monitor.

Changing Menu Options
The first set of buttons on the tool bar at the top of the window corresponds to the four views of the monitor: chart, alert, log, and report views. You can get to the same options using the View menu.

The menu options change depending upon which view is currently active. Without going into too much detail about the View menu options, their basic purpose is to let you set up and save standard viewing templates for each of the four views.

Understanding Counters
Windows NT lets you watch the performance of the system by “counting” the activity associated with any of its objects. Examples of objects in Windows NT are processors, disk drives, and processes. Each object has specific counters associated with it; for example, the % User Time counter is associated with a CPU or processor to designate what percent of the CPU is taken up by user programs (as opposed to system processes). This chapter gives you enough information to help you choose the right counters at the right time.

SQL Server includes many predefined counters, most of which you aren’t likely to use except in special cases. It can be difficult to know which counters are the basic ones to watch. If you have chosen the SQL Server Performance Monitor, several counters have been set up as default counters, such as Cache Hit Ratio and User Connections. You can create your own defaults by creating a .pmc file.

The counters are hooks into the operating system and other programs, like SQL Server, that have been built into the software to let Performance Monitor get data. Data collection is performed efficiently so that the additional load on the system is minimized. Windows NT needs most of the information gathered for managing memory, processes, and threads, and Performance Monitor is a good program to display the results. 

On the tool bar, the button next to the four view buttons at the top of the window is a big plus sign, which you use to add counters to monitor. Click the + button, and the window will appear. The first field, Computer, has a search button at the end of the field. You can click this button to bring up a list of all computers in your domain and choose a computer from the list, or you can type the name of a server you want to monitor. To monitor other servers, you need Windows NT administrative privileges on them.

In the next field, Object, you choose an object to monitor. The default is Processor, and the default counter shown in the field below is % Processor Time. The box on the right is the Instance. Any particular resource may have more than one instance; that is, more than one of that particular resource (in this case, processors) may exist. Instances are numbered from 0, so Instance 3 refers to the fourth CPU. This computer has only one processor (CPU), which is why the only instance in the box is 0.

From the fields along the bottom, you can pick the color, scale, line width, and line style of the information that will be displayed about the counter you are adding. These options let you choose a different look for each counter you add to the window. The only display choice that may need explanation is scale. The scale field is a multiplier that helps you fit the values on the screen in the range you have set on the y-axis, which by default is 0 to 100.

After you choose the Object, Counter, and Instance you want to monitor and determine how you want the information to appear, click Add. The counter is added at the bottom of the list on the main window and starts graphing the next time your data is refreshed.

If you click the Explain button, a brief explanation of the counter you specified will appear. Sometimes, though, it uses abbreviations and acronyms that require further research, unless you are a Windows NT internals guru.

Setting Up Alerts
An alert is the warning the computer sends you when a resource such as memory or the network becomes a bottleneck. When an alert occurs, it is written to a log file, along with the date and time it occurred. The log file is a circular file, allowing at most 1,000 entries before it starts overwriting the oldest alerts. The alert can also be written to the Windows NT event log. 

To add a new alert, click the second button on the toolbar, then click the + button. The dialog box will appear. Choose the counters you want to create alerts for, then click Add. The example on the screen will create an alert when the Cache Hit Ratio drops below 85 percent.

Notice the Run Program option in the lower right portion of the screen. You can use it to execute a program when the alert occurs. For example, you can choose SQL Server — Log in the Object field, Log Space Used (%) for the Counter, and the database you want to monitor from the Instance list. When the log file for that database gets above 90 percent, you can execute a batch file that runs an ISQL script to dump the transaction log. In this way you can reduce your chances of running out of log space.

Starting Log Files
Learning how to establish log files is very important, because log files are a critical part of the long-term strategy recommended later in this chapter. It can be a bit confusing, so let’s go through the steps.

1.  Click the third button on the toolbar — View Output Log File Status. Notice that the Log File entry at the top is blank, the status is closed, the file size is zero, and the log interval is 15.00 (seconds).

2.  Click +, and the list of objects will appear. Select the ones you want to add to the log and click Add. If you hold down the Ctrl key while selecting, you can choose more than one counter, and holding down Shift lets you highlight all the items in a range. All counters in the objects you pick will be tracked in the log file. We will discuss what to monitor later.

3.  Now we need to specify a log file. From the Options menu, choose Log. The dialog box will appear.

4.  This dialog box looks almost like the standard file dialog box, but it has two very important additions. At the bottom of the screen, the Update Time section shows the refresh interval. For short-term tracking, keep it at 15 seconds. For long-term tracking, set it at 300 seconds (5 minutes). The other important difference between this dialog box and the standard file name dialog box is the Start Log button. Nothing happens until you click this button to start collecting data. Once you do, the text of the button will change to Stop Log.

Type a log file name in the File Name box at the top. Then click Start Log.

5.  Click OK to close this dialog box, then minimize the window and let the log run for a while.

6.  Maximize the window and click the Stop Log button. Then switch to the Chart view by clicking the first button on the toolbar.

7.  From the Options menu, choose Data From. Select the log file you named earlier. You can then choose the counters you want to view from the log.

The best part about using log files is that you can view a few counters at a time to avoid overcrowding the window. You can also mix and match the counters you want to analyze at the same time. This feature is important because many of the counters depend on other counters.

Special Note: The log file does not do anything until you click the Start Log button in the Log Options dialog box (also available by choosing Log in the Options menu).


Reports
The fourth button on the toolbar, the Reports button, lets you print customized reports of the data collected in your log file. Experiment with the available reports when you have a chance; we won’t cover this option here.

TIME INTERVALS

The default refresh interval for Performance Monitor is one second. Every second, you get new information about your system’s performance. This interval is good for a very short-term examination of the system, but it can be a drain on the server. A five-second interval causes much less overhead, probably in the neighborhood of five percent extra activity. However, for long-term monitoring, 5 seconds produces a very large log file.

Setting the interval to 5 minutes creates a reasonable size log file, but this large an interval can mask performance peaks. However, because each entry in the log file stores the minimum, average, and maximum values for each counter, or aspect of SQL Server you want to monitor, you can discover the peaks with a little extra analysis. Five minutes is a good setting for long-term logging. You can always fire up another copy of Performance Monitor and look at one- to five-second intervals if you want a short-term peek into the system’s performance. 
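
To put the trade-off between interval and log size in numbers, the following sketch counts how many log entries each interval produces per day. It says nothing about the size of an entry in bytes, which depends on how many objects you log; the function name is an illustrative assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def entries_per_day(interval_s: int) -> int:
    """Number of log entries written in 24 hours at the given interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return SECONDS_PER_DAY // interval_s


for interval in (1, 5, 300):
    print(interval, entries_per_day(interval))
# 1 86400
# 5 17280
# 300 288
```

A 5-minute interval therefore writes 60 times fewer entries than a 5-second interval.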

To determine the amount of drain on the system from Performance Monitor, shut down all the services and start the Monitor again. Add the CPU usage and watch it for about 30 seconds at the default interval of one second. Then change the interval to 0.1 seconds. Your CPU usage will jump dramatically. One odd observation is that the effect of changing from one second to 0.1 seconds is different on different computers, and it is different between Windows NT 4.0 and Windows NT 3.51. For example, when changing the interval on two 133 MHz computers — a laptop and a tower box — the tower machine has the better performance at the shorter interval, showing about 55 percent utilization, while the laptop shows about 60 percent utilization.

Special Note: The faster your refresh option, the more the drain on the system. The default one-second refresh interval creates less than 5 percent overhead on a single-processor machine. For multiprocessor machines, the overhead is negligible. With the refresh interval set to 0.01 seconds, Performance Monitor takes about 60 percent of the resources. At 10 seconds per refresh, the drain is almost too small to measure, even with a lot of counters turned on.

WHAT TO MONITOR

Now that you know how to use the program, let’s get to the section you’ve been waiting for: How do you know what to monitor? Of the hundreds of Windows NT counters and 50 or so SQL counters, how do you choose? Should you monitor everything? How long should you monitor the system?

Monitoring performance helps you perform two related tasks: identifying bottlenecks and planning for your future hardware and software needs (capacity planning). Learning about the important counters will help identify potential bottlenecks. The strategy section later in this chapter will help you put together a good plan for creating a general monitoring strategy.

What do you want to monitor? Everything! Well, monitoring everything may be a good idea for a short period, but the results will show that many of the counters are always at or near zero; monitoring them all the time may be a waste of time and resources. You need to establish a baseline for your system. This baseline lets you know what results are normal and what results indicate a problem. Once you establish a baseline, you don’t need to track everything.

The key categories to monitor can be split into two major sections: Windows NT categories and SQL Server categories. Categories in this sense are groups of objects that contain counters. 

·     Windows NT

o    Memory

o    Processor

o    Disk I/O

o    Network

·     SQL Server

o    Cache

o    Disk I/O

o    Log

o    Locks

o    Users

o    Other Predefined Counters

o    User-Defined Counters

When monitoring both categories of data, look for trends of high and low activity. For example, particular times during the day, certain days of the week, or certain weeks of the month might show more activity than others. After you identify highs and lows, try to redistribute the workload. These peaks and valleys are especially good to know when something new is added to the schedule. If the peak loads are causing problems, identify which things can be scheduled at a later time when the system is not so busy. Knowing the load patterns is also helpful when problems occur, so that you can re-run a particular job or report when the load is low.

Get to know your users: find out which reports they need first thing in the morning. Perhaps you can schedule these reports to run at night in a batch mode, instead of having the user start them during a busy time.

Monitoring Windows NT
The purpose of monitoring the Windows NT categories is to answer one of two questions: “What resource is my bottleneck?” or “Do I see any upward usage trends that tell me what resource I might run low on first?” SQL Server 6.5 introduced several high-water markers, such as Max Tempdb space used, which make it easier to identify potential long-term problems.

 

Memory
The Memory: Pages/sec counter is the number of pages read or written to the disk when the system can’t find the page in memory. This page management process is referred to as paging. If the average value for this counter is five, you need to tune the system. If this value is 10 or more, put tuning the server high on your priority list. Before SQL Server 6.0, the value for this counter was an important flag to tell you whether memory was the bottleneck. Now, with SQL Server’s parallel read-ahead feature, this counter will give you only an indication of how busy the read-ahead manager is. However, we will discuss other counters that are better at tracking the read-ahead manager. In other words, this counter may have been one of the most significant counters to track in the past, and it still is on machines without SQL Server, but better ones are available to track memory. 

The Memory: Available Bytes counter displays the amount of free physical memory. If the value for this counter is consistently less than 10 percent of your total memory, paging is probably occurring. You have too much memory allocated to SQL Server and not enough to Windows NT.
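
The two memory rules of thumb above can be written down as a short sketch. The function names are hypothetical, the inputs are values you would read from Performance Monitor yourself, and the first threshold is read here as "five or more".

```python
def paging_status(avg_pages_per_s: float) -> str:
    """Classify the average Memory: Pages/sec value using the text's thresholds."""
    if avg_pages_per_s >= 10:
        return "tune: high priority"
    if avg_pages_per_s >= 5:
        return "tune"
    return "ok"


def low_on_free_memory(available_bytes: int, total_bytes: int) -> bool:
    """True when Available Bytes is below 10 percent of total memory."""
    return available_bytes < 0.10 * total_bytes


MB = 1024 * 1024
print(paging_status(3))                     # ok
print(paging_status(12))                    # tune: high priority
print(low_on_free_memory(4 * MB, 64 * MB))  # True
```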

Processor
Before we start talking about the counters in the processor category, it is important to know that Windows NT assigns certain responsibilities to certain processors if you have four or more CPUs. Processor 0 is the default CPU for the I/O subsystem. Network Interface Cards (NIC) are assigned to the remaining CPUs, starting from the highest-numbered CPU. If you have four processors and one NIC, that card is assigned Processor 3. The next NIC gets Processor 2. Windows NT does a good job of spreading out processor use. You can also set which processors SQL Server uses. See Chapter 16, “Performance Tuning,” particularly the notes on the Affinity Mask, for more information about allocating processors.

You can monitor each processor individually or all the processors together. For monitoring individual processors, use the Processor: % Processor Time counter. This counter lets you see which processors are the busiest.

A better counter to monitor over the long term is the System: % Total Processor Time counter, which groups all the processors to tell you the average percentage of time that all processors were busy executing non-idle threads.

Who (or what) is consuming the CPU time? Is it the users, system interrupts, or other system processes? The Processor: Interrupts/sec counter will tell you if it is the system interrupts. A value of more than 1,000 indicates that you should get better network cards, disk controllers, or both. If the Processor: % Privileged Time is greater than 20 percent (of the total processor time) and Processor: % User Time is consistently less than 80 percent, then SQL Server is probably generating excessive I/O requests to the system. If your machine is not a dedicated SQL Server machine, make it so. If none of these situations is occurring, user processes are consuming the CPU. We will look at how to monitor user processes when we consider SQL Server-specific counters in the next section.
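
The decision process in the preceding paragraph can be summarized as a sketch. The function name and return strings are made up for illustration; the thresholds are the ones given in the text.

```python
def likely_cpu_consumer(interrupts_per_s: float,
                        privileged_pct: float,
                        user_pct: float) -> str:
    """Apply the text's rules of thumb in order and name the likely culprit."""
    if interrupts_per_s > 1000:
        # Points at network cards or disk controllers.
        return "system interrupts"
    if privileged_pct > 20 and user_pct < 80:
        # SQL Server is probably generating excessive I/O requests.
        return "excessive I/O requests"
    return "user processes"


print(likely_cpu_consumer(1500, 10, 85))  # system interrupts
print(likely_cpu_consumer(400, 35, 60))   # excessive I/O requests
print(likely_cpu_consumer(400, 10, 90))   # user processes
```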

Disk I/O
As discussed in Chapter 16, “Performance Tuning,” having many smaller drives is better than having one large drive for SQL Server machines. Let’s say that you need 4 GB of disk space to support your application with SQL Server. Buy four 1-GB drives instead of one 4-GB drive. Even though the seek time is faster on the larger drive, you will still get a tremendous performance improvement by spreading files, tables, and logs among more than one drive.

Special Note: The single best performance increase on a SQL Server box comes from spreading I/O among multiple drives (adding memory is a close second).


Monitor the disk counters to see whether the I/O subsystem is the bottleneck, and if it is, to determine which disk is the culprit. The problem may be the disk controller board. The first thing to know about monitoring disk I/O is that to get accurate readings from the Physical Disk counters, you must go to a command prompt window and type DISKPERF -y, then reboot. This procedure turns on the operating system hooks into the disk subsystem. However, this setup also causes a small performance decrease of 3 to 5 percent, so you want to turn this on only periodically and only for a short period. Use the Diskperf -n command to turn it off, then restart your system. 

Track Physical Disk: % Disk Time to see how much time each disk is busy servicing I/O, including time spent waiting in the disk driver queue. If this counter is near 100 percent on a consistent basis, then the physical disk is the bottleneck. Do you rush out and buy another disk? Perhaps that is the best strategy if the other drives are also busy, but you have other options. You may get more benefit from buying another controller and splitting the I/O load between the different controllers. Find out what files or SQL Server tables reside on that disk, and move the busy ones to another drive. If the bottleneck is the system drive, split the virtual memory swap file to another drive, or move the whole file to a less busy drive. You should already have split the swap file, unless you only have one drive (which is very silly on a SQL Server machine).

LogicalDisk: Disk Queue Length and PhysicalDisk: Disk Queue Length can reveal whether particular drives are too busy. These counters track how many requests are waiting in line for the disk to become available. Values of less than 2 are good; if the value is any higher, it’s too high.

Network
Redirector: Read Bytes Network/Sec gives the actual rate at which bytes are being read from the network. Dividing this value by the value for the Redirector: Bytes Received/Sec counter gives the efficiency with which the bytes are being processed.

If this ratio is 1:1, your system is processing network packets as fast as it gets them. If this ratio is below 0.8, then the network packets are coming in faster than your system can process them. To correct this problem on a multiprocessor system, use the Affinity Mask and SMP Concurrency options in the SQL Configuration dialog box to allocate the last processor to the network card, and don’t let SQL Server use that processor. For example, if you have four CPUs, set the Affinity Mask to 7 (binary 0111) and SMP Concurrency to 3. This setup gives three CPUs to SQL Server and the fourth processor to the network card, which Windows NT assigns to that processor by default. If I/O is also a problem, set the Affinity Mask to 6 (binary 0110) and SMP Concurrency to 2, because Windows NT assigns the I/O subsystem to the first processor by default.
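
The arithmetic behind the network ratio and the Affinity Mask values can be sketched as follows. The function names are hypothetical; the sketch only computes the numbers you would then enter in the SQL Configuration dialog box, and it assumes the CPU numbering described above (I/O on processor 0, the network card on the highest-numbered processor).

```python
def network_efficiency(read_bytes_network_per_s: float,
                       bytes_received_per_s: float) -> float:
    """Ratio of Read Bytes Network/Sec to Bytes Received/Sec."""
    if bytes_received_per_s <= 0:
        raise ValueError("bytes_received_per_s must be positive")
    return read_bytes_network_per_s / bytes_received_per_s


def affinity_settings(cpu_count: int, reserve_io_cpu: bool = False):
    """Return (affinity_mask, smp_concurrency) for SQL Server.

    The highest-numbered CPU is left to the network card; optionally
    CPU 0 is also left to the I/O subsystem.
    """
    mask = (1 << (cpu_count - 1)) - 1  # every CPU except the last one
    if reserve_io_cpu:
        mask &= ~1                     # also clear the bit for CPU 0
    if mask <= 0:
        raise ValueError("not enough CPUs left for SQL Server")
    return mask, bin(mask).count("1")


print(network_efficiency(700_000, 1_000_000) < 0.8)  # True: falling behind
print(affinity_settings(4))                          # (7, 3)
print(affinity_settings(4, reserve_io_cpu=True))     # (6, 2)
print(format(affinity_settings(4)[0], "04b"))        # 0111
```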

Monitoring SQL Server
The questions to ask yourself when monitoring the SQL Server categories are “Do I have the optimal configuration values for SQL Server?” and “Who is consistently using the most resources?”

If any of the counters considered in this section indicate a problem, the problem is related to SQL Server in some way. If the problem is I/O, memory, CPU, or locks, you can dig deeper and find out who the culprits are. However, if you are using a long-term logging strategy for monitoring, you must monitor every session to be sure you have the necessary historical data when you want to see what was happening at a particular time.

If you are watching the monitor when a problem occurs, go to the SQL Server-Users object and turn on the counter for all instances. The instances in this case are the sessions currently logged on. You can see the login ID and the session number. If you see one or more sessions causing the problem, you can spy on them to find the last command sent. Go to the Enterprise Manager, click the Current Activity button on the toolbar, and double-click the line in the display corresponding to the session number. You will see the last command received from the session. To trace commands in more depth, use the SQLTrace utility that is new with version 6.5. (See Chapter 3, “Administrative and Programming Tools,” for details.)

The five main categories of SQL Server counters to monitor are cache, disk I/O, log, locks, and users. We will consider each of these categories separately as well as a mix of other important predefined counters. The final part of this section discusses the new user-defined counters.

Cache
To monitor your cache, watch SQL Server — Cache Hit Ratio. It monitors the rate at which the system finds pages in memory without having to go to disk. The cache hit ratio is the number of logical reads divided by the total of logical plus physical reads. If the value for this counter is consistently less than 80 percent, you should allocate more memory to SQL Server, buy more system memory, or both. However, before you buy more memory, you can try changing the read-ahead configuration options. Also look at the discussion of free buffers in the next chapter to determine whether the number of free buffers is approaching zero. Changing the free buffers configuration parameter may increase the cache hit ratio.
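
As a worked example of the ratio as defined in this paragraph (logical reads divided by logical plus physical reads), here is a minimal sketch. The function names and read counts are illustrative; Performance Monitor computes the counter for you.

```python
def cache_hit_ratio_pct(logical_reads: int, physical_reads: int) -> float:
    """Cache hit ratio in percent, using the definition given in the text."""
    total = logical_reads + physical_reads
    if total == 0:
        raise ValueError("no reads recorded")
    return 100.0 * logical_reads / total


def needs_more_cache(ratio_pct: float, threshold_pct: float = 80.0) -> bool:
    """True when the ratio is below the 80 percent guideline."""
    return ratio_pct < threshold_pct


ratio = cache_hit_ratio_pct(9000, 1000)
print(ratio)                      # 90.0
print(needs_more_cache(ratio))    # False
print(needs_more_cache(cache_hit_ratio_pct(7000, 3000)))  # True
```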

To find out if you have configured SQL Server properly, you should monitor SQL Server-Procedure Cache: Max Procedure Cache Used (%). If this counter approaches or exceeds 90 percent during normal usage, increase the procedure cache in the SQL Server configuration options. If the maximum cache used is less than 50 percent, you can decrease the configuration value and give more memory to the data cache. Rumor has it that SQL Server 7.0 will have a floating-point number for the procedure cache configuration parameter so that you can give the procedure cache less than 1 percent of your SQL Server memory. For a super server with gigabytes of memory, even 1 percent is too much for procedure cache.

If a 2K data page has been swapped to the Windows NT virtual memory file and read in again later, SQL Server still counts the page as already in memory for the purposes of the Cache Hit Ratio counter. Therefore, a system bogged down by heavy swapping to virtual memory could still show a good cache hit ratio. To find out if your system is in this category, monitor the Memory: Page Faults/Sec counter.

The Memory: Page Faults/Sec counter watches the number of times a page was fetched from virtual memory, meaning that the page had been swapped to the Windows NT swap file. It also adds to the counter the number of pages shared by other processes. This value can be high while system services, including SQL Server, are starting up. If it is consistently high, you may have given too much memory to SQL Server. The network and operating system may not have enough memory to operate efficiently.

Warning: This counter is a strange one to figure out. Running this counter on four different types of machines gave widely different results. To try to get a baseline value, we turned off all services, including SQL Server, unplugged the boxes from the network, and ran Performance Monitor with only the Memory: Page Faults/Sec counter turned on. The lowest measurement of page faults per second was from the system we least expected — a 50 MHz 486 with 16 MB of memory and one disk drive. It settled in at about five to seven page faults per second. The DEC Alpha with 4 processors, 10 GB RAID 5 striping on 5 drives, and 256 MB of memory was up in the 35 to 40 page faults per second range. So was a similarly configured Compaq ProLiant. The laptop performed in the middle, at about 15 page faults per second. It is a 90 MHz Pentium with 1 disk drive and 40 MB of memory. All were running Microsoft Windows NT version 3.51 service pack 4. All services except Server and Workstation were turned off. Running the same experiment with Windows NT 4.0 service pack 1 showed approximately the same results, except that the page faults per second numbers ran consistently 10 percent less than in Windows NT 3.51.

The result of this experiment is that we can’t recommend a range to gauge the performance of your machine. The best you can do is turn off all services for a brief period to get a baseline measurement on your machine, then use this value as a guide for your regular usage.

Disk I/O
Several counters measure how busy your disk drives are and which disk drives are the busiest. Remember that for any I/O measurements to be effective, you must run the Windows NT Diskperf -y command and reboot the system.

Even though the SQL Server: I/O Transactions Per Second counter is a bit misleading, it is still good, especially for capacity planning. This counter measures the number of Transact-SQL batches processed since the last refresh period. You should not compare these results against any standard TPC benchmark tests that give results in transactions per second; the counter does not refer to a Begin/Commit transaction, just to batches of commands. Watch this number over a span of several months, because an increase in this counter can indicate that the use of SQL Server is growing.

The SQL Server: I/O — Lazy Writes/Sec counter monitors the number of pages per second that the lazy writer is flushing to disk. The lazy writer is the background Windows NT process that takes the data from cached memory and writes it to disk, although sometimes a lazy writer is hardware that reads the cached memory on the disk drive and is managed by the disk controller. A sustained high rate of lazy writes per second could indicate any of three possible problems:

·     the Recovery Interval configuration parameter is too short, causing many checkpoints

·     too little memory is available for page caching

·     the Free Buffers parameter is set too low

Normally this rate is zero until the least-recently used (LRU) threshold is reached. LRU is the indicator by which memory is released for use by other processes. Buying more memory may be the best solution if the configuration parameters seem to be in line for your server size.

The SQL Server: I/O Outstanding Reads counter and the I/O Outstanding Writes counter measure the number of physical reads and writes pending. These counters are similar to the PhysicalDisk: Disk Queue Length counter. A high value for either counter for a sustained period may point to the disk drives as a bottleneck. Adding memory to the data cache and tuning the read-ahead parameters can decrease the physical reads.

The SQL Server: I/O Page Reads per Second counter is the number of pages not found in SQL Server data cache, which indicates physical reads of data pages from disk. This value does not count pages that are read from the Windows NT virtual memory disk file. There is no way to watch only the logical page reads per second. According to sources in the SQL development team, counters for logical page reads are hidden in a structure that is not available in this version of SQL Server. However, you can figure out the logical page reads per second by taking the total page reads per second and subtracting the physical page reads per second.

 

You should occasionally turn on the I/O Single Page Writes counter. A lot of single page writes means you need to tune SQL Server, because it is writing single pages to disk instead of its normal block of pages. Most writes consist of an entire extent (eight pages) and are performed at a checkpoint. The lazywriter handles all the writing of an extent at a time. When SQL is forced to hunt for free pages, it starts finding and writing the LRU pages to disk, one page at a time. A high number of single page writes means that SQL Server does not have enough memory to keep a normal amount of pages in data cache. Your choices are to give more memory to SQL Server by taking memory away from the static buffers, decreasing the procedure cache, or decreasing the amount of memory allocated to Windows NT.

Log
Tie the SQL Server — Log: Log space used (%) counter to an alert. When the value goes over 80 percent, send a message to the administrator and to the Windows NT event log. When it goes over 90 percent, dump the transaction log to a disk file (not the diskdump device), which will back up the log and truncate it. You want to track this counter for all your application databases, for Tempdb, and for the Distribution database if you are running replication.
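The two alert thresholds above can be sketched as a small decision function. This is a minimal Python illustration, assuming the percentage has already been read from the Log space used (%) counter; the function name and the action labels are hypothetical, not part of SQL Server or Performance Monitor.

```python
def log_space_action(percent_used: float) -> str:
    """Map a Log space used (%) reading to the action described in the text."""
    if not 0 <= percent_used <= 100:
        raise ValueError("percent_used must be between 0 and 100")
    if percent_used > 90:
        return "dump_log"       # back up the log to a disk file and truncate it
    if percent_used > 80:
        return "notify_admin"   # message the administrator and the event log
    return "ok"


if __name__ == "__main__":
    for reading in (45.0, 85.5, 93.0):
        print(reading, log_space_action(reading))
```

The same check would be run separately for each application database, for Tempdb, and for the Distribution database.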

Locks
To check out locking, turn on the SQL Server Locks: Total Locks and Total Blocking Locks counters. If you notice a period of heavy locking, turn on some of the other lock counters to get a better breakdown of the problem. The value for Total Blocking Locks should be zero or close to it as often as possible.

One counter to turn on to see if you have configured the system correctly is SQL Server Licensing: Max Client Count. Once you have established that your licensing choice is correct, turn it off. You should turn it back on occasionally to check the connections. If you do exceed the license count, you will know because users will be denied access.

Users
When you suspect that one particular user is the cause of any performance problems, turn on the counters in the Users section. However, with many users on the system, it is difficult to guess which counters to use, and it is difficult to turn on all counters for all sessions. One shortcut is to go into the Current Activity screen of the SQL Enterprise Manager and look at the locks in the Locks tab as well as the changes in CPU and Disk I/O activity in the Detail tab.

Monitor the SQL Server — Users: CPU Time counter for each user. Users for whom this counter returns high values may use inefficient queries. If the query appears reasonable, a high value may indicate an indexing problem or poor database design. Use Showplan to determine if the database’s indexes are optimal. Look for wide tables (long row sizes), which indicate a non-normalized database. Wide tables and inefficient indexes can cause more I/O than table scans.

Other Predefined Counters
A new counter in SQL Server 6.5, SQL Server: Max Tempdb Space Used, indicates how well you have estimated the size of Tempdb. If the value for this counter is very small, you know you have overestimated the size of Tempdb. Be sure to watch this counter frequently, especially during the busiest times and when your nightly jobs run. If it approaches the size of Tempdb, then you should probably increase Tempdb’s size.

Compare SQL Server: NET — Network Reads/Sec to SQL Server: NET — Bytes Received/Sec (or Network Writes/Sec compared to Bytes Transmitted/Sec). If the SQL Server network counters are significantly lower than your server counter, your server is busy processing network packets for applications other than SQL Server. This reading indicates that you are using the server for uses other than SQL Server, perhaps as a primary or backup domain controller, or as a print server, file server, Internet server, or mail server. To get the best performance, make this server a dedicated SQL server and put all the other services on another box.

If you are using replication, you should focus on the publishing machine. You should monitor the distribution machine and the subscriber as well, but the publisher will show the first signs of trouble. Turn on all counters in the SQL Server Replication-Publishing DB object. The three counters will tell you how many transactions are held in the log waiting to be replicated, how many milliseconds each transaction is taking to replicate, and how many transactions per second are being replicated.

User-Defined Counters
Last but not least, you can define counters. The user-defined counters are in the SQL Server User-Defined Counters object in the Master database. The 10 counters correspond to 10 new stored procedures called sp_User_Counter1 through sp_User_Counter10. These stored procedures are the only system stored procedures you should change. If you look at the code of the procedures, they all perform a Select 0, which, when tracked on Performance Monitor, draws a flat line at the bottom of the screen. Replace the Select 0 with a Select statement that returns one number; an integer is preferable, but float, real, and decimal numbers also work. These queries should be quick, not ones that take minutes to run.

Please note that these counters are different from the user counters mentioned earlier, which track the specific activity of a particular person logged in to SQL Server.

The current version of Performance Monitor contains a bug. If User Counter 1 contains an error, none of the 10 counters will show up in Performance Monitor. However, this bug is not the only reason that you might not see these user defined counters in Performance Monitor. The Probe login account, added when you install SQL Server, must have both Select and Execute permission on these 10 stored procedures for them to appear.

It would be nice to be able to change the names of these stored procedures so you could more easily remember what you are tracking. Maybe this feature will be included in version 7.0.

Here is a trick: Suppose you want to count the number of transactions you have in a table. You could put the following statement in sp_User_Counter1:

SELECT COUNT(*) FROM MyDatabase.dbo.MyTable

If MyTable had 40 million rows, the stored procedure would take a lot of time to execute, even though it scans the smallest index to get an accurate count. Instead, you could get an approximate number by using the following command:

SELECT rows FROM MyDatabase.dbo.sysindexes WHERE id = OBJECT_ID('MyTable') AND indid IN (0,1)

This way is much faster, even though SQL Server does not keep the value in sysindexes up-to-date. Sometimes the counters tracked in sysindexes get out of sync with the actual table, and the only way to get them updated accurately is with DBCC. But most of the time the value in sysindexes is accurate enough.

LONG-TERM PERFORMANCE MONITORING

The concept behind a good long-term strategy for monitoring performance is simple to explain: Use log files to track as many items as you can without affecting performance. We break this discussion into three sections: establishing a baseline, monitoring performance over the long term, and tracking problems.

Establishing a Baseline
First, go to a command prompt and turn on the disk counters using the command Diskperf -y, then reboot. Then establish a new log file, click the + button, add all the options, and start the logging process. Choosing all the options tracks every instance of every counter in every object. You are tracking a lot of information, especially with the physical disk counters turned on.

Run Performance Monitor with this setup for a week; if you wish, you can manually stop and restart the log file every night so that each day is contained in a different log file. These measurements become your baseline; all your trend measurements will be based on this baseline. This method is not a perfect way to establish a baseline if you have very many special activities taking place on your server that week. But you may never experience a “typical” week, and it’s better to get some baseline measurement than wait.

We also recommend that you start a performance notebook. In this notebook, keep a page where you log special activities and events. For instance, an entry in your log might say, “Ran a special query for the big boss to show what a Cartesian product between two million-record tables does to the system.” In your performance notebook, be sure to record changes to the hardware, along with dates and times. You should also schedule actions like backups and transaction log dumps regularly so that when you look at system performance for one night last week, you do not have to wonder whether the backup was running.

We recommend that you run your long-term monitoring from another computer on the network. This way, you are not skewing the results by running it on the server you are trying to monitor. Also, avoid running Perfmon.exe to capture the long-term baseline, because someone must be logged on for it to run, and leaving an administrator machine logged on for long time periods is not a good idea. Instead, run the command-line version of Performance Monitor, called Monitor.exe. It is essentially the same program as Perfmon.exe without the screens. All output can be directed to the log files. To further simplify your life, get Srvany.exe from the Windows NT resource kit and make Monitor.exe into a Windows NT service. This way you can manage Monitor.exe like any other network service.

Periodically, perhaps once every six months, repeat this baseline process with all the counters turned on. Then compare your baselines to establish a trend.
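Comparing two baselines comes down to computing how far each counter has moved. The following Python sketch assumes each baseline has already been reduced to one average value per counter; the function name and the counter names in the usage example are illustrative only.

```python
def baseline_trend(old: dict, new: dict) -> dict:
    """Return the percent change for each counter present in both baselines."""
    changes = {}
    for counter, old_value in old.items():
        # Skip counters missing from the new baseline or with a zero starting value.
        if counter in new and old_value != 0:
            changes[counter] = (new[counter] - old_value) / old_value * 100.0
    return changes


if __name__ == "__main__":
    january = {"% Processor Time": 40.0, "Pages/sec": 10.0}
    july = {"% Processor Time": 50.0, "Pages/sec": 8.0}
    for name, change in baseline_trend(january, july).items():
        print(f"{name}: {change:+.1f}%")
```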

Monitoring Performance over the Long Term
Once you have established your baseline, start another series of log files for your everyday use. First, turn off the physical disk counters with the Diskperf -n command from a command prompt and reboot the system. You can still track everything else if you want to because turning off the physical disk counters reduces the performance problems caused by monitoring. However, it is not necessary to track all the counters. We recommend you track the following objects:

·     Logical Disk

·     Memory

·     Paging File

·     Processor

·     Server

·     SQL Server

·     SQL Server — Replication (only if you are running replication)

·     SQL Server — Locks

·     SQL Server — Log

·     SQL Server — Procedure Cache

·     SQL Server — Users

·     System


Tracking Problems
When you experience performance problems, leave your Performance Monitor running with the log file so you continue to collect long-term data. Then start Performance Monitor again to track the particular problem. Turn on whatever counters you need to look at, using this chapter as a guide for the key counters to monitor in the disk, memory, network, and processors categories.

Start with the high-level counters — look for the words “total” or “percent” (or the % sign). When one of these counters indicates a problem, you usually have the option of watching counters that give you more detail. Learn which counters in different sections are related to each other. The relationships can tell you a lot. For example, the I/O Transactions Per Second counter in the SQL Server section is closely related to the CPU % counter in the processor section. If the number of I/O transactions per second goes up, so does the processor usage.
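One way to check whether two counters move together is to compute a correlation coefficient over samples exported from the log file. The sketch below is a plain Python implementation of the Pearson coefficient; the sample values in the usage example are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long series of counter samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    io_per_sec = [120, 180, 260, 310, 400]   # made-up samples
    cpu_percent = [22, 30, 41, 47, 63]       # made-up samples
    print(round(pearson(io_per_sec, cpu_percent), 3))
```

A value near 1 suggests the counters rise and fall together; a value near 0 suggests they are unrelated.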

Concentrate on finding out which resource is causing the problem. Is it the system or a user process? Is it Windows NT or SQL Server? Before you purchase more hardware, try to find a configuration option related to the problem. Don’t hesitate to change hardware configuration or move data to different servers to balance the work among the available resources.

For specific examples of tuning performance, see Chapter 16, “Performance Tuning.”

Special Note: Use log files to track as many items as you can without affecting performance.


Monitoring with Transact-SQL
You can also use three Transact-SQL commands to do your own monitoring:

·     DBCC MEMUSAGE

·     DBCC SQLPERF — cumulative from the start of SQL server; use iostats, lru stats, and netstats parameters

·     DBCC PROCCACHE — six values used by Performance Monitor to monitor procedure cache

The output from these commands can be inserted into a table for long-term tracking and customized reporting. Tracking the MEMUSAGE output calls for some tricky programming because different sections have different output formats. The other two commands are more straightforward.

The example below shows how to capture the DBCC PROCCACHE output. This command displays the same six values that you can display in Performance Monitor to watch the procedure cache usage in SQL Server.

CREATE TABLE PerfTracking (
    date_added datetime default (getdate()),
    num_proc_buffs int,
    num_proc_buffs_used int,
    num_proc_buffs_active int,
    proc_cache_size int,
    proc_cache_used int,
    proc_cache_active int
)
go

INSERT PerfTracking (num_proc_buffs, num_proc_buffs_used, num_proc_buffs_active,
    proc_cache_size, proc_cache_used, proc_cache_active)
EXEC ('dbcc proccache')
go

After running this command, you can use any SQL Server-compliant report writer or graphing program to create your own fancy graphs. 

COUNTERS: A SUMMARY

The list below is a quick reference to the information about counters we’ve presented in this chapter. After the performance questions you may ask, we list related counters.

Is CPU the bottleneck?

·     system: % total processor time

·     system: processor queue length

What is SQL Server’s contribution to CPU usage?

·     SQL Server: CPU Time (all instances)

·     process: % Processor Time (SQL Server)

Is memory the bottleneck?

·     memory: page faults/sec (pages not in working set)

·     memory: pages/sec (physical page faults)

·     memory: cache faults/sec

What is SQL Server’s contribution to memory usage?

·     SQL Server: cache hit ratio

·     SQL Server: RA (all read ahead counters)

·     process: working set (SQL Server)

Is disk the bottleneck? (Remember that disk counters must be enabled for a true picture.)

·     physical disk: % disk time

·     physical disk: avg disk queue length

·     disk counters: monitor logical disk counters to see which disks are getting the most activity

What is SQL Server’s contribution to disk usage?

·     SQL Server-users: physical I/O (all instances)

·     SQL Server: I/O log writes/sec

·     SQL Server: I/O batch writes/sec

·     SQL Server: I/O single-page writes

Is the network the bottleneck?

·     server: bytes received/sec

·     server: bytes transmitted/sec

What is SQL Server’s contribution to network usage?

·     SQL Server: NET — Network reads/sec

·     SQL Server: NET — Network writes/sec

Did I make Tempdb the right size?

·     SQL Server: Max Tempdb space used (MB)

Is the procedure cache configured properly? (The highwater marks for the percentages are more important than the actual values.)

·     Max Procedure buffers active %

·     Max Procedure buffers used %

·     Max Procedure cache active %

·     Max Procedure cache used %
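If the procedure cache values are captured over time, for example in the PerfTracking table shown earlier, the highwater marks can be computed from the samples. The Python sketch below assumes each sample is a dictionary keyed by the PerfTracking column names and that the "used" and "active" columns are counts out of the corresponding totals; that interpretation and the function name are assumptions.

```python
# (result name, numerator column, denominator column)
RATIOS = [
    ("buffers_used_pct", "num_proc_buffs_used", "num_proc_buffs"),
    ("buffers_active_pct", "num_proc_buffs_active", "num_proc_buffs"),
    ("cache_used_pct", "proc_cache_used", "proc_cache_size"),
    ("cache_active_pct", "proc_cache_active", "proc_cache_size"),
]


def highwater_marks(samples):
    """Return the highest percentage seen for each ratio across all samples."""
    marks = {name: 0.0 for name, _, _ in RATIOS}
    for sample in samples:
        for name, part, total in RATIOS:
            if sample[total] > 0:
                pct = 100.0 * sample[part] / sample[total]
                marks[name] = max(marks[name], pct)
    return marks
```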


SUMMARY

SQL Server 6.5 gives you new configuration and tuning options. It also adds new counters to help you track the use of SQL Server on your system. Use Performance Monitor to see if your system is configured properly. Performance Monitor is one of the best tools you can use to identify current bottlenecks and prevent future problems.

 

When problems occur with a SharePoint Portal Server, your first instinct is probably to look at the Event Viewer. While the Event Viewer can certainly provide some useful information, you might be surprised to learn that another good source of troubleshooting information is the Performance Monitor. Microsoft has included several Performance Monitor counters that are specific to SharePoint. In this article, I’ll review a few of the many SharePoint Performance Monitor counters that are useful in troubleshooting situations.

 

Microsoft Gatherer\Heartbeats

This counter displays the number of heartbeats that have occurred since the SharePoint services were started. By default, a heartbeat occurs every ten seconds.  If you see that the heartbeats aren’t increasing, it means that the SharePoint services have either stopped or are unresponsive.
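The "not increasing" check can be automated against successive readings of the counter. This is a minimal Python sketch; it assumes the readings were taken more than ten seconds apart, so a healthy service would show at least one new heartbeat between samples. The function name and window size are hypothetical.

```python
def heartbeat_stalled(readings, window=3):
    """True if the last `window` readings show no increase in the heartbeat count."""
    if len(readings) < window:
        return False  # not enough data to decide
    recent = readings[-window:]
    return all(value == recent[0] for value in recent)


if __name__ == "__main__":
    print(heartbeat_stalled([10, 16, 22, 28]))  # counter still climbing
    print(heartbeat_stalled([10, 16, 16, 16]))  # counter flat for three samples
```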

 

Microsoft Gatherer\Reason To Back Off

Check this counter to see if its value is higher than zero. A nonzero number means that document crawling has been paused because of insufficient system resources. Usually a nonzero value indicates that the system is low on memory or that the current disk I/O is too high to process requests.

 

Microsoft Search Indexer Catalogs\Number of Documents

Another useful counter is the Number of Documents counter, which reports the number of documents in the catalog. This is useful for seeing how heavily the SharePoint server is being used. For example, you can tell over time if the number of indexed documents remains fairly static, goes up, or even goes down. This is a really good counter to check out if you start having disk space problems.

 

Microsoft Search Indexer Catalogs\Index Size

 This counter reports the size of the document index in megabytes. If you watch this counter over time, you can track the rate at which the index is growing, and can use that information to predict how quickly your server may run out of hard disk space.
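The prediction can be roughed out with a straight-line extrapolation. The Python sketch below assumes one Index Size reading per day, in megabytes, and a known amount of free disk space; the function name is hypothetical, and real growth is rarely perfectly linear.

```python
from typing import Optional, Sequence


def days_until_full(daily_sizes_mb: Sequence[float], free_mb: float) -> Optional[float]:
    """Estimate days until free space runs out, using average daily index growth."""
    if len(daily_sizes_mb) < 2:
        raise ValueError("need at least two daily samples")
    growth_per_day = (daily_sizes_mb[-1] - daily_sizes_mb[0]) / (len(daily_sizes_mb) - 1)
    if growth_per_day <= 0:
        return None  # index is flat or shrinking; no exhaustion predicted
    return free_mb / growth_per_day


if __name__ == "__main__":
    print(days_until_full([1000, 1100, 1200, 1300], free_mb=6000))
```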

 

Microsoft Gatherer\Documents Delayed Retry

Normally, the value of this counter should be zero. A nonzero value means that SharePoint is having problems accessing the Web storage system. These failed access attempts will keep retrying until successful. Therefore, if the Documents Delayed Retry value momentarily goes above zero and then goes back down, it means that the system was simply busy at the time of the original request, but later was able to catch up with the demand being placed on it. If, however, the number continues to rise steadily, it indicates a Web storage system failure.
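The difference between a momentary spike and a steady rise can be told apart from a short series of readings. The Python sketch below is one possible way to classify them; the labels and the exact rules (for example, treating any series that never decreases as "rising") are assumptions, not SharePoint behavior.

```python
def classify_delayed_retry(readings):
    """Classify a series of Documents Delayed Retry readings, oldest first."""
    if not readings:
        raise ValueError("need at least one reading")
    if readings[-1] == 0:
        # Back at zero: either never rose, or rose and recovered.
        return "healthy" if max(readings) == 0 else "transient"
    never_drops = all(b >= a for a, b in zip(readings, readings[1:]))
    if never_drops and readings[-1] > readings[0]:
        return "rising"    # steadily climbing: possible storage failure
    return "elevated"      # above zero but fluctuating; keep watching


if __name__ == "__main__":
    for series in ([0, 0, 0], [0, 3, 1, 0], [0, 2, 5, 9], [0, 5, 3, 4]):
        print(series, classify_delayed_retry(series))
```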

 

Microsoft Gatherer\Active Queue Length

This counter indicates the number of documents that are waiting for a robot thread to process them. Normally, this number should be zero. If this number is not zero, it means that the server is falling behind, although this may be a temporary condition caused by an especially busy period. If this number is anything other than zero, then all available threads should be filtering. If the number is above zero, but all possible threads aren’t filtering, it usually means that the SharePoint services need to be stopped and restarted.

 

By monitoring performance regularly, organizations can recognize trends as they develop and prevent performance problems. Regular monitoring also helps them decide when to upgrade the hardware and whether upgrades are improving the server’s performance.

 

 

General Server Performance Counters

 

Following is a list of general performance counters that should be monitored for all SharePoint Portal Server based servers.

 

| Performance Object | Counter | Threshold | Description |
|---|---|---|---|
| Processor | Percent processor time (_Total) | 80 to 85 percent averaged over three intervals | The total percentage of processor usage for a server. |
| Network interface | Bytes total per second (each network interface) | 50 percent of the available network interface bandwidth; for example, a 100-Mbps network interface carrying 50 Mbps (about 6,250 KB per second) | The rate at which bytes are sent and received over each network adapter. |
| Logical disk | Percent idle time (drives C:, D:, and so on) | 20 percent idle time | Reports the percentage of time during the sample interval that the disk was idle. If this value is very low, the logical disk is very busy. |
| Paging file | Percent usage | Above 70 percent | Review this value in conjunction with the Memory counters (available megabytes and page faults per second) to understand paging activity on the server. |
| Memory | Available MBs | 128 MB, assuming 2 GB of RAM as prescribed on servers | The amount of physical memory, in MBs, immediately available for allocation to a process or for system use on the server. |
| Memory | Page faults per second | 20 | A high rate of page faults indicates a lack of physical memory. |
| System | Processor queue length | The number of CPUs + 1 | Exceeding the threshold indicates that the processors are not fast enough. |
| ASP.NET applications | Requests per second (_Total) | Through ongoing monitoring, trends begin to emerge that equate requests per second with CPU consumption | The number of requests executed per second; this roughly equates to the number of HTTP pages per second. |

Table 1: General Windows Server 2003 Performance Counters
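Several of the numeric thresholds in Table 1 can be checked mechanically once the counters have been sampled. The Python sketch below is illustrative only: the dictionary keys and function name are hypothetical, the lower bound of 80 percent is used for the processor threshold, and the direction of each comparison (for example, available memory below 128 MB) is an assumption where the table gives only a number.

```python
def check_general_thresholds(sample: dict, cpu_count: int) -> list:
    """Return the names of Table 1 thresholds that the sample crosses."""
    warnings = []
    recent = sample["processor_pct"][-3:]          # last three intervals
    if recent and sum(recent) / len(recent) >= 80:
        warnings.append("processor")
    if sample["paging_file_pct"] > 70:
        warnings.append("paging_file")
    if sample["available_mb"] < 128:
        warnings.append("available_memory")
    if sample["page_faults_per_sec"] > 20:
        warnings.append("page_faults")
    if sample["queue_length"] > cpu_count + 1:
        warnings.append("processor_queue")
    return warnings


if __name__ == "__main__":
    busy = {
        "processor_pct": [82, 85, 88],
        "paging_file_pct": 40,
        "available_mb": 512,
        "page_faults_per_sec": 5,
        "queue_length": 6,
    }
    print(check_general_thresholds(busy, cpu_count=4))
```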

 

 

SharePoint Portal Server Counters

 

SharePoint Portal Server supports many topologies that allow organizations to set up different server farm environments, from small server farms to larger load-balanced server farms, according to their requirements. These topologies introduce varying numbers of Web servers, Search servers, and Index servers into the SharePoint farm environment. Some server farms bundle more than one SharePoint component together; for example, Web and Search components on a single server.

The following table contains a list of performance counters that are relevant for all of the SharePoint components.

| Object | Counter | Threshold | Description |
|---|---|---|---|
| Processor | Percent processor time (_Total) | 80 to 85 percent averaged over three intervals | The total percentage of processor usage for a server. |
| ASP.NET applications | Requests per second (_Total) | Through ongoing monitoring, trends begin to emerge that equate requests per second with CPU consumption | The number of requests executed per second; this roughly equates to the number of HTTP pages per second. |
| Web service | Get requests per second (_Total, individual portal, or IIS virtual root) | Through ongoing monitoring, trends begin to emerge that equate get requests per second with CPU consumption | Generally speaking, this is the rate at which clients are requesting information from the front-end Web servers. |
| Search | Query rate | 10 per second | The number of queries posted to the server per second; keep in mind that in the medium server farm configuration, the front-end servers are doing much more than searching. |
| Search | Succeeded queries | This counter should be used mostly for troubleshooting search problems | The number of queries that produce successful searches. Monitor this counter along with the failed queries counter if you need to troubleshoot search problems. |
| Search catalogs | Number of documents | Microsoft has tested up to 5 million documents per content index | The total number of documents in the catalog. |
| Search catalogs | Queries rate (index names or all instances) | Indicates which catalogs are searched most often by users | The number of queries posted to indexes per second; in conjunction with other performance data, this can help determine if your index configuration can be optimized. |

Table 2: SharePoint Portal Server Performance counters

 

 

Front End Web Servers in a Server Farm

 

The Processor, ASP.NET applications, and Web service performance objects determine when to add a new dedicated Front End Web server to the farm environment. When the counters for these three performance objects continuously approach or exceed any of the thresholds, organizations should consider adding Front End Web servers to the server farm. At the same time, check the general counters in Table 1 to better understand the performance of the Windows Server 2003 server.

The “percent processor time” counter is the main indicator of Front End Web server performance. The organization should gather this information every 15 minutes to determine the health of the Front End Web server. If the organization has high performance expectations, it should set a lower average threshold for this counter.

A heavily used Windows SharePoint Services site can have an extensive performance impact on Front End Web servers. Organizations should identify these kinds of WSS sites and consider scaling out one or more of those sites to a separate server farm to reduce the performance impact on Front End Web servers.

 

Front End Web / Search Servers in a Server Farm

Table 2 lists the important performance counters for a Front End Web server with the SharePoint Search component in a server farm. If the “search catalog” size exceeds 5 million documents, Microsoft recommends adding an additional Front End / Searching server to the server farm or separating the indexing by moving it to a dedicated Indexing server. This will reduce the propagation and crawling time.

The amount of CPU time taken by search operations is another key factor in monitoring the health of the server farm. When the “query rate” counter exceeds the recommended threshold and search operations consume more than 40 to 50 percent of the overall server capacity, organizations should consider scaling out toward a Larger Server Farm. If percent processor time is exceeding the threshold but the query rate is below the threshold and front end web traffic is bringing the server to its maximum capacity, the organization should consider adding an additional Front End Web/Search server to the Server Farm.
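The two scaling rules in this paragraph can be written down as a decision function. This Python sketch is only an illustration: it takes the 10-queries-per-second and 80-percent-processor thresholds from Table 2, uses 40 percent as the search share cutoff, and assumes that high processor time with a low query rate means Web traffic is the cause. The function name and return labels are hypothetical.

```python
def scaling_advice(cpu_pct: float, query_rate: float, search_cpu_share_pct: float) -> str:
    """Suggest a scaling step for a combined Front End Web / Search server."""
    QUERY_RATE_THRESHOLD = 10.0   # queries per second (Table 2)
    CPU_THRESHOLD = 80.0          # percent processor time (Table 2, lower bound)
    SEARCH_SHARE_THRESHOLD = 40.0 # percent of server capacity used by search

    if query_rate > QUERY_RATE_THRESHOLD and search_cpu_share_pct > SEARCH_SHARE_THRESHOLD:
        return "scale_out_to_large_farm"
    if cpu_pct >= CPU_THRESHOLD and query_rate <= QUERY_RATE_THRESHOLD:
        return "add_front_end_search_server"
    return "no_change"


if __name__ == "__main__":
    print(scaling_advice(cpu_pct=60, query_rate=12, search_cpu_share_pct=45))
    print(scaling_advice(cpu_pct=85, query_rate=4, search_cpu_share_pct=10))
```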

 

Index Servers

The main performance thresholds for index servers are 80 to 85 percent processor time and 85 percent memory availability. The decision to add indexing servers to a farm environment is not based solely on performance counters; it is determined more by the indexing function and the propagation time. When an incremental crawl and propagation takes a few hours and the organization needs it done in half an hour, consider adding another indexing server.

Performance of an indexing server also depends on the location of the crawled content and the processing power of the indexing server. Crawl time may depend on content location, network performance, type of documents in the index, frequency and type of content index updates.

Microsoft recommends no more than four index servers per large server farm.

 

Microsoft SQL Server

The percent processor time counter is the leading indicator of Microsoft SQL Server performance. Examining percent processor time in conjunction with overall memory capacity, network traffic, and input/output subsystem capacity will provide an accurate health report on your dedicated SQL Server in the SharePoint farm.

 

Internet Information Services (IIS) Logs

IIS logs include detailed information such as who has visited sites and what was viewed, in terms of total visits, page views and trends over time. Careful analysis of IIS logging data will help organizations to discover how much traffic is going to portal sites, how much is going to WSS sites and search operations.

It will also help organizations to identify bottlenecks and performance issues. Most importantly this will help to identify the most used Windows SharePoint Services Sites in the portal and its impact on the Front End Web server. According to this information, organizations can decide when to move one or more site collections to a dedicated Windows SharePoint Services server farm to reduce the performance impact on the Front End Web server.

 

Setting Up IIS Log Files

Set up log files to be created daily for each virtual server. It is good practice to locate these logs on a drive separate from the one that holds the operating system data. The local administrators group and the IIS_WPG group should have access to the log file directory.

IIS logging is enabled by default for each virtual server. The user can check whether IIS logging is enabled by following these steps.

 

·         Click Start -> Run

·         Then in the Run window type “inetmgr” to open the Internet Information Services (IIS) Manager

·         Select the virtual directory, right click and select the Properties menu option

 

The recommended log file format for IIS logs is the W3C format. This is the default log file format and enables you to select the fields you want to include in the log file. By limiting the number of fields, you can simplify the process of identifying the data and save server resources such as CPU. Users can follow these steps to select the logging properties for their virtual server.

 

·         In the Properties window of the virtual server select the Properties button next to “Active log format”

·         The Logging Properties window will open as displayed in Figure 4

·         This window enables the user to select the appropriate schedule that will suit their organization (daily is recommended)

·         This screen also displays the log file directory and file name.

·         Select the Advanced (or Extended Properties) option from the top tab section

 

The following window will open allowing you to select from all the available fields:

IIS log files can also be written to a database that complies with Open Database Connectivity (ODBC), such as a Microsoft SQL Server database, as displayed in Figure 6.

The information logged to SQL Server can then be reviewed by generating reports using SQL Server Query Analyzer.
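When the logs are kept as W3C text files instead, they can be summarized with a short script. In the W3C extended format, a "#Fields:" directive names the space-separated columns that follow. The Python sketch below counts hits per URL path; the sample lines are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter

# Made-up sample lines in W3C extended log format.
SAMPLE = """#Software: Microsoft Internet Information Services 6.0
#Fields: date time cs-method cs-uri-stem sc-status
2008-09-01 10:00:00 GET /sites/teamA/default.aspx 200
2008-09-01 10:00:02 GET /sites/teamA/default.aspx 200
2008-09-01 10:00:05 GET /sites/teamB/default.aspx 404
"""


def parse_w3c_log(lines):
    """Yield one dict per request, keyed by the names in the #Fields: directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue  # other directives carry no request data
        values = line.split()
        if fields and len(values) == len(fields):
            yield dict(zip(fields, values))


def hits_per_path(lines):
    """Count requests per cs-uri-stem value."""
    return Counter(rec["cs-uri-stem"] for rec in parse_w3c_log(lines) if "cs-uri-stem" in rec)


if __name__ == "__main__":
    for path, hits in hits_per_path(SAMPLE.splitlines()).most_common():
        print(hits, path)
```

Grouping the counts by the leading part of the path is one way to see which Windows SharePoint Services sites receive the most traffic.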

 

 

Conclusion

 

Organizations can use the Windows Server 2003 System Monitor and Performance Logs and Alerts to monitor their SharePoint Portal Server environment. The performance counter details were taken from the Microsoft SharePoint Products and Technologies Resource Kit.

 

Planning is a critical step in Application Performance Review/Testing. A Team defines Application Performance Review/Testing as a methodology comprising 4 major types of testing that improve client (user) response time and server performance.  These include both single-user and simulated multiple (virtual) user testing. The key is to look at application performance from the user’s perspective first, rather than at server resources. Application users do not care how many concurrent users the application can support or how much CPU is being used on the server; they only want the response time to be fast. For example, when they search for a product on a web site, they want the relevant search results to appear within milliseconds.  The 4 major types of testing are as follows:

  • Application Performance Review (also known as Application Performance Walkthrough, or Application Performance Assessment)
  • Application Network Analysis
  • Application Load Test (also known as Application Stress Test)
  • Application Scalability Test (also known as Application Capacity Test)

Planning should be done for each type of testing. At a high level, there are 4 main elements in planning that need to be considered:

  • Performance Goals
  • Performance Metrics
  • Load Profile
  • Performance Test Plan

At a minimum, the performance goals are to determine user response time, test server capacity, and benchmark results so that performance gains can be compared in future versions of the application.  The 3 categories within the performance goals are (1) Response Time Acceptability Goals (which need to be realistic and achievable), (2) Throughput and Concurrent User Goals, and (3) Future Performance Growth Requirements.

Performance metrics relate to server resource usage or threshold values, such as CPU utilization (X%), memory consumption (~XMB), disk and network activity, and processing delays (code nature).

The load profile focuses on user activities (scenarios). Ensure that transactions that occur more frequently make up a larger proportion of the test data and scripts for the load test (e.g. 30 browsers : 1 buyer); apply appropriate user think (sleep) times (random, realistic times are preferred); conduct research to determine usage patterns (peak and duration); and ensure that the servers and network in the test environment are loaded with the required background tasks.  Remember to take into account as many parameters as possible when developing the load profile for an application performance test.  A single parameter may affect the test results by only a few percent, but several parameters may add up and have a significant impact on test results.

At the very least, the performance test plan should comprise an Application Overview, Architecture Overview, High-level Goals (test objectives), Performance Test Process, Performance Test Scripts (script and scenario description), and Test Environment details.

Planning properly and early, with consideration of the 4 main elements, will help ensure a successful performance review/test and, in turn, a high-performing application.
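The transaction mix and random think times described above can be sketched in a few lines of Python. The 30:1 weighting comes from the example in the text; the scenario names, the 3 to 10 second think-time range, and the function names are hypothetical placeholders that a real test would replace with values from usage research.

```python
import random
from collections import Counter

# Weights taken from the "30 browsers : 1 buyer" example; names are illustrative.
SCENARIO_WEIGHTS = {"browse": 30, "buy": 1}


def pick_scenario(rng: random.Random, weights: dict) -> str:
    """Choose the next scenario for a virtual user in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def think_time(rng: random.Random, low: float = 3.0, high: float = 10.0) -> float:
    """Return a random think (sleep) time in seconds between low and high."""
    return rng.uniform(low, high)


if __name__ == "__main__":
    rng = random.Random(2008)  # fixed seed so a run can be repeated
    mix = Counter(pick_scenario(rng, SCENARIO_WEIGHTS) for _ in range(3100))
    print(mix)
    print(round(think_time(rng), 2), "seconds")
```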

 

Status Code Meaning 

 

1xx CODES – INFORMATION ONLY

100 – Continue; part of request received

101 – OK to switch protocols

 

 

2xx CODES – SUCCESS

200 – OK; all requested info returned

201 – Created; request filled

202 – Accepted; request received

203 – Source unknown; info came from 3rd party

204 – No new content; nothing to send back

205 – Reset content; OK to clear form

206 – Request only partially filled

 

 

3xx CODES – REDIRECTION

300 – Multiple choices; several versions of the resource available

301 – Moved permanently; use new URL

302 – Moved temporarily; use same URL

303 – See other; fetch the given URL with GET

304 – Not modified since; use cached copy

305 – Use proxy; URL is provided

 

 

4xx CODES – FAILURE

400 – Did not understand request; try again

401 – Authorization required; needs password

402 – Payment required; needs payment data

403 – Request refused; may not give reason

404 – Not found; not sure of reason (typo, etc.)

406 – Content type not acceptable to request

407 – Browser must authenticate itself

408 – Timed out; send request again

409 – Update conflict; explanation provided

410 – Not found; resource permanently gone

411 – Content length missing in request

412 – Conditions on request failed

413 – Request too long to process

414 – Resource address too long to process

415 – Unsupported media type; bad format

 

 

5xx CODES – SERVER ERRORS

500 – Internal server error

501 – Server cannot fill request

502 – Server cannot process gateway request

503 – Server overloaded or service over limits

504 – Gateway or proxy server timed out

505 – HTTP version not supported in server
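
When checking load test results, it can help to bucket response codes by their class rather than reading them one at a time. The sketch below is one possible way to do that using Python's standard `http.HTTPStatus` enum; the `describe` helper and the class labels (which mirror the headings above) are illustrative names, not part of any testing tool.

```python
from http import HTTPStatus

# Class labels keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "information",
    2: "success",
    3: "redirection",
    4: "failure",
    5: "server error",
}


def describe(code: int) -> str:
    """Return the code, its standard reason phrase, and its class."""
    status_class = STATUS_CLASSES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not registered in the standard library's table.
        phrase = "unregistered code"
    return f"{code} {phrase} ({status_class})"


if __name__ == "__main__":
    for code in (200, 304, 404, 503):
        print(describe(code))
```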

On Web Server / Application Server:

 

Processor\% Processor Time

Processor\% User Time

Processor\% Privileged Time

Processor\% Idle Time

 

These counters give you the percentage of processor utilization for n concurrent users. The ISVs I have worked with usually have multiple web servers behind software network load balancing. They say that average CPU utilization is <50%, but at peak times it may go higher. If you want to see how many concurrent users your application supports on a particular hardware platform, look at these counters.

 

 

PhysicalDisk\% Disk Read Time

PhysicalDisk\% Disk Write Time

PhysicalDisk\Avg. Disk Queue Length

 

Use these counters to make sure there is no excessive logging, either from your application (I have seen this happen: during performance load testing, an ISV’s code was logging by mistake when it should not have been) or from IIS, where logging is on by default. I worked with one ISV whose web server CPU went beyond 95% at peak times, even though the average over a whole day was only ~25%. When I looked into the web server, full IIS logging was on by default. I recommended turning it off, and peak-time performance (responses/sec) improved by almost 18-20%. So it is important to check these counters.

 

.NET CLR Memory\% Time in GC – This is the #1 counter to look at to see whether GC is a possible issue in your application. If % Time in GC is very low (< ~10%), then GC is not an issue. But if this counter is above 25-30%, this is definitely an area to look into.

 

.NET CLR Memory\Gen “n” Collections (0, 1, and 2; three counters in all) – If % Time in GC is high, this is the next set of counters to look into. A healthy gen2:gen1 ratio is 1:10. If the ratio is ~1:1 or 1:2, the next step is to look at the allocation pattern and object survival (Gen 1 Promoted Bytes is another counter you may want to check). Why this ratio? Because the .NET GC is generational, collecting gen “n” also collects all younger generations; i.e., a gen2 collection covers gen1, gen0, and the LOH. So a gen2:gen1 ratio of 1:1 means that every gen1 collection is happening as part of a gen2 collection. If gen2:gen1:gen0 is 1:1:1, then all collections are gen2 collections, which is very bad, because a gen2 collection means looking at the entire heap. At this point, looking at the allocation graph and object survival statistics would definitely help. The CLR Profiler tool from Microsoft® is the best tool for this. Another thing some folks keep telling me is, “I am doing a lot of gen0 collections/sec. I think I have a serious GC problem.” My answer is: don’t even look at collections; in fact, doing a lot of gen0 collections is a good thing (as opposed to gen1 or gen2 collections). It means lots of temporary objects are being created and you are running out of gen0 budget, which triggers these collections. Always look at % Time in GC first. If it is very high, then start looking at the second-level counters such as collections.
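
The two-step check described above (% Time in GC first, then the gen2:gen1 ratio) can be sketched as a small helper. This is only an illustration: the function name, the return strings, and the exact cut-offs (10%, 25%, and a gen2:gen1 ratio of 1:2 or worse) are assumptions taken loosely from the rules of thumb in the text, not official guidance.

```python
def gc_diagnosis(pct_time_in_gc: float, gen1: int, gen2: int) -> str:
    """Apply the rule-of-thumb GC checks to sampled counter values.

    pct_time_in_gc: value of % Time in GC.
    gen1, gen2: collection counts for generation 1 and generation 2.
    """
    # Step 1: % Time in GC is always the first counter to look at.
    if pct_time_in_gc < 10:
        return "ok"
    if pct_time_in_gc <= 25:
        return "watch"

    # Step 2: GC time is high, so look at the gen2:gen1 ratio.
    if gen1 == 0:
        return "investigate: no gen1 collections recorded"
    ratio = gen2 / gen1  # healthy is around 0.1 (1:10)
    if ratio >= 0.5:  # roughly 1:2 or worse
        return "investigate: gen2-heavy"
    return "investigate: high GC time"
```

For example, `gc_diagnosis(40, gen1=100, gen2=100)` flags the 1:1 case, where allocation patterns and object survival are the next thing to profile.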

 

.NET CLR LocksAndThreads\Contention Rate/sec – This counter shows how much contention you have in your application. The lower the value, the better. It has to be looked at in conjunction with CPU utilization. For example, say you are increasing the number of concurrent users; you would expect CPU utilization on your web/app server to go up, and requests/sec with it. If instead CPU utilization does not increase, requests/sec does not increase, and this counter keeps going up every time you add more concurrent users, you definitely have contention issues. One or more data sources are shared, and as the number of concurrent users increases, so does the number of threads, and with it contention. To pin this down you can use the SOS tool from Microsoft (I will be blogging in detail on using SOS to identify contention). Another counter, System\Context Switches/sec, will also help. If this value is high, there is high contention, because of which the CPU is switching threads very fast.
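
The pattern described above (more users, rising contention, but flat CPU and flat requests/sec) can be checked mechanically between two load steps. The sketch below is a hypothetical helper: the `Step` fields and the 5% tolerance for calling a value flat are assumptions, and the values would have to be collected from perfmon separately.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """Counter averages sampled at one load level (hypothetical structure)."""
    users: int
    cpu_pct: float
    requests_per_sec: float
    contention_per_sec: float


def suggests_contention(prev: Step, curr: Step, flat_tolerance: float = 0.05) -> bool:
    """True when users and contention rose but CPU and throughput stayed flat."""

    def grew(old: float, new: float) -> bool:
        # A value counts as having grown only if it rose beyond the tolerance.
        return new > old * (1 + flat_tolerance)

    return (
        curr.users > prev.users
        and grew(prev.contention_per_sec, curr.contention_per_sec)
        and not grew(prev.cpu_pct, curr.cpu_pct)
        and not grew(prev.requests_per_sec, curr.requests_per_sec)
    )
```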

 

.NET CLR Exceptions\Exceps Thrown /sec: This is another important counter to look at. A high value can have performance implications. Note that some code paths, such as Response.Redirect, always throw an exception, so interpret the value with care. If it is high, look at the logs (assuming the application logs exceptions, e.g. in debug mode) or use a debugger to see why and where these exceptions are thrown; minimizing or eliminating the unnecessary ones can help improve performance.

 

ASP.NET Applications\Pipeline Instance Count: This is another important counter to look at. It gives the number of active request pipeline instances for a specified ASP.NET application. Since only one execution thread runs in a pipeline instance, this is the maximum number of concurrent requests being processed by the application. The lower the value, the better. During the warm-up phase of the application this value fluctuates while the thread pool settles on the number of threads that gives the best performance; after that it should be fairly constant under constant load. If you see significant fluctuations in this counter, it is worth determining the best possible value by tuning the machine.config file.

 

ASP.NET Applications\Requests/sec: Self-explanatory; the total number of requests executed per second.

 

ASP.NET Applications\Requests in Application Queue: This counter should be very low; zero is ideal. A high value indicates that requests are sitting in the queue waiting to be processed. You might see this in a typical performance load test when you ramp up the number of concurrent users, to determine how many your application can support, and go too high. It can also happen in peak periods when a burst of requests arrives, so monitoring this counter is very useful for deciding on further actions to keep ASP.NET web application performance sustained.

 

ASP.NET\Worker Process Restarts: This shows whether w3wp.exe crashed or was shut down and then restarted. It should be 0 in the majority of cases; if it is >0, investigate the cause further by looking in the event log to see whether an AV (access violation) happened.

 

I hope you now have a good idea of which counters to track on the web or application server, why, and what information they give. Now we can move on to the database layer. I am not really a database person, but I can give some information on the SQL counters that we track.

 

Database layer:

Locks\Lock Requests/sec

Locks\Lock Wait Time (ms)

Locks\Number of Deadlocks/sec

 

These are among the first counters we use to see how much locking is going on in the database. The lower the values, the better. If you occasionally see very high CPU utilization on the database, use the SQL Profiler tool to find out which query is taking the longest. I was working on performance load testing with an ISV, and we saw that with just ~150-200 concurrent users the database CPU utilization was shooting up to 80%. We started SQL Profiler and saw that one particular query was taking a very long time. We modified it to return dummy data instead of actually going to the table (the query fetched the top 3 news items displayed on the page). This alone reduced database utilization to <10%. Later we found that missing indexes and a really inefficient query were the cause. So even one stored procedure can create significant problems. As far as locks are concerned, the sp_who2 and sp_lock system stored procedures can give you a lot of information about the specific locks that are causing issues, so that you can fix them.

 

That is it for perfmon. I hope this is detailed enough to help when looking at an n-tier .NET application. Please let me know if you need any specific information, if there is a topic you want me to write about, or if you have any comments on these blogs.

 

What is SharePoint?

SharePoint is an enterprise collaboration portal, a product from Microsoft called Microsoft Office SharePoint Server (MOSS) 2007, that can be configured to run Intranet, Extranet and Internet sites.  MOSS 2007 allows people, teams and expertise to connect and collaborate on a single platform. A SharePoint Enterprise Collaboration Portal is composed of both SharePoint Portal and Windows SharePoint Services (WSS), with SharePoint being built upon WSS.  WSS is typically used by small teams, projects and companies.  SharePoint Server is designed for individuals, teams and projects within a medium to large company-wide enterprise portal.

Why SharePoint?

SharePoint solves four main problems in an Enterprise Environment.

 

As companies grow, so does the number of their files.  It soon becomes difficult to keep track of the multiplying documents and their locations.  SharePoint overcomes this by allowing you to store and locate your files in a central site.  Files can also be located through enterprise-wide searches within SharePoint Portal.

 

Sharing work files through email is a cumbersome process.  SharePoint eliminates this by allowing files to be stored in one location, allowing easy access to all team members.

 

Today’s work takes place across multiple locations, whether in different countries, offices, or departments, or at your home office. SharePoint enables teams and individuals to connect and collaborate regardless of where they are located.

 

Creating and maintaining sites is normally difficult and time consuming.  SharePoint, by contrast, allows business site owners, coordinators, and administrators to create sites for use within their division or department, whether departmental sites, document libraries, meeting sites, survey sites, or discussion boards.

What is ICE?

Interactive Collaboration Environment, commonly known as ICE, is the name of the project commissioned to set up the infrastructure and team required to run the Enterprise Collaboration Portal for the bank.

What is MOSS 2007?

Microsoft Office SharePoint Server (MOSS 2007) is an integrated suite of server capabilities that can help improve organizational effectiveness by providing comprehensive content management and enterprise search, accelerating shared business processes, and facilitating information sharing across boundaries for better business insight.

Microsoft Office SharePoint Server 2007 allows organizations to:

Manage content and processes

Improve business insight

Simplify internal and external collaboration

Empower IT to make a strategic impact

 

What is Collaboration?

Collaboration is a structured, recursive process in which two or more people work together towards a common goal (typically an intellectual endeavor that is creative in nature) by sharing knowledge, learning, and building consensus. Collaboration can mean “people working together” or “people sharing information”.

What is My Site?

My Site is a personal site that gives you a central location to manage and store your documents, content, links, and contacts. My Site serves as a point of contact for other users in your organization to find information about you and your skills and interests.

What is Team Site?

Team Sites are Web sites created from a template and designed for team collaboration. They are hosted on the central SharePoint Server. Team Sites are a great way to coordinate team activities with document collaboration and storage.

What is a bespoke application?

A bespoke application is a customized business application developed on the SharePoint platform using SharePoint objects with embedded .NET code.

What is Site Collection?

A Site Collection is a top-level SharePoint parent site that contains many sites; i.e., when you create a site at the root of a Web Application, you are creating a site collection.

What is document Library?

A document library is a collection of files that you can share with team members on a Web site based on Microsoft Windows SharePoint Services. For example, you can create a library of common documents for a project, and team members can use their Web browsers to find the files, read them, and make comments/amendments.

What is List?

When you create a list in a SharePoint site, Windows SharePoint Services generates the list instance from a list definition contained in the site definition from which the site was created. A list is basically a set of records that can be treated like a database table in SharePoint. It works like Excel data: users can filter on the columns and can also upload a document to the list. Lists have some usage limitations. Refer to the Limitation of MOSS 2007 table below.

What are Wikis?

Wikis are interconnected collections of Wiki Pages. Wiki Page Libraries support pictures, tables, hyperlinks and Wiki linking. A Wiki is basically a scratch pad that multiple people can use as a whiteboard to post their views. One user can overwrite another’s changes, but versions are maintained by the system automatically. Wikis are generally used for brainstorming sessions across geographies.

What is Blog?

A Blog is a site designed to help you share information. Blogs can be used as news sites, journals, diaries, team sites, and more. In business, Blogs can be used as a team communication tool. Keep team members in touch by providing a central place for links and relevant news.

 

What is Picture Library?

A picture library is a collection of pictures that you can share with team members on a Web site based on Microsoft Windows SharePoint Services. For example, you can create a library of common pictures for a project, and team members can use their Web browsers to find the pictures, view them, and make comments.

What is Slide Library?

A slide library is a collection of slides from Microsoft Office PowerPoint, or a compatible application, that you can share with team members on a Web site based on Microsoft Windows SharePoint Services.

What is custom list?

A custom list lets you specify your own column types.

What are the features available in SharePoint?

The features available in SharePoint 2007 are given below:

 

Debugger Feature – Adds a new menu item to the Site Actions menu that attaches the debugger.

 

Log Viewer – This is a Feature for viewing the Unified Logging Service (ULS) logs from within Central Admin

 

Print List – This Feature adds a “Print List” menu item to the “Actions” menu for every list in the site collection

 

Window Links – This Feature is a custom Links list that allows you to control all aspects of opening the link in a new window.

 

Presence Contact List - This is a contact list modified to show presence information as a drop-down associated with the e-mail field.

 

Task Alert – This Feature automatically sets up an alert for someone when a new task is assigned to them.

 

Toolbar Manager – This Feature allows you to selectively show and hide menu items on the standard list/library toolbar. The feature adds a new web part to the gallery that you can place on any list/library view page.

What is private and public in Mysite?

My Sites have public and private pages. Use your public page (called the “My Profile” page) to share files and information with others, and use your private page (called the “My Home” page) to store files and information that only you can access.

What is Colleague Tracker?

The Colleagues Web Part helps you to keep track of events, such as whether your colleagues are in the office, in meetings, or on the telephone. You can also be notified when colleagues change departments or responsibilities, add documents to a SharePoint library, or have an anniversary or birthday. In addition, you can choose who appears on your Colleagues list and organize your Colleagues list by groups.

 

The simplest way to measure effective bandwidth is to determine the rate at which your server sends and receives data. Network bandwidth availability is a function of the organization’s network infrastructure. Network capacity is a function of the network cards and interfaces configured on the servers.

 

 

Network Interface: Bytes Total/sec : To determine if your network connection is creating a bottleneck, compare the Network Interface: Bytes Total/sec counter to the total bandwidth of your network adapter card. To allow headroom for spikes in traffic, you should usually be using no more than 50 percent of capacity. If this number is very close to the capacity of the connection, and processor and memory use are moderate, then the connection may well be a problem.

Web Service: Maximum Connections and Web Service: Total Connection Attempts : If you are running other services on the computer that also use the network connection, you should monitor the Web Service: Maximum Connections and Web Service: Total Connection Attempts counters to see if your Web server can use as much of the connection as it needs. Remember to compare these numbers to memory and processor usage figures so that you can be sure that the connection is the problem, not one of the other components.

To determine the throughput and current activity on a server’s network cards, you can check the following counters:

· Network Interface\Bytes Received/sec
· Network Interface\Bytes Sent/sec
· Network Interface\Bytes Total/sec
· Network Interface\Current Bandwidth

If the total bytes per second value is more than 50 percent of the total capacity under average load conditions, your server might have problems under peak load conditions. You might want to ensure that operations that take a lot of network bandwidth, such as network backups, are performed on a separate interface card. Keep in mind that you should read these values in conjunction with PhysicalDisk\% Disk Time and Processor\% Processor Time. If the disk time and processor time values are low but the network values are very high, there might be a capacity problem. Solve the problem by optimizing the network card settings or by adding an additional network card. Remember, planning is everything: it isn’t always as simple as inserting a card and plugging it into the network.
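
The 50 percent rule of thumb can be worked out directly from two counter readings. The sketch below assumes the bandwidth figure is reported in bits per second (as the Current Bandwidth counter is) while the throughput counter is in bytes per second, so the bytes are converted to bits first. The function names and the default threshold are illustrative only.

```python
def network_utilization(bytes_total_per_sec: float, bandwidth_bits_per_sec: float) -> float:
    """Return the fraction of link capacity in use (0.5 means 50 percent)."""
    if bandwidth_bits_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    # Throughput is in bytes/sec and bandwidth in bits/sec, so multiply by 8.
    return bytes_total_per_sec * 8 / bandwidth_bits_per_sec


def has_headroom(bytes_total_per_sec: float,
                 bandwidth_bits_per_sec: float,
                 threshold: float = 0.5) -> bool:
    """True when utilization is at or below the chosen threshold."""
    return network_utilization(bytes_total_per_sec, bandwidth_bits_per_sec) <= threshold
```

For example, 6,250,000 bytes/sec on a 100 Mbit/sec adapter is exactly 50 percent utilization, which is right at the suggested limit.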

 

 
