Tag Archive: perfmon



On Web Server / Application Server:

 

Processor\% Processor Time

Processor\% User Time

Processor\% Privileged Time

Processor\% Idle Time

 

These counters will give you % Processor Utilization for n number of concurrent users. Usually the ISV’s that I have worked with have multiple web servers with software network load balancing. They say that the avg CPU % is <50% but there might be times (peak times) where this might go up. If you want to see how many concurrent users your application supports for a particular hardware platform you can look @ this counter.

 

 

PhysicalDisk\%Disk Read Time

PhysicalDisk\%Disk Write Time

PhysicalDisk\Avg Disk queue Length

 

These counters to make sure there is not much logging etc that is happening either from your application (by mistake: Yes, I have seen this happening where when we were doing performance load testing, ISV’s code mistakenly logging when it should not). Also IIS logging (by default it is on). I was working with one ISV where in the peak times, their CPU% on web server was going beyond 95% but on an average for a whole day it was only ~25%. So when I looked in to the web server, IIS log (full log) was on by default. I recommended them to turn this off and we saw their peak time performance (response/sec) almost improved by 18-20%. So it is important to check this counter.

 

.NET CLR Memory\% Time in GC – This is the #1 counter to look at to see if GC is a possible issue in your application. If the %Time in GC is very low (< ~10%) then GC is not an issue. But incase if this counter is >25 or 30% then definitely this is an area that you would want to look into.

 

.NET CLR Memory\Gen”n”Collections (0, 1 and 2 all 3 counters) – If %Time in GC is high then this is the next set of counters that we want to look into. The good healthy ratio between gen2:gen1 is 1:10. But if this ratio is ~1:1 or 1:2 etc then looking @ allocation pattern and object survival would be next step to look into (Gen 1 promoted bytes might be another counter you want to look into). Why is this ratio? Since .NET GC is a generational one, when a gen “n” is collected then all gen “n-1” will also be collected. i.e when a gen2 collection happens it is gen1, gen0 + LOH. So if the ratio of Gen2:gen1 is 1:1 that means that all the collections (gen1) are because of gen2. So if Gen2:gen1:gen0 is 1:1:1 then all the collections are gen2 collections which is very bad as gen2 collection means looking into entire heap. So @ this time, looking @ allocation graph and object survival statistics would definitely help. The CLR Profiler tool from Microsoft® would be the best tool @ this point which will help you solve these issues. Another question some folks keep asking me “I am doing lot of gen0 collections / sec”. I think I have a serious GC problem. My answer to this question is, don’t even look @ collections and in fact doing lot of Gen0 collections is a good thing (as opposed to doing gen1 or gen2). There are lots of temporary objects getting created and you are running out of gen0 budget and so end up doing these collections. Always look @ % Time in GC as the first counter. If it is very high then start looking into the second level counters such as collections.

 

.NET CLR LocksAndThreads\Contention Rate/sec – This counter is basically used to see how much contention you have in your application. Lower the value better. This counter has to be looked in conjunction with CPU utilization. For ex: Let’s say you are increasing the concurrent users and you would expect the CPU utilization in your web/app server should go up and so is requests/sec. But you don’t see increase in CPU utilization and this counter keep going up every time you add more concurrent users with no increase in requests/sec. This definitely means that you have contention issues. One of more data sources are shared and as the # of concurrent users increase so are # of threads and thus increase contentions. At this point to really identify you can use SOS tool from Microsoft (I will be blogging in detail on using these SOS tool to identify contentions). Also looking @ another counter (System\ Context Switches/sec) will help. If this value is high there is high contention because of which CPU is switching threads very fast.

 

.NET CLR Exceptions\Exceps Thrown /sec: This is another important counter to look at. If this value is high then this can have performance implications. Note that some code paths such as Response.ReDirect always throw exception so careful consideration should be done. If this value is high, it is important to look @ log (assuming that application for exception has some kind of log in debug mode etc) or using a debugger to catch and see why and where these exceptions are thrown and minimizing or eliminating the unnecessary ones can help improve the performance.

 

ASP.NET Applications\Pipeline Instance Count: This is another important counter too look at. This counter tells the number of active request pipeline instances for a specified ASP.NET application. Since only one execution thread will run in one pipeline instance, this gives the maximum number of concurrent requests that are processed by the application. The lower the value is better. You would see that in the warm up phase of the application, this value fluctuates as the threadpool is optimizing the # of thread/s that is optimal for giving you the best performance and this should be fairly constant assuming constant load coming. If you see significant fluctuations in this counter then it is better to look @ determining the best possible value by tuning machine.config file.

 

ASP.NET Applications\Requests/sec: Self explanatory. Total # of requests executed per second

 

ASP.NET Applications\Requests in Application Queue: This counter should be very low. Zero would be ideal. If the value is high then it indicates that all these are in queue waiting to be processed. This you might see when you are trying to increase the # of concurrent users (in a typical performance load testing) to determine how many concurrent users your application can support and you give higher value. This might also happen in the peak periods when burst of requests come and so monitoring this counter is very useful to take further actions for good sustained performance of ASP.NET web applications.

 

ASP.NET Worker Process Restarts: This is to see if w3wp.exe was crashed or shutdown and the restart happened. This should be 0 in majority cases if you see >0 then it is recommended to debug more the causes by looking into event monitor to see if an AV happened.

 

Hope you have got good idea about counters to track (and why and what information they give) on web or application server. Now we can go into database layer. I am not really a data base person but can give some information on the SQL counters that we track or look into.

 

Database layer: Locks\Lock Requests/sec:

 

Locks\Lock wait time (ms):

Locks\Number of deadlocks/sec:

 

These are some of the first set of counters we use to see how much locking is being done in database. Lower the value of these counters the better. If you see very high CPU % for the database occasionally then it would be advised to use SQL Profiler tool to look at which query is taking longer time. I was working on performance load testing with an ISV and we saw that with just ~150-200 concurrent users the database utilization was shooting up to 80%. We started a SQL profiler and saw 1 particular query was taking a very long time. We went ahead modified that to give junk instead of actually going to the table and returning the value (this was basically some news items (top 3) which will be displayed on to the page). This alone reduced the db to <10%. Later lack of indexing and real non efficient query was cause of this. So even 1 stored procedure can create significant problems. As far as the locks are concerned, sp_who2 or sp_lock system stored procedures can give you lot of information on specific locks which are causing issues and you will be able to fix them.

 

That is it for perfmon. Hope this is detail enough information which will help while looking @ n-tier .NET application. Please let me know if you need any specific information or some topic that you want me to write and any comments you have on these blogs.

 


Microsoft’s Windows Performance Monitor application (PERFMON) makes use of monitoring the health of the servers. So that it is very much helpful in finding out the bottlenecks in the servers. 

 

The following basic counters will be very helpful to find the server health check up before starting the Performance testing.

 

Object: Web Service

Counter: Bytes Total/sec
Description: Bytes Total/sec is the sum of Bytes Sent/sec and Bytes Received/sec. This is the total rate of bytes transferred by the Web service.
Comment: Good indicator of the level of traffic (in bytes) that the site is dealing with. Make sure you pick the web site you are concerned with in the “Instance” list, or choose “_Total” is you want to check the level of traffic on the server as a whole.
Threshold: Very difficult to give a general threshold for this counter, as it is very dependant on factors like server type, network, amount of users, etc. The best thing to do

Object: Physical Disk
Counter: % Disk time
Description: % Disk Time is the percentage of elapsed time that the selected disk drive was busy servicing read or write requests.
Comment: Shows the level of disk activity, which should really be minimal for front-end web servers in a MOSS farm, as most data is stored in the database. Again, make sure you pick the drive you are concerned with in the “Instance” list, or choose “_Total” is you want to check the level of activity on all drives.
Threshold: If the server is simply a MOSS web server, then would expect the disk activity to be minimal. The servers in the current farm I am responsible for tend to average 0.5 on this counter.

Object: Processor
Counter: % Processor time
Description: % Processor Time is the percentage of elapsed time that the processor spends to execute a non-Idle thread. It is calculated by measuring the duration of the idle thread is active in the sample interval, and subtracting that time from interval duration. (Each processor has an idle thread that consumes cycles when no other threads are ready to run). This counter is the primary indicator of processor activity, and displays the average percentage of busy time observed during the sample interval. It is calculated by monitoring the time that the service is inactive, and subtracting that value from 100%.
Comment: Indicates how busy the processor is. You would usually only be interested in the “_Total” instance for multi-processor machines, as they (generally) share the workload equally between processors.
Threshold: As always, depends on the what the servers are being asked to do on your system, but to ensure adequate response times etc. I generally like to see this average beneath 50%.

Object: Memory
Counter: Available MBytes
Description: Available MBytes is the amount of physical memory, in Megabytes, immediately available for allocation to a process or for system use. It is equal to the sum of memory assigned to the standby (cached), free and zero page lists. For a full explanation of the memory manager, refer to MSDN and/or the System Performance and Troubleshooting Guide chapter in the Windows Server 2003 Resource Kit.
Comment: Shows the amount of memory left available to the system. Recommend that you determine a good scale for this counter if you want to display it on the same graph as the counters above (using a vertical limit of 100). I tend to use a scale of 0.1.
Threshold: Dependent on system. The servers in the current farm I am responsible for tend to average about 530MB available out of the 2Gig the server has. If this dropped beneath 300MB I would start being concerned.

 

Network

Object: Network Interface
Counters: Bytes Sent, Received, Total /sec

Object: IPv4
Counters: Datagrams Forwarded, Received, Sent, Total /sec

Object: TCPv4
Counters: Segments Received, Sent, Retransmitted, Total /sec

Object: Server
Counters: Pool Nonpaged Failures, Work Item Shortages

Object: Redirector
Counters: Server Sessions Hung

Physical Disk

Object: Physical Disk
Counters:
% Disk Time, Avg Disk sec/Transfer, Current Disk Queue Length

Processor

Object: Processor
Counters: % CPU Time, Interrupts/sec, Processor Queue Length

 

Memory

Object: Memory
Counters:
Available Mbytes, Page Faults/sec, Pages/sec, Pool Nonpaged Bytes