Whether an application is small, medium-sized, or a large enterprise system, good performance is one of its major non-functional requirements. In some cases, companies hire a dedicated team or outsource the work to improve an existing application’s performance.

Performance is a major consideration for every application designer, and many design patterns were developed with performance in mind.

The challenge for the designer or programmer is how to improve performance. Frankly, there is no single pattern for doing this; much of it depends on how we use memory.

Looking back from C through Java and .NET, language designers have concentrated more and more on memory management. I won’t go deeper into that here, as I want to focus on how we can improve performance with small design changes.

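As .NET programmers, we know that types are divided into value types and reference types. Value types (int, float, etc.) hold their data directly: a local variable of a value type typically lives on the stack, and a value-type field or array element is stored inline inside its container. For reference types (String, Object, etc.), the variable holds only a reference, and the data itself is stored on the heap.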
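The difference between the two kinds of types shows up most clearly when a variable is copied. The following is a minimal sketch; the type names PointValue and PointReference are hypothetical and exist only for this illustration.

```csharp
using System;

// Hypothetical value type: a variable holds the data itself.
public struct PointValue
{
    public int X;
}

// Hypothetical reference type: a variable holds a reference to heap data.
public class PointReference
{
    public int X;
}

public static class CopySemanticsDemo
{
    // Copies a struct, changes the copy, and returns the original's X.
    public static int OriginalAfterStructCopy()
    {
        PointValue a = new PointValue { X = 1 };
        PointValue b = a;   // copies the data
        b.X = 99;           // changes only the copy
        return a.X;
    }

    // Copies a reference, changes the object through it, and returns the original's X.
    public static int OriginalAfterReferenceCopy()
    {
        PointReference a = new PointReference { X = 1 };
        PointReference b = a;   // copies only the reference
        b.X = 99;               // changes the single shared object
        return a.X;
    }

    public static void Main()
    {
        Console.WriteLine(OriginalAfterStructCopy());    // prints 1
        Console.WriteLine(OriginalAfterReferenceCopy()); // prints 99
    }
}
```

Assigning a struct duplicates its data, while assigning a class instance only duplicates the reference to one shared object.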

From the processor’s point of view, a running application deals with many variables of different types; some hold their data directly and some are references. Every processor, from low-end to high-end, maintains its own caches to hold the data it uses frequently while executing our application.

Generally, if many applications are running in parallel, CPU time is shared among them depending on the configuration we have. For illustration, suppose a read from RAM costs the processor about 10 nanoseconds (real latencies vary by configuration and are often several times higher); by processor standards this read is very costly. If an application were given a 15-nanosecond slice of CPU time (a deliberately simplified figure, as real time slices are far longer), 10 nanoseconds would go to fetching data from RAM and only the remaining 5 nanoseconds to processing, and the application would then have to wait for its next turn.

To reduce the time spent reading data from RAM, every processor has dedicated caches at several levels. These caches are very fast: a read from the fastest level takes approximately 1 nanosecond (this varies from processor to processor), i.e. at least 10X faster than reading from dynamic RAM.
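To get a feel for what these numbers mean, here is a rough sketch of a simplified, single-level model: a fraction of reads is served by the cache and the rest go to RAM. The 1 ns and 10 ns latencies are the illustrative figures used in this article; the hit rates and the names CacheModel and AverageAccessNs are assumptions made for the example.

```csharp
using System;

public static class CacheModel
{
    // Simplified model (an assumption, not a measurement):
    // average = hitRate * cache latency + (1 - hitRate) * RAM latency.
    public static double AverageAccessNs(double hitRate, double cacheNs, double ramNs)
    {
        return hitRate * cacheNs + (1.0 - hitRate) * ramNs;
    }

    public static void Main()
    {
        // Assumed hit rates, chosen only to show the trend.
        foreach (double hitRate in new[] { 0.50, 0.90, 0.99 })
        {
            double average = AverageAccessNs(hitRate, 1.0, 10.0);
            Console.WriteLine("Hit rate " + hitRate.ToString("F2") + ": about " + average.ToString("F2") + " ns per read");
        }
    }
}
```

In this model, the more reads the cache can serve, the closer the average cost gets to the 1 ns cache figure rather than the 10 ns RAM figure.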

Most modern processors contain at least three levels of cache:

Level 1 Cache (L1): This is the primary cache and is usually accessed in a few cycles. It is the fastest of the cache levels and is built into each processor core. It is small, typically tens of kilobytes per core (the exact size varies by processor). Like the other cache levels, it uses high-speed SRAM (static RAM) instead of the slower and cheaper DRAM (dynamic RAM) used for main memory.

Level 2 Cache (L2): This is bigger than L1, typically a few hundred kilobytes to a few megabytes. Access is a little slower than L1. It sits between L1 and L3.

Level 3 Cache (L3): This is bigger than L2, typically several megabytes or more, and access is a little slower than L2. It sits between L2 and main memory. On older systems it could be found on the motherboard; on modern processors it is on the chip itself and is usually shared by all cores.

Depending on the processor, there may be further levels as well.

While the application runs, the processor pulls the data it touches into its cache, one small block (a cache line) at a time, so that nearby data can be read quickly and most of the CPU time goes to actual processing. When data is stored inline, as with value types in an array, the values sit next to each other in memory, so loading one element brings its neighbors into the cache too, and the processor rarely has to go back to main memory. With reference types, the array holds only the addresses of objects that live elsewhere on the heap. Every time, the processor must first read the address and then read or write the data at that address, which may be far from the previous object and not yet in the cache. If the address itself is not in the cache either, fetching it costs even more time. Here, for a reference type, we are not fully using the power of the processor cache.

The challenge for every developer or designer is to make the most of these high-speed caches: increasing the share of CPU time spent on processing and reducing how often the processor has to access main memory.
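The layout difference can be observed directly from C#. In the sketch below (the types SalaryValue and SalaryReference are hypothetical), a freshly allocated array of structs already contains every element inline, while a freshly allocated array of class instances contains only empty reference slots.

```csharp
using System;

public struct SalaryValue { public decimal Salary; }      // hypothetical value type
public class SalaryReference { public decimal Salary; }   // hypothetical reference type

public static class ArrayLayoutDemo
{
    // A struct array holds its elements inline, so they exist
    // (with default values) as soon as the array is allocated.
    public static bool StructSlotsHoldData(int n)
    {
        SalaryValue[] values = new SalaryValue[n];
        values[0].Salary += 2000;   // writes straight into the array's own storage
        return values[0].Salary == 2000m;
    }

    // A class array holds only references; until objects are created
    // separately on the heap, every slot is null.
    public static bool ReferenceSlotsStartNull(int n)
    {
        SalaryReference[] references = new SalaryReference[n];
        return references[0] == null;
    }

    public static void Main()
    {
        Console.WriteLine(StructSlotsHoldData(3));      // prints True
        Console.WriteLine(ReferenceSlotsStartNull(3));  // prints True
    }
}
```

With the struct array, a loop over the elements walks through one contiguous block of memory; with the class array, each element is an extra hop to wherever that object happens to live.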

In C#, we have a type called struct. It is similar to a class, but a struct is a value type: a variable of a struct type holds its data directly, so an array of structs stores its elements side by side in memory, a layout the processor cache handles well.

Note: If you are new to structs, the following link gives a good introduction: https://msdn.microsoft.com/en-us/library/aa288471(VS.71).aspx

The following code gives a better understanding of how a struct type can improve performance compared with a class type.

using System;
using System.Diagnostics;

// Reference type: an array of these holds references to separate heap objects.
public class ClassEmployee
{
    public string FirstName { get; set; } = string.Empty;
    public string LastName { get; set; } = string.Empty;
    public decimal Salary { get; set; } = 0;
}

// Value type: an array of these stores each element's fields inline
// (the strings themselves remain references).
public struct StructEmployee
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public decimal Salary { get; set; }
}

public class PerfTest
{
    public void StartPerfTest(int countOfEmployees)
    {
        Stopwatch stopWatch = Stopwatch.StartNew();

        // Class: create the objects
        ClassEmployee[] employeesAsClasses = new ClassEmployee[countOfEmployees];

        for (int i = 0; i < countOfEmployees; i++)
        {
            employeesAsClasses[i] = new ClassEmployee() { FirstName = "EmoFName " + i, LastName = "EmoLName " + i, Salary = 1000 * i };
        }

        // Class: update every salary (read and write through each reference)
        for (int i = 0; i < countOfEmployees; i++)
        {
            employeesAsClasses[i].Salary += 2000;
        }

        long classTime = stopWatch.ElapsedMilliseconds;

        stopWatch.Restart();

        // Struct: create the values
        StructEmployee[] employeesAsStructs = new StructEmployee[countOfEmployees];
        for (int i = 0; i < countOfEmployees; i++)
        {
            employeesAsStructs[i] = new StructEmployee() { FirstName = "EmoFName " + i, LastName = "EmoLName " + i, Salary = 1000 * i };
        }

        // Struct: update every salary (read and write in place in the array)
        for (int i = 0; i < countOfEmployees; i++)
        {
            employeesAsStructs[i].Salary += 2000;
        }

        long structTime = stopWatch.ElapsedMilliseconds;

        Console.WriteLine("Time taken to create and update " + countOfEmployees + " objects by\nClass: " + classTime + "ms\nStruct: " + structTime + "ms\nDifference: ~" + (classTime - structTime) + "ms");
    }
}

In the code above, we create 100K class objects and 100K struct objects and then update each salary by adding to it, in order to test the CPU’s read and write performance. The following are the results:

Clearly, there is a huge performance difference here. However, structs have some limitations:

  • If a struct declares many fields, its size becomes large and copying it gets expensive, which can lead to performance problems of its own. A class does not have this issue in the same way, since objects on the heap are passed around by reference (and very large objects are handled differently there).
  • If you pass struct variables across classes or layers many times, performance can suffer, because each pass copies the complete data rather than just an address.
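One way to soften the second limitation is to pass a large struct by reference instead of by value. The sketch below assumes C# 7.2 or later for the in modifier; LargeRecord and the method names are hypothetical and serve only as an illustration.

```csharp
using System;

// Hypothetical struct with four decimal fields (16 bytes each).
public struct LargeRecord
{
    public decimal A, B, C, D;
}

public static class PassingDemo
{
    // By value: the whole struct is copied for the call.
    public static decimal SumByValue(LargeRecord record)
    {
        return record.A + record.B + record.C + record.D;
    }

    // By read-only reference: only an address is passed, and the method cannot modify the caller's data.
    public static decimal SumByReference(in LargeRecord record)
    {
        return record.A + record.B + record.C + record.D;
    }

    // By writable reference: the method updates the caller's struct in place.
    public static void RaiseA(ref LargeRecord record, decimal amount)
    {
        record.A += amount;
    }

    public static void Main()
    {
        LargeRecord record = new LargeRecord { A = 1, B = 2, C = 3, D = 4 };
        Console.WriteLine(SumByValue(record));        // prints 10
        Console.WriteLine(SumByReference(in record)); // prints 10
        RaiseA(ref record, 5);
        Console.WriteLine(record.A);                  // prints 6
    }
}
```

Whether this actually helps depends on the struct's size and how often it is passed, so it is worth measuring before and after.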

If you can live with the above limitations, structs can give you a substantial performance gain, especially when looping over a huge list of objects as in the example program above.

I hope this article has given you a good bit of information about how the processor handles our application while it runs, and how we can design our application to be processor-friendly.

Thank You and happy coding 🙂