Tuesday, September 1, 2015

Why most high-level languages are slow


Over the last couple of months I have been asked this question often, so I decided to answer it in this article.

There are usually two reasons why most high-level programming languages are slow:

they do not play well with the cache;
garbage collection is expensive.
In fact, both factors reduce to one: these languages encourage a large number of memory allocations.

It should be clarified that I am talking primarily about client applications, where local performance matters. If an application spends 99.9% of its time waiting for the network, it hardly matters how slow the language is; optimizing the networking is what counts.

I will take C# as my example, for two reasons: first, it is the high-level language I have used most often lately; and second, if I took Java, C# fans could argue that value types in C# eliminate most of these problems (they do not).

In addition, I will assume that the code is written idiomatically, in the same style as the standard libraries and accepted conventions. I will not cover ugly workarounds presented as "solutions." Yes, in some cases they let you get around the language, but they do not eliminate the underlying problem.

Working with the cache
To begin, let's look at why the cache matters. Below is a chart of memory latencies on a Haswell processor, based on this data.

[chart: memory access latency, L1 cache through main memory]

A read that goes all the way to main memory costs about 230 cycles, while a read from the L1 cache costs about 4. So poor use of the cache can make code almost 50 times slower. Moreover, thanks to modern superscalar, out-of-order processors, a read from L1 can overlap with register work, so L1 latency can often be ignored entirely.
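As a rough illustration (a sketch, not a rigorous benchmark; the array size and access pattern are my own arbitrary choices), summing an array in order streams through cache lines and prefetches well, while chasing a shuffled permutation through the same memory forces a miss on almost every step:

```csharp
using System;
using System.Diagnostics;

class CacheDemo
{
    static void Main()
    {
        const int n = 1 << 24;                 // 16M ints, far larger than any cache
        var next = new int[n];

        // Build one random cycle through the array so each load depends on the previous one.
        var order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        var rng = new Random(1);
        for (int i = n - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            int tmp = order[i]; order[i] = order[j]; order[j] = tmp;
        }
        for (int i = 0; i < n; i++) next[order[i]] = order[(i + 1) % n];

        long sum = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++) sum += next[i];            // sequential: prefetch-friendly
        Console.WriteLine("sequential: " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        int p = 0;
        for (int i = 0; i < n; i++) { sum += p; p = next[p]; } // pointer chasing: a miss per step
        Console.WriteLine("chased:     " + sw.ElapsedMilliseconds + " ms");
    }
}
```

On typical hardware the chased loop is many times slower than the sequential one, even though both touch exactly the same bytes.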

Simply put, aside from the choice of algorithm, cache misses are the main performance problem. Once you have solved the problem of efficient data access, only minor optimizations remain, and compared to cache misses they barely affect performance.

This is good news for language designers: you do not need to build the most efficient compiler in the world, and you can afford some abstraction overhead. All you need is to make efficient data access possible, and programs in your language will be able to compete with C in execution speed.

Why using C# leads to frequent cache misses
Simply put, C# was designed in a way that ignores how caches work on today's hardware. Once again, I am not talking only about hard limitations of the language, but about the "pressure" it puts on the programmer, nudging them away from writing the most efficient code. In theory many of these things have workarounds, but not comfortable ones. I am talking about the code the language encourages you to write.

The main problem with C# is that it supports value types very poorly. Yes, it has structs, which represent values stored wherever they are declared (for example, on the stack or inline inside other objects). But there are some pretty serious problems that make these structs more of a crutch than a solution.

You are obliged to fix a type's nature up front, which means that if you ever need an object of some type to live on the heap, every object of that type will always be allocated on the heap. Of course, you can write wrapper classes for the relevant value types and use the appropriate wrapper in each case, but that is a painful path. It would be better if classes and structs were always declared the same way and the choice were made at the point of use: when you need something on the stack, you declare it as a value; when you need it on the heap, as an object. That is how C++ works, for example. It does not force you to make everything an object just because a few things need to be on the heap (for various reasons).
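A minimal sketch of the point (the type names are mine): because the allocation decision is baked into the declaration, putting the same data on the heap in one place forces a second, parallel type.

```csharp
struct PointValue { public int X, Y; }  // always a value: on the stack or inline in its container
class  PointRef   { public int X, Y; }  // always a heap object, always behind a reference

class Demo
{
    PointValue inline;                     // stored inline inside Demo: no extra allocation
    PointRef   onHeap = new PointRef();    // separate heap allocation plus an 8-byte reference

    // In C++ one declaration serves both uses:
    //   Point p;                  // on the stack
    //   Point* q = new Point();   // on the heap
}
```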
References are very limited. You can pass anything by reference to a function, but that is it. You cannot just take a reference to an element of a List<int>, for example; for that, the list would have to store both references and values at the same time. You cannot get a pointer to something allocated on the stack or to a location inside an object. You can only copy everything out, or pass it to a function by reference. The reasons for this behavior are understandable, of course: if type safety is a priority, it is rather difficult (if not impossible) to support a flexible reference system and still guarantee that safety. But however reasonable these restrictions are, they are still there.
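For example (a sketch with a hypothetical Point struct): the indexer of List<T> returns a copy, so you cannot modify a struct element in place, and the compiler rejects the attempt outright.

```csharp
using System.Collections.Generic;

struct Point { public int X, Y; }

class Demo
{
    static void Main()
    {
        var points = new List<Point> { new Point() };

        // points[0].X = 5;        // error CS1612: the indexer returns a copy,
                                   // so this would only modify a temporary

        var p = points[0];         // the idiomatic workaround: copy the struct out...
        p.X = 5;                   // ...modify the copy...
        points[0] = p;             // ...and copy the whole struct back
    }
}
```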
Fixed-size buffers do not support user-defined types and require the unsafe keyword.
Subarray support is limited. There is the ArraySegment type, but it is rarely used. If you need to pass a range of array elements, you have to create an IEnumerable, which means allocating memory. Even when an API supports ArraySegment, that is still not enough: you cannot use it with a List<T> or with an array on the stack, only with a plain array.
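Concretely (a sketch; the segment sum is written out by hand because LINQ itself allocates): to hand elements 2..4 of an array to a typical IEnumerable-based API you end up allocating iterator objects, while ArraySegment<T> avoids that but only wraps a plain array.

```csharp
using System;
using System.Linq;

class Demo
{
    static void Main()
    {
        int[] data = { 1, 2, 3, 4, 5, 6 };

        // Idiomatic, but allocates iterator objects on the heap:
        int a = data.Skip(2).Take(3).Sum();

        // Allocation-free, but works only over a plain array, never a List<T>:
        var seg = new ArraySegment<int>(data, 2, 3);
        int b = 0;
        for (int i = seg.Offset; i < seg.Offset + seg.Count; i++)
            b += seg.Array[i];

        Console.WriteLine(a == b);  // both sum the same three elements
    }
}
```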
As a result, in all but the simplest cases, the language pushes you toward heavy use of the heap. If all your data lives on the heap, the probability of a miss is higher (because you cannot control how the data is arranged). While a C++ programmer has to think about how to lay out data for efficient access, C# encourages scattering it across arbitrary locations in memory. The programmer loses control over data placement, which leads to unnecessary misses and, as a result, a drop in performance. And the fact that you can compile a C# program to native code does not matter here: the gains do not compensate for poor cache locality.

In addition, space is used inefficiently. Each reference is 8 bytes on a 64-bit machine, and each allocation carries header metadata. A heap filled with tiny objects plus references to them holds far less useful data than one with large objects storing their contents inline at fixed offsets. Even if you do not mind the wasted space, a heap full of headers and references means all this garbage travels through the cache, which again leads to many misses and poor performance.
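Rough numbers for 64-bit .NET (the roughly 16-byte object header is the commonly cited figure, not something the runtime guarantees): an array of small class instances spends most of its bytes on bookkeeping, while an array of structs stores only payload.

```csharp
// A point as a class, referenced from an array:
//   per element: 8-byte reference in the array
//              + ~16-byte object header (sync block + method table pointer)
//              + 8 bytes of payload (two ints)
//   => roughly 32 bytes per point, scattered across the heap
class PointRef { public int X, Y; }

// The same point as a struct, stored inline:
//   per element: 8 bytes of payload, contiguous in memory
struct PointVal { public int X, Y; }

class Layout
{
    PointRef[] refs = new PointRef[1000];  // 1000 references + up to 1000 separate objects
    PointVal[] vals = new PointVal[1000];  // one allocation, 1000 * 8 bytes of data
}
```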

Of course, there are workarounds. For example, you can use structs and place them in a pool backed by a List<T>. This lets you, say, walk the list and update all the objects in place. Unfortunately, to refer to an object you now need a reference to the pool plus the element's index, and array indexing everywhere after that. Unlike in C++, in C# this trick looks terrible: the language simply was not designed for it. Moreover, accessing a pool member is now more expensive than following a plain reference: there can be two misses, since the pool itself must be dereferenced first. You can, of course, duplicate the functionality of List<T> in a struct to avoid that. I have written a lot of code in this style: it is ugly, low-level, and error-prone.
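A sketch of that workaround (all names here are mine): the "handle" to an object becomes a pool reference plus an index, and every field update goes through the copy-out/copy-back dance.

```csharp
using System.Collections.Generic;

struct Particle { public float X, Y, VelX, VelY; }

// A hypothetical handle: two words instead of one reference, and
// two potential cache misses (the pool object, then its backing array).
struct ParticleHandle
{
    public List<Particle> Pool;
    public int Index;
}

class Simulation
{
    readonly List<Particle> pool = new List<Particle>();

    public void Step(float dt)
    {
        for (int i = 0; i < pool.Count; i++)
        {
            var p = pool[i];      // copy out (List<T>'s indexer returns by value)
            p.X += p.VelX * dt;
            p.Y += p.VelY * dt;
            pool[i] = p;          // copy the whole struct back
        }
    }
}
```

The upside is that the particles sit contiguously in one backing array; the downside is that the language fights you at every step.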

Finally, I want to point out that this problem is not confined to hot spots. Idiomatically written C# has reference types everywhere, which means the entire codebase is constantly paying delays of a few hundred cycles that dwarf every other optimization. Even if you optimize the resource-intensive areas, the rest of the code will be uniformly slow. So unless you want to write programs built around memory pools, working at perhaps a lower level than C++ (and then why use C# at all?), there is little you can do about this problem.

Garbage collection
I assume the reader understands why garbage collection is, by itself, a serious performance problem. Random pauses for collection are totally unacceptable for programs with animation. I will not dwell on that, and instead move on to how the design of the language itself can make things worse.

Because of the language's limitations on value types, developers often have to use many small heap-allocated objects instead of one large data structure (which might even fit on the stack). And the more objects are created on the heap, the more work there is for the garbage collector.

There are certain benchmarks in which C# and Java beat C++, because allocation in a garbage-collected language is much cheaper. In practice, however, this rarely holds. Writing a C# program with as few allocations as even the most naive C++ implementation takes far more effort; such comparisons pit a highly optimized program with automatic memory management against a native program written in the most straightforward way. If we spend the same amount of time optimizing the C++ code, it leaves C# behind again.

I know it is possible to write a high-performance garbage collector (for example, an incremental one with a fixed time budget), but that is not enough. The biggest problem with high-level languages is that they encourage creating huge numbers of objects. If idiomatic C# produced as few allocations as an equivalent C program, garbage collection would be a much smaller problem. And if you used an incremental garbage collector, say for a soft real-time system, you would probably need a write barrier for it; and no matter how cheap that barrier is, in a language that encourages heavy use of references it means extra cost on every field update.

Look at the .NET standard library: memory allocation at every step. By my count, the .NET Core Framework contains 19 times as many public classes as structs, which means heavy allocation whenever you use it. Even the designers of the language could not resist the temptation! I have not gathered detailed statistics, but using the base class library you will quickly notice that all of it allocates freely; the code is written on the assumption that allocation is cheap. You cannot even print an int without creating an object! Imagine: even with a StringBuilder, you cannot simply append an int to it without the standard library allocating. This is silly.
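On the .NET Framework of that era, StringBuilder.Append(int) formats the value through Int32.ToString, allocating a temporary string. A hypothetical allocation-free append (my own sketch, restricted to non-negative values for brevity) shows what the library could do instead:

```csharp
using System.Text;

static class StringBuilderExt
{
    // Hypothetical helper: append a non-negative int digit by digit,
    // with no intermediate string allocation.
    public static void AppendIntNoAlloc(this StringBuilder sb, int value)
    {
        if (value == 0) { sb.Append('0'); return; }

        int divisor = 1;
        while (value / divisor >= 10) divisor *= 10;  // find the highest decimal place

        while (divisor > 0)
        {
            sb.Append((char)('0' + (value / divisor) % 10));
            divisor /= 10;
        }
    }
}
```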

This is found not only in the standard library. Even the Unity API (a game engine, where performance matters) is full of methods that return references to objects or arrays of reference types, or force the user to allocate just to call them. For example, by returning an array from GetComponents, the designers make you allocate an array just to see what components a GameObject has. Of course, there are alternative APIs, but if you follow the style of the language, unnecessary allocations cannot be avoided. The folks at Unity write "good" C#, and it has terrible performance.
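The two styles side by side (a sketch against the Unity API; the non-allocating GetComponents(List<T>) overload exists in later Unity versions, so treat its exact availability as an assumption):

```csharp
using System.Collections.Generic;
using UnityEngine;

class Example : MonoBehaviour
{
    // Reused buffer, so the per-frame call below allocates nothing.
    readonly List<Renderer> buffer = new List<Renderer>();

    void Update()
    {
        // Idiomatic, but allocates a fresh array on every call:
        Renderer[] rs = GetComponents<Renderer>();

        // Alternative overload: fills a caller-provided list, no allocation:
        GetComponents(buffer);
    }
}
```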

Conclusion
If you are designing a new language, please consider performance. No "sufficiently smart compiler" will provide it after the fact if you do not build the capability in from the start. Yes, it is hard to provide type safety without garbage collection. Yes, it is hard to collect garbage when the data is not uniform. Yes, it is hard to resolve questions of scope if a pointer can target any location in memory. Yes, there are many problems. But solving those problems is what designing a new language is about. Why make a new language if it is not meaningfully different from those created in the '60s?

Even if you cannot solve all the problems, you can try to deal with most of them. You can use region types, as in Rust, for safety. You can abandon the idea of "type safety at any cost" and accept more runtime checks (as long as they do not cause extra cache misses). Covariant arrays in C# are exactly such a case: in effect they bypass the type system, and the errors surface as exceptions at runtime.
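Concretely: array covariance lets the assignment below compile, the type error only fires at runtime, and every store into an array of reference types pays a runtime type check.

```csharp
using System;

class Demo
{
    static void Main()
    {
        string[] strings = new string[1];
        object[] objects = strings;     // legal: arrays are covariant

        try
        {
            objects[0] = 42;            // compiles, but the runtime store check fires
        }
        catch (ArrayTypeMismatchException)
        {
            Console.WriteLine("caught at runtime, not at compile time");
        }
    }
}
```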

To summarize: if you want to design an alternative to C++ for high-performance code, you need to think about data layout and locality.
