Arraylist — ArrayList VS LinkedList


ArrayList Vs LinkedList

  • get is O (n)
  • add is O (1)
  • remove — O (n)
  • Iterator.remove — O (1)
  • get is O (1)
  • add — O (1) амортизируется, но O (n) наихудший, так как массив должен быть изменен и скопирован
  • remove — O (n)

Итак, посмотрев на это, я пришел к выводу, что если мне нужно сделать только последовательную вставку в моей коллекции за 5000000 элементов, LinkedList будет outclass ArrayList .

И если мне нужно просто извлечь элементы из коллекции, итерации, т.е. не захватив элемент посередине, все равно LinkedList будет превзойти `ArrayList.

Теперь, чтобы проверить мои предыдущие два утверждения, я написал ниже пример программы. Но я удивлен, что мои предыдущие утверждения оказались неверными.

ArrayList outclassed LinkedList в обоих случаях. Для добавления, а также получения их из коллекции потребовалось меньше времени LinkedList . Есть ли что-то, что я делаю неправильно, или исходные утверждения о LinkedList и ArrayList недействительны для коллекций размером 5000000?

Я упомянул размер, потому что, если я уменьшу количество элементов до 50000, LinkedList будет работать лучше, и исходные операторы верны.

Помните, что сложность большого вывода описывает асимптотическое поведение и может не отражать фактическую скорость реализации. В нем описывается, как стоимость каждой операции растет с размером списка, а не скоростью каждой операции. Например, следующая реализация add является O (1), но не быстрой:

Я подозреваю, что в вашем случае ArrayList работает хорошо, потому что он увеличивает его внутренний размер буфера довольно агрессивно, поэтому не будет большого количества перераспределений. Когда размер буфера не требуется, ArrayList будет иметь более быструю add s.

Вам также нужно быть очень осторожным, когда вы делаете такой профиль. Я бы предложил вам изменить свой код профилирования на фазу прогрева (так что JIT имеет возможность сделать некоторую оптимизацию, не влияя на ваши результаты) и усреднить результаты за несколько прогонов.

(Обратите внимание, что sum может переполняться, и вам может быть лучше использовать System.currentTimeMillis() ).

Также возможно, что компилятор оптимизирует ваши пустые циклы get . Убедитесь, что цикл на самом деле что-то делает, чтобы обеспечить получение нужного кода.

Это плохой показатель ИМО.

  • нужно повторить в цикле несколько раз, чтобы разогреть jvm
  • нужно что-то делать в своем итеративном цикле или оптимизировать массив
  • ArrayList изменяет размеры, что дорого. Если вы построили ArrayList как new ArrayList(500000) , вы могли бы построить за один удар, а затем все распределения были бы довольно дешевыми (один предварительно выделенный массив)
  • Вы не указываете свою JVM памяти — она ​​должна быть запущена с -xMs == -Xmx (все предварительно выделены) и достаточно высока, чтобы не запускалось GC,
  • Этот тест не охватывает самый неприятный аспект LinkedList — произвольный доступ. (итератор не обязательно одно и то же). Если вы подаете, скажем, 10% от размера большой коллекции в качестве случайного выбора list.get , вы увидите, что связанные списки ужасны для захвата всего, кроме первого или последнего элемента.

Для arraylist: jdk get — это то, что вы ожидаете:

(в основном просто возвращаем элемент с индексированным массивом.,

Для связанного списка:

выглядит похожим? Не совсем. entry — это метод, а не примитивный массив, и посмотрите, что он должен делать:

Правильно, если вы попросите сказать list.get(250000) , он должен начинаться во главе и многократно повторять следующий элемент. 250000 доступа или около того (есть оптимизация в коде, где он начинается с головы или хвоста, в зависимости от которых будет меньше обращений.)

ArrayList — это более простая структура данных, чем LinkedList. ArrayList имеет один массив указателей в смежных ячейках памяти. Его нужно только воссоздать, если массив расширен за пределы выделенного размера.

A LinkedList состоит из цепочки узлов; каждый node разделяется выделенным и имеет передние и задние указатели на другие узлы.

Так что это значит? Если вам не нужно вставлять середину, сращивать, удалять в середине и т.д., ArrayList обычно будет быстрее. Он нуждается в меньшем распределении памяти, имеет гораздо лучшую локальность ссылки (что важно для кэширования процессора) и т.д.

Чтобы понять, почему полученные вами результаты не противоречат характеристике «большого О». Нам нужно вернуться к первым принципам; т.е. определение.

Пусть f (x) и g (x) — две функции, определенные на некотором подмножестве вещественных чисел. Один пишет

тогда и только тогда, когда при достаточно больших значениях x f (x) не превосходит константу, умноженную на g (x) по абсолютной величине. То есть f (x) = O (g (x)) тогда и только тогда, когда существует положительное вещественное число M и вещественное число x0 такое, что

Во многих контекстах предположение о том, что нас интересует скорость роста, когда переменная x переходит в бесконечность, остается неустановленной, и проще записать, что f (x) = O (g (x)).

Таким образом, утверждение add1 is O(1) означает, что временная стоимость операции add1 в списке размера N стремится к константе C add1, когда N стремится к бесконечности.


И утверждение add2 is O(1) amortized over N operations означает, что средняя временная стоимость одной из последовательности операций N add2 стремится к константе C add2, когда N стремится к бесконечности.

Что не говорит, что это константы C add1 и C add2. Фактически причина, по которой LinkedList медленнее, чем ArrayList в вашем тесте, заключается в том, что C add1 больше, чем C add2.

Урок состоит в том, что большая нотация O не предсказывает абсолютную или даже относительную производительность. Все, что он предсказывает, это форма функции производительности, так как управляющая переменная становится очень большой. Это полезно знать, но это не говорит вам все, что вам нужно знать.

Arraylist — ArrayList VS LinkedList

O(n^2) ��� arraylist
O(n^3) ��� linkedlist

O(n) arraylist.
O(n^2) linkedlist.

Queue � ���� ������ ������ ������. 14 ��� 13, 14:32����[15128556] �������� | ���������� �������� ����������

Re: ������� ��� ������������� : ArrayList vs LinkedList [new]

At first glance, it seems like a fairly simple operation, something that could run in constant time (O(1)), but it isn’t. Consider an array list with 5 elements:

If we want to delete the element in the index 4, the last one, we just have to remove it and leave the cell empty. That’s all. But what happens if we want to remove the element in the index 0, the first one. We will end up with ‘bubble’, an empty cell before the occupied cells. To avoid this, the array list will shift all the elements after the deleted element:

So, for this operation, the execution time will depend on the number of elements, because the more elements you have in the array, the more time it will take to shift them. That means that the algorithmic complexity for the deletion is O(n), which is not good at all.

Indexing

In this case, it is easy to see that the algorithmic complexity of this operation is O(1) for an array list. That means that it will take the same time to get an element by its index whether we have a hundred elements or a million.

Operation Algorithmic complexity
Inserting (at the last position) O(1)*
Deleting O(n)
Indexing O(1)

*amortized constant time

When we are working with lists, sometimes we know the exact size that our list is going to be, and sometimes we also know that the size will never change.

In such cases, we can do a little trick which consists of instantiating the array list with an initial capacity. This way we will avoid the case when you insert an element and you have to allocate more space.

LinkedList

Java implements this class as a doubly linked list. This means that every element in the list have a reference to the next and the previous elements in the list. A graphical representation of this would look like this:

In this case, every element in the list will use more memory than in an array list as it has to hold two pointers, but it can be really useful for some cases.

Inserting

Inserting an element in a linked list means that we have to add a pointer in the previous last element of the list and another in the new element (to the previous last element).

Luckily the LinkedList also holds a couple of references to the first and the last element, so the operation only needs a few assignations, disregarding the number of elements in the list. This means that it runs in constant time so its algorithmic complexity is O(1), which, as you already know, is really good.

Deleting

In order to delete an element from a linked list the first thing we have to do is to find the element, so we will have to traverse the list until we find it. We have to do that even if we already know the element’s position inside the list. If we want to remove the first or last element it will be much easier, as we have references to both elements, so the list provides special methods for deleting last and first element. But we are considering the general case in which you want to remove an element in a random position.

Once we’ve found the element in the list, it is fairly simple to remove it, we just have to make some assignation, which can run on constant time. Although, as you may guess, the algorithmic complexity for this operation will be O(n), the more elements we have in the list, the more it will take to delete one.

Indexing

As we have already seen on the deleting operation, finding an element by its index means that we have to traverse the list until we find it. So the operation of indexing is pretty similar to the deleting one. Thus its algorithmic complexity is also O(n).

Conclusion

Operation ArrayList LinkedList
Inserting (at the last position) O(1)* O(1)
Deleting O(n) O(n)
Indexing O(1) O(n)

*amortized constant time

In the table above we have the summary of the complexities of these operations for the array list and the linked list.

Next time you need to use a List in your code you can take these in account to choose between the two implementations. If you need a list for inserting elements at the last position and for getting them by its index then you should probably choose the ArrayList.

On the other hand, if you are going to perform more complex operations like inserting at the middle of the list or deleting from the head of the list (which we haven’t covered in this article) you probably want to use a LinkedList.

ArrayList vs LinkedList

Posted by: Attila Mihaly Balazs in Core Java December 30th, 2013 4 Comments Views

I must confess the title of this post is a little bit catchy. I have recently read this blog post and this is a good summary of discussions & debates about this subject.

But this time I would like to try a different approach to compare those 2 well known data structures: using Hardware Performance Counters.


I will not perform a micro-benchmark, well not directly. I will not time using System.nanoTime(), but rather using HPCs like cache hits/misses.

No need to present those data structures, everybody knows what they are using for and how they are implemented. I am focusing my study on list iteration because, beside adding an element, this is the most common task for a list. And also because the memory access pattern for a list is a good example of CPU cache interaction.

Here my code for measuring list iteration for LinkedList & ArrayList:

To measure, I am using a class called HWCounters based on overseer library to get Hardware Performance Counters. You can find the code of this class here.

The program take 2 parameters: the first one to choose between ArrayList implementation or LinkedList one, the second one to take a buffer size used in initializeList method. This method fills a list implementation with 50K strings. Each string is newly created just before add it to the list. We may also allocate a buffer based on our second parameter of the program. if 0, no buffer is allocated.
bench method performs a search of a constant string which is not contained into the list, so we fully traverse the list.

Finally, main method, perform initialization of the list, warmups the bench method and measure 1000 runs of this method. Then, we print results from HPCs.

Let’s run our program with no buffer allocation on Linux with 2 Xeon X5680:

First run is on the ArrayList implementation, second with LinkedList.

  • Number of cycles is the number of CPU cycle spent on executing our code. Clearly LinkedList spent much more cycles than ArrayList.
  • Instructions is little higher for LinkedList. But it is not so significant here.
  • For L2 cache accesses we have a clear difference: ArrayList has significant more L2 misses compared to LinkedList.
  • Mechanically, LLC hits are very important for ArrayList.

The conclusion on this comparison is that most of the data accessed during list iteration is located into L2 for LinkedList but into L3 for ArrayList.

My explanation for this is that strings added to the list are created right before. For LinkedList it means that it is local the Node entry that is created when adding the element. We have more locality with the node.

But let’s re-run the comparison with intermediary buffer allocated for each new String added.

Here the results are quite different:

  • Cycles are 10 times more important.
  • Instructions stay the same as previously
  • For cache accesses, ArrayList have more L2 misses/LLC hits, than previous run, but still in the same magnitude order. LinkedList on the contrary have a lot more L2 misses/LLC hits, but moreover a significant number of LLC misses/DRAM accesses. And the difference is here.

With the intermediary buffer, we are pushing away entries and strings, which generate more cache misses and the end also DRAM accesses which is much more slower than hitting caches.

ArrayList is more predictable here since we keep locality of element from each other.

The memory access pattern here is crucial for list iteration performance. ArrayList is more stable than LinkedList in the way that whatever you are doing between each element adding, you are keeping your data much more local than the LinkedList.

Remember also that, iterating through an array is much more efficient for CPU since it can trigger Hardware Prefetching because access pattern is very predictable.

Difference between ArrayList and LinkedList

ArrayList and LinkedList both implements List interface and maintains insertion order. Both are non synchronized classes.

However, there are many differences between ArrayList and LinkedList classes that are given below.

ArrayList LinkedList
1) ArrayList internally uses a dynamic array to store the elements. LinkedList internally uses a doubly linked list to store the elements.
2) Manipulation with ArrayList is slow because it internally uses an array. If any element is removed from the array, all the bits are shifted in memory. Manipulation with LinkedList is faster than ArrayList because it uses a doubly linked list, so no bit shifting is required in memory.
3) An ArrayList class can act as a list only because it implements List only. LinkedList class can act as a list and queue both because it implements List and Deque interfaces.
4) ArrayList is better for storing and accessing data. LinkedList is better for manipulating data.

Example of ArrayList and LinkedList in Java

Let’s see a simple example where we are using ArrayList and LinkedList both.

When to use ArrayList vs LinkedList in Java


When to use ArrayList vs LinkedList in Java

Before comparing differences of ArrayList and LinkedList, let’s see What is common between ArrayList and LinkedList in Java :

1) Both ArrayList and LinkedList are an implementation of List interface, which means you can pass either ArrayList or LinkedList if a method accepts the java.util.List interface.


2) Both ArrayList and LinkedList are not synchronized, which means you can not share them between multiple threads without external synchronization. See here to know How to make ArrayList synchronized in Java.

3) ArrayList and LinkedList are ordered collection e.g. they maintain insertion order of elements i.e. the first element will be added to the first position.

4) ArrayList and LinkedList also allow duplicates and null, unlike any other List implementation e.g. Vector.

5) Iterator of both LinkedList and ArrayList are fail-fast which means they will throw ConcurrentModificationException if a collection is modified structurally once Iterator is created. They are different than CopyOnWriteArrayList whose Iterator is fail-safe.

Difference between LinkedList and ArrayList in Java

Now let’s see some difference between ArrayList and LinkedList and when to use ArrayList and LinkedList in Java.


1) Underlying Data Structure

The first difference between ArrayList and LinkedList comes with the fact that ArrayList is backed by Array while LinkedList is backed by LinkedList. This will lead further differences in performance.

2) LinkedList implements Deque
Another difference between ArrayList and LinkedList is that apart from the List interface, LinkedList also implements Deque interface, which prov >add() and poll() and several other Deque functions.

Also, LinkedList is implemented as a doubly linked list and for index-based operation, navigation can happen from either end (see Complete Java MasterClass).

3) Adding elements in ArrayList
Adding element in ArrayList is O(1) operation if it doesn’t trigger re-size of Array, in which case it becomes O(log(n)) , On the other hand appending an element in LinkedList is O(1) operation, as it doesn’t require any navigation.

4) Removing element from a position
In order to remove an element from a particular index e.g. by calling remove(index), ArrayList performs a copy operation which makes it close to O(n) while LinkedList needs to traverse to that point which also makes it O(n/2), as it can traverse from either direction based upon proximity.


5) Iterating over ArrayList or LinkedList

Iteration is the O(n) operation for both LinkedList and ArrayList where n is a number of an element.

6) Retrieving element from a position
The get(index) operation is O(1) in ArrayList while its O(n/2) in LinkedList, as it needs to traverse till that entry. Though, in Big O notation O(n/2) is just O(n) because we ignore constants there.

7) Memory
LinkedList uses a wrapper object, Entry, which is a static nested class for storing data and two nodes next and previous while ArrayList just stores data in Array.

If Array is large enough it may take a lot of memory at that point and trigger Garbage collection, which can slow response time.

From all the above differences between ArrayList vs LinkedList, It looks ArrayList is the better choice than LinkedList in almost all cases, except when you do a frequent add() operation than remove(), or get() .

In my opinion, use ArrayList over LinkedList for most of the practical purpose in Java.

Lists in Java (ArrayList vs LinkedList) Tutorial

List is an interface for an ordered collection of elements in Java.

You can easily add, remove and get elements by index.

2 main things that you should keep in mind using Lists in Java:

  • Lists guarantee an order of elements. That means if you will add 1, 2, 3 integers to the list, you can access it by 0, 1 and 2 indexes.
  • Lists allow duplicates

There are 2 the most popular implementations of Java List: ArrayList and LinkedList .

Let’s take a look deeper at this 2 classes.

I’ll explain how to initialize list in Java, syntax, difference between ArrayList and LinkedList and of course I’ll mention few popular interview questions.

Table of Contents

Java List Syntax

The list syntax should not be a problem for any person.

For example, we want to make an array list of string:


or initialize a linked list of integers:

Note that currently (until Java 9) you cannot use primitive types as a generic type, you have to use corresponding primitive > Integer , Long etc.

But JEP 218 promises to fix this situation and maybe in the future we’ll get this feature.

You can add elements calling add ( ) method:

You can remove elements by index as well:

strings . remove ( 1 ) ;

If an index is not val > IndexOutOfBoundException will be thrown.

You can access elements by index:

String explain = strings . get ( 0 ) ;

Or you can iterate through the list using a for-each loop.

Let’s take a deeper look at list implementations.

ArrayList

Java ArrayList is built using a simple array.

Inside of array list, you can find a field:

Default capacity is 10 elements.

You can specify initial capacity via constructor as well.

Internal array size is always greater than list size because we need a gap to add elements fast.

If required capacity less than current element capacity – array will grow on 50%.

For example, current capacity is 10, we want to add one more element.

Required capacity is 11, that is more than current, so elementData array will grow to 15 elements.

Previous 10 elements will be copied using Arrays . copy ( ) and new element will be added to the 11th position.

But when you remove an element capacity is not changed, it’s important to know.

One more thing that you should keep in mind that ArrayList is not a thread-safe collection.

If you need concurrency take a look at CopyOnWriteArrayList and Collections . synchronizedList ( ) .

Time Complexity

First of all, read what is algorithm time complexity on the Wiki if you don’t know what is it.

Now depends on your use case you should understand what data structure is better.

You should think about the most frequent operations, e.g. you need to add elements fast or you need to read elements by index.

We’re going to estimate the following use-cases:

  • add/remove an element to/from the beginning
  • add/remove an element to/from the end
  • add/remove an element to/from the middle
  • read element by index
  • contains an element
  • get list size


Let’s take a look at time complexity table:

Use-case Time Complexity Description
add/remove an element to/from the beginning O(n) You need to copy each element to the i + 1 position and then you can write a new element to 0 index
add/remove an element to/from the end O(1) At the end of an array you have a gap, so it costs nothing to write a new element to the end
add/remove an element to/from the middle O(n/2) You need to copy 2nd half of element to i + 1 position
read element by index O(1) Get element by index in the array takes constant time
contains an element O(n) You need to iterate through the whole array and check if array[i] element equals to target object, worst case is an object is a last one in the array
get list size O(1) ArrayList stores a private field size and increments it every time when its growing, so it’s constant time to read this variable

If we analyze this table we can say following:

  • It’s really fast to add element to the end, read element by index and check array list size
  • It could take time to check if object exists or not
  • It could take time to add an element to the beginning or to the m > Arrays . copy ( ) works really fast but if this is a frequent operation you should think twice before choosing an ArrayList

LinkedList

LinkedList is one more List implementation that consists of Nodes that contains an object value and references to the next and previous nodes.

It’s double-linked – that means LinkedList contains references to the first and last nodes.

elizarov

Блог Романа Елизарова

«Память это новый диск, диск это новая лента». Такой мотив можно часто прочитать в блогах, посвященных написанию высокопроизводительных систем. Действительно, память становится всё дешевле и дешевле, и во многих областях становится не только возможным, но и экономически выгодным, хранить все данные системы в памяти, рассматривая диск лишь как способ их архивирования. Однако, есть и другой аспект этого высказывания. Разрыв в производительности между кэшом процессора и оперативной памятью настолько велик, что к памяти надо относится как к диску при написании любого высокопроизводительного кода.

Классическое образование в области теории алгоритмов уделяет много времени асимптотической сложности алгоритмов, но не уделяет достаточно времени практическим аспектам производительности. Например, первая структура данных рассмотренная в Главе 2 классического труда Дональда Кнута «Искусство Программирования» это линейный список. В языке Java он реализован в классе LinkedList. Альтернативой этой структуре данных является список на массиве, реализованный в классе ArrayList.

Если провести сравнительный анализ асимптотической амортизированной сложности различных базовых операций над списком, то получим следующую таблицу:

Операции LinkedList ArrayList
Вставка или удаление элемента в конце, последовательная итерация по одному элементу O(1) O(1)
Вставка или удаление элемента в начале или в середине O(1) O(N)
Получение элемента по индексу (номеру) O(N) O(1)

Из этой таблицы видно, что есть специфические операции, на которых LinkedList или ArrayList оптимальны по асимптотической сложности, в то время как для самого простого случая добавления элементов в конец и итерации по ним, разницы в асимптотической сложности нет. Какой же список использовать по умолчанию? Кнут не дает ответа на этот вопрос, более того, начиная рассмотрение со списка он как бы намекает.

На самом деле, идеология «память это новый диск» дает нам другую установку по умолчанию: при прочих равных, структура данных которая занимает меньше памяти будет и быстрей работать. Более того, память, как и диск, работает блоками, и ключевым моментом анализа производительности становится количество блоков памяти, которые алгоритм «трогает» во время работы, а также шаблон этого доступа (последовательный или случайным).

Давайте замерим скорость работы простейшего цикла итерации по N элементам в списке случайных целых чисел:

Полный код теста выложен здесь. На моем ноутбуке (Core i5, 520M, 2.4 GHz) при запуске под Java 1.7.0_01 используя HotSpot Server VM (опция «-server») получаются следующие времена итерации в расчете на один элемент, в наносекундах:

Размер LinkedList ArrayList
1 000 3.43 1.94
10 000 3.71 1.95
100 000 6.70 2.22
1 000 000 7.34 3.43

Видно, что итерация по LinkedList почти в два раза медленней, чем по ArrayList. Как объяснить эти результаты? Давайте посчитаем, сколько памяти уходит на каждый элемент в этих списках. Объект класса Integer на 32-битной JVM занимает 16 байт с учетом заголовка объекта в 8 байт и выравнивания. Узел в LinkedList занимает 24 байта, а указатель на объект в ArrayList занимает всего 4 байта. Таким образом, общее потребление памяти на объект в LinkedList это 16+24=40 байт, а в ArrayList это 16+4=20 байт, то есть ровно в два раза меньше, что и объясняет основную разницу в скорости работы. Вся остальная вариация в деталях. Например, при размере в 100 тысяч элементов LinkedList на моей машине работает в 3 раза медленней чем ArrayList. Общий объем памяти, необходимый для хранение этих 100 тысяч элементов в массиве около 2 МБ, а в связанном списке 4 МБ. Первый еще влезает в 3 МБ кэш моего процессора, а второй уже нет.

ArrayList vs LinkedList

ArrayList vs LinkedList

There comes two classes ArrayList and LinkedList to store objects. Internally, ArrayList stores elements as an array form and LinkedList stores elements in node form. Due to their internal style of storing the elements, they come with different performance heads depending the nature of action performed like addition and retrieval.

Programs are available at ArrayList and LinkedList.

A small Clock program developed (used in subsequent programs) to know the performance:

The above screen shows different results when executed different times. It is because a number processes may be running right now on your system, some of which may be in pause, working or stopped etc. Also it depends on the availability of JVM free time. The above Clock program is used in every program to know the performance.

The above clock program is used in finding the performance of ArrayList and LinkedList in the following programs.

In the next program elements are inserted at 0 position always and actual 0 position elements are pushed down.

Output screen of ArrayListLinkedList.java


Inserting elements at the beginning of an ArrayList requires that all existing elements be pushed down. This is a very costly overhead. But inserting at the beginning of LinkedList is cheap, because the elements of the structure are connected with each other via linked nodes. when an element is added, the two neighbour nodes are disturbed but not all nodes.

Output screen of RandomLookup.java

Accessing the elements with ArrayList is less expensive than accessing elements with LinkedList.

ArrayList LinkedList
Adding elements Expensive Cheaper
Retrieving elements Cheaper Expensive

To retrieve the elements, it is advised to use Iterators with LinkedList.

Programming in terms of Interfaces

The example in the previous section illustrates an important point, namely, that there may be more than one «right» way to represent data. You might, for example, decide that an ArrayList is the best choice for some application, but then later decide that a LinkedList would in fact be a better choice. One way you can make changeovers like this easier to program is by using interface types, instead of actual types of classes that implement the interface. For example, if you say:

ArrayList vs LinkedList

Posted by: Attila Mihaly Balazs in Core Java December 30th, 2013 4 Comments Views

I must confess the title of this post is a little bit catchy. I have recently read this blog post and this is a good summary of discussions & debates about this subject.

But this time I would like to try a different approach to compare those 2 well known data structures: using Hardware Performance Counters.

I will not perform a micro-benchmark, well not directly. I will not time using System.nanoTime(), but rather using HPCs like cache hits/misses.

No need to present those data structures, everybody knows what they are using for and how they are implemented. I am focusing my study on list iteration because, beside adding an element, this is the most common task for a list. And also because the memory access pattern for a list is a good example of CPU cache interaction.

Here my code for measuring list iteration for LinkedList & ArrayList:

To measure, I am using a class called HWCounters based on overseer library to get Hardware Performance Counters. You can find the code of this class here.

The program take 2 parameters: the first one to choose between ArrayList implementation or LinkedList one, the second one to take a buffer size used in initializeList method. This method fills a list implementation with 50K strings. Each string is newly created just before add it to the list. We may also allocate a buffer based on our second parameter of the program. if 0, no buffer is allocated.
bench method performs a search of a constant string which is not contained into the list, so we fully traverse the list.

Finally, main method, perform initialization of the list, warmups the bench method and measure 1000 runs of this method. Then, we print results from HPCs.

Let’s run our program with no buffer allocation on Linux with 2 Xeon X5680:

First run is on the ArrayList implementation, second with LinkedList.

  • Number of cycles is the number of CPU cycle spent on executing our code. Clearly LinkedList spent much more cycles than ArrayList.
  • Instructions is little higher for LinkedList. But it is not so significant here.
  • For L2 cache accesses we have a clear difference: ArrayList has significant more L2 misses compared to LinkedList.
  • Mechanically, LLC hits are very important for ArrayList.

The conclusion on this comparison is that most of the data accessed during list iteration is located into L2 for LinkedList but into L3 for ArrayList.

My explanation for this is that strings added to the list are created right before. For LinkedList it means that it is local the Node entry that is created when adding the element. We have more locality with the node.

But let’s re-run the comparison with intermediary buffer allocated for each new String added.

Here the results are quite different:

  • Cycles are 10 times more important.
  • Instructions stay the same as previously
  • For cache accesses, ArrayList have more L2 misses/LLC hits, than previous run, but still in the same magnitude order. LinkedList on the contrary have a lot more L2 misses/LLC hits, but moreover a significant number of LLC misses/DRAM accesses. And the difference is here.

With the intermediary buffer, we are pushing away entries and strings, which generate more cache misses and the end also DRAM accesses which is much more slower than hitting caches.

ArrayList is more predictable here since we keep locality of element from each other.

The memory access pattern here is crucial for list iteration performance. ArrayList is more stable than LinkedList in the way that whatever you are doing between each element adding, you are keeping your data much more local than the LinkedList.

Remember also that, iterating through an array is much more efficient for CPU since it can trigger Hardware Prefetching because access pattern is very predictable.

Цукерберг рекомендует:  Вакансии Сознательные Машины
Понравилась статья? Поделиться с друзьями:
Все языки программирования для начинающих