-----
Links
-----
``20. 07. 2016``

Looks like this might evolve into Vulkan's NeHe (as in, back when NeHe was up
to date and not teaching historic OpenGL techniques):

http://vulkan-tutorial.com


----------
Histograms
----------
``20. 07. 2016``

Thing I noticed recently while adding some statistics into my code:
Histograms are really useful. And really useful to write.

I needed to figure out whether most (network) packets received from the kernel
are really below ~1.5kiB (default Ethernet MTU). (Note: these code blocks were
written on a train without even attempting to compile them.)

.. code-block:: cpp

    #include <cstdint>
    #include <cstdio>

    // One bucket per packet size; the last bucket collects everything larger.
    static const uint32_t MAX_PACKET_SIZE = 66000;
    // Zero-initialize, otherwise the counts start out as garbage.
    uint32_t ip_histogram[MAX_PACKET_SIZE] = {};
    // getPacket() is expected to set both of these (passed by reference).
    uint8_t* packet;
    uint32_t length;
    while (getPacket(packet, length))
    {
        ++ip_histogram[length < MAX_PACKET_SIZE ? length : MAX_PACKET_SIZE - 1];
    }

    for (uint32_t i = 0; i < MAX_PACKET_SIZE - 1; ++i)
    {
        // Skip empty buckets to keep the output readable.
        if (ip_histogram[i] != 0)
        {
            printf("%u: %u\n", i, ip_histogram[i]);
        }
    }
    printf("%u or more: %u\n", MAX_PACKET_SIZE - 1, ip_histogram[MAX_PACKET_SIZE - 1]);


This turned out to be useful, so I decided to add **moar** histograms to monitor
various stats in my codebase. And I needed larger ranges of values:

.. code-block:: cpp

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    class Histogram
    {
    public:
        // step is the width of one bucket (must be nonzero).
        // Values >= step * bucketCount go into a separate overflow counter.
        Histogram(uint32_t step, uint32_t bucketCount)
            : counts_(bucketCount)
            , step_(step)
            , countOver_(0)
        {
        }

        void countValue(uint32_t value)
        {
            const uint32_t idx = value / step_;
            if (idx < counts_.size())
            {
                ++counts_[idx];
                return;
            }
            ++countOver_;
        }

        void print() const
        {
            uint32_t min = 0;
            for (uint32_t count : counts_)
            {
                printf("<%u - %u): %u\n", min, min + step_, count);
                min += step_;
            }
            printf("<%u - inf): %u\n", min, countOver_);
        }

    private:
        // Number of values counted in each bucket.
        std::vector<uint32_t> counts_;
        // Width of one bucket.
        uint32_t step_;
        // Number of values too large for any bucket.
        uint32_t countOver_;
    };

Obviously, this could be improved a lot: we could generalize it to signed
integers and floats, or even make it generic with a type parameter; we could
add an iterator/range over the histogram to allow external processing of the
histogram, and so on. But just hacking this together in a few minutes got me
a *lot* of information about what's happening in my code, by liberally
sprinkling histograms all over.
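
A rough sketch of what the generic variant could look like, assuming a type
parameter plus an explicit lower bound so that signed integers and floats fit
too. ``GenericHistogram``, ``countUnder()`` and the ``counts()`` accessor are
made-up names for illustration, not something from my codebase:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical generic histogram covering [min, min + step * bucketCount).
// Values outside that range go to separate underflow/overflow counters.
template <typename T>
class GenericHistogram
{
public:
    // step must be greater than zero.
    GenericHistogram(T min, T step, std::size_t bucketCount)
        : counts_(bucketCount, 0)
        , min_(min)
        , step_(step)
        , countUnder_(0)
        , countOver_(0)
    {
    }

    void countValue(T value)
    {
        if (value < min_)
        {
            ++countUnder_;
            return;
        }
        // Compute the bucket position in floating point so that the range
        // check happens before any conversion to an index.
        const double pos = (static_cast<double>(value) - static_cast<double>(min_))
                         / static_cast<double>(step_);
        // Written as !(pos < size) so that NaN also lands in the overflow counter.
        if (!(pos < static_cast<double>(counts_.size())))
        {
            ++countOver_;
            return;
        }
        ++counts_[static_cast<std::size_t>(pos)];
    }

    // Read-only access so callers can iterate over the buckets themselves.
    const std::vector<uint32_t>& counts() const { return counts_; }
    uint32_t countUnder() const { return countUnder_; }
    uint32_t countOver() const { return countOver_; }
    T min() const { return min_; }
    T step() const { return step_; }

private:
    std::vector<uint32_t> counts_;
    T min_;
    T step_;
    uint32_t countUnder_;
    uint32_t countOver_;
};
```

For example, ``GenericHistogram<int> h(-10, 5, 4)`` covers ``<-10 - 10)`` in
four buckets; exposing ``counts()`` instead of a ``print()`` method leaves
formatting and any further processing to the caller.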

--------------------
Linux debugging news
--------------------
``18. 07. 2016``

An interesting video about GDB is making the rounds on Reddit/HN:

https://www.youtube.com/watch?v=PorfLSr3DDI

TL;DR: GDB has its own TUI ('console GUI'), which greatly improves usability,
with a source view showing the current line and breakpoints, an assembly view
and a register view. Not exactly groundbreaking compared to what IDEs and GUI
GDB front-ends can do, but it is built in.

Also, GDB can easily be scripted, on the fly, with Python. This is very useful
to e.g. run a heisenbug-ridden program a thousand times, only to break
execution right when the heisenbug manifests.

The video also shows how to do reverse debugging with GDB, which I knew GDB
could do but never actually learnt how to use.

The HN discussion also led to another interesting open source debugging tool:

http://rr-project.org/

This runs a program and records all state of the running program into a file;
this state can then be replayed repeatedly (and reversibly), kind of like
ApiTrace (http://apitrace.org) for the whole program instead of just OpenGL.

This should be extremely useful, as you only need to record a bug *once* and
can then re-run it as many times as needed, back and forth, let other people
look at the bug etc., until it's fixed; no replication needed.


----------------------------------
(Micro-) Optimization notes on ARM
----------------------------------
``17. 07. 2016``

Some of these are relevant for all architectures or plain obvious;
this is a reference list I'm using while looking through my code.

* High-precision (and even plain old) clocks are unexpectedly slow
* Align copies/memsets to 64-byte boundaries
* Division and modulo are really slow
* Access to globals is slow-ish
* Branch prediction - GCC likely/unlikely hints
* Reduce 64-bit vars/ops (e.g. use decisecond time instead of nanoseconds)
* Fewer mutexes
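
Two of the points above (avoiding modulo, branch hints) can be sketched like
this. The ``LIKELY``/``UNLIKELY`` macros wrap GCC's ``__builtin_expect``;
``nextIndex`` and ``countOversized`` are made-up helpers for illustration, and
whether any of this helps on a given core is something to measure, not assume:

```cpp
#include <cstdint>

// Branch hints for GCC/Clang; the macro names are a common convention,
// not something the compiler defines.
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

// Next position in a ring buffer. Uses a bit mask instead of '%', which
// avoids a division but is only valid when sizePow2 is a power of two.
inline uint32_t nextIndex(uint32_t index, uint32_t sizePow2)
{
    return (index + 1) & (sizePow2 - 1);
}

// Counts entries larger than limit. The branch is marked unlikely on the
// assumption that oversized entries are rare in the input.
inline uint32_t countOversized(const uint32_t* lengths, uint32_t count, uint32_t limit)
{
    uint32_t oversized = 0;
    for (uint32_t i = 0; i < count; ++i)
    {
        if (UNLIKELY(lengths[i] > limit))
        {
            ++oversized;
        }
    }
    return oversized;
}
```

The hint only affects how the compiler lays out the code; a wrong hint still
gives correct results, just possibly slower ones.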

* Up to four 32-bit params are passed in registers; any more go through the stack (32-bit ARM)
* 8-bit/16-bit types are slightly slower on some cores
* Loops counting down to 0 are a bit faster
* Don't do ``memset()`` or other big writes in inner loops
* Merge neighboring loops, unroll loops (even partially) - might help or hurt;
  same for ``-funroll-loops``; try with and without
* ``a * b + c`` should result in FMA
* Avoid memsets/inits we can safely avoid
* ``restrict``
* LTO
* Small loops (64 bytes) will be fast. Smaller (32 bytes) will be faster
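
A small sketch combining three of the points above: ``restrict``, a loop
counting down to 0, and an ``a * b + c`` expression the compiler may turn into
FMA. C++ has no ``restrict`` keyword, so this uses the GCC/Clang
``__restrict__`` extension; ``multiplyAdd`` is a made-up name, and whether FMA
is actually emitted depends on the target and compiler flags:

```cpp
#include <cstddef>

// out[i] = a[i] * b[i] + c[i] for count elements.
// __restrict__ promises the compiler that none of the arrays overlap, so it
// does not have to reload inputs after every store to out.
inline void multiplyAdd(float* __restrict__ out,
                        const float* __restrict__ a,
                        const float* __restrict__ b,
                        const float* __restrict__ c,
                        std::size_t count)
{
    // Count down to 0 so the loop condition is a compare against zero.
    for (std::size_t i = count; i != 0; --i)
    {
        const std::size_t idx = i - 1;
        // Candidate for a fused multiply-add on targets that have one.
        out[idx] = a[idx] * b[idx] + c[idx];
    }
}
```

Checking the generated assembly (e.g. with ``-S``) is the only way to be sure
what the compiler did with the loop.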