.. TODO
.. -------------------------------
.. Improving productivity on Linux
.. -------------------------------
.. ``??. ??. 2016``
.. dotfiles
.. tmux
.. shell
.. shell aliases
.. fasd
.. magit
.. ultisnips & custom snippets


.. TODO
.. -----------------------------------
.. VR desktop / window manager concept
.. -----------------------------------
.. ``??. ??. 2016``


-----
Links
-----
``20. 07. 2016``

Looks like this might evolve into Vulkan's NeHe (as in, back when NeHe was up
to date and not teaching historical OpenGL techniques):

http://vulkan-tutorial.com


----------
Histograms
----------
``20. 07. 2016``

Something I noticed recently while adding some statistics to my code:
histograms are really useful. And really easy to write.

I needed to figure out whether most (network) packets received from the kernel
really are below ~1500 bytes (the default Ethernet MTU). (Note: these code
blocks were written on a train without even attempting to compile them.)

.. code-block:: cpp

   #include <algorithm>
   #include <cstdint>
   #include <cstdio>
   #include <vector>

   // Packet source; this signature is an assumption for the sketch
   bool getPacket(uint8_t*& packet, uint32_t& length);

   void collectPacketSizes()
   {
       // In a sane world, no packet should be bigger than slightly over 65535
       // bytes (the limit of the 16-bit IP length field), plus headers
       static const uint32_t MAX_PACKET_SIZE = 66000;
       // Use a std::vector for added safety; it also zero-initializes the counts
       std::vector<uint32_t> ip_histogram(MAX_PACKET_SIZE, 0);
       uint8_t* packet;
       uint32_t length;
       while (getPacket(packet, length))
       {
           // ... do stuff
           // Oversized packets all land in the last, "or more" bucket
           ++ip_histogram[std::min(length, MAX_PACKET_SIZE - 1)];
           // ... do more stuff
       }

       for (uint32_t i = 0; i < MAX_PACKET_SIZE - 1; ++i)
       {
           // Don't spam the output with zero values
           if (ip_histogram[i] != 0)
           {
               printf("%u: %u\n", i, ip_histogram[i]);
           }
       }
       printf("%u or more: %u\n", MAX_PACKET_SIZE - 1, ip_histogram[MAX_PACKET_SIZE - 1]);
   }

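The histogram can be reduced to a single number that answers the question above: the share of packets that fit into the MTU. What follows is a minimal sketch. It assumes the counts live in a ``std::vector<uint32_t>`` indexed by packet length (one bucket per byte length). ``fractionAtOrBelow()`` is a hypothetical helper and not part of the code above:

.. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <vector>

   // Hypothetical helper: fraction of all counted packets whose length is
   // <= limit, assuming one bucket per byte length (bucket index == length).
   double fractionAtOrBelow(const std::vector<uint32_t>& histogram, uint32_t limit)
   {
       // 64-bit sums so that adding up many buckets cannot overflow
       uint64_t total = 0;
       uint64_t atOrBelow = 0;
       for (std::size_t len = 0; len < histogram.size(); ++len)
       {
           total += histogram[len];
           if (len <= limit)
           {
               atOrBelow += histogram[len];
           }
       }
       assert(total != 0 && "no packets counted yet");
       return static_cast<double>(atOrBelow) / static_cast<double>(total);
   }

Calling it with ``limit = 1500`` gives the share of packets that fit into a default Ethernet MTU. A result close to 1.0 means the answer is "yes, most of them do".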

This turned out to be useful, so I decided to add **moar** histograms to monitor
various stats in my codebase. Some of those needed much larger ranges of
values, so one bucket per value was no longer practical:

.. code-block:: cpp

   #include <cstdint>
   #include <cstdio>
   #include <vector>

   class Histogram
   {
   public:
       /** Could be optimized by using a power-of-two step (stored as a
        * shift amount) instead of an arbitrary value, which would replace
        * the division in countValue() with a shift.
        */
       Histogram(uint32_t step, uint32_t bucketCount)
           : counts_(bucketCount, 0)
           , step_(step)
       {
       }

       void countValue(uint32_t value)
       {
           // step_ must be non-zero; bucket i covers <i * step_, (i + 1) * step_)
           const uint32_t idx = value / step_;
           if (idx < counts_.size())
           {
               ++counts_[idx];
               return;
           }
           // Everything past the last bucket ends up here
           ++countOver_;
       }

       void print() const
       {
           uint32_t min = 0;
           for (uint32_t count : counts_)
           {
               printf("<%u - %u): %u\n", min, min + step_, count);
               min += step_;
           }
           printf("<%u - inf): %u\n", min, countOver_);
       }

   private:
       std::vector<uint32_t> counts_;

       uint32_t step_;

       uint32_t countOver_ = 0;
   };

Obviously, this could be improved a lot: we could generalize it to signed
integers and floats, or even make it generic with a type parameter; we could
add an iterator/range over the histogram to allow external processing, and so
on. But just hacking this together in a few minutes and liberally sprinkling
histograms all over my code got me a *lot* of information about what's
happening in it.
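
The generic variant mentioned above could look roughly like the sketch below. It is minimal and rests on a few assumptions: a value type ``T`` that supports subtraction, division and comparison (integers and floats both qualify), a configurable lower bound so that signed values work, and read-only ``begin()``/``end()`` over the buckets for external processing. ``GenericHistogram`` and its members are hypothetical names, not anything from the class above:

.. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <vector>

   // Hypothetical generic histogram: bucketCount buckets of width `step`,
   // starting at `min`. T can be a signed/unsigned integer or a float.
   template <typename T>
   class GenericHistogram
   {
   public:
       using const_iterator = std::vector<uint64_t>::const_iterator;

       GenericHistogram(T min, T step, std::size_t bucketCount)
           : min_(min), step_(step), counts_(bucketCount, 0)
       {
           assert(step > T(0) && "step must be positive");
       }

       void countValue(T value)
       {
           if (value < min_)
           {
               ++countUnder_;
               return;
           }
           // Integer T: integer division. Floating-point T: the cast to
           // size_t below truncates the (non-negative) quotient.
           // Note: value - min_ may overflow for extreme signed ranges.
           const T offset = (value - min_) / step_;
           if (offset < static_cast<T>(counts_.size()))
           {
               ++counts_[static_cast<std::size_t>(offset)];
               return;
           }
           // Past the last bucket (NaNs also end up here)
           ++countOver_;
       }

       // Read-only iteration over the in-range buckets, e.g. with range-for
       const_iterator begin() const { return counts_.begin(); }
       const_iterator end() const { return counts_.end(); }

       // Lower bound of bucket i
       T bucketMin(std::size_t i) const { return min_ + static_cast<T>(i) * step_; }

       uint64_t countUnder() const { return countUnder_; }
       uint64_t countOver() const { return countOver_; }

   private:
       T min_;
       T step_;
       std::vector<uint64_t> counts_;
       uint64_t countUnder_ = 0;
       uint64_t countOver_ = 0;
   };

For example, ``GenericHistogram<int32_t> h(-1000, 100, 20);`` covers <-1000, 1000) in 20 buckets, and ``for (uint64_t count : h)`` walks them. The 64-bit counters keep long-running counts from wrapping; on 32-bit ARM, ``uint32_t`` may be the better trade-off.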

--------------------
Linux debugging news
--------------------
``18. 07. 2016``

An interesting video about GDB is making the rounds on Reddit/HN:

https://www.youtube.com/watch?v=PorfLSr3DDI

TLDR: GDB has its own TUI ('console GUI'), which greatly improves usability:
a source view showing the current line and breakpoints, an assembly view and a
register view. Not exactly groundbreaking compared to what IDEs and GUI GDB
front-ends can do, but it's built in.

GDB can also be easily scripted, on the fly, with Python. This is very useful
to e.g. run a heisenbug-ridden program a thousand times, only to break
execution right when the heisenbug manifests.

The video also shows how to do reverse debugging with GDB, which I knew GDB
can do but never actually learnt how to use.

The HN discussion also led to another interesting open source debugging tool:

http://rr-project.org/

rr records a program's execution into a trace. More precisely, it records the
nondeterministic inputs (system call results, signals, thread scheduling), so
the run can then be replayed deterministically, repeatedly and in reverse;
kinda like ApiTrace (http://apitrace.org) for the whole program instead of
just OpenGL.

This should be extremely useful: you just need to record a bug *once*, and can
then re-run it as many times as needed, back and forth, let other people look
at it etc. until it's fixed; no reproduction needed.


----------------------------------
(Micro-) Optimization notes on ARM
----------------------------------
``17. 07. 2016``

Some of these apply to all architectures, and some are plain obvious; this is
a reference list I'm using while going through my code.

* High-precision (and even plain old) clocks are unexpectedly slow
* Align copies/memsets to 64-byte boundaries
* Division and modulo are really slow (some older cores, e.g. Cortex-A9, have
  no hardware integer divide at all)
* Access to globals is slow-ish
* Branch prediction: GCC likely/unlikely hints (``__builtin_expect``)
* Reduce 64-bit vars/ops (e.g. use decisecond time instead of nanoseconds)
* Fewer mutexes

* On 32-bit ARM, the first four 32-bit parameters are passed in registers
  (r0 to r3); any more go through the stack
* 8-bit/16-bit variables are slightly slower on some cores
* Loops counting down to 0 are a bit faster
* Don't do ``memset()`` or other big writes in inner loops
* Merge neighboring loops, unroll loops (even partially); this might help or
  hurt, and the same goes for ``-funroll-loops``, so try with and without
* ``a * b + c`` should result in a fused multiply-add (FMA)
* Skip memsets/initializations that aren't actually needed
* ``restrict`` (spelled ``__restrict__`` in GCC/Clang C++)
* LTO (link-time optimization, ``-flto``)
* Small loops (64 bytes of code) will be fast; smaller ones (32 bytes) will be
  faster
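
Here are a few of the items above, sketched in C++ as a minimal illustration. It assumes GCC or Clang, since ``__builtin_expect`` and ``__restrict__`` are compiler extensions. ``wrapIndex()``, ``scaledAdd()`` and ``countValidLengths()`` are hypothetical helpers. Whether the FMA is actually emitted depends on the target FPU and compiler flags, and whether each trick pays off on a given core still needs measuring:

.. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <cstdint>

   // Branch hints on top of __builtin_expect (GCC/Clang extension)
   #define LIKELY(x)   __builtin_expect(!!(x), 1)
   #define UNLIKELY(x) __builtin_expect(!!(x), 0)

   // Division/modulo: with a power-of-two capacity, i % capacity can be
   // replaced by a single AND with (capacity - 1).
   inline uint32_t wrapIndex(uint32_t i, uint32_t capacity)
   {
       assert(capacity != 0 && (capacity & (capacity - 1)) == 0 && "capacity must be a power of two");
       return i & (capacity - 1);
   }

   // restrict + FMA + counting down: __restrict__ promises that the arrays
   // don't overlap, a[k] * scale + b[k] is a candidate for a fused
   // multiply-add, and the loop counter runs down to 0.
   void scaledAdd(float* __restrict__ out, const float* __restrict__ a,
                  const float* __restrict__ b, float scale, std::size_t n)
   {
       for (std::size_t i = n; i != 0; --i)
       {
           const std::size_t k = i - 1;
           out[k] = a[k] * scale + b[k];
       }
   }

   // Branch hints: oversized lengths are expected to be rare, so the
   // compiler is told to treat the reject branch as the unlikely path.
   std::size_t countValidLengths(const uint32_t* lengths, std::size_t n, uint32_t maxLength)
   {
       std::size_t valid = 0;
       for (std::size_t i = 0; i < n; ++i)
       {
           if (UNLIKELY(lengths[i] > maxLength))
           {
               continue;
           }
           ++valid;
       }
       return valid;
   }

Comparing the generated assembly (``-S``) with and without these tweaks is the quickest way to see whether they matter on a given core.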