check if address is 16 byte aligned | June 23, 2022

SSE (Streaming SIMD Extensions) defines 128-bit (16-byte) packed data types (for example, four 32-bit floats), and access to such data can be faster if its address is 16-byte aligned, that is, evenly divisible by 16. For example, the 16-byte aligned addresses starting at 1000h are 1000h, 1010h, 1020h, 1030h, and so on. Take the address 0x000AE430: notice that the lower 4 bits of a 16-byte aligned address are always 0. Memory alignment is important for performance in several ways. Many programmers use a one-line check on the pointer value to find out whether an array pointer is adequately aligned. In practice, every implementation that supports SSE2 instructions can be expected to provide an implementation-specific guarantee that such a check works. You can use memalign or posix_memalign if you want to ensure a specific alignment.
Data alignment means that the address of a datum is evenly divisible by 1, 2, 4, or 8. In other words, a data object can have 1-byte, 2-byte, 4-byte, or 8-byte alignment, or any other power of 2. On 32-bit x86 systems, the alignment of a type is mostly the same as its size. For SSE instructions, use 16 bytes; for AVX instructions, 32 bytes; and for the coprocessor instruction set, 64 bytes. Be careful when using custom struct member alignment.

To check for 16-byte alignment, we simply mask off the upper portion of the address and check whether the lower 4 bits are zero. If the address is 16-byte aligned, these bits must be zero. Let's illustrate using pointers to the addresses 16 (0x10) and 92 (0x5C). Suppose that v = 32 * k + 16 for some integer k.

I have to work with the Intel icc compiler. AFAIK, both memalign and posix_memalign are doing their job. If you don't want that, I'd still think hard about using the standard version in most of your code, and just write a small implementation of it for your own use until you update to a compiler that implements the standard.
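The mask check described above can be sketched in C as follows. This is a minimal illustration, not the original poster's code: the helper names is_aligned_addr and is_aligned_ptr are made up for this example, and the pointer version relies on the implementation-defined conversion from a pointer to uintptr_t behaving as a plain address, which is the usual case on flat-memory platforms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of alignment.
   alignment is assumed to be a power of two (1, 2, 4, 8, 16, ...). */
static inline bool is_aligned_addr(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* For 16 the mask is 0xF, i.e. the lower 4 bits of the address. */
    return (addr & (uintptr_t)(alignment - 1)) == 0;
}

/* Same test for a real pointer. The pointer-to-integer conversion is
   implementation-defined, so this is an assumption about the platform. */
static inline bool is_aligned_ptr(const void *p, size_t alignment)
{
    return is_aligned_addr((uintptr_t)p, alignment);
}
```

With the two addresses from the text, is_aligned_addr(0x10, 16) is true because the lower 4 bits of 0x10 are zero, while is_aligned_addr(0x5C, 16) is false because the lower 4 bits are 0xC.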
A common SSE-optimized function receives a pointer, ptr, to the data it processes. However, how do I correctly determine whether the memory ptr points to is suitably aligned? I'm not sure about the meaning of an unaligned address. What's the best (simplest, most reliable and portable) way to specify that an object should always be aligned to a 64-bit address, even on a 32-bit build?

The concept of alignment is used when defining pointer conversion: 6.3.2.3 A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. An aligned allocation function is useful for over-aligned allocations, such as to an SSE, cache-line, or VM-page boundary. Some architectures call two bytes a word, and four bytes a double word. Addresses are allocated at compile time, and many programming languages have ways to specify alignment. For example, the declaration int x __attribute__((aligned(16))) = 0; causes the compiler to allocate the global variable x on a 16-byte boundary.

If alignment cannot be guaranteed, the load has to be unaligned, which *might* degrade performance. When vectorizing, the compiler will do the following: treat the loop iterations i = 0 and i = 1 sequentially (loop peeling).
For instance, if the address of a datum is 12FEECh (1244908 in decimal), then it is 4-byte aligned because the address is evenly divisible by 4. For example, an aligned 32-bit access will have the bottom 4 bits of the address equal to 0x0, 0x4, 0x8, or 0xC, assuming the memory is byte addressed. For a 16-byte aligned address, notice that the lower 4 bits are always 0. If they aren't, the address isn't 16-byte aligned.

Regular malloc aligns memory suitably for any object type (which, in practice, means that it is aligned to alignof(max_align_t)). I know gcc's malloc provides the alignment for 64-bit processors. On an implementation where converted pointer values depend on the pointed-to type, foo * -> uintptr_t -> foo * would work, but foo * -> uintptr_t -> void * and void * -> uintptr_t -> foo * wouldn't.

In this context, a byte is the smallest unit of memory access. Even if you read 1 byte from memory, the bus will deliver a whole 64-bit (8-byte) word. As you can see, a misaligned access is a quite complicated (and thus slow) operation.
In C++11 you can write the annotation [[gnu::aligned(64)]]. This is no longer required, and alignas() is the preferred way to control variable alignment. You can declare a 16-byte aligned variable in MSVC using the __declspec(align(16)) keyword; a dynamic array can be allocated using the _aligned_malloc() function and deallocated using _aligned_free(). If you want the start address to be aligned, you should use aligned_alloc. You may use the "pack" pragma directive to specify a different packing alignment for struct, union, or class members.

A memory access is said to be aligned when the data being accessed is n bytes long and the datum address is n-byte aligned. The compiler aligns variables on their natural length boundaries. For example, if you have a 32-bit architecture and your memory can only be accessed 4 bytes at a time at addresses that are multiples of 4 (4-byte aligned), it is more efficient to fit your 4-byte data (e.g. an integer) into one such slot. One solution to the problem of ever slower memory is to access it over ever wider buses: instead of accessing 1 byte at a time, the CPU reads a 64-bit wide word from memory. Alignment at these granularities (16/32/64/128 bits) is identical for virtual and physical addresses. The short answer is, yes.

The typical use case will be a 64-bit platform and pointer-heavy data structures, giving me three tag bits, but I want to make sure the code still works if compiled for 32-bit. If true portability is your goal, binary compatibility of serialized data should probably not be an additional goal though. Meaning, if the first position is 0x0000 then the second position would be 0x0008; what are the advantages of such 8-byte aligned types? Whenever I allocate memory with the malloc function, the address is aligned to 16 bytes.
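As a rough sketch of the standard route mentioned above, the following C11 code uses aligned_alloc for heap memory and alignas for a static object. The helper names alloc_floats_16 and table_16 are hypothetical, and the example assumes a C library that provides aligned_alloc (MSVC's runtime does not; there _aligned_malloc and _aligned_free play that role).

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: room for count floats on a 16-byte boundary.
   Returns NULL on failure; release the memory with free(). */
static inline float *alloc_floats_16(size_t count)
{
    size_t bytes = count * sizeof(float);
    /* C11 asks for a size that is a multiple of the alignment, so round up. */
    size_t rounded = (bytes + 15) & ~(size_t)15;
    if (rounded == 0)
        rounded = 16;
    float *p = aligned_alloc(16, rounded);
    assert(p == NULL || ((uintptr_t)p & 15) == 0);
    return p;
}

/* Hypothetical helper: a static table aligned with alignas. */
static inline float *table_16(void)
{
    static alignas(16) float table[8];
    return table;
}
```

Unlike the over-allocation trick, the pointer returned by aligned_alloc is passed to free() directly, so there is no second pointer to keep track of.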
What are aligned addresses? Is alignment there to make calculating the memory address easier, or for some other reason? Also, is there any alignment for functions? If so, are variables always stored at aligned physical addresses too?

Visual C++ permits types that have extended alignment, which are also known as over-aligned types. Every structure also has alignment requirements. You can check alignment at runtime with a small helper, and you can deliberately pass misaligned pointers to confirm that bad alignments fail the check. For instance, (ad & 0x7) == 0 checks whether ad is a multiple of 8. That is why bitwise operators are used to force the lowest digit of the hex address to zero.

Best: supply an allocator that provides 16-byte aligned memory. This is what libraries like Botan and Crypto++ do for algorithms which use SSE, Altivec and friends.
While going through one project, I have seen that the memory data is "8 bytes aligned". The second address ends in 2 and the third one ends in 7, neither of which is divisible by 4. Since a byte is the smallest unit of memory access, the next 64-bit aligned address after C000_0004 is C000_0008. And you may have an address that is misaligned by anywhere from 0 to 15 bytes.

It doesn't really matter if the pointer and integer sizes don't match. It will remove the false positives, but still leave you with some conforming implementations on which the union fails to create the alignment you want, and hence fails to compile. The application of either attribute to a structure or union is equivalent to applying the attribute to all contained elements that are not explicitly declared ALIGNED or UNALIGNED. In general, alignment is set to 1, 2, or 4 bytes; VC generally defaults to 4 bytes (8 bytes at most). This can be used to move unaligned data to an aligned address.
If data is 8-byte aligned, its address is a multiple of 8, that is, (uintptr_t)&the_data % 8 == 0. In C, if a structure is to be 8-byte aligned, its size must also be a multiple of 8 so that every element of an array of such structures stays aligned; if it is not, padding has to be added, either manually or by the compiler. I wouldn't have thought it's difficult to do.

In this post, I hope to shed some light on a really simple but essential operation: figuring out whether memory is aligned on a 16-byte boundary. In short, an unaligned address is the address of an object of a simple type (e.g., an integer or floating-point variable) that is bigger than (usually) a byte, where the address is not evenly divisible by the size of the data type one tries to read. Aligning the data allows you to access it in one memory read instead of the two needed when it is not aligned. Once the mask is named, the cryptic if statement becomes very clear and intuitive.

Seems to me that the most obvious way to do this would be to use Boost's implementation of aligned_storage (or TR1's, if you have that).
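The claim about one memory read versus two can be made concrete with a small model. The sketch below counts how many bus-width words an access touches; the function name words_spanned is invented for this illustration, and the model is deliberately simplified (real hardware adds caches and store buffers on top of this).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bus-width words touched by an access of
   `size` bytes starting at byte address `addr`. bus_width is in bytes. */
static inline size_t words_spanned(uintptr_t addr, size_t size, size_t bus_width)
{
    assert(size > 0 && bus_width > 0);
    uintptr_t first = addr / bus_width;              /* word holding the first byte */
    uintptr_t last = (addr + size - 1) / bus_width;  /* word holding the last byte */
    return (size_t)(last - first + 1);
}
```

On an 8-byte bus, a 4-byte read at address 4 stays inside one word, while the same read at address 6 straddles two words and therefore needs two transfers in this model.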
Before the alignas keyword, people used tricks to finely control alignment. I don't know what versions of gcc and clang support alignof, which is why I didn't use it to start with.

So what is happening? After masking, what remains is the lower 4 bits of our memory address. If the address is 16-byte aligned, these must be zero. (The Linux kernel uses the AND operation too, FYI.) For a word size of N, the address needs to be a multiple of N, but you have to define the number of bytes per word. As a consequence, v + 2 is 32-byte aligned. For instance, a struct is aligned like its largest field.

If the source pointer is not two-byte aligned, though, the fix-up fails and you get a SIGSEGV. How do I write a constraint such that it generates 16-byte aligned addresses? The address should not fall in reserved memory.
The standard also leaves it up to the implementation what happens when converting (arbitrary) pointers to integers, but I suspect that it is often implemented as a no-op. I will use theoretical 8-bit pointers to explain the operation. You should always use the AND operation. If the lower bits aren't zero, the address isn't 16-byte aligned and we need to pre-heat our SIMD loop. 16-byte alignment will not be sufficient for full AVX optimization.

Some CPUs will not even perform such a misaligned load: they will simply raise an exception (or even silently load the wrong data!). If you are working on a traditional architecture, you really don't need to do it. Memory on most systems is paged with page sizes from 4K up, while alignment is usually a matter of orders of magnitude less (typically the bus width).

For a time, gcc had situations not shared by icc where stack objects weren't aligned. I think it is related to the quality of vectorization, and I definitely need to make sure the malloc function of icc also supports the alignment. No, you can't. You should use __attribute__((aligned(8))). Another spelling is std::atomic ob [[gnu::aligned(64)]].

The first address of the structure must be an integer multiple of the size of the widest type in the structure. In addition, each member of the structure must start at an integer multiple of its own type size.
This is the first reason one likes aligned memory access. Accesses to main memory are aligned if the address is a multiple of the size of the object being accessed, as given by the formula in the H&P book (A mod s = 0 for an object of size s at address A). For a word size of 4 bytes, the second and third addresses of your examples are unaligned. Some architectures call two bytes a word, and four bytes a double word. In code that targets 64-bit platforms, it's 16 bytes.

Can anyone assist me in accurately generating 16-byte aligned data for icc on a Linux platform? When writing an SSE algorithm loop that transforms or uses an array, one would start by making sure the data is aligned on a 16-byte boundary. If it isn't, the address isn't 16-byte aligned and we need to pre-heat our SIMD loop. This memory access can be aligned or unaligned; it all depends on the address of the variable pointed to by the data pointer. And you may have an address that is misaligned by anywhere from 0 to 15 bytes.

When over-allocating, use the aligned pointer to read or write data in the array, but deallocate the original pointer (array), NOT alignedArray. This also means that your array is properly aligned on a 16-byte boundary. I always like checking my input, hence the compile-time assertion. I think that was corrected before gcc 4.4.7, which has become outdated.
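The over-allocation approach referred to above (allocate extra bytes, use an aligned pointer inside the block, free the original pointer) might look like the sketch below. The struct and function names are hypothetical, chosen for this example only, and the code assumes that a pointer converted to uintptr_t exposes its low address bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pair of pointers: one to free, one to use. */
typedef struct {
    void  *raw;      /* pointer returned by malloc; pass this one to free() */
    float *aligned;  /* first 16-byte aligned address inside the block */
} aligned_floats;

static inline aligned_floats alloc_aligned_floats(size_t count)
{
    aligned_floats buf = { NULL, NULL };
    /* In the worst case the block is misaligned by up to 15 bytes. */
    buf.raw = malloc(count * sizeof(float) + 15);
    if (buf.raw != NULL) {
        /* Bytes to skip to reach the next multiple of 16 (0..15). */
        size_t skip = (16 - ((uintptr_t)buf.raw & 15)) & 15;
        buf.aligned = (float *)(void *)((unsigned char *)buf.raw + skip);
        assert(((uintptr_t)buf.aligned & 15) == 0);
    }
    return buf;
}
```

The caller reads and writes through buf.aligned and later calls free(buf.raw); passing buf.aligned to free() would be undefined behavior whenever the two pointers differ.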
For example, if you have one char variable (1 byte) followed by one int variable (4 bytes) in a struct, the compiler will insert 3 bytes of padding between these two variables. Similarly, on a 32-bit machine, a data structure containing a 16-bit value followed by a 32-bit value could have 16 bits of padding between the 16-bit value and the 32-bit value to align the 32-bit value on a 32-bit boundary. This differentiation still exists in current CPUs, and some still have only instructions that perform aligned accesses.

For what it's worth, a quick stab at an implementation of aligned_storage can be based on gcc's __attribute__((__aligned__)) directive. Of course, in real use you'd wrap up or hide most of the ugliness.

If you align manually, the block can be misaligned by up to 15 bytes; therefore, you need to allocate 15 extra bytes. You can also use an array of structures, each containing a single float, with the aligned attribute. The address returned by the memalign function is 0x11fe010, which is a multiple of 0x10.
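The padding described above can be observed with offsetof and sizeof. The sketch below is illustrative only: the struct name char_then_int and the print helper are made up, and the exact numbers depend on the platform, although on common ABIs with a 4-byte int the member i lands at offset 4 and the struct occupies 8 bytes.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical struct matching the char-then-int example in the text. */
struct char_then_int {
    char c;  /* 1 byte, followed by padding inserted by the compiler */
    int  i;  /* must start at a multiple of alignof(int) */
};

/* Prints the layout chosen by the compiler for this platform. */
static inline void print_layout(void)
{
    printf("offset of c: %zu\n", offsetof(struct char_then_int, c));
    printf("offset of i: %zu\n", offsetof(struct char_then_int, i));
    printf("size:        %zu\n", sizeof(struct char_then_int));
    printf("alignment:   %zu\n", (size_t)alignof(struct char_then_int));
}
```

Whatever the platform, the offset of i is a multiple of alignof(int), and the total size is a multiple of the struct's own alignment so that array elements stay aligned.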
Why do we align data? Data structure alignment is the way data is arranged and accessed in computer memory. A modern PC runs its CPU at about 3 GHz, with memory at barely 400 MHz. A 64-bit address has 8 bytes, hence a 64-bit aligned address is a multiple of 8. For the first structure, test1, the short variable takes 2 bytes. The compiler maintains a 16-byte alignment of the stack pointer when a function is called, adding padding as needed.

Dynamically allocated data from malloc() is supposed to be "suitably aligned for any built-in type" and hence is always at least 64-bit aligned. Does the icc malloc function support the same alignment of the address? Can anyone please explain what this means?

When aligning manually, over-allocate, because in the worst case the data can be misaligned by up to 15 bytes. Better: use a scalar prologue to handle the misaligned elements up to the first alignment boundary.
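A scalar prologue of the kind recommended above could be structured as in the following sketch, which multiplies an array by a constant. The function name scale_floats is hypothetical, and the 4-wide main loop is plain C standing in for SSE code (an assumption made so the example compiles on any platform); in a real SSE version that loop would use aligned 128-bit loads and stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: multiply n floats by k in three phases. */
static inline void scale_floats(float *data, size_t n, float k)
{
    size_t i = 0;

    /* Scalar prologue: handle elements up to the first 16-byte boundary. */
    while (i < n && ((uintptr_t)(data + i) & 15) != 0) {
        data[i] *= k;
        ++i;
    }

    /* Main loop: each block of 4 floats starts on a 16-byte boundary.
       This is where aligned SIMD loads and stores would go. */
    for (; i + 4 <= n; i += 4) {
        assert(((uintptr_t)(data + i) & 15) == 0);
        data[i]     *= k;
        data[i + 1] *= k;
        data[i + 2] *= k;
        data[i + 3] *= k;
    }

    /* Scalar tail: fewer than 4 elements remain. */
    for (; i < n; ++i)
        data[i] *= k;
}
```

Because each element is processed exactly once, the result is the same whether or not the input pointer happens to be aligned; only the split between prologue, main loop, and tail changes.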
What does alignment to a 16-byte boundary mean? You only care about the bottom few bits of the address, for example 0xC000_0007. If the data is not aligned, a single warm-up pass of the algorithm is usually performed to prepare for the main loop. In the example, the processing multiplies the data by a constant. When the compiler can see that alignment is inherited from malloc, it is entitled to assume alignment.

Given a buffer address, an align function returns the first address in the buffer that respects specific alignment constraints, and it can be used to find a proper location in a buffer if variable reallocation is required.

Misaligned data slows down data access. Typical struct members look like this: size = 2 bytes, alignment = 1 byte, address divisible by 1; size = 4 bytes, alignment = 2 bytes, address divisible by 2; size = 8 bytes, alignment = 4 bytes, address divisible by 4; size = 16 bytes, alignment = 8 bytes, address divisible by 8; size = 9 bytes, alignment = 1 byte, no padding for these struct members. Custom packing may cause serious compatibility issues, for example when linking an external library that uses a different packing alignment. Compilers can start structs on 16-bit boundaries without a speed penalty, even if the first member was a 32-bit scalar.
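A function that finds the first suitably aligned address inside a buffer, as described above, can be approximated in C as shown below. This is a sketch in the spirit of C++'s std::align rather than a copy of any library routine; the name align_in_buffer and its parameter list are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the first address in [buf, buf + buf_size)
   that is a multiple of `alignment` and leaves room for `needed` bytes,
   or NULL if no such address exists. alignment must be a power of two. */
static inline void *align_in_buffer(void *buf, size_t buf_size,
                                    size_t alignment, size_t needed)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uintptr_t start = (uintptr_t)buf;
    /* Bytes to skip to reach the next multiple of alignment. */
    size_t skip = (size_t)((alignment - (start & (alignment - 1))) & (alignment - 1));
    if (skip > buf_size || needed > buf_size - skip)
        return NULL;
    return (unsigned char *)buf + skip;
}
```

Returning NULL when the aligned region does not fit lets the caller fall back to a fresh allocation instead of writing past the end of the buffer.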
The C standard defines alignment as the requirement that objects of a particular type be located on storage boundaries with addresses that are particular multiples of a byte address. An object that is "8 bytes aligned" is stored at a memory address that is a multiple of 8. When a memory access is not aligned, it is said to be misaligned. So, a total of 12 bytes of memory is used.

We simply mask off the upper portion of the address and check whether the lower 4 bits are zero. To check whether an address is 64-bit aligned, you just have to check that its 3 least significant bits are zero. Take note that you shouldn't use a real MOD operation: it's quite an expensive operation and should be avoided as much as possible. What's your machine's word size? In the example, &A[0] = 0x11fe010.

Best: supply an allocator that provides 16-byte aligned memory, then operate on the 16-byte aligned buffer without the need to fix up leading or tail elements. Otherwise you can still use SSE for the 'middle' elements. Why use _mm_malloc? Secondly, there's posix_memalign to be sure. If you intend to have every element inside your vector aligned to 16 bytes, you should consider declaring an array of structures that are 16 bytes wide. 16-byte alignment will not be sufficient for full AVX optimization.

Unlike at function entry, RSP is aligned to 16 on entry to _start, as specified by the x86-64 System V ABI. From _start, you're ready to call a function right away, without having to adjust the stack.
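The suggestion to declare an array of 16-byte-wide structures can be sketched with C11 alignas. The type name padded_float and the checking helper are hypothetical; the static assertions document the assumption that the compiler honors a 16-byte alignment request, which mainstream compilers do.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type: one float padded out to a 16-byte slot. */
typedef struct {
    alignas(16) float value;
} padded_float;

/* The size of a struct is a multiple of its alignment, so each element
   of an array of padded_float starts on a 16-byte boundary. */
_Static_assert(alignof(padded_float) == 16, "struct should be 16-byte aligned");
_Static_assert(sizeof(padded_float) == 16, "struct should be 16 bytes wide");

/* Returns 1 if every element of the array starts on a 16-byte boundary. */
static inline int all_elements_aligned(const padded_float *items, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (((uintptr_t)&items[i] & 15) != 0)
            return 0;
    }
    return 1;
}
```

The trade-off is memory: each float now occupies 16 bytes instead of 4, so this layout suits small arrays where per-element alignment matters more than density.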
I think I have to include the regular C code path for non-aligned memory, as I cannot make sure that every block of memory passed to this function will be aligned. Is that portable? With a modern CPU, most likely, you won't feel it (maybe a few percent slower, but it will most likely be in the noise of a basic timer measurement). More on the matter in Documentation/unaligned-memory-access.txt.

If an address is aligned to 16 bytes, is it also aligned to 8 bytes? Before alignas there was no way, for instance, to ensure that a struct with 8 chars or a struct with a char and an int is 8-byte aligned. If you need to align a struct on 16, just add 12 bytes of padding at the end; that works, but it is neither nice nor portable. Therefore, only character fields with odd byte lengths can ever cause padding. In the constraint example, reserved memory is 0x20 to 0xE0.

When you have identified the loops that might get some speedup from alignment, you need to:
- Align the memory: you might use _mm_malloc.
- Tell the compiler that the pointer you are going to use is aligned: you might use OpenMP 4 (#pragma omp simd aligned(p : 32)) or the special Intel extension __assume_aligned.
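One way to combine the two ideas above (a fallback path for unaligned input, plus an alignment hint to the compiler) is sketched below. It swaps in the GCC/Clang builtin __builtin_assume_aligned in place of Intel's __assume_aligned, so treat that substitution as an assumption; the function names are hypothetical, and on other compilers the macro degrades to a no-op.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Alignment hint: GCC and Clang provide __builtin_assume_aligned.
   On other compilers this hypothetical macro simply passes p through. */
#if defined(__GNUC__) || defined(__clang__)
#define ASSUME_ALIGNED_16(p) __builtin_assume_aligned((p), 16)
#else
#define ASSUME_ALIGNED_16(p) (p)
#endif

/* Fast path: the caller guarantees p is 16-byte aligned. */
static inline float sum_aligned16(const float *p, size_t n)
{
    const float *a = ASSUME_ALIGNED_16(p);
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}

/* Regular C path: no alignment assumption. */
static inline float sum_generic(const float *p, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i)
        total += p[i];
    return total;
}

/* Dispatcher: check the low 4 bits at run time and pick a path. */
static inline float sum_floats(const float *p, size_t n)
{
    if (((uintptr_t)p & 15) == 0)
        return sum_aligned16(p, n);
    return sum_generic(p, n);
}
```

The run-time check costs one AND and one branch per call, which is usually negligible next to the loop itself, and it keeps the promise made to the compiler honest: the hint is only applied to pointers that were actually verified.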