Yes, I miss them in Java as primitives; there are, however, utility methods for unsigned arithmetic that get it right.
And despite all the pitfalls, especially around mixing signed and unsigned in C, unsigned types are very useful; I'd in fact say that for low-level programming they are essential.
Books like Yourdon Structured Method were mainly targeted at business C back in the day.
#include <stdio.h>

unsigned int pack_rgb(unsigned char r, unsigned char g, unsigned char b) {
    return (r << 16) | (g << 8) | b;
}

unsigned int pack_rgb_arith(unsigned char r, unsigned char g, unsigned char b) {
    return (r * 65536) + (g * 256) + b;
}

int main(void)
{
    printf("The color value of (246, 176, 223) is %u\n", pack_rgb(246, 176, 223));
    printf("The color value of (246, 176, 223) is %u\n", pack_rgb_arith(246, 176, 223));
}
Compiler Explorer link: https://godbolt.org/z/3jExdaTT9
I would expect a better comment from someone working on the standard.
It's pretty rare to have values that can be negative but are always integers. At least in the work I do. The most common case I encounter are approximations of something related to log probability. Such as various scores in dynamic programming and graph algorithms.
Most of the time, when you deal with integers, you need special handling to avoid negative values. Once you get used to thinking about unsigned integers, you quickly develop robust ways of avoiding situations where the values would be negative.
You are right that you often need to constrain an integer to be non-negative or positive, but usually not during arithmetic; rather, at certain points in the logic of a program. And there, in my experience, it is better expressed as an assertion.
I've also done a lot of succinct data structures, data compression, and things like that. When you manipulate the binary representation directly, it's easier to connect representation to unsigned semantics than to signed semantics.
Unsigned integers are usually integers modulo 2^n, which gives them a convenient algebraic structure. Whether you find that intuitive or not probably depends on your education. From my perspective, abstract algebra and discrete mathematics are things you learn in the first year of your CS degree.
I've mostly seen this used by cryptographers to prove results over arbitrary bitwidths, since you can prove it over the p-adics and deal with truncation/division separately.
Ask an ordinary person what 3 * (1/3) or 3 * (-1/3) should come to, and they aren't going to say any of the results that you get in C for either signed or unsigned int types.
The source of confusion is that unsigned is a terrible name. Unsigned does not mean non-negative. It's 100% valid to assign a negative value to an unsigned; it just fails silently.
If you want non-negative integers, then you should make a wrapper class that enforces non-negativity at compile and runtime.
C’s implicit casts are tripping you up. Unsigned ints can’t be negative, but C will happily let you assign a negative signed int to an unsigned int variable, but the moment it is assigned it ceases to be negative. In serious programming languages this implicit assignment is forbidden—you have to explicitly cast.
> For example, looping to the 2nd to last item in an array or getting the index before the given index.
I don’t understand what you mean here, can you clarify?
> If you want non-negative integers, then you should make a wrapper class that enforces non-negativity at compile and runtime.
Unsigned integers are the compile time side of the coin, but yes you may want to take care to enforce it at runtime as well, though this typically implies a performance penalty that most don’t want to pay.
Sentence 2: Catching implicit conversion errors is idealistic, pedantic, and prevents you from doing your job.
Great stuff. 10/10. No notes.
C23 updated the definition of the [] operator to disallow negative subscripts with array type. I think you have to explicitly convert the array to a pointer type now.
int a[2];
a[-1]; // not ok
(&a[0])[-1]; // ok
C23: https://cstd.eisie.net/c2y.html#6.5.3.2
best off having a bespoke type that understands how big the array it's indexing is
You never want any element of an array, except elements within the range [0, array_length). Anything outside of that is undefined behavior.
I think people tend to overthink this. A function which takes an index argument, should simply return a result when the index is within the valid range, and error if it's outside of it (regardless of whether it's outside by being too low or too high). It doesn't particularly matter that the integer is signed.
If you aren't storing 2^64 elements in your array (which you probably aren't - most systems don't even support addressing that much memory) then the only thing unsigned gets you is a bunch of footguns (like those described in the OP article).
You can deal with this by casting before doing the subtraction, or you can deal with it by storing the indices as signed integers at all times. The latter is more ergonomic at the cost of wasted capacity.
However I do concede writing a few helper methods isn't that much of a burden.
I don't see this hate in Rust. I think this is a big thing in the C-related languages, and that the author has chosen to pretend that's the same for any "systems language" but it is not.
Nor is it in C++. Most default flag setups will report implicit numerical conversions as errors in C++.
Unsigned / signed is mainly an issue if your language chose to allow silent implicit conversions.
Which honestly, is terrible design beyond simply signed / unsigned.
There are a few of those, but that is the niche case. Certainly when we're talking about 64-bit size types. And if you want to cater to smaller size types, then just template over the size type. Or, OK, some other trick if it's C rather than C++.
Losing half the range to make them signed when you only care about positive values 95% of the time (and in the rare case when you do any modulo on top of them you can cast, or write wrappers for that), is just a bad trade-off.
Yes, you've still then only doubled the range to 2^32, and you'll still hit it at some point, but that extra bit can make a lot of difference from a memory/cache efficiency standpoint without jumping to 64-bit.
So very often uint32_t is a very good sweet spot for size: int32_t is sometimes too small, and (u)int64_t is generally not needed and too wasteful.
Those are not sizes of data structures.
> Losing half the range
It's not a part of the range of sizes they can use, with any typical data structure.
> Losing half the range to make them signed when you only care about positive values 95% of the time is just a bad trade-off.
It's the right choice for sizes in the standard library (in C++) or standard-ish/popular libraries in C. And, again, it's the wrong type. For example, even if you only care about positive values, their difference is not necessarily positive.
In theory, it should have been at most a year. In practice, the Windows XP /3GB boot switch (which gives user mode 3 GiB of virtual address space and the kernel 1 GiB, instead of the usual 2 and 2) was relevant for many years.
So that /3GB switch is for people who are stuck on the wrong hardware for a variety of reasons, and the timing is about how long those people stayed trapped rather than how long before this became a bad idea (it was a bad idea before it even shipped, but it was necessary).
Linux had some more extreme splits including a 3.5:0.5 split and a nasty 4:4 split (in which all the userspace addresses are invalidated when in kernel space, ugh) and it's for the same reason, these aren't customers who chose not to go to 64-bit, they're customers who can't yet and will pay $$$$ to keep what they are doing working for just a while longer anyway despite that.
TBH I've had very little struggle with this at all. As long as you keep your values and types separate, the unsigned type that you got a number from originally feeds just fine into the unsigned type that you send it to next. Needing casting then becomes a very clear sign that you're mixing sources and there be dragons, back up and fix the types or stop using the wrong variable. It's a low-cost early bug detector.
Implicitly casting between integer types though... yeah, that's an absolute freaking nightmare.
Part of me feels like direct numeric array indexing is one of the last holdouts of a low-level operation screaming for some standardized higher-level abstraction. I'm not saying to get rid of the ability to index directly, but if the error-resistant design here is to use numeric array indices as though they were opaque handles, maybe we just need to start building support for opaque handles into our languages, rather than just handing out numeric indices and using thoughts and prayers to stop people from doing math on them.
For analogy, it's like how standardizing on iterators means that Rust code rarely needs to index directly, in contrast with C, where the design of for-loops encourages indexing. But Rust could still benefit from opaque handles to take care of those in-between cases where iterators are too limiting and yet where numeric indices are more powerful than needed.
This paragraph reminds me a bit of Dex: https://arxiv.org/abs/2104.05372
Using a unique type per array instance though, that I quite like, and in index-heavy code I often build that explicitly (e.g. in Go) because it makes writing the code trivially correct in many cases. Indices are only very rarely shared between arrays, and exceptions can and should look different because they need careful scrutiny compared to "this is definitely intended for that array".
However unsigned integers are still very useful, I'd say essential, in low-level programming. For example when doing buffer management and memory allocation.
- bitwise operations
- modular arithmetic implemented with just ++, -- (ring buffers, e.g. TCP sequence numbers)
- using the full range of an 8-bit, 16-bit, or 32-bit datatype (quite common)
- splitting a positive quantity into two smaller quantities, e.g. using a 16-bit index as 8-bit major index plus 8-bit minor index.
etc.
Don't forget that the signed vs. unsigned integer distinction is in some sense artificial. Machines put the distinction in the CPU instructions themselves; they don't track a "signed" property as part of values. And it can make sense to use the same value in different ways. However, C and many other languages decided to put a tag on the type, so operator syntax can be agnostic to signedness and the compiler will choose the appropriate CPU instruction.
It mostly comes up with widening conversions (signed numbers must extend the sign bit, unsigned numbers set the extra bits to zero), unsigned/signed divide (and multiply, in case of a widened result) and greater than/less than comparisons (and of course geq/leq). (With signed comparison, A is less than B if by starting from INT_MIN (included) and iteratively incrementing you reach A before B. With unsigned comparison, A is less than B if by starting from 0 (included) and iteratively incrementing you reach A before B. This way of phrasing comparison as range inclusion is convenient, since it works around the wrapping concern in a rather clean way.)
Alas, (2^S * signed_index) / 2^S will similarly result in surprises the moment signed_index hits half the signed-int max. There's no free lunch when trying to cheat the integer ranges.
The downside is a pervasive, constant footgun every time you are dealing with indices.
Well … we even mention Rust in the paragraph right before this. In Rust, you can widen a u16 to a u32 this way:
let bigger: u32 = x.into();
or let bigger = u32::from(x);
The conversion `from` is infallible, because a u16 always fits in a u32. There is no `from(u64) -> u32`, because as the article notes, that would truncate, so if we did change the type to u64, the code would now fail to compile. (And we'd be forced to figure out what we want to do here.)
(There are fallible conversions, too, in the form of try_from, that can do u64 → u32 but will return an error if the conversion fails.)
Similarly, for,
for (uint x = 10; x >= 0; x--) // Infinite loop!
This is why I think implicit wrapping is a bad idea in language design. Even Rust went down the wrong path (in my mind) there, and I think has worked back towards something safer in recent years. But Rust provides a decent example here too; this is pseudo-code: for (uint x = 10; x.is_some(); x = x.checked_sub(1))
Where `checked_sub` returns `None` instead of wrapping, providing us a means to detect the stopping point. So, something like that. (Though you'd probably also want to destructure the option into the uint for use inside the loop.) Of course, higher-level stuff always wins out here, I think, and in Rust you wouldn't write the above; instead something like, for x in (0..=10).rev()
(And even then, if we need indexes; usually, one would prefer to iterate through a slice or something like that. The higher-level concept of iterators usually dispenses with most or all uses of indexes, and in the rare cases when needed, most languages provide something like `enumerate` to get them from the iterator.)
* Do your utmost to rewrite the code in order to avoid doing that (e.g. reordering disequations to transform subtractions into additions).
* If not possible, think very hard about any possible edge case: you most certainly need an additional `if` to deal with those.
* When analyzing other people's code during troubleshooting merge reviews, assume any formula involving an unsigned integer and a minus sign is wrong.
The potential bugs listed would be prevented by, e.g. "x--" won't compile without explicitly supplying a case for x==0 OR by using some more verbose methods like "decrement_with_wrap".
The trade-off is lack of C-like concise code, but more safe and explicit.
Except that's not quite what unsigned types do. They are not (just) numbers that will always be >= 0, but numbers where the value of `1 - 2` is > 1 and depends on the type. This is not an accident but how these types are intended to behave because what they express is that you want modular arithmetic, not non-negative integers.
> e.g. "x--" won't compile without explicitly supplying a case for x==0
If you want non-negative types (which, again, is not what unsigned types are for) you also run into difficulties with `x - y`. It's not so simple.
There are many useful constraints that you might think it's "better to have a type that reflects that" - what about variables that can only ever be even? - but it's often easier said than done.
unsigned foo(unsigned a, unsigned b) { return a - b; }
but this would:

unsigned foo(unsigned a, unsigned b) {
    auto c = a - b;
    return c >= 0 ? c : 0;
}

Second, it's not as simple as you present. What is the type of c? Obviously it needs to be signed so that you can compare it to zero, but how many bits does it have? What if a and b are 64-bit? What if they're 128-bit?
Assuming 32-bit unsigned and int, the type of c should be computed as the range [-0xffffffff, 0xffffffff], which is different from int's [-0x80000000, 0x7fffffff]. Subtle things like this are why I think it is generally a mistake to type-annotate the result of a numerical calculation when the compiler can compute it precisely for you.
You could do it without storing the value and by carrying a proof that a >= b, but that is not so simple, either (I mean, the compiler can add runtime checks, but languages like C don't like invisible operations).
I agree they're a bit more error-prone in practice, but I suspect a huge part of that is because people are so used to signed numbers because they're usually the default (and thus most examples assume signed, if they handle extreme values correctly at all (much example code does not)). And, legitimately, zero is a more commonly-encountered value... but that can push errors to occur sooner, which is generally a desirable thing.
As someone else already pointed out, that's undefined behaviour in C and C++ (in Java they wrap), but the more important point is that the vast majority of integers used in programs are much closer to zero than to int_min/max. Sizes of buffers etc. tend to be particularly small. There are, of course, overflow problems with signed integers, but they're not as common.
No, that's undefined behavior in C, and if you care about correctness, you run at least your testsuite in CI with -ftrapv so it turns into an abort().
Besides, for safety there are much clearer options, like wrapping_add / saturating_add. Aborting is great as a safety tool though, agreed - it'd be nice if more code used it.
If you have "uint x" and "uint y", then for "x - y", the programmer should explicitly write two cases (a) no underflow, i.e. x >= y, and (b) underflow, x < y. The syntax for that... that is an open question.
> what about variables that can only ever be even
Yes, maybe you should have an "EvenInt" type, if that is important. Maybe you should be able to declare a variable to be 7...13, just like a "uint8" can declare something 0...255. Of course, the type-checker can get complicated, and perhaps simply fail to type-check some things. But, having compile-time constraints to what you know your variables will be is good, IMHO.
My two rules of thumb for C code are:
1. use signed integers for everything except bit-wise operations and modulo math (e.g. "almost always signed")
2. make implicit sign conversion an error via `-Werror -Wsign-conversion`
The problem with making sizes and indices unsigned (even if they can't be negative) is that you might want to add negative offsets, and that either requires explicit casting in languages without implicit signed/unsigned conversion (additional hassle, reduced readability), or is a footgun area in languages with implicit sign conversion.
For anyone else struggling to read it, Ctrl-A will make it legible.
Given your examples, I think you'd have fewer issues if you were working with unsigned integers exclusively. Although I'm curious about what other code you were referencing with this: "But seeing how each change both made the code easier to reason about and more correct, I couldn’t deny the evidence."
With regards to modulo, in Zig if you try to use it with a signed integer it will tell you to specify whether you want `@mod` or `@rem` semantics. In my case, I'd almost never write `x % 2`, I'd write `x & 1`. I do use unsigned division but I'd pretty much never write code that would emit the `div` instruction.
I'm not saying you're wrong though! Everyone has a different mind. If you attain higher correctness and understandability through using signed integers, that's great. I'm just saying I'm in the opposite camp.
The if statement won't work since Zig would force a cast.
The tricky wrap sucks unless you use a power of 2. Then the Zig type can match (u4, u5, u7, etc.) and you would use wrapping arithmetic operators. And on smaller CPUs you NEED to use a power of 2 because division and mod are expensive.
I don’t really get this claim. Indexing should just look up the element corresponding to the value provided. It’s easy to come up with semantics that are intuitive and sound, even if signed integers or ones smaller than size_t are used.
This is especially inconvenient in C, where there exist extremely dangerous legacy implicit casts between signed integers and unsigned integers, which have a great probability of generating incorrect values.
Because the index is typically a signed integer, comparing it with an unsigned limit without using explicit casts is likely to cause bugs. Using explicit casts of smaller unsigned integers towards bigger signed integers results in correct code, but it is cumbersome.
These problems are avoided, as said in TFA, by making "sizeof" and the like have 64-bit signed integer values instead of unsigned values.
Well-chosen implicit conversions are good for a programming language, by reducing unnecessary verbosity, but the implicit integer conversions of C are just wrong, and they are by far the worst mistake of C, much worse than any other C feature.
Other C features are criticized because they may be misused by inexperienced or careless programmers, but most of the implicit integer conversions are just incorrect. There is no way of using them correctly. Only the conversions from a smaller signed integer to a bigger signed integer are correct.
Mixed signedness conversions have always been wrong and the conversions between unsigned integers have been made wrong by the change in the C standard that has decided that the unsigned integers are integer residues modulo 2^N and they are not non-negative integers.
For modular integers, the only correct conversions are from bigger numbers to smaller numbers, i.e. the opposite of the implicit conversions of C. The implicit conversions of C unsigned numbers would have been correct for non-negative integers, but in the current C standard there are no such numbers.
The current C standard is inconsistent, because the meaning of sizeof is of a non-negative integer and this is also true for the conversions between unsigned numbers, but all the arithmetic operations with unsigned numbers are defined to be operations with integer residues, not operations with non-negative numbers.
The hardware of most processors implements at least 3 kinds of arithmetic operations: operations with signed integers, operations with non-negative integers and operations with integer residues.
Any decent programming language should define distinct types for these kinds of numbers, otherwise the only way to use completely the processor hardware is to use assembly language. Because C does not do this, you have to use at least inline assembly, if not separate assembly source files, for implementing operations with big numbers.
It was undefined what happens at unsigned overflows and underflows. Therefore a compiler could choose to implement "unsigned" as either non-negative numbers or as integer residues.
The fact that "sizeof" is unsigned and the implicit conversions between "unsigned" numbers are consistent only with non-negative numbers. Therefore the undefined behavior should have been defined correspondingly.
Instead of this, at some version of the standard, I am lazy to search it now, but it might have been C99, they have changed the behavior from undefined to defined as the behavior of integer residues.
I do not know the reason for this choice, it may have been just laziness, because it is easier to implement in compilers and it leads to maximum performance in the absence of bugs. In any case this decision has broken the standard, because the arithmetic operations have become incompatible with the implicit conversions between "unsigned" types and with the semantics of "sizeof", which must be non-negative.
For non-negative numbers, the correct conversions are from smaller sizes to bigger sizes, while for integer residues the correct conversions are only in the opposite direction, from bigger sizes to smaller sizes (e.g. a number that is 257 modulo 65536 is also 1 modulo 256, so truncating it yields a correct value, while a number that is 1 modulo 256 when modulo 65536 it could be 257, 511, 769 etc. so you cannot extend it without additional information).
Judging from the implicit conversions, it is clear that the intention of the designers of C during the seventies was that "unsigned" numbers must be non-negative integers and not integer residues. The modern C standard is guilty of the current inconsistencies that greatly increase the chances of bugs.
I get your argument about the conversion order, but I do not buy it in terms of language design. You also do not want to go to a quotient ring implicitly, so I do not agree that this conversion direction would be more "correct" for implicit conversion either and from a practical point of view the C design is defensible.
I think the motivation originally was merely to expose the common capabilities of the hardware, nothing more. What we miss from this perspective are polynomials over F_2, but nobody pushed for this too hard so far.
NaN is almost always a mistake, and adding it breaks the law of identity. You don't want it.
But I can't agree with the claim that "nan is almost always a mistake". Certainly if you're doing floating-point computation on large arrays, the last thing you want is e.g. for an error to be thrown in a elementwise division just because two corresponding elements both happen to be zero.
It's true that nan!=nan is one of the more 'controversial' parts of the standard, that possibly would have been decided the other way in a perfect world. But it was also a reasonable pragmatic decision at the time the standard was developed. See here: https://stackoverflow.com/a/1573715/1013442
I don't recall similar arguments being made for Pascal or ADA.
Looking around at the state of our C++ and C software and all the CVEs, I think we probably shouldn't care about unsigned or signed loop indexes and should move on before regulatory pressure forces us. Please, language designers, give us some interesting alternatives to Rust.
Fix the language. Don't hack around it by using the wrong type.
Using signed sizes adds a lot of footguns and performance degradations and in exchange gives only small code simplifications in rare and niche cases.
Sizes and indices of course need to be unsigned, and any self respecting compiler should warn about dangerous usage.
So in fact it is not just telling me it’s a hard problem, it’s telling me that the cost-benefit is still not there. It’s like it’s just not a very important problem (in an economic sense). And that is what surprises me, given that computers were made to do arbitrary calculations.
I used to imagine for someone in construction a wall must be some really simple thing. But it's only simple after millennia of building walls. So I now have lots of grace and patience for humanity to figure out numbers in computers, whether integers or reals.
Your explanation is possibly the same just in different words. It's a hard problem and probably needs a whole lifetime. But it's in no single person's economic interest to devote to it the time it needs (not to mention the diverse skills required; once one has a solution one has to pitch it to the world). And so it will happen over a hundred lifetimes.
Sure, it's possible to write bugs in C. And if you really want to, you can disable the compiler warnings which flag tautologous comparisons and mixed-sign comparisons (a common reason for doing this is to avoid spurious warnings in generic-type code).
But, uhh, "people can deliberately write bugs" has got to be the weakest justification I've ever seen for changing a language feature -- especially one as fundamental as "sizes of objects can't be negative".
Signed integers can be negative. The so-called "unsigned" integers of C are integer residues modulo 2^N, which are neither positive nor negative, i.e. these concepts are not applicable to "unsigned" integers.
An alternative view is that any C "unsigned" is both positive and negative. For example the unsigned short "1" is the same number as "65537" and as "-65535".
So any sizeof value in C is negative (while also being positive).
In contradiction with what you say, the change described in TFA, by making sizes 64-bit signed integers, is the only method to guarantee that the sizes are non-negative in a language that does not have dedicated non-negative integers.
Other programming languages have non-negative integers, but C and C++ and many languages derived from them do not have such integers.
The arithmetic operations with non-negative integers differ from the arithmetic operations of C. On overflows and underflows, they either generate exceptions or have saturating behavior.
This can be disproven by the fact that dividing by `unsigned e = 1U` is well defined and always yields the starting number. If the unsigned numbers were really modular numbers as you suggest, division could not be defined.
The oldest parts of the C language are all consistent with "unsigned" numbers being non-negative integers. The implicit conversions between different sizes of "unsigned", the sizeof operator, the relational operators and division are consistent with non-negative integers.
However the first C standard, instead of defining the correct behavior has left undefined many corner cases of the arithmetic operations, allowing the implementation of "unsigned" as either non-negative integers or integer residues.
Eventually, the undefined behaviors for addition, subtraction and multiplication have been defined to be those of integer residues, not those of non-negative integers.
These contradictory properties are the cause of many confusions and bugs.
In extensible languages, like C++, it is possible to define proper non-negative integers and integer residues and bit strings and to always use those types instead of the built-in "unsigned".
In C, it is better to always use signed numbers and avoid unsigned, by casting unsigned to bigger sizes of signed before using such a value.
#include <stdio.h>

int main(void) {
    unsigned short a = 1;
    long b = a;
    printf("%ld\n", b);
}
If not, why?
The so-called "unsigned" integers of C are integer residues, where each value can be interpreted either as both positive and negative or as neither positive nor negative. In any case no "unsigned" value can be said to be non-negative.
You have to go back to languages not contaminated by C, like Ada, to find true non-negative integers among the primitive data types.
In C++, it is possible to define a non-negative integer type, which can have good performance if you implement its operations in assembly language.
However I am not aware of an open-source library including such a type.
And - yes, there are very important use cases for unsigned/modulo-2n/wraparound values. But sizes of data structures are generally _not_ one of those use cases. The fact that the size is non-negative does not mean that the type should be unsigned. You should still be able to, say, subtract sizes and get a signed value which may be negative.
No, signed wraparound is undefined behavior in C, whereas unsigned integers are defined to wrap around. If you use -ftrapv, signed wraparound is an immediate abort().
While C, as in many other places, fails to define the correct behavior (to avoid shaming the processor or compiler makers that fail to provide it), there are only 2 correct behaviors on overflows and underflows, e.g. when incrementing the biggest number or decrementing the smallest number.
Both for signed integers and for non-negative integers, the 2 alternatives of correct behavior on overflows and underflows are to either generate exceptions or to saturate the result to the biggest or smallest representable number.
Wraparound is the correct behavior for integer residues, which are a distinct data type from either signed integers or non-negative integers.
While some people criticize C for making easy for careless programmers to make certain kinds of bugs, like access outside bounds, those are easily mitigated by using appropriate compiler options.
For me a much more serious defect of C is this confusion promoted by it in the heads of most programmers, who do not understand which are the fundamental integer types and which are the correct conversions between them, because C uses "unsigned" instead of at least 3 distinct types that it does not have, bit strings, integer residues and non-negative integers. More rarely, "unsigned" is used for other 2 types that are missing, binary polynomials and binary polynomial residues. All these 5 types must be primitive types in a programming language because all modern processors implement in hardware distinct operations for all 5 types, which can be accessed only through assembly language when these types are missing.
They do. The code:
unsigned x = 1;
unsigned y = -x;  /* y == UINT_MAX: negating an unsigned value is well-defined */
is well-defined in C and C++. See this discussion on StackOverflow for spec text and reference:
I'm not sure Go is saner just because len is an int. Well, maybe, depending on how you look at it. Defining len as a signed int means the largest valid len is half your address space, which also means half of all possible indexes are always invalid; that makes some things easier.
But it's really that integer arithmetic is not undefined behavior regardless of signedness, that bounds are checked, and that even indexing your slice with an int64 on a 32-bit CPU does the full correct bounds check. In fact, you can use any integer type as an index.
Given all of the above, indexing with a uint or an int actually makes no difference. In either case, the bounds check is a single unsigned compare against len (despite the fact that len is signed).
What's really painful is trying to handle a full 32-bit address space with 32-bit addresses and sizes, like in Wasm; you need 33-bit math. So in a sense, limiting sizes to 31 bits (signed) does help. But at the language level, IMO, the rest matters more.
Errors due to unsigned wraparound are a much bigger issue, because they lead to subtle problems where neither automatic warnings nor sanitizers help, precisely because the behavior is well-defined and no automatic tool can tell whether it is intended or wrong.
This is a type design mistake. Unsigned integers should not wrap by default. It makes absolute sense, given all the constraints and its New Jersey "implementation simplicity dominates" design, that K&R C only provides a wrapping unsigned type, but that's an excuse for K&R C, a programming language from the 1970s.
The excuse gets shakier and shakier the further you move past that. C3 even named these types differently, so they're certainly under no obligation to provide the wrapping unsigned integers as if that's just magically what you mean. In most cases it's not what you mean. The excuse given in the article is way too thin.
Rust's Wrapping<u32> is the same thing as the wrapping 32-bit unsigned integer in C or C++ today, but most people don't use it because they do not actually want a wrapping 32-bit unsigned integer. This is a "spelling matters" ergonomics case again, like the choice to name the brutally fast but unstable general comparison sort [T]::sort_unstable, whereas both C and C++ leave the noob who didn't know about sort stability to find out for themselves, because they name this just "sort" and you get to keep both halves when you break things...
But what I do not believe is that there is a real need for a non-wrapping non-negative integer type.
So the most obvious counterexample is so obvious you might not even have remembered it's a type: the unsigned 8-bit integer, or byte.
But frankly, without the wrapping mistake they'd make for a pretty good general-purpose index, and they're a useful counter; there's a reason we called these the "natural numbers".
Natural numbers are nice, but then we invented zero and negative numbers, so we got a group structure for addition, which is really useful. Because even for a counter or an index you may want to do addition and subtraction, and then you definitely do not want a non-wrapping non-negative integer for intermediate results.
And the Rust design, with an unsigned type where subtraction does not return a signed type but may fail at runtime or silently produce the wrong results, seems the worst possible design imaginable to me.
> And the Rust design, with an unsigned type where subtraction does not return a signed type but may fail at runtime or silently produce the wrong results, seems the worst possible design imaginable to me.
You can ask for whatever you meant, and indeed asking for what you meant is crucial here, because when we express ourselves clearly we get the desired results.
For example, u8::borrowing_sub lets us do the arithmetic style you may have learned in primary school, in which we track whether we "borrowed" one during subtraction; this might be useful in some places and is certainly easy to understand.
u8::checked_sub tells us either the answer or that it would overflow, which might allow us to take a different course of action and not need the subtraction.
u8::saturating_sub performs saturating arithmetic: if the operation would overflow, the result is clamped to the nearest representable value instead (zero, when a u8 subtraction underflows); this often makes sense in e.g. signal processing.
u8::unchecked_sub promises we know the subtraction doesn't overflow and so no checks are needed, this is a performance optimisation if you really need it.
u8::wrapping_sub_signed performs the wrapping arithmetic you say is sometimes a good idea, with specifically a signed i8 parameter rather than an unsigned one if we want that.
The truth here is that you might want a lot of different operations and the C choice is not only to provide a single choice, which made a lot more sense 50+ years ago than it does today, but to provide a singularly bad default.
If you want a special function that protects you from errors in specific scenarios, it is easy enough to write one in C. But I do think the C defaults are actually OK, and having all the wrapper functions and boilerplate has its downsides too. (What I would admit is bad in C are the default warning settings in C compilers.)
I cannot imagine we will end up agreeing, but it's good to understand why you made that edit
C has a single '+' operator, just like Rust has. And what that operator does depends on the types to the left and to the right. You can cast between integer types to achieve different behaviours depending on what you want.
About u8::unchecked_sub() etc., those are just regular functions, not really a language thing. Yes, none of that is standardized in C AFAIK, but I'll happily use e.g. __builtin_add_overflow() or whatever in practice.
We can argue all day long what are the right defaults, checked or unchecked operations. If you want to be safe, you want the compiler to emit checks. It's probably possible to get some of those in GCC. If you want to emit streamlined machine code, you'll definitely not want to add checks after every machine instruction.
Well "regular functions" in the sense that these are methods of the primitive type u8†, and of course neither C nor C++ can do that at all. So, yeah, it's a language thing.
In C++ what you'd do here instead is invent custom types and add the methods you want to the types, and I would give C++ credit here if the stdlib provided say, a bunch-of-bits base type with all the bit-twiddling methods defined and maybe specialisations for the 32-bit and 8-bit unsigned integers or something, but AFAICT it doesn't do anything like that.
"I could go out of my way to do this" is true for everything in any of the general purpose languages by their nature.
† In Rust if we define a function associated to a type T with a "self" first parameter then you can call that function as just a method on any value of type T and the appropriate parameter is inferred. So e.g. u8::checked_sub(u8::MAX, 10) is Some(245) but u8::MAX.checked_sub(10) is also Some(245) because it de-sugars to the same call.
I really don't care about functions vs methods; what is the difference, it's just syntax. Actually, keeping to regular function calls is mostly more readable to me, compared to using methods with short unqualified names, mixing function and method calls, and nesting/chaining them.
Better ergonomics means it's more likely the programmer writes what they actually meant. And if you do that, modern compilers have become better at making what you meant fast, even as they remain the same, or perhaps slightly worse, at converting vague gestures that aren't clear about what you meant into what you had hoped for without expressing it.
I have definitely written C code that tries to use 257 values for a byte, with zero playing both its role as "just zero" in some places and also serving as 256 because "it's never zero" in other places. Of course, this is a nasty bug if one of those "it's never zero" zeroes gets into the "it's just zero" code paths, or vice versa.
The "Wrapping will fix my arithmetic ordering" thing is in this article too and I think that's also a terrible idea, maybe even worse than the wrapping unsigned integer types themselves because it leads to a muddled idea of what's really going on.
What would you do instead?
Then, dedicated APIs for wrapping behavior where you expect it to happen.
Just this week I've had a C compiler silently delete an entire function call because of UB (an infinite loop without side effects). Took me a day to figure out. So that's a problem for me.
I don't think I've ever had a hard-to-debug issue in Go because of signed/unsigned wraparound, particularly a memory issue.
If anything, and here I guess I agree with the article, I wish Go had implicit conversions to wider types: it would make the problematic ones stand out.
I guess the reason it doesn't is that they're different named types, which would be a problem when you create a named type for the purpose of forcing explicit type conversions. But maybe the default ones could implicitly implement a numeric tower, where exact conversions can be implicit.
Regarding infinite loops, C++ and C differ with C++ being more aggressive. But also compilers differ with clang being more aggressive. https://godbolt.org/z/Moe6zYKqo
In general, I do not recommend using clang if you worry about UB. gcc is a bit more reasonable and also has better warnings.
High-performance, lock-free FIFOs/channels are commonly implemented in a way that relies on wraparound.