Yes, I miss them in Java as primitives; there are, however, utility methods for unsigned arithmetic that get it right.
And despite all the pitfalls, especially around mixing signed and unsigned in C, unsigned types are very useful; I'd in fact say that for low-level programming they are essential.
Books like Yourdon Structured Method were mainly targeted at business C back in the day.
#include <stdio.h>

unsigned int pack_rgb(unsigned char r, unsigned char g, unsigned char b) {
    return (r << 16) | (g << 8) | b;
}

unsigned int pack_rgb_arith(unsigned char r, unsigned char g, unsigned char b) {
    return (r * 65536) + (g * 256) + b;
}

int main(void)
{
    printf("The color value of (246, 176, 223) is %u\n", pack_rgb(246, 176, 223));
    printf("The color value of (246, 176, 223) is %u\n", pack_rgb_arith(246, 176, 223));
}
Compiler Explorer link: https://godbolt.org/z/3jExdaTT9
I would expect a better comment from someone working on the standard.
It's pretty rare to have values that can be negative but are always integers. At least in the work I do. The most common case I encounter are approximations of something related to log probability. Such as various scores in dynamic programming and graph algorithms.
Most of the time, when you deal with integers, you need special handling to avoid negative values. Once you get used to thinking about unsigned integers, you quickly develop robust ways of avoiding situations where the values would be negative.
You are right that you often need to constrain an integer to be non-negative or positive, but usually not during arithmetic; rather, at certain points in the logic of a program. And there, in my experience, it is better expressed as an assertion.
I've also done a lot of succinct data structures, data compression, and things like that. When you manipulate the binary representation directly, it's easier to connect representation to unsigned semantics than to signed semantics.
Unsigned integers are usually integers modulo 2^n, which gives them a convenient algebraic structure. Whether you find that intuitive or not probably depends on your education. From my perspective, abstract algebra and discrete mathematics are things you learn in the first year of your CS degree.
I've mostly seen this used by cryptographers to prove results over arbitrary bitwidths, since you can prove it over the p-adics and deal with truncation/division separately.
Ask an ordinary person what 3 * (1/3) or 3 * (-1/3) should come to, and they aren't going to say any of the results that you get in C for either signed or unsigned int types.
The source of confusion is that unsigned is a terrible name. Unsigned does not mean non-negative. It's 100% valid to assign a negative value to an unsigned; it just fails silently.
If you want non-negative integers, then you should make a wrapper class that enforces non-negativity at compile and runtime.
C’s implicit casts are tripping you up. Unsigned ints can’t be negative, but C will happily let you assign a negative signed int to an unsigned int variable, but the moment it is assigned it ceases to be negative. In serious programming languages this implicit assignment is forbidden—you have to explicitly cast.
> For example, looping to the 2nd to last item in an array or getting the index before the given index.
I don’t understand what you mean here, can you clarify?
> If you want non-negative integers, then you should make a wrapper class that enforces non-negativity at compile and runtime.
Unsigned integers are the compile time side of the coin, but yes you may want to take care to enforce it at runtime as well, though this typically implies a performance penalty that most don’t want to pay.
Sentence 2: Catching implicit conversion errors is idealistic, pedantic, and prevents you from doing your job.
Great stuff. 10/10. No notes.
C23 updated the definition of the [] operator to disallow negative subscripts with array type. I think you have to explicitly convert the array to a pointer type now.
int a[2];
a[-1]; // not ok
(&a[0])[-1]; // ok
C23: https://cstd.eisie.net/c2y.html#6.5.3.2
best off having a bespoke type that understands how big the array it's indexing is
You never want any element of an array, except elements within the range [0, array_length). Anything outside of that is undefined behavior.
I think people tend to overthink this. A function which takes an index argument, should simply return a result when the index is within the valid range, and error if it's outside of it (regardless of whether it's outside by being too low or too high). It doesn't particularly matter that the integer is signed.
If you aren't storing 2^64 elements in your array (which you probably aren't - most systems don't even support addressing that much memory) then the only thing unsigned gets you is a bunch of footguns (like those described in the OP article).
You can deal with this by casting before doing the subtraction, or you can deal with it by storing the indices as signed integers at all times. The latter is more ergonomic at the cost of wasted capacity.
However I do concede writing a few helper methods isn't that much of a burden.
I don't see this hate in Rust. I think this is a big thing in the C-related languages, and that the author has chosen to pretend that's the same for any "systems language" but it is not.
Nor is it in C++. Most default flag setups will report implicit numerical conversions as errors in C++.
Unsigned / signed is mainly an issue if your language chose to allow silent implicit conversions.
Which honestly, is terrible design beyond simply signed / unsigned.
There are a few of those, but that is the niche case. Certainly when we're talking about 64-bit size types. And if you want to cater to smaller size types, then just template over the size type. Or, OK, some other trick if it's C rather than C++.
Losing half the range to make them signed when you only care about positive values 95% of the time (and in the rare case when you do any modulo on top of them you can cast, or write wrappers for that), is just a bad trade-off.
Yes, you've still then only doubled the range to 2^32, and you'll still hit it at some point, but that extra bit can make a lot of difference from a memory/cache efficiency standpoint without jumping to 64-bit.
So very often uint32_t is a very good sweet spot for size: int32_t is sometimes too small, and (u)int64_t is generally not needed and too wasteful.
Those are not sizes of data structures.
> Losing half the range
It's not a part of the range of sizes they can use, with any typical data structure.
> Losing half the range to make them signed when you only care about positive values 95% of the time is just a bad trade-off.
It's the right choice for sizes in the standard library (in C++) or standard-ish/popular libraries in C. And, again, it's the wrong type. For example, even if you only care about positive values, their difference is not necessarily positive.
In theory, it should have been at most a year. In practice, the Windows XP /3GB boot switch (which gives user mode 3 GiB of virtual address space and the kernel 1 GiB, instead of the usual 2 and 2) was relevant for many years.
So that /3GB switch is for people who are stuck on the wrong hardware for a variety of reasons, and the timing is about how long those people stayed trapped rather than how long before this became a bad idea (it was a bad idea before it even shipped, but it was necessary).
Linux had some more extreme splits including a 3.5:0.5 split and a nasty 4:4 split (in which all the userspace addresses are invalidated when in kernel space, ugh) and it's for the same reason, these aren't customers who chose not to go to 64-bit, they're customers who can't yet and will pay $$$$ to keep what they are doing working for just a while longer anyway despite that.
TBH I've had very little struggle with this at all. As long as you keep your values and types separate, the unsigned type that you got a number from originally feeds just fine into the unsigned type that you send it to next. Needing casting then becomes a very clear sign that you're mixing sources and there be dragons, back up and fix the types or stop using the wrong variable. It's a low-cost early bug detector.
Implicitly casting between integer types though... yeah, that's an absolute freaking nightmare.
Part of me feels like direct numeric array indexing is one of the last holdouts of a low-level operation screaming for some standardized higher-level abstraction. I'm not saying to get rid of the ability to index directly, but if the error-resistant design here is to use numeric array indices as though they were opaque handles, maybe we just need to start building support for opaque handles into our languages, rather than just handing out numeric indices and using thoughts and prayers to stop people from doing math on them.
For analogy, it's like how standardizing on iterators means that Rust code rarely needs to index directly, in contrast with C, where the design of for-loops encourages indexing. But Rust could still benefit from opaque handles to take care of those in-between cases where iterators are too limiting and yet where numeric indices are more powerful than needed.
This paragraph reminds me a bit of Dex: https://arxiv.org/abs/2104.05372
Using a unique type per array instance though, that I quite like, and in index-heavy code I often build that explicitly (e.g. in Go) because it makes writing the code trivially correct in many cases. Indices are only very rarely shared between arrays, and exceptions can and should look different because they need careful scrutiny compared to "this is definitely intended for that array".
However unsigned integers are still very useful, I'd say essential, in low-level programming. For example when doing buffer management and memory allocation.
- bitwise operations
- modular arithmetic implemented with just ++, -- (ring buffers, e.g. TCP sequence numbers)
- using the full range of an 8-bit, 16-bit, or 32-bit datatype (quite common)
- splitting a positive quantity into two smaller quantities, e.g. using a 16-bit index as 8-bit major index plus 8-bit minor index.
etc.
Don't forget that the signed vs. unsigned integer distinction is in some sense artificial. Machines put the distinction in the CPU instructions themselves; they don't track a "signed" property as part of values. And it can make sense to use the same value in different ways. However, C and many other languages decided to put a tag on the type, so operator syntax can be agnostic to signedness and the compiler will choose the appropriate CPU instruction.
It mostly comes up with widening conversions (signed numbers must extend the sign bit, unsigned numbers set the extra bits to zero), unsigned/signed divide (and multiply, in case of a widened result) and greater than/less than comparisons (and of course geq/leq). (With signed comparison, A is less than B if by starting from INT_MIN (included) and iteratively incrementing you reach A before B. With unsigned comparison, A is less than B if by starting from 0 (included) and iteratively incrementing you reach A before B. This way of phrasing comparison as range inclusion is convenient, since it works around the wrapping concern in a rather clean way.)
Alas, (2^S * signed_index) / 2^S will similarly result in surprises the moment signed_index hits half the signed-int max. There's no free lunch when trying to cheat the integer ranges.
The downside is a pervasive, constant footgun every time you are dealing with indices.
Well … we even mention Rust in the paragraph right before this. In Rust, you can widen a u16 to a u32 this way:
let bigger: u32 = x.into();
or let bigger = u32::from(x);
The conversion `from` is infallible, because a u16 always fits in a u32. There is no `from(u64) -> u32`, because as the article notes, that would truncate, so if we did change the type to u64, the code would now fail to compile. (And we'd be forced to figure out what we want to do here.)
(There are fallible conversions, too, in the form of try_from, that can do u64 → u32 but will return an error if the conversion fails.)
Similarly, for,
for (uint x = 10; x >= 0; x--) // Infinite loop!
This is why I think implicit wrapping is a bad idea in language design. Even Rust went down the wrong path (in my mind) there, and I think has worked back towards something safer in recent years. But Rust provides a decent example here too; this is pseudo-code: for (uint x = 10; x.is_some(); x = x.checked_sub(1))
Where `checked_sub` returns `None` instead of wrapping, providing us a means to detect the stopping point. So, something like that. (Though you'd probably also want to destructure the option into the uint for use inside the loop.) Of course, higher-level stuff always wins out here, I think, and in Rust you wouldn't write the above; instead something like, for x in (0..=10).rev()
(And even then, if we need indexes; usually, one would prefer to iterate through a slice or something like that. The higher-level concept of iterators usually dispenses with most or all uses of indexes, and in the rare cases when needed, most languages provide something like `enumerate` to get them from the iterator.)
* Do your utmost to rewrite the code in order to avoid doing that (e.g. reordering disequations to transform subtractions into additions).
* If not possible, think very hard about any possible edge case: you most certainly need an additional `if` to deal with those.
* When analyzing other people's code during troubleshooting merge reviews, assume any formula involving an unsigned integer and a minus sign is wrong.
The potential bugs listed would be prevented by, e.g. "x--" won't compile without explicitly supplying a case for x==0 OR by using some more verbose methods like "decrement_with_wrap".
The trade-off is lack of C-like concise code, but more safe and explicit.
Except that's not quite what unsigned types do. They are not (just) numbers that will always be >= 0, but numbers where the value of `1 - 2` is > 1 and depends on the type. This is not an accident but how these types are intended to behave because what they express is that you want modular arithmetic, not non-negative integers.
> e.g. "x--" won't compile without explicitly supplying a case for x==0
If you want non-negative types (which, again, is not what unsigned types are for) you also run into difficulties with `x - y`. It's not so simple.
There are many useful constraints that you might think it's "better to have a type that reflects that" - what about variables that can only ever be even? - but it's often easier said than done.
unsigned foo(unsigned a, unsigned b) { return a - b; }
but this would:

unsigned foo(unsigned a, unsigned b) {
    auto c = a - b;
    return c >= 0 ? c : 0;
}

Second, it's not as simple as you present. What is the type of c? Obviously it needs to be signed so that you can compare it to zero, but how many bits does it have? What if a and b are 64-bit? What if they're 128-bit?
Assuming 32-bit unsigned and int, the type of c should be computed as the range [-0xffffffff, 0xffffffff], which is different from int's [-0x80000000, 0x7fffffff]. Subtle things like this are why I think it is generally a mistake to type-annotate the result of a numerical calculation when the compiler can compute it precisely for you.
You could do it without storing the value and by carrying a proof that a >= b, but that is not so simple, either (I mean, the compiler can add runtime checks, but languages like C don't like invisible operations).
I agree they're a bit more error-prone in practice, but I suspect a huge part of that is because people are so used to signed numbers because they're usually the default (and thus most examples assume signed, if they handle extreme values correctly at all (much example code does not)). And, legitimately, zero is a more commonly-encountered value... but that can push errors to occur sooner, which is generally a desirable thing.
As someone else already pointed out, that's undefined behaviour in C and C++ (in Java they wrap), but the more important point is that the vast majority of integers used in programs are much closer to zero than to int_min/max. Sizes of buffers etc. tend to be particularly small. There are, of course, overflow problems with signed integers, but they're not as common.
No, that's undefined behavior in C, and if you care about correctness, you run at least your testsuite in CI with -ftrapv so it turns into an abort().
Besides, for safety there are much clearer options, like wrapping_add / saturating_add. Aborting is great as a safety tool though, agreed - it'd be nice if more code used it.
If you have "uint x" and "uint y", then for "x - y", the programmer should explicitly write two cases (a) no underflow, i.e. x >= y, and (b) underflow, x < y. The syntax for that... that is an open question.
> what about variables that can only ever be even
Yes, maybe you should have an "EvenInt" type, if that is important. Maybe you should be able to declare a variable to be 7...13, just like a "uint8" can declare something 0...255. Of course, the type-checker can get complicated, and perhaps simply fail to type-check some things. But, having compile-time constraints to what you know your variables will be is good, IMHO.
My two rules of thumb for C code are:
1. use signed integers for everything except bit-wise operations and modulo math (e.g. "almost always signed")
2. make implicit sign conversion an error via `-Werror -Wsign-conversion`
The problem with making sizes and indices unsigned (even if they can't be negative) is that you might want to add negative offsets, and that either requires explicit casting in languages without implicit signed/unsigned conversion (additional hassle, reduced readability), or is a footgun area in languages with implicit sign conversion.
For anyone else struggling to read it, Ctrl-A will make it legible.
Given your examples, I think you'd have fewer issues if you were working with unsigned integers exclusively. Although I'm curious about what other code you were referencing with this: "But seeing how each change both made the code easier to reason about and more correct, I couldn’t deny the evidence."
With regards to modulo, in Zig if you try to use it with a signed integer it will tell you to specify whether you want `@mod` or `@rem` semantics. In my case, I'd almost never write `x % 2`, I'd write `x & 1`. I do use unsigned division but I'd pretty much never write code that would emit the `div` instruction.
I'm not saying you're wrong though! Everyone has a different mind. If you attain higher correctness and understandability through using signed integers, that's great. I'm just saying I'm in the opposite camp.
The if statement won't work since Zig would force a cast.
The tricky wrap sucks unless you use a power of 2. Then the Zig type can match (u4, u5, u7, etc.) and you would use wrapping arithmetic operators. And on smaller CPUs you NEED to use a power of 2 because division and mod are expensive.
I don’t really get this claim. Indexing should just look up the element corresponding to the value provided. It’s easy to come up with semantics that are intuitive and sound, even if signed integers or ones smaller than size_t are used.
This is especially inconvenient in C, where there exist extremely dangerous legacy implicit casts between signed integers and unsigned integers, which have a great probability of generating incorrect values.
Because the index is typically a signed integer, comparing it with an unsigned limit without using explicit casts is likely to cause bugs. Using explicit casts of smaller unsigned integers towards bigger signed integers results in correct code, but it is cumbersome.
These problems are avoided, as said in TFA, by making "sizeof" and the like have 64-bit signed integer values instead of unsigned values.
Well-chosen implicit conversions are good for a programming language, by reducing unnecessary verbosity, but the implicit integer conversions of C are just wrong, and they are by far the worst mistake of C, much worse than any other C feature.
Other C features are criticized because they may be misused by inexperienced or careless programmers, but most of the implicit integer conversions are just incorrect. There is no way of using them correctly. Only the conversions from a smaller signed integer to a bigger signed integer are correct.
Mixed signedness conversions have always been wrong and the conversions between unsigned integers have been made wrong by the change in the C standard that has decided that the unsigned integers are integer residues modulo 2^N and they are not non-negative integers.
For modular integers, the only correct conversions are from bigger numbers to smaller numbers, i.e. the opposite of the implicit conversions of C. The implicit conversions of C unsigned numbers would have been correct for non-negative integers, but in the current C standard there are no such numbers.
The current C standard is inconsistent, because the meaning of sizeof is of a non-negative integer and this is also true for the conversions between unsigned numbers, but all the arithmetic operations with unsigned numbers are defined to be operations with integer residues, not operations with non-negative numbers.
The hardware of most processors implements at least 3 kinds of arithmetic operations: operations with signed integers, operations with non-negative integers and operations with integer residues.
Any decent programming language should define distinct types for these kinds of numbers, otherwise the only way to use completely the processor hardware is to use assembly language. Because C does not do this, you have to use at least inline assembly, if not separate assembly source files, for implementing operations with big numbers.
It was undefined what happens at unsigned overflows and underflows. Therefore a compiler could choose to implement "unsigned" as either non-negative numbers or as integer residues.
The fact that "sizeof" is unsigned and the implicit conversions between "unsigned" numbers are consistent only with non-negative numbers. Therefore the undefined behavior should have been defined correspondingly.
Instead of this, at some version of the standard, I am lazy to search it now, but it might have been C99, they have changed the behavior from undefined to defined as the behavior of integer residues.
I do not know the reason for this choice, it may have been just laziness, because it is easier to implement in compilers and it leads to maximum performance in the absence of bugs. In any case this decision has broken the standard, because the arithmetic operations have become incompatible with the implicit conversions between "unsigned" types and with the semantics of "sizeof", which must be non-negative.
For non-negative numbers, the correct conversions are from smaller sizes to bigger sizes, while for integer residues the correct conversions are only in the opposite direction, from bigger sizes to smaller sizes (e.g. a number that is 257 modulo 65536 is also 1 modulo 256, so truncating it yields a correct value, while a number that is 1 modulo 256 when modulo 65536 it could be 257, 511, 769 etc. so you cannot extend it without additional information).
Judging from the implicit conversions, it is clear that the intention of the designers of C during the seventies was that "unsigned" numbers must be non-negative integers and not integer residues. The modern C standard is guilty of the current inconsistencies that greatly increase the chances of bugs.
I get your argument about the conversion order, but I do not buy it in terms of language design. You also do not want to go to a quotient ring implicitly, so I do not agree that this conversion direction would be more "correct" for implicit conversion either and from a practical point of view the C design is defensible.
I think the motivation originally was merely to expose the common capabilities of the hardware, nothing more. What we miss from this perspective are polynomials over F_2, but nobody pushed for this too hard so far.
NaN is almost always a mistake, and adding it breaks the law of identity. You don't want it.
But I can't agree with the claim that "nan is almost always a mistake". Certainly if you're doing floating-point computation on large arrays, the last thing you want is e.g. for an error to be thrown in a elementwise division just because two corresponding elements both happen to be zero.
It's true that nan!=nan is one of the more 'controversial' parts of the standard, that possibly would have been decided the other way in a perfect world. But it was also a reasonable pragmatic decision at the time the standard was developed. See here: https://stackoverflow.com/a/1573715/1013442
I don't recall similar arguments being made for Pascal or ADA.
Looking around at the state of our C++ and C software and all the CVEs, I think we probably shouldn't care about unsigned or signed loop indexes and should move on before regulatory pressure forces us. Please, language designers, give us some interesting alternatives to Rust.
Fix the language. Don't hack around it by using the wrong type.
Using signed sizes adds a lot of footguns and performance degradations and in exchange gives only small code simplifications in rare and niche cases.
Sizes and indices of course need to be unsigned, and any self respecting compiler should warn about dangerous usage.
So in fact it is not just telling me it’s a hard problem, it’s telling me that the cost-benefit is still not there. It’s like it’s just not a very important problem (in an economic sense). And that is what surprises me, given that computers were made to do arbitrary calculations.
I used to imagine for someone in construction a wall must be some really simple thing. But it's only simple after millennia of building walls. So I now have lots of grace and patience for humanity to figure out numbers in computers, whether integers or reals.
Your explanation is possibly the same just in different words. It's a hard problem and probably needs a whole lifetime. But it's in no single person's economic interest to devote to it the time it needs (not to mention the diverse skills required; once one has a solution one has to pitch it to the world). And so it will happen over a hundred lifetimes.
Sure, it's possible to write bugs in C. And if you really want to, you can disable the compiler warnings which flag tautologous comparisons and mixed-sign comparisons (a common reason for doing this is to avoid spurious warnings in generic-type code).
But, uhh, "people can deliberately write bugs" has got to be the weakest justification I've ever seen for changing a language feature -- especially one as fundamental as "sizes of objects can't be negative".
Signed integers can be negative. The so-called "unsigned" integers of C are integer residues modulo 2^N, which are neither positive nor negative, i.e. these concepts are not applicable to "unsigned" integers.
An alternative view is that any C "unsigned" is both positive and negative. For example the unsigned short "1" is the same number as "65537" and as "-65535".
So any sizeof value in C is negative (while also being positive).
In contradiction with what you say, the change described in TFA, by making sizes 64-bit signed integers, is the only method to guarantee that the sizes are non-negative in a language that does not have dedicated non-negative integers.
Other programming languages have non-negative integers, but C and C++ and many languages derived from them do not have such integers.
The arithmetic operations with non-negative integers differ from the arithmetic operations of C. On overflows and underflows, they either generate exceptions or have saturating behavior.
This can be disproven by the fact that dividing by `unsigned e = 1U` is well defined and always yields the starting number. If the unsigned numbers were really modular numbers as you suggest, division could not be defined.
The oldest parts of the C language are all consistent with "unsigned" numbers being non-negative integers. The implicit conversions between different sizes of "unsigned", the sizeof operator, the relational operators and division are consistent with non-negative integers.
However the first C standard, instead of defining the correct behavior has left undefined many corner cases of the arithmetic operations, allowing the implementation of "unsigned" as either non-negative integers or integer residues.
Eventually, the undefined behaviors for addition, subtraction and multiplication have been defined to be those of integer residues, not those of non-negative integers.
These contradictory properties are the cause of many confusions and bugs.
In extensible languages, like C++, it is possible to define proper non-negative integers and integer residues and bit strings and to always use those types instead of the built-in "unsigned".
In C, it is better to always use signed numbers and avoid unsigned, by casting unsigned to bigger sizes of signed before using such a value.
#include <stdio.h>

int main(void) {
    unsigned short a = 1;
    long b = a;
    printf("%ld\n", b);
}
If not, why?
The so-called "unsigned" integers of C are integer residues, where each value can be interpreted either as both positive and negative or as neither positive nor negative. In any case no "unsigned" value can be said to be non-negative.
You have to go back to languages not contaminated by C, like Ada, to find true non-negative integers among the primitive data types.
In C++, it is possible to define a non-negative integer type, which can have good performance if you implement its operations in assembly language.
However I am not aware of an open-source library including such a type.
And - yes, there are very important use cases for unsigned/modulo-2n/wraparound values. But sizes of data structures are generally _not_ one of those use cases. The fact that the size is non-negative does not mean that the type should be unsigned. You should still be able to, say, subtract sizes and get a signed value which may be negative.
No, signed wraparound is undefined behavior in C, whereas unsigned integers are defined to wrap around. If you use -ftrapv, signed wraparound is an immediate abort().
While C, as in many other places, fails to define the correct behavior (to avoid shaming the processor or compiler makers that fail to provide it), there are only 2 correct behaviors on overflows and underflows, e.g. when incrementing the biggest number or decrementing the smallest number.
Both for signed integers and for non-negative integers, the 2 alternatives of correct behavior on overflows and underflows are to either generate exceptions or to saturate the result to the biggest or smallest representable number.
Wraparound is the correct behavior for integer residues, which are a distinct data type from either signed integers or non-negative integers.
While some people criticize C for making easy for careless programmers to make certain kinds of bugs, like access outside bounds, those are easily mitigated by using appropriate compiler options.
For me a much more serious defect of C is this confusion promoted by it in the heads of most programmers, who do not understand which are the fundamental integer types and which are the correct conversions between them, because C uses "unsigned" instead of at least 3 distinct types that it does not have, bit strings, integer residues and non-negative integers. More rarely, "unsigned" is used for other 2 types that are missing, binary polynomials and binary polynomial residues. All these 5 types must be primitive types in a programming language because all modern processors implement in hardware distinct operations for all 5 types, which can be accessed only through assembly language when these types are missing.
They do. The code:
unsigned x = 1;
unsigned y = -x;  /* y == UINT_MAX: negating an unsigned value is well-defined */
is well-defined in C and C++. See this discussion on StackOverflow for spec text and reference:
I'm not sure Go is saner just because len is an int. Well, maybe, depending on how you look at it. Defining len as a signed int means the largest valid len is half your address space, which also means half of all possible indexes are always invalid; that makes some things easier.
But it's really that integer arithmetic is not undefined behavior regardless of signedness, that bounds are checked, and that even indexing your slice with an int64 on a 32-bit CPU does the full correct bounds check. In fact, you can use any integer type as an index.
Given all of the above, indexing with a uint or an int actually makes no difference. In either case, the bounds check is a single unsigned compare against len (despite the fact that len is signed).
What's really painful is trying to handle a full 32-bit address space with 32-bit addresses and sizes, like in Wasm; you need 33-bit math. So in a sense, limiting sizes to 31 bits (signed) does help. But at the language level, IMO, the rest matters more.
Errors due to unsigned wraparound are a much bigger issue, because they lead to subtle problems where neither automatic warnings nor sanitizers help, precisely because the behavior is well-defined and no automatic tool can tell whether it is intended or wrong.
This is a type design mistake. Unsigned integers should not wrap by default. It makes absolute sense, given all the constraints and its New Jersey "implementation simplicity dominates" design, that K&R C only provides a wrapping unsigned type, but that's an excuse for K&R C, a programming language from the 1970s.
The excuse gets shakier and shakier the further you move past that. C3 even named these types differently, so they're certainly under no obligation to provide the wrapping unsigned integers as if that's just magically what you mean. In most cases it's not what you mean. The excuse given in the article is way too thin.
Rust's Wrapping<u32> is the same thing as the wrapping 32-bit unsigned integer in C or C++ today, but most people don't use it because they do not actually want a wrapping 32-bit unsigned integer. This is a "spelling matters" ergonomics case again, like the choice to name the brutally fast but unstable general comparison sort [T]::sort_unstable, whereas both C and C++ leave the noob who didn't know about sort stability to find out for themselves, because they name this just "sort" and you get to keep both halves when you break things...
But what I do not believe is that there is a real need for a non-wrapping non-negative integer type.
So the most obvious counterexample is so obvious you might not even have remembered it's a type: the unsigned 8-bit integer, or byte.
But frankly, without the wrapping mistake they'd make for a pretty good general-purpose index, and they're a useful counter; there's a reason we called these the "natural numbers".
Natural numbers are nice, but then we invented zero and negative numbers, so we got a group structure for addition, which is really useful. Because even for a counter or an index you may want to do addition and subtraction, and then you definitely do not want a non-wrapping non-negative integer for intermediate results.
And the Rust design, with an unsigned type where subtraction does not return a signed type but may fail at runtime or silently produce the wrong results, seems the worst possible design imaginable to me.
> And the Rust design, with an unsigned type where subtraction does not return a signed type but may fail at runtime or silently produce the wrong results, seems the worst possible design imaginable to me.
You can ask for whatever you meant, and indeed asking for what you meant is crucial here, because when we express ourselves clearly we get the desired results.
For example, u8::borrowing_sub lets us do the arithmetic style you may have learned in primary school, in which we track whether we "borrowed" one during subtraction; this might be useful in some places and is certainly easy to understand.
u8::checked_sub tells us either the answer or that it would overflow, which might allow us to take a different course of action and not need the subtraction.
u8::saturating_sub performs saturating arithmetic: if the operation would overflow, the result is clamped to the nearest representable value instead (zero, when a u8 subtraction underflows); this often makes sense in e.g. signal processing.
u8::unchecked_sub promises we know the subtraction doesn't overflow and so no checks are needed, this is a performance optimisation if you really need it.
u8::wrapping_sub_signed performs the wrapping arithmetic you say is sometimes a good idea, with specifically a signed i8 parameter rather than an unsigned one if we want that.
The truth here is that you might want a lot of different operations and the C choice is not only to provide a single choice, which made a lot more sense 50+ years ago than it does today, but to provide a singularly bad default.
If you want a special function that protects you from errors in specific scenarios, it is easy enough to write one in C. But I do think the C defaults are actually OK, and having all the wrapper functions and boilerplate has its downsides too. (What I would admit is bad in C are the default warning settings in C compilers.)
I cannot imagine we will end up agreeing, but it's good to understand why you made that edit
C has a single '+' operator, just like Rust has. And what that operator does depends on the types to the left and to the right. You can cast between integer types to achieve different behaviours depending on what you want.
About u8::unchecked_sub() etc., those are just regular functions, not really a language thing. Yes, none of that is standardized in C AFAIK, but I'll happily use e.g. __builtin_add_overflow() or whatever in practice.
We can argue all day long what are the right defaults, checked or unchecked operations. If you want to be safe, you want the compiler to emit checks. It's probably possible to get some of those in GCC. If you want to emit streamlined machine code, you'll definitely not want to add checks after every machine instruction.
Well "regular functions" in the sense that these are methods of the primitive type u8†, and of course neither C nor C++ can do that at all. So, yeah, it's a language thing.
In C++ what you'd do here instead is invent custom types and add the methods you want to the types, and I would give C++ credit here if the stdlib provided say, a bunch-of-bits base type with all the bit-twiddling methods defined and maybe specialisations for the 32-bit and 8-bit unsigned integers or something, but AFAICT it doesn't do anything like that.
"I could go out of my way to do this" is true for everything in any of the general purpose languages by their nature.
† In Rust if we define a function associated to a type T with a "self" first parameter then you can call that function as just a method on any value of type T and the appropriate parameter is inferred. So e.g. u8::checked_sub(u8::MAX, 10) is Some(245) but u8::MAX.checked_sub(10) is also Some(245) because it de-sugars to the same call.
I really don't care about functions vs methods; what is the difference, it's just syntax. Actually, keeping to regular function calls is mostly more readable to me, compared to using methods with short unqualified names, mixing function and method calls, and nesting/chaining them.
Better ergonomics means it's more likely the programmer writes what they actually meant. And if you do that, modern compilers have become better at making what you meant fast, even as they remain the same, or perhaps slightly worse, at converting vague gestures that aren't clear about what you meant into what you had hoped for without expressing it.
I have definitely written C code that tries to use 257 values for a byte, with zero playing both its role as "just zero" in some places and also serving as 256 because "it's never zero" in other places. Of course, this is a nasty bug if one of those "it's never zero" zeroes gets into the "it's just zero" code paths, or vice versa.
The "Wrapping will fix my arithmetic ordering" thing is in this article too and I think that's also a terrible idea, maybe even worse than the wrapping unsigned integer types themselves because it leads to a muddled idea of what's really going on.
What would you do instead?
Then, dedicated APIs for wrapping behavior where you expect it to happen.
Just this week I've had a C compiler silently delete an entire function call because of UB (an infinite loop without side effects). Took me a day to figure out. So that's a problem for me.
I don't think I've ever had a hard-to-debug issue in Go because of signed/unsigned wraparound, particularly a memory issue.
If anything, and here I guess I agree with the article, I wish Go had implicit conversions to wider types: it would make the problematic ones stand out.
I guess the reason it doesn't is that they're different named types, which would be a problem when you create a named type for the purpose of forcing explicit type conversions. But maybe the default ones could implicitly implement a numeric tower, where exact conversions can be implicit.
Regarding infinite loops, C++ and C differ with C++ being more aggressive. But also compilers differ with clang being more aggressive. https://godbolt.org/z/Moe6zYKqo
In general, I do not recommend using clang if you worry about UB. gcc is a bit more reasonable and also has better warnings.
High-performance, lock-free FIFOs/channels are commonly implemented in a way that relies on wraparound.