Skip to content

Instantly share code, notes, and snippets.

@cmlsharp
Last active February 20, 2017 00:51
Show Gist options
  • Save cmlsharp/61c618bc1c4def7299dd to your computer and use it in GitHub Desktop.
Save cmlsharp/61c618bc1c4def7299dd to your computer and use it in GitHub Desktop.
My (hopefully) improved version of the cs50 library. It checks for overflow, prevents a possible segfault and allows for larger input
/****************************************************************************
* CS50 Library 6
* https://manual.cs50.net/library/
*
* Based on Eric Roberts' genlib.c and simpio.c.
*
* Copyright (c) 2013,
* Glenn Holloway <holloway@eecs.harvard.edu>
* David J. Malan <malan@harvard.edu>
* All rights reserved.
*
* BSD 3-Clause License
* http://www.opensource.org/licenses/BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
* met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of CS50 nor the names of its contributors may be used
* to endorse or promote products derived from this software without
* specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
* IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
* TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
* PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
* PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
* NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
*
* Contributors:
* Chad Sharp <crossroads1112@riseup.net>
***************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h> // Provides strlen()
#include <errno.h> // Provides errno and ERANGE macro
#include <math.h> // Provides isfinite()
#include "cs50.h"
#ifndef SIZE_MAX
#define SIZE_MAX ((size_t) -1)
#endif
// Default capacity of buffer for standard input.
#define CAPACITY 128
// Automatically detect if input is octal, decimal or hexidecimal in GetInt and
// GetLong
#define BASE 0
/*
* Reads a line of text from standard input and returns the equivalent char; if
* text does not represent a char, user is prompted to retry. Leading and
* trailing whitespace is ignored. If line can't be read, returns CHAR_MAX.
*/
char
GetChar(void)
{
// try to get a char from user
while (true)
{
// get line of text, returning CHAR_MAX on failure
string line = GetString();
if (line == NULL)
{
return CHAR_MAX;
}
// return a char if only a char (possibly with
// leading and/or trailing whitespace) was provided
char c1, c2;
if (sscanf(line, " %c %c", &c1, &c2) == 1)
{
free(line);
return c1;
}
else
{
free(line);
printf("Retry: ");
}
}
}
/*
* Reads a line of text from standard input and returns the equivalent double
* as precisely as possible; if text does not represent a double, user is
* prompted to retry. Leading and trailing whitespace is ignored.
* Overflow/underflow detected. If line can't be read, returns DBL_MAX.
*/
double
GetDouble(void)
{
// try to get a double from user
while (true)
{
// get line of text, returning DBL_MAX on failure
string line = GetString();
if (line == NULL)
{
return DBL_MAX;
}
// return a double if only a double (possibly with
// leading and/or trailing whitespace) was provided
char *endptr = NULL;
int const errnocpy = errno;
double d = strtod(line, &endptr);
if (strlen(line) && errno != ERANGE && *endptr == '\0' && isfinite(d))
{
free(line);
return d;
}
else
{
errno = errnocpy;
free(line);
printf("Retry: ");
}
}
}
/*
* Reads a line of text from standard input and returns the equivalent float as
* precisely as possible; if text does not represent a float, user is prompted
* to retry. Leading and trailing whitespace is ignored. If line can't be
* read, returns FLT_MAX. Checks for overflow.
*/
float
GetFloat(void)
{
// try to get a float from user
while (true)
{
// get line of text, returning FLT_MAX on failure
string line = GetString();
if (line == NULL)
{
return FLT_MAX;
}
// return a float if only a float (possibly with
// leading and/or trailing whitespace) was provided
char *endptr = NULL;
int const errnocpy = errno;
float f = strtof(line, &endptr);
if (strlen(line) && errno != ERANGE && *endptr == '\0' && isfinite(f))
{
free(line);
return f;
}
else
{
errno = errnocpy;
free(line);
printf("Retry: ");
}
}
}
/*
* Reads a line of text from standard input and returns it as an int in the
* range of [-2^31 + 1, 2^31 - 2] (on some systems), if possible; if text does
* not represent such an int, user is prompted to retry. Leading and trailing
* whitespace is ignored. Overflow is detected and user is prompted to retry.
* If line can't be read, returns INT_MAX.
*/
int
GetInt(void)
{
// try to get an int from user
while (true)
{
// get line of text, returning INT_MAX on failure
string line = GetString();
if (line == NULL)
{
return INT_MAX;
}
// return an int if only an int (possibly with
// leading and/or trailing whitespace) was provided
char *endptr = NULL;
int const errnocpy = errno;
/* There is no strtoi() so we must check if n is
* between INT_MAX and INT_MIN. On most systems
* a long is the same size as an int but not on all. */
long n = strtol(line, &endptr, BASE);
if (strlen(line) && errno != ERANGE && *endptr == '\0' && n <= INT_MAX
&& n >= INT_MIN)
{
free(line);
return (int) n;
}
else
{
errno = errnocpy;
free(line);
printf("Retry: ");
}
}
}
/*
* Reads a line of text from standard input and returns an equivalent long long
* in the range [-2^63 + 1, 2^63 - 2] (on some sysytems), if possible; if text
* does not represent such a long long, user is prompted to retry. Leading and
* trailing whitespace is ignored. Overflow is detected and user prompted to
* retry. If line can't be read, returns LLONG_MAX.
*/
long long
GetLongLong(void)
{
// try to get a long long from user
while (true)
{
// get line of text, returning LLONG_MAX on failure
string line = GetString();
if (line == NULL)
{
return LLONG_MAX;
}
// return a long long and only a long long, checking for
// overflow. Will skip over whitespace.
char *endptr = NULL;
int const errnocpy = errno;
long long n = strtoll(line, &endptr, BASE);
if (strlen(line) && errno != ERANGE && *endptr == '\0')
{
free(line);
return n;
}
else
{
errno = errnocpy;
free(line);
printf("Retry: ");
}
}
}
/*
* Reads a line of text from standard input and returns it as a string, sans
* trailing newline character. (Ergo, if user inputs only "\n", returns "" not
* NULL.) Leading and trailing whitespace is not ignored. Returns NULL upon
* error or no input whatsoever (i.e., just EOF).
*/
string
GetString(void)
{
// growable buffer for chars
string buffer = NULL;
// capacity of buffer
size_t capacity = 0;
// number of chars actually in buffer
size_t n = 0;
// character read or EOF
int c;
// iteratively get chars from standard input
while ((c = fgetc(stdin)) != '\n' && c != EOF)
{
// grow buffer if necessary
if (n + 1 >= capacity)
{
// determine new capacity: start at CAPACITY then double
if (capacity == 0)
{
capacity = CAPACITY;
}
else if (capacity <= (SIZE_MAX / 2))
{
capacity *= 2;
}
else if (capacity < (SIZE_MAX - 1))
{
capacity += ((SIZE_MAX - capacity)/2);
}
else
{
free(buffer);
return NULL;
}
// extend buffer's capacity
// Better practice to use sizeof(*temp) than sizeof(char);
// http://stackoverflow.com/questions/7243872/why-write-sizeofchar-if-char-is-1-by-standard
string temp = realloc(buffer, capacity * sizeof(*temp));
if (temp == NULL)
{
free(buffer);
return NULL;
}
buffer = temp;
}
// append current character to buffer
buffer[n++] = c;
}
// return NULL if user provided no input
if (n == 0 && c == EOF)
{
return NULL;
}
string minimal = realloc(buffer, (n + 1) * sizeof(*minimal));
// Check if realloc failed
if (minimal == NULL)
{
// Minimization failed. Input should still be returned
minimal = buffer;
}
// terminate string
minimal[n] = '\0';
// return string
return minimal;
}
@FreeER
Copy link

FreeER commented Aug 25, 2015

in GetInt
line 194, should you not also check INT_MIN ?

in GetString
line 280, pretty sure that should be < SIZE_MAX not <=...

line 308, why are you freeing the buffer and returning NULL if you couldn't allocate the minimal amount of space? You already have the input stored in the buffer, that part's just trying to minimize wasted space by freeing the unused space, if it can't manage to allocate that minimal space there's no reason to cause the function to "fail". Just return the full buffer or make an attempt with realloc to minimize it "in place" (not sure if realloc ever would do that, but it's a theoretical possibility). edit: 0 terminate before returning of course.

line 308 continuation-sort of: not sure if a capacity == n check would be worth it here or not (to decide if you already have a minimal buffer)... probably doesn't happen often enough to be worth the check unless you expand that suggestion to capacity - n > SOME_ACCEPTABLE_SIZE.

@cmlsharp
Copy link
Author

Fixed your first two critiques (thanks for that)

As for the t one, should I have it call realloc? If malloc failed to alocate memory why should realloc be able to? Would it be better to return a larger sized array? That means there are some inconsistencies with how the function operates I suppose but strlen() would still work as expected.

@FreeER
Copy link

FreeER commented Aug 26, 2015

@crossroads1112

If you want to try everything possible to not waste space then call realloc, if not then just return the larger memory (the caller will never know the difference).

If malloc failed to alocate memory why should realloc be able to?

the man page for realloc says: "The realloc() function changes the size of the memory block pointed to by ptr to size bytes. The contents will be unchanged in the range from the start of the region up to the minimum of the old and new sizes."

It does not say that it necessarily mallocs memory of the new size, copies the memory and frees the old buffer, it says it changes the size. That means it could very will be written such that when the new size is smaller than the old size it simply adds a new node to the free memory list with a size of "current - new" and the address of ptr+new (or the equivalent if heap management hasn't been implemented using a linked list of free memory, http://web.eecs.utk.edu/~huangj/cs360/360/notes/Malloc2/lecture.html). Of course, it might simply decide that it doesn't care as long as it's as big as it's supposed to be. Depends on the implementation. Here's one for reference http://sourceforge.net/p/openfoam-extend/svn/717/tree//thirdparty/malloc/gmalloc/realloc.c, (I'm sure you could find others, ubuntu's source for instance, though I didn't feel like taking a couple hours to download a few hundred megabytes... https://wiki.ubuntu.com/Kernel/SourceCode).

@cmlsharp
Copy link
Author

@FreeER

Thanks lot for the explanation! Why would malloc() be used in the original implementation to create minimal at all? Why not just use realloc on buffer and avoid using strncpy()?

Edit: did some reading
http://www.iso-9899.info/wiki/Why_not_realloc

Calling realloc is sometimes slower than malloc and you are guaranteed not to lose your array if realloc were to fail.

After reading that an important thing to note with my modifications to GetString() would be that as the input approaches SIZE_MAX, it would progressively get slower as realloc() is somewhat expensive and would be called more and more often. I'm going to remove the capacity+=1 condition because it seems silly for realloc to be called for one single character.

@cmlsharp
Copy link
Author

I have to use realloc because in the case that capacity == n, I won't have room for the NULL byte. Thus I've updated my proposed solution above.

@FreeER
Copy link

FreeER commented Aug 26, 2015

@crossroads1112

Ah, just noticed but you've introduced a bug on line 312. If minimal is NULL and realloc succeeds and it does so without moving the memory then since you set minimal to tmp which is the same place as the buffer and later free the buffer before returning, you've then freed the memory where minimal was pointing thus the pointer that you're returning to the caller doesn't point to valid memory.

you are guaranteed not to lose your array if realloc were to fail.

Isn't this guaranteed by realloc as well as long as you don't do something like:
buffer = realloc(buffer, newSize);

If realloc() fails the original block is left untouched; it is not freed or moved.

but yeah, I can see realloc being a bit slower since it has to handle multiple scenarios (you can see that just from the man page, NULL pointer makes it the equivalent of calling malloc, 0 size makes it the equivalent of free, then if it had to allocate memory elsewhere it has to copy the data etc.). However, In this scenario, where you're copying the data anyways, I'm not sure that malloc is really any better of a choice... and it'd be a bit simpler with just a realloc call... perhaps there's some extra detail I'm not seeing or maybe it was just what the dev chose to do without too much thought, after all, they didn't bother to check malloc's return now did they?

@cmlsharp
Copy link
Author

Yeah you're right about realloc. My bad. I've updated it again. Now it only returns NULL if n == capacity since there'd be no room for the NULL byte

@cmlsharp
Copy link
Author

Although this might not even be necessary since capacity is always increased if it is >= to n + 1

@cmlsharp
Copy link
Author

Also no strncpy means that string.h no longer need be included. I'll give it another look tomorrow and see if I can remove that check and that include

@FreeER
Copy link

FreeER commented Aug 26, 2015

@crossroads1112, that bug I mentioned is still there, after you call realloc if minimal == buffer, you do not want to free buffer because they point to the same memory. Actually, now that you're using realloc only you don't want to free buffer at all, if it's pointing to the same memory you don't want to free it and if it's not then realloc freed it for you :)

@cmlsharp
Copy link
Author

Actually capacity only increases if it is > than capacity so that check is necessary. Oops Could also change it to >= I suppose but it's not necessary.

@cmlsharp
Copy link
Author

@FreeER so no calls to free() are necessary?

And if so, earlier in the function realloc is used on buffer and if it fails, buffer is freed manually. Is that necessary?

Nevermind. It's in the realloc man page. It doesn't free the pointer if it fails

@cmlsharp
Copy link
Author

Thanks for your help by the way! I didn't really know the inner workings of malloc/realloc/free (obviously) so this is really educational

@FreeER
Copy link

FreeER commented Aug 26, 2015

@crossroads1112

was just looking over the code again 😄

GetInt 197: the INT_* comparisons should be <= and >=.

Everything else looks good to me however. And you're entirely welcome for any help I provided, I won't say that I'm an expert on any of this (I'd read about how malloc worked quite some time ago and remembered some here when it came up) but I'm comfortable with it and feel I understand it well enough to use it 😆

@ivandardi
Copy link

Is the user being able to input an empty string intended behaviour? Like, if you made sure that the GetString function ensured that it read a string of at least 1 character, then you wouldn't need the strlen() checks in the other functions.

Copy link

ghost commented Oct 28, 2015

Calling strlen() should not be necessary in any of your code. For example if (strlen(line) && is the same as if (strlen(line) != 0 &&, which is true if the first character (*line) is not 0 (alternatively written '\0').

Your usage of endptr after calling the strtoX functions isn't quite correct either. With the original sscanf format of, e.g., " %d %c", it accounts for leading and trailing whitespace, i.e., space characters before and after the integer (or float or whatever, depending on the format). strtoX functions only remove leading whitespace, not trailing whitespace, so you'd need to possibly skip more whitespace after the number to get to the '\0'.

Lastly, unless the type might be changed to a type with a greater width than char (e.g. short), it's redundant to use sizeof var. You're working with a sequence of char items, and there's no chance of the type changing from anything other than that since a string will forever be a sequence of char items (unless you decide to do something crazy like typedef int *string;). In other words, you don't need sizeof(*temp) on line 321 or sizeof(*minimal) on line 342 or anything like that because you're only working with char.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment