@danbri
Created March 30, 2024 11:20
After carefully analyzing the provided RDF data, I have identified several anomalies and inconsistencies:
Lifespan inconsistencies: Many authors have very short lifespans, with some living only 1-2 years (e.g., Dacia, Claresta, Templeton). This is implausible for anyone credited as a book author. Other lifespans were also flagged, such as Gilburt, recorded as living 48 years despite being born in 1946, although a 48-year span is not unusual in itself.
Anachronistic dates: Some books have creation dates that fall outside their authors' lifetimes. For example, "The Great Divorce" was created in 2000, but all its authors died before 1999. Conversely, "Turtles, Termites, and Traffic Jams" was created in 1948, but its authors were born in 1980, 2009, and 2010.
Unrealistic publication delays: Many books were published long after their creation, which is unusual. For instance, "Terry Jones' Fairy Tales" was created in 1989 but not published until 1995, a six-year gap.
Suspicious author details: All authors have Gmail email addresses following the exact same pattern (FirstName_###@gmail.com), which is unlikely for a diverse set of real authors. The phone numbers also appear to be fake placeholders rather than genuine contact information.
Improbable award timing: Some books are recorded as winning prestigious awards before their publication dates. "Coin Locker Babies", for example, is listed with the Newbery Medal even though it was not published until 7 years after its creation.
Genre inconsistencies: A few books are listed with conflicting or unusual genre combinations, such as "The Classical World: An Epic History from Homer to Hadrian" being categorized as "Domestic Fiction, Religion and Spirituality, Satire".
Identifier format: The book identifiers do not follow any standard format like ISBN-10 or ISBN-13. They appear to be randomly generated strings.
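The identifier check above can be made mechanical. The following is a minimal sketch, assuming the identifiers are available as plain strings; the function names are hypothetical and not taken from the dataset. It applies the standard ISBN-10 (mod 11) and ISBN-13 (mod 10) check-digit rules, so a randomly generated string will almost always fail.

```python
import re


def is_valid_isbn10(value: str) -> bool:
    """Return True if value is a well-formed ISBN-10 with a correct check digit."""
    s = re.sub(r"[\s-]", "", value).upper()
    # Nine digits followed by a digit or 'X' (which stands for 10).
    if not re.fullmatch(r"\d{9}[\dX]", s):
        return False
    # Weights run from 10 down to 1; the weighted sum must be divisible by 11.
    total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(s))
    return total % 11 == 0


def is_valid_isbn13(value: str) -> bool:
    """Return True if value is a well-formed ISBN-13 with a correct check digit."""
    s = re.sub(r"[\s-]", "", value)
    if not re.fullmatch(r"\d{13}", s):
        return False
    # Weights alternate 1, 3, 1, 3, ...; the weighted sum must be divisible by 10.
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(s))
    return total % 10 == 0


def looks_like_isbn(value: str) -> bool:
    """Hypothetical helper: accept either ISBN form."""
    return is_valid_isbn10(value) or is_valid_isbn13(value)
```

A string that passes this check is only syntactically valid; it does not prove the ISBN was ever assigned to a real book.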
In conclusion, while the data structure and overall bibliographic modeling appear valid, the actual data values contain numerous inconsistencies and improbable details. This suggests the dataset is likely artificially generated or placeholder data rather than information on real-world books and authors.
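The date-based checks described above (short lifespans, creation dates outside an author's lifetime, publication before creation) could be automated along these lines. This is a rough sketch under stated assumptions: the `Author` and `Book` classes, their field names, and the 15-year minimum lifespan threshold are all hypothetical and do not reflect the actual RDF vocabulary used in the dataset.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Author:  # hypothetical record shape, not the dataset's schema
    name: str
    birth_year: int
    death_year: Optional[int] = None


@dataclass
class Book:  # hypothetical record shape, not the dataset's schema
    title: str
    created_year: int
    published_year: Optional[int] = None
    authors: List[Author] = field(default_factory=list)


def check_author(author: Author, min_lifespan: int = 15) -> List[str]:
    """Flag death-before-birth and implausibly short lifespans (threshold is an assumption)."""
    issues = []
    if author.death_year is not None:
        span = author.death_year - author.birth_year
        if span < 0:
            issues.append(f"{author.name}: death year precedes birth year")
        elif span < min_lifespan:
            issues.append(f"{author.name}: lifespan of {span} years is implausibly short")
    return issues


def check_book(book: Book) -> List[str]:
    """Flag creation dates outside an author's lifetime and publication before creation."""
    issues = []
    for a in book.authors:
        if book.created_year < a.birth_year:
            issues.append(f"{book.title}: created before {a.name} was born")
        if a.death_year is not None and book.created_year > a.death_year:
            issues.append(f"{book.title}: created after {a.name} died")
    if book.published_year is not None and book.published_year < book.created_year:
        issues.append(f"{book.title}: published before it was created")
    return issues
```

For example, a book created in 1948 whose authors were born in 1980, 2009, and 2010 would yield three "created before ... was born" issues from `check_book`.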