JGG January book comments and suggested revisions #35

Description

@jacob-gg

Hey, all! I'm dropping some notes and possible edits here as I read through the book per Clay's 2022-01-04 email. I'm very much enjoying reading through everybody's material—and I'm learning a bunch as well!

I'm working chapter by chapter, so I'll update this issue as I make additional headway. Immediately below, I'm also including some notes on things we may want to standardize:


Standardization questions:

  • Commas following sentence-heading adverbs and adverbial phrases:
    • E.g., “Sometimes, John says…” vs. “Sometimes John says…”; "To complete the process do x..." vs. "To complete the process, do x..."
    • I tend to vote for using a comma in these cases (admittedly, my tendencies are for insistent disambiguation)
  • Style for code comments:
    • Sentence case?
    • Trailing punctuation?
  • For offsetting material within a sentence: Do we want to use em dashes sans spaces or hyphens with spaces around them? I think we currently have a slight mix (e.g., “The car—a Prius—is red” vs. “The car - a Prius - is red”)
    • I’d personally vote for the former in order to prevent potential ambiguities arising from the hyphen also being used in code as a minus sign/etc.
  • “user base” vs. “userbase”
  • "indexes" vs. "indices"
  • "CSV" vs. "csv"
  • Spaces after commas in code? We currently vary a bit
    • E.g., some_array = [1,2,3] vs. some_array = [1, 2, 3]
  • Bolding of package names
    • Bold in all instances or just in some contexts?
  • Period after "etc"/"etc."?
  • Parentheses after function names when referenced in text? We currently vary a bit
    • pandas.DataFrame() vs. pandas.DataFrame
  • When indicating ranges, do we want to all use en dashes, hyphens, or hyphens with spaces?
    • “From 2010–2013” / “From 2010-2013” / “From 2010 - 2013”
  • Spaces in argument declarations?
    • arg = 1 vs. arg=1
  • Periods for Latin abbreviations?
    • e.g. vs. eg; i.e. vs. ie
  • "dataset" vs. "data set"
  • Code formatting for in-text uses of TRUE/FALSE (R) and True/False (Python)?
  • Capitalization scheme for sub-subheadings?
    • I.e., chapters are title case, and immediate subheadings are also title case; do we want sub-subheadings (x.y.z) to also all be title case, or should those be sentence case?
  • When code chunks produce non-troubling warnings (e.g., when ggplot says "Removed N rows containing missing data..."), do we want to set warning = F in the code chunk to suppress those in the rendered book?
  • I'll update these bullets as I encounter more standardization questions
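
For the code-spacing questions above (spaces after commas, spaces in argument declarations), here is a minimal Python sketch showing that the variants differ only in style, not in meaning. The helper name `same_code` is hypothetical and just for illustration; it compares the syntax trees of two snippets. For what it's worth, PEP 8 recommends a space after commas and no spaces around `=` in keyword arguments, so Python chunks could follow that if we want an external reference point.

```python
import ast


def same_code(a: str, b: str) -> bool:
    """Return True if two source snippets parse to the same syntax tree."""
    # ast.dump() omits line/column positions by default, so whitespace-only
    # differences between the snippets do not affect the comparison.
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


# Spacing after commas is purely stylistic.
print(same_code("some_array = [1,2,3]", "some_array = [1, 2, 3]"))  # True

# So is spacing around "=" in a keyword argument (f is never called here,
# only parsed).
print(same_code("f(arg = 1)", "f(arg=1)"))  # True
```

Since the choice has no effect on behavior, whichever convention we pick could be applied mechanically across chunks.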

Chapter-by-chapter notes and typographic suggestions:

Format:

  • [chapter #].[subsection #]: "original/current language" —> "possible language"
    • Description of/motivation for possible edit

Chapter 1: (Notes last updated 2022-01-12)

  • 1.2: “…preferred in most R style guides to distinguish it between assignment and setting the value…” —> “…preferred in most R style guides to distinguish between assignment and setting the value…”
    • I’d argue that since "distinguish" isn't followed by "from" here (e.g., “…to distinguish x from y…"), we should drop “it”
  • 1.2: “Python uses = for assignment while R can use…” —> “Python uses = for assignment, while R can use…”
    • I think the “whereas” interpretation of “while” (established by the addition of a comma) might be preferable here
  • 1.4 (code): “That’s how we use the install_github() below.” —> “That’s how we use install_github() below.”
  • 1.4: “...when installing package updates you will be asked ‘Do you want to…’” —> “...when installing package updates you will be asked, ‘Do you want to…’”
    • I'd argue that a comma before the quoted material is standard here
  • 1.8: “The return statement takes an optional argument in it’s parenthesis that will…” —> “The return statement takes an optional argument in its parentheses that will…”
    • There's currently a spare apostrophe in “its”
    • I’d also suggest that since the preposition here is “in,” it'd be preferable to reference parentheses (vs. parenthesis)
  • 1.8: “…have built-in error-checking that return messages…” —> “…have built-in error-checking that returns messages…”
    • Suggesting the above given that "error-checking" is singular here
  • 1.8: “At tuple is a data structure…” —> “A tuple is a data structure…”
    • “At” —> “A”
  • 1.8: “…the three columns using an anonymous function with lapply” —> “…the three columns using an anonymous function with lapply()”
    • Adding parentheses to in-text reference_to_function()
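
As a companion to the 1.8 notes on return values and tuples, here is a small Python sketch, assuming the passage concerns Python tuples as function return values. The function name `min_and_max` is hypothetical and not taken from the book.

```python
def min_and_max(values):
    """Return the smallest and largest value together as a tuple."""
    return min(values), max(values)


result = min_and_max([3, 1, 4, 1, 5])
print(result)  # (1, 5)

# A tuple can be unpacked into separate names in one assignment.
low, high = result
print(low, high)  # 1 5
```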

Chapter 2: (Notes last updated 2022-01-14)

  • 2.1: “The of the most…” —> “Three of the most…”
  • 2.1: “…be explicitly declared, they are indicated…” —> “…be explicitly declared; they are indicated…”
    • Comma splice
  • 2.1: “…for negative indexing, using an index of…” —> “…for negative indexing; using an index of…”
    • Comma splice
  • 2.1: “…declared using the numpy.array() function and the numpy package needs to…” —> “…declared using the numpy.array() function, and the numpy package needs to…”
    • Comma before coord. conj.
  • 2.1: “…cannot be carried out on lists, but can be carried out…” —> “…cannot be carried out on lists, but they can be carried out…”
    • Need post-comma pronoun to prevent fragment here (or drop the comma and proceed w/o pronoun)
  • 2.2: “…multiple vectors each of which…” —> “…multiple vectors, each of which…”
    • I’d argue that a comma before the modifier phrase would be the least ambiguous form of the sentence, but others may well feel differently; just mentioning it as a possibility
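
Since 2.1 touches on negative indexing, a short Python sketch of the behavior might be worth having on hand when we review that passage. The list contents are made up for illustration. Note that this is Python behavior only: in R, a negative index drops the element at that position rather than counting from the end.

```python
letters = ["a", "b", "c", "d"]

# In Python, negative indexes count backward from the end of the sequence.
print(letters[-1])  # d
print(letters[-2])  # c

# The most negative valid index points at the first element.
print(letters[-len(letters)] == letters[0])  # True
```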

Chapter 3: (Notes last updated 2022-01-15)

  • 3.0: “The examples below highlight one way that…”
    • This sentence tripped me up a bit because it implies that we show one method across a whole set of examples, but I take it that the sentence is indicating that we show one method per example
  • 3.0: “The data we use for demonstration is…”
    • Since we treat “data” as plural throughout the book, I assume we want to here as well; i.e., “are” instead of "is" (or we could just switch to: "The data set we use...," in which case we could keep the verb as "is")
  • 3.1: “They are useful for “rectangular" data where rows represent…”
    • Currently, the clause headed by “where” is restrictive—but should it be? I.e., is there a form of rectangular data we could be referencing that wouldn’t be characterized by rows=observations/columns=variables? If not, then we may want to make that nonrestrictive by adding a comma before "where"
  • 3.2: “Since this Excel file has only one sheet we do not need…” —> “Since this Excel file has only one sheet, we do not need…”
    • Comma after adverbial dependent clause at head of sentence
  • 3.4: “Because of their flexibility XML files…” —> “Because of their flexibility, XML files…”
    • Comma after adverbial dependent clause at head of sentence

Chapter 4: (Notes last updated 2022-01-17)

  • 4.1: The Python example of mtcars.info() where verbose = False seems to currently print the table describing variables, although the text indicates that setting verbose = False “excludes the table describing each column.” (I.e., the output when verbose = False seems to currently be the same as when verbose = True)
  • 4.2: “This function works on numpy array, pandas series, and pandas DataFrames” —> “This function works on numpy arrays, pandas series, and pandas DataFrames”
    • Pluralize “arrays” for consistency with “series” and “DataFrames”
  • 4.2: “Single indexing brackets work as well, but return a data frame…” —> “Single indexing brackets work as well, but they return a data frame…”
    • Clause headed by "but" is subordinate; we could either drop the comma or add a pronoun
  • 4.2: Where we have “single indexing brackets” and “double indexing brackets,” I’d argue for hyphens (e.g., “single-indexing brackets”) in order to unambiguously establish the compound modifier
  • 4.3: “Column names can be chnaged using…” —> “Column names can be changed using…”
    • Typo
  • 4.3: At the end of the Python code example, there’s a spare, unfinished sentence: “You can…”
  • 4.3: In the R code example, the text says, “We change the [column] name to ‘cylinder’”; however, the code currently changes the column name to “cylinders”
  • 4.5: Just a note that there’s a remaining “to-do” listed in the actual text at the bottom of the 4.5 R code section
  • 4.9: “The base R functions sample() and runif() can be combined to sample sizes or approximate proportions”
    • This sentence tripped me up just a hair because it’s easy to read “sample sizes” as “[adjective] [noun]” (i.e., “a study’s N”) as opposed to “[verb] [noun]”; we may be able to disambiguate by using, say, “sample fixed sizes” instead of “sample sizes”
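
To make the 4.9 distinction concrete (sampling a fixed size vs. sampling an approximate proportion), here is a hedged sketch in standard-library Python. It is an analogue of the idea, not the book's R code; the function names and the use of `random.Random` are assumptions for illustration.

```python
import random


def sample_fixed_size(rows, n, seed=None):
    """Draw exactly n rows without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, n)


def sample_approx_proportion(rows, p, seed=None):
    """Keep each row independently with probability p.

    The result holds roughly p * len(rows) rows, but the exact count varies.
    """
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < p]


rows = list(range(100))
fixed = sample_fixed_size(rows, 10, seed=1)
approx = sample_approx_proportion(rows, 0.1, seed=1)

print(len(fixed))   # always 10
print(len(approx))  # near 10; the exact count depends on the random draws
```

The first approach guarantees the sample size; the second only targets it on average, which is the "approximate proportions" case.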

Chapter 5: (Notes last updated 2022-01-18)

  • 5.1: “To row bind data frames the column names must…” —> “To row bind data frames, the column names must…”
    • Comma after sentence-heading dependent clause
  • 5.3.1: “…and the desired names for output columns in the long data…”
    • When I wrote this, I thought “output” would be a clarifying adjective; reading it now, I think “new” would be clearer
  • 5.3.2: Comment in R code next to names_glue argument is cut off—my fault; will update
  • 5.4: Will drop unnecessary “#x” and “#y” code comments from introduction section indicating data frames used in join examples
  • 5.4: “wherever possible” might be clearer than “where possible” in left/right merge descriptions
  • 5.4.3: “…for which a match can be on the merge criterion…” —> “…for which a match can be found on the merge criterion…”
    • I omitted a word
  • 5.4.3: R code comment would probably be clearer as: “with its default arguments, merge() executes an inner join”
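
For the 5.4 wording questions ("wherever possible" for left/right merges, "a match can be found" for inner joins), a plain-Python sketch of the two behaviors may help us check the phrasing against what the joins actually do. This is not the book's code: the function names are hypothetical, and the sketch assumes each key appears at most once in the right-hand table. Unmatched rows in the left join simply lack the right-hand columns here, whereas R and pandas would fill them with missing values.

```python
def inner_join(left, right, key):
    """Keep only left rows for which a match can be found in right."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**l_row, **right_by_key[l_row[key]]}
        for l_row in left
        if l_row[key] in right_by_key
    ]


def left_join(left, right, key):
    """Keep every left row, adding right-hand columns wherever possible."""
    right_by_key = {row[key]: row for row in right}
    return [{**l_row, **right_by_key.get(l_row[key], {})} for l_row in left]


x = [{"id": 1, "a": "x1"}, {"id": 2, "a": "x2"}]
y = [{"id": 2, "b": "y2"}, {"id": 3, "b": "y3"}]

print(inner_join(x, y, "id"))  # [{'id': 2, 'a': 'x2', 'b': 'y2'}]
print(left_join(x, y, "id"))   # [{'id': 1, 'a': 'x1'}, {'id': 2, 'a': 'x2', 'b': 'y2'}]
```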

Chapter 6: (Notes last updated 2022-01-20)

  • 6.1: “…determining frequencies per group (or values based on…” —> “…determining frequencies per group (or determining values based on…”
    • As I reread this, I realize it would have been clearer had I repeated "determining" (as above) so as to eliminate the possibility of interpreting it as “frequencies per X (or frequencies per Y)”
  • 6.1: “The groupby(), also in pandas…” —> “Alternatively, the groupby() function, also in pandas…”
    • I omitted the word “function”
  • 6.2: Will remove parentheses from around “and the variable to be summarized” in R section on group summaries, as that material's not an aside
  • 6.2: “…are returned if no…” —> “…are returned even if no…”
  • 6.2: Probably switch order of the R code chunks showing (a) the drop = F argument and (b) formula-notation aggregation
  • 6.2: “A benefit of summarize() is that it allows a user to…” —> “summarize() makes it easy to…”
    • Will revise for the sake of concision
  • 6.3: Probably revise definition of centering to indicate that the process isn’t exclusively around 0 (e.g., centering around group means)
  • 6.3: Will remove unnecessary parentheses from around “without scaling it” and “while also centering it”
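
On the 6.3 point about centering, a small standard-library Python sketch of centering around group means might help when revising the definition. The function name `center_by_group` and the data are made up for illustration; the book's own examples presumably use R or pandas instead.

```python
from collections import defaultdict
from statistics import mean


def center_by_group(values, groups):
    """Subtract from each value the mean of its own group."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    group_means = {g: mean(v) for g, v in by_group.items()}
    return [value - group_means[group] for value, group in zip(values, groups)]


values = [1.0, 3.0, 10.0, 14.0]
groups = ["a", "a", "b", "b"]

# Group "a" has mean 2.0 and group "b" has mean 12.0.
print(center_by_group(values, groups))  # [-1.0, 1.0, -2.0, 2.0]
```

Here each value is expressed relative to its group's mean rather than the overall mean, which is the case the current definition doesn't cover.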

Chapter 7: (Notes last updated 2022-01-26)

  • 7.0: “For the R sections below, we discuss how to generate plots using base R and ggplot2.” —> “For the R sections below, we show how to make each plot with base R and with ggplot2.”
    • Improving clarity of the sentence I'd initially added here
  • 7.1: “The Python plotting library Matplotlib…” —> “The Python plotting library **Matplotlib**…”
    • At least as of now, we’ve been bolding library/package names; do we want to stick with this convention?
  • 7.1: “…show a histogram of the bill length from the dataset…” —> “…show a histogram of bill lengths from the penguins dataset…”
    • Clarity; pluralize as appropriate
  • 7.1: “We specified 30 bins each of which is light blue with a black outline of linewidth 1.” —> “We specified 30 bins, each of which is light blue with a black outline of linewidth = 1.”
    • Comma after “bins” in order to establish “each of…” as an unambiguous modifier; code format for the linewidth argument
  • 7.1: “The hist() defaults to…” —> “The hist() function defaults to…”
    • Add “function”
  • 7.1: “…defaults to no outline which can…” —> “…defaults to no outline, which can…”
    • Comma in advance of “…which can…” since that clause is nonrestrictive
  • 7.1: “For example, if a particular bin spans the range of 1 to 3, the bin will include the value 1 but will exclude the value 2…” —> “For example, if a particular bin spans the range of 1 to 3, the bin will include the value 1 but will exclude the value 3…”
    • Switch “2” to “3” in description of which value is excluded at the upper bound of the histogram
  • 7.1: In the Python code for histograms, I think it’d be useful to include a brief in-code comment explaining the no-argument plot.clf() function
  • 7.1: “Initialize a plot with ggplot(), and then add layers thereto, specifying aesthetic properties along the way.” --> “Initialize a plot with ggplot(), and then add layers thereto.”
    • I think the “specifying…” phrase I’d initially added is unnecessary and doesn’t add any clarifying value
  • 7.2: “One thing to note here is that we…”
    • This sentence currently gets broken over two lines; adding an extra break before “One thing…” would resolve this
  • 7.2: “…we generated the same bar plot containing the same information with way less effort.” —> “we generated the same bar plot as we first made with way less effort.”
    • I think that “…same bar plot containing the same information…” rings as a bit redundant in my ear (i.e., “same plot” == “same info”); perhaps we could simplify the language along the lines proposed above?
  • 7.4: “Adding methhod = 'jitter’ to the set of arguments…” —> “Adding method = 'jitter’ to the arguments…”
    • Fixing my typo + simplifying language
  • 7.5: “boxplots, and a user…” —> “boxplots. A user…”
    • Improving reading rhythm
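
To double-check the 7.1 correction about bin boundaries (a bin spanning 1 to 3 includes 1 and excludes 3), here is a minimal sketch of half-open binning in standard-library Python. The helper `bin_index` and the edge values are hypothetical. NumPy's histogram documentation describes the last bin as closed on both ends, a special case this sketch does not model.

```python
from bisect import bisect_right


def bin_index(value, edges):
    """Return the index of the half-open bin [edges[i], edges[i + 1]) holding value."""
    if not edges[0] <= value < edges[-1]:
        raise ValueError("value falls outside the bin edges")
    return bisect_right(edges, value) - 1


edges = [1, 3, 5]  # two bins: [1, 3) and [3, 5)

print(bin_index(1, edges))  # 0: the lower bound 1 is included in the first bin
print(bin_index(2, edges))  # 0: 2 is inside the first bin
print(bin_index(3, edges))  # 1: the upper bound 3 is excluded and goes to the next bin
```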

(I'll add notes for additional chapters as I make my way through them. I hope everyone's 2022 is off to a fine start!)

Labels: todo (assigned work or work that needs to be done)
