Trifecta: Filter, Map, Reduce

I love using Goodreads. It has a great community that contributes to the site with comments, reading lists and such. When I discover a new list, such as the Riftwar Cycle, I want to copy the book names to a Google Sheet. When I visit a bookstore, I pull up the sheet to see whether I have already bought a book or not. So, my Google Sheet is ever changing but one thing stays constant: I buy books far faster than I read the ones I already own.

Some lists are mixed, meaning they belong to a shared universe with many authors contributing to it. In that case, I would like to write down the author’s name next to the book’s title. You might wonder whether it would be better to group the items by author name first and then by book title. I considered that too, but the books I’m usually interested in are shelved by type, so they are all clustered in one bookshelf rather than distributed all over the store by author name.

So, our goal is to parse a list out of page content in the following format: Book Title, Author Name.

Start Collecting

I start by collecting the most basic information using document.querySelectorAll. The selectors I need are .bookTitle and .authorName. This returns an array-like NodeList object, which is not that useful by itself, but we can convert it into an array by passing it into Array.from().

Array.from(document.querySelectorAll(".bookTitle, .authorName"))
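To see why the conversion matters without opening a browser, here is a small sketch that uses a plain array-like object as a stand-in for the NodeList. The fakeNodeList name and its contents are made up for illustration. Like a NodeList, it has indexed items and a length but no filter, map or reduce.

```javascript
// Hypothetical stand-in for a NodeList: indexed entries plus a length.
const fakeNodeList = {
  0: { innerText: "Homeland" },
  1: { innerText: "R.A. Salvatore" },
  length: 2,
};

// Array-likes lack the array methods we want to chain later.
console.log(typeof fakeNodeList.filter); // "undefined"

// Array.from copies the indexed entries into a real array.
const entries = Array.from(fakeNodeList);
console.log(Array.isArray(entries)); // true
console.log(entries.map((entry) => entry.innerText)); // [ 'Homeland', 'R.A. Salvatore' ]
```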

This, of course, returns a lot more than we need. If you try the line above, you may see that the items in the sidebar are also included since they have the bookTitle and authorName classes applied to them. We want to limit our results strictly to the list we are seeing in the main area.

Array.from(document.querySelectorAll(".bookTitle, .authorName"))
    .filter((entry) => entry.attributes.itemprop)

Applying a simple filter to select only the items that have the itemprop attribute will do the job. The sidebar items don’t have this attribute, so we end up with the list of items we intended to acquire. So far, so good.
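Here is a rough sketch of how that filter behaves, using plain objects in place of real DOM elements. Only the attributes and innerText properties are modeled, and the itemprop values and the sidebar entry are invented for the example.

```javascript
// Hypothetical stand-ins for DOM elements; the values are made up.
const entries = [
  { innerText: "Homeland", attributes: { itemprop: { value: "name" } } },
  { innerText: "R.A. Salvatore", attributes: { itemprop: { value: "author" } } },
  { innerText: "Some Sidebar Book", attributes: {} },
];

// A missing attribute is undefined (falsy), so filter drops that entry.
const mainList = entries.filter((entry) => entry.attributes.itemprop);
console.log(mainList.map((entry) => entry.innerText)); // [ 'Homeland', 'R.A. Salvatore' ]
```

On real elements, entry.hasAttribute("itemprop") expresses the same check and returns a proper boolean, if you prefer something more explicit.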

DOMinate It!

At this point, we are still left with a bunch of DOM objects in our array. We’ve got to convert this list to a string array eventually. Also, keep in mind that the array mixes book title elements with author name elements, but luckily they alternate: each title is immediately followed by its author.

This is where map comes into play. Each entry in the array has an innerText property that is the text you actually see on the screen. However, if you simply extract it as is, you’ll end up with a book title that includes, in parentheses, another series the book belongs to. We don’t really need that: some books are part of multiple lists, and we have already made our way to this one, so the other lists are not important here. We just need the simple title.

Array.from(document.querySelectorAll(".bookTitle, .authorName"))
    .filter((entry) => entry.attributes.itemprop)
    .map((entry) => entry.innerText.replace(/\s\(.*?\)\s?/, ""))

As you can see, we are removing the unnecessary parts of each item’s innerText by replacing them with an empty string. Every other item in the array is an author name, and it doesn’t have anything of the (…) sort in its innerText. Technically, this replace operation runs for author names too, unnecessarily, but I can live with that. I found it a lot simpler to run over one list than to merge two lists later. Besides, one-liners look sexy.
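If you want to check what the regular expression does in isolation, here is a small sketch. The pattern is the one used in the snippet; the sample strings are made up to match the shape described above.

```javascript
// Same pattern as in the snippet: one whitespace character, a parenthesized
// group matched lazily, and an optional trailing whitespace character.
const seriesSuffix = /\s\(.*?\)\s?/;

// Made-up sample strings: a title with a series suffix and a plain author name.
const samples = [
  "Homeland (The Dark Elf Trilogy, #1)",
  "R.A. Salvatore",
];

const cleaned = samples.map((text) => text.replace(seriesSuffix, ""));
console.log(cleaned); // [ 'Homeland', 'R.A. Salvatore' ]
```

Because the pattern has no g flag, only the first parenthesized group in a string is removed, and a string without one comes back untouched.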

Bring It Down

Time to reduce our result set. By now you should have an array that looks similar to this.

["Homeland", "R.A. Salvatore", "Exile", "R.A. Salvatore", "Sojourn", "R.A. Salvatore", ...]

Items in this array are ordered such that odd-indexed items (1, 3, etc.) are author names and even-indexed items (0, 2, etc.) are book titles. The way I’ll use reduce to handle the items two at a time is actually not too complicated, but if you have never seen how reduce can remove duplicate items, a technique that also depends on an extra argument, segue here.

Array.from(document.querySelectorAll(".bookTitle, .authorName"))
    .filter((entry) => entry.attributes.itemprop)
    .map((entry) => entry.innerText.replace(/\s\(.*?\)\s?/, ""))
    .reduce(
      (result, entry, index, list) =>
        index % 2 == 0 ? result.concat(`${list[index]}, ${list[index + 1]}`) : result,
      []
    )

The technique might seem odd but it’s actually very similar to removing duplicates. Most reduce examples use only the first two arguments, which are result and entry in our example. The index argument is the position of the current entry, and the list argument is the array we are operating over, that is, the mapped results we are chaining on.

So, what do we want to do here? Using the modulo operator to check whether the index is odd or even, we can process the items in the array in couples. If we simply used entry, we would get every item in turn as reduce ran, but we want to skip every second one. How do we do that? If the index is odd, the second branch of the ternary kicks in and we simply return result unchanged. It’s more like result = result. If the index is even, the first branch runs: we access list[index] and list[index + 1], which are the couple we want to process, and store them in result. That part essentially resolves to result = result.concat(...). And result is nothing but an empty array when reduce starts running; that [] at the end is for that. Again, if this looks alien to you, check out my other blog post about removing duplicates with a simple reduce.
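To watch the pairing happen outside the browser, here is the same reduce run on a plain array of strings, using the sample values from earlier. The ternary is spelled out as an if statement purely for readability; the logic is meant to be the same.

```javascript
// The flat, alternating array that the map step produces (sample values).
const flat = [
  "Homeland", "R.A. Salvatore",
  "Exile", "R.A. Salvatore",
  "Sojourn", "R.A. Salvatore",
];

const pairs = flat.reduce((result, entry, index, list) => {
  if (index % 2 === 0) {
    // Even index: a title. Pair it with the author that follows it.
    return result.concat(`${list[index]}, ${list[index + 1]}`);
  }
  // Odd index: an author that was already paired, so pass result along as is.
  return result;
}, []);

console.log(pairs);
// [ 'Homeland, R.A. Salvatore', 'Exile, R.A. Salvatore', 'Sojourn, R.A. Salvatore' ]
```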

Wrapping Up

Adding a join("\n") after reduce and wrapping everything with a console.log should do the trick to get the result we want. The final code is short but it actually has a lot more moving parts than it first seems to have. Here is the Gist of it.

console.log(
  Array.from(document.querySelectorAll(".bookTitle, .authorName"))
    .filter((entry) => entry.attributes.itemprop)
    .map((entry) => entry.innerText.replace(/\s\(.*?\)\s?/, ""))
    .reduce(
      (result, entry, index, list) =>
        index % 2 == 0 ? result.concat(`${list[index]}, ${list[index + 1]}`) : result,
      []
    )
    .join("\n")
);

Last but not least, couple this with a bookmark and you are good to go. If you want to see how you can trigger this from a bookmark, look here.
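For reference, one common way to do this is a bookmarklet: a bookmark whose URL starts with javascript: followed by the code to run. The following is only a sketch under that assumption; the toBookmarklet helper is a hypothetical name, not something taken from the linked post.

```javascript
// Hypothetical helper: turns a snippet of source code into a javascript: URL.
function toBookmarklet(source) {
  // The arrow-function wrapper keeps the snippet's variables out of the page's global scope.
  const wrapped = `(() => { ${source} })();`;
  // Encoding keeps quotes, spaces and newlines safe inside a URL.
  return "javascript:" + encodeURIComponent(wrapped);
}

const url = toBookmarklet('console.log("hello from a bookmark");');
console.log(url.startsWith("javascript:")); // true
console.log(decodeURIComponent(url.slice("javascript:".length)));
// (() => { console.log("hello from a bookmark"); })();
```

You would paste the resulting string into the URL field of a new bookmark. Since the snippet in this post writes to the console, the developer tools need to be open to see the output.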
