“Wait a minute! You said you don’t use conversion tools to make your ePubs!” Well, that was before my clients decided they also want a paperback book from CreateSpace.
Most of the books I format are quite long at over 400 pages with many chapters and hyperlinks to terms used in the text. I really needed an efficient method to construct an ebook as well as format it for print. While it isn’t a perfect solution, Adobe’s CS6 InDesign has provided a decent platform for my efforts.
The learning curve is very steep for InDesign and requires much study and practice. It is the industry standard for print formatting so I think it is worth the effort required to become proficient. I stop short of saying “mastering” it because I believe that could possibly require years.
The InDesign tutorials at Lynda.com are invaluable and well worth the subscription price. Anne-Marie Concepción’s InDesign CS5.5 to EPUB, Kindle, and iPad is very good. She demonstrates the use of Edit All Export Tags in Paragraph Styles. Use it to assign HTML tags and CSS classes to your Character Styles. This tidies up the XHTML code; for example, InDesign will use a span class for italic text you have assigned a Character Style, which is very long and messy. You can instead assign this Character Style to the HTML tag <em>.
The bottom line is that, even though I use InDesign to make the ePub file, there is still lots of work to fine-tune it and complete the opf file. InDesign saves some construction time; however, there is no substitute for understanding the structure of an ePub, proper XHTML code and the CSS template.
It’s not difficult to understand why someone would think that a big publishing company would be a good resource to publish their book. However, Penguin appears to be taking advantage of newbie book authors with a very aggressive contract.
The flat fee of $599 for print and digital formatting is acceptable and reasonable. Interior formatting for print is a complex and time-consuming process. What is not acceptable is owning the rights and taking a life-time royalty.
A new author does not often require a print edition of their book. They will most likely simply require an ePub (digital) edition that they can upload to Amazon, Barnes & Noble, etc. themselves.
In February, 2011, an author and I struggled for several weeks to get a non-fiction book accepted to SW’s Premium Distribution Catalog (SPC). We were both new to self-publishing and the author really wanted to use SW for distribution. She had read about SW on several blogs and forums and determined that it was the best channel for self-publishers.
Therefore, I set about studying their Style Guide. We have a very complex format in the book, including 141 bookmarks and 708 hyperlinks; stripping all formatting (the “Nuclear Method”), as recommended in the Style Guide, wasn’t a viable option for us. I soldiered on, trying to determine the cause of SW’s non-acceptance of our book.
I thought there must be a better way to get published and discovered Amazon and Barnes & Noble accepted EPUB. I taught myself how to create an EPUB manually, without the aid of conversion software. The author (I am working with) submitted my EPUB file to various retailers, but she really wanted to get on SW’s Premium Distribution Catalog. [click here for the whole story and the solution I found]
Perusing various self-pub blogs, I notice that SW seems to remain the self-publisher’s favorite and preferred distribution channel despite the agony of trying to get in their Premium Distribution Catalog. I think it is time to reveal what I have discovered about Smashwords and the Meatgrinder.
The Smashwords Process
Smashwords requires its users to submit their book as a Microsoft Word document [.doc only] after carefully formatting it according to their Style Guide. When you upload a Word document to SW and click “publish”, it is immediately inspected by their AutoVetter and you receive “instant feedback” about any “errors” in formatting it finds. There are 12 common “errors” listed in the Style Guide that “will delay or prevent your book’s acceptance in the Smashwords Premium Catalog.”
If and/or when you pass AutoVetter muster, your book is run through their so-called Meatgrinder for conversion to the e-book format of your choice.
From Smashwords Style Guide:
We affectionately call our file conversion system Meatgrinder.
Your source file, a Microsoft Word .doc document, goes in one end of the Meatgrinder and comes out the other end as multiple DRM-free digital book files…
To sum up, Smashwords strongly encourages its users to purchase and use Microsoft Word as their word processor, and puts the onus on the user to find (“AutoVetter analyzes your book for several potential problems, and its analysis is usually accurate.” [emphasis added]), remove and correct formatting styles that their Meatgrinder cannot handle.
My search for a solution to AutoVetter errors
The book that I was trying to publish with Smashwords never made it to the Premium Catalog and the author wanted to know why not. I downloaded the EPUB file from SW and began my hunt. It was then that I noticed something interesting. I saw the word “calibre” many, many times in the code of the files comprising the EPUB. I wasn’t sure what to make of it at the time. I was new to self-publishing, EPUB construction and conversion tools but I had discovered Calibre, an open source e-book library manager that can “view, convert and catalog ebooks in most of the major ebook formats”, and had used it to convert my EPUB to mobi in order to ensure our book looked good on the Kindle.
Why was “calibre” in the code of our Smashwords EPUB, I wondered. Let me show you the first few lines of the content.opf file in the EPUB created by SW.
(I include this alternate text code for an image because it amuses me. Can you imagine the automated reading voice on a blind e-reader’s device saying, “t-m-p-underscore-d-e-f-5-7-d-5-8-5……..” as the alternate text for your book cover?! I wonder if they can fast forward?)
The preceding samples of code are simply that—samples. I chose not to bore the reader with too much of it. Suffice it to say that the word “calibre” occurs 268 times in the EPUB created by Smashwords.
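A count like this does not have to be done by hand. The following is a minimal Python sketch, not the method actually used for the figure above: it opens an EPUB as a zip archive and tallies a search term in each text-like file. The function name `count_term_in_epub` and the list of file suffixes are assumptions made for illustration.

```python
import zipfile


def count_term_in_epub(epub_file, term="calibre"):
    """Return {member_name: hits} for text-like files that contain `term`.

    `epub_file` may be a path or a file-like object. The match is
    case-insensitive, and binary members such as images are skipped.
    """
    # Assumed set of text file types typically found inside an EPUB.
    text_suffixes = (".opf", ".ncx", ".html", ".xhtml", ".htm", ".css", ".xml")
    needle = term.lower()
    counts = {}
    with zipfile.ZipFile(epub_file) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(text_suffixes):
                continue
            text = archive.read(name).decode("utf-8", errors="replace")
            hits = text.lower().count(needle)
            if hits:
                counts[name] = hits
    return counts
```

Summing the values of the returned dictionary gives the total for the whole book, and the per-file breakdown shows where the term is concentrated (the opf metadata, the stylesheet, or the chapter files).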
What is “OpenOffice” doing in my Smashwords-generated EPUB?
This was my second question. How did OpenOffice get into the metadata when we had used Microsoft Word as recommended by Smashwords? We now know that SW uses Calibre to convert documents to various e-book formats, but Calibre cannot convert a Word document. It can, however, convert an OpenOffice (odt) document although I don’t think that is the method used by SW.
Note the first few lines of one of the html files (chapter/section) of the EPUB created by SW.
<?xml version='1.0' encoding='utf-8'?>
<meta name="generator" content="HTML Tidy for Linux (vers 7 December 2008), see www.w3.org"/>
<meta http-equiv="CONTENT-TYPE" content="text/html; charset=utf-8"/>
<meta name="GENERATOR" content="OpenOffice.org 3.0 (Unix)"/>
<meta name="AUTHOR" content="Mary"/>
<meta name="CREATED" content="20110502;18110000"/>
<meta name="CHANGED" content="20110503;11191600"/>
First, notice the fourth line says the content was generated by “HTML Tidy for Linux.”
Tidy is a widely known, and used, open source program that takes a stab at finding and fixing HTML errors in code submitted to it. I say “takes a stab at” because it is mostly used by beginner programmers still dazed and confused by website construction. Tidy is not relied upon as an accurate “spellchecker” of HTML code by professional programmers.
Second, notice the sixth line indicates that content was generated by “OpenOffice.org 3.0 (Unix)”.
Huh? What is OpenOffice doing here?! Also, note “Unix”. When I opened a Word document in OpenOffice and saved it as HTML, this is what I found in the resultant code:
Notice the GENERATOR is OpenOffice.org 3.3 (Win32) [I am using the current version of OO, 3.3 and I am using it on a Windows computer not a UNIX computer.]
[You might wonder how two different “generators” and “contents” could be in the same file; XML is case sensitive so “generator” and “GENERATOR” are two different entities.]
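To see this for yourself, you can list every named meta tag in a chapter file. This is a small sketch using Python's standard `html.parser` module; the class name `MetaCollector` and the function `collect_meta` are made up for this example. Attribute values keep their original case, so “generator” and “GENERATOR” come back as two separate entries.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect (name, content) pairs from <meta name="..."> tags."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Skip http-equiv style meta tags, which carry no name attribute.
        if "name" in attributes:
            self.metas.append((attributes["name"], attributes.get("content", "")))


def collect_meta(html_text):
    """Return the named meta tags of an HTML document in source order."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.metas
```

Run against the lines quoted above, this would report the Tidy entry under “generator” and the OpenOffice entry under “GENERATOR”, followed by the AUTHOR, CREATED and CHANGED entries.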
It would seem that SW takes your Word document and converts it to HTML in OpenOffice.
Third, notice the next line that says that the “author” is “Mary”.
This was our fault. We didn’t strip the metadata from Word before submitting it to SW, so I showed up as the author of this document. Hopefully this won’t show up in a library database somewhere because I am not the author of this book. The “CREATED” and “CHANGED” entries reveal the date and time our book was “created” and the last time it was “changed”, no doubt while we were trying to correct AutoVetter errors.
From the Smashwords Style Guide:
Q: I don’t use Microsoft Word. Can I still publish on Smashwords?
Yes, though Microsoft Word is your best option. If you want to ensure the best results for your ebook, and you don’t use Microsoft Word, consider investing in a copy. You can usually find it for around $150 or less. Word will give you the greatest control over your formatting by allowing you to follow the Smashwords Style Guide. If your time is valuable to you, and you plan to publish multiple ebooks with Smashwords, Word is a good investment.
This strikes me as odd. Smashwords insists you submit a .doc file and suggests you use MS Word but they use OpenOffice internally. Why?
Calibre can convert an ODT file; however, the best sources to convert from are [quoted from Calibre help] “in decreasing order of preference: LIT, MOBI, EPUB, FB2, HTML, PRC, RTF, PDB, TXT, PDF.” HTML is fifth in order of preference but ODT isn’t on the list, so that might explain why SW takes the extra step of using OpenOffice to convert documents to HTML before using Calibre to convert to e-book formats.
“Check your work” (Mark, did you mean my work or your work?)
Mark Coker states in the SW Style Guide that “EPUB is your most important format, and is a requirement for inclusion in the Premium Catalog.” It also instructs users to “check the quality of the EPUB” (the one created for you by SW) by viewing it on a Kindle or Adobe Digital Editions.
In addition, from the Smashwords Style Guide:
CHECK FOR EPUBCHECK COMPLIANCE: If you want your book distributed to the Apple iBookstore, the EPUB file we generated for you must pass EPUBCHECK, which is an industry standard compliance validation tool. We’ve built a lot of magic into Meatgrinder that allows us to automatically repair many EPUBCHECK problems without your intervention, but we can’t fix them all.
Coker advises submitting the SW created EPUB to a free on-line validation (epubcheck) service. He stops short of expanding his 30-page Style Guide to include instructions on how to use epubcheck yourself. If the online service reports errors, the SG advises you “take a deep breath” and “try to study and understand” the “incomprehensible spaghetti language” of the “confusing error” messages, go to the “official EPUBCHECK Error Reporting Page to learn more about the errors”, declares “The confusing errors are stupid” and suggests that you go back to the “Nuclear Method” and start over.
When I have to essentially reformat my book to comply with the Style Guide, search for and fix errors, verify the appearance of my book, and use epubcheck to validate the EPUB, the Smashwords service doesn’t feel free anymore. I’m supposed to suck it up because it’s free but it feels like a lot of unnecessary extra work that doesn’t make sense to me anymore and my time is not free.
Smashwords accepts .doc files from its users
Recommends Microsoft Word and similar word processor
Recommends studying their Style Guide (SG) for successful submission of document
Checks file with AutoVetter for errors in formatting
Directs user back to SG if errors are found
My hypothesis regarding Smashwords’ file conversion process:
Converts .doc to HTML using OpenOffice
Checks and fixes HTML using Tidy
Converts HTML to EPUB using Calibre
Directs user to verify appearance and content of EPUB using Kindle, Adobe Digital Editions and epubcheck
Users are on their own to decipher and fix any errors found in the Smashwords-created EPUB and reported by epubcheck, and likewise for any errors found on Kindle or Adobe Digital Editions
My discovery of this information began with trying to find and correct errors in a complex non-fiction book I was trying to get in Smashwords Premium Distribution Catalog (SPC). It turned out that the author and I decided that the effort was too great to justify the work involved, especially since we had already created a beautiful, well-functioning EPUB.
Others will, no doubt, say that they have been able to successfully format their novels and gain acceptance to SPC without much effort. However, every day I read in forums about authors struggling to get their books through the Smashwords process. Mark Coker should simply accept EPUB files. He submits an EPUB file to Apple, Sony, Barnes & Noble and could submit an EPUB to Amazon which they convert to mobi. Why can’t Smashwords authors submit an EPUB they create and validate?
Accepting EPUB files would streamline the process for both Smashwords and users and it would allow authors real control over the format and metadata of their books.
Coker declares in the Style Guide:
At Smashwords, our motto is “your book, your way.”
When I entered the self-publishing business, I was working with an author who signed up with Smashwords (SW) and was determined to join the many authors who believe that Smashwords is the best distributor of e-books for self-publishers. They may indeed be a great choice if you can get on what Smashwords calls their “Premium Distribution List.”
When we first submitted a Microsoft Word document, we were approximately #187 in the queue. After a few hours, our document was rejected for errors. We studied the Smashwords Style Guide and began our quest for advanced formatting skills in MS Word. On our next document submission to Smashwords, we were #362 in the queue and a day later were rejected again with “errors” in our document.
After hours and days of trying to find the “errors” (hyperlinks that didn’t work) in our book, I decided to see what file formats other distributors were accepting for e-books. My research led me to the realization that the EPUB format was, and is, the industry standard for electronic books. I dislike and distrust automated conversion services, so I set about learning how to convert our Word document to the EPUB format manually.
The learning curve was steep but I had our first EPUB completed in a couple of weeks, verified as error-free with epubcheck and we submitted it directly to Amazon. It was accepted by Amazon in less than 24 hours.
Smashwords requires MS Word docs
So, why does Smashwords continue to require files submitted to them be MS Word documents? Only Mark Coker, the founder of Smashwords, can answer that and he isn’t saying. We have repeatedly asked him to allow the submission of EPUB files. He repeatedly insists that if we studied the Style Guide, we could fix the “errors” in our document and attain Premium Distribution.
I quickly lost all interest in using Smashwords as a distributor. I grew weary of the mysterious “meatgrinder” process that SW puts documents through. I didn’t want them to “smash” the words of our book through their “meatgrinder” any longer. The author of the book felt the same way. “We have an EPUB that is perfect and beautiful, why can’t we submit it to Smashwords?”
You may be thinking, “what a loser! Why can’t you fix the errors?” In our defense, there are 708 hyperlinks in the book. It is non-fiction and there are 141 terms that are defined and hyperlinked in the text.
My author just couldn’t let the possibility of Smashwords as a distributor go, so she wrote to Mark again and pleaded for clarification about the repeated rejection of our document. Mark was nice enough to look at the problem and declared that there were “hidden bookmarks” in our document. Aha! She had read about this “problem” in their forum and directed me to look into it.
There were suggested fixes. Chief among them: delete all the bookmarks in your document using a complex Visual Basic macro. This is what Microsoft has to say about removing hidden bookmarks.
WARNING: Microsoft Word uses hidden bookmarks for table of contents entries, cross-references, and captions. After stripping the hidden bookmarks from a document with cross-references, and then updating fields, all the references are lost. In their place you get one of the following error messages: “Error! Reference source not found” or “Error! Bookmark not defined.”
That was NOT an option. I was not going to delete 141 bookmarks and re-create them, so I kept looking. Mark had pointed to a specific line in SW’s code that demonstrated what he believed to be the problem with our document: hidden bookmarks. I looked at the code in the failed EPUB. [An EPUB file is simply a zipped file. If you want to see the files inside an EPUB, simply change the file extension from .epub to .zip and you can extract the files. This is very useful if you want to study the method used by other authors/constructors of EPUB files.]
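If you are comfortable with a little scripting, you don’t even need to rename the file. Here is a minimal Python sketch using the standard `zipfile` module; the function names `list_epub_contents` and `extract_epub` are just illustrative choices.

```python
import zipfile


def list_epub_contents(epub_file):
    """Return the names of the files stored inside an EPUB, in archive order."""
    # An EPUB is a zip archive, so zipfile can open it whatever its extension.
    with zipfile.ZipFile(epub_file) as archive:
        return archive.namelist()


def extract_epub(epub_file, target_dir):
    """Unpack every file in the EPUB into target_dir for inspection."""
    with zipfile.ZipFile(epub_file) as archive:
        archive.extractall(target_dir)
```

Calling `list_epub_contents` with the path to a book typically shows a `mimetype` file, a `META-INF/container.xml` file, the opf file, and the chapter, stylesheet and image files.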
I opened our Word document and saved it as a “Web Page, filtered” document and looked at the code. The offending line Mark had identified did indeed have an error in it, but it wasn’t a “hidden bookmark.” It was an extra bookmark. Word will kindly do whatever you ask of it even if it makes no sense. We have many, many bookmarks and hyperlinks in our book. Mistakes were made adding bookmarks. When a link didn’t go where it should, we added the right bookmark. What we didn’t realize is that adding a bookmark did not remove the first bookmark; it simply added another one. Thus, when I looked at the HTML code, there were two or more “<a name=[bookmark]>.”
If you find you have the same problem, here’s how to fix it. Click on the word that is bookmarked. Click Insert/Bookmark. The bookmark that is currently being used will be highlighted. Click Delete (even if it is the right bookmark). Click Insert/Bookmark again; if there is another bookmark on that word, it will be highlighted. Click Delete. Repeat this process until there is not a bookmark highlighted in the open dialog box. Next, click Insert/Bookmark and Add the correct bookmark. Problem solved.
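With hundreds of bookmarks, hunting for these by eye is tedious. Below is a rough Python sketch for scanning the “Web Page, filtered” HTML for places where two or more named anchors pile up on the same spot. It assumes the extra bookmarks appear as `<a name=...>` tags with no visible text between them, which matches what is described above but may not cover every way Word writes them out. The names `StackedAnchorFinder` and `find_stacked_bookmarks` are invented for this example.

```python
from html.parser import HTMLParser


class StackedAnchorFinder(HTMLParser):
    """Find runs of <a name="..."> tags that have no visible text between them."""

    def __init__(self):
        super().__init__()
        self.pending = []   # named anchors seen since the last visible text
        self.stacked = []   # groups of two or more anchors on the same spot

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            name = dict(attrs).get("name")
            if name:  # plain hyperlinks (href only) are ignored
                self.pending.append(name)

    def handle_data(self, data):
        # Visible text ends the current run of anchors.
        if data.strip():
            self._flush()

    def _flush(self):
        if len(self.pending) > 1:
            self.stacked.append(list(self.pending))
        self.pending = []

    def close(self):
        super().close()
        self._flush()


def find_stacked_bookmarks(html_text):
    """Return a list of bookmark-name groups that share one location."""
    finder = StackedAnchorFinder()
    finder.feed(html_text)
    finder.close()
    return finder.stacked
```

Each group in the result is a list of bookmark names attached to the same word, which tells you where to look in the Word document.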
It was too late for Smashwords and Mark Coker, though. The author had finally grown tired of beating her head against the “meatgrinder.” She already had her 8 books published on Amazon, Barnes & Noble, iBooks, LSI, xiinxii and her website.
Michael Connelly’s The Black Echo is a nice read. I don’t buy much fiction, but I couldn’t resist the impulse to purchase this book for $.99 when I saw it advertised on my Nook Color. I was not disappointed; I am really enjoying it and find it hard to put down.
I wanted to “reward” Michael Connelly for his decision to drop the price. However, it is a purchase I would not have made if not for seeing the ad. I am beginning to understand the frustration of independent self-publishers, like my clients, with the marketing strategies of Barnes & Noble and Amazon.
You can’t blame them for choosing to promote “big” names like Connelly. They want to generate sales and choose to advertise known, successful authors. Let’s hope dropping prices will level the playing field for all authors.