Why doesn’t Smashwords accept EPUB files?

In February, 2011, an author and I struggled for several weeks to get a non-fiction book accepted to SW’s Premium Distribution Catalog (SPC). We were both new to self-publishing and the author really wanted to use SW for distribution. She had read about SW on several blogs and forums and determined that it was the best channel for self-publishers.

Therefore, I set about studying their Style Guide. The book has a very complex format, including 141 bookmarks and 708 hyperlinks, so stripping all formatting (the “Nuclear Method” recommended in the Style Guide) wasn’t a viable option for us. I soldiered on, trying to determine why SW would not accept our book.

I thought there must be a better way to get published and discovered that Amazon and Barnes & Noble accepted EPUB. I taught myself how to create an EPUB manually, without the aid of conversion software. The author I am working with submitted my EPUB file to various retailers, but she really wanted to get into SW’s Premium Distribution Catalog. [click here for the whole story and the solution I found]

Perusing various self-pub blogs, I notice that SW remains the self-publisher’s preferred distribution channel despite the agony of trying to get into their Premium Distribution Catalog. I think it is time to reveal what I have discovered about Smashwords and the Meatgrinder.

The Smashwords Process

Smashwords requires its users to submit their book as a Microsoft Word document [.doc only] after carefully formatting it according to their Style Guide. When you upload a Word document to SW and click “publish”, it is immediately inspected by their AutoVetter and you receive “instant feedback” about any “errors” in formatting it finds. There are 12 common “errors” listed in the Style Guide that “will delay or prevent your book’s acceptance in the Smashwords Premium Catalog.”

If and when you pass AutoVetter muster, your book is run through their so-called Meatgrinder for conversion to the e-book format of your choice.

From Smashwords Style Guide:
We affectionately call our file conversion system Meatgrinder.

Your source file, a Microsoft Word .doc document, goes in one end of the Meatgrinder and comes out the other end as multiple DRM-free digital book files…

[Image: the Meatgrinder process, copied from the Style Guide]

Smashwords takes your original Microsoft Word .doc source file and converts it into multiple ebook formats such as .EPUB, PDF, .RTF, .PDB, .MOBI, LRF and TXT, as well as into online HTML and Javascript formats.

To sum up, Smashwords strongly encourages its users to purchase and use Microsoft Word as their word processor, and puts the onus on the user to find, remove and correct the formatting styles that the Meatgrinder cannot handle. (“AutoVetter analyzes your book for several potential problems, and its analysis is usually accurate.” [emphasis added])

My search for a solution to AutoVetter errors

The book that I was trying to publish with Smashwords never made it to the Premium Catalog, and the author wanted to know why not. I downloaded the EPUB file from SW and began my hunt. It was then that I noticed something interesting. I saw the word “calibre” many, many times in the code of the files comprising the EPUB. I wasn’t sure what to make of it at the time. I was new to self-publishing, EPUB construction and conversion tools, but I had discovered Calibre, an open-source e-book library manager that can “view, convert and catalog ebooks in most of the major ebook formats”, and had used it to convert my EPUB to mobi in order to ensure our book looked good on the Kindle.

Why was “calibre” in the code of our Smashwords EPUB, I wondered. Let me show you the first few lines of the content.opf file in the EPUB created by SW.

<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uuid_id">
  <metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:opf="http://www.idpf.org/2007/opf" xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:calibre="http://calibre.kovidgoyal.net/2009/metadata"
xmlns:dc="http://purl.org/dc/elements/1.1/">

These lines are created by Calibre when you convert a document to EPUB.

Now look at these sample lines from the CSS stylesheet in the SW-created EPUB:

.calibre1 {
    display: block
    }
.calibre2 {
    height: 11;
    width: 64
    }
.calibre3 {
    font-weight: bolder
    }
.calibre4 {
    display: block;
    page-break-after: always
    }

This CSS stylesheet was created by Calibre. I confirmed this by converting an OpenOffice document to EPUB with Calibre myself.

Note this line in the toc.ncx file of the SW created EPUB:

<meta content="calibre (0.6.34)" name="dtb:generator"/>

Calibre was the generator of this file. Also, note the version number of Calibre, 0.6.34. The latest version of Calibre is 0.8.13.

Here are several excerpts of the table of contents file in the SW created EPUB:

<body dir="ltr" class="calibre">
<h3 class="western4" id="calibre_toc_2">
<br class="calibre1" />
alt="tmp_def57d58574ba2c10a67ad3464da2310_qryAnY_html_m5a1bb885.jpg" class="calibre2" />

(I include this alternate text code for an image because it amuses me. Can you imagine the automated reading voice on a blind reader’s device saying, “t-m-p-underscore-d-e-f-5-7-d-5-8-5……..” as the alternate text for your book cover?! I wonder if they can fast forward?)

The preceding samples of code are simply that—samples. I chose not to bore the reader with too much of it. Suffice it to say that the word “calibre” occurs 268 times in the EPUB created by Smashwords.
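A count like this is easy to reproduce, because an EPUB is an ordinary ZIP archive. The following is a minimal sketch, not the method used for the figure above: the function name `count_in_epub` is made up for illustration, and it assumes a case-insensitive match over every file in the archive decoded as UTF-8.

```python
import io
import zipfile


def count_in_epub(epub, needle):
    """Return {file name: count} for each file in the EPUB containing needle.

    `epub` may be a path or a file-like object. Matching is case-insensitive.
    """
    needle = needle.lower()
    hits = {}
    with zipfile.ZipFile(epub) as archive:
        for name in archive.namelist():
            # Undecodable bytes (e.g. in images) are simply skipped.
            text = archive.read(name).decode("utf-8", errors="ignore").lower()
            found = text.count(needle)
            if found:
                hits[name] = found
    return hits


if __name__ == "__main__":
    # Tiny in-memory archive standing in for a real EPUB.
    demo = io.BytesIO()
    with zipfile.ZipFile(demo, "w") as z:
        z.writestr("stylesheet.css", ".calibre1 { display: block }")
    demo.seek(0)
    per_file = count_in_epub(demo, "calibre")
    print(per_file, "total:", sum(per_file.values()))
```

For a real book, pass the path of the downloaded file instead, for example `count_in_epub("book.epub", "calibre")`, where `book.epub` is a placeholder name.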

What is “OpenOffice” doing in my Smashwords-generated EPUB?

This was my second question. How did OpenOffice get into the metadata when we had used Microsoft Word as recommended by Smashwords? We now know that SW uses Calibre to convert documents to various e-book formats, but Calibre cannot convert a Word document. It can, however, convert an OpenOffice (odt) document although I don’t think that is the method used by SW.

Note the first few lines of one of the html files (chapter/section) of the EPUB created by SW.

<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="HTML Tidy for Linux (vers 7 December 2008), see www.w3.org"/>
<meta http-equiv="CONTENT-TYPE" content="text/html; charset=utf-8"/>

<meta name="GENERATOR" content="OpenOffice.org 3.0 (Unix)"/>
<meta name="AUTHOR" content="Mary"/>
<meta name="CREATED" content="20110502;18110000"/>
<meta name="CHANGED" content="20110503;11191600"/>

First, notice the fourth line says the content was generated by “HTML Tidy for Linux.”
Tidy is a widely known, and used, open source program that takes a stab at finding and fixing HTML errors in code submitted to it. I say “takes a stab at” because it is mostly used by beginner programmers still dazed and confused by website construction. Tidy is not relied upon as an accurate “spellchecker” of HTML code by professional programmers.

Second, notice the sixth line indicates that content was generated by “OpenOffice.org 3.0 (Unix)”.
Huh? What is OpenOffice doing here?! Also, note “Unix”. When I opened a Word document in OpenOffice and saved it as HTML, this is what I found in the resultant code:

<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=windows-1252">
	<TITLE></TITLE>
	<META NAME="GENERATOR" CONTENT="OpenOffice.org 3.3  (Win32)">
	<META NAME="AUTHOR" CONTENT="Mary ">
	<META NAME="CREATED" CONTENT="20110721;11221687">
	<META NAME="CHANGEDBY" CONTENT="Mary ">
	<META NAME="CHANGED" CONTENT="20110724;19175749">

Notice the GENERATOR is OpenOffice.org 3.3 (Win32). [I am using the current version of OO, 3.3, and I am using it on a Windows computer, not a UNIX computer.]
[You might wonder how two different “generator” entries could be in the same file. An XHTML file may contain any number of meta elements, and XML is case sensitive, so “generator” and “GENERATOR” are distinct values; each tool apparently added its own.]
It would seem that SW takes your Word document and converts it to HTML in OpenOffice.

Third, notice the next line that says that the “author” is “Mary”.
This was our fault. We didn’t strip the metadata from Word before submitting it to SW, so I showed up as the author of this document. Hopefully this won’t show up in a library database somewhere, because I am not the author of this book. The “CREATED” and “CHANGED” entries reveal the date and time our book was “created” and the last time it was “changed”, no doubt while trying to correct AutoVetter errors.
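The three observations above all come from reading meta tags by eye. A short script can list them for any chapter file. This is a minimal sketch using Python's standard `html.parser`; the names `MetaCollector` and `list_meta` are made up for illustration.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect (name, content) pairs from <meta name=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names but leaves values
        # untouched, so "generator" and "GENERATOR" stay distinguishable.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if "name" in attributes:
            self.pairs.append((attributes["name"], attributes.get("content", "")))


def list_meta(html_text):
    """Return every named meta tag in the document, in source order."""
    collector = MetaCollector()
    collector.feed(html_text)
    return collector.pairs


if __name__ == "__main__":
    sample = (
        '<html><head>'
        '<meta name="GENERATOR" content="OpenOffice.org 3.0 (Unix)"/>'
        '<meta name="AUTHOR" content="Mary"/>'
        '</head></html>'
    )
    for name, content in list_meta(sample):
        print(f"{name}: {content}")
```

Running this over each HTML file of an EPUB before submitting it anywhere is one way to catch leftover AUTHOR, CREATED and CHANGED entries.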

From the Smashwords Style Guide:

Q: I don’t use Microsoft Word. Can I still publish on Smashwords?
Yes, though Microsoft Word is your best option. If you want to ensure the best results for your ebook, and you don’t use Microsoft Word, consider investing in a copy. You can usually find it for around $150 or less. Word will give you the greatest control over your formatting by allowing you to follow the Smashwords Style Guide. If your time is valuable to you, and you plan to publish multiple ebooks with Smashwords, Word is a good investment.

This strikes me as odd. Smashwords insists you submit a .doc file and suggests you use MS Word but they use OpenOffice internally. Why?

Smashwords says it can convert books to EPUB, RTF, PDF, PDB, MOBI, LRF, TXT, Javascript, HTML and “more coming”. [Note: OpenOffice can convert .doc files to RTF, PDF, TXT and HTML, natively.]

Calibre can convert books to EPUB, FB2, OEB, LIT, LRF, MOBI, HTMLZ, PDB, PML, RB, PDF, RTF, SNB, TCR, TXT, TXTZ.

Calibre can convert an ODT file. However, the best sources to convert from are [quoted from Calibre help] “in decreasing order of preference: LIT, MOBI, EPUB, FB2, HTML, PRC, RTF, PDB, TXT, PDF.” HTML is fifth in order of preference but ODT isn’t on the list, so that might explain why SW takes the extra step of using OpenOffice to convert documents to HTML before using Calibre to convert to e-book formats.

“Check your work” (Mark, did you mean my work or your work?)

Mark Coker states in the SW Style Guide that “EPUB is your most important format, and is a requirement for inclusion in the Premium Catalog.” The guide also instructs users to “check the quality of the EPUB” (the one created for you by SW) by viewing it on a Kindle or Adobe Digital Editions.

In addition, from the Smashwords Style Guide:

CHECK FOR EPUBCHECK COMPLIANCE: If you want your book distributed to the Apple iBookstore, the EPUB file we generated for you must pass EPUBCHECK, which is an industry standard compliance validation tool. We’ve built a lot of magic into Meatgrinder that allows us to automatically repair many EPUBCHECK problems without your intervention, but we can’t fix them all.

Coker advises submitting the SW-created EPUB to a free online validation (epubcheck) service. He stops short of expanding his 30-page Style Guide to include instructions on how to use epubcheck yourself. If the online service reports errors, the SG advises you to “take a deep breath” and “try to study and understand” the “incomprehensible spaghetti language” of the “confusing error” messages, and to go to the “official EPUBCHECK Error Reporting Page to learn more about the errors”. It then declares “The confusing errors are stupid” and suggests that you go back to the “Nuclear Method” and start over.
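Some of what a validator looks at is plain container structure, which can be inspected without any special tool. The sketch below is not epubcheck and is no substitute for it. It only tests three rules of the EPUB container format: the `mimetype` file comes first and is stored uncompressed, `META-INF/container.xml` exists, and every manifest entry in the OPF points at a file that is actually in the archive. The function name `quick_epub_checks` is made up, and manifest paths are assumed to be plain relative paths (no URL encoding, no `../`).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_NS = "{http://www.idpf.org/2007/opf}"


def quick_epub_checks(epub):
    """Return a list of problems found by a few basic structural checks."""
    problems = []
    with zipfile.ZipFile(epub) as archive:
        infos = archive.infolist()
        first = infos[0] if infos else None
        # Rule 1: "mimetype" is the first entry, uncompressed, exact content.
        if first is None or first.filename != "mimetype":
            problems.append("mimetype is not the first file in the archive")
        elif first.compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
        elif archive.read("mimetype") != b"application/epub+zip":
            problems.append("mimetype has the wrong content")

        names = set(archive.namelist())
        # Rule 2: container.xml exists and names the OPF package file.
        if "META-INF/container.xml" not in names:
            problems.append("META-INF/container.xml is missing")
            return problems
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find(f".//{CONTAINER_NS}rootfile")
        opf_path = rootfile.get("full-path") if rootfile is not None else None
        if opf_path not in names:
            problems.append("OPF package file not found")
            return problems

        # Rule 3: every manifest href (relative to the OPF) is in the archive.
        opf_dir = opf_path.rsplit("/", 1)[0] + "/" if "/" in opf_path else ""
        package = ET.fromstring(archive.read(opf_path))
        for item in package.iter(f"{OPF_NS}item"):
            target = opf_dir + item.get("href", "")
            if target not in names:
                problems.append(f"manifest item missing from archive: {target}")
    return problems
```

An empty list means only that these three checks passed; the XHTML inside the book can still be invalid.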

When I have to essentially reformat my book to comply with the Style Guide, search for and fix errors, verify the appearance of my book, and use epubcheck to validate the EPUB, the Smashwords service doesn’t feel free anymore. I’m supposed to suck it up because it’s free, but it feels like a lot of unnecessary extra work that no longer makes sense to me, and my time is not free.

Summary

  • Smashwords accepts .doc files from its users
  • Recommends Microsoft Word or a similar word processor
  • Recommends studying their Style Guide (SG) for successful submission of a document
  • Checks the file with AutoVetter for errors in formatting
  • Directs the user back to the SG if errors are found
  • My hypothesis regarding Smashwords’ file conversion process:
    • Converts .doc to HTML using OpenOffice
    • Checks and fixes HTML using Tidy
    • Converts HTML to EPUB using Calibre
  • Directs the user to verify appearance and content of the EPUB using Kindle, Adobe Digital Editions and epubcheck
  • Users are on their own to decipher and fix any errors found in the Smashwords-created EPUB and reported by epubcheck, likewise for any errors found on Kindle or Adobe Digital Editions
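The three-step hypothesis in the list above can be written out as a chain of command-line calls. This is purely illustrative and says nothing certain about how Smashwords actually runs its system. It assumes LibreOffice's `soffice` for the office step (the EPUB metadata names OpenOffice.org 3.0, whose command line may differ), HTML Tidy's `tidy`, and Calibre's `ebook-convert`. The function name, the `work` directory and the file names are made up. The function only builds the commands; it does not run them.

```python
from pathlib import Path


def hypothesized_pipeline(doc_path, work_dir="work"):
    """Build (but do not run) one command per step of the hypothesized chain."""
    doc = Path(doc_path)
    work = Path(work_dir)
    html = work / (doc.stem + ".html")          # written by the office suite
    tidied = work / (doc.stem + ".tidy.html")   # written by Tidy
    epub = work / (doc.stem + ".epub")          # written by Calibre
    return [
        # Step 1: .doc -> HTML with a headless office suite (assumed tool).
        ["soffice", "--headless", "--convert-to", "html",
         "--outdir", str(work), str(doc)],
        # Step 2: clean the HTML and emit XHTML with HTML Tidy.
        ["tidy", "-asxhtml", "-utf8", "-o", str(tidied), str(html)],
        # Step 3: HTML -> EPUB with Calibre's command-line converter.
        ["ebook-convert", str(tidied), str(epub)],
    ]


if __name__ == "__main__":
    for command in hypothesized_pipeline("book.doc"):
        print(" ".join(command))
```

Anyone who wants to run the commands should note that `tidy` exits with a non-zero status when it only reports warnings, so its exit code cannot be treated as a simple pass or fail.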

Conclusion

My discovery of this information began with trying to find and correct errors in a complex non-fiction book I was trying to get into Smashwords’ Premium Distribution Catalog (SPC). In the end, the author and I decided that the result did not justify the effort involved, especially since we had already created a beautiful, well-functioning EPUB.

Others will, no doubt, say that they have been able to successfully format their novels and gain acceptance to SPC without much effort. However, every day I read in forums about authors struggling to get their books through the Smashwords process. Mark Coker should simply accept EPUB files. He submits an EPUB file to Apple, Sony and Barnes & Noble, and could submit an EPUB to Amazon, which they convert to mobi. Why can’t Smashwords authors submit an EPUB they create and validate?

Accepting EPUB files would streamline the process for both Smashwords and users and it would allow authors real control over the format and metadata of their books.

Coker declares in the Style Guide:

At Smashwords, our motto is “your book, your way.”

It doesn’t feel like that to me.

18 thoughts on “Why doesn’t Smashwords accept EPUB files?”

  1. Mary, for those who hand-code their book “chapters” and build their own ePub files, you can avoid all of these errors. I use Calibre to “explode” the ePub and allow me to edit the individual files, but you won’t find any reference to Calibre when doing that. I also make a point of editing the contents.opf and toc.ncx files to be sure that they are squeaky-clean, too. But the MAIN reason is that I NEVER use an auto-generated Table of Contents, so there are never any “foreign” entries. The easiest way to achieve this is to use one of many ePub-building tools (I use Anthemion eCub) and allow it to auto-generate a TOC… then pull that into an html editor and hack it. Replace ALL the auto-generated stuff with your own. Then, from that point on, always select “Do not generate a TOC” and “Do not create a css file” when you build your ePub. The other advantage is that you can then have a “pretty” TOC instead of the rather bland auto-generated ones, more in keeping with the styles used throughout the book, and without any auto-generated css styles intruding on your work, and messing up your chapter file code.

  2. Sorry to add to an already-lengthy reply. Once you have built an ePub without any “foreign” file entries, you still have to validate it – using the online service at threepress.org is easy but the cryptic nature of the error messages may seem daunting to many authors. It’s important to understand that ePub validation is all about Xhtml compliance, which is quite different to plain-vanilla html. Most of the errors (for me) were about the way I confused block-level xhtml tags with inline-level tags… I had block-level tags nested inside inline-level ones and vice-versa, but the error messages allowed me to track them all down.
    Then, having avoided Smashwords and built a fully-validated ePub, you next have to find publication outlets. Amazon will not accept an ePub as a conversion-to-AZW source, so you will need to use a tool like Calibre to convert your ePub to MOBI for them… it does an excellent job.

  3. Lindsay,
    Thanks for your feedback!

    I’m assuming you mean the error we got in Smashwords about the “hidden” bookmarks which I addressed in this post. I don’t use any conversion tools to create my epubs. I clean the MS Word ToC if one exists in the document. I manually create all of the files in the EPUB, including the CSS, NCX and OPF files.

    When I am ready to validate the EPUB, I use epubcheck in a command prompt. I have found out that epubcheck not only finds errors in XHTML, but also in the structure of the EPUB itself: for instance, documents in the EPUB that are not referenced in the OPF, and problems in the structure of the NCX. The XHTML errors can be daunting. As I have said before, I open my text xhtml files in Google Chrome as it stops on errors and reports the line they are found on. Easy peasy.

    I have submitted EPUB files to Amazon since February, 2011. To check the appearance of my EPUBs on Kindle devices, I have used Calibre to convert my EPUBs to the mobi format, but I have found that Kindlegen reports things that Amazon considers to be errors and that epubcheck does not. Many things that work just fine in EPUB are not supported by the Kindle.

    I don’t understand your use of Calibre to “explode the epub” since you say in this Kindle forum that you hand-code your epubs. I suppose you are editing epubs not creating them from scratch as I am since this code you submitted to the forum:

    earths-blood.epub/tmp_af7e6e36650d4b06688430391833559a_cJg63I.ch.fix ed.fc.tidied.stylehacked.xfixed_split_000.html

    appears to have been generated by Smashwords. Maybe you are using Tidy in Calibre? I couldn’t replicate this code in Calibre.

    For me, this sort of code is too messy. The file name of the chapter is 106 characters long, by my count!

  4. Hi, thanks for this article. I have had a book on Kindle for several years and just decided to try the free lending library for 90 days, but was looking for additional outlets. Smashwords seemed intriguing until I saw that they wouldn’t allow ePub uploads.

    I write my books in xhtml, put the chapters in a database, and deliver the content in html and LaTeX-generated PDF, as well as generate the ePub, all from one source. I run a script to create the two formats, download them, then run another script to generate the PDF and ePub. As long as my original is in valid xhtml, I can easily update any book and re-create the files with a few clicks. You can see the html and PDF versions here.

    So I too would love to see smashwords accept ePub at least, because distribution/marketing is still not my strong suit.

  5. I also have a book on amazon.com. It was created in Pages on a MacBook, converted to a .doc file, then to html. When I’ve edited it, I’ve gone right into the html to do that because I can then easily transfer that file to calibre for conversion. It’s not worth it for me, at this time, to do an editing job in Pages. Not sure why smashwords won’t accept calibre-created epub files.

    1. I’m soooo over Smashwords. I can’t imagine what the hold up is regarding them accepting epub files. It either passes epubcheck or it doesn’t. Is Coker going to check whether or not the epub file looks good on the myriad of devices that accept epubs? Let’s hope he isn’t working on a “meatgrinder” for epubs!

  6. Yes this “No Epub” dictate is annoying. I already have an epub file that passes epubcheck – Why do I have to go back and mess with a Word doc? I only want to use SW to get into one store that uses epub – I have the rest covered.

    1. Smashwords announced earlier this year that they will accept epubs some time this year. Time is running out. They run the risk of becoming obsolete if they don’t get it done soon.

    2. I agree, it is annoying. That’s my case too: I already have a couple of ebooks in epub format, already validated with epubcheck. It just doesn’t make sense to start again on a Word document. Especially when I don’t use Word.

      Sigil is such an excellent tool to make epubs. And there are other excellent options available, many of them free. Why mess with Word, when you can use tools like Sigil, specially designed to handle epubs?

      I hope Smashwords adds support to epubs very soon.

      1. Since Smashwords said they were going to accept ePubs in 2012 and it’s December 4, it doesn’t look very likely they will get it done this year. Assuming they’re working on it.

  7. From a personal point of view, it makes no difference to me if they accept epub… but that’s only because I ended up banging my head through the thing until I got there with .doc.

    Originally I did use Word as suggested, I actually bought a copy for that specific purpose. It was so pointlessly complicated following the style guide and then trying to click here, then there, then this sub option onto the next one… and on and on…

    Finally, I read about someone who used Open Office (software I already owned and used, but Mark didn’t recommend it), and the instructions for Open office are so simple anything else was pointless.

    For anyone sick of Word it’s simple in Open Office:

    1. Insert>page break (after title/chapters table on first page)
    2. Highlight main text: format>paragraph (automatically comes up with table as ‘indents & spacing’)
    3. Change the field ‘first line’ from whatever it is (usually 0.00) to 0.50. Click OK/APPLY
    Indents done…
    4. Backspace out of any bad looking indents and re-center (if you have timeline breaks like: ‘…’ )
    5. add ‘###’ at the end of your book

    6. Repeat page breaks for chapters.
    7. Highlight chapter headings (you will have already created the list on the first page too) insert>bookmarks.
    8. Name book marks with ‘chapter 1’/2/3 and so on.
    9. Highlight ‘chapter 1’ in your front page, click the hyperlink ‘world’ icon at top.
    10. Click the ‘document link’ on the left, then click the icon that looks like a target (archery target)
    11. On the left a table will appear, click on the ‘+’ sign beside the word ‘bookmarks’, and you will see the previous bookmark titles you made. Click ‘chapter 1’.
    Done. Linked.
    12. Repeat 9-11 for each link to your chapters.
    13. Save book file>save as: ‘Microsoft word 97/2000/XP.doc’
    14. Upload to smashwords. No problems (unless you forgot to put ‘Smashwords Edition’).

    If you’re just writing a short story, then only bother with 1-5 and of course 13/14

    So, after reading this article, I find it even more annoying that they actually use part of the Open Office software (that’s free) but don’t even recommend it! Instead of me having to mess around for days with word and reading the style guide all I ever needed was the above 14 points! It’s not rocket science is it, or shouldn’t be.

    To the writer of the article, I totally get how changing all those links would be a nightmare… it shouldn’t be that hard.
