The DIY Book Scanner

Posted by Soulskill
from the lightening-the-load dept.
azoblue writes "Daniel Reetz did not want to lug around heavy textbooks, so he built a book scanner to create digital copies. '... over three days, and for about $300, he lashed together two lights, two Canon Powershot A590 cameras, a few pieces of acrylic and some chunks of wood to create a book scanner that's fast enough to scan a 400-page book in about 20 minutes (PDF). To use it, he simply loads in a book and presses a button, then turns the page and presses the button again. Each press of the button captures two pages, and when he's done, software on Reetz's computer converts the book into a PDF file. The Reetz DIY book scanner isn't automated — you still need to stand by it to turn the pages. But it's fast and inexpensive.'"

  • by Anonymous Coward on Sunday December 13, 2009 @02:29PM (#30424308)

    This would be a good activity for the winter months when farming isn't possible.

    • Re: (Score:2, Funny)

      by Anonymous Coward

      This would be a good activity for the winter months when farming isn't possible.

      That's why God gave us illegal immigrants.

  • Look out! (Score:4, Insightful)

    by Chris Tucker (302549) on Sunday December 13, 2009 @02:31PM (#30424328) Homepage

    Here comes the Publisher's Copyright Enforcement Gundams to give you "What For!".

    Imagine that, thinking you could actually DO Something like that with your very own property.

    What cheek!

    • Re: (Score:3, Insightful)

      by nametaken (610866) *

      Not sure he did it for his own property. But it does prove that books have the best DRM of all.

  • A bargain (Score:5, Informative)

    by thethibs (882667) on Sunday December 13, 2009 @02:39PM (#30424390) Homepage

    Except for the lack of an automatic page-turner, Daniel's device is the same as one you can buy commercially for about $20,000 (http://www.treventus.com/bookscanner_pageturner.html).

    He was wise to decide on manual page-turning.

    • Re: (Score:3, Interesting)

      by Farhood (975274)

      I have Kinko's/Staples/ Office Depot cut off the spine ($1-$5), clip it on all sides, and go home to my Fujitsu ScanSnap for ADF scanning, auto color/ b/w selection, and OCR. Oh, and you press the button once and walk away.

      • So your books are worth nothing to you, are they? Or you don't have any books remotely considered rare. Seriously, some of us have books worth more than $3 and some of us would like to be able to resell some of them at one point which is decidedly hard to do when you've butchered them.
  • Heh (Score:4, Insightful)

    by sys.stdout.write (1551563) on Sunday December 13, 2009 @02:43PM (#30424414)
    I do this for my law school textbooks (unless you're a book publisher, in which case I am joking and would never break the law).

    I was excited when I read this because it is a pain in the ass to turn the pages in a 1000 page Constitutional Law textbook. Thus, you can imagine my disappointment when I read that his machine doesn't automate this.

    Most universities have at least one library which has a Ricoh scanner that does exactly what his does, i.e. it writes out a PDF onto your USB stick. I don't know where he's a graduate student, but I bet if he looked in his library he could have saved himself $300.
    • by Vellmont (569020)


      Most universities have at least one library which has a Ricoh scanner that does exactly what his does, i.e. it writes out a PDF onto your USB stick. I don't know where he's a graduate student, but I bet if he looked in his library he could have saved himself $300.

      Except most scanners take on the order of tens of seconds to scan a page, and force you to pick up the book, turn the page, and put it flat again. This arrangement takes a picture and the book is in its normal orientation, so page turning is easy.

      • by emj (15659)
        At about 1 second per page, that makes roughly 6 minutes per book; if you only want the images you can do it faster. Flipping by hand, and not using glass to straighten the pages, I've got as low as 0.4s per page and 0.9s/p on average for two books. (That's 6 minutes for 400 pages, though you don't want to do more than 200-270 pages without one of these scanners.)
    • Re:Heh (Score:4, Informative)

      by atarkri (1092827) on Sunday December 13, 2009 @03:34PM (#30424774)
      The school is NDSU. Yes we (he) looked. No our library does not have one.

      He has details of the reasons on his blog danreetz.com/blog [danreetz.com]
    • Re: (Score:3, Insightful)

      by TubeSteak (669689)

      I do this for my law school textbooks (unless you're a book publisher, in which case I am joking and would never break the law).

      What law are you breaking?
      Whether you scan it and convert the OCRed text into an audio book, rip all the pages out and turn it into an art exhibit, or use the book for toilet paper, the publisher has no legal right (AFAIK) to stop you.

      • by cgenman (325138)

        IANAL, but... Scanning books is a form of copying. Converting OCRed text into an audio book is a form of creating a derivative work. Both of these fall under the purview of copyright law, and may or may not fall under fair use. It may be the sort of thing that you could fight and win in court, but you'd probably have to fight. And, of course, if the original poster explicitly created this machine because textbooks are expensive, then the "significant non-infringing uses" defense is definitely lower.

        Using

      • by cdrguru (88047)

        You might want to read the front matter of just about every book published to see that they specifically address feeding the book into a computer in any way possible and say it is a violation of the copyright if done without permission.

        Of course, nobody gives a rat's ass about copyright any longer. So torrenting the books from somewhere like Romania should be just fine.

        • Re:Heh (Score:4, Insightful)

          by Toonol (1057698) on Monday December 14, 2009 @12:33AM (#30428272)
          You might want to read the front matter of just about every book published to see that they specifically address feeding the book into a computer in any way possible and say it is a violation of the copyright if done without permission.

          It doesn't matter what they say. It matters what the law says, and if they tell you that you can't do something the law says you can, the law wins. The more books add legal crap in order to be more like software EULAs, the more lies they will incorporate, like software EULAs.

          I doubt there's much of a chance at all that you would be found guilty of copyright infringement for making a format change of your own book, for your own use. That's nearly the most straightforward example of fair use you could imagine. If you distributed it, sure; that's not fair use.
    • Re:Heh (Score:4, Insightful)

      by rdnetto (955205) on Sunday December 13, 2009 @04:27PM (#30425240)

      Not sure where you live, but in most areas format shifting is usually recognized as fair use. Whether or not torrenting the PDF counts as format shifting isn't a question that the courts have answered yet, but it's currently the most convenient method.

  • by i_want_you_to_throw_ (559379) on Sunday December 13, 2009 @02:44PM (#30424430) Homepage Journal
    How soon before the manufacturer of the $20,000 commercial version files a lawsuit against him? That would be extraordinarily sad because the American system of patent/copyright only serves to stifle independent innovation like this.
  • by Slugster (635830) on Sunday December 13, 2009 @02:52PM (#30424490)
    It may work well enough for basic textbooks, but the problem is that (for high-quality scans) you can't ever get the same image quality from an $800 camera that you can from an $80 scanner. At 1200 DPI, a scanner is equivalent to a ~384 MP camera. Even scanning at "only" 300 DPI is ~90 MP, a far bigger image than any consumer-grade camera can provide.

    The cameras he used were only five megapixels.

    Might work for looking at the pages on your iPhone. Not gonna look very readable on your laptop screen, and forget about reading the book's footnotes.....
    ~
    • Re: (Score:3, Informative)

      by bloobloo (957543)

      There's no problem with the resolution.

      9" x 6" page, scanned at 300 dpi = 2700 x 1800 pixels = 4.86 MP.

    • Re: (Score:2, Informative)

      by maxume (22995)

      Lots of book scanners use ccds. They are good enough. No one really wants a 'portable' scanned document that weighs in at 3 gigabytes anyway, current laptop IO makes that a pain in the ass.

    • by smallfries (601545) on Sunday December 13, 2009 @03:06PM (#30424590) Homepage

      You haven't actually tried this, have you? I've had various flatbed A4 scanners over the years, all at much higher resolution than a camera, and hence everything got down-sampled afterwards for my display, which is only 1.5MP anyway. Then I switched to using a phone camera with only a 2MP CCD, but a really good lens and decent macro mode (Sony Ericsson Cyber-shot for those that are interested). As long as the focus was good it produced perfectly readable shots, and so it became my portable scanner. These days I mostly shoot stuff at home, so I have a 12MP DSLR to hand. It's huge overkill, and I massively down-sample stuff afterwards, but entirely readable. So your basic claim that this can't be done with a camera based on the resolution compared to a scanner is a complete load of bollocks. The focus of the lens tends to be the important issue.

    • Re: (Score:3, Informative)

      by Chris Tucker (302549)

      FYI, the color camera on the Mars Rovers.

      One Megapixel. Really spiffy and detailed images of the Martian landscape for only one megapixel, don't you think?

      Also, TFA states he's using OCR to create a PDF.

      If the image from the camera is sharp enough, the OCR software should have no trouble "reading" it.

      • by cdrguru (88047)

        It might be nice if you understood digital photography before opining on something you clearly know nothing about.

        The Mars Rover camera is a very special instrument. How consumer digital cameras work is with something called a Bayer matrix of red, green and blue filters. The end result is that you get RGB values by interpolation - in reality you have about 1/4th the resolution of the sensor. You can get pretty fancy with the interpolation, but there is still a huge loss of detail. When the output is a J

    • by Toonol (1057698)
      Not gonna look very readable on your laptop screen, and forget about reading the book's footnotes.....

      No, it's fine. A laptop screen is 1400x900 or thereabouts; even a cheap camera will have better resolution than that. It's not going to be a problem unless you're doing a fair amount of zooming in.

      At 1200dpi, an 8.5" x 11" document will be 10,200 x 13,200 resolution. That may be useful for some purposes, but for simple text browsing it's overkill by nearly a whole order of magnitude.
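
The resolution arithmetic traded in this subthread is easy to check. Here is a minimal sketch in Python; the function name `scan_pixels` is made up for illustration, and it assumes a flat page fills the frame exactly with no margins or lens losses:

```python
def scan_pixels(width_in, height_in, dpi):
    """Return (width_px, height_px, megapixels) for a page imaged at a given DPI.

    Assumption: the page fills the frame exactly (no margins, no cropping).
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px / 1e6


if __name__ == "__main__":
    # Page sizes and DPI values are the ones quoted in the comments above.
    for label, w, h, dpi in [
        ('9" x 6" page at 300 dpi', 9, 6, 300),
        ('8.5" x 11" page at 300 dpi', 8.5, 11, 300),
        ('8.5" x 11" page at 1200 dpi', 8.5, 11, 1200),
    ]:
        w_px, h_px, mp = scan_pixels(w, h, dpi)
        print(f"{label}: {w_px} x {h_px} px, {mp:.2f} MP")
```

Under those assumptions a 9" x 6" page at 300 dpi needs about 4.86 MP, which is roughly what one of the five-megapixel cameras delivers per page.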
  • I've (Score:4, Funny)

    by Kamineko (851857) on Sunday December 13, 2009 @02:53PM (#30424498)

    What a coincidence! I too have a book scanner that scans books, and requires a human operator to attend to turning the pages.

    It's called a scanner.

    • Re: (Score:3, Interesting)

      by iammani (1392285)
      We would love to see you scan 400 pages in 20 minutes with your 'book scanner'.
      • I can do it in 20 minutes. Each scan takes 5 or 6 seconds, but you do two pages at a time. Thus:

        (200 scans) * (6 seconds / scan) = 1200 seconds

        Otherwise known as 20 minutes.
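
The same estimate as a small Python sketch, for anyone who wants to plug in their own numbers. The function name `scan_minutes` and the `handling_seconds` parameter (time spent lifting the book, turning the page and repositioning it) are hypothetical additions for illustration, not figures from the thread:

```python
import math


def scan_minutes(pages, pages_per_capture, seconds_per_capture, handling_seconds=0.0):
    """Estimate total scanning time in minutes.

    handling_seconds is an assumed per-capture overhead for turning the page
    and repositioning the book; it defaults to zero, as in the estimate above.
    """
    captures = math.ceil(pages / pages_per_capture)
    return captures * (seconds_per_capture + handling_seconds) / 60


if __name__ == "__main__":
    # 400 pages, two pages per scan, 6 seconds per scan, no handling time.
    print(scan_minutes(400, 2, 6))
    # Same book with a made-up 10 seconds of handling per scan.
    print(scan_minutes(400, 2, 6, handling_seconds=10))
```

With zero handling time this reproduces the 20 minutes above; any per-scan handling time adds directly to the total, so the result is very sensitive to how long each page turn takes.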
        • Re: (Score:3, Insightful)

          by fbjon (692006)
          Really? Scanning takes a fair number of seconds, then you need to lift the book in order to turn the page, set it down correctly, and start the next scan. Compare with: push button, turn page, push button. Limited pretty much by how fast you turn the page.
        • by iammani (1392285)
          Hmmm, I wonder if turning the page and pressing a button takes the same time as lifting the book and turning the page, placing the book back and pressing a button.

          And if you can do 400 pages on a flatbed scanner in 20 min, I bet you could do it much, much faster on this guy's setup.
    • by Patik (584959)
      Let us know how long it takes you to scan a 400-page book using that method. I bet it's a tad over 20 minutes.
    • Re:I've (Score:5, Informative)

      by The -e**(i*pi) (1150927) on Sunday December 13, 2009 @04:03PM (#30425038)

      http://www.geocities.jp/takascience/lego/fabs_en.html [geocities.jp]

      turning the pages and scanning is child's play

  • repost (Score:5, Informative)

    by AnonGCB (1398517) <7spams AT gmail DOT com> on Sunday December 13, 2009 @02:54PM (#30424500)

    http://bkrpr.org/doku.php [bkrpr.org]

    Same thing, much cheaper (I built mine for ~150 USD.)

    • Re:repost (Score:5, Informative)

      by idji (984038) on Sunday December 13, 2009 @03:13PM (#30424636)
      yeah, but you have to press 2 buttons and then lift your two cameras with your 4 sided PMMA/perspex/plexiglass box every time - he has a hinged L-shaped piece of perspex and one button - a more elegant solution - half the button presses, the cameras don't move and less weight.
      • by fwarren (579763)

        Besides being half the price of the other setup, there is a larger consideration. It is the size. I have no place to store the scanner they use in this article. I am hard pressed to find a place I could set it up other than in the middle of my living room. The smaller scanner for half the price I could find a place to store it when I am not using it.

      • I don't get it. Can't you simply leave out the "front" side of the box, that is the side where you'd sit if you were reading the book? The cameras don't need a piece of glass there, and the whole contraption could still be stable. That way you could reach in and turn the page without lifting the glass box. Seems much more convenient. I must be missing something.

        • Re: (Score:3, Informative)

          by idji (984038)
          the bottom two sides of the box are holding the pages flat for the cameras. He has to lift the box to turn the page.

          Your idea would end up with bent pages.
  • If so, wouldn't it be easier to just rip out the binding and put in the pages? The $15 cost of buying another copy is less than all that boring, repetitive manual labor.

  • by phantomcircuit (938963) on Sunday December 13, 2009 @03:03PM (#30424558) Homepage

    He keeps talking about how expensive the books are. Clearly he is just using this to scan other people's books to avoid paying.

    Still a pretty cool build though :P

    • Re: (Score:3, Informative)

      by atarkri (1092827)
      Actually, the motivation behind the project stems from Dan's stay in Russia before his graduate studies. He realized that there are tons of old posters, pictures, and other Soviet propaganda floating around the country's libraries that many people in the western world would like to view, but are unwilling to go to Russia to see. He wanted to digitize some of these posters (works of art, in his view) in order to circulate them on the web. He soon became very frustrated with using a flatbed scanner, and st
    • by fwarren (579763) on Sunday December 13, 2009 @04:31PM (#30425282) Homepage

      He may be scanning books to pirate them. However, I am a college student as well but trying to save money by pirating the books is not my objective.

      I am in my 40's and my eyesight is not what it used to be. Here is why I would buy the books and scan them.

      1. To be legal and comply with the law. I may very well buy the books used, to get them as cheaply as possible. But I will buy them.
      2. It is much lighter for me to carry one laptop around on campus, perhaps with copies of all the books I have used for all terms up to the current term.
      3. I can zoom the pages to a comfortable size to read the text.
      4. I now have the ability to search through the text.
      5. I can use a text-to-speech reader to listen to the book, I can even make an mp3 of the book if I so desired.

      To me it sounds like a bargain

    • by couchslug (175151)

      "Clearly he is just using this to scan other people's books to avoid paying."

      Textbook makers and colleges exploit a captive student population, so that attitude is understandable.

  • by Surt (22457) on Sunday December 13, 2009 @03:09PM (#30424608) Homepage Journal

    This is a market that relies on outrageous reproduction prices just like CDs used to. They are equally doomed. I know a LOT of college students who no longer buy books ... they rent them for free by buying them, shooting them, and returning them. It may take a couple of hours to do manually without a device like this, but $80 per hour is pretty good wages for a college student.

    • by Weezul (52464)

      Or just download them from other students?

      I recently taught an upper level computer science course in a second world country. I was worried about whether the students would have access to books. No problem, students have already digitized all common undergraduate text books and share them on various eastern european websites. So the official course webpages often just link the textbook directly.

    • And a new department of the MAFIAA is born... coming soon to a courtroom near you!
  • by skroz (7870)

    Just use a bandsaw to cut off the spine and feed it through a normal scanner with a sheet feeder. Duh. Faster, cheaper, and better results along the spine.

    Oh, you wanted to keep the books INTACT?

  • better way (Score:4, Informative)

    by cinnamon colbert (732724) on Sunday December 13, 2009 @03:17PM (#30424672) Journal

    from the comments with the article
    posted by: irrational | 12/11/09 | 11:56 pm

    I do it in 5 steps, and you get rid of the book when you’re done since you don’t need to store it. After you get done putting 200 hours into your creation, you’ll have spent thousands of dollars worth of your time. I solved this problem much more quickly years ago:

    1. Buy a good sheet-fed and high-speed scanner. I have a Panasonic KV-S2026 color.
    2. Get a decent jigsaw from Home Depot. Use metal cutting blades (24 teeth/inch or better)
    3. Saw the spines off the book and for God’s sake use some C-clamps on each end of the book. Preferably sandwich them between two flat boards.
    4. Remove and feed sheets through the scanner to OmniPage and text recognize the pages.
    5. Save as PDF.
    6. Repeat. You now have searchable digital books!

  • might not be able to *write* the entire collection of Shakespeare, but with this setup, I'm quite sure that they would be able to digitize it!

  • Well, ironically (Score:3, Insightful)

    by Anonymous Coward on Sunday December 13, 2009 @03:27PM (#30424726)

    Ironically, all these books that he and others are trying to scan into a digital format were created in a digital format from the start, sitting on a publisher's computer somewhere.

    Thanks copyright laws! Thank you very little.

    • Re: (Score:3, Informative)

      Even more evil: because some students are blind or vision impaired, they need digital copies they can have their computers blow up in size on screen or read audibly to them.

      This means that every textbook HAS a doc or PDF version you can get from the publisher. As a professor I regularly get pdf versions of my text books for "disabled" students who can't afford the $95 these leeches charge for the text I use.

      I'm in the process of putting together a "text pack" that consists of short excerpts from dozens

    • Imagine how many trees we wouldn't have gotten to beat to a pulp though!
  • by milesw (91604) on Sunday December 13, 2009 @03:34PM (#30424770) Homepage
    I'm amazed at how good OCR has gotten. I did the same thing without building anything: just connected my Canon PowerShot A540 to a tripod, laid the tripod on a coffee table, put the book on the floor, and started snapping away. Fed the JPGs to ABBYY FineReader 10, and it spit out plain text that was *at least* 97-98% accurate on every page. I did not use any special lights, do not know anything about photography, and frankly thought I'd have to buy all sorts of special equipment. The only other thing I added for convenience's sake was Dirk's CanoRemote [canoremote.de] so that I would not move the camera (however imperceptibly) every time I pressed the shutter.
    • Re: (Score:3, Insightful)

      by hansamurai (907719)

      I was reading about OCR accuracy in my Game Developer magazine just last night, and they were lamenting that 98% accuracy really wasn't good enough for them. I know that the difference between personal and professional use is rather wide, but they printed a few sentences with 98% accuracy and I will admit, it was distracting. Of course, if they hadn't mentioned, would I have noticed?
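
To put a number on why 98% can still be distracting, here is a rough Python sketch. It assumes the accuracy figure is per character and that a page holds about 2,000 characters; both are illustrative assumptions, not figures from the magazine or from this thread:

```python
def expected_errors(chars_per_page, accuracy):
    """Expected number of wrong characters on a page.

    Assumption: accuracy is a per-character rate and errors are spread evenly.
    """
    return chars_per_page * (1.0 - accuracy)


if __name__ == "__main__":
    chars_per_page = 2000  # assumed typical page of body text
    for accuracy in (0.97, 0.98, 0.99, 0.999):
        errors = expected_errors(chars_per_page, accuracy)
        print(f"{accuracy:.1%} accurate: about {errors:.0f} wrong characters per page")
```

Under those assumptions, 98% accuracy works out to roughly 40 wrong characters on every page, which may explain why it reads as distracting even though the percentage sounds high.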

      • Re: (Score:3, Interesting)

        by ShooterNeo (555040)

        When you OCR the resulting PDFs from using a scanner, you use a mode that includes data from the original scan. For instance, I just use Adobe Acrobat's "clear scan" OCR mode. What it does is it OCRs the text, and uses the OCR data to sharpen the scan of the letters in the PDF document. It then downsamples all the image data to a resolution that you specify. Basically, the resulting PDF is a hybrid between an OCRed file and the original image data that was scanned in. You can easily read all of the tex

  • by paiute (550198)

    I thought about doing this several years ago to archive a huge stack of old lab notebooks, then we bought some Ricoh copiers that were also scanners with a platen large enough to scan two pages at once. I was able to turn a 300 page notebook into pdfs in about a half hour.

  • Dupe ? (Score:3, Informative)

    by eulernet (1132389) on Sunday December 13, 2009 @03:51PM (#30424960)

    The scanner was described 3 months ago in a question to Ask Slashdot:
    http://ask.slashdot.org/story/09/09/27/199251/Software-To-Flatten-a-Photographed-Book [slashdot.org]

    The answer:
    http://ask.slashdot.org/comments.pl?sid=1383895&cid=29559637 [slashdot.org]

  • I have a project that requires text recognition. I need to quickly identify the presence of text URLs in several thousand photographs. In the easy cases, the URL is a solid color on a contrasting background, added as a band across the top or bottom of the photo. But in the hard cases it's a partially transparent watermark across the center of the photo that may be rotated several degrees from horizontal. The good news is that the URLs all start with "http://", and I don't need the software to capture

  • by kfogel (1041) on Sunday December 13, 2009 @05:10PM (#30425580) Homepage

    See also the BookLiberator [bookliberator.com], a somewhat more compact cube-in-cradle design, that's also easy to build. Although soon you won't have to build your own: we're prototyping a manufacturable, flat-packed kit to sell from our online store; see questioncopyright.org/bookliberator [questioncopyright.org] for more about the project. It should be ready next year.

    None of which is to detract from Reetz's accomplishment, of course. This renaissance in personal book scanners is going to make it easier for all of them, in the long run, especially as we can share the same open source software among all the scanners.

  • I have a book I would LOVE to preserve digitally. I have an extremely rare and out of print book -- it doesn't have an ISBN or anything! Technically, though, I believe it is copyrighted. I would like to scan it in and OCR it into a usable format that can then be put anywhere. (PDF bitmap pages are ridiculously large!) It is "Home Again" by James Edmiston. Copyright 1955 by James Ewen Edmiston, Jr. First Edition, signed by the author. Library of Congress number 55-5265. It is a significant and import

  • by Mista2 (1093071)

    Now if only textbooks came as e-books, then this whole tech would be unnecessary.

    • They do, but they cost the same and sometimes more than the textbook. Some will have a kill date attached to the file for a small discount off the full price.
  • "It was a watershed moment when I realized getting an 8-megapixel Canon camera was cheaper than buying a bunch of textbooks."

    Therein lies the real problem. Textbooks are too damn expensive and have been for many years.

    -ted

  • Something just like this setup was in a comment for an ask slashdot article -

    http://ask.slashdot.org/comments.pl?sid=1383895&cid=29559637 [slashdot.org]

  • There's plenty of people working on this at the DIY Book Scanning site [diybookscanner.org], but what they all lack... is page turning. I found this great project [youtube.com] some students came up with that is simplistic and doesn't require you to preload pages at all.

    Incorporate that, with the glass/plexi platen of the stock DIY book scanning projects, and you have a 100% complete, automatic, turn-it-on-and-walk-away book scanner from beginning to end.

  • by saccade.com (771661) on Monday December 14, 2009 @01:48AM (#30428544) Homepage Journal
    A while back I got a Fujitsu ScanSnap S510. Now when I want to scan a book, I just saw the spine off (table saw, band saw or even a steel ruler and X-acto knife will do the trick). Take the loose sheets, about 40 at a time, and put them into the ScanSnap. The ScanSnap comes with Acrobat Pro and does a fine job of making a searchable PDF file of the book. The paper? Into the recycle bin. I've cleared off several feet of shelf space.
