
Parsing Word documents in .NET 2


    #11
    Originally posted by vetran
    Sharepoint, rotating not reinventing the wheel!
    Rotating it slowly with lots of memory...
    Serving religion with the contempt it deserves...



      #12
      Originally posted by vetran
      If you are talking about properties such as title, etc., and custom properties, they will go straight into a Sharepoint list and autofill the columns.

      They can be added offline using Colligio Contributor or Digilink revelation.

      Security is taken care of, and you will be able to run full-text searches if you use full SQL Server as the back end.


      Sharepoint, rotating not reinventing the wheel!

      Sharepoint is not an option - I need to take a document that has been written in Word by people who refuse to use anything else, and magically turn it into website content that can, and will, be displayed in many different ways...

      My current plan is to combine a schema-based template that 'encourages' them to follow certain guidelines (such as applying a 'title' style rather than simply selecting text and making it bold and 18pt) with a Word add-in that parses the document and outputs XML with tags that my import procedures can use to dissect the document into the relevant persistable objects. I am basically 50% there; I just need to extract the images, store them, and replace them with references to the correct imageID, which the website then renders as and when necessary.
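A minimal Python sketch of the "dissect into persistable objects" step, assuming the add-in emits simple custom tags. The element names (article, title, section, heading, para) and the Article/Section classes are hypothetical, since the real schema is not given in the thread; a .NET 2 version would do the same walk with System.Xml.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# Hypothetical tag names; the actual template schema is not specified.

@dataclass
class Section:
    heading: str
    paragraphs: List[str] = field(default_factory=list)

@dataclass
class Article:
    title: str
    sections: List[Section] = field(default_factory=list)

def parse_article(xml_text: str) -> Article:
    root = ET.fromstring(xml_text)
    article = Article(title=(root.findtext("title") or "").strip())
    for sec in root.findall("section"):
        section = Section(heading=(sec.findtext("heading") or "").strip())
        for p in sec.findall("para"):
            # itertext() flattens any inline markup (e.g. <b>) into plain text
            section.paragraphs.append("".join(p.itertext()).strip())
        article.sections.append(section)
    return article

sample = """<article>
  <title>Quarterly update</title>
  <section>
    <heading>Overview</heading>
    <para>First <b>bold</b> paragraph.</para>
    <para>Second paragraph.</para>
  </section>
</article>"""

a = parse_article(sample)
print(a.title, a.sections[0].paragraphs)
# prints: Quarterly update ['First bold paragraph.', 'Second paragraph.']
```

Each dataclass instance is then what the import procedure would hand to whatever persistence layer the site uses.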
      Vieze Oude Man



        #13
        Originally posted by mcquiggd
        Sharepoint is not an option - I need to take a document that has been written in Word by people who refuse to use anything else, and magically turn it into website content that can, and will, be displayed in many different ways...
        Create in list using automatic doc template (new document), using Office 2003, automagically save in a list. Even our salesmen can manage it.

        memory is cheap!



          #14
          Originally posted by vetran
          Create in list using automatic doc template (new document), using Office 2003, automagically save in a list. Even our salesmen can manage it.

          memory is cheap!

          Well, I have found a very suitable solution based on a CodeProject article, which includes a template file, a toolbar to insert styles, and an XSLT stylesheet that is applied to the absolutely ridiculous Word 'XML' format and manages to make sense of it by throwing 90+% of it away.
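The XSLT itself is not reproduced in the thread. As a rough Python stand-in for what "throwing 90+% of it away" means, this sketch walks a Word 2003 XML (WordprocessingML) document and keeps only each paragraph's style name and plain text, discarding run formatting. It assumes the standard WordprocessingML namespace; reporting unstyled paragraphs as "Normal" is a choice made for this sketch.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 "Save as XML" (WordprocessingML).
W = "{http://schemas.microsoft.com/office/word/2003/wordml}"

def simplify(word_xml: str):
    """Return (style, text) pairs, dropping run formatting, fonts, etc."""
    root = ET.fromstring(word_xml)
    out = []
    # iter() finds paragraphs at any depth, e.g. inside wx:sect wrappers.
    for p in root.iter(W + "p"):
        style_el = p.find(f"{W}pPr/{W}pStyle")
        style = style_el.get(W + "val") if style_el is not None else "Normal"
        # Concatenate the text of every run; <w:rPr> formatting is ignored.
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        if text.strip():
            out.append((style, text))
    return out

sample = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
      <w:r><w:rPr><w:b/></w:rPr><w:t>Title</w:t></w:r>
    </w:p>
    <w:p>
      <w:r><w:t>Body </w:t></w:r><w:r><w:rPr><w:i/></w:rPr><w:t>text</w:t></w:r>
    </w:p>
  </w:body>
</w:wordDocument>"""

print(simplify(sample))
# prints: [('Heading1', 'Title'), ('Normal', 'Body text')]
```

Keying on paragraph styles is exactly why the template pushes authors towards 'title' styles rather than bold 18pt text: direct formatting is what gets thrown away.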

          Now the Word document is supplied to an editor-type person, who clicks a button added to their standard toolbar. This attaches a new Word template and toolbar to the document, allowing formatting with embedded XML tags (which in turn allow server-based processing of documents), along with a schema that validates the document as it is altered. Effectively, the editor now takes any old Word document, selects and applies predefined XML tags to the content, presses a toolbar button, and an XML file is generated that can be uploaded to the server-based application, where it is processed. It is quite neat; the original author's work is here: http://www.codeproject.com/soap/Word...leTemplate.asp

          And all credit to him.

          The last step I need to cover is extracting images embedded within the XML into the database (they are small images).
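A minimal sketch of that last step, assuming the images arrive as base64 text in <w:binData> elements inside <w:pict>, which is how Word 2003 embeds pictures when saving as XML, and using SQLite as a stand-in for the real database. The images table and the <imageRef id="..."/> placeholder left in the XML are hypothetical names chosen for illustration; the original w:name is stored with the bytes so it can be matched back later.

```python
import base64
import sqlite3
import xml.etree.ElementTree as ET

WML = "http://schemas.microsoft.com/office/word/2003/wordml"
W = "{" + WML + "}"
ET.register_namespace("w", WML)  # keep the w: prefix when re-serialising

def extract_images(word_xml: str, conn: sqlite3.Connection) -> str:
    """Move <w:binData> payloads into the database and leave a
    hypothetical <imageRef id="..."/> placeholder in their place."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "id INTEGER PRIMARY KEY, name TEXT, data BLOB)"
    )
    root = ET.fromstring(word_xml)
    # Materialise the list first, since the tree is modified in the loop.
    for pict in list(root.iter(W + "pict")):
        for bin_el in pict.findall(W + "binData"):
            # b64decode discards the line breaks Word puts in the payload.
            data = base64.b64decode(bin_el.text or "")
            cur = conn.execute(
                "INSERT INTO images (name, data) VALUES (?, ?)",
                (bin_el.get(W + "name"), data),
            )
            pict.remove(bin_el)
            ET.SubElement(pict, "imageRef", id=str(cur.lastrowid))
    conn.commit()
    return ET.tostring(root, encoding="unicode")

payload = base64.b64encode(b"fake-png-bytes").decode()
sample = (
    f'<w:wordDocument xmlns:w="{WML}"><w:body><w:p><w:r><w:pict>'
    f'<w:binData w:name="wordml://02000001.png">{payload}</w:binData>'
    "</w:pict></w:r></w:p></w:body></w:wordDocument>"
)
conn = sqlite3.connect(":memory:")
result = extract_images(sample, conn)
row = conn.execute("SELECT id, name, data FROM images").fetchone()
print(row)
# prints: (1, 'wordml://02000001.png', b'fake-png-bytes')
```

Because the images are small, storing them as BLOBs and letting the website render them by imageID keeps the XML itself lightweight.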
          Last edited by mcquiggd; 22 August 2006, 22:31.
          Vieze Oude Man



            #15
            Originally posted by mcquiggd

            The last step I need to cover is extracting images embedded within the XML into the database (they are small images).
            I believe John Reid is going to sort it out. After a period of consultation.



              #16
              You are indeed correct, Mr P.

              I have asked him, and he said he will get back to me, but that he felt it was an important issue and it must not be judged on past failures. Except the Tories made it more difficult, and he urges me to celebrate the differences between the bizarre output from Word and any sane XML format. In fact, under New Labour, Word's XML format will be taught as part of the national curriculum, in tandem with XML that supports 160 different language enhancements and encompasses all religions and ethnic origins, including the new processing instruction 'explode near people'.
              Vieze Oude Man
