Using XPath with LibOPC

Sep 26, 2012 at 9:27 AM

Hi,

I'm a bit fuzzy about what parts of libxml2 I can and can't use with LibOPC.

I want to be able to query the styles part with an XPath expression, so that I can pull out particular styles as I encounter them while parsing the /word/document.xml part.

I'm using Qt. My current thinking is that I could use mceTextReaderDump to dump /word/styles.xml into an xmlBuffer, use that to construct a QByteArray, and then run Qt's QXmlQuery (XQuery) over it to pull things out. However, constructing the QByteArray makes a deep copy of the data, and libxml2 already has code for doing XPath queries.

Could someone describe how I would use libxml2's XPath module to run a query on a part I've opened with opcXmlReaderOpen?

Thanks for the help,

Gemmell

Sep 26, 2012 at 12:32 PM
Edited Sep 26, 2012 at 12:33 PM

Following the simple examples here, I need to do this:

xpathCtx = xmlXPathNewContext(doc);

This takes an xmlDocPtr. So far I haven't worked out how to get an xmlDocPtr when opening a part, so any tips would be much appreciated.

Sep 26, 2012 at 12:34 PM

I obviously didn't look hard enough. I've just found opcXmlReaderReadDoc.

Sep 26, 2012 at 1:47 PM
Edited Sep 27, 2012 at 12:36 AM

The next question in my monologue is whether I can run a query and then create an MCE reader to parse the results... or is that a silly idea, since everything has already been parsed by opcXmlReaderReadDoc? What I'm really after is being able to use the mce_start_element()-style macros, as they're quite convenient for picking out the bits you want.

Sep 27, 2012 at 1:32 AM

Damn. I should have checked this first. From config\win32-msvc\libxml2-2.7.7\libxml\xmlversion.h:

/**
 * LIBXML_XPATH_ENABLED:
 *
 * Whether XPath is configured in
 */
#if 0
#define LIBXML_XPATH_ENABLED
#endif
i.e. XPath isn't enabled in the Windows Visual Studio build. I guess I'm going to have to dump the part to memory and copy it into a QByteArray to use Qt's XQuery support instead...

Sep 27, 2012 at 2:11 AM
Edited Sep 27, 2012 at 2:23 AM

So the next thing I'm trying to work out is whether I can get mceTextReaderDump to use an xmlTextWriter that writes directly into a QByteArray/QString via the write callback. Can I give xmlOutputBufferCreateIO a null ioctx and just use the callbacks to populate a QBuffer/QByteArray/QString?

... <time passes> ...

The answer looks like a yes: the context only seems to be passed through to the two callbacks anyway. I'll give it a try.

Sep 27, 2012 at 4:56 AM
Edited Sep 27, 2012 at 4:58 AM

For anyone interested, this works:

 

#include "OpcPartDumper.hpp"
#include <DesignByContract.hpp>

#include <QBuffer>
#include <QDebug>

static int WriteToBuffer(void* context, const char* charString, int len)
{
   REQUIRE(context != nullptr);
   QBuffer* buffer = static_cast<QBuffer*>(context);
   // QIODevice::write() returns a signed qint64 and -1 on error; libxml2 also treats -1 as a failure.
   qint64 numBytesWritten = buffer->write(charString, static_cast<qint64>(len));
   return static_cast<int>(numBytesWritten); // Safe: it is never more than len, which is an int.
}

static int CloseBuffer(void* context) 
{
   REQUIRE(context != nullptr);
   QBuffer* buffer = static_cast<QBuffer*>(context);
   buffer->close();
   return 0;
}


// Note: this could probably use the readDoc API instead of the reader/writer pair (more efficient, maybe?).
// The code below is from the libopc examples and shows how that would be done (it's actually simpler):
// xmlDocPtr doc=opcXmlReaderReadDoc(c, part, NULL, NULL, 0);
// if (NULL!=doc) {
//   xmlSaveCtxtPtr save=xmlSaveToIO(xmlOutputWrite, xmlOutputClose, out, NULL, XML_SAVE_FORMAT | XML_SAVE_NO_DECL);
//   if (NULL!=save) {
//      xmlSaveDoc(save, doc);
//      xmlSaveClose(save);
//   }
//   xmlFreeDoc(doc);
//  }

bool OpcPartDumper::DumpPartToBuffer(opcContainer& opcContainer, const QString& partName, QBuffer& bufferToFill)
{
   bool result = false;
   bool opened = bufferToFill.open(QIODevice::WriteOnly);
   if (opened == true)
   {
      // Setup the callbacks, passing a pointer to the QBuffer through the void* ctx pointer.
      xmlOutputBuffer* outBuffer = xmlOutputBufferCreateIO(WriteToBuffer, CloseBuffer, &bufferToFill, NULL);
      if (outBuffer != nullptr) 
      {
         // Create a text writer, find the part, open a reader on it and dump from the reader to the 
         // writer. The callbacks do the work of writing out to the QBuffer.
         xmlTextWriter* writer = xmlNewTextWriter(outBuffer);
         DBC_ASSERT(writer != nullptr);

         opcPart part = opcPartFind(&opcContainer, _X(partName.toUtf8().constData()), NULL, 0);
         if (part != OPC_PART_INVALID)
         {
            mceTextReader_t reader;
            opc_error_t openResult = opcXmlReaderOpen(&opcContainer, &reader, part, NULL, NULL, 0);
            if (openResult == OPC_ERROR_NONE)
            {
               int dumpResult = mceTextReaderDump(&reader, writer, PTRUE);

               if (dumpResult != -1)
               {
                  // It successfully dumped
                  result = true;
               }
               else
               {
                  qWarning() << "There was an error dumping a part, error number: " << mceTextReaderGetError(&reader);
               }
               mceTextReaderCleanup(&reader);
            }
            else
            {
               qWarning() << "DumpPartToBuffer: Couldn't open an opc reader for part " << partName;
            }            
         } 
         else
         {
            qWarning() << "DumpPartToBuffer: Couldn't find part " << partName << " in opc container.";
         }  

         // Clean up the writer
         xmlFreeTextWriter(writer);
      }
      else
      {
         qWarning() << "DumpPartToBuffer: Failed to create an xml output buffer";
      }
   } 
   else
   {
      qWarning() << "DumpPartToBuffer: Couldn't open buffer for writing: " << bufferToFill.errorString();
   }
   return result;
}

Feedback welcome, especially if you can tell me whether it's better to just use the readDoc-style API instead.

Sep 28, 2012 at 6:25 AM
Edited Sep 28, 2012 at 8:01 AM

This also works. Note that I'm now dumping to a QString, as QXmlQuery was going to copy the QByteArray anyway. It also keeps the XML declaration, including the encoding. It does involve extra copying, though, since the data passes through an intermediate buffer before it ends up in the QString.

bool OpcPartDumper::DumpPartToBuffer(opcContainer& opcContainer, const QString& partName, QString& bufferToFill)
{
   bool result = false;

   opcPart part = opcPartFind(&opcContainer, _X(partName.toUtf8().constData()), NULL, 0);
   if (part != OPC_PART_INVALID)
   {
      opcContainerInputStream* stream = opcContainerOpenInputStream(&opcContainer, part);
      if (stream != nullptr)
      {
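         QByteArray raw;
         opc_uint8_t buf[4096]; // Read the part in 4 KB chunks.
         opc_uint32_t len = 0;
         while ((len = opcContainerReadInputStream(stream, buf, sizeof(buf))) > 0)
         {
            raw.append(reinterpret_cast<const char*>(buf), static_cast<int>(len));
         }
         // Decode once at the end: converting each chunk separately can corrupt a
         // multi-byte UTF-8 character that straddles a chunk boundary.
         bufferToFill += QString::fromUtf8(raw);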
         result = true;
         opcContainerCloseInputStream(stream);
      }
      else
      {
         qWarning() << "DumpPartToBuffer: Couldn't open an opc reader for part " << partName;
      }            
   } 
   else
   {
      qWarning() << "DumpPartToBuffer: Couldn't find part " << partName << " in opc container.";
   }  

   return result;
}
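Decoding each 4 KB chunk separately with QString::fromUtf8 can mangle a multi-byte character that happens to straddle a chunk boundary. Accumulating everything and decoding once avoids that at the cost of another copy. To decode incrementally instead, one option is to hold back an incomplete trailing UTF-8 sequence until the next chunk arrives. A plain C++ sketch; `completeUtf8Prefix` and `appendChunk` are hypothetical helpers, and in the Qt version the complete prefix is what you would pass to QString::fromUtf8.

```cpp
#include <cstddef>
#include <string>

// Returns how many leading bytes of `data` form complete UTF-8 sequences.
// Only the last 3 bytes can start an unfinished sequence (max length is 4).
std::size_t completeUtf8Prefix(const std::string& data)
{
    const std::size_t n = data.size();
    for (std::size_t back = 1; back <= 3 && back <= n; ++back)
    {
        const unsigned char c = static_cast<unsigned char>(data[n - back]);
        if ((c & 0xC0) == 0x80)
            continue; // continuation byte: keep looking for the lead byte

        std::size_t expected = 1;                 // ASCII (or invalid) byte
        if ((c & 0xE0) == 0xC0) expected = 2;
        else if ((c & 0xF0) == 0xE0) expected = 3;
        else if ((c & 0xF8) == 0xF0) expected = 4;

        // Cut before the lead byte if its sequence needs more bytes than remain.
        return (expected > back) ? n - back : n;
    }
    return n;
}

// Appends `chunk` to `carry`, moves every complete sequence into `out`,
// and leaves any unfinished trailing sequence in `carry` for the next call.
void appendChunk(std::string& carry, const char* chunk, std::size_t len, std::string& out)
{
    carry.append(chunk, len);
    const std::size_t cut = completeUtf8Prefix(carry);
    out.append(carry, 0, cut);
    carry.erase(0, cut);
}
```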

Oct 5, 2012 at 1:42 AM

So I ended up dumping the part to a QBuffer and loading it into a DOM tree. Using XQuery is a very slow way of pulling out the styles: I've come across a document with an 11,000-line numbering.xml part, and each query on it takes more than a second (presumably because the whole document has to be parsed for every query). It's much quicker to use the DOM plus a hash from ID to DOM element.