DTD settings for HTML tags inside XML tags.
Thread poster: Lorena Prieto
Lorena Prieto
Argentina
Local time: 14:16
Spanish to English
+ ...
Apr 20, 2008

Hi! I need to prepare some XML files for translation. The client didn't provide me with the DTD files, but I managed to create one myself that works OK with the XML tags. The problem is that some XML tags contain HTML tags, like "P" and "B".

Code:



<B>George:</B> text that need translation
more text than need translation here.





I'm working with TRADOS Tageditor 7. HTML tags are with "".

I've tried modifying the TagEditor settings so that it recognizes them, but so far without success. Could you please give me any tips? I would be really grateful.



Thanks!
Lorena



[Edited at 2008-04-20 20:14]


 
tectranslate ITS GmbH
Local time: 19:16
German
+ ...
CDATA again? Apr 21, 2008

It should work just fine without modifying the settings UNLESS the affected tags are in CDATA sections, as described in this earlier discussion thread.

So, are they in CDATA sections in your file?
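One quick way to answer that question outside TagEditor is to parse the file and look at the node types. The following is only a minimal sketch in Python; the sample XML and the element name "line" are invented for illustration and are not taken from the poster's file.

```python
from xml.dom import Node, minidom

# Hypothetical sample: the first <line> holds real <B> markup,
# the second wraps the same markup in a CDATA section.
SAMPLE = (
    "<doc>"
    "<line><B>George:</B> text that needs translation</line>"
    "<line><![CDATA[<B>George:</B> text that needs translation]]></line>"
    "</doc>"
)


def describe(element):
    """Say whether an element's markup is CDATA text or real child elements."""
    kinds = {child.nodeType for child in element.childNodes}
    if Node.CDATA_SECTION_NODE in kinds:
        return "cdata"
    if Node.ELEMENT_NODE in kinds:
        return "elements"
    return "text"


doc = minidom.parseString(SAMPLE)
results = [describe(line) for line in doc.getElementsByTagName("line")]
print(results)
```

If an element reports "cdata", its HTML is plain character data as far as an XML parser is concerned, which would explain why DTD settings alone do not turn it into tags.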

HTH
Benjamin


 
Wojciech Froelich  Identity Verified
Poland
Local time: 19:16
English to Polish
Seems like good reason to upgrade to SDL Trados 2007 Apr 21, 2008

Seems like a good reason to upgrade to SDL Trados 2007 and use the brand-new plug-in that will do the trick.

 
Jerzy Czopik  Identity Verified
Germany
Local time: 19:16
Member (2003)
Polish to German
+ ...
Tell us more about it Apr 21, 2008

Which plug-in will do that trick?

BR
Jerzy


 
Wojciech Froelich  Identity Verified
Poland
Local time: 19:16
English to Polish
Not tested that yet, but opportunity will come tomorrow Apr 21, 2008

http://talisma.sdl.com/
Article # 1378
Article # 1374

The Snippet Mark-up Plug-in 1.0 allows users to convert parts of text embedded in files to internal tags so that they are not included in translation. For example, it is now possible to mark embedded HTML in CDATA elements in XML files or in Cell elements in XLS/XLSX files as internal tags, so that the embedded HTML code is not treated as translatable text. For more information on the Snippet Mark-up plug-in, refer to the updated Translator's Workbench User Guide.

More detailed info tomorrow. Need some testing


 
tectranslate ITS GmbH
Local time: 19:16
German
+ ...
VERY interesting Apr 22, 2008

Thank you, Wojciech!

This looks very interesting. Now I remember I read something about that in the release notes of Trados 8.2. Totally forgot about it since.


 
Yann Rousselot
France
Local time: 19:16
French to English
+ ...
HTML in CDATA tags which must be translated. May 11, 2011

Hello all, I'm reviving this thread because I have a related request, but the solution proposed here is not adequate.

I have the same issue of CDATA sections in an XML file. Inside these sections is HTML code that appears as-is in TagEditor (that is to say, as a big mess of code).

The problem is that these tags contain most of the text that we need translated (they are Flash animations).

The solution offered here lets us treat CDATA content as "not to be translated". What I need, however, is a way to hide the HTML tags while keeping the text displayed as translatable material.
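To make the request concrete, here is a rough sketch in Python of the split I am after: markup goes to one side (to be hidden or protected), text goes to the other (to be translated). It works on a made-up CDATA payload and is not how TagEditor or any plug-in does it; the class name SegmentSplitter is invented for this example.

```python
from html.parser import HTMLParser


class SegmentSplitter(HTMLParser):
    """Collect (kind, value) pairs: 'tag' for markup to hide, 'text' to translate."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.segments = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the opening tag exactly as written.
        self.segments.append(("tag", self.get_starttag_text()))

    def handle_endtag(self, tag):
        self.segments.append(("tag", f"</{tag}>"))

    def handle_data(self, data):
        if data.strip():
            self.segments.append(("text", data))


# Hypothetical content of one CDATA section.
payload = "<p><b>George:</b> text to translate</p>"

splitter = SegmentSplitter()
splitter.feed(payload)
splitter.close()

translatable = [value for kind, value in splitter.segments if kind == "text"]
print(translatable)
```

The sample deliberately avoids self-closing tags, which this simple splitter would report twice (once as a start tag and once as an end tag).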

I hope someone can help, as our native software produces only XML files with this issue, and translating them is a very complex and nerve-racking affair, as you can imagine (most of our translators must go through the code and manually pick out the text to be translated...)

Thanks,

Yann


 
RWS Community
United Kingdom
Local time: 19:16
English
Can you use Studio 2009... May 11, 2011

... for this? It's quite straightforward to handle this in Studio and I'd be happy to explain the process for you if you can.

Regards

Paul


 
Adam Łobatiuk  Identity Verified
Poland
Local time: 19:16
Member (2009)
English to Polish
+ ...
@Paul May 11, 2011

SDL Support wrote:

... for this? It's quite straightforward to handle this in Studio and I'd be happy to explain the process for you if you can.

Regards

Paul


Paul, could you do so anyway? I've been trying without much success to figure that out in Studio (I did read the Help) and have stuck with TagEditor instead. My XML files look like this:

< quote >< p >Translatable text.< /p >< /quote >

[The P tags are in lt/gt entities, which don't display here].

Thanks a lot in advance.



[Edited at 2011-05-11 19:36 GMT]



 
RWS Community
United Kingdom
Local time: 19:16
English
Sure... can you give me a file... May 11, 2011

... and I can use that to show you? Or maybe a part of the file with amended text if you like and I can post the process here.

Regards

Paul


 
RWS Community
United Kingdom
Local time: 19:16
English
How to create an XML file with embedded content May 12, 2011

Hi Adam,

OK, thanks for the file snippet. I turned it into an XML file with a few extra lines, but the content we are concerned about is basically the same. So, this is what I'm starting with:


Then lots of lines like this:


The interesting thing about this file is that it is not really HTML inside the XML, as Lorena originally asked about; the markup is all plain text written with entities. So we need an extra step to handle this (thanks to Patrik, who helped me a lot this morning to understand this concept). I'll cover this as I go through the steps. Here's what I did.
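If the difference between real markup and markup written with entities is not obvious, a small Python sketch may help. It uses the one-line sample Adam posted above, once with real tags and once with the P tags written as lt/gt entities; everything else is just for illustration.

```python
import xml.etree.ElementTree as ET

# Real markup: the parser sees a <p> element inside <quote>.
real = ET.fromstring("<quote><p>Translatable text.</p></quote>")

# Entity-escaped markup: to the parser this is plain text, not an element.
escaped = ET.fromstring("<quote>&lt;p&gt;Translatable text.&lt;/p&gt;</quote>")

real_children = [child.tag for child in real]
escaped_children = [child.tag for child in escaped]
escaped_text = escaped.text

print(real_children)     # child elements found in the first case
print(escaped_children)  # child elements found in the second case
print(escaped_text)      # what the parser hands over as text
```

In the second case there is nothing for an XML parser rule to match inside the element, which is why a separate embedded content step is needed to turn those characters into tags.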

Create a new XML Filetype as follows:

1. Tools - Options - Filetypes - New
2. Select XML as the type and then work through the Wizard adding info as follows:

In this first screen I only completed two fields. The second one is useful because this allows you to make sure the correct filetype is being used in the Editor View in Studio:


In this one I just select create based on default settings. I could import the test file but this is easier for the info I have from you:


Next I add a few parser rules based on the content of your XML. Note that I started with a non-translatable rule using the XPath //*. This prevents anything from being parsed for translation at all, and then I just add what I actually want to translate. This is a neater approach than adding every element and then deciding what you want done with each (I think so anyway):
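The idea of "exclude everything, then opt in" can be sketched outside Studio as follows. This is only an illustration in Python, not how Studio evaluates its rules: the sample XML is invented, and the sketch simply assumes that a later, more specific rule overrides the catch-all one.

```python
import xml.etree.ElementTree as ET

# Hypothetical file; only <html-text> should reach the translator.
SAMPLE = """<root>
  <id>42</id>
  <html-text>Translatable text.</html-text>
  <meta><html-text>Nested text.</html-text></meta>
</root>"""

# Rules are applied in order, so later matches override earlier ones
# (an assumption made for this sketch).
RULES = [
    (".//*", False),         # catch-all: nothing is translatable
    (".//html-text", True),  # opt in only what should be translated
]

root = ET.fromstring(SAMPLE)
translatable = {}
for path, flag in RULES:
    for element in root.findall(path):
        translatable[element] = flag

selected = [el.text for el in root.iter() if translatable.get(el)]
print(selected)
```

The benefit is the same as in the wizard: new or unexpected elements are left alone by default, so only the short opt-in list needs to be maintained.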


Next I add the root element of the file so that the file is recognised and the correct filetype is used:
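As a rough picture of what the root element is used for, the sketch below picks a file type by looking at the root tag. It is written in Python purely for illustration; the root element name "quotes" and the fallback label are invented, and only the name Adamfiletype comes from this walkthrough.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from root element name to file type identifier.
FILETYPES_BY_ROOT = {"quotes": "Adamfiletype"}


def detect_filetype(xml_text, default="default XML"):
    """Return the file type registered for the document's root element."""
    root = ET.fromstring(xml_text)
    return FILETYPES_BY_ROOT.get(root.tag, default)


matched = detect_filetype("<quotes><quote>Text</quote></quotes>")
fallback = detect_filetype("<other/>")
print(matched, fallback)
```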


Then the final step is to save the filetype. That's it, and now I have a new filetype like this:


When I open the XML now, just to check that all is well, I see this. You can see that all the entities are automatically converted in the Editor, but they are not tags yet. Note that I can tell that Adamfiletype has been used: if I display TagID instead of No/Partial/Full Tags, the Filetype Identifier we entered in the first wizard screen is displayed in the orange tab at the top of the translation grid:


So, to make them tags I need to use the Embedded Content window in the FileType Options for the new Adamfiletype. If I look at this option, you can see that I need to add the document structure information that I want to apply an embedded content rule to:


As all the text with these entities in them seem to live in one element - <html-text/> - I'll just give this one some additional information in the parser rules. So, go back to this option and edit the parser rule in the structure info box:


I click on Add in the next window and then select a predefined standard field... actually I used "Field" (it could be something else if you like)


I then click OK three times and my parser rules should now look like this:


Now I can go back to the embedded content option and tell it I want to use information parsed from rules with the "Field" structure. When you select "Field" it will look like "Field", but as it's a standard field, as opposed to a custom one, we add sdl: to it.


Now I can define what tags I want and how they should be represented. So, I could do these individually, and you can play with this if you like, but to make it easy I use a single regular expression to find all likely tags and convert them to placeholders (I hope you can read this on the screenshot):
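For anyone who cannot read the screenshot, the general idea looks like the sketch below. The pattern is an assumption written for this example, not the expression from the screenshot, and the Python code only imitates the effect of converting tag-like text to placeholders.

```python
import re

# Assumed pattern: an opening, closing, or self-closing tag with optional attributes.
TAG_PATTERN = re.compile(r"</?[A-Za-z][A-Za-z0-9-]*(?:\s[^<>]*)?/?>")


def to_placeholders(text):
    """Replace each tag-like run with a numbered placeholder; return text and tags."""
    tags = []

    def swap(match):
        tags.append(match.group(0))
        return f"{{{len(tags)}}}"

    return TAG_PATTERN.sub(swap, text), tags


segment, tags = to_placeholders(
    '<p>Click <a href="x.html">here</a> to continue.</p>'
)
print(segment)
print(tags)
```

A single catch-all pattern is quick to set up, but it treats every tag the same way; defining tags individually gives more control over which ones appear as paired tags and which as standalone placeholders.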


When I open the XML in the Editor now I see something like this:


What is also interesting is that at the start of your file you had what looked like non-translatable material in the same <html-text/> element. If I look at the start of the file these are all gone:


I can use the display filter to show all content (which also displays hidden external tags) and I can see that this information is now safely moved outside the translation so I don't need to worry about it at all:


Almost forgot a bit, sorry. Your file is quite poorly structured, so there is another step you need to take to make sure the target file is correct. When you come to save the target file, you need to make sure that any characters that have not been escaped as entities in the original XML file, such as " and ' in your file, are not converted incorrectly. So go to the Entity Conversion settings, make sure that Entity Conversion is checked, and then click on Add:


Then uncheck the two characters that have not been escaped correctly in your file under the writer settings:
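The effect of those two checkboxes can be imitated with a short Python sketch. It is not Studio's writer, just an illustration of what changes in the saved file when quote characters are or are not converted to entities; the sample sentence is invented.

```python
from xml.sax.saxutils import escape

text = 'He said "it\'s fine" & left'

# Quote conversion off: only &, < and > are converted, so " and '
# stay as literal characters, as they were in the source file.
minimal = escape(text)

# Quote conversion on: every " and ' in the output becomes an entity.
full = escape(text, {'"': "&quot;", "'": "&apos;"})

print(minimal)
print(full)
```

If the source file contains literal quotes and the target file suddenly contains entities instead, the application that reads the XML may display them differently, which is why the writer settings should match the way the original was written.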


I hope you can follow this explanation... but if you have any questions post them here.

Regards

Paul



[Edited at 2011-05-12 08:30 GMT]


 
Adam Łobatiuk  Identity Verified
Poland
Local time: 19:16
Member (2009)
English to Polish
+ ...
Thanks! May 12, 2011

Thanks a lot, Paul! That's brilliant and very informative!

 
RWS Community
United Kingdom
Local time: 19:16
English
You're welcome... May 14, 2011

... I wonder if it helped Yann? Apologies for stealing your original question.

 

