Transit NXT translation memories w/ different number of source/target segments
Thread poster: Gary Hess
Gary Hess Local time: 00:15 German to English + ...
Mar 16, 2023
I am trying to create a custom QA tool for my own use and was analyzing some .DEU and .ENG files. Sometimes the source and target files don't match, e.g. the source file has 30 segments and the target file has 31. I assume that another translator split one segment into two during translation, which would explain the discrepancy.
I have a technical question: How does Transit NXT know which segments belong to one another? I have looked at the XML quite a bit, but I can't figure it out yet.
BTW: I loaded a pair of mismatched .DEU and .ENG files into Xbench, but Xbench doesn't align the mismatched segments correctly either.
wotswot France Local time: 00:15 Member (2011) French to English
Misaligned language pairs
Mar 16, 2023
What I do is open the two files in two separate windows of a powerful text editor (like Notepad++), place them side by side, then find and delete the offending segment.
Segment lines begin with <Seg SegID=n>, where n is a number, and end with </Seg>.
wotswot France Local time: 00:15 Member (2011) French to English
Follow-up to my previous message
Mar 16, 2023
Segment lines begin with Seg SegID=n (where n is the segment's number) and end with /Seg
[Edited at 2023-03-16 16:19 GMT]
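Going by these markers, a QA script could pull segment IDs and contents out of a language file with a regular expression and then count or compare them. This is a minimal sketch, assuming each segment appears as `<Seg SegID=n ...>...</Seg>` with the ID optionally quoted; `extract_segments` is a hypothetical helper, and real Transit files may need a proper XML parser and the correct text encoding when they are read.

```python
import re

# Assumed markup: <Seg SegID=12 ...>content</Seg>, ID quoted or unquoted,
# possibly followed by other attributes before the closing ">".
SEG_PATTERN = re.compile(
    r'<Seg\s+[^>]*?SegID\s*=\s*["\']?(\d+)["\']?[^>]*>(.*?)</Seg>',
    re.DOTALL,  # segment content may span several lines
)


def extract_segments(text: str) -> dict[int, str]:
    """Map each segment ID to its raw content (a later duplicate ID overwrites an earlier one)."""
    return {int(seg_id): body for seg_id, body in SEG_PATTERN.findall(text)}
```

Calling `len(extract_segments(...))` on the .DEU and .ENG contents gives the two segment counts, and comparing the key sets shows which IDs exist on only one side.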
Gerald Dennett United Kingdom Local time: 23:15 German to English + ...
Re-align
Mar 16, 2023
You need to perform an alignment on the offending pair of files. Otherwise the pair will be ignored in any TM.
Gerald
Gary Hess Local time: 00:15 German to English + ...
TOPIC STARTER
How to do this automatically?
Mar 16, 2023
I should have said that I want to write a program to recognize and interpret the mismatch automatically. I can edit the file manually, but there must be something inside the XML that points to the correct alignment. I want to figure out how Transit NXT manages this misalignment.
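For an automatic check, one option is to pair source and target segments by segment ID instead of by position, so a single extra segment shows up as an unmatched ID rather than shifting every later pair. This is a minimal sketch, assuming the IDs have already been read from both files into dicts and that corresponding source and target segments share the same ID, which nothing in this thread confirms; `pair_by_id` is a hypothetical name.

```python
def pair_by_id(source: dict[int, str], target: dict[int, str]):
    """Pair segments that share an ID and list the IDs found on only one side."""
    common = sorted(source.keys() & target.keys())
    paired = [(seg_id, source[seg_id], target[seg_id]) for seg_id in common]
    only_source = sorted(source.keys() - target.keys())
    only_target = sorted(target.keys() - source.keys())
    return paired, only_source, only_target


# Example: the target has one segment more than the source.
src = {1: "Satz eins.", 2: "Satz zwei."}
tgt = {1: "Sentence one.", 2: "Sentence two.", 3: "Extra."}
paired, only_src, only_tgt = pair_by_id(src, tgt)
```

Here `only_tgt` ends up as `[3]`, which points the QA tool at the segment to inspect instead of reporting every pair after the split as wrong.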
Hans Lenting Netherlands Member (2006) German to Dutch
Reverse engineering
Mar 17, 2023
Gary Hess wrote:
I should have said that I want to write a program to recognize and interpret the mismatch automatically. I can edit the file manually, but there must be something inside the XML that points to the correct alignment. I want to figure out how Transit NXT manages this misalignment.
Have you already tried creating a project with a single segment, splitting that segment, and looking at what happens in the XML? Silly question perhaps, since you seem to know how to write code...
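One way to run this experiment is to save a copy of the file before the split, save it again afterwards, and diff the two versions with Python's standard `difflib`. This is a minimal sketch; `diff_snapshots` is a hypothetical helper, and reading the saved files with the right encoding is left as an assumption.

```python
import difflib


def diff_snapshots(before: str, after: str, name: str = "segment file") -> str:
    """Return a unified diff between two saved copies of the same file."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"{name} (before split)",
            tofile=f"{name} (after split)",
        )
    )
```

An empty result means the two snapshots are identical; otherwise the `-` and `+` lines show exactly which segment markup changed.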
Gary Hess Local time: 00:15 German to English + ...
TOPIC STARTER
Maybe it's really an error...
Mar 17, 2023
I tried your idea on a project (joining and splitting some segments to look at the results). The number of segments is actually never mismatched after these steps. So maybe the files in question do indeed have an error.
Thanks for all the suggestions!