Any free aligners out there?
Thread poster: Samuel Murray
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 05:18
Member (2006)
English to Afrikaans
+ ...
Nov 1, 2010

G'day everyone

I'd like to make a list of free aligners that translators with CAT tools can use to combine source and target texts into translation memories. I'm aware of these free aligners:

1. Andrew Manson's TextAlign
2. Wordfast's PlusTools
3. CTexT's Alignment Interface Pro
4. Susana Santos Antón's Bitext2TMX

Do you know of any other free aligners?

What I mean by "aligner" is a program that will segment two files and present them to the user in a unified way so that the user can check if the segments were correctly aligned, and make corrections to misalignments if any, and then save it as a translation memory in TMX format or in a format that can easily and freely be converted to TMX. Programs that require the user to pre-segment his files or to correct misalignments in separate programs aren't really aligners for my purpose.

If there are demo CAT tools or free CAT tools with built-in aligners that can be used freely, without restriction, then I'd be happy to hear about those too. In fact, I don't mind hearing about non-free aligners, as long as I can test them (so Stingray would be on that second list, even though it isn't free, because it is fully functional for 30 days, but ABBYY's aligner wouldn't be because it only aligns the first 50 lines of text, and besides, it requires the user to correct misalignments in a separate program that is not supplied by ABBYY).

Thanks
Samuel


 
Selcuk Akyuz
Selcuk Akyuz  Identity Verified
Türkiye
Local time: 06:18
English to Turkish
+ ...
Trans Suite 2000 Align Nov 1, 2010

AFAIK, it is free.

http://arm.proz.com/forum/sdl_trados_support/94240-is_there_a_programme_purely_for_alignment.html


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 05:18
English to Hungarian
+ ...
Here's mine Nov 1, 2010

I have dropped the link here a couple of times, but since you asked, here it comes again:

I wrote this free, open source aligner using the very smart hunalign autoalignment algorithm:
sourceforge.net/projects/aligner

It's focused on efficiency and handling large amounts of data well, not on being foolproof and providing a nice, flashy UI. You get a good autoaligner, fairly deep customization and support for crazy file sizes like 400,000 segments in a single file, but you don't get a GUI.

Note: keep an eye on the sourceforge page and this forum, as a much more advanced & user friendly version is coming in a few days or weeks. It'll still have a command line interface, but I have medium-term plans to build a GUI for it.

Features include:
- Input: txt, HTML and web URL (new version will add pdf and docx)
- Output: tab delimited txt and TMX (new version will add xls)
- Download, convert and align web pages or EU legislation
- Works on files of any size
- Aligns 3 or even 4 documents for multilingual projects, creating a 3- or 4-column table
- Supports all UTF-8 characters
- Runs on MS Windows (new version will add Mac and Linux support)

[Edited at 2010-11-01 14:54 GMT]


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 04:18
Member (2009)
Dutch to English
+ ...
Here are some: Nov 1, 2010

- Aligner.bat by FarkasAndras (uses Hunalign to create TMs from earlier translations or other bilingual texts; output: tab delimited txt and TMX) (http://sourceforge.net/projects/aligner/)

- Okapi Framework (http://www.opentag.com/okapi/wiki/)

- Wordfisher aligner (http://wordfisher.com/)

- aligner.py ~ A simple Python script for creating a TMX file from two texts. Written by Dmitri Gabinski. (http://www.omegat.org/resources/aligner.zip)

- bligner.py ~ A simple Python script for creating a TMX file from two texts. Written by Didier Briel. (http://www.omegat.org/resources/bligner041.zip)

- Uplug ~ Uplug is a collection of tools for linguistic corpus processing, word alignment and term extraction from parallel corpora. (http://www.let.rug.nl/~tiedeman/Uplug/)

- GIZA++ (http://www-i6.informatik.rwth-aachen.de/Colleagues/och/software/GIZA%20%20.html)

- YouAlign (online) http://www.youalign.com/

Michael



p.s. Here is my own list of useful (and often free) stuff for translators using Windows: http://beijer.mx/computer.html

[Edited at 2010-11-01 15:18 GMT]


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 05:18
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
@Michael -- thanks for that, but... Nov 1, 2010

Michael J.W. Beijer wrote:
Aligner.bat by FarkasAndras (uses Hunalign to create TMs from earlier translations or other bilingual texts. Output: tab delimited txt and TMX. (http://sourceforge.net/projects/aligner/)


Aligner.bat attempts to align the two files, but if there are mismatches, well, the user guide gives instructions about how to fix misalignment errors. The instructions are... copy the files' content to Excel in two columns and fix the misalignment in Excel. Once the user has corrected misalignments in Excel (or in whatever other program), the TMX_creator.bat can be used to create the TM.
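For anyone curious what that last step (corrected two-column text to TMX) involves, here is a rough Python sketch. It is not the code of TMX_creator.bat, which I have not looked at; the function name `tsv_to_tmx`, the header values and the rule of skipping rows with an empty cell are all my own assumptions for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Attribute name that ElementTree serializes as xml:lang
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tsv_to_tmx(tsv_text, src_lang, tgt_lang):
    """Build a minimal TMX 1.4 document from tab-delimited source/target lines.

    Hypothetical helper: rows with fewer than two columns, or with an
    empty source or target cell, are skipped (an assumption, not a rule
    taken from any of the tools discussed in this thread).
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tsv_to_tmx_sketch",  # made-up tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "tabbed-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")

    # QUOTE_NONE: quotation marks in the text are ordinary characters
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue
        source, target = row[0].strip(), row[1].strip()
        if not source or not target:
            continue
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text

    return ET.tostring(tmx, encoding="unicode")


print(tsv_to_tmx("Hello\tHallo\nGood night\tGoeie nag\n", "en", "af"))
```

Going through an XML library rather than pasting strings together means characters such as & and < in the segments are escaped automatically.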

YouAlign (online) http://www.youalign.com/


This tool requires that the user pre-segment the text and pre-align the segments in some other program (e.g. two text editors open side by side, perhaps) before uploading the files. There is no interactive fixing of misalignments. The tool shows a two-column preview of what has been created, but the user can't fix things that didn't align correctly.



The Okapi Framework web site says:
Text units from the source and target documents must be perfectly synchronized (aligned). For example, if the source document has more text units than the target document an error will be generated.
http://www.opentag.com/okapi/wiki/index.php?title=Sentence_Alignment_Step

In other words, same as ABBYY and YouAlign... it's not really an aligner. Aligner.bat is half an aligner because it allows the user to fix misalignments in mid-process, but... the user has to fix misalignments in a separate, third-party program with much copy/pasting.


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 05:18
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
@Michael II Nov 1, 2010

Michael J.W. Beijer wrote:
- aligner.py ~ A simple Python script for creating a TMX file from two texts. Written by Dmitri Gabinski. (http://www.omegat.org/resources/aligner.zip)
- bligner.py ~ A simple Python script for creating a TMX file from two texts. Written by Didier Briel. (http://www.omegat.org/resources/bligner041.zip)


No interactive misalignment fixing in either of these two. You have to fix misalignments in two instances of Notepad side by side (or in Excel, or in a word processor with a table function, or...).

- Uplug ~ Uplug is a collection of tools for linguistic corpus processing, word alignment and term extraction from parallel corpora. (http://www.let.rug.nl/~tiedeman/Uplug/ )


Have you actually used this tool? I get the impression from the help files that it also presupposes that the two input files already match perfectly before it "aligns" the two files.



I can't figure out how to use this aligner. Got any ideas?


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 05:18
English to Hungarian
+ ...
Correcting misalignments Nov 1, 2010

Samuel Murray wrote:

Michael J.W. Beijer wrote:
Aligner.bat by FarkasAndras (uses Hunalign to create TMs from earlier translations or other bilingual texts. Output: tab delimited txt and TMX. (http://sourceforge.net/projects/aligner/)


Aligner.bat attempts to align the two files, but if there are mismatches, well, the user guide gives instructions about how to fix misalignment errors. The instructions are... copy the files' content to Excel in two columns and fix the misalignment in Excel. Once the user has corrected misalignments in Excel (or in whatever other program), the TMX_creator.bat can be used to create the TM.

Correct. I don't see the problem, though.
Producing a very good autoalignment and letting you correct any errors is the most any aligner will ever be able to do for you, and a spreadsheet program provides a pretty good UI for correcting misalignments in autoaligned text. Of course Excel is not perfect for the purpose, but it's reasonably good. You can also use Plustools if that's your thing; the latest release includes instructions and a dummy file needed to start Plustools. I tried Plustools myself, but I ended up sticking with Excel.

BTW the new release generates nicely formatted xls files for you and opens them automatically to make the process a bit more seamless. Again, you can still use Plustools or any other aligner GUI that can be coaxed into taking presegmented input and not messing with it.

For obvious reasons, I don't want to devote massive amounts of time to creating a UI for correcting misalignments. The idea here is to generate good enough autoalignments so that you don't need to spend much time (if any) correcting errors manually.
E.g. if I'm doing a one-day translation job and I come across a couple of thousand segments of parallel text related to the text I'm working on, I autoalign it and drop it in a reference memory without any manual corrections. 2 minutes and I'm back to translating. Then if a misaligned segment comes up in a concordance search, I can always go back to the raw tab delimited file and find the real match (or just use right click/See context in Xbench). Doing that with 20 misaligned sentences still takes a lot less time than manually aligning/reviewing several thousand segments. Of course this approach doesn't work in every situation, but this change in approach allows you to use a vastly increased range of reference material more efficiently.
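Going back to the raw tab delimited file to find the real match can be as simple as the sketch below. It assumes one aligned pair per line with a tab between source and target; `find_with_context` is a hypothetical helper name, not part of aligner.bat or Xbench.

```python
def find_with_context(tsv_lines, query, context=1):
    """Find rows containing `query` and return them with neighbouring rows.

    Returns a list of (line_number, rows) tuples, where line_number is
    1-based and rows holds the hit plus up to `context` rows on each
    side. In a misaligned stretch the real match usually sits one or
    two rows away from the hit.
    """
    rows = [line.rstrip("\n").split("\t") for line in tsv_lines]
    needle = query.lower()
    hits = []
    for i, row in enumerate(rows):
        if any(needle in cell.lower() for cell in row):
            start = max(0, i - context)
            end = min(len(rows), i + context + 1)
            hits.append((i + 1, rows[start:end]))
    return hits


sample = ["A one\tEen\n", "B two\tTwee\n", "C three\tDrie\n"]
for line_number, rows in find_with_context(sample, "two"):
    print(line_number, rows)
```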

You believe that autoaligners that don't allow manual intervention don't deserve to be called aligners. On the other hand, I believe that aligners that don't do autoalignment don't deserve to be called aligners... they are just segmenters and editing GUIs, they don't do any aligning.
Autoalignment algorithms are pretty smart. Try them out and you'll see. If you use aligner.bat, give it a good glossary in your language pair to improve the performance and you'll see 95-99% correct matches on good input material.

[Edited at 2010-11-01 15:55 GMT]


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 05:18
English to Hungarian
+ ...
Others Nov 1, 2010

Just for the sake of completeness, here are a couple of other free aligners:

maligna gets a bit more than one download a day on sourceforge - about as much as aligner.bat... it shows that some people do use it.

The tools used for making the europarl corpus are also public

Microsoft also published an aligner written in perl

And here's Yves Champollion's aligner

I don't use any of them, and they are all even further from a "consumer-oriented" solution than aligner.bat. They were mostly made for corpus building, and I think hunalign pretty much made them all obsolete. They are mostly of interest for programmers, perhaps for corpus builders.

Also, here's Tag aligner, built specifically to align html, xml and other tagged files. It uses the tag structure to complement the good old length-based (Gale-Church) method for more accurate results. A pretty smart concept, and I'm considering integrating it into my project.
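To give an idea of what "length-based" means, here is a toy sketch in Python. It is not the real Gale-Church method (which uses a probabilistic cost), and it is not what hunalign or Tag aligner do internally; the penalty values and the function name `align_by_length` are arbitrary assumptions. It only shows the core idea: sentences of similar length are probably translations of each other, and dynamic programming picks the cheapest overall pairing, allowing 1-to-2 and 2-to-1 matches as well as skipped sentences.

```python
def align_by_length(src, tgt):
    """Align two lists of sentences using only their character lengths."""
    # Allowed groupings (source count, target count) and a fixed penalty
    # for each; the numbers are made up for this illustration.
    beads = {(1, 1): 0.0, (1, 0): 4.0, (0, 1): 4.0, (2, 1): 1.0, (1, 2): 1.0}
    inf = float("inf")
    n, m = len(src), len(tgt)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in beads.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                len_src = sum(len(s) for s in src[i:ni])
                len_tgt = sum(len(t) for t in tgt[j:nj])
                # Relative length difference, between 0 and 1
                mismatch = abs(len_src - len_tgt) / max(len_src, len_tgt, 1)
                candidate = cost[i][j] + penalty + mismatch
                if candidate < cost[ni][nj]:
                    cost[ni][nj] = candidate
                    back[ni][nj] = (i, j)

    # Walk back from the end to recover the cheapest path
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((" ".join(src[pi:i]), " ".join(tgt[pj:j])))
        i, j = pi, pj
    pairs.reverse()
    return pairs
```

A skipped sentence shows up as a pair with an empty string on one side, which is easy to spot when reviewing the result.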

[Edited at 2010-11-01 17:33 GMT]


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 04:18
Member (2009)
Dutch to English
+ ...
@FarkasAndras Nov 1, 2010

Hey, I know it's not free, but the aligner in the new memoQ 4.5 has been greatly improved, and uses Structural Alignment like you mentioned in reference to Tag aligner.

So far this seems to be the best and easiest way to quickly align large amounts of data. Although I still haven't had time to properly check out your Aligner.bat ...

Michael


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 05:18
English to Hungarian
+ ...
Aligner.bat Nov 1, 2010

Michael J.W. Beijer wrote:

Although I still haven't had time to properly check out your Aligner.bat ...

Go ahead and try it if you're interested - but be prepared for the lack of a GUI. Send feedback here in a new thread or through my profile.
That said, if you're not in a hurry, I'd recommend waiting for the next version. It's a complete rewrite and it's a lot more powerful and quite a bit easier to use than aligner.bat.

Samuel: GIZA++ is an industrial-strength tool for corpus building. I have heard it mentioned in quite a few places so I'm sure it's good at what it does, but I never used it. From what I know, it's a staple of the more basic MT teaching systems. Not sure how usable it is for mere mortals, and it seems to do word alignment, not sentence alignment. Here's what a quick googling turns up:
http://wiki.apertium.org/wiki/Using_GIZA++

It starts with "Download your corpus, and convert into one sentence per line." so it may not be up your street.

[Edited at 2010-11-01 20:00 GMT]


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 05:18
English to Hungarian
+ ...
MemoQ Nov 1, 2010

Michael J.W. Beijer wrote:

Hey, I know it's not free, but the aligner in the new memoQ 4.5 has been greatly improved, and uses Structural Alignment like you mentioned in reference to Tag aligner.

That sounds good and the whole integrated search concept sounds even better. (The idea is to add reference material to your project and have the CAT itself search your autoaligned texts and even monolingual texts.)
I myself don't use MemoQ so it's of no interest to me, but it's good to see some meaningful innovation going on. Perhaps SDL will one day make a usable aligner, too...


 
Jorge Payan
Jorge Payan  Identity Verified
Colombia
Local time: 22:18
Member (2002)
German to Spanish
+ ...
Key for TRANS Suite 2000 Align Nov 1, 2010

In case anybody is really interested in this program, which has a nice GUI and is very user-friendly (if a bit outdated), I have the key that Cypresoft gave out freely before it disappeared.

As I understand it, this key is not included in the download that Selcuk's link points to.

Just drop me a line to [email protected]

Saludos


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 05:18
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
Key to TS2000, and more... Nov 2, 2010

Jorge Payan wrote:
In case anybody is really interested in this program, which has a nice GUI and is very user-friendly (if a bit outdated), I have the key that Cypresoft gave out freely before it disappeared.


Yes, the key is necessary to save the alignment. You can align freely in the program but you can't export a TM unless you have a license for the product. There are conflicting reports about just how free the key is that Cypresoft gave out before it closed shop -- some say that this key was only meant for people who already had licenses and who needed to relicense upon reinstallation.

The TS2000 aligner is very cumbersome but it has some nice features (which could have been greatly improved upon). I think the TS2000 aligner was thought to be great, at a time when the only other free aligners were the Wordfisher aligner and the old PlusTools aligner (the old two-document version of it), both of which were really, really primitive and required a bit of computer skill to use.

TS2000's aligner and CTexT's aligner both work on the principle that the user has to click *every* segment and link them together. Any segments not linked together (using the mouse, of all input devices) are not added to the TM. TS2000 has a special block processing feature, though, in which one can identify blocks of text that should be auto-linked by the program (although this still takes quite a few mouse clicks to perform).

Bitext2TMX and the PlusTools aligner will add all paired segments to the TM (so if you don't want a segment in the TM, simply pair it with an empty cell, or delete it).


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 05:18
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
Yves Champollion Nov 2, 2010



I don't think this is Yves' aligner.


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 05:18
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
About GIZA Nov 2, 2010

FarkasAndras wrote:
GIZA++ is an industrial-strength tool for corpus building. ... [The info page] starts with "Download your corpus, and convert into one sentence per line." so it may not be up your street.


Actually, it starts with "Download and compile GIZA++.", so it's definitely not up my street.

I didn't say so specifically but I'm really looking for tools that were designed for translators who actually wanted to be translators (not translators who actually wanted to be programmers).

As for converting text to "one sentence per line", that is not a problem -- many CAT tools can do that when they do text extraction. Wordfast Classic's Extract feature produces a file with one segment per line.
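For plain text, a very crude version of "one sentence per line" takes only a few lines of Python. This is a naive sketch, not how Wordfast or any other CAT tool segments: it simply breaks after ., ! or ? followed by whitespace, so abbreviations such as "e.g." will be split wrongly, whereas real tools use configurable segmentation rules. The function name is made up.

```python
import re

# Break at whitespace that follows sentence-final punctuation (naive rule)
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def one_sentence_per_line(text):
    """Return `text` with each (naively detected) sentence on its own line."""
    sentences = []
    for paragraph in text.splitlines():
        paragraph = paragraph.strip()
        if paragraph:
            sentences.extend(_SENTENCE_BREAK.split(paragraph))
    return "\n".join(sentences)


print(one_sentence_per_line("First one. Second one! Third?\n\nNew paragraph here."))
```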

I've been bugging the OmegaT guys to put such a feature into their program too, but at present the only way to extract text in segmented format in OmegaT is on a file-by-file basis by selecting all text in the edit pane and copy/pasting it.


 