arimareiji wrote: ↑Mon Apr 09, 2018 1:08 pm
It could be worse. English doesn't even use them, so I pity the machine translator that has to figure out whether I read the story (past tense) or read the story (present tense). (^_~)
That's a whole level of complexity beyond what I'm asking for. All I'm saying is that when I type "coir" on the left side of Google Translate (Irish -> English), it should give me both "crime" (the gloss of "coir" that showed up in a few people's early guesses in this thread) and "fair" (the gloss of "cóir"), plus glosses of any other words that use the same letters with different diacritics, as opposed to assuming that I have any clue how to enter said diacritics at my keyboard (or could be arsed to if I did know). That's trivial to implement (store all the words in your dict with an associated "diacritic-stripped" version, and use that stripped version for matching the word typed in by the user) and doesn't require any kind of contextual information (like the problem you mention above would).
Out of curiosity I went to Google's Chinese-English translator and typed "ni hao" (in roman letters, not characters). While it still pretended not to know what I was asking for (it gives "ni hao" on the English side, just as I originally got "coir" on the right side when searching the Irish word), it did at least give a "Did you mean <characters>?" below the roman text I'd typed. I consider that an acceptable solution too (if I type "coir", give me all the Irish words that use those letters, with appropriate diacritics added). That seems more helpful than assuming the user knows all the complexities of the language's writing system and has a suitable keyboard set up for entering said complexities. (Again, I wouldn't have been able to do any of this if I hadn't been able to just cut-and-paste the diacriticked version from the transcript, so All Cookies be Unto paarfi on this one.)
Enough off-topic stupidity from me though... Yowza. Yakugashi's enthusiastic "hey, don't like that version of the story, I can do something totally different!" in the last panel immediately made me think, hmm, not a lot of authorial commitment to whatever initial idea she had.
I mean, there's a big gap between accepting constructive editorial criticism and completely ripping out the guts of said story based on not much more than flinching and foreign-language muttering by said editor.
It does nothing to allay my fears from previous strips that while Yaku may have the power to develop a new story for Miho (I am still calling her "magic-weaver Yaku" in my head), she doesn't (so far) project the air of someone with enough maturity to have said story do what Miho needs it to do (whatever that may be)... or remain particularly coherent, for that matter. ("Come on, I said to choose ten things from this list of Stuff That Would Totally be Epic, don't stop now!!!")
EDIT:
iffy wrote: ↑Mon Apr 09, 2018 4:20 pm
Maybe it's not time to go anywhere, but to arrange Yuki to show up to round out the set of three.
With One Asako to Rule Them All, and in the Darkness Bind Them.
Holy shit iffy, you can be really scary when you want to be.
EDIT2:
arimareiji, below wrote: The permutations of possible misspellings in even a short passage of text would be way too much to handle.
Sorry if I was unclear, but I am specifically talking about handling (ignoring, really) diacritics, not arbitrary misspellings. As I said, that's trivially handled in a way that scales linearly with the size of the dictionary; you don't need to worry about "permutations". (Your search is still O(n), not O(n^2) or whathaveyou.)
Right now their dictionary has something morally equivalent to this (mostly perl-ish; the accented letter is written as a \N{...} named character so nobody has to type it, and the rest of the dict is elided):
Code:
my %eng_of_irish = (
    # ...
    'coir' => 'crime',
    "c\N{LATIN SMALL LETTER O WITH ACUTE}ir" => 'fair',
    # ...
);
(or maybe it's o followed by {COMBINING ACUTE ACCENT}, who cares). So I type in 'coir' and somewhere there's a
Code:
my $result = $eng_of_irish{$entered};
which spits back "crime", and I think, hmm, that doesn't sound right. What I'm saying is there should instead be:
Code:
my %eng_of_irish = (
    # ...
    'coir' => { 'no_dia' => 'coir', 'gloss' => 'crime' },
    "c\N{LATIN SMALL LETTER O WITH ACUTE}ir" => { 'no_dia' => 'coir', 'gloss' => 'fair' },
    # ...
);
where the 'no_dia' entries can certainly be constructed automatically from the original entries (you take a table that maps something like {SMALL O WITH ACUTE ACCENT} to a plain o, and then run each word in your dict through a regexp substitution driven by the table entries). Then the matching code is more like this ($entered is again plain 'coir'):
Code:
# Collect every dictionary word whose stripped form equals what was typed.
my @matches = grep { $eng_of_irish{$_}{'no_dia'} eq $entered } keys %eng_of_irish;
if (@matches > 1) {
    print "Did you mean:\n";
    for my $match (@matches) {
        print "  $match: $eng_of_irish{$match}{'gloss'}\n";
    }
} else {
    # do simple stuff if only 1 match
}
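For what it's worth, here's a minimal sketch of how the stripped forms could be generated automatically, and how a prebuilt index turns each lookup into a single hash access instead of a scan over the whole dict. It assumes Perl 5.10 or later and swaps the hand-written mapping table for the core Unicode::Normalize module (decompose to NFD, then drop the combining marks). The names strip_diacritics, %by_stripped and candidates are made up for illustration, and the two-entry dict is just the example from this thread; I have no idea what Google actually does internally.

```perl
use strict;
use warnings;
use utf8;                          # this source contains a literal accented letter
use Unicode::Normalize qw(NFD);    # core module

# Toy dictionary holding only the two entries discussed in this thread.
my %eng_of_irish = (
    'coir' => 'crime',
    'cóir' => 'fair',
);

# Decompose to NFD so that an accented letter becomes base letter plus
# combining mark, then delete every combining mark (\p{Mn}). Precomposed
# and "o followed by COMBINING ACUTE ACCENT" spellings get the same key.
sub strip_diacritics {
    my ($word) = @_;
    (my $plain = NFD($word)) =~ s/\p{Mn}//g;
    return $plain;
}

# One linear pass over the dict: stripped spelling => list of real spellings.
my %by_stripped;
for my $word (sort keys %eng_of_irish) {
    push @{ $by_stripped{ strip_diacritics($word) } }, $word;
}

# Hypothetical helper: every dictionary word that matches the typed text
# once diacritics are ignored (empty list if there is none).
sub candidates {
    my ($entered) = @_;
    return @{ $by_stripped{ strip_diacritics($entered) } // [] };
}

binmode STDOUT, ':encoding(UTF-8)';
print "$_: $eng_of_irish{$_}\n" for candidates('coir');
```

Typing plain 'coir' lists both entries, and so does typing the accented spelling, since the input goes through the same stripping. Case folding and letters that don't decompose into a base letter plus a mark are not handled here.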