LCWO Discussion Forum

This is a simple discussion forum for LCWO users. Feel free to use it for any kind of discussion related to this website.

Thread: Request to dump LCWO words and callsigns in plain text



Posted: 2022-05-11 11:47
I am experimenting with qrq (the offline application courtesy of our Fabian). I have noticed that one can use custom databases that can be symlinked to the ~/.qrq directory.

Would Fabian please provide a dump of the LCWO words and callsign database in plain text format, so that we can add it to qrq? I believe the database is here:
https://github.com/dj1yfk/lcwo/tree/master/db
and here:
https://github.com/dj1yfk/lcwo/tree/master/inc

but it's in SQL format, unless it is also available somewhere else and I missed it.

In particular, I need the CW abbreviations. I can get the others elsewhere.


Posted: 2022-05-11 22:57
https://github.com/dj1yfk/lcwo/tree/master/db

are the fortunes.
Line 42

A day for firm decisions. Or is it?
A few hours grace before the madness begins again.
A gift of a flower will soon be made to you.
.....

73 Rüdiger DD5RK



Posted: 2022-05-11 23:40
The fortune files are too long to parse into qrq. I can get all the dictionary lists from Debian packages, but I need a flat plain-text database for the CW abbreviations.
Administrator


Posted: 2022-05-12 18:57
Hi oc,

there's an API already to get the word lists in JSON format. From there you can quickly extract plain text lists. For example:

curl "https://lcwo.net/api/index.php?action=get_wordtraining_collection&id=cw0" | jq . | awk '/word/ {print $2}' | sed 's/[",]//g'

The collection IDs for the word training always have three characters: first the two-letter ISO language code (en, de, pl, ...), followed by the collection number, which starts at 0 and increases for each collection. "cw" is the "language" code for CW abbreviations.

The following collections exist:


+------+--------+---------------------+
| lang | collid | collection          |
+------+--------+---------------------+
| cs   |      0 |                     |
| cw   |      0 |                     |
| cw   |      1 | Q codes             |
| de   |      0 |                     |
| de   |      0 | Allgemein           |
| de   |      1 | Amateurfunkbegriffe |
| en   |      0 |                     |
| en   |      1 | 1 - 3 letter words  |
| es   |      0 |                     |
| fi   |      0 |                     |
| fr   |      0 |                     |
| fr   |      1 | Pays et états       |
| it   |      0 |                     |
| nl   |      0 |                     |
| pl   |      0 |                     |
| pt   |      0 |                     |
| sr   |      0 |                     |
+------+--------+---------------------+
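
As a rough sketch of the naming scheme described above, the request URL can be assembled from a language code and a collection number. The helper name lcwo_collection_url is hypothetical; the URL itself is the one from the curl example in this post.

```shell
# Build the API URL for a word-training collection.
# lcwo_collection_url is a hypothetical helper name, not part of LCWO.
lcwo_collection_url() {
    lang=$1   # two-letter code, e.g. "cw", "de", "en"
    num=$2    # collection number, starting at 0
    # The ID must be exactly two letters followed by one digit.
    case "$lang$num" in
        [a-z][a-z][0-9]) ;;
        *) echo "invalid collection id: $lang$num" >&2; return 1 ;;
    esac
    printf 'https://lcwo.net/api/index.php?action=get_wordtraining_collection&id=%s%s\n' "$lang" "$num"
}
```

The result can then be passed to curl, for example: curl "$(lcwo_collection_url cw 1)" for the Q codes.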



73
Fabian


Posted: 2022-05-12 19:26
Wow, Fabian. Super-super thanks for that. I managed to download the dump for both cw0 and cw1 and I'm going to feed it to qrq.


Posted: 2022-05-19 08:29
Hi Fabian,

Should I clone from GitHub or from the URI written in the About page of this site?

https://git.fkurz.net/dj1yfk/lcwo
Administrator


Posted: 2022-05-19 08:38
https://git.fkurz.net/dj1yfk/lcwo is the original repository. GitHub is a mirror; it should always be up to date, with a maximum delay of a few hours. So both are fine, and I am happy to accept patches / contributions either as a pull request on GitHub or in any other format.


Posted: 2022-05-19 22:00
Thanks Fabian.

The texts in the database contain "numeric character references", something like stra&#223;e

[quote=dj5cw]

there's an API already to get the word lists in JSON format. From there you can quickly extract plain text lists. For example:

curl "https://lcwo.net/api/index.php?action=get_wordtraining_collection&id=cw0" | jq . | awk '/word/ {print $2}' | sed 's/[",]//g'

[/quote]

And this API escapes (probably) every non-ASCII character, for example abdr\u00fccken

Both are inconvenient if a user wants utf8 encoded plain text files.
Any suggestion?


Posted: 2022-05-19 23:21
test:

And this API escapes (probably) every non-ASCII character, for example abdr\u00fccken

Both are inconvenient if a user wants utf8 encoded plain text files.
Any suggestion?

Yes, I've noticed that. I had to use tr (https://linuxcommand.org/lc3_man_pages/tr1.html) to convert the text to ASCII, not to mention all the accented vowels and umlauts (à, á, é, è, ä, ö, ü, etc.). I tried (and failed) to find the code that simplifies the characters in the git checkout.


Posted: 2022-05-22 00:18
oc:
Yes, I've noticed that. I had to use tr (https://linuxcommand.org/lc3_man_pages/tr1.html) to convert the text to ASCII, not to mention all the accented vowels and umlauts (à, á, é, è, ä, ö, ü, etc.). I tried (and failed) to find the code that simplifies the characters in the git checkout.


Hi oc,

So you want ASCII files?
Does that mean you don't use accented letters in actual QSOs, even though jscwlib.js has the codes for à, ç, è, etc.?
Administrator


Posted: 2022-05-22 09:16
Yes, the API escapes non-ASCII characters, but jq converts them back to proper UTF-8 characters. And occurrences of character references like &#228; can be converted by html2text. So the full command line to get the words, one per line, in UTF-8 would be:

curl "https://lcwo.net/api/index.php?action=get_wordtraining_collection&id=de0" | jq . | awk '/word/ {print $2}' | sed 's/[",]//g' | html2text -utf8 -nobs -width 1

The code to simplify characters is here:

https://git.fkurz.net/dj1yfk/lcwo/src/branch/master/inc/functions.php#L986

And to convert html entities into UTF8:

https://git.fkurz.net/dj1yfk/lcwo/src/branch/master/inc/functions.php#L1182

73
Fabian




Posted: 2022-05-23 13:20
Thanks Fabian

% which jq
jq not found

An alternative is to convert the SQL "INSERT INTO" statements into tab-separated values.

https://gist.github.com/Luci6fuge/b8a9f8dd0552be94bcd7b21d57bf3bf5

% git clone https://gist.github.com/b8a9f8dd0552be94bcd7b21d57bf3bf5.git foo
% cd foo
% chmod +x *.php
% git clone https://git.fkurz.net/dj1yfk/lcwo
% ./sqltsv.php lcwo/db/lcwo_words.sql | ./dencr.php > words.txt
% cat words.txt | awk -F '\t' '$2=="fr" {print $3}' | ./simplify.php > words.fr.txt
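
The language filter in the last step depends on awk splitting on tabs, so the separator has to be quoted; an unquoted \t reaches awk as a plain "t". A minimal sketch of that filter as a reusable function follows. It assumes, as in the command above, that column 2 holds the language code and column 3 the word; words_for_lang is a hypothetical name.

```shell
# Print column 3 (the word) of every row whose column 2 (the language) matches.
# words_for_lang is a hypothetical helper name.
# Usage: words_for_lang LANG FILE
words_for_lang() {
    awk -F '\t' -v lang="$1" '$2 == lang { print $3 }' "$2"
}
```

For example: words_for_lang fr words.txt | ./simplify.php > words.fr.txt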


Posted: 2022-10-25 12:17
Putting it all together, including ASCII conversion:

$ curl "https://lcwo.net/api/index.php?action=get_wordtraining_collection&id=de0" | jq . | awk '/word/ {print $2}' | sed 's/[",]//g' | html2text -utf8 -nobs -width 1 | iconv -f utf-8 -t ascii//translit > de0.qcb


or put this in .bashrc:

# Usage: get-lcwo-ascii de0   (writes de0.qcb)
get-lcwo-ascii ()
{
    curl "https://lcwo.net/api/index.php?action=get_wordtraining_collection&id=$1" | jq . | awk '/word/ {print $2}' | sed 's/[",]//g' | html2text -utf8 -nobs -width 1 | iconv -f utf-8 -t ascii//translit > "$1".qcb
}

then:

$ get-lcwo-ascii de0

