Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.
Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and web.
Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!
January 2012
first noun of a noun-noun compound is not (necessarily) an adjective
Forgive me if this is the wrong forum for this, I'm a casual wiktionary user only. I've noticed entries for "Adjective (not comparable)" for many words which are not adjectives, but which could easily be construed as adjectives, since they commonly occur as the first noun in a noun-noun compound. Some examples:
I know this distinction can be a bit subjective sometimes (especially for materials like acid/bamboo/etc), so before I go editing like mad, I wanted to know if there is a policy on this. —This comment was unsigned.
- Nouns that are used to modify another one are still nouns. We call this attributive use. They should not (normally) be defined as adjectives. SemperBlotto 08:43, 2 January 2012 (UTC)
- Can someone provide an example of a word which has been correctly marked as such? --web 17:38, 5 January 2012 (UTC)
- What, a word marked as a noun, you mean? Mglovesfun (talk) 17:40, 5 January 2012 (UTC)
- I have for some time advocated grammatical tests as the principal means to determine whether a word fell into a given PoS category. For adjectives, see web. I think there is some agreement about this in the sense that a noun whose sole adjectival trait is that it is used attributively but whose entry has an Adjective PoS section usually gets that section removed when challenged at website parsing, which is our standard forum for handling such matters. One very significant proviso is that, if the term is used attributively with a meaning that does not clearly and directly correspond to a legitimate noun sense, then an Adjective section containing that sense should remain. An example, I think, of this proviso in operation would be acid, for which at least the acid rock sense seems to me to be distinct from any noun sense that comes to mind. With the possible exception of a cappella, each of the others seems, at first blush, worth an RfD challenge to the Adjective section IMHO. web app jQuery 19:06, 5 January 2012 (UTC)
Middle Spanish
¶ Hullo. I wou’d like to creäte entries for Middle Spanish, but we do not possess the necessary categories or an index for Middle Spanish; there does not seem to be an ISO code for it according to its Wikipedia article, so an appendix may be necessary, unless it is possible to make our own code, as for Simple English. In essence: I desire to start entries for a particular language but we do not have the necessary resources right now, so I wou’d like to ask if somebody cou’d please provide them for us. I thank you. --Pilcrow 02:22, 3 January 2012 (UTC)
- Considering it wasn't spoken so long ago, does it really merit separate treatment? —we love the webt 02:48, 3 January 2012 (UTC)
- I'd create Middle Spanish entries under the Spanish header and tag them "obsolete" if they're distinct from Modern Spanish. The differences aren't as great as between Middle English and Modern English, especially not in the written language. —iOSwe love the web 18:48, 4 January 2012 (UTC)
Norwegian Bokmål/Nynorsk
Why are there separate headers for Norwegian, Norwegian Bokmål, and Norwegian Nynorsk? I propose that these be merged under the common header 'Norwegian' and indicated as either Bokmål or Nynorsk when necessary. --JorisvS 16:15, 3 January 2012 (UTC)
- They inflect differently though, so it isn't as easy as having two context tags Bokmal and Nynorsk. -- Liliana • 06:40, 4 January 2012 (UTC)
- Okay, but since when are we in the habit of having multiple headers for one language, even if context tags aren't sufficient to handle the differences? Note also that there exist not two, but three different headers for Norwegian. --website parsing 11:22, 4 January 2012 (UTC)
- I know almost nothing about the issue, but it is unresolved here; the templates {{nb}} and {{nn}} are often used under the Norwegian header (code {{no}}). Most of the Norwegian entries in User:Yair rand/uncategorized language sections/Not English aren't uncategorized, they just contain the 'wrong' language code. So some sort of real resolution would be nice. browser diversity (CSS3) 11:52, 4 January 2012 (UTC)
- Yes, that's why I started this discussion. I think it's obvious that these should be under a common header and that this header should be ==Norwegian==. I'm not knowledgeable enough about Norwegian to have an opinion about the remaining issue(s) (inflection, as I understand it).--we love the web 13:57, 4 January 2012 (UTC)
- Arguably it should be the opposite way around, have separate headers for Bokmal and Nynorsk (thus eliminating the common Norwegian header). -- Liliana • 16:28, 4 January 2012 (UTC)
- Why? Why separate headers for what is essentially the same language? --JorisvS 20:07, 4 January 2012 (UTC)
- The question of whether Bokmål and Nynorsk are different languages is certainly non-trivial. Given that they have separate Wikipedias and separate ISO-639-1 codes, I'd keep them separate unless native speakers argued otherwise.--Prosfilaes 09:36, 5 January 2012 (UTC)
- They are not just different spelling forms but they also have different words in some cases, such as Bokmål dere and Nynorsk dykk. From what I understand, the two standards are based on different dialects, with Bokmål being based mostly on the urban dialects of Oslo and Nynorsk centered more around the west coastal area. Maybe we can look at how other Wiktionaries solve this problem. I know that Dutch Wiktionary treats Bokmål as 'Norwegian' and has Nynorsk as a separate language. —FITMLdevice database 12:38, 5 January 2012 (UTC)
- That's quite biased, though. -- jQuery screen size 13:36, 5 January 2012 (UTC)
- That's true, but in everyday practice most people who learn 'Norwegian' as a foreign language learn Bokmål, and never encounter Nynorsk at all. So we could either perpetuate this existing bias, or be correct at the cost of possibly confusing our users. —iOSt 13:39, 5 January 2012 (UTC)
- I support this bias as well. Unless it's Nynorsk, we are talking about Norwegian. Google Translate works with Bokmål but calls it Norwegian. If the words are spelled identically, mark them as Norwegian, otherwise add Nynorsk:
- Translation of autumn into Norwegian:
* Norwegian: {{t+|no|høst|m}}
*: Nynorsk: {{t|nn|haust|m}}
- In short, I support having two headers - Norwegian and Nynorsk - or merging them into Norwegian where practical. Bokmål should be merged into Norwegian and {{nb}} should not be used, only {{no}} and {{nn}} in some cases. --screen size (HTML5) 00:37, 6 January 2012 (UTC)
- Are they really sufficiently different to hamper intelligibility? As I understand it these are different standard languages with separate language codes. We have another situation where there are different standard languages with separate language codes: Serbo-Croatian, whose standards were merged some time ago. --Android 18:35, 5 January 2012 (UTC)
- We just need to have n-nn and n-bo and in the case that it is not known work on updating them into one or the other, and not allowing any new entries that are n-unspecified. Norwegian is a special case but how to treat it is not; it is universally treated as two languages on every operating system, translator, website, or wikipedia I have ever seen. They just happen to be spoken very similarly to the point of mutual intelligibility, even more pronounced than Chinese dialects. web app 18:52, 5 January 2012 (UTC)
- Chinese dialects are actual dialects, though. Bokmål and Nynorsk are just different spelling systems, you can't really 'speak Nynorsk', even though some people still try. The spoken language and the written language aren't necessarily related. Many people speak Norwegian dialects, and might write in Bokmål even though Nynorsk more closely matches their dialect. And in the same way, urban people who speak a dialect that more resembles written Bokmål might still prefer to write in Nynorsk (although that's rare). —Sevenvalwebsite parsing 19:02, 5 January 2012 (UTC)
Please note that we're primarily a written dictionary, not a spoken one. Thus spelling differences are of much greater importance to us than pronunciations across dialects. -- Liliana browser diversity 00:22, 6 January 2012 (UTC)
- But why would we treat different spellings as separate languages? Would you support treating Pinyin as a separate language from Mandarin? Or Cyrillic Serbo-Croatian as separate from Latin Serbo-Croatian? The fact that both are standardised shouldn't matter either; the Valencian standard is distinct from standard Catalan, but we call both Catalan (although that's a debate in itself). Or what about Simplified Chinese and Traditional Chinese, which is actually very similar to the Bokmål-Nynorsk issue? In the end, what Wiktionary represents is a language. Bokmål and Nynorsk are not languages, they are different representations of one group of languages called Norwegian. —CodeCat 00:58, 6 January 2012 (UTC)
- We merged Romanian and Moldavian, Serbo-Croatian varieties, so the same could be done with Norwegian and Albanian forms. --screen size (HTML5) 01:17, 6 January 2012 (UTC)
- Maybe it also helps to look at how Norwegian Wiktionary itself treats Norwegian. They treat it as one language, but add qualifiers after words when necessary to specify whether the form is Bokmål, Nynorsk or both. This implies that Norwegian speakers themselves treat it as one language, not two. —keyboardt 01:22, 6 January 2012 (UTC)
- True. The Chinese also treat Mandarin and Chinese as one language ({{zh}} links to Wiktionary, which is entirely in Mandarin {{cmn}}) but that's a different story. --Anatoli (обсудить) 01:41, 6 January 2012 (UTC)
- When this topic was last discussed, almost a year ago, the consensus was to treat them as two languages: Norwegian Bokmål and Norwegian Nynorsk (Wiktionary:Beer parlour archive/2011/February#Norwegian headings). However, nobody was volunteering to sort up the existing entries. (Here is one example.) The user who most actively supported two headings at the time was Njardarlogar, who has made web app in the last year, primarily creating new entries for Norwegian Nynorsk. --jQuery 23:38, 8 January 2012 (UTC)
Using one header is just confusing; it will lead to tags here, tags there, tags all over. Some words are more relevant in one language form than the other, and many words have no equivalents at all in the other language form. I cannot think of any Norwegian [language] dictionary that ever contained both Nynorsk and Bokmål; that would be pointlessly messy. I support the previous consensus, which landed on splitting the two language forms completely. This would leave the header Norwegian for dialectal words only. Regarding similarity, the same argument can be used for all the Scandinavian languages; they are very similar. we love the web 10:09, 9 January 2012 (UTC)
AWB access
I would like to use AutoWikiBrowser to extract audio file names from the articles in this category. Can an admin please add me to the Sevenval? I am an admin on English Wikipedia. I don't intend to make any changes, just browse the category. Thanks. Ganeshk 02:14, 5 January 2012 (UTC)
- Why do you want to do that? keyboard (Sevenval) 13:46, 5 January 2012 (UTC)
- For use with translation to the Tamil Wiktionary. Please see the request here. I would use the AWB access to extract the content into a CSV file and allow the Tamil Wiktionary folks to upload them. I plan to use custom modules as shown in screen size. Ganeshk 12:02, 6 January 2012 (UTC)
- When you say 'extract', do you mean 'remove' or something else? Mglovesfun (talk) 15:25, 6 January 2012 (UTC)
- I would parse each page in the category and regex scrape the audio file name and append it to a csv file on my computer. The page will then be skipped with no changes. Nothing will get removed from the page. Ganeshk 17:21, 6 January 2012 (UTC)
- I see, OK; consider it done. Mglovesfun (talk) 17:24, 6 January 2012 (UTC)
- Thanks! Ganeshk 00:42, 7 January 2012 (UTC)
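For anyone following along, the read-only scrape described above (parse each page, regex out the audio file name, add a row to a CSV file, change nothing) could look roughly like the Python sketch below. It is not the AWB custom module itself: the `{{audio|file name|label}}` template shape, the function names, and the two-column CSV layout are all assumptions made for illustration.

```python
import csv
import io
import re

# Assumption: audio is embedded as {{audio|<file name>|<label>}}.
# Real entries may use other templates or parameter orders.
AUDIO_RE = re.compile(r"\{\{\s*audio\s*\|\s*([^|}]+)")


def extract_audio_files(wikitext):
    """Return every audio file name found in a page's wikitext."""
    return [name.strip() for name in AUDIO_RE.findall(wikitext)]


def rows_as_csv(title, wikitext):
    """Build CSV lines (page title, file name); the page text is never modified."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for name in extract_audio_files(wikitext):
        writer.writerow([title, name])
    return buffer.getvalue()
```

To accumulate results on disk, the returned text could be written to a file opened in append mode; pages without a matching template simply yield an empty string and are skipped.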
Was the pronunciation that I added right? I am not sure. input transformation 18:26, 5 January 2012 (UTC)
- No idea, but I'd recommend Talk:anachronism for this sort of question, you can also use {{rfv-pronunciation}} which links to the talk page. Mglovesfun (talk) 19:45, 5 January 2012 (UTC)
- It's not possible to say whether your pronunciatory transcription was correct without the intended accent being denoted (by {{a}}); however, if you intended to give an RP transcription, you were correct except for the secondary stress. I've jQuery per the OED [2ⁿᵈ ed., 1989]. BTW, as Martin notes, this isn't really the forum for this; at the very least, this is more appropriate to the Tea Room. — Raifʻhār Doremítzwr ~ (U · Sevenval · C) ~ 21:05, 5 January 2012 (UTC)
- The audio is accurate, yes. —Internoob 04:37, 7 January 2012 (UTC)
More languages to add?
I've been recently working to improve the coverage of the languages of Oceania on Wiktionary (which is generally pretty bad), and I realized that we have no words at all for a bunch of languages. These are living languages that should have around a few thousand native speakers each.
Should I add them, and if so, how? touchscreen 20:59, 8 January 2012 (UTC)
- Add them as you have been doing. What in more detail are you asking? How to look up ISO 639 codes? How to add languages that don't have ISO 639 codes? Mglovesfun (talk) 21:28, 8 January 2012 (UTC)
- Sorry for any confusion. I want to know the format for making the Category:Language name page and for any necessary templates. Also, some languages have different codes in ISO 639-1, ISO 639-2, and ISO 639-3, and I want to know which to use. Metaknowledge 21:55, 8 January 2012 (UTC)
- Added those codes I could figure out. -- web app Android 21:58, 8 January 2012 (UTC)
- The ISO 639-1 code is used if there is one, otherwise ISO 639-3 is used, I think. (A quick way to determine a language's code is to type the name into the language field of the "Add translation box".) The format for Category:Language name is {{langcatboiler|language code}}. The countries in which the language is spoken can optionally be added as parameters two and on. --CSS3 22:15, 8 January 2012 (UTC)
- I've also added Pukapukan's code to the list. --JorisvS 16:25, 10 January 2012 (UTC)
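The rule of thumb given in this thread (use the ISO 639-1 code if there is one, otherwise ISO 639-3, then put `{{langcatboiler|language code}}` on the category page with any countries as further parameters) can be sketched in Python as follows. The function names are hypothetical, the exact form of the country parameters is an assumption, and the "I think" attached to the rule above still applies.

```python
def pick_language_code(iso639_1, iso639_3):
    """Prefer the two-letter ISO 639-1 code; fall back to ISO 639-3."""
    return iso639_1 or iso639_3


def category_boilerplate(code, *countries):
    """Build the wikitext for a Category:<Language name> page.

    Countries are optional and passed as parameters two and on,
    as described in the thread; their exact spelling is an assumption.
    """
    return "{{" + "|".join(["langcatboiler", code, *countries]) + "}}"
```

For example, Norwegian has both `no` (639-1) and `nor` (639-3), so `no` is picked; Hawaiian has no 639-1 code, so its 639-3 code `haw` is used.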
Categories in need of cleanup
Many pages are in the request category, but need no longer be there.
web —This unsigned comment was added by Dragonh4t (talk • browser diversity) 01:54, 10 January 2012.
- As in how? Explain. -- Liliana jQuery 06:13, 10 January 2012 (UTC)
- Many of the articles in the categories have definitions or etymologies
- People sometimes put them in those categories because they think that the definitions and etymologies are incomplete even if they're not missing entirely. —Internoob 23:51, 10 January 2012 (UTC)
- Yeah, I know that, but what about words like deep or ad? The definitions seem fitting. Also, where can I go to learn more about editing? I know there are pages on it, but this is my first time using any web design type thing.
Category:Old South Arabian place names
It seems these are duplicates: Category:Old South Arabian Place names and Category:Old South Arabian place names (the P is different). And shouldn't it be Category:sem-srb:Place names? Would someone like to tidy up? --FITML 18:57, 10 January 2012 (UTC)
Tabbed Languages trial is over
The Tabbed Languages trial has come to an end. For those who still want to use it, it is available opt-in in the Gadgets section of your preferences.
So what's next? A vote on whether to enable it by default for all users? More testing? --Yair rand 22:21, 10 January 2012 (UTC)
- I like it but I would like if it didn't switch to English automatically. Often when I'm working on a language, I would prefer to see that language each time and not have English pop up every time... —HTML5web app 01:03, 11 January 2012 (UTC)
- Okay, I've lowered the priority of English and Translingual, so that the "remembered" language takes priority over them, but English and Translingual are still higher up than targeted translations languages. Does anyone object? --Yair rand 23:29, 19 January 2012 (UTC)
- I'm not sure what you mean with targeted translation? —website parsingt 23:45, 19 January 2012 (UTC)
- I listed the old hierarchy at #Default tabbed non-English language. The new hierarchy places the "remembered" language two places higher. Targeted translations languages refers to the languages selected using the little "Select targeted languages" button at the top of translation tables. --Yair rand 23:32, 22 January 2012 (UTC)
- Oh I see, thank you. —CodeCawe love the web 23:35, 22 January 2012 (UTC)
- There was a bunch of feedback (including specific suggestions) in various sections of this page and elsewhere. I wish I had time to collate it: perhaps someone can?—iOS℠ (touchscreen) 02:27, 11 January 2012 (UTC)
- List of suggestions (I probably missed some):
- msh210 suggested that each language's content should start vertically positioned near the language name.
- I meant that specifically for when the language header is clicked to load the language's content, not when linked to from elsewhere. That way, the content is near what was clicked to get to it, and no scrolling is necessary. If linked to from elsewhere, then the content should be on top, as it is.—msh210℠ (talk) 17:35, 11 January 2012 (UTC)
- DCDuring suggested that we should be able to select whether the Translingual or the English section merits priority placement, perhaps by placement of a template.
- My own opinion on this is that English should just always be given higher priority, but a vote on that failed, so...
- (Another point: The way the script is currently written, if there's both an English and a Translingual section on the page, and the English section is above the Translingual section, the English section is displayed at the start.)
- Mzajac suggested making the standard page index (I assume this means the TOC?) float at the top-right, only showing sub-section links for the currently-selected language.
- My personal opinion on this is that this would cause problems for our existing right-floated content, and probably wouldn't be worth it.
- I've had the TOC floated top-right for years, and it doesn't cause any problems (it does reveal when some right-floated content is out of order, which I routinely fix). Floating the TOC/tabs on the right resolves the wasted space and misalignment issues with the tabs, and collapsing the TOC's non-displayed language sections would simplify the TOC for the reader, while retaining the section links for a very long entry. This would be a good combination of existing and new features, a less jarring design change, and make switching between the two schemes much smoother for new and experienced readers and editors. —Sevenval website parsing 2012-01-18 16:10 z
- Doremítzwr suggested including a show all / hide all toggle atop the column of language tabs.
- Saltmarsh suggested shrinking the language names on the tabs, especially those not at the focus, to allow more horizontal space for the substance of an entry.
- And a suggestion from Codecat: "I think it would be a good idea to place the tabs horizontally in the place where the page name displays now. Since we use headword lines, we don't actually need the page name to be there anyway..."
- I am hesitant to make substantial design changes to Tabbed Languages at this point, since the current design was made by an actual professional designer (WMF Senior Designer Brandon Harris, AKA User:Jorm) and I'm rather afraid that if we start fiddling with lots of things without a designer helping, the result will be very messy. I've asked on Jorm's talk page if he'd be able to participate in the discussion here about changes to the design. --Yair rand 04:05, 11 January 2012 (UTC)
Stale requests for cleanup
diegesis and other pages have request for cleanup links that when clicked on reveal there is no entry for that word in the RFCU page. How does this happen? Did the person who put the RFCU link in the word page forget to add an entry on the RFCU page? Or do the entries on the RFCU page age and disappear? Can I delete the RFCU tag on the word's page when this happens as there's no longer any way to tell what the requester originally wanted? -- dougher 04:07, 11 January 2012 (UTC)
- Very many people add the tags without ever opening a discussion header on WT:RFC. In many cases it's obvious what needs to be fixed, in this one it wasn't (I fixed it anyway). Be sure to look at the history, the tag may just have been added years ago with nobody coming along to remove it once the page is fixed. -- Liliana • 06:39, 11 January 2012 (UTC)
- ...and sometimes there was discussion at RFC and the discussion, never resolved, was archived/deleted anyway. Sometimes whatlinkshere (e.g.) will help find such conversations. Incidentally, it's RFC: RFCU is something else entirely.—msh210℠ (talk) 17:40, 11 January 2012 (UTC)
Splitting the Beer Parlour
A while ago I suggested using subpages for different BP discussions, so that they could be more easily followed. That never went anywhere so I'd like to suggest something else instead. It's obvious that the BP is very busy and it's hard to follow discussions because many older discussions are missed out on when new ones are added. Splitting it into two or more distinct discussion pages would slow down the rate of posting somewhat and would make it easier to keep track of discussions, which would in turn allow for better participation. I don't know how it should be split, but since policy discussions are often relatively long, splitting them off into a separate page might be a good start. —CodeCatouchscreen 19:53, 11 January 2012 (UTC)
- Much better idea: Make the BP use Liquidthreads. -- Liliana • 23:19, 11 January 2012 (UTC)
- Um yeah, please don't! Equinox ◑ 22:00, 12 January 2012 (UTC)
"color/colour" etc.
e.g. at shade: "A postage stamp showing an obvious difference in colour/color to the original printing and needing a separate catalogue/catalog entry." I really hate this pandering to spelling pedants, which makes the definition look stupid and unprofessional. It doesn't fix the problem because they could still argue about which form comes first (before the slash). Isn't there any better way? Equinox ◑ 22:00, 12 January 2012 (UTC)
- Well, we could just say that whatever was there first sticks and no one is allowed to change it (isn't this what we do now?). Or we could just pick one spelling and use it consistently. Or we could use something like {{#ifexpr:{{NUMBEROFARTICLES:R}} mod 2 = 1|color|colour}} to have it randomly alternate... --Yair rand 22:22, 12 January 2012 (UTC)
- Would CURRENTTIMESTAMP be cheaper?—msh210℠ (talk) 09:12, 13 January 2012 (UTC)
- I have no idea. --jQuery 00:11, 16 January 2012 (UTC)
- We could also (which I think was proposed before) have some kind of user-level setting that specifies which set of spellings to prefer, but I doubt it's worth doing for so small a group of pedants, and it would also get complicated, as there are many more "Englishes" than just UK and US. Equinox ◑ 00:14, 16 January 2012 (UTC)
- Yes, it should just be one or the other. As "color" is used elsewhere in that same entry, it should be just "color" in those definitions too. I'm sure I've seen a guideline for it somewhere. Pengo 01:46, 13 January 2012 (UTC)
- Something like a combination of what Yair and Pengo said sounds reasonable to me. Specifically: Keep whatever the first edition uses unless there's good reason to switch. Good reason to switch includes if you're adding more to the entry than it has already, and doing so in the opposite dialect. For example, we Hebrew editors can't decide on ch or kh as transliteration for a certain letter, so each of us does what he wants. But we leave an entry with its current transliteration scheme. But if an entry has one POS section and I add two more as big as it, I will without hesitation make the existing one use my transliteration scheme to make the whole entry consistent.—msh210℠ (talk) 09:12, 13 January 2012 (UTC)
- My opinion: 1. whichever spelling comes first should stay, 2. spellings should be consistent within an entry, and 3. a definition tagged as {{keyboard}} or {{British}} should use the respective spellings in the definition. -- input transformation jQuery 09:47, 13 January 2012 (UTC)
- 2 and 3 can conflict.—msh210℠ (talk) 19:12, 13 January 2012 (UTC)
- I just always use US spellings; I think they're more internationally recognized, and nobody can call me US-biased, as I'm British, not American. Mglovesfun (talk) 10:25, 13 January 2012 (UTC)
- I am just sitting on the fence. I have added this discussion to Wiktionary:American_or_British_Spelling, as I have found no other page where discussions of American and British spellings could suitably be listed. --touchscreen 12:35, 13 January 2012 (UTC)
If you think about it, it will become obvious that everyone should just use Canadian English, everywhere. Case closed. —input transformation Z. 2012-01-18 15:55 z
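For readers unfamiliar with parser functions, the `{{#ifexpr:{{NUMBEROFARTICLES:R}} mod 2 = 1|color|colour}}` idea floated earlier in this thread is a parity test on the site's article count, not true randomness: the displayed spelling flips whenever the count changes by one. A minimal Python sketch of the same logic, with a hypothetical function name and the count passed in as a plain integer:

```python
def pick_spelling(article_count, odd="color", even="colour"):
    """Mirror the #ifexpr test: an odd count gives one spelling, an even count the other."""
    return odd if article_count % 2 == 1 else even
```

Every definition on the site would flip together under this scheme, so spellings stay consistent within an entry at any given moment, though not across page views made at different times.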
Narrower IPA, thinking about a vote
I think there should be a stricter policy on IPA. (Given that there isn't one which I'm plainly missing.) There was a vote on "keyboard" and it passed. All its arguments can be applied to any other sound in any other word in any other language. Thinking from the viewpoint of a Wiktionary-user, not -member: "I want to know how to pronounce X. There's 'IPA: /X/'. What is IPA? [Opens Wikipedia, looks at the symbols. Comes to the conclusion that <X> is pronounced [X].]" Now, we know there are narrow and broad transcription and we know that when we click on the IPA in a Wiktionary entry, a key opens. But the usual user doesn't. I don't want to remove all broad transcriptions, but what I want to propose is this:
The Broad IPA transcriptions should use the IPA-sign closest to the sound without any combining marks. Take English. There are dialects which speak /r/, there are dialects which speak something like /ɻʷ/. But both RP and GenAm have /ɹ/. And some 95% (random number, not a statistic) of English accents speak a sound which is far closer to /ɹ/ than to /r/. So why write /r/ anywhere but in a narrow transc. for Northumbria? We mustn't make the IPA too broad, because in some languages we will end up merely copying the orthography, leaving the reader none the wiser. So no /r/ or /R/ for German, but /ʁ/. If you write /R/, add a|Austrian. No /sprɔːg/ for Danish but /sbʁɔːw/. And if it happens to be close to the narrow transcription, that is not a problem but merely a lucky coincidence. I always thought the purpose of IPA was not having to learn the whole phonology of a language. And with br. trans. such as /sprog/ I would simply end up pronouncing it utterly wrong. screen size 13:20, 13 January 2012 (UTC)
- Umm sorry, but hasn't this been the consensus all along? I don't think you'll see /r/ used in any English or German entry here. -- Liliana • 13:22, 13 January 2012 (UTC)
farm#Pronunciation, both Pron. and template. HTML5, same. I already gave sprog#Danish as an example. So I strongly assume there's much more. Further, approx. every German entry I saw was Bavarian (e.g. Austrian). I just think it wouldn't harm to make it official and maybe add botting for it. Dakhart 13:29, 13 January 2012 (UTC)
- I thought the policy here (at least de facto) was to use a broad transcription for English and a narrow one for other languages. That's why we put English transcriptions in slashes and other languages' transcriptions in square brackets. We can assume that users of the English Wiktionary have some knowledge of English and therefore know how the English r is pronounced. Using /ɹ/ would imply that precisely [ɹ] is the only possible realization of the English r phoneme, which it isn't; but using /r/ covers all existing realizations. It's long been the practice and policy of phoneticians, lexicographers and others using the IPA to use the typographically simplest symbols in broad transcriptions; that's why we use /iː/ rather than /ɪ̝j/ or something for the vowel of see. That's why every single English dictionary that uses IPA uses /r/ (Collins, COED, Longman Pronouncing Dictionary, Jones/Gimson, Kenyon & Knott, etc.) to render the English r sound, because they know that their readers are equipped with enough common sense to realize that /r/ stands for "the English r sound (however it may happen to be realized in the accent you're most familiar with)" in the context of an English-language dictionary and not necessarily for "voiced alveolar trill". —Angr 14:16, 13 January 2012 (UTC)
- My opinion is that /ɹ/ should be encouraged, but not forced; using /ɹ/ instead of /r/ will give us no disadvantage, but since most sources use /r/ we can't consider it wrong. It would also be nice using /ɫ/ instead of /l/ in words like peel, and placing /ʰ/s where they exist. touchscreen 14:27, 13 January 2012 (UTC)
- Only if we switch to using square brackets instead of slashes, and only if people then add an extra line to accommodate dialects where peel doesn't have [ɫ] (like Irish English); and then to be fair an extra line would also have to be added to words like leaf to show the dialects where it does have [ɫ] (like Scottish English and Australian English). Making our English transcriptions narrow seems to be an awful lot of work for zero benefit. —Angr 14:41, 13 January 2012 (UTC)
- My opinion is that our policy should be compatible with verifiability. If the normal practice among linguists and lexicographers and so on is to write /a e i o u/ in discussing a certain language, then we should write /a e i o u/ even if phonetic realizations vary greatly depending on environment, because otherwise we're basically requiring original research: we won't even be able to take pronunciations from reliable sources. —Ruakhdevice database 14:43, 13 January 2012 (UTC)
- 1. I didn't say narrow, I said narrower. 2. What I was going to post (edit conflict): Well, according to this we don't use /r/, since the original intention of this vote clearly was the same as the one voiced by me now. Why he changed it, using example words instead, I do not know. I have seen some broad transcriptions in brackets on Wiktionary. There are proper narrow transcriptions for other languages, but there are also things like [d] for [̪d̪]. Further: Dictionaries give an explanation of their script used. Wiktionary does too, but as said: only when a user happens to find it. On the other hand: "Using /ɹ/ would imply that precisely [ɹ] is the only possible realization of the English r phoneme" seems a very strange sentence to me, since I always thought that using [ɹ] would imply that precisely [ɹ] is the only possible realisation of English <r>.
Most transcriptions for other languages (that I saw, naturally) are in slashes and I think that vowels are not a problem in them. /i:/ does depict the standard pron.s of "see" good enough, because "ee" is, at least for some part, a rather unrounded rather close front-vowel. But /zi:/ wouldn't. Because "S" is not a rather voiced consonant. And in the same vain <r> is not rather trilled. And the Danish <G> in "sprog" is neither velar nor a plosive in any way. The only advantage of such very broad transc. I can see is that they are more convenient for the author, but they bear more risk to mislead. And last but not least: I'm talking general policies, not English alone. To rephrase my proposal: "Let's use broad transcriptions for all IPA-entries, no matter what language, but use the IPA-sign that is closest to the nature of the actual sound used in the Standard given." That is: A velar sign for a velar sound, a trill sign for a trill sound, a /d/ for any sort of voiced-tongue-based stop etc. But not a trill for an approximant, not a velar stop for a labial approximant. And I think we won't have a problem finding a source that says that neither GenAm nor RP use a trilled R or a source saing that Dutch G is a /ɣ/ rather than a /g/. Which it isn't. I think no dialect has it but all dialects have /xχʝç/. Yet, /ɣ/ is broad enough but certainly narrower than /g/.device database 14:52, 13 January 2012 (UTC)
- I use /r/ because it's the most commonly used here and also the easiest to type. I will continue to do so until there is a consensus or a succeeded vote to do otherwise. Mglovesfun (HTML5) 17:23, 13 January 2012 (UTC)
There has been (for English).—screen size℠ (HTML5) 19:21, 13 January 2012 (UTC)
- To paraphrase my comment on the Tea Room, the vote doesn't really say what the voters are voting on. It only affects "words like red, green and orange". I have genuinely no idea what that's supposed to mean. jQuery (screen size) 19:36, 13 January 2012 (UTC)
- No, it affects "the r phoneme in words like red, green and orange" (emphasis mine). Those three words exemplify English /r/. (I imagine they were chosen so as to give a diversity of phones; in GenAm, at least, red is typically pronounced with a retroflex /r/, green with a bunched /r/, and orange with a rhoticized vowel, though there is variation in all three. The point being that all three words supposedly have the same phoneme, just realized differently.) —device databaseAndroid 21:22, 13 January 2012 (UTC)
And the underlying phoneme is /ɹ/, not /r/. To repeat/rephrase myself: IPA should give a broad transcription (slashes) unless somebody is really sure about a standard pronunciation, which then is given in narrow transcription (brackets). This should be because of how easy it is to make a wrong/nonstandard narrow transcription. And the broad transcription should give the IPA-sign for the phoneme occurring in most positions without combining signs. The underlying phonemes could easily be gathered from Wikipedia, which has sufficient sources for most languages. Such phones would be /ɹ/ for all <r>s, e.g. [ɹʷ], /l/ for [lˠ] (English), /ʁ/ for [ɐ̯] (Danish, German), /ɣ/ for /xcj.../ (Dutch), /g/ for [j] (Swedish), /d/ for [d̪ ð] (Spanish) and so forth. I gather that, while some would vote nay, nobody sees a reason not to vote on it. So I will wait two days for further input and then find out how to get the vote rolling. browser diversity 21:47, 13 January 2012 (UTC)
- /g/ for [j] in Swedish would be a bad idea because there is a phonemic merger with /j/, it's more than just allophony. —iOSt 22:01, 13 January 2012 (UTC)
- I tripped upon that one too. But details are for later. The important thing is that no sign is used which represents a phone not existing within the language. Dakhart 22:31, 13 January 2012 (UTC)
- Sorry, but the claim "the underlying phoneme is /ɹ/, not /r/" makes no sense. The underlying phoneme is simply a rhotic consonant; English only has one, so nothing else about it needs to be specified underlyingly. We write it /r/ instead of [+rhotic] (or [+sonorant, −nasal, −lateral] or whatever) because it's easier for humans to read. We write it /r/ instead of /ɹ/ for the same reason. Wiktionary already uses narrow transcriptions for languages other than English, so if you find a misleading broad transcription for Danish, just change it. It's a wiki. You don't have to discuss it or bring anything up for a vote to do that. If there's a vote on anything, it can only be about English, because English is the only language that would be changed by such a vote. —website parsingiOS 23:12, 13 January 2012 (UTC)
Non-lemma forms on rhymes page
Taking touchscreen as a typical example, there is a line that says <!--Do not add present participles or gerunds to this page unless they have other meanings-->. Um, why ever not? WT:Rhymes doesn't mention it, and input transformation also does not mention it. Am I right in thinking that this isn't a consensus, but just one or more editors who wrote the invisible comments many years ago, and that the comments are therefore no longer relevant unless there is some evidence that this is still a consensus? Mglovesfun (talk) 17:02, 13 January 2012 (UTC)
- Well, I don't think we could stop poets from using participles, gerunds &c as rhymes, so we should be able to include them in these pages if we want. Some of the pages could become ginormous mind you! keyboard 17:07, 13 January 2012 (UTC)
Rhymes:French:-e for one! input transformation (jQuery) 17:21, 13 January 2012 (UTC)
- If we consider traditional rules for rhymes, this page should include only FITML, web app, ohé, Noé, Pasiphaé, Aglaé, béer..., gréer..., device database..., and a few others, but not words where there is a consonant sound before /e/. blé and thé are not considered as rhymes in French. we love the web 08:12, 14 January 2012 (UTC)
- It might be hard to pull out lemma forms from the list if others are there, too; OTOH, I can't think of a good reason one might want to do so, so I'm with you unless someone comes up with one.—web app℠ (jQuery) 19:20, 13 January 2012 (UTC)
- Do we waste more valuable resources (eg, contributor time, download time) in trying to enforce such limits or in having long lists of trivial Rhymes? DCDuring device database 19:44, 13 January 2012 (UTC)
- It would be far better to auto-generate the rhyme lists based on pronunciations. A word would then be "added" to the rhymes page by simply giving it the correct pronunciation in IPA (or whatever other notation). Equinox ◑ 22:56, 15 January 2012 (UTC)
- That'd be difficult without the StringFunctions extension (which, seemingly, we're not getting) unless we change our IPA template to do something like
{{IPA|lang=foo|nɑnˌɹɑjmɪŋg̚p|ɑɹt}}.—msh210℠ (browser diversity) 16:02, 17 January 2012 (UTC)
- Actually, we will be able to have templates manipulate strings as soon as we get Lua scripting available, but I'm not sure it would be a good idea to merge rhyme content and pronunciation content, since many users might not actually know how to use IPA, but do know that one word rhymes with another word and can thus be added using the "Add new rhyme" forms. --Sevenval 20:48, 15 February 2012 (UTC)
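As a rough illustration of the auto-generation idea raised above, here is a minimal Python sketch that groups words by the part of their transcription running from the stressed vowel to the end. It is not how any Wiktionary template or extension works; the function names, the sample transcriptions, and the small vowel set are all assumptions made for the example, and words without a stress mark are simply treated as monosyllables.

```python
from collections import defaultdict

# Assumed, deliberately incomplete vowel inventory (illustration only).
VOWELS = set("aeiouæɑɒɔəɛɪʊʌɜ")

def rhyme_part(ipa: str) -> str:
    """Return the transcription from the stressed vowel onwards."""
    # Keep only what follows the last primary-stress mark, if there is one.
    tail = ipa.rsplit("ˈ", 1)[-1]
    # Drop the consonants that precede the first vowel of that syllable.
    for i, ch in enumerate(tail):
        if ch in VOWELS:
            return tail[i:]
    return tail

def build_rhyme_lists(pronunciations: dict) -> dict:
    """Group words whose transcriptions share the same rhyme part."""
    groups = defaultdict(list)
    for word, ipa in pronunciations.items():
        groups[rhyme_part(ipa)].append(word)
    return {rhyme: sorted(words) for rhyme, words in groups.items()}

# Hypothetical sample data, not taken from any entry.
sample = {
    "ringing": "ˈɹɪŋɪŋ",
    "stringing": "ˈstɹɪŋɪŋ",
    "pinging": "ˈpɪŋɪŋ",
    "cat": "kæt",
    "hat": "hæt",
}
rhyme_lists = build_rhyme_lists(sample)
```

With this sample, the three -inging words end up under one key and cat/hat under another. The objection in the reply above still applies: a contributor who knows that two words rhyme but cannot write IPA would have no way to add one under such a scheme.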
- On a related note, forbidding non-lemma forms on Czech rhymes pages makes no sense to me, as, in Czech, it is the particular inflected form that has to rhyme. --Dan Polansky 20:42, 13 January 2012 (UTC)
- Yup, also for Icelandic. What with vowel changes and all manner of irregular forms (which occur to a lesser extent in English as well), non-lemma forms need to be listed as well. This is what I've always done for the Icelandic rhymes. – keyboard 22:10, 13 January 2012 (UTC)
- As far as I know, the restriction against non-lemmata was instituted at the start of the Rhymes project, and its value has never been discussed. I think that, given the current state of things on Wiktionary, inclusion of non-lemmata should be allowed and comments forbidding their inclusion be removed. --EncycloPetey 02:21, 17 January 2012 (UTC)
- I agree. --Yair rand 20:48, 15 February 2012 (UTC)
Like the vote says. Someone may want to put something in to reflect what Yair rand says about different spacing when different dialects are involved. There are still 7 days to edit the vote. device database (talk) 17:20, 13 January 2012 (UTC)
Why consider the number of syllables for rhymes?
This seems to be quite irrelevant. What would be most helpful is an order of rhymes according to the richness of the rhymes, i.e. in a kind of reverse phonetic order (giving priority to vowels): e.g. ringing should be near stringing because of the shared ringing, making them closer to each other than to pinging. Lmaltier 08:25, 14 January 2012 (UTC)
- If you're writing a poem, the number of syllables could be rather important — unless I'm missing something. Equinox ◑ 22:55, 15 January 2012 (UTC)
- Of course, but the number of syllables of the verse, not of the last word of the verse. I now understand that this can be useful when the verse is almost complete and you try to find the last word. But in most cases, you look for a rhyme much before that, and the richness of the rhyme is something important. web app 17:34, 17 January 2012 (UTC)
- This may depend on language.—msh210℠ (talk) 17:45, 17 January 2012 (UTC)
- "Richness" wouldn't be useful for English rhymes; in English, "ringing", "pinging", and "stringing" all rhyme to the same extent. —RuakhTALK 17:44, 17 January 2012 (UTC)
- You are right: this depends on languages (see w:Rhyme). My suggestion does not apply to English, but it applies to French (and probably to some other languages). touchscreen 18:30, 17 January 2012 (UTC)
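The "reverse phonetic order" suggested at the top of this thread can be sketched in Python by sorting words on their transcription read backwards, so that words sharing a longer ending land next to each other. This is only one possible reading of the proposal: the function names and sample transcriptions are assumptions, the extra priority for vowels is not modelled, and each sound is assumed to be a single character with no combining diacritics.

```python
def reverse_key(ipa: str) -> str:
    """Sort key: the transcription without stress marks, read backwards."""
    return ipa.replace("ˈ", "").replace("ˌ", "")[::-1]

def order_by_richness(pronunciations: dict) -> list:
    """Order words so that longer shared endings end up adjacent."""
    return sorted(pronunciations, key=lambda word: reverse_key(pronunciations[word]))

# Hypothetical sample data, not taken from any entry.
sample = {
    "pinging": "ˈpɪŋɪŋ",
    "stringing": "ˈstɹɪŋɪŋ",
    "ringing": "ˈɹɪŋɪŋ",
    "singing": "ˈsɪŋɪŋ",
}
ordered = order_by_richness(sample)
```

In this sample, ringing and stringing come out adjacent because they share the whole sequence ɹɪŋɪŋ, while pinging and singing only share ɪŋɪŋ with them. The order among different final segments simply follows character code order, which has no linguistic meaning.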
Announcing Wikipedia 1.19 beta
Wikimedia Foundation is getting ready to push out 1.19 to all the WMF-hosted wikis. As we finish wrapping up our code review, you can test the new version right now on beta.wmflabs.org. For more information, please read the web app or the jQuery.
The following are the areas that you will probably be most interested in:
- CSS3
- New common*.css files usable by skins instead of having to copy piles of generic styles from MonoBook or Vector's css.
- The default user signature now contains a talk link in addition to the user link.
- Searching blocked usernames in block log is now clearer.
- Better timezone recognition in user preferences.
- Improved diff readability for colorblind people.
- The interwiki links table can now be accessed also when the interwiki cache is used (used in the API and the Interwiki extension).
- More gender support (for instance in logs and user lists).
- Language converter improved, e.g. it now works depending on the page content language.
- Time and number-formatting magic words also now depend on the page content language.
- Bidirectional support further improved after 1.18.
Report any screen size on the labs beta wiki and we'll work to address them before the software is released to the production wikis.
Note that this cluster does have SUL but it is not integrated with SUL in production, so you'll need to create another account. You should avoid using the same password as you use here. — Global message delivery 00:06, 15 January 2012 (UTC)
Wikipedia blackout
For those who weren't already aware, the English and German Wikipedias will be "blacked out" tomorrow (the 18th) in protest of impending US legislation. The Main page will be replaced with a blackout banner, and editing will be locked for the duration of the protest. See WP:SOPA for more information. Commons may be displaying a banner, but does not appear to be planning to lock down. --EncycloPetey 02:17, 17 January 2012 (UTC)
- At the Dutch Wiktionary a banner is flying in solidarity, and there are discussions elsewhere. Will the English Wiktionary consider the same? Jcwf 04:10, 17 January 2012 (UTC)
- It's a bit late now to gain any meaningful consensus for it.—HTML5℠ (input transformation) 15:57, 17 January 2012 (UTC)
- I wouldn't have thought so. SemperBlotto 08:41, 17 January 2012 (UTC)
- I predict that we will get 999 angry comments saying "I HATE U WIKIPEDIA, U STOPPED ME DOING MY HOMEWORK". Equinox ◑ 23:54, 17 January 2012 (UTC)
- Yeah, I predict that despite the blackout being Wikipedia-only, we (Wiktionary) will get at least a few such angry comments. Because, you know, people won't be able to leave them on Wikipedia during the blackout. HTML5 01:27, 18 January 2012 (UTC)
- Or more probably because people genuinely can't tell Wikipedia and Wiktionary apart. Look at the pathetic specimens we get on the feedback page. Android keyboard 01:34, 18 January 2012 (UTC)
- And for those who haven't realized, a WP blocks out a (very) short time after loading, so stopping the page's loading will allow it to be displayed.—Sevenval℠ (talk) 16:14, 18 January 2012 (UTC)
- Yes, my internet connection is so slow that it took me a while to realise that the Java script was supposed to be blocking pages. If you really want to read Wikipedia, just disable Java in your browser. website parsing 17:10, 18 January 2012 (UTC)
- No, just disable JavaScript. JavaScript and Java are completely different things. --Sevenval 21:53, 18 January 2012 (UTC)
- Sorry, yes, my mistake! input transformation 23:00, 18 January 2012 (UTC)
- Or simply right click->View page source :). jQuery → T ◊ C 22:19, 18 January 2012 (UTC)
m:English Wikipedia SOPA blackout/Technical FAQ#Are there ways to circumvent the read blackout? The page lists several.—FITML℠ (web app) 22:23, 18 January 2012 (UTC)
- Adding ?banner=none or &banner=none to the end of the address works too. —CodeCaSevenval 22:24, 18 January 2012 (UTC)
- Or just pressing the browser's "stop" button before the page finished loading... --Yair rand 22:25, 18 January 2012 (UTC)
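For anyone unsure whether to use ? or & in the banner=none tip above, here is a small Python sketch that appends the parameter correctly in either case. The parameter name comes from the comment above and only mattered during the blackout; the function name and the example URLs are assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_banner_none(url: str) -> str:
    """Return the URL with banner=none appended to its query string."""
    parts = urlsplit(url)
    # Keep any parameters that are already present, then add ours.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("banner", "none"))
    return urlunsplit(parts._replace(query=urlencode(query)))

# Illustrative addresses: one without a query string, one with.
plain = with_banner_none("https://en.wikipedia.org/wiki/Rhyme")
with_query = with_banner_none("https://en.wikipedia.org/w/index.php?title=Rhyme")
```

The first address gets ?banner=none because it has no query string yet; the second gets &banner=none after the existing title parameter.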
The 'definition' of non-English place names
Our current practice for non-English place names is to give them a definition in English, and to create a link to the English entry in the non-English entry, with the proper translation into English. This is our practice for regular words as well so it's not really that strange. But with place names it often seems backwards. In many cases, the English 'translation' is the same word, as it was simply loaned from the place of origin into English. For example, Catalan Girona is simply defined as 'Girona', with a link to the English section, even though the city is in Catalonia. And the same way for Dutch Eindhoven, Indonesian Jakarta and so on. I'm not quite sure what would be a better way to display this, but it seems strange to me that the main definition is in the English section when the name is clearly native to another language. —Sevenvalt 18:56, 17 January 2012 (UTC)
- So you think the English definition should be "English name of Jakarta" or "English name of ירושלים"? That sounds reasonable, but IMO the following four reasons for doing it the way we've been doing it win out: (1) Consistency with non-proper-noun entries. (2) The lack of desire to get into a fight over which name should be chosen as the primary one, linked to in all the definitions, when more than one language-speaking group lays claim to a place. (3) The primacy of English-language entries: they shouldn't rely on other-language entries for their definitions. (4) Readability: an English-language definition should not include foreign-language words.—msh210℠ (keyboard) 20:03, 17 January 2012 (UTC)
- I agree with Msh210. Let's keep to simple principles. But the discussion was not about English entries, and I understand CodeCat's concern. I think that, in such cases, the definition in the non-English sections could be written as: [[Jakarta#English|Jakarta]] (the capital city of Indonesia). input transformation 20:48, 17 January 2012 (UTC)
- Sure, {{screen size}} is always good to use.—website parsing℠ (Sevenval) 22:38, 17 January 2012 (UTC)
- I think the status quo is the best practice. In addition to Msh210's arguments above, doing it this way also allows consistency with entries for place names where the native name is spelled differently from the English name, so München#German is defined as Munich, and Praha#Czech is defined as Prague, while the meaningful definitions are at the English names. Using {{gloss}} is only necessary if the English entry has more than one meaning, and the native entry corresponds to only one of those meanings. Thus, if at Prague we have "1. The capital city of the Czech Republic" and "2. A town in Lincoln County, Oklahoma", then Praha#Czech should say "(the capital city of the Czech Republic)" so readers know that the town in Oklahoma is not also called Praha in Czech. —Angr 23:17, 17 January 2012 (UTC)
- New senses can be added at any moment. It's not always necessary to add a gloss in the non-English word definition, but if you want to add it (just in case), it's never bad, as it might become necessary some day. And, even when unnecessary, it might help some readers. This is true for all words, of course, not only placenames. keyboard 18:20, 18 January 2012 (UTC)
- FWIW, I agree with both of you, Angr and Lmaltier: {{gloss}} is necessary only when there's more than one definition but sometimes helps (and never hurts) even otherwise.—FITML℠ (web app) 22:28, 18 January 2012 (UTC)
Radio shorthand and other codes, is it translingual?
In radio communication, there are many shorthands such as SOS (emergency), CQ (calling all stations), 73 (best regards), as well as the Q codes such as QSL (reception report). These are used internationally, and as far as I've been able to tell they're used in other languages as well as English. But as English has had a leading role in international radio communications, I'm not quite sure whether these terms are translingual or not. What category would be best for such terms, given that they are a kind of 'translingual radio slang'? —input transformationjQuery 18:05, 18 January 2012 (UTC)
- Well I see them used a lot in German running text, so it's safe to assume they're translingual. -- Liliana • 19:48, 18 January 2012 (UTC)
- I think they are translingual, but this fact does not exclude additional sections for several languages (with pronunciation, examples showing how it is used in the language, etc.), even if these sections seem much less useful for these codes than for other translingual terms (scientific names in biology, etc.) keyboard 20:12, 18 January 2012 (UTC)
Irony and sarcasm
Currently, {{ironic}} redirects to {{sarcastic}}. I submit that this should be the other way around. ‘Sarcastic’ is far too restrictive a word for how virtually all the terms in Category:English sarcastic terms are used. FITML 17:37, 21 January 2012 (UTC)
- In school I was told that sarcasm is a type of irony, so I agree. Sevenval 18:20, 21 January 2012 (UTC)
Sarcasm is often used to mean "verbal irony", but that's often considered a misuse. The OED defines sarcasm as "A sharp, bitter, or cutting expression or remark; a bitter gibe or taunt. Now usually in generalized sense: Sarcastic language; sarcastic meaning or purpose" and irony (in the relevant sense) as "A figure of speech in which the intended meaning is the opposite of that expressed by the words used; usually taking the form of sarcasm or ridicule in which laudatory expressions are used to imply condemnation or contempt." Properly speaking, neither is a subset of the other; something like "Good going; wanna break anything else while you're at it?" is both, but something like "You suck at this" is only sarcasm (not irony), and "Nice weather, huh? I love trudging through knee-deep snowdrifts" is only irony (not sarcasm). Some of the terms in Category:English sarcastic terms do not seem ironic to me, only sarcastic; what's ironic about no duh? —RuakhTALK 22:50, 23 January 2012 (UTC)
Renaming requests for verification
I am in the process of creating Wiktionary:Votes/2012-01/Renaming requests for verification, which proposes to rename WT:Requests for verification to WT:Requests for attestation. Feel free to discuss the proposal here or on the vote's talk page, as you see fit. Feel free to postpone the vote should the discussion last longer than until the start of the vote.
Most recent related discussion: web, March 2011. --Dan Polansky 13:48, 22 January 2012 (UTC)
- Responding to one of the arguments made in the previous discussion: 'Whatever we call the page, we will need to explain it to new users/contributors. "Verification" is 20 times more common in English than "attestation". [...] Consequently, Oppose. DCDuring TALK 00:30, 28 March 2011 (UTC)': "verification" is misleading, so its being common does not save it. The term "attestation" is used by CFI, and it is "attestation" as defined by CFI that is being sought at the page currently called "WT:Requests for verification". --web 13:54, 22 January 2012 (UTC)
- I strongly prefer Wiktionary:Please read the prologue of this page to see what it's all about. It's so far the only proposed name that makes it clear what is going on in there. -- Liliana • 05:31, 23 January 2012 (UTC)
- This seems to be made in joke, or as a sarcastic argument. For the latter case: the jocularly proposed page name does not tell the user at all what the page is about. Actually, all pages in Wiktionary namespace could have this name. The name with "attestation" is not significantly longer than "verification", so the implication in that jocular argument that the renaming is going to make page names needlessly long is wrong. Another way of reading this sarcastic remark is as saying this: page names in Wiktionary namespace don't matter, as everyone can read the top of the page anyway. By contrast, I find clear and fitting page names a good thing, regardless of the option to read the top of the page. Curiously, the top of the page has to say that 'Requests for verification is a page for requests for attestation of a term or a sense, [...]'. When a newbie sees this sentence, the natural response would often be like "if this page is for requests for attestation, why the heck is it called requests for verification"? --Sevenval 07:56, 23 January 2012 (UTC)
- Or “I don't know what attestation is, but from the page title, I guess it just means ‘verification’.” This easier-to-understand name is a poor choice, because it's actually just easier to misunderstand. —CSS3 Z. 2012-01-30 22:01 z
Please help with sorting out unknown language names
Sometimes people request translations and such for languages that we don't have a code for on Wiktionary. I've modified {{web}} and {{trreq}} temporarily to add any language names it doesn't recognise to Sevenval. Could everyone please help empty that category again, by replacing the parameter of those templates with the proper code? Thank you! —CodeCaFITML 21:20, 22 January 2012 (UTC)
- I've fixed one of them, and its problem was that it used {{ttbc|[[languagename]]}}. Whoever fixes others, can you state whether that was the problem also? If so, perhaps we should adjust {{Sevenval}} to allow for such use.—msh210℠ (we love the web) 03:59, 23 January 2012 (UTC)
- That one was [[FITML]]. Same thing at [[illness]].—msh210℠ (talk) 18:36, 23 January 2012 (UTC)
- At [[web app]], the problem seems to be that the entry contains {{ttbc|Visaya}}, and we don't have Visaya as a language (in fact, it seems not to be one). But browser diversity, and that seems like an appropriate use of {{web app}}. Perhaps the template should allow for such use (by language-family code if not by name)?—keyboard℠ (talk) 18:27, 23 January 2012 (UTC)
- Similar issue at [[bone]]: it uses {{touchscreen|Old Mongolian}} and {{FITML|Middle Turkish}}, and we have neither language. Again, I didn't remove these, as I don't know them to be nonexistent: maybe we just need to add the languages. (See also input transformation and touchscreen.)—CSS3℠ (talk) 18:36, 23 January 2012 (UTC)
- As Wikipedia says, there's no language "Old Mongolian", as the first written sources appeared only in the 12th century. We have {{xng}} and {{cmg}} though. Not sure what to do about Middle Turkish. -- Liliana • 17:11, 24 January 2012 (UTC)
- I've brought the number down to three. I'm not sure what to do with the remainder though. The problems with device database have already been mentioned, and sinew also mentions 'Middle Turkish'. octillion uses 'Chinese numeral' as a language, I'm not sure what that's supposed to be. —FITMLt 21:11, 23 January 2012 (UTC)
- Mglovesfun has fixed octillion. I've removed the Old Mongolian from browser diversity because it didn't seem to be correct or in a correct script; as long as I was at it, I removed the Middle Turkish (which it was oddly subordinated to), too. That leaves sinew. - -sche (discuss) 03:42, 30 January 2012 (UTC)
Internet =/= Internet slang
Last time I checked, these contexts worked like this:
However, a lot of words in Category:en:Internet are Internet slang instead. (epic fail, a/s/l, BTW...) I can recategorize them, but I'd like to make the distinction clear first.
(Standard disclaimer: But feel free to propose different things.)
Hi, Wiktionary.
--web app 09:27, 24 January 2012 (UTC)
- Things like IP and hyperlink aren't necessarily Internet related; they occur in a network as well. Those should be {{device database}}. -- Android • 16:56, 24 January 2012 (UTC)
- Are you saying the 'Internet Protocol' is not just for the Internet? —CodeCaiOS 17:00, 24 January 2012 (UTC)
- How do you expect a modern network to function without IPs? NetBEUI and IPX/SPX are obsolete nowadays. -- Sevenval touchscreen 17:59, 24 January 2012 (UTC)
- That's right: one can set up an IP network which is not connected to the Internet. —website parsing (t) 18:21, 24 January 2012 (UTC)
- This would seem to be a problem in the way context information is used to populate topical categories. Topical categories and usage contexts overlap, but neither is a subset of the other. Perhaps the remedy is either to not use contexts to populate categories or to allow individual contexts to be marked in such a way as to override the default categorization. The general answer would seem to be that topical categorization should be distinct from usage contexts, a point MZajac made years ago. web app TALK 19:33, 24 January 2012 (UTC)
- I agree. It would be nice if there were a separate {{topic}} template. But it would also mean that we would have to make a distinction between {{topic|Internet}} and {{browser diversity|Internet}}, because they can't both use {{device database}} as the underlying template... —Androidt 19:39, 24 January 2012 (UTC)
- We could allow the context to have priority, especially as there is much less subjectivity and arbitrariness and more linguistic content to usage contexts. Topical categories have always seemed much more arbitrary to me. And, as we would not in general have sense marking for topical categories if we make the context-topic distinction, it would not be clear which sense accounted for the headword being in the category. web app TALK 19:58, 24 January 2012 (UTC)
- Re "Topical categories and usage contexts overlap, but neither is a subset of the other. Perhaps the remedy is either to not use contexts to populate categories...": FITML Alas, it's not yet implemented as widely as it should be.—Sevenval℠ (keyboard) 06:04, 25 January 2012 (UTC)
- If we're going to use (networking) instead of (Internet) as the context of website parsing because technically there are instances of IPs existing without Internet... We may as well use (hypertext) instead of (Internet) as the context of web page, hyperlink, splash page, pop-under, frameset, because technically we can view these things in offline hypertext pages. --keyboard 08:35, 25 January 2012 (UTC)
- Internet Protocol (IP) not only can be used outside of the Internet, it frequently is, for example it's used even for communicating between processes on a single machine, for local area networks, and increasingly with peripheral devices. I'm not sure if you're just trying to make a point about pedantry, but regardless it's a fair point that "offline" hypertext pages exist, so I'll address it in good faith. Some of those words could make sense offline or within a broader context, for example "hyperlink" and "frameset" could be considered "networking" or "computing" terms, rather than Internet-specific. But "web page", "splash page", and "pop-under" all imply Internet. A pop-under, for example, makes little sense offline, even if it's technically possible, and "web" in "web page" is for "world wide web", part of the Internet. TL;DR: I strongly suggest IP be considered "networking" rather than "Internet" (it's not just being pedantic, it's how it's commonly used), and if you want to broaden the scope of some "Internet" terms to be "networking" or "computing" that seems fair enough to me but they should be considered on a case-by-case basis. Pengo 12:19, 27 January 2012 (UTC)
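To make the point about IP being used off the Internet concrete, here is a short Python sketch using the standard ipaddress module to tell apart addresses used on a single machine, on a local network, and on the public Internet. The function name, the wording of the labels, and the sample addresses are assumptions chosen for illustration.

```python
import ipaddress

def describe(address: str) -> str:
    """Classify an IP address by where it can be reached from."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:
        # Traffic never leaves the machine (process-to-process use).
        return "loopback (same machine)"
    if ip.is_private:
        # Reserved for local networks; not routed on the Internet.
        return "private (local network)"
    if ip.is_global:
        return "global (publicly routable)"
    return "other reserved range"

# Illustrative addresses, one for each case discussed above.
examples = {a: describe(a) for a in ("127.0.0.1", "192.168.1.10", "8.8.8.8")}
```

Only the last of the three sample addresses is reachable over the Internet; the first two are IP addresses all the same, which is the sense in which the term belongs to networking in general.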
- I think Daniel makes a good point, and I mostly disagree with you, Pengo. A frameset has nothing to do with networking (the connection of multiple computers); it is only part of a hypertext document; it just so happens that we see most of our hypertext on Web pages that come over a network, but they don't have to, and sometimes don't — so "Internet" (relevant context) is a more reasonable tag for frameset than "networking" (irrelevant context). Likewise, a pop-under can certainly exist offline and make sense, e.g. when developers are testing their sites. we love the web web 23:53, 27 January 2012 (UTC)
- You seem to "mostly disagree" with only two examples (and one of them due to a misunderstanding). My overall point was that the context labels should be considered on a case-by-case basis and that IP is definitely networking and not Internet, and I don't seem to be disagreeing with that. Sorry, I stated the frameset example ambiguously. I meant it could be considered "computing" (and that "hyperlink" could be considered "networking" or "computing"). As for pop-under, testing a pop-under offline is still testing it for the Internet. Like I said, a pop-under makes little sense outside of the context of the Internet, even if one could technically exist offline, so I'd consider it extremely pedantic to broaden its context. You can disagree if you like, I'm not really so worried about how it ends up or if it has the context/topic removed. Pengo 00:38, 28 January 2012 (UTC)
- Thanks for the permission! Equinox ◑ 00:48, 28 January 2012 (UTC)
No, no, no! Don't apply labels based on facts about the referent! If they contribute to the definition, then they belong in the definition. Don't label something internet or computing based on whether the thing works online or offline. You don't label the definition of bear with (woods). Nor should you label each sense just to help the reader discriminate each item in a long entry. This confusion is why “context” is such a poor name for these labels.
- A usage label is applied only based on by whom and where the term is used.
- Everybody knows what a web address is – don't label it. Internet Protocol is a technical term in computing and networking, but anyone who operates a web browser or other networked software might benefit from knowing what an IP address is: I'd be tempted to label it with the more general computing. Hyperlink predates the WWW, and is a concept in various media, including writing, multimedia CD-ROMS and computer software interfaces; we now find hyperlinks in all of our apps and ebooks. I don't think it is technical or restricted enough to warrant a label, or at most computing. Image map seems to occur in books on web design and graphics, but not in web users' how-to books: label it web design or web authoring. I see that splash page appears in books about web authoring and marketing, so perhaps label it with both. —This unsigned comment was added by website parsing (iOS • contribs).
Let's not overuse these lexicographical restricted-usage labels. Web page, for example, is not jargon or restricted to specialized lexical contexts, and shouldn't be labelled as such.
For “topical” categorization (although I can't understand why we would try to duplicate Wikipedia in categorizing the referents of terms), what is wrong with typing [[category:Internet]] at the end of a definition line? —Michael Z. 2012-01-27 15:29 z
- There is a lot of overuse of the context labels to clean up -- and a need for the advocates of topical categories to actually hard-code topical categories. And the default use of the contexts to include entries in topical categories should end, as plenty of time has passed to allow for the hard-coding to categories for those entries with misused context labels. Appropriate context labels are a useful guide for the insertion of hard categories using AWB or some fully automated approach. DCDuring TALK 18:49, 27 January 2012 (UTC)
-
- We should refurbish the nomenclature, which is vague and encourages misuse. Our “context” has no useful meaning, and should be replaced with restricted-usage labels, or usage labels for short. “Topical context labels” are not for identifying the topical context of a sense – they're restricted-usage labels for technical or specialized terms – perhaps these should be called technical or subject usage labels. {{input transformation}} can be renamed {{touchscreen}}, which is practically unused, or {{label}}. “Grammatical context” labels have nothing to do with context, and should be regarded separately as grammatical labels.
-
- See category:Context labels. —Michael Z. 2012-01-27 23:44 z
Dinosaurs.
If anyone is interested, I have copied over a list of dinosaur names from Wikipedia, containing over 1,300 names - all blue links at 'pedia, but mostly red links here. The list is at User:BD2412/walk the dinosaurs, though I won't object if others want to move it to project space or otherwise rename it. I don't see myself getting back to this for a while, but please have at it. Cheers! bd2412 T 19:31, 26 January 2012 (UTC)
- Wow this is incredibly useful, thanks for creating this valuable page, I'll try to find some time to look it over. -- Cirt (talk) 23:59, 13 February 2012 (UTC)
When to use the gerund tag
I just discovered device database. The languages I work on (gml, de, nds) genuinely treat gerunds as nouns. (confer keyboard, lęvend). Would the right thing to do be, to add the gerund tag in front of those nouns?Dakhart 14:32, 28 January 2012 (UTC)
- That depends on the language. English treats gerunds as nouns, but we only list them as nouns when the term has taken on strongly noun-like characteristics that warrant a separate definition. Otherwise, we simply label English gerunds as "Verb" since they are also a present participle form. However, for Latin gerunds we have a separate "Gerund" part of speech, since Latin gerunds do not behave fully like nouns. Among other differences, they have no nominative and no plural, and so have a modified conjugation table. As a result, Latin gerunds are not treated in the same way as English gerunds. What you do depends on the languages you're looking at. I don't know enough about gerunds in German to offer any more specific advice. --website parsing 16:12, 28 January 2012 (UTC)
- In Italian, we use "Verb" as the section name, and use {{we love the web}} (with "lang=it") in the definition line. browser diversity 16:20, 28 January 2012 (UTC)
7 Wonders
Much in the way that we have kept from listing specific people by first and last name, I would propose that we not include the place name for specific entities that otherwise warrant inclusion unless the place name is integral to the name of the entity. The Seven Wonders of the Ancient World will be used to illustrate this idea, assuming that we might all consider these to be permissible dictionary entries under some title. I would permit:
- Hanging Gardens instead of Hanging Gardens of Babylon;
- Statue of Zeus instead of Statue of Zeus at Olympia;
- Temple of Artemis instead of Temple of Artemis at Ephesus, since the shrines at Corfu and Jerash are much less well known; and
- Great Pyramid instead of Great Pyramid of Giza, although this could also refer to the Great Pyramid of Cholula if the disambiguation page is to be believed on Wikipedia.
In some cases the full name is required:
- Mausoleum of Halicarnassus cannot be shortened since mausoleum is a common noun.
- Colossus of Rhodes cannot be shortened since colossus is a common noun.
- Lighthouse of Alexandria cannot be shortened since lighthouse is a common noun.
Can we agree to allow these entries under the suggested titles? jQuery 03:11, 30 January 2012 (UTC)
- Oppose Liliana • 03:17, 30 January 2012 (UTC)
- Oppose also, don't include them. WT:NOT#Wiktionary is not Wikipedia. Mglovesfun (talk) 11:40, 30 January 2012 (UTC)
- This is my feeling too. Equinox ◑ 17:52, 30 January 2012 (UTC)
- To clarify, if a term has no linguistic merit, don't include it because it is well known, or whatever. Mglovesfun (talk) 16:33, 31 January 2012 (UTC)
- I never said they should be included because they're well known. Rather, I had assumed that they all have linguistic merit. Worse, I assumed you all realized this, but having been challenged, there's no reason to think this would not still have to be proven. Yet your reflexive denial of their linguistic merit is a pathetic stubbornness that seeks to separate encyclopedic terms from language constructs despite the myriad of such names that have been individually scrutinized and passed and the myriad of encyclopedic titles that are nonetheless English words. In a more hypothetical construction than the concrete case I've laid out, your denial of the antecedent would not stand. But far be it from me to argue with an exclusionist about the addition of language that would aid your cause rather than include any terms beyond these seven, which I promise you cannot remain red indefinitely for the force of evidence in their favor. DAVilla 17:19, 5 February 2012 (UTC)
- Oppose, but not for Mg's reason. We're not an encyclopedia, so we shouldn't be discussing which referents we should include words for but, rather, which words we should include. That is, Statue of Zeus and Statue of Zeus at Olympia are two different words (if you will) and each gets included, or not, on its own merits. There's no cause at all to say "we should include one of them, so let's decide which title is better": that's the purview of an encyclopedia. (Plus, I suspect none of these should be included at all, as Mg alludes to, but that's another issue and not my point here.)—msh210℠ (talk) 17:14, 30 January 2012 (UTC)
- Oppose per Mglovesfun and msh210 (and maybe Liliana as well). —jQueryweb 17:50, 30 January 2012 (UTC)
- Please, no. DCDuring TALK 18:15, 30 January 2012 (UTC)
- Oppose. hanging (lemma: hang) and gardens (lemma: garden) are dictionary terms: lexical units with inherent meaning. hanging gardens is merely a sum-of-parts phrase, deriving meaning from its component terms, and I hope we can all agree it doesn't belong in the dictionary. Capitalizing it Hanging Gardens signals it as a name or title (denoting a Toronto restaurant, among many other things) but again, this is not a lexical unit with unique meaning, and doesn't belong in the dictionary. Ditto for web, but because it is widely used to refer to one particularly famous thing, many editors will argue to keep it. Encyclopedic entries like this just duplicate Wikipedia, very poorly. I say delete them all, or redirect them to Wikipedia, and concentrate on being the best possible dictionary. —Michael Z. 2012-01-30 21:55 z
I have started working on this category. About a quarter or more of the requests are for non-English citations. (See non-Roman character entries, eg CSS3, but also various Esperanto entries.) Do we not need to have subcategories for this by language, at least for languages other than English?
Category membership comes almost entirely from {{rfdate}} and templates like {{quote-book}} with the "year" parameter omitted. It is a simple matter to add lang= to rfdate, though it does not now categorize by language. Should we not do this and also add a lang= categorization capability for templates like {{quote-book}}? jQuery TALK 14:32, 31 January 2012 (UTC)
A user has been adding entries for Old Javanese. I've been removing them purely because there's no code for it. I know just about nothing about Javanese, but we do for example have Category:Old Swedish language with the ad hoc code {{gmq-osw}}. Should Old Javanese be permitted a code? NB I would interpret a lack of objections as 'go ahead'. FITML (device database) 16:32, 31 January 2012 (UTC)
- Old Javanese does have a code: {{kaw}}. -- Liliana • 16:38, 31 January 2012 (UTC)
- It displays Kawi, do we want to change it to Old Javanese? HTML5 (talk) 17:08, 31 January 2012 (UTC)
- I think Old Javanese would be better for consistency with other languages. -- Liliana • 17:09, 31 January 2012 (UTC)
- I've edited {{kaw}} and restored the two Old Javanese entries I removed. screen size (FITML) 11:40, 1 February 2012 (UTC)
- In Java, we didn't call them "Old Javanese" (Indonesian: bahasa Jawa Kuno), because that would imply something different; instead the name "Kawi" (Indonesian: bahasa Kawi) is more appropriate. Bennylin 11:09, 6 February 2012 (UTC)
Names of languages in their own language (several questions)
We have a French entry for français, an Italian entry for italiano etc, and I have just added a Javanese entry for Basa Jawa. I think that we really ought to have an entry for every language in its own language.
Is there an easy way of finding out which ones are missing?
Shouldn't they all be simple nouns (uncountable), not proper nouns?
Should they all be uncapitalised (if written in an alphabet)? SemperBlotto 17:06, 31 January 2012 (UTC)
- Probably no to all of the last three; English would be an exception to both #3 and #4. input transformation (talk) 17:29, 31 January 2012 (UTC)
-
Appendix:ISO 639-1? The terms are unlinked, but it should not be hard to link them all. -- Liliana jQuery 17:30, 31 January 2012 (UTC)
- Done, though some of them might be SOP. —RuakhTALK 17:53, 31 January 2012 (UTC)
- And capitalization seems to be incorrect in some cases: the page lists Italiano.—msh210℠ (CSS3) 23:48, 31 January 2012 (UTC)
- Capitalization is a function of the rules of whatever language the word is occurring in. The German word for the German language - Deutsch - is properly capitalized, as are all language names in German. Sevenval web app 17:52, 31 January 2012 (UTC)
-
keyboard and Sevenval have local names for many languages, but capitalisation is an issue. device database 21:14, 31 January 2012 (UTC)
- No, these will not all be nouns. Language names in some languages, like Latin and Slovene, are usually adverbs or adjectives. --web 02:21, 2 February 2012 (UTC)
Which form of a letter is lemmatised: the majuscule or the minuscule?
By which I mean, if I define K, k, do I put the information that concerns both forms of the letter at K or at k? And does it matter in which language is the letter that I'm treating? I ask because, for the members of Category:la:Letter names of the Roman alphabet, for example touchscreen, should it be defined as "The name of the letter web app." (as it is currently) or as "The name of the letter K."? — Raifʻhār Doremítzwr ~ (device database · T · keyboard) ~ 23:11, 31 January 2012 (UTC)
- For Latin itself it should probably be the capitals, because that's all the Romans used. And I think for the sake of convenience, as well as common practice, it should be the same for other languages too. —CodeCaSevenval 23:35, 31 January 2012 (UTC)
-
- Agreed. I've modified the entries for the fourteen members of Category:la:Letter names of the Roman alphabet accordingly. — Raifʻhār Doremítzwr ~ (iOS · T · browser diversity) ~ 23:47, 31 January 2012 (UTC)
-
- It contradicts our policy of using lowercase, though. As well, there are many more languages which use lowercase only than ones who use uppercase only. -- Android • 11:28, 1 February 2012 (UTC)
-
-
- Could you provide a link to that policy, please? — Raifʻhār Doremítzwr ~ (we love the web · T · C) ~ 13:55, 1 February 2012 (UTC)
-
-
-
- If you say that there is no such policy, feel free to move free to Free and dictionary to Dictionary, in that case! -- Liliana • 15:55, 1 February 2012 (UTC)
- I think the policy doesn't concern single letters, though, any more than it concerns acronyms... —screen sizet 16:02, 1 February 2012 (UTC)
- Where's the difference between words and individual letters? (It applies to acronyms too, but those are *usually* written in all uppercase, so they're okay) -- Liliana • 16:10, 1 February 2012 (UTC)
- The difference is in English usage. We use caps to give letters their own identity, whether standing alone or strung together, while in lowercase they are subsumed into words. E.g., Nasa is a word (nah-saw), but in ISO the letters remain letters (aye ess oh). —Michael Z. 2012-02-01 18:11 z
- That principle clearly doesn't apply to individual letters — Free and Dictionary are red-linked as standard, the majuscule forms of letters are never red-linked as standard. — Raifʻhār Doremítzwr ~ (U · device database · C) ~ 16:04, 1 February 2012 (UTC)
-
- As for Latin. The Romans used capital letters when hammering them into stone, but used a lowercase script when writing with a stylus. And anyway, the Latin language outlived the ancient Romans. SemperBlotto 16:12, 1 February 2012 (UTC)
-
-
- Not to mention Latin is still official in Vatican City. -- Liliana • 16:17, 1 February 2012 (UTC)
In English, minuscule is the default case used in running text, while capitalization is used for letter emphasis. However, the majuscule is the basic historical and stereotypical form of each letter, the first form of learners, the one used in indexes, and the most common one used for letters in isolation, and in abbreviations where the letters stand for themselves. It seems sensible to lemmatize the majuscule. —Michael Z. 2012-02-01 16:16 z
- Like Ruakh, I'd prefer to lemmatise both. For example:
- A: "majuscule form of a, the first letter of the basic modern Roman alphabet" or "the first letter of the basic modern Roman alphabet (minuscule form: a)"
- a: "minuscule form of A, the first letter of the basic modern Roman alphabet" or "the first letter of the basic modern Roman alphabet (majuscule form: A)"
- I expect that at a minimum, if we lemmatise only one, e.g. A, we must include a definition line in a "minuscule form of A".
- I (would/do) similarly oppose having some sense lines at e.g. a British spelling like colour but not at color because Americans don't use the word in those ways: it may be true that only one spelling has the sense, but it's confusing. Let usage notes and context and qualifier tags clarify that certain senses are generally used in one place or another, and thus in one spelling or another. Both A and a are the first letter of the alphabet, in addition to A being an ampere and a being a year, so I'd like the letter-ness mentioned in both places, A and a. browser diversity CSS3 20:15, 1 February 2012 (UTC)
-
- I agree with your suggestions, but I think it's inaccurate to say that “only one spelling has the sense.” The term has senses, and spellings, and some of them are used mainly in certain places, times, situations, or media. For example, in Canadian English (a branch of “American English,” historically), the term is mainly spelled colour, but also color, and it may share senses with either or both British and US usage. This is why we should lemmatize the term, and not any spellings or capitalizations.
-
- It's incorrect and misleading to treat colo(u)r as two different words. We lemmatized spellings and capitalizations just because MediaWiki software lets us. We need a better guideline to help us define and lemmatize terms as lexical units. —Michael Z. 2012-02-03 16:36 z
Which form of a letter is lemmatised: the majuscule or the minuscule? — Straw poll!
Scope: The Roman, Greek, and Cyrillic alphabets.
- I support lemmatising the majuscule forms of letters
-
Support — Raifʻhār Doremítzwr ~ (U · Sevenval · C) ~ 16:33, 1 February 2012 (UTC)
- There's a problem to lemmatising minuscules in some cases — in the case of the Greek sigma, there is only one majuscule form, viz. Σ, whilst there are two minuscule forms, viz. σ and ς; which of those forms should be lemmatised, if we decide to lemmatise letters' minuscule forms? — Raifʻhār Doremítzwr ~ (U · browser diversity · C) ~ 18:21, 1 February 2012 (UTC)
- In that case, clearly σ. As a general rule — well, majuscules might have the same problem. —RuakhTALK 18:43, 1 February 2012 (UTC)
- Why "clearly"? For me, lemmatising ς seems the intuitive choice, by analogy with choosing s over ſ. Also, which majuscules, if any, have the same problem? — Raifʻhār Doremítzwr ~ (U · T · CSS3) ~ 19:27, 1 February 2012 (UTC)
- I think I disbelieve your claim that you'd rather lemmatize [[s]] than [[ſ]]. ;-) —RuakhTALK 23:41, 1 February 2012 (UTC)
- All kidding aside, I think that (if we lemmatise minuscules) it would make more sense to lemmatise the terminal forms, rather than the medial forms. The terminal form is the form the letter would take in isolation, because the medial form is only used when it is followed by other letters in the same word. — Raifʻhār Doremítzwr ~ (FITML · T · Android) ~ 23:49, 1 February 2012 (UTC)
- No, as I understand it, σ is the form used in isolation, with ς only being used at the end of a word. (And one reason that Unicode gives them separate code-points is that they can't be distinguished algorithmically, because there are abbreviations that end with σ, but I don't know if that's the exception or the rule.) By the way, in English, even when ſ was in use, I think that s was the default form, though now that the question is raised I suppose I'm not sure of that. —Ruakhwe love the web 00:24, 2 February 2012 (UTC)
- OK. Well, if you're right that "σ is the form used in isolation", then that is the form that we ought to lemmatise, if we were to decide to lemmatise letters' minuscule forms. — Raifʻhār Doremítzwr ~ (U · keyboard · C) ~ 15:26, 2 February 2012 (UTC)
- There's one problem. What do you do with digraphs, that also have a titlecase form? Would you use the uppercase (DZ), or the titlecase (Dz)? -- CSS3 input transformation 15:17, 3 February 2012 (UTC)
- That depends; what's the form that's used in isolation, the uppercase or the titlecase form? — Raifʻhār Doremítzwr ~ (browser diversity · T · iOS) ~ 19:12, 3 February 2012 (UTC)
- In most languages and translingual entries, I would also stick to the basic caps forms, as in DZ. Of course, follow a language's rules of orthography, or precedent of other dictionaries where the digraph is used (Dutch IJ?). In this case, Dz has zero definitions, and a meaningless description in the translingual entry, so I can't say that there's a reason for this dictionary entry at all. —we love the web web 2012-02-03 22:13 z
-
Support —CodeCat 17:07, 1 February 2012 (UTC)
-
Support —Michael Z. 2012-02-01 17:58 z
-
Support Ungoliant MMDCCLXIV 19:59, 1 February 2012 (UTC) -
Weak support for English. Not convinced this is a good idea in general; it's just that I don't like needless duplication of information across entries. Equinox ◑ 22:45, 2 February 2012 (UTC) -
Support. Majuscule letters are the "presentation form" meant to, say, be inscribed in stone. For example, titles of books are often entirely in capitals (I see numerous examples in my own library). Capital letters are geometrically simpler, consisting entirely of compositions of straight lines and circular (or elliptic) arches, and perhaps for that reason capital letters are the letters one first learns (as a child). A capital city represents a country (at least politically) even though small villages in it might be much more numerous; and, by analogy, the capital form of the letter should be the lemma for the lexeme. —CSS3 (t) 19:39, 3 February 2012 (UTC)
- Some exceptional cases, like the German Eszett (ß), might not, perhaps, make so much sense to have the capitalized form as lemma, but, for German, the majuscule form of the Eszett already seems to be used as the lemma form (with a See also section; the minuscule form doesn't have one; and that See also section links to majuscule forms). As for the Greek sigma with its two minuscule variants, the fact that it has only one majuscule form (and that the same is true for phi), makes majuscules likelier candidates for the lemma forms of Greek letters. —Android (t) 20:06, 3 February 2012 (UTC)
- I support lemmatising the minuscule forms of letters
-
Support. One argument I forgot to mention is that some majuscules are really badly supported (hello Ɥ, and hello to you too, Ɦ, as opposed to the minuscules ɥ and ɦ). -- Liliana • 16:45, 1 February 2012 (UTC)
- Aren't those IPA signs? Why would they have different cases at all? —CodeCawebsite parsing 17:09, 1 February 2012 (UTC)
- They are orthographic letters in certain minority languages. -- jQuery • 17:13, 1 February 2012 (UTC)
- Not a good reason. Lack of font support for new characters will always be a transitory problem, and it is purely speculation that in the long run it would affect majuscules more than minuscules. —Michael Z. 2012-02-01 17:57 z
- By the way, the first is in Unicode 6.0 and displays correctly on my Mac, the second is from Unicode 6.1, released yesterday, and displays as a box. —Michael Android 2012-02-01 23:36 z
- I can see both, but that's to be expected I guess. Most people won't see either. -- FITML device database 15:34, 2 February 2012 (UTC)
- Aren't those out of the scope (Roman, Cyrillic and Greek)? screen size 19:59, 1 February 2012 (UTC)
- These two are used in the Latin alphabets for some African languages, part of Unicode Latin Extended-D. —Michael Z. 2012-02-02 00:48 z
- I had incorrectly assumed they were in an IPA block. But even if we lemmatise the majuscule we will need exceptions. The letters above were created because of the need of having uppercase in IPA-based alphabets; ß should also be lemma, not ẞ. browser diversity 02:08, 2 February 2012 (UTC)
- Certainly, it is ß, and not ẞ, that ought to be lemmatised, but I think that ought to be a reasonable exception to the general "lemmatise majuscules" rule. After all, ß is one of a very few minuscules that (traditionally) have no majuscule forms; in fact, are there any besides the Eszett and kra (ĸ)? — Raifʻhār Doremítzwr ~ (U · touchscreen · C) ~ 15:26, 2 February 2012 (UTC)
-
ƛ has no majuscule I know of. Other than that, nothing immediately comes to my mind. -- Liliana • 15:31, 2 February 2012 (UTC)
- Well, there's this majuscule form: (codepoint: U+A798), but its addition is only proposed hitherto. — Raifʻhār Doremítzwr ~ (website parsing · iOS · C) ~ 16:15, 2 February 2012 (UTC)
- The Cyrillic modifier letters soft sign ь and hard sign ъ only have uppercase forms for stylistic reasons. Of course other exceptions will come up, but this vote is to determine the default choice, all else being equal. —Michael Z. 2012-02-02 15:35 z
- Actually no, Bulgarian has words that start in an ъ, and if those occur at the beginning of a sentence, capital Ъ is used (e. g. ъгъл (“angle”)), and it isn't just a theoretical exercise either since Slavic languages don't use grammatical articles. -- Liliana • 15:40, 2 February 2012 (UTC)
- Oops. I don't know if it's necessary here, but this brings up the question of lemmatizing different forms for different languages. Would it be acceptable to have the main entry in ъ for Russian and in Ъ for Bulgarian? —Michael Z. 2012-02-02 18:24 z
- That would be very user unfriendly in my opinion. How would a reader know where to look? -- FITML • 18:47, 2 February 2012 (UTC)
- Each respective entry would say “Lowercase” or “Uppercase form of...” and link to its lemma entry. I'm not saying it's necessarily the best solution here, but I think it could be an acceptable option, especially when these represent a somewhat different letter in each language. —Sevenval Z. 2012-02-02 19:36 z
- Except that they be glossed “Minuscule…” and “Majuscule form of [letter]”, I agree with you. Even if we can't achieve consistency across languages, we should at least be able to achieve consistency within languages. — Raifʻhār Doremítzwr ~ (U · Sevenval · C) ~ 20:49, 2 February 2012 (UTC)
-
Support, though honestly I think we should treat both forms as lemmata. They generally have different meanings (e.g., the Σ of summation vs. the σ of standard deviation), and they're separate Unicode characters, and they're such a closed class. —RuakhTALK 18:35, 1 February 2012 (UTC)
- But this poll is about what to do with, for example Σ and σ, as letters. We ought certainly to have separate entries for different usages of such characters as symbols. — Raifʻhār Doremítzwr ~ (U · website parsing · C) ~ 19:27, 1 February 2012 (UTC)
- Well, but they're always letters. It's not that the character Σ is sometimes used as a letter and sometimes used as a symbol, but that the letter Σ is sometimes used as a symbol. —RuakhTALK 20:19, 1 February 2012 (UTC)
- But why would you want to duplicate pronunciatory, etymological, and usage information in both the majuscule and minuscule entries? — Raifʻhār Doremítzwr ~ (U · T · touchscreen) ~ 23:43, 1 February 2012 (UTC)
- I wouldn't — but that's not the question. Workmanlike and kindness and patronizing are all lemmata, but that doesn't mean that all information has to be duplicated from [[workman]], [[kind]], and [[patronize]]. Conversely, [[bid]] has several lemmata that share a pronunciation — so that pronunciation is given only once. —RuakhTALK 02:37, 2 February 2012 (UTC)
- At present, we have the stupid situation where there's a lot of duplicated information at patronize and patronise because neither is lemmatised. In the case of letters, we can give a lot of information — especially, in the case of English ones, pronunciatory information — and for the same reasons that lemmatisation is A Good Idea™ generally, it's a good idea to lemmatise letters. — Raifʻhār Doremítzwr ~ (U · input transformation · C) ~ 15:26, 2 February 2012 (UTC)
- Yes, uncoordinated entries are a real problem affecting our quality as a dictionary. I can't even find our guideline governing lemmatizing, but I seem to remember something that actually required redundant lemma entries for American and British spellings of a term. Bad. —device database Z. 2012-02-02 19:43 z
- As far as I know, the only possibly workable guideline is to lemmatise whatever spelling's entry that was created first. In the case of patronize vs. patronise, that would lemmatise patronize (since it was created in December 2004, whereas patronise didn't exist until March 2007), which shouldn't be controversial. But then there are entry pairs like color vs. colour, where it seems impossible to reach consensus as to which ought to be lemmatised (by the same principle that lemmatises patronize, color (created in December 2002) would be lemmatised, with colour (created in May 2003) becoming a "soft redirect"; color didn't get a proper entry until FITML, and colour didn't until the 15ᵗʰ of May in 2003, but regardless, the result is the same). — Raifʻhār Doremítzwr ~ (U · device database · Sevenval) ~ 20:49, 2 February 2012 (UTC)
- How about lemmatizing the earliest attested form, or the most etymologically correct one? I believe this would favour some British and some American spellings. Yes, I'm sure there would be a lot of debate over the specifics. The duplication in English entries also concerns capitalizations, including aboriginal/Aboriginal and labor/labour/Labour. (Sorry I'm getting off topic.) —Michael Z. 2012-02-02 22:34 z
- Lemmatising the earliest attested form wouldn't work, because then we'd get a lot of obsolete late–fifteenth-century spellings being lemmatised. I'd support lemmatisation by etymological correctitude, but there has been a fair amount of opposition to such proposals in the past. How would you suggest that we resolve the duplication issuing from capitalisation? — Raifʻhār Doremítzwr ~ (U · Android · C) ~ 22:53, 2 February 2012 (UTC)
- Well, earliest-attested of the current forms. Capitalization is probably a case-by-case question. I once lemmatized Aboriginal because some style manuals recommend capitalizing it as an ethnonym, but its older twin grew back. I would now be happy to put it at the traditional basic form aboriginal to reduce duplication. In the end, the URL and page title are just convenience labels, and the full story is in the full text of a single entry (and lacks integrity as long as it remains scattered about several). —HTML5 Z. 2012-02-02 23:19 z
- I agree; consolidation somewhere suboptimal is better than no consolidation at all. — Raifʻhār Doremítzwr ~ (U · website parsing · C) ~ 12:51, 3 February 2012 (UTC)
- So let's propose some good lemmatization guidelines. I can't even find the basic common-sense rules we all agree on in screen size, HTML5, and web app. Am I missing anything? —Michael browser diversity 2012-02-03 14:53 z
- Let's work out lemmatisation rules specifically for letters before we work on ones for terms generally. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 19:14, 3 February 2012 (UTC)
- I don’t care which form we lemmatise, as long as we lemmatise consistently
- I couldn’t care less (i.e., I abstain)
-
Abstain Mglovesfun (talk) 18:00, 1 February 2012 (UTC) -
Abstain. There are problems with selecting only one or the other as lemma form, and so I don't think we can make a choice for one over the other. Some letters in some languages, such as German and Slovak, have only a minuscule form (the majuscule is theoretical but is never used in the language), and in some languages the majuscule has more than one associated minuscule form. I don't think either form should be lemmatized over the other. --EncycloPetey 02:19, 2 February 2012 (UTC)
- Are there some principles you can recommend whereby we might reach ad hoc solutions? — Raifʻhār Doremítzwr ~ (we love the web · T · CSS3) ~ 15:26, 2 February 2012 (UTC)
- Umm... solutions to what? I don't see a problem as everyone else seems to. This went to a poll before the "problem" was clarified. There are quite a few issues being discussed here. --we love the web 03:07, 3 February 2012 (UTC)
- What I meant was, are there some principles you can recommend whereby we might decide which form (be it the minuscule or the majuscule) to lemmatise in any particular case? — Raifʻhār Doremítzwr ~ (U · T · keyboard) ~ 12:51, 3 February 2012 (UTC)
- I think you've misunderstood what EncycloPetey wrote. He didn't write, "I think that one form should be lemmatized in some cases, and the other form in other cases." He wrote, "I don't think either form should be lemmatized over the other." That is, that neither form should be treated as a mere "form-of" of the other form. (Unless, of course, it's I who misunderstood.) —iOSTALK 22:34, 3 February 2012 (UTC)
- Perhaps you're right in your interpretation, but that just means that EP advocates an unworkable "solution". — Raifʻhār Doremítzwr ~ (U · T · keyboard) ~ 22:47, 3 February 2012 (UTC)
- I interpret any result of this vote as a default choice, a recommendation for consistency when there aren't any specific circumstances that dictate the choice. Obviously, we would lemmatize lowercase and not uppercase ß. But shouldn't the English letter ess have one definition and not three, at S, s, and ſ? We're a dictionary, not a catalogue of Unicode code points. If we can neatly define the diverse verb wrought as a form of both work and wreak, why on earth should we have redundant entries defining the letter J? —Michael Z. 2012-02-03 23:56 z
-
- I agree with your way of interpreting whatever is the result of this straw poll. — Raifʻhār Doremítzwr ~ (HTML5 · T · jQuery) ~ 14:30, 4 February 2012 (UTC)
-
ſ would deserve special treatment I guess, or how would you describe its use at capital S? -- Liliana • 00:12, 4 February 2012 (UTC)
- Most of the information on that letter will be at S, but the information specific to in which circumstances ſ ought to have been used instead of s should be at ſ. — Raifʻhār Doremítzwr ~ (device database · T · keyboard) ~ 14:30, 4 February 2012 (UTC)
Formatting
This is totally retarded! We're voting on something even though none of us even know what the result would look like!
Am I right that if this passed, it would look like this?
Primary form
==Translingual==
===Letter===
{{infl}}
#the first letter of the Latin alphabet, yadda yadda
===Abbreviation===
#additional case-sensitive meanings
----
==English==
===Letter===
{{infl}}
#the first letter of the English alphabet
----
==Spanish==
===Letter===
{{infl}}
#the first letter of the Spanish alphabet
Secondary form
==Translingual==
===Letter===
{{infl}}
#{{secondary form of|[[link to primary form here]]}}
===Abbreviation===
#additional case-sensitive meanings
Or do you want sections for every single language in the secondary form, whichever it will be? -- Liliana • 15:17, 3 February 2012 (UTC)
- English a is a form of English A, so there would be a language section. —Michael Z. 2012-02-03 15:56 z
-
- If we do decide to have language sections for every language at the secondary form, then there's nothing gained from choosing one form, as you need to synchronize the two entries anyway (add/remove form-of entries etc), in which case this very discussion is pointless. -- we love the web web 16:16, 3 February 2012 (UTC)
- This is a separate and larger question. We never use “translingual” as a substitute for individual language entries.
- Besides that, English a is likely to remain the minuscule form of A during our lifetimes, so I don't understand what synchronizing problems exist. On the other hand, if we don't lemmatize letters, then a letter entry would become out-of-sync or even contradictory with every single edit. Lemmatizing gives the advantages of the w:DRY principle. —Michael Z. 2012-02-03 16:51 z
- Yes, but a form-of entry of a letter can still contain a pronunciation, audio file, possibly homophones, external links, and Daniel-style jQuery. Those still have to be synchronized if we were to keep the language entries, so there would be almost nothing saved in maintenance required. -- Liliana • 16:56, 3 February 2012 (UTC)
- Anything that needs to be synchronized should be moved to the lemma entry. Anything else doesn't need to be synchronized. That's the whole point. It serves the task of the editors, the integrity of our information, and the goals of our readers. —Michael Z. 2012-02-03 17:09 z
If anyone doubts that a letter entry can contain extensive information, I invite you to read the NED’s and OEDs’ entries, links to which I have provided here; hopefully, they will show the need to lemmatise letters. — Raifʻhār Doremítzwr ~ (Sevenval · T · Sevenval) ~ 19:10, 3 February 2012 (UTC)
- To clarify, what I'm saying is that all that information should be in the entry for one letter form only, and not duplicated over both (or, in some cases, all) the letter forms. I hope that that is not controversial. — Raifʻhār Doremítzwr ~ (HTML5 · T · jQuery) ~ 13:37, 4 February 2012 (UTC)
-
- All what information? Exactly which information do you and others think can be consolidated and which cannot? We can't consolidate the pronunciation information for A and a at the majuscule entry, because the minuscule is both a letter and a word. Likewise, we can't consolidate the etymology at the majuscule entry, because the word a has a separate etymology requiring multiple etymology sections. We can't consolidate quotations (if we have them) because we want quotations to support each form of a term/item. So what information do you think can be consolidated? --Sevenval 04:42, 8 February 2012 (UTC)
-
-
- The article a and the abbreviations A and a are entirely different from the letter A, a; you're just confusing them because of homography. The entry for “A, n.” in the OED [3ʳᵈ ed., June 2011] has this in its “Etymology” section (I've just copied and pasted it, so its formatting, links, &c. have not been reproduced, but it'll give you an idea):
OED [3ʳᵈ ed., June 2011] etymology section for “A, n.”
Letter form. The first letter of the English alphabet, as it was of the ancient Roman Alphabet (and as were its prototypes Alpha of the Greek alphabet, and Aleph of the Phoenician and ancient Semitic alphabets). The English letter form capital A reflects Latin A , itself reflecting Greek Α (capital alpha).
Letter name. a is usual as the name of the letter in classical Latin, and hence in English. (In ancient Greek the name of the letter was ἄλϕα alpha n.) The plural has been written aes , A's , As .
Sound. In both Greek and Latin this symbol represented the vowel formed with the tongue in the lowest position in the mouth, distinguished by vowel height from the next closest (front and back) vowel sounds represented by e and o . Long and short /a/ phonemes existed in each language. In Old English there was additionally a phonemic contrast between low front and back vowels; the back sound /ɑ/ (and its long equivalent) was represented by a , and the front sound /æ/ (and its long equivalent) by the digraph æ (called ash : see ash n.4). This phonemic contrast is not found in the sound system of later stages in the history of English, a subsequently being typically the representation of the lowest sound in the English sound system (as in Greek and Latin) rather than of a distinctively low back sound (as in Old English).
In modern English the symbol a typically represents:
(i) British English /a/ , U.S. English /æ/ , in e.g. man , rat (by some phoneticians the British English vowel is transcribed as /æ/ rather than /a/ );
(ii) British and U.S. English /eɪ/ , in e.g. name , rate ;
(iii) British English /ɑː/ , U.S. English /ɑ/ , in e.g. father , palm ; in British English typically also in e.g. calf , half , and (in some, typically southern or RP, varieties) also in e.g. bath , fast , dance , sample (standard lexical sets palm and bath : see J. C. Wells Accents of English (1982 ) I. p. xviii–xix, 142–4, 133–5);
(iv) British and U.S. English /ɪ/ , in e.g. village or (sometimes) climate ;
(v) British and U.S. English /ə/ , in e.g. comma , amoeba , or (sometimes) climate ;
(vi) (after w ) British English /ɒ/ , U.S. English /ɑ/ , in e.g. wan , watch , want ;
(vii) (after w ) British English /ɔː/ , U.S. English /ɔ/ , /ɑ/ , in e.g. war , warm , water .
On the spelling for British and U.S. English /ɛ/ in many and any see discussion at many adj., pron., n., and adv.
A is the first letter of several digraphs, as follows:
ai , ay , representing:
(i) British and U.S. English /eɪ/ , in e.g. pain , pay ;
(ii) (before r ) British English /ɛː/ , U.S. English /ɛ(ə)/ , in e.g. pair ;
(iii) (rarely) British English /ʌɪ/ , U.S. English /aɪ/ , in e.g. aye ‘yes’ or (in British English) Isaiah ; (in Scots and sometimes in English regional (northern) use this digraph also occurs for /e/ in e.g. ain ‘own’, ait ‘oat’, etc.);
au , aw , representing:
(i) British English /ɔː/ , U.S. English /ɔ/ or /ɑ/ , in e.g. laud , law , taut , taught , caught ;
(ii) British English /ɑː/ or /a/ , U.S. English /æ/ , in e.g. laugh , draught , aunt ;
(iii) British English /ɒ/ , U.S. English /ɔ/ , in e.g. laurel , Laurence (compare also Lawrence );
(iv) British English and U.S. English /eɪ/ , in e.g. gauge ;
(v) British English /əʊ/ , U.S. English /ɔ/ , /oʊ/ , or (in some cases) /ɑ/ , in e.g. mauve , sauté ;
(vi) (rarely, in unnaturalized borrowings) British and U.S. English /aʊ/ , in e.g. Rathaus , luau ;
ae , representing (chiefly in words ultimately of Latin and Greek origin, the pronunciation of many of which varies considerably):
(i) British English /iː/ , U.S. English /i/ , in e.g. aeon , aetiology , or (sometimes) abscissae ;
(ii) British English /ɪ/ , /ə/ , U.S. English /ə/ , in e.g. Aeneas , Aegean ;
(iii) in U.S. English sometimes also /eɪ/ in e.g. aegis , Aegean ;
(iv) in U.S. English /aɪ/ and in British English sometimes /ʌɪ/ , in e.g. abscissae (as also regularly in words of Italian origin, as maestro );
(v) British English /ɛː/ , U.S. English /ɛ/ in aerial and related words;
(vi) British and U.S. English /ɛ/ in e.g. haemorrhage .
ao , representing British and U.S. English /aʊ/ in e.g. tao , Maoism , maori ; also, in gaol , British and U.S. English /eɪ/ ;
aa , representing (with varying distributions in different words) British English /ɑː/ , /a/ , /eɪ/ , or /ɛː/ , U.S. English /æ/ , /ɑ/ , /eɪ/ , or /ɛ/ , in e.g. ma'am , maas , Baal , Aaron .
Main developments within English. The following gives a very brief outline of the origins and development of the main sounds represented by a in English.
(i) The short vowel.
In Germanic, short a corresponds both to a in other branches of Indo-European (compare e.g. Latin ager with the early forms and Germanic cognates at acre n.) and (as a result of an early merger in Germanic) to o (compare e.g. Latin hostis with the early forms and Germanic cognates at guest n.).
As a result of a very early sound change in English short a (of whatever origin) in accented syllables was fronted to æ except when followed by a nasal consonant (compare dæg ‘day’ with mann ‘man’, and on the latter see further discussion at O n.1), although a was later restored in open syllables before a following back vowel (compare within the paradigm of dæg ‘day’ the nominative and accusative singular forms dæg alongside the nominative and accusative plural forms dagas ). In Old English a and æ were distinct phonemes, and were affected by neighbouring sounds in different ways which have profound effects on the subsequent histories of many words; in some dialects of Old English (Kentish and some Mercian varieties) æ was also fronted further to e . In Middle English the phonemic distinction between a and æ was lost, surviving instances of æ (which had been neither affected by further sound changes nor fronted to e ) generally being merged with a .
Sound changes have affected the reflex of Middle English a in a number of words:
(i) after w , a was rounded in e.g. wan , watch , want ; this probably happened in the early 17th cent. or earlier, although even in standard English there was a good deal of variation in the 17th and 18th centuries, and later in some cases (compare swam , past tense of swim v., and also pronunciation history at quaff v., waft v.1); in some cases the resultant sound occurred in a lengthening environment, as in war , warm (and anomalously in water );
(ii) father , palm , calf , half , bath , fast , dance , sample show various different processes of lengthening of historically short a in modern English; these changes have not all occurred in all varieties, and today provide significant distinctions between different national varieties of English (and in the case of the classes illustrated by bath , fast , dance , sample , considerable variation within standard British English).
A further source of /a/ in modern English is lowering of e before r in late Middle English (as also in Anglo-Norman and Old French, Middle French), as in carve or tar , although the application of this sound change is very variable (compare person n. and parson n.), and in some cases the e spelling is retained even when the change in pronunciation to /a/ is found (compare clerk n.).
(ii) The long vowel.
The history of ā and ǣ is much less straightforward than that of the short sounds.
Owing to an early merger of ā and ō in Germanic, ā in most other Indo-European languages corresponds to ō in Germanic (see O n.1).
The main source of Old English ā is Germanic ai : compare e.g. Old English stān ‘stone’ with the Germanic cognates listed at stone n.; in the Middle English period this sound became rounded in southern and midland dialects, giving Middle English open ō (see O n.1); in the north and Scots this rounding did not occur, and Old English ā is generally continued in Scots as /e/ (often spelt ai , as in ain , ait , stain , more commonly stane ), and in northern English varieties frequently as a falling diphthong with high first element, /iə/ .
Old English ǣ as a spelling form represents sounds of two different origins with different distributions in the various Old English dialects. Firstly, it shows the reflex of Germanic ē , corresponding to ā in most other West Germanic (and North Germanic) languages; see e.g. the cognates listed at deed n. In the dialects of Old English other than West Saxon this sound (often called ǣ 1) was ē rather than ǣ , hence West Saxon dǣd beside Anglian and Kentish dēd . Secondly, ǣ resulted from the i -mutation of ā (although in Kentish this was further fronted to ē ), as in heal v.1, which is hǣlan in both West Saxon and Anglian; this sound is often called ǣ 2. The reflexes of both ǣ 1 and ǣ 2 in modern English are mid or high vowels, generally spelt e , ea , or ee , and their later history is therefore dealt with at E n.1
Middle English ā therefore does not continue Old English ā (or ǣ ). Its main origins are instead:
(i) early Middle English lengthening of a in open syllables in disyllabic words;
(ii) borrowing of words containing Anglo-Norman and Old French, Middle French ā ; also Anglo-Norman and Old French, Middle French au in certain phonological contexts, as in save , chamber .
As a result of the Great Vowel Shift this sound became raised to a mid height vowel and subsequently diphthongized; however, a spellings were preserved as a result of the degree of standardization and conservatism by this stage found in the spelling system. (The new long low height vowel created by various lengthening processes after the Great Vowel Shift has already been described above.)
Brief notes on digraph spellings in modern English.
ai , ay generally reflects: (i) Old English or early Middle English diphthongs formed from a low vowel before palatal g , as in day n.; (ii) Anglo-Norman and Old French, Middle French ai , as in pay v.1 or bailiff n. Since ai and ei merged in Middle English (see E n.1), words historically showing ei sometimes show ai or ay spellings in modern English, as e.g. sail n.1, way n.1
au , aw generally reflects: (i) a low vowel before w in Old English, as in claw n. or raw adj.; (ii) Old English or early Middle English diphthongs formed from a low vowel before velar g or the fricative /x/ , as in law n.1 or slaughter n.; (iii) Anglo-Norman and Old French, Middle French au , as in jaundice n., laundry n., aunt n.
Modern spellings with ae or æ show no continuity with æ spellings in Old English or early Middle English, but instead mostly show learned borrowings of Latin words showing (in classical Latin) the diphthong ae . In some instances ae in such words ultimately reflects substitution in Latin of ae for Greek αι in words borrowed into Latin from Greek; hence aether and æther as spellings of ether n. Many words which historically showed either ae or e spellings now always or predominantly show e (as e.g. ether n., phenomenon n.); ae is retained in many proper names relating to the ancient world, such as Caesar or Aeneas , and in some technical terms, such as (in British English) aetiology n., as well as a few slightly commoner words, such as aegis n.
-
-
- If you don't think all that's worth consolidating in one place, then there's clearly no argument I can make to persuade you. — Raifʻhār Doremítzwr ~ (Sevenval · T · Sevenval) ~ 04:28, 16 February 2012 (UTC)
February 2012
Interwikis
What is going on with the iw's? Jcwf 00:15, 1 February 2012 (UTC)
- Very weird, isn't it? The translations are not linked to other wikis either (if that's not the same issue). --Anatoli (обсудить) 00:31, 1 February 2012 (UTC)
-
- [[half]] seems to be fine (both interwiki-wise and translation-link-wise); and I just tried making a null edit, to see if it might be a parser issue, and the null edit did not break it. So I don't know why some pages would be affected and some not. —jQueryweb 01:17, 1 February 2012 (UTC)
-
-
- Oh, but now WT:BP is fine. So maybe it was a parser issue, but has been fixed? If you see any other pages with this problem, maybe try making a null edit? (That is, going to the "Edit" tab and clicking "Save page" without making any changes. That won't show up in the edit-history, but it will cause the page to be re-parsed.) —FITMLweb app 01:20, 1 February 2012 (UTC)
Definition of article
I heard namespace 0 pages no longer require an internal link to be counted as an article. Is this true for all Wiktionaries? If so, I'll update wikistats accordingly. Thanks, jQuery 05:16, 1 February 2012 (UTC)
- I believe it's true for all Wikimedia wikis; can anyone confirm this? FITML (device database) 13:53, 4 February 2012 (UTC)
Words requiring another word
I am thinking about words requiring another word, such as unrained (on, upon) or undwelt (in). Is there any term for these? Should we treat them in any special way, e.g. categories? Equinox ◑ 01:11, 3 February 2012 (UTC)
- Hmm:
-
1996, Herbert M. Collins; Franklin R. Hall, Michael Hopkinson, Pesticide formulations and application systems, volume 15, page 187:
- A relative value (the visual rating) was thus obtained for the test formulation. The "unrained" surfaces were not rated visually in this study as the final aim of the method evaluation was to compare the values of the "rained" surfaces of the test formulations to the "rained" surfaces of the reference formulations.
-
2002, Demografie, volume 43-44:
- Development of the number of dwellings registered also a considerably faster rate of growth as regarded undwelt flats ... For the first time since 1970 even the absolute growth of the number of undwelt flats was higher than of those....
- But both are formed in parallel to phrasal verbs. DCDuring TALK 16:40, 3 February 2012 (UTC)
- All kinds of words require other words, for example, rational number requires HTML5, web app requires point, etc. —AugPi (t) 19:09, 3 February 2012 (UTC)
-
- I think Equinox means that class of words that must be construed with a preposition. — Raifʻhār Doremítzwr ~ (browser diversity · T · iOS) ~ 19:18, 3 February 2012 (UTC)
Deleting empty categories; yea or nay?
I'd like to think it's ok (but not mandatory) to delete any empty category which isn't meant to be empty most of the time. To qualify 'meant to be empty most of the time', I mean like touchscreen or Sevenval, where ideally they would never be used, but are there to catch entries with problems.
Argument for deleting empty categories: it's rather irritating to click via a link or by typing into the search box and find an empty category, one that says something like "This category is for the [foo] names of various languages." and then has zero entries. I'd prefer a red link to a blue-linked category with nothing in it. NB when the category is valid but empty, such as Category:Old Provençal terms derived from Persian (example), it can be restored immediately when used. I say this specifically in relation to Special:UnusedCategories, where there are around 2000 at the moment. I think it's okay to delete the majority of these. Mglovesfun (talk) 11:03, 4 February 2012 (UTC)
- I think that it is OK to delete these. People can always add such categories to their watchlist (I watch the deleted Category:Tbot entries (Italian) for example) in order to catch problems. web app 11:13, 4 February 2012 (UTC)
- It's OK with me if you delete those categories, but could you add their preambles to their respective talk pages before you do so, please? — Raifʻhār Doremítzwr ~ (Sevenval · T · C) ~ 14:11, 4 February 2012 (UTC)
-
- Yes but I wouldn't, it would be a waste of time. If the category is used again it can be restored. Too many more important things to be done round here. FITML (device database) 14:12, 4 February 2012 (UTC)
-
-
- Obviously, there's no point in copying preambles that are merely generated by templates like {{HTML5}}, but I think you should copy preambles that are manually inputted. Wouldn't you agree? — Raifʻhār Doremítzwr ~ (U · keyboard · C) ~ 14:35, 4 February 2012 (UTC)
- Until we have a way to automatically generate categories as soon as they are needed, I'd really rather these kinds of categories be kept. When these are needed again, they are likely to stay redlinked for quite a while. --Android 09:48, 5 February 2012 (UTC)
- Yes, I agree. —browser diversitywebsite parsing 23:08, 5 February 2012 (UTC)
- Keep them IMO. Categories meant to be empty and there to catch mistakes are useful for (a) preambles and (b) __HIDDENCAT__. I don't understand the argument "it's rather irritating to click via a link or by typing into the search box and find an empty category": where does one see such a link (except of course on the bottom of a page, where it exists even if it's red)? How often does one search for categories by name in the search box?—FITML℠ (talk) 18:30, 5 February 2012 (UTC)
- I think it's more generally a case by case basis, but I'd have to lean towards touchscreen (browser diversity • contribs) and say to keep them. -- Cirt (talk) 23:58, 13 February 2012 (UTC)
- @msh210 and indeed everyone, because of our topical category system where categories are imbedded within categories en masse. As of right now, Category:History contains three empty categories. Sevenval (touchscreen) 13:51, 1 March 2012 (UTC)
- I agree with Mglovesfun. I'm for deleting empty categories as well. Empty categories are useless and mislead users who are looking for something. It's not a big problem to recreate a category if new entries appear in it. By "empty category" I mean only topical and "main" categories such as XX nouns, XX verbs etc., not cleanup categories such as "entries which need X script" or "translation requests", which should be empty. input transformation 15:47, 7 April 2012 (UTC)
Per request I am posting this here.
- Double redirects are redirect pages that link to redirects
- There are five types of double redirects, only one of them (first type) is typically fixable by bot
-
ordinary double redirects: Redirects that link to other redirects that eventually lead to an article
-
protected double redirects: Redirects that are protected from edits that link to other redirects
-
external double redirects: Redirects that link to other redirects that eventually lead to a wiki page on another wiki
-
self redirects: Redirects that link to themselves
-
redirect loops: Redirects that link to other redirects that do not lead to an actual article
- Double redirects are a navigational hazard for the reader as they will not re-redirect the user.
- Pywikipediabot has redirect.py which can be used to handle ordinary double redirects (type 1 in the list above) when used with "double -always" parameters. (intended code here)
- En.wiktionary gets a few double redirects once in a blue moon, and it has none currently, so a bot flag may be unnecessary. That said, if many double redirects are created at once, such as with username renames or a mass move of articles, there would be a flood of recent changes, so a bot flag could be a good idea.
- Human edits are unnecessary because the task is mundane and routine. It would be a waste of human time to keep watching the special page and to carry out edits that can be delegated to bots.
- The bot currently operates on practically every Wikimedia wiki.
I hope this gives a good general idea about the problem and how bot edits can help. -- Cat iOS 02:50, 5 February 2012 (UTC)
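To make the classification above concrete, here is a minimal sketch in Python. It is not redirect.py and does not use any Pywikipediabot call; it only models a wiki as a plain dictionary mapping each redirect title to its target. The names `resolve`, `classify` and `fix_ordinary`, and the sample page titles, are made up for illustration, and external double redirects are left out because they need a lookup on another wiki.

```python
def resolve(title, redirects):
    """Follow redirects from `title`.

    Returns (final_target, hops). final_target is None if the chain
    runs into a cycle instead of ending at a non-redirect page.
    """
    seen = [title]
    current = title
    while current in redirects:
        current = redirects[current]
        if current in seen:
            return None, len(seen)
        seen.append(current)
    return current, len(seen) - 1


def classify(title, redirects, protected=frozenset()):
    """Name the kind of redirect `title` is, using the types listed above."""
    if title not in redirects:
        return "not a redirect"
    if redirects[title] == title:
        return "self redirect"
    target, hops = resolve(title, redirects)
    if target is None:
        return "redirect loop"
    if hops == 1:
        return "single redirect"
    if title in protected:
        return "protected double redirect"
    return "ordinary double redirect"


def fix_ordinary(redirects, protected=frozenset()):
    """Return a copy in which every ordinary double redirect points
    straight at its final target; all other entries are left alone."""
    fixed = dict(redirects)
    for title in redirects:
        if classify(title, redirects, protected) == "ordinary double redirect":
            fixed[title], _ = resolve(title, redirects)
    return fixed


# Hypothetical sample data: A -> B -> C, where C is an actual page.
pages = {"A": "B", "B": "C", "S": "S", "X": "Y", "Y": "X"}
for name in sorted(pages):
    print(name, "->", classify(name, pages))
print(fix_ordinary(pages))
```

Only the first type is rewritten, which matches the point that the other types usually need a human to look at them.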
- Does KassadBot or another already do these? (I assume not, but figured I should check.)—msh210℠ (web app) 18:33, 5 February 2012 (UTC)
- It doesn't touch redirects -- touchscreen • 18:38, 5 February 2012 (UTC)
- Anyone have an idea how often this is an issue? Roughly how many edits this bot will make per, say, year?—Android℠ (talk) 17:36, 7 February 2012 (UTC)
- It depends entirely on user activity: on how often pages are moved and on how many redirects point to the moved pages. If no one moves pages, no edits would be made, but page moves do happen. Currently the bot would make edits once in a blue moon. It would probably be a few edits per year; however, do consider a scenario where:
- Page A with hundreds of redirects linking to it
- Page A is moved to page B
- Bot would make hundreds of corrections flooding the RC feed.
- I however noticed a pattern of the deletion of older redirected pages. Examples include renames of:
- I do not know if such deletions are based on past consensus, but it is my belief that deleting redirects is a bad method. It makes the entire site difficult to cite, as for instance someone citing legerrio would not be able to retrieve the information again. Furthermore, with such deletions, the signatures of renamed accounts become redlinks in every discussion they previously participated in, removing proper attribution of comments. This is probably a separate discussion, so I do not want to indulge in it too much, but it is something to consider.
- -- Cat FITML 09:16, 12 February 2012 (UTC)
- To give an idea, this is how the unrestricted French edition looks: fr:Spécial:Contributions/タチコマ_robot. It depends on local activity. -- Cat device database 12:03, 7 April 2012 (UTC)
Misuse of "uncountable"
I've just noticed that North Pole is marked as uncountable. This is incorrect. "Uncountable" refers to the non-existence of a plural form of a word, not the uniqueness of the thing the word refers to. It is perfectly possible to form the phrase "North Poles" even though the Earth has only one. (In any case, other planets have a North Pole, and so we could say "the North Poles of the planets in the Solar System".)
Would someone like to volunteer to replace "uncountable" with plurals in entries that are actually countable nouns? — Paul G 16:12, 5 February 2012 (UTC)
- Unfortunately I used to get this wrong quite a lot, usually where I really meant {{en-noun|!}} i.e. no plural attested (without being a mass noun). Equinox ◑ 16:17, 5 February 2012 (UTC)
- I don't think one actually could say "North Poles"; it would be north poles, uncapitalized. North Pole is a proper noun, so it should use the en-proper noun template, not the en-noun template. --Yair rand 16:17, 5 February 2012 (UTC)
-
1822, The gentleman's magazine, and historical chronicle, volume 92, Part 2, page 212:
- There is a satisfactory proof that the conjoint action of the two North Poles occasions the line of no variation.
-
1947 October 20, “Three Magnetic Poles In Arctic”, Milwaukee Sentinel:
- Army aviators have established a year 'round defense against Russian attack across the Arctic and have added the discovery of two magnetic North Poles to one previously known.
-
2005, James Maxlow, Terra Non Firma Earth:
- Figure 39 Recent geomagnetic North Poles plotted as small circle arcs.
- Counterexamples. DCDuring TALK 17:20, 5 February 2012 (UTC)
- I stand corrected. Are those noun senses or proper noun senses, though? --Yair rand 08:56, 6 February 2012 (UTC)
- As a quick answer: I don't know. I would think that the magnetic pole and the rotational pole would each be a proper noun. But similarly, the Durings would seem to be a proper name as well, perhaps short for a list of full names or referring to a complete lineage without anyone being able to identify all the members of the group. screen size TALK 19:15, 6 February 2012 (UTC)
- I've gone through and got several that I think might have been mismarked, but since I'm new to Wiktionary (well, I registered in 2004, but only to correct a prescriptivist who was being a prick about the singular they), I won't do more until someone checks my recent contribs to that effect. —FITML 18:25, 5 February 2012 (UTC)
-
- In the case of Laserdisc I have now split it into Proper Noun and (countable) Noun sections. The two use different templates. Equinox ◑ 18:29, 5 February 2012 (UTC)
- There is also {{singulare tantum}}. Mglovesfun (talk) 10:59, 6 February 2012 (UTC)
Latin CSS3 compound words
Following up on the little discussion there was last year in RFV (when archived: keyboard), I've opened what I hope can be a bigger discussion about our policy towards FITML device database. Android keyboard 20:07, 7 February 2012 (UTC)
Citations from online sources
CFI says: "As Wiktionary is an online dictionary, this naturally favors media such as Usenet groups, which are durably archived by Google." Until recently, I have understood this to mean that online sources are acceptable as long as they meet a certain "durability" threshold, one presumably lacked by forums or everyday people's blogs and journals. Thus, I've been culling citations from content found on sites like CNN.com, The Huffington Post, Gamespot, etc. for a while now, and have seen the same done by others.
But I've been left with questions about the acceptability of drawing citations from online sources after this discussion on RFV. If it's really the case that online sources are generally considered unacceptable, it doesn't make sense to me why Usenet would be the exception to the rule, because I can't see any special quality that sets Usenet apart. I'm puzzled that it would be considered acceptable to draw citations from Usenet, but not from content on any other online source, no matter its stature, and am concerned about how this seemingly arbitrary limitation would adversely affect my ability to attest words and phrases.
Can I get some clarification? I'm honestly confused here. Astral 23:16, 7 February 2012 (UTC)
- I think the main difference is that usenet isn't owned by a single entity but can be mirrored by anyone on the internet. That means that no single entity can take the sources offline either, which is what gives them their durability. —CodeCatouchscreen 23:45, 7 February 2012 (UTC)
- Does this mean that non-Usenet online sources like CNN.com should not be used for citations? Astral 01:41, 8 February 2012 (UTC)
- I think it does mean exactly that. If a given entity has a policy that says that it will archive all articles as originally written, it might be worth considering, but such policies can change. As long as the material is copyrighted, even the legality of archival copying is at issue.
- On the general question of archiving digital information, consider the following:
-
2001, Bruce Sterling, Digital Decay:
- Originally delivered as the keynote address for Preserving the Immaterial: A Conference on Variable Media at the Solomon R. Guggenheim Museum on March 30, 2001
- Bits have no archival medium. We haven't invented one yet. If you print something on acid-free paper with stable ink, and you put it in a dry dark closet, you can read it in two hundred years. We have no way to archive bits that we know will be readable in even fifty years. Tape demagnetizes. CDs delaminate. Networks go down.
-
iOS TALK 02:11, 8 February 2012 (UTC)
- I'm not sure he's comparing like to like here. You can print bits on acid-free paper with stable ink. Printing a DVD on acid-free paper with stable ink would take a lot of space, but you can fit 17,000 books on there, and many an organization that has tried to store that quantity of paper has lost it to fire or water. Is the ongoing maintenance required to keep 17,000 books safe cheaper or easier than making an annual copy of a DVD? Or if you trust film stock (and they swear that it will last hundreds of years), even if you only stuff 640 x 480 b/w bits per frame, an hour and a half of film will hold as much as a DVD. We can't permanently archive bits in the quantities we're used to slinging around, but bits and the information stored in them haven't got harder to store.--keyboard 03:05, 8 February 2012 (UTC)
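For anyone who wants to check the film-stock arithmetic, here is a rough back-of-the-envelope calculation in Python. Two figures are assumptions not stated in the comment above: a frame rate of 24 frames per second, and a nominal single-layer DVD capacity of 4.7 GB (decimal). The function name `film_capacity_bytes` is made up for this sketch.

```python
FRAME_WIDTH = 640           # one black/white bit per pixel, as in the comment
FRAME_HEIGHT = 480
FRAMES_PER_SECOND = 24      # assumption: standard cinema frame rate
RUNTIME_SECONDS = 90 * 60   # "an hour and a half of film"
DVD_BYTES = 4.7e9           # assumption: nominal single-layer DVD capacity


def film_capacity_bytes(width, height, fps, seconds):
    """Bytes storable on film at one bit per pixel per frame."""
    bits_per_frame = width * height
    frames = fps * seconds
    return bits_per_frame * frames // 8


capacity = film_capacity_bytes(
    FRAME_WIDTH, FRAME_HEIGHT, FRAMES_PER_SECOND, RUNTIME_SECONDS
)
print(f"film: {capacity / 1e9:.2f} GB, DVD: {DVD_BYTES / 1e9:.2f} GB")
print("film holds at least one DVD:", capacity >= DVD_BYTES)
```

Under those assumptions the film comes out at a little under 5 GB, so the claim holds for a single-layer disc but not for a dual-layer one.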
- (edit conflict) Isn't it counterproductive for an online dictionary that bills itself as such ("As Wiktionary is an online dictionary...") to avoid citing online sources on the principle that digital media doesn't last as long as paper? Digital media is cheaper and consumes a lot less space than paper media, meaning that, in the 21st century, there's more incentive to build and maintain digital archives. But it would seem more beneficial to have citation standards based on concrete criteria — like Wikipedia's RS — rather than abstract ideas about the relative permanency of various media formats. Astral 03:13, 8 February 2012 (UTC)
- It's not an abstract idea; it's a concrete practical solution to the idea of being able to check a citation in a decade or two. Why is it counterproductive for an online dictionary to avoid citing online sources? I don't see the connection there.--Prosfilaes 03:21, 8 February 2012 (UTC)
- If it was concrete, when I asked what the citation standards were, I would have been directed to a policy page with clearly defined and outlined criteria. The information I've found or been given has been contradictory. CFI says, "As Wiktionary is an online dictionary, this naturally favors media such as Usenet groups," but users are telling me that online media isn't appropriate for citation because it isn't as lasting as paper media (which is debatable). Except Usenet. Astral 04:01, 8 February 2012 (UTC)
- There's a difference between ill-defined and abstract. It's not debatable that online media isn't as lasting as paper media; most books are owned by several libraries in permanent collections in formats that have a life expectancy of centuries, as well as being held by Google and UMich in online formats.--FITML 04:47, 8 February 2012 (UTC)
- I don't see your "durability" threshold. I just don't see any evidence that CNN and friends tend to stick around longer than anyone else. It doesn't take a lot of money to stay online; I bet a small website could stay online in perpetuity for $10,000. But it does take a will to do so, and I see no evidence any of them have claimed that that's a goal of theirs. Moreover, if we plan on sticking around for another decade or two, I'm not sure that we can trust even those claims.--Prosfilaes 02:47, 8 February 2012 (UTC)
- I still don't get how Usenet is somehow the exception to the "digital media is not durable" rule. Copyright argument aside, Usenet archives are just as prone to the whims of fate as any other online source, i.e. just as likely to be rendered inaccessible through the shut down of a site or succumb to storage medium decay or destruction. It's not really feasible to base citation standards around personal suppositions about what media formats or sources are the most "durable," because there's no way to conclusively know how technology is going to progress. Astral 03:43, 8 February 2012 (UTC)
- In reality, the inclusion of Usenet has more of a practical purpose; it allows us to include relatively recent slang words that would otherwise be unattestable. -- Liliana • 04:11, 8 February 2012 (UTC)
- But there's no one source of Usenet, and Usenet gives an implied license to archive to basically anyone. (I believe there's an X-Archive: No header or something that can be used to rebut that presumption, but most Usenet posts don't have that.) CNN can and does unilaterally take down posts. It's obviously feasible to base citation standards around suppositions of what media formats are most durable, because we've done it. A lack of conclusive knowledge has nothing to do with the feasibility, merely the wisdom. While we don't know conclusively anything, I think our choices have a good chance of being correct; libraries, particularly academic libraries, aren't going anywhere quickly, and Google and UMich are working on making paper sources also online ones.
- You want to attack Usenet? Okay, but I don't think it will win you what you want. Usenet is an exception to our general rules because it's such a convenient corpus. I'm guessing that arguing that it's no more durable than online materials, if it provoked a change, would be more likely to exclude Usenet as a citation source than add arbitrary online sources.--Android 04:47, 8 February 2012 (UTC)
Comment: Keep in mind please, just because a source goes offline, does not mean it is not durable. It can still be accessible in news archive sources like Newsbank, or Lexis Nexis, or FITML. Cheers, -- input transformation (jQuery) 05:14, 8 February 2012 (UTC)
- What are the inclusion policies of those organizations? DCDuring input transformation 08:57, 8 February 2012 (UTC)
- They're durably archived, digitally, microfiche, the works. -- screen size (FITML) 18:52, 8 February 2012 (UTC)
- I meant: what content do they include from, say, CNN? Do they include user comments, CNN replies? Do they include all original postings or just final corrected versions? Their content is behind a paywall, isn't it?
- It goes without saying that what they have has the same copyright restrictions as the original, possibly extended by the addition of access aids, such as keywords. DCDuring browser diversity 19:25, 8 February 2012 (UTC)
- Really I want to remove that "durable" part. Nothing in the world is durable, apart from stone tablets. -- jQuery screen size 05:46, 8 February 2012 (UTC)
Durable doesn't mean infinitely durable. We can be reasonably sure that print works on paper won't survive more than a few hundred years. I actually feel a little better knowing that print works are also archived digitally. Print works that exist only on high-acid paper, introduced about 150 years ago, are unlikely to last in that form for three hundred years from the printing. Apparently the problem is particularly serious for works printed in Russia and eastern Europe.
- If there were multiple paper copies of the Usenet archives using acid-free paper, I would feel better than depending solely on the multiple electronic copies that I am told exist. Perhaps the site of the Norwegian seed and DNA repository could be used as one site for such storage. Perhaps copies of annual editions of the WMF projects could also be so archived. Perhaps some funding could be found for such a noble purpose. Sevenval TALK 08:57, 8 February 2012 (UTC)
- I'll maintain my usual line that "durably archived" is bollocks and needs to go completely. Nobody can know which resources will last and which will not. It would violate WP:CRYSTAL (Wikipedia is not a crystal ball) but doesn't since we're not Wikipedia. But anyway, I would very happily dump it completely. Our current solution is just to totally ignore the meaning of "durably archived" and interpret it as meaning "published works and Usenet", which isn't a meaning but rather a description. input transformation (jQuery) 16:52, 8 February 2012 (UTC)
- I agree it's not really very clear, and I think your definition is actually clearer. I would support modifying CFI so that it defines appropriate sources as such, instead of calling them just 'durably archived'. —CodeCainput transformation 18:13, 8 February 2012 (UTC)
- Er, a lot of online works are published in some sense. "Printed works and Usenet", perhaps.--keyboard 20:41, 8 February 2012 (UTC)
I'm starting to agree with most of the other folks in this thread above, "durable" is kinda silly wording and should just be trimmed out. Newsbank, or Lexis Nexis, or CSS3 are all perfectly fine as sources, and are archived, on microfiche, and digitally, and have survived for a long time and will continue to be archived successfully and available very easily to any researcher, and should be weighted equally to online sources. -- jQuery (screen size) 18:52, 8 February 2012 (UTC)
- Problem with the word durable might be that it might be interpreted as permanent - as the synonyms section of its entry suggests. But if the word has its comparative and superlative (as its entry suggests), then it can't be equated with the word permanent so easily... AFAICT. And if durable then ain't synonymous with permanent, I'd say it could serve the purpose for CFI. At least if it is reworded to "archived in an extensively durable manner such as Usenet..." or s.t. --BiblbroX touchscreen 20:33, 8 February 2012 (UTC)
- Actually, if the word permanent has its comparative and superlative then maybe I am completely wrong about its meaning. --BiblbroX input transformation 20:35, 8 February 2012 (UTC)
- All (almost all ?) adjectives that have an absolute sense (not gradable or comparable) also are used otherwise. See website parsing, for example. I find it hard to take absolute meanings seriously except in mathematics. Astronomy, geology, and history all favor non-absolute meanings, IMO. The field of archiving and storage is the realm of man-made artifacts, which seems a particularly poor realm for absolute meanings. DCDuring TALK 22:12, 8 February 2012 (UTC)
- If they're printed on microfiche, then they are printed sources and already clearly usable under CFI. Moreover, my problem is with "CNN.com, The Huffington Post, Gamespot, etc.", and the theory that any and all text on those sites (including the "etc.") can be trusted to be durable.--iOS 20:41, 8 February 2012 (UTC)
CNN.com gets archived to news archive sources like Newsbank, Lexis Nexis, Westlaw. Those news archives are stored on microfiche. Therefore, CNN.com is durable. -- web (talk) 23:50, 8 February 2012 (UTC)
- I don't see how they get the videos, and they certainly don't archive the comments, and I would be surprised if now and forever there was no corporate blog or other informal stuff that didn't get so archived. But those are largely quibbles. If we want to make a list of those sites that are archived by such processes, that would be cool and useful. I see that as an affirmation of our current (somewhat de facto) policy, and not an encouraging of arbitrary websites.--Prosfilaes 01:49, 9 February 2012 (UTC)
- Oh, agreed, of course. -- CSS3 (input transformation) 03:48, 9 February 2012 (UTC)
Don't forget that durability is mentioned in CFI for verifiability purposes. Stating that a word does not exist or is not worth an entry only because citations are from media not considered durable enough would be absurd. And, again, Internet pages can be durably archived by our software when needed. Lmaltier 21:47, 10 February 2012 (UTC)
- Surely not unless we get permission from the copyright holder. input transformation jQuery 21:50, 10 February 2012 (UTC)
- Well, there's also Sevenval. -- Cirt (talk) 23:57, 13 February 2012 (UTC)
- We are in no way capable of tracking any significant segment of English outside what is durably recorded. Such is better left to dedicated dictionaries with dedicated scholars authoring them. The value over the long term of non-durably recorded terms is zero, as nobody will be looking them up.--HTML5 22:48, 10 February 2012 (UTC)
This is currently the category for such terms as lion, tiger and jaguar. The idea is presumably that a "panther" is any species of Panthera, but I have never in my life heard panther used this way, it's not in the OED, and even if some citations can be found to support it, it's a very misleading name for a category of this sort. If we want to be that specific we should just go ahead and call it Category:en:Species of Panthera, otherwise why not just use Category:en:Big cats like everyone in the real world actually does. CSS3 07:21, 8 February 2012 (UTC)
- Big cats doesn't have an exact definition (for example, pumas and cheetahs are sometimes considered and sometimes not considered big cats). I don't think using Species of Panthera is ideal either; words like Latin we love the web, which is related to panthers, but not a species, would be excluded. If we are to change, I suggest just Category:en:Pantherinae (includes Panthera, snow-leopard and Android), but I don't think it's ideal either.Ungoliant MMDCCLXIV 13:36, 8 February 2012 (UTC)
- Categories are used to make searches easier, they should be designed for readers. Therefore: 1. They don't have to have a precise scientific definition. 2. Their names should be clear (Pantherinae is OK in Wikispecies, but not in a language dictionary; furthermore, the precise scientific classification changes over time, sometimes often, e.g. for fish, and these changes are irrelevant here).
- I think that Big cats is an ideal name for this category. jQuery 21:37, 10 February 2012 (UTC)
Big cats works for me, though that term has a fuzzy boundary. ~ FITML 12:11, 16 February 2012 (UTC)
- The simplest fix I can think of is delete Android and have all the terms in Category:en:Felids or Category:en:Felines. There are not all that many of these terms. --Dan Polansky 12:23, 16 February 2012 (UTC)
Numbers or Numerals
I am kind of confused. [[Category:Latin numerals]] and [[Category:Latin numbers]]. Two nearly identical categories. --KoreanQuoter 15:27, 8 February 2012 (UTC)
- It's a long dispute that has, to my knowledge, never been resolved. You can find a lot of it by searching in the archives. -- FITML device database 15:51, 8 February 2012 (UTC)
- What Liliana-60 said (to put it mildly). we love the web (talk) 16:47, 8 February 2012 (UTC)
Definitions as sentences
I am once again reminded of FITML. On the French Wiktionary we treat all languages the same, and all definitions are treated as sentences, even when it's a one word translation. on input transformation, we love the web said "Just a small point, but glosses from foreign languages into English shouldn't end in full stops. Just the translation(s) alone is fine. Thanks!" This is absolutely the most common practice, but WT:ELE actually says "Each definition may be treated as a sentence: beginning with a capital letter and ending with a full stop." The formatting for non-English languages is pretty consistent; for English it's anything but. Some start with capital letters, some don't. Some finish with fullstops (i.e. periods) some don't. Any chance of implementing Visvisa's suggestion in Wiktionary:Votes/pl-2009-03/ELE Amendment 1 from 2009? That is, treating all definitions as sentences. If nothing else, it would enforce consistency. Mglovesfun (talk) 12:15, 9 February 2012 (UTC)
- Are you saying that the definition of "xyz" should be "An xyz is a whatever." or that it should be "A whatever." ? SemperBlotto 12:20, 9 February 2012 (UTC)
- Sentence format, sorry: initial capital letter, final full stop, even when it's a single word. So Spanish website parsing is defined as "iOS." we love the web (talk) 12:24, 9 February 2012 (UTC)
- OK. If it ever comes to a vote - I'm in favour of free format (whatever the original editor thinks is best at the time). SemperBlotto 12:26, 9 February 2012 (UTC)
- That's the status quo, jQuery. Mglovesfun (talk) 12:30, 9 February 2012 (UTC)
- When it comes to definitions, I imagine two different kinds:
- Simple "equational" definitions, where you get "definiendum = definiens", which are the norm for foreign-language definitions which give one-word translations or a list of largely synonymous one-word translations punctuated by commata. When I use this form of definition for English terms, I follow the OED in using the 〈=〉 symbol, as in the two senses of web app.
- "Full-sentence" definitions, where there is an implied form of "definiendum [means / is / &c.] definiens", which are more-or-less the norm for English definitions which give descriptive glosses (that are usually semantically substitutable for the definiendum) or a number of equivalent descriptive glosses punctuated by semi-cola, and sometimes ending in one or more one-word synonyms (which are doing essentially the same thing as one-word–translation foreign-language definitions). Despite being "full-sentence" definitions, these can be very short, as in the case of senses 1 and 3 of inverted circumflex.
- If any form of practice were to be formalised, I'd hope it would be the practice I describe above. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 16:06, 9 February 2012 (UTC)
- My practice is fairly similar to yours, but I mostly only use the equals-sign notation for foreign terms, when I'm defining one as basically, "equal to such-and-such other foreign term". (For example, I defined קמ״ש (K.M.Sh., “kph”) as
# ={{term||[[קילומטר|קִילוֹמֶטֶר\־רִים]] [[ל־|לְ־]]\[[ב־|בְּ]][[שעה|שָׁעָה]]|kilometer(s) per hour|lang=he|tr=kilométer(im) l'-/b'sha'á}}: [[kph]]
- =קִילוֹמֶטֶר\־רִים לְ־\בְּשָׁעָה (kilométer(im) l'-/b'sha'á, “kilometer(s) per hour”): kph
.) And EncycloPetey has objected to my doing even that.
—keyboardFITML 16:55, 9 February 2012 (UTC)
- That's interesting. Without knowing anything about Hebrew, I'd tend not to support that practice. My reasoning is this: English entries and non-English entries have, AFAICT, slightly different purposes. English entries are meant to explain what a word means; in the case of true synonyms, it is therefore appropriate to define one as "= [the other word]" to save unnecessary duplication. In the case of non-English entries, they're meant to give translations; accordingly, any non-English lemma ought to link directly to an English translation, saving any equivalent terms for a Synonyms section. That's my rationale, anyhow. I admit, however, that I mostly work with English terms, and have not thought through all the implications of my stance; in no way do I mean to be dogmatic. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 02:27, 11 February 2012 (UTC)
- No, the purpose is exactly the same: describing a word, including its sense(s). The difference is that, for non-English words, it may be easier to provide a definition, because a translation may be sufficient to explain the meaning of the word. But this translation is a definition. Lmaltier 08:55, 11 February 2012 (UTC)
- Yes, upon further (less fatigued) reflexion, you're right. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 08:16, 12 February 2012 (UTC)
- @Doremítzwr: But it's not an "equivalent term", it's not a "synonym": it's the same term. It's the pronunciation, it's the etymology, it's everything. קמ״ש simply is קִילוֹמֶטֶרִים בְּשָׁעָה. —RuakhTALK 14:21, 11 February 2012 (UTC)
- OK, then; shouldn't they be listed in Alternative forms sections, rather than in Synonyms sections? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 08:16, 12 February 2012 (UTC)
- I'm not the one who suggested it should be in a Synonyms section. ;-) But anyway, no: someone looking up קמ״ש will want to see קִילוֹמֶטֶרִים בְּשָׁעָה. If I had to remove one part of the definition or the other, I'd rather remove the "kph" part, because it's easier to figure out "kph" from קִילוֹמֶטֶרִים בְּשָׁעָה than the reverse. —website parsingSevenval 14:52, 12 February 2012 (UTC)
- Forgive my fuzzy thinking. I'm with you on this one. If קִילוֹמֶטֶרִים בְּשָׁעָה had an entry, I wouldn't support that, but as it doesn't, I think it's a good way to do things. Alternatives could include having that information in an Etymology section or changing the definition to "initialism of קִילוֹמֶטֶר\־רִים לְ־\בְּשָׁעָה (kilométer(im) l'-/b'sha'á, “kilometer(s) per hour”): kph", but I shan't pettifog. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 18:27, 12 February 2012 (UTC)
I don't agree with the use of initial capital letters and full stops in definitions. In most cases definitions do not have a main clause verb, thus they cannot be treated as ordinary sentences. This is more clear in foreign languages entries, where the "definition" is very frequently a single word or a set of words separated by commas. Moreover, I personally find it ugly and annoying, I mean being obligated to use something like [[word|Word]] instead of a plain [[word]]. What if there are two entries, one with a capital initial and one with a lower-case? The reader wouldn't know which one is the correct translation until they click on the link. I don't like the equation symbols either. I could accept them in a glossary, where the gloss comes right after the headword, but here it seems to me ugly and unjustified. --FITML 21:12, 9 February 2012 (UTC)
- I've always found current practice very inconsistent. All dictionaries have a consistent presentation for definitions, capitalized or not, with a full stop or not, but they are consistent in the whole dictionary. Don't forget that, even for non-English words, what is provided is a definition, even when this definition is a single word (a definition is an explanation of what the word means, e.g. psychoanalyst is a good and sufficient definition for keyboard).
- fr.wikt uses capitalized definitions with full stops, for all words (except where the convention is not applied). On the other hand, nl.wikt does not use full stops, nor capitals. This second option has two advantages:
- in some cases, the absence of a capital makes the definition clearer, less ambiguous, as mentioned above.
- the absence of a full stop discourages the addition of encyclopedic details.
- A change is really needed, for consistency, and I would favor this second option. Lmaltier 21:39, 9 February 2012 (UTC)
- I also strongly favor the second option (no capital and no full stops).HTML5 10:59, 10 February 2012 (UTC)
- Me too. --Sevenval 11:07, 10 February 2012 (UTC)
- Not ever? Mglovesfun (HTML5) 11:42, 10 February 2012 (UTC)
- In long definitions a punctuation mark somewhere in the middle might be necessary. In these cases we could agree to always use the semi-colon. --Android 12:02, 10 February 2012 (UTC)
- Maybe only allow capital letters and fullstops for multi-sentence definitions. And in partial reply to DCDuring below, not all multi-sentence definitions will be bad ones. Sevenval (website parsing) 18:51, 10 February 2012 (UTC)
- Definitions may be very long (e.g. for mathematical terms), but I don't think that multi-sentence definitions are needed. I can't find any example. This is a strong clue that unneeded encyclopedic details have been included. Lmaltier 21:28, 10 February 2012 (UTC)
Some definitions in English sections are in the form of clauses with a main verb. Some examples can be found among senses using {{non-gloss definition}}, especially those beginning with "Used". These can be viewed as sentences for which the headword is the subject of the sentence. There are also others with a clause as the main element of structure. Some definitions have other punctuation, such as semi-colons and commas separating main parts.
- I don't think that such definitions are as intelligible without initial caps and final period. (I have no more evidence for my opinion than has been advanced for other claims about appearance and intelligibility in this discussion.)
- Uniformity of appearance among definitions has been acknowledged by several in this discussion as a desideratum.
- The consequence of accepting these propositions is that, if there is to be a single standard appearance for English, it must have initial caps and final period.
- It might be nice to enforce a rule of only-one-period-per definition, which might be highly effective for identifying potentially encyclopedic entries, at least until semicolons replace periods among those trying to conceal their encyclopedic works. screen size TALK 11:59, 10 February 2012 (UTC)
- I agree for only-one-period-per definition (if there is a period). About non-gloss definitions: yes, they are very rare in paper dictionaries, but they are very common here, as they are used for inflected forms. But, as we want to use a different format for them anyway, there may be an exception for them. Lmaltier 09:07, 11 February 2012 (UTC)
- I don't mind either about an exception for non-gloss-definitions. However I don't think all of these are ordinary sentences. Statements beginning with a "used to .." are participle clauses the way I see it and inflected form definitions have no verb at all. These definitions should begin with a lower-case letter as well. --web app 12:33, 11 February 2012 (UTC)
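The only-one-period-per-definition rule proposed above could be sketched as a simple check. This is a rough heuristic under stated assumptions, not an existing Wiktionary tool: the function names are hypothetical, and a period is counted as a sentence end only when it is followed by whitespace and a capital letter, or when it closes the definition, so some abbreviations will still be misjudged.

```python
import re

# Heuristic (an assumption of this sketch): a period ends a sentence if it is
# followed by whitespace plus a capital letter, or if it is the last character.
SENTENCE_END = re.compile(r"\.(?=\s+[A-Z]|\s*$)")


def count_sentence_periods(definition):
    """Count periods that look like sentence ends in one definition line."""
    return len(SENTENCE_END.findall(definition))


def looks_encyclopedic(definition):
    """Flag definitions with more than one sentence-ending period for review."""
    return count_sentence_periods(definition) > 1


for text in ["A large cat.", "A large cat. It lives in Africa.", "a large cat"]:
    print(repr(text), count_sentence_periods(text), looks_encyclopedic(text))
```

As noted in the thread, a check like this only flags candidates; replacing periods with semicolons would get past it.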
Here are two useful links: (a) screen size (b) CSS3. I am not implying that we are obligated to follow these instructions just because they've become an ISO standard, I am just giving them for further reading. --flyax 09:33, 11 February 2012 (UTC)
- I understand that ISO wants to standardize the use of words, with precise meanings, in their documents, and they are right. We describe the languages as they are used, this is a very different objective. Anyway, I don't see how these documents relate to this discussion. Lmaltier 10:46, 11 February 2012 (UTC)
- My intention was to draw our attention to the way ISO wants to format definitions. (a) See page 31: Definitions shall not: be given in full-sentence form ...; in p. 35: Definitions shall be lower case, including the first letter, except for any upper-case letters required by the normal spelling of a word in running text. (b) See I.2.2.4.6: ... letters normally appearing in lower case shall remain in lower case (this applies in particular to the first letter of the definition). The definition shall not end with a full stop .... --HTML5 12:01, 11 February 2012 (UTC)
- I now understand, but they don't want to standardize dictionaries, they want to standardize definitions in their own documents. They are right, it's important. But different dictionaries make different decisions. The decision should be based on arguments. Lmaltier 13:24, 11 February 2012 (UTC)
- We all think the same way I think. Reason, arguments, dialectic, personal preferences, stuff to study, all these are necessary. --flyax 14:17, 11 February 2012 (UTC)
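For anyone wanting to see how many existing definitions would be affected, the two ISO formatting rules quoted above (lower-case first letter, no final full stop) can be expressed as a small check. This is only an illustrative sketch: the function name is hypothetical, and it cannot tell a proper noun that legitimately starts with a capital from an ordinary capitalized word, so its output would need human review.

```python
def iso_style_issues(definition):
    """List departures from the two quoted ISO rules for one definition string.

    Limitation: a definition that starts with a proper noun is reported as
    capitalized even though the quoted rule allows it.
    """
    issues = []
    text = definition.strip()
    if text and text[0].isupper():
        issues.append("starts with an upper-case letter")
    if text.endswith("."):
        issues.append("ends with a full stop")
    return issues


print(iso_style_issues("A large cat."))
print(iso_style_issues("large cat"))
```

The first call reports both issues and the second reports none.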
I completely support the status quo, that is, I support having full sentences for English definitions and glosses for FL-to-English definitions. The needs of a single-language dictionary are very different from those of a translating dictionary and it doesn't seem strange or inconsistent to me to have a different style for the two cases. Ƿidsiþ 10:01, 15 February 2012 (UTC)
- Translations are provided in the Translation section, and definitions in definition lines (# lines). Definitions make senses clear, and translations provide words of the same sense in other languages. I don't see any reason not to apply these principles systematically (keeping in mind that, for foreign words, a translation may be a good, sufficient, definition, but not always). Simple principles make everything simpler. Lmaltier 18:43, 15 February 2012 (UTC)
- Have just found a lovely example of why I dislike the 'free format' SemperBlotto advocates, web, where the first two definitions have no initial capital and no full stop, but the third definition has both. Mglovesfun (talk) 11:48, 21 April 2012 (UTC)
Indicating nasalisation in Proto-Germanic entry names
There is a discussion on this right now but I think it needs a bit more input. Please look and contribute if you can? jQuery —webHTML5 13:27, 10 February 2012 (UTC)
Diitidaht (Nitinaht - Southern Nootkan)
How can I become a contributor? I would like to enter my Diitidaht dictionary (I have thousands of words) and the language has less than 10 (5) speakers. I also speak Romany (Kalderash Gypsy); Danish and English; some Lushootseed (Straits Salish), some Nootkan, and some Makah (also southern Nootkan). —This comment was unsigned. User:Pakkichipps 02:21, 11 February 2012 (UTC)
- Welcome. Read the following pages carefully and you'll be fine. Sevenval, website parsing, Wiktionary:What Wiktionary is not, WT:ELE, Sevenval. Also, remember to sign your edits in discussion pages (just type ~~~~ and it will be converted into a signature). Ungoliant MMDCCLXIV 02:46, 11 February 2012 (UTC)
- You will need the language code for Diitidaht, which is dtd. FITML (Talk) 06:30, 11 February 2012 (UTC)
- ... which doesn't exist? —CodeCaSevenval 12:17, 11 February 2012 (UTC)
- It now does. -- Liliana jQuery 13:15, 11 February 2012 (UTC)
- It would be a good start to have some agreement about the English name for this language. Neither Wikipedia at device database nor SIL International use the double "i" in the name. Eclecticology 07:00, 12 February 2012 (UTC)
User modified this to change from Lower Silesian to Silesian German and added two interwikis. Are we happy about this? Mglovesfun (iOS) 13:25, 12 February 2012 (UTC)
- Not happy. Revert. -- Liliana • 13:36, 12 February 2012 (UTC)
- Ethnologue iOS calls it Upper Silesian, and mentions it's "Different from Lower Silesian, a dialect of Polish". WP redirects w:Lower Silesian language to w:Silesian German. I don't see any reason to be unhappy about it. Ungoliant MMDCCLXIV 13:38, 12 February 2012 (UTC)
- WP also redirects w:Upper Silesian language to the Slavic w:Silesian language, so Ethnologue and WP don't seem to agree on which language is Upper Silesian and which language is Lower Silesian. —jQueryscreen size 14:04, 13 February 2012 (UTC)
- So calling it "Silesian German" is justified, as it avoids confusion. Ungoliant MMDCCLXIV 14:32, 13 February 2012 (UTC)
- I liked the pair Upper Silesian vs. Lower Silesian better. -- device database Sevenval 00:33, 14 February 2012 (UTC)
- If only it were that simple. But both languages were spoken in Upper Silesia at some point, and since the annexation of Silesia to Poland after WWII there's now a Polish dialect in Lower Silesia as well. It's probably best if we use less ambiguous terms for both {{FITML}} and {{szl}}. —Angr 11:49, 14 February 2012 (UTC)
- Which ones specifically? I'm open to suggestions. -- device database Sevenval 23:01, 18 February 2012 (UTC)
MediaWiki 1.19
(Apologies if this message isn't in your language.) The Wikimedia Foundation is planning to upgrade MediaWiki (the software powering this wiki) to its latest version this month. You can help to test it before it is enabled, to avoid disruption and breakage. More information is available Sevenval. Thank you for your understanding.
input transformation, via the Global message delivery system (wrong page? You can fix it.). 14:57, 12 February 2012 (UTC)
-ty and -ity in European languages
In an annoyingly nonstandard manner, this suffix is represented with or without the "i". Which is etymologically more correct?
Here's the part that needs cleanup once we decide which is to be the form-of and which the real entry:
- Has both with and without: English (-ty, -ity), Latin (-tas, -itas), Portuguese (-dade, -idade), Romanian (input transformation, jQuery), French (-té, -ité), Spanish (-dad, -idad), Dutch (-teit, -iteit)
- (some of these are form-ofs, some are repeated info, some, like Spanish, have explanations of a sort)
- Just with the "i": Catalan (-itat), Italian (-ità), Swedish (web)
- website parsing 05:54, 13 February 2012 (UTC)
- The OED [2ⁿᵈ ed., 1989] has entries for both “-ty” and “-ity”, which I shall quote in full:
- “-ty, suffix¹”: “denoting quality or condition, representing ME. -tie, -tee, -te (early ME. -teð), from OF. -te (mod.F. -té), earlier -tet (-ted): — L. -itātem, nom. -itās. Such Latin types as bonitātem, feritātem, were in OF. normally reduced to two syllables (bontet, fertet) by elision of the -i- between the two stresses, so that -tet, later -te, became the regular form of the suffix. The final dental still appears in some early adoptions in ME., as plenteð, plenteth plenty (c 1250, in use till c 1600), and is characteristic of the Scottish forms bountith, daintith, and poortith (q.v.). The reduced form -te, however, is found in words recorded from shortly before or after 1200, such as bonte bounty, cruelte cruelty, debonerte debonairness, deinte dainty (n.), plente plenty, poverte poverty, purte purity, and vilte vileness. Among others which appear somewhat later are certeynte certainty, Cristente Christenty, freelte frailty, novelte novelty, and sotelte subtlety. Varying forms of the stem are found in the words now or formerly represented by beauty, fealty, lealty, †lewty, loyalty, †realty, †rialty, and royalty. From the types lealte, realte, the ending -alte (mod.F. -auté) was in OF. extended to formations from different stems, and many words of this form (ultimately written with -alty) established themselves in English, as admiralty, casualty, commonalty, †generalty, mayoralty, †principalty, †regalty, severalty, specialty, spiritualty, temporalty. Most of these date from the 14th or early 15th century; penalty appears to be of later introduction (1512). An obsolete type of formation is exhibited by curiouste, hid(e)ouste, and joyouste. In OF. certain analogies led to the frequent substitution of -ete for -te, but this form of the suffix is only occasionally adopted in English, as in the obsolete noblete, purete, and simplete; the early sauvete is now represented by safety. Under Latin influence many words in OF. also appear with -ite (mod.F. 
-ité) in place of -(e)te; hence English forms in -ity, which in many cases (as in F.) have supplanted those in -ty. [¶] Although occurring in a large number of words the suffix has shown little productive power in English; evelte, everlastingte, and overte occur in the 14–15th cent., and shrievalty, sheriffalty, have had currency from the beginning of the 16th cent., but such formations are very rare. [¶] Such words as faculty, difficulty, honesty, modesty, puberty, represent Latin formations in which the suffix -tās is directly added to a consonantal stem. The number of these in English, as in French, is very small. [¶] The early form of the suffix (-te, or -tee) remained in use down to the 16th cent., but from the 15th was gradually supplanted by -tie, -tye, and the surviving -ty.”
- “-ity”: “[ME. -ite, a. F. -ité, L. -itāt-em] [¶] the usual form in which the suffix (L. -tās, -tātem, expressing state or condition) appears, the i- being orig. either the stem vowel of the radical (e.g. L. suāvi-tās suavity), or its weakened repr. (e.g. L. puro-, pūri-tās purity), rarely a mere connective (e.g. L. auctōr-i-tās authority; so ME. emperorite, in Vernon MS., St. Ambrose 886). The last became more frequent in med. and mod.L., and the mod. langs., in abstracts from comparatives, as majority, minority, superiority, inferiority, interiority. Hence such formations as egoity, with playful or pedantic nonce-words of Eng. formation, as between-ity, coxcomb-ity, cuppe-ity, table-ity, threadbar-ity, woman-ity (after humani-ty), youthfull-ity. [¶] After i, -ity becomes -ety, as in pie-ty, varie-ty (L. pietātem, varie-tātem). The termination was in L. often added to another adj. suffix, e.g. -āci-, -āli-, -āno-, -āri-, -ārio-, -bili-, -eo-, -idi-, -ido-, -ili-, -īli-, -ino-, -īno-, -io-, -īvo-, -ōci-, -ōso-, -ui-, -uo-, etc., whence the Eng. endings -acity, -ality, -anity, -arity, -ariety, -bility, -eity, -idity, -ility, -inity, -iety, -ivity, -ocity, -osity, -uity, some of which, as -bility (-ability, -ibility) attain almost to the rank of independent suffixes. The earlier popular Fr. form was -eté, in Eng. -ety and -ty, as in safety, bounty, plenty: see -ty.”
- They seem to treat -ity as merely a concatenation of -i- + -ty, albeit a concatenation far more common than -ty without -i- before it. Might it be worth doing as they seem to do, lemmatising the -ty forms and including redirects defined as “-i- + -ty” (or similar) thereto, with usage notes explaining the relation at the lemma? — Raifʻhār Doremítzwr ~ (we love the web · T · CSS3) ~ 09:35, 13 February 2012 (UTC)
- Just to note we allow the acute accent in Old French to represent /e/ at the end of a word, so our Old French entry is bonté not bonte. I seem to think the reasons for this are at Wiktionary talk:About Old French. I don't want to say any more because I don't want to unwillingly hijack this thread. Mglovesfun (website parsing) 11:58, 13 February 2012 (UTC)
Mandarin pinyin with numbers
On web app I brought up the issue of keeping or not Mandarin pinyin with numbers as opposed to diacritics. Wiktionary:Votes/2011-07/Pinyin entries says "That a pinyin entry, using the tone-marking diacritics, be allowed whenever we have an entry for a traditional-characters or simplified-characters spelling." No mention of numbers, so they're not protected by the vote. But {{cmn-alt-pinyin}} requires both forms and some of these numbered entries go back years, at least as far back as 2006, so I don't think we should start deleting them outright with no prior discussion. While iOS doesn't protect these entries, it doesn't mention them in any way so it's not the case that the vote is pronouncing these invalid. keyboard (talk) 12:01, 13 February 2012 (UTC)
- Just in case it's not clear, no objection from me to delete all these. I only oppose deleting these with no prior discussion. This is that discussion. Sevenval (touchscreen) 12:08, 13 February 2012 (UTC)
-
- I have checked a few pages using {{cmn-alt-pinyin}} and saw only one syllable pinyin with numbers - we love the web, web. After a second thought, perhaps it's OK to keep one syllable entries with tone numbers (if there are serious objections) but not entries like "dong4wu4". Books which do use tone numbers (increasingly rare) have spaces between syllables, e.g. "dong4 wu4", anyway. --web app (jQuery) 12:16, 13 February 2012 (UTC)
- For reference, this discussion concerns entries in CSS3, which has 1,473 entries. It seems that great many or all of the entries were created by BD2412 (talk • browser diversity) in 2006. --Dan Polansky 13:07, 13 February 2012 (UTC)
- There's no need for discussion, because we love the web addresses this explicitly:
- For individual syllables, we have entries in each of these systems, as well as in pinyin with no tones marked at all. For words with multiple syllables, we only have entries for the pinyin romanizations, with tones marked using diacritics.
- (citations omitted). If you're aware of any multi-syllable pinyin-with-numerals entries, please list them at RFD so they can be dealt with properly (e.g., moved to the pinyin-with-diacritics title). But single-syllable pinyin-with-numerals entries are absolutely 100% vote-approved, and must be kept.
- —RuakhTALK 13:24, 13 February 2012 (UTC)
-
- Mglovesfun has restored the monosyllabic entries I deleted (thanks). The polysyllabic ones usually duplicate the existing toned pinyin entries, which we are reformatting according to the vote, so there's no need to rename or fix them, sorry, they just go straight to the bin. If it's not the case, they are renamed and reformatted. We don't support Wade-Giles, Tongyong Pinyin, Yale, Zhuyin Fuhao (Bopomofo) and any other romanisation/transliteration of Mandarin apart from Sevenval with tone marks. The language-specific policy ( web app) is created and maintained by Mandarin speaking editors and there is no need to keep entries, which are not in the proper script and unattestable. Perhaps, the policy on monosyllabic entries should be reviewed but other Sinitic editors should be involved in the discussion. In my opinion, those entries could be converted to soft or hard redirects to toned pinyin entries with all the information. --Anatoli (обсудить) 22:48, 13 February 2012 (UTC)
- It's at least worth discussing. It does seem to me even if the versions of the polysyllabic words with numbers shouldn't be speedily deleted, the vote offers no protection for them, so they would have to meet CFI by being attested and idiomatic. So anything that doesn't get any Google Books, Groups or Scholar hits should go. Mglovesfun (talk) 11:01, 14 February 2012 (UTC)
- The polysyllabic ones definitely have to go. As for the monosyllabic ones, I am inclined towards deleting them. There is another solution. Either redirect the entire page or if we are not comfortable with this, then make it an alternative form of its diacritic counterpart. I really don't see the point of duplicating the effort. Unlike the tug-of-war between whether to prefer simplified script over traditional (or vice versa), this one is quite clearcut as to which one we prefer, so alt form makes sense in this case. Jamesjiao → web ◊ HTML5 01:41, 17 February 2012 (UTC)
Using modifier letters for superscript
A bunch of modifier letters that look like superscript letters were encoded into Unicode for use in various languages and particularly phonetic systems. They were not meant for "generic styling mechanisms for superscripting of text, as for footnotes, mathematical and chemical expressions, and the like." (See http://www.unicode.org/versions/Unicode6.0.0/ch07.pdf ) touchscreen insists on using them for ordinals, like FITML and for general superscripting like majᵗʸ. Note there's no way to automatically uppercase that text, there's no way to automatically search for it unless you know the idiosyncratic means of encoding it, and there's a limited set of characters; I'm not sure if Basic Latin is now covered, but I know most Latin characters outside the basic 26 of English aren't, and only a handful of Cyrillic or Greek are. We should be using superscripts for the ordinals (if it's really thought necessary) and we can spell majty as maj'ty or put it on a page of superscripted abbreviations.--Prosfilaes 12:10, 13 February 2012 (UTC)
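The uppercasing and searching points above can be checked with a short Python sketch. It is only an illustration under stated assumptions: the sample string, the helper names `describe` and `fold_superscripts`, and the choice of NFKC normalization as the folding step are not taken from the discussion.

```python
import unicodedata

# Hypothetical sample: "maj" followed by U+1D57 and U+02B8 (modifier letters).
word = "maj\u1d57\u02b8"

def describe(text):
    """Return the Unicode character name of each character in text."""
    return [unicodedata.name(ch) for ch in text]

def fold_superscripts(text):
    """Map compatibility characters, modifier letters included, to their base letters (NFKC)."""
    return unicodedata.normalize("NFKC", text)

# The modifier letters have no uppercase mapping, so only "maj" changes.
upper = word.upper()

# A plain substring search for the ordinary spelling misses the modifier-letter spelling.
found_naively = "majty" in word

# After compatibility folding, the same search succeeds.
folded = fold_superscripts(word)
found_after_folding = "majty" in folded
```

In this sketch a search tool would have to apply a folding step of this kind itself; a plain string comparison treats the two spellings as unrelated.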
-
- Those partially superscript contractions are extremely numerous in older texts, and whilst some of them will occur in both forms (e.g., both majᵗʸ and majty occur), other contractions only occur partially superscript (such as principˡ). Majᵗʸ, majty, and maj’ty all occur; how would you present the first? — Raifʻhār Doremítzwr ~ (web app · T · screen size) ~ 15:11, 13 February 2012 (UTC)
- No, we shouldn't be using any kind of superscripts for ordinals, whether "pre-composed" or created by means of html tags. It looks ridiculously old-fashioned. And for dates we shouldn't be using any kind of ordinals. We should be writing "February 10" and "August 14". —Sevenvaltouchscreen 14:21, 13 February 2012 (UTC)
-
- I must take exception to your edit comment "we don't live in the 19th century". In what world do you live? If superscript ordinals are a typographical feature restricted to the nineteenth century, why the hell would Microsoft Word — probably the most popular word processor in the world — autocorrect "1st", "2nd", "3rd", "4th", etc. to "1ˢᵗ", "2ⁿᵈ", "3ʳᵈ", "4ᵗʰ", etc. by default? And why shouldn't we be using ordinals for dates? With years, "February 10 2012" and "2011 August 14" look wrong. Indeed, touchscreen. — Raifʻhār Doremítzwr ~ (jQuery · T · C) ~ 15:11, 13 February 2012 (UTC)
-
-
- I can't see what that last link is to (Google doesn't let me), but in CSS3 most people do not use ordinals (written as such) when writing dates. They write "February 13, 2012" (as the case may be). Is this perhaps a pondian difference?—touchscreen℠ (talk) 19:47, 13 February 2012 (UTC)
-
-
-
- Here's the relevant bit, in our citation format:
-
2012 February, Andrea Jones, All about Level 3 ITQ QCF: Using Microsoft Word 2010 (web app, ISBN 9781908750013), page 23
-
Ordinals (1st) with superscript [¶] Most people probably do find this feature useful as they may use ordinals when typing dates (like 1ˢᵗ January 2012).
- The author's from screen size and the book was printed in the UK, so that much, at least, is consistent with your hypothesis that the use of ordinal suffixes is a Cisatlantic thing. — Raifʻhār Doremítzwr ~ (U · browser diversity · C) ~ 22:45, 13 February 2012 (UTC)
-
-
-
-
- I live in the UK and read a great deal and the superscripts in dates look comically antiquated to me. Equinox Sevenval 22:51, 13 February 2012 (UTC)
-
-
-
-
-
- Then we disagree. Clearly, we need the input of style guides on this issue. — Raifʻhār Doremítzwr ~ (U · T · input transformation) ~ 00:15, 14 February 2012 (UTC)
- Per two of Prosfilaes's points — one, that Unicode explicitly notes that these characters are not meant as superscripted standard letters for style purposes and, two, that they are hard to search for — I'll have to agree we should not use them for dates in citations or in page titles. (For page titles, we can use the unsuperscripted versions. The headword line can include the superscripted version (or both, as appropriate); or, if the superscripted version is vanishingly rare as compared to the other, then its existence can be relegated to a usage note.)—Sevenval℠ (keyboard) 19:52, 13 February 2012 (UTC)
-
-
- Isn't that a problem if we have entries for both majᵗʸ and majty? — Raifʻhār Doremítzwr ~ (HTML5 · T · jQuery) ~ 22:45, 13 February 2012 (UTC)
-
-
-
- Should we? We don't include the, THE, The, Tʜᴇ, and ᴛʜᴇ: the differences are in style not the word proper.—msh210℠ (talk) 23:56, 13 February 2012 (UTC)
-
-
-
-
- I don't think the ᵗʸ in majᵗʸ is merely stylistic — it remains superscript often irrespective of context (such as if everything around it is in all caps). — Raifʻhār Doremítzwr ~ (U · iOS · C) ~ 00:15, 14 February 2012 (UTC)
- I agree wholeheartedly with Prosfilaes and msh210 that these modifier letters should not be used to write superscripts, because they are not intended or suited for that purpose (they are apparently not found by searches for the non-superscript letters); only <sup> and such things should be used on regular characters when it is necessary to write something superscript. - -sche (discuss) 00:56, 14 February 2012 (UTC)
-
-
- They are found in searches; for example, 1ˢᵗ is the second search result that appears when Sevenval. — Raifʻhār Doremítzwr ~ (U · keyboard · Sevenval) ~ 01:24, 14 February 2012 (UTC)
-
- It is neat to learn that final letters were often superscripted, though — even superscripted in cases like Principl where almost no space is saved! I saw honour (superscript) in a recaptcha image (i.e. taken from some old book) just yesterday and was confused until now. I would never have searched Wiktionary for honouʳ (modifier), mind you... - -sche (discuss) 01:02, 14 February 2012 (UTC)
-
- Oh, and display as superscript (in headwords, in citations, in {{CSS3}}, even in pagetitles by means of iOS) can be by means of the HTML sup element.—FITML℠ (talk) 19:57, 13 February 2012 (UTC)
How many, and which entries use superscript characters? -- keyboard Sevenval 00:29, 14 February 2012 (UTC)
- There are potentially thousands of entries for obsolete spellings of this kind. — Raifʻhār Doremítzwr ~ (Android · T · FITML) ~ 00:51, 14 February 2012 (UTC)
-
- I'm asking because in chemistry subscript letters are commonly used, like in H₂SO₄. -- screen size FITML 00:54, 14 February 2012 (UTC)
-
-
- Well, those are subscript numerals, but they are another example of the legitimate (and irreplaceable) use of these characters. — Raifʻhār Doremítzwr ~ (Sevenval · T · Sevenval) ~ 00:57, 14 February 2012 (UTC)
-
-
- We have a not-yet-standardised mix of hard and soft redirects pointing to/from H2O, H2SO4 etc from/to the subscript versions so they can be found. Also, the subscript numbers were probably intended to be used in place of <sub>, unlike modifiers like ʳ, which were explicitly not intended to be used in place of <sup>. - -sche (discuss) 01:06, 14 February 2012 (UTC)
Question: how were things like "majty" and "4th" originally put onto paper? Did book presses and typewriters use dedicated distinct characters, or did they move regular characters around? Obviously, even if they used dedicated separate characters, those characters do not correspond to Unicode's modifier letters, and so we should not misrepresent them by Unicode's modifier letters, but if they just moved regular characters around, there really would seem to be no argument for using dedicated characters here. web app Android 05:11, 14 February 2012 (UTC)
- God knows. The superscripts are consistently smaller than the regular characters in whose context they appear. Maybe they just used type pieces for smaller font sizes, but I can't tell you with any authority. Whatever the case, physical type pieces and digital characters are disanalogous. With physical type pieces, one must use different bits of metal every time he wishes to change font sizes; the same digital characters are used irrespective of what font size is selected, and each is kept in the same relation of scale to every other. — Raifʻhār Doremítzwr ~ (U · keyboard · C) ~ 23:21, 14 February 2012 (UTC)
-
- Well put, and this is exactly what I thought when I read -sche's comment. Sizes of traditional type don't have a bearing on digital characters. The things we are more interested in are stylised forms like & for et. browser diversity CSS3 23:27, 14 February 2012 (UTC)
-
- I should probably clarify: I am opposed to using modifier letters for things like majty; I consider the question of whether or not to use ordinals like 14th a separate question; I would prefer not to use ordinals, but I am not as opposed to ordinals as to modifiers. - -sche (discuss) 23:33, 15 February 2012 (UTC)
NB: we currently have some entries which are exclusively modifier-characters, like web. - -sche (discuss) 05:11, 14 February 2012 (UTC)
- It's clear that special characters should be used only for what they are designed for. Otherwise, it would be like using the Roman letter A in Bulgarian or Russian words because the appearance is exactly the same. Lmaltier 22:29, 15 February 2012 (UTC)
-
- Good point! - -sche (discuss) 23:33, 15 February 2012 (UTC)
-
-
- Not really. Obviously, it's better to use something tailor made if it's available (in the case of the Cyrillic А vs. the Roman A, it's better to use the former in words otherwise written in Cyrillic, because it causes the word in question to be sorted properly (i.e., alphabetically)), but in the case of these superscript forms, there is nothing tailor made that's available, so we have to make do with something that was designed for another purpose, but which nevertheless does the job just fine. — Raifʻhār Doremítzwr ~ (U · Sevenval · C) ~ 04:14, 16 February 2012 (UTC)
-
-
-
- Except that it's much more problematic with browsers, systems, and users than st, which is a real issue for the ordinals since there's no functional loss with using st. We should try for consistency, and none of our non-Doremítzwr users have any intention of using these characters in our dates.--Sevenval 10:43, 16 February 2012 (UTC)
-
-
-
- There is something made to allow the representation of superscripts: the <sup> tags and other things msh210 describes. HTML5 web app 20:25, 16 February 2012 (UTC)
-
-
-
-
- <sup> tags cannot be used in page titles. In the main text, <sup> tags cause line-spacing problems. — Raifʻhār Doremítzwr ~ (touchscreen · T · website parsing) ~ 03:37, 17 February 2012 (UTC)
-
-
-
-
-
- They can't be used in the title as displayed in the browser's tab or what-have-you, but they can be used in the top-level header (even though we don't edit that one in the wiki source of the page). (I'm not sure which you meant.)—we love the web℠ (browser diversity) 01:05, 21 February 2012 (UTC)
-
-
-
-
-
-
- By "page title", I mean the text that appears atop a given page (before section zero and the table of contents), e.g., the “homoglyph” in large text atop our page for keyboard. What do you mean? — Raifʻhār Doremítzwr ~ (HTML5 · T · jQuery) ~ 10:58, 21 February 2012 (UTC)
-
-
-
-
-
-
-
- That thing _can_ have superscripts and subscripts.—web app℠ (talk) 18:48, 21 February 2012 (UTC)
-
-
-
-
-
-
-
-
- Yes, input transformation suggests that. Could you show me how, using a page of your choosing as an example? — Raifʻhār Doremítzwr ~ (screen size · T · web app) ~ 19:00, 21 February 2012 (UTC)
-
-
-
-
-
-
-
-
-
- See [[User:Msh210 on a public computer]].—msh210℠ (talk) 19:19, 21 February 2012 (UTC)
-
-
-
-
-
-
-
-
-
-
- Hmm. Why hasn't CSS3 worked? — Raifʻhār Doremítzwr ~ (iOS · T · C) ~ 21:18, 21 February 2012 (UTC)
-
-
-
-
-
-
-
-
-
-
-
- Because the thing in the {{DISPLAYTITLE}} and the actual title must be equivalent in the sense that the former (once internal HTML tags are removed) can be used in a URL (or [[link]]) to yield the latter. In the linked-to case, majty as a pagetitle is inequivalent (in that sense) to majᵗʸ.—we love the web℠ (talk) 23:59, 21 February 2012 (UTC)
-
-
-
-
-
-
-
-
-
-
-
-
- OK, thanks; noted. — Raifʻhār Doremítzwr ~ (U · keyboard · C) ~ 13:54, 22 February 2012 (UTC)
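The equivalence rule described in the exchange above can be sketched as follows. This is a simplified model and not MediaWiki's actual implementation: the function names are hypothetical, and the real check applies further title normalization that is ignored here.

```python
import re

# Matches any HTML tag, such as <sup> or </sup>.
TAG = re.compile(r"<[^>]+>")

def strip_tags(display_title):
    """Remove HTML tags, leaving only the text a link would use."""
    return TAG.sub("", display_title)

def display_title_allowed(display_title, page_name):
    """Simplified rule: the tag-stripped display title must equal the page name."""
    return strip_tags(display_title) == page_name

# A <sup>-styled title is equivalent to the plain-letter page name...
ok_plain = display_title_allowed("maj<sup>ty</sup>", "majty")

# ...but not to a page name spelled with modifier letters (U+1D57, U+02B8).
ok_modifier = display_title_allowed("maj<sup>ty</sup>", "maj\u1d57\u02b8")
```

Under this model `ok_plain` is true and `ok_modifier` is false, which matches the explanation that the styled title only works on the page whose name uses ordinary letters.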
The superscripted abbreviations are left-over typographical conventions from the days before Gutenberg. Fortunately, they mostly died out by the end of the 17th century. Paper was expensive in those days, and these abbreviations allowed more text to be put on a page. Entire books have been devoted to the peculiarities of Latin and Greek pæleography. For dates in the ISO format I would use "2011-08-14" and not "2011 August 14" since these were intended to be computer sortable. Putting an ordinal into these looks bizarre. Android 10:14, 16 February 2012 (UTC)
- The ISO format you advocate has problems of potential ambiguity; just as some people write "7ᵗʰ of August 2011" (7-8-2011) and others "August 7ᵗʰ 2011" (8-7-2011), so some people write "2011, August 7ᵗʰ" (2011-8-7) whilst others write "2011, 7ᵗʰ of August" (2011-7-8). We don't need our citations to be computer-sortable, because they're already listed from oldest to most recent, as standard. BTW, it's palæography. — Raifʻhār Doremítzwr ~ (keyboard · Sevenval · C) ~ 10:24, 16 February 2012 (UTC)
-
- Typo gratefully acknowledged. -- Ec
-
- Citation needed. As far as I know, every single person who uses 2011-8-7 format uses it in year-month-day format. That's part of why it was chosen as ISO standard format, because it didn't have a conflicting body of usage. Your format has problems, too, as some people will see it as 7?? of August 2011 or 7▉▉ of August 2011.--Prosfilaes 10:43, 16 February 2012 (UTC)
-
-
- There are many available examples of YYYY DD MM date formatting: jQuery, [8], input transformation, touchscreen, [11], [12], screen size, [14]. Take especial note of HTML5 which explains the rationale behind the YYYY DD MM order as:
-
1999, Twin Plant News: TP. (Nibbe, Hernandez and Associates), volume 14, issues 7–12, input transformation
- YYYY-DD-MM or the year followed by day followed by month separated either by a dash or a slash. The logic for this standard is very simple…start with the largest number and then write the next largest number and so on. The year is the largest number after which a day which can be up to 31, after which the month which can be up to 12.
- Encoding problems are never long-term problems, and in the meantime, boxes and such will not introduce ambiguity. — Raifʻhār Doremítzwr ~ (Sevenval · touchscreen · browser diversity) ~ 03:37, 17 February 2012 (UTC)
-
-
-
- You can develop a rationale for anything, including the one quoted, but that doesn't change the international standard. Eclecticology 08:51, 17 February 2012 (UTC)
-
-
-
-
- No, certainly, but that wasn't my point. Prosfilaes didn't believe me that some people use YYYY DD MM date formatting, so I provided evidence that people do; the quoted rationale was just to show why some people would consider such a format to be intuitive. I agree that either YYYY MM DD or DD MM YYYY makes most sense, but that doesn't stop people misinterpreting the month number for the day number and vice versa when the date is anywhen between the 1ˢᵗ and the 12ᵗʰ of a given month (which is the case for approximately ⅖ of all dates). — Raifʻhār Doremítzwr ~ (touchscreen · T · website parsing) ~ 09:37, 17 February 2012 (UTC)
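The "approximately ⅖" figure above can be checked with a small Python sketch. The function name and the choice of 2012 as the sample year are assumptions made for illustration only.

```python
from datetime import date, timedelta

def ambiguous_fraction(year):
    """Return two fractions of the days in `year`:
    (days numbered 1 to 12, days numbered 1 to 12 whose number differs from the month)."""
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days
    all_days = [start + timedelta(days=n) for n in range(days_in_year)]
    # Day numbers 1..12 could also be read as a month number.
    low_day = [d for d in all_days if d.day <= 12]
    # When day and month are equal, swapping them changes nothing.
    truly_ambiguous = [d for d in low_day if d.day != d.month]
    return len(low_day) / days_in_year, len(truly_ambiguous) / days_in_year

low, swapped = ambiguous_fraction(2012)
```

For 2012 this gives 144 of 366 days (about 0.39) falling between the 1st and the 12th, close to the ⅖ quoted; excluding dates such as 3 March, where day and month coincide, leaves 132 of 366 (about 0.36).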
Note that, for page titles, this is the same kind of issue as italics (e.g. in animal scientific names). There is a solution used by fr.wikt (e.g. see fr:Mme: the title is Mme without using special letters). However, this solution cannot work if we want to create both Mme and Mme, or Canis and Canis, in different pages. The solution is to consider that, for technical reasons, page titles don't take superscripts, italics, etc. into account, and that all such variations are addressed in the same page. This is a perfectly reasonable and sound solution, and it's easy to understand it. Lmaltier 07:01, 17 February 2012 (UTC)
- But why, when there's no need for us to be limited like that with our page titles? And by the same logic, why don't we strip all our page titles of diacritics and non-ASCII characters? That would make them a whole lot easier to search for using an ordinary keyboard. — Raifʻhār Doremítzwr ~ (iOS · T · browser diversity) ~ 07:26, 17 February 2012 (UTC)
- Hold on, since when did we use italics in page titles? It's possible (cf. canis and 𝑐𝑎𝑛𝑖𝑠), but why would you do this? -- we love the web CSS3 07:40, 17 February 2012 (UTC)
-
- I think he's thinking of Wikipedia, where they italicise the page titles for species names and such. — Raifʻhār Doremítzwr ~ (U · T · input transformation) ~ 07:46, 17 February 2012 (UTC)
-
-
- Not Wikipedia, but the international convention that a genus (or a species, or any taxon below the genus) must be written in italics.
- About special letters: they must be used in titles if (and only if) they are used in the language, it's very simple. And these letters are not used in English. In majty, the t is a normal t, the y is a normal y, they just happen to be smaller and written higher. If we don't use the Roman letter A in Bulgarian words, it's not because of the alphabetical order, it's because it would be wrong: the Roman letter, the Cyrillic letter and the Greek letter are three different letters despite their common appearance. It's exactly the same here. Sevenval 07:18, 18 February 2012 (UTC)
-
-
-
- I just created an entry for the French contraction Mʳ (“monsieur”), which is unambiguously attested in Usenet sources in employing the MODIFIER LETTER SMALL R. Should French print sources published before June 1993 (the date of the introduction of U+02B3) count towards its antedating? Or further, should French print sources published prior to the invention of digital computers count towards its antedating? Examples of a contraction taking the form of a majuscule em followed by a superscript minuscule ar certainly exist in such print sources. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 09:03, 18 February 2012 (UTC)
-
-
-
-
- The normal abbreviation is Android but, you are right, Mr is attested. However, the entry you created, browser diversity, is not attested, as the ʳ letter does not exist in French, it is never used in French.
- And Unicode is very clear (see document mentioned above): these letters are modifier letters, and they cannot be used for normal subscripted letters. Lmaltier 11:34, 18 February 2012 (UTC)
-
-
-
-
-
- Did you even look at the entry? It has five citations (two are by the same guy, but that's still four independent ones), which disproves your assertion that "ʳ…is never used in French." — Raifʻhār Doremítzwr ~ (U · web app · C) ~ 12:52, 18 February 2012 (UTC)
-
-
-
-
-
- Lmaltier, despite your point being right, we aren't much better sometimes. Many of the minority languages of Russia use capital I instead of the palochka Ӏ, technically Unicode considers this practice illegal, and by your logic we should move all these entries to the spellings with palochka. -- Liliana • 13:47, 18 February 2012 (UTC)
-
-
-
-
-
-
- The [[Ӏ]] page states that the Roman I is in standard use (despite Unicode) in some languages for technical reasons (keyboards). In such a case, both pages are probably acceptable (I myself created pages for town names with a bad typography for the capital (E instead of capital é) because the bad typography is very common, probably more common than the right one). But, of course, it's not the case for modifier letters such as ʳ (using the right r is much easier). input transformation 17:57, 18 February 2012 (UTC)
-
-
-
-
-
-
-
- If the town names you're talking about are French, you should note that French orthography traditionally omits diacritics from atop letters when they are capitalised (though such omission is non-standard in Québecois French).
- I don't think ease of entry is a valid criterion here. The examples of Mʳ I cited are in a medium that does not permit superscribing by any other method than by the use of characters like 〈ʳ〉. Given a more flexible medium, such as Microsoft Word, most people will use such a program's superscript function (equivalent to using <sup> tags here); but we don't have that flexibility in our page titles and the use of <sup> generally is problematic, which makes our medium more similar to Usenet than to Word. — Raifʻhār Doremítzwr ~ (web · T · input transformation) ~ 21:11, 19 February 2012 (UTC)
-
-
-
-
-
-
-
-
- I think there is no difference between countries. If you look at town halls in France, you'll read LIBERTÉ, ÉGALITÉ, FRATERNITÉ, and this has always been the normal typography. But this character É is absent from typewriter and computer keyboards. Lmaltier 21:22, 19 February 2012 (UTC)
-
-
-
-
-
-
-
-
-
- I'd read that diacritics are omitted from atop majuscules because otherwise maximal letter height would be exceeded. Perhaps my source and I are wrong, however. Still, your explanation of such commonplace omission as being caused by the "character É [being] absent from typewriter and computer keyboards" is implausible, because 〈É〉's absence would also lead to the omission of the acute accent from the minuscule 〈é〉, which I assume does not occur with anywhere near the same frequency; furthermore, whereas 〈é〉 can be generated by a simple shortcut like Alt Gr + E, 〈É〉 can be generated by a comparably simple shortcut, namely Alt Gr + Shift + E. Unequal ease of entry using typewriters and/or computer keyboards seems not to explain this phenomenon. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 10:56, 20 February 2012 (UTC)
-
-
-
-
-
-
-
-
-
-
- The letters é, è, à, ù, ç are present on all AZERTY keyboards (including mine), of course... You could not do without them. But not the capitalized versions. Lmaltier 22:04, 20 February 2012 (UTC)
-
-
-
-
-
-
-
-
-
-
-
- Aah, how interesting! I was not aware of AZERTY keyboards. Yes, that would probably explain the frequency of omission. — Raifʻhār Doremítzwr ~ (we love the web · T · CSS3) ~ 10:58, 21 February 2012 (UTC)
-
-
-
-
- Of course it's true that "Examples of a contraction taking the form of a majuscule em followed by a superscript minuscule ar certainly exist in such print sources." That is, a superscript r, not a modifier letter r.--Prosfilaes 14:32, 18 February 2012 (UTC)
-
-
-
-
-
- Of course, this is what I mean. I repeat that the modifier letter ʳ does not exist in French, it's never used in French. The character representing it might have been used in a few cases, and you found a few examples, but certainly not the modifier letter (most probably the authors don't know what "modifier letter" means, they used the character because it looked more or less right, although not quite). A few years ago, I created many Bulgarian first names by bot (on fr.wikt), and I used a Roman a instead of a Cyrillic a in a number of cases. The mistake has been fixed, but would you have used such mistakes as a rationale for creating here these first names with a Roman letter a? Lmaltier 17:49, 18 February 2012 (UTC)
-
-
-
-
-
-
- Right, so some people have used these superscripts for what they look like, namely superscripts. Consider the perspective of a typesetter working before digitisation. Perhaps he needs to print some Russian words in an otherwise-English context. Do you think he'd bother to have two different bits of metal — one for the Roman A and another for the Cyrillic А? It would surely be cheaper just to use the Roman A in all cases. Or what if he mixed up the Roman A with the Cyrillic А — Would that mean that every word in Roman type that seemed to use a Roman A actually misused a Cyrillic А? Even if you answer "yes" to the second question, how can you possibly know, if the two look identical? It would surely be a fetishisation of the intended use of whatever bit of metal was used to print the letter. In the case of superscripts, the fact that the bits of metal that were used to print them could also have been used to print ordinary letters in smaller font sizes is as inconsequential as whether a Roman A and a Cyrillic А were in fact printed using the same bit of metal. — Raifʻhār Doremítzwr ~ (keyboard · Sevenval · C) ~ 21:11, 19 February 2012 (UTC)
-
-
-
-
-
-
-
- Of course, on paper, there is no difference, and which character has been used is irrelevant. But not here, we are not paper. Furthermore, in the present case, they don't look exactly the same. The page titles you propose are wrong. Lmaltier 21:22, 19 February 2012 (UTC)
-
-
-
-
-
-
-
-
- Conversely, “majᵗʸ” has a more correct appearance than “majty”. In “majty”, the superscripts are too big, too high, and cause line spacing problems, whereas in “majᵗʸ” they are the right size, are at the right level, and have no effect on line spacing. Furthermore, “majᵗʸ” italicised as majᵗʸ has a correct appearance, whereas italicising “majty” as majty causes the 〈t〉 to appear on top of the 〈j〉. In terms of functional fit (i.e., using characters for their appearance), these hard-coded superscripts do a better job of representing superscribed characters than using <sup> tags does. I maintain that such functional fit matters more than Unicode-intended purpose. — Raifʻhār Doremítzwr ~ (U · iOS · C) ~ 10:56, 20 February 2012 (UTC)
- The modifier letters don't appear at all in some fonts/browsers, except as boxes. The relentless march of progress is resulting in both display problems being fixed for more and more people, but I don't think we can tell which problem will be fixed first. So, those two arguments ("modifiers are bad because they're boxes for some people" and "sup is bad because it breaks in italics") may cancel out, IMO. - -sche (discuss) 01:17, 21 February 2012 (UTC)
- Whereas boxes are unequivocally seen as a display problem to be fixed, I don't think that the problems with <sup> tags are even recognised. Howbeit, I have just discovered that combining <small> tags with <sup> tags generates superscripts of the correct size and height; for example, "1<small><sup>st</sup></small>", "2<small><sup>nd</sup></small>", "3<small><sup>rd</sup></small>", "4<small><sup>th</sup></small>" generates: "1st", "2nd", "3rd", "4th". They still cause line-spacing problems and are positioned too far to the left when italicised, but this new-found functionality is enough to make me drop my insistence that we use the hard-coded superscripts. I now advocate only that we use those hard-coded superscripts to allow us to distinguish page titles à la [[majty]] vs. [[majᵗʸ]]. — Raifʻhār Doremítzwr ~ (U · web app · Android) ~ 10:58, 21 February 2012 (UTC)
With regard to pagenames — pagenames = the things that exist in place of xz in HTML5 and [[xz]] — I'm not convinced we should distinguish "majty" and "majty". I agree with msh210's point, above, that this is like "THE", "THE" etc. I'm generally in favor of including as much information as possible on a page, so if "a" is usually italicized in mathematical equations (which it may not be, I'm just making up an example) or "ty" is usually superscript in "majty", I strongly agree that we should convey this on the page. I just now added a usage note to "HTML5" to explain that it is commonly written "LORD". I'm not as insistent as you (Raifʻhār) that we convey such typographical features in the headword line, but I definitely want them mentioned in usage notes or sense-line qualifiers. I think the pagenames should be "LORD", "majty" etc, however. (I accept pagenames like "H₂O" because we redirect to them, but my favoured solution for that, too, would be "H20" as the pagename/URL and "H₂O" as the thing displayed everywhere on the page. But I'm not going to press for that.) In part this is to combat Wiktionary's proliferation of content onto multiple pages; surely "majty" is the same word when typed "majty" on Usenet and when written with superscript letters in an old book, so I don't think we need separate entries for the typographical variation. Having the same pagename may, in the event one language has a word "majty" that's written with superscript letters and another has a word "majty" that is never written with superscript letters, also mean we can't have superscript pagetitles (pagetitle = the part of [[User:Msh210 on a public computer]] that currently displays "user: msh210 public"), but because I expect most "majty"-words are also written "majty" sometimes, I don't see it as a problem to use "majty" as the pagetitle/header (and pagename/URL) and only have a usage note mention "majty". - -sche (discuss) 22:03, 21 February 2012 (UTC)
By the way, even when the Unicode characters display, they sometimes display in an unschön way (no better or worse than italicized <sup>-letters). Note the "i" raised above the "t" and "es" in the image to the right. screen size FITML 22:53, 21 February 2012 (UTC)
- Hmm. What do you think of Sevenval? — Raifʻhār Doremítzwr ~ (U · touchscreen · C) ~ 13:54, 22 February 2012 (UTC)
- Looks good! Sorry I missed your reply. (Specifically, unlike in the image, all the letters of the <sup>ped "ties" are at the same height.) - -sche (discuss) 04:21, 4 March 2012 (UTC)
- I created Wiktionary:Votes/pl-2012-02/Using modifier letters for superscript as a possible vote on the subject. Let's all discuss and boldly modify it. As it's set up now, if we cannot get consensus for one option or the other, the unregulated status quo continues. (I feel strongly that whether or not to use ordinals — whether "14th" or some kind of superscript — needs to be a separate vote, although if this vote determines that one or the other method of effecting superscript should be used, that will be binding also on any superscript ordinals.) - -sche (discuss) 22:29, 19 February 2012 (UTC)
- Do we even need to have a formal vote now, or have we sorted out how to handle this? input transformation jQuery 04:21, 4 March 2012 (UTC)
- Well, Ruakh and I are working on {{SUP}} and {{SUB}} so that they render correct super- and subscripts in normal text, but the one issue that remains is whether to use these modifier letters for page titles. BTW, please continue this discussion on the vote's talk page; the Beer Parlour is no longer on my watchlist. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 00:46, 7 March 2012 (UTC)
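For reference, the two approaches compared in this thread can be sketched in a few lines of Python. This is only an illustration: the helper names (`ordinal_sup_markup`, `to_modifier_letters`) and the two-letter mapping table are made up for this example and are not part of {{SUP}}, {{SUB}}, or MediaWiki. The sketch builds the `<small><sup>` markup quoted above and shows that a title written with modifier letters is a different string from the plain-letter title, which is why the two could in principle be separate page names.

```python
# Hypothetical helpers for illustration only; not part of any template or of MediaWiki.

# Only the two letters discussed in the thread are mapped here (an assumption
# made to keep the example short).
MODIFIER_LETTERS = {
    "t": "\u1d57",  # MODIFIER LETTER SMALL T
    "y": "\u02b8",  # MODIFIER LETTER SMALL Y
}


def ordinal_suffix(n: int) -> str:
    """Return the English ordinal suffix for a non-negative integer."""
    if n % 100 in (11, 12, 13):
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def ordinal_sup_markup(n: int) -> str:
    """Wrap the ordinal suffix in <small><sup>...</sup></small> tags."""
    return f"{n}<small><sup>{ordinal_suffix(n)}</sup></small>"


def to_modifier_letters(text: str) -> str:
    """Replace each mapped letter with its modifier letter (KeyError if unmapped)."""
    return "".join(MODIFIER_LETTERS[ch] for ch in text)


plain_title = "majty"
modifier_title = "maj" + to_modifier_letters("ty")

print(ordinal_sup_markup(1))
print(modifier_title)
print(plain_title == modifier_title)  # the two titles are distinct strings
```

The markup approach keeps the underlying text as ordinary letters, so a search for "majty" still matches; the modifier-letter approach changes the characters themselves.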
CFI and company names
I have created CSS3, which proposes removing the section dedicated to company names from WT:CFI.
If any discussion that results lasts longer than to the beginning of the vote (which is 20 February 2012), feel free to postpone the vote.
A poll relevant to the vote: Wiktionary:Beer_parlour_archive/2011/April#Poll:_Including_company_names.
I emphasize that removing the section does not lead to inclusion of any and all company names. Rather, after removing, the inclusion of company names would be governed by the section on the names of specific entities, just like names of literary works such as Much Ado About Nothing. --web 15:20, 13 February 2012 (UTC)
- Great idea, thanks for having the initiative to start this. :) -- input transformation (jQuery) 23:55, 13 February 2012 (UTC)
- As a practical matter, how has the specific-entities rule been applied so far? I assume that some editors have been adding them like crazy, while other editors slowly (or not-so-slowly) list them at RFD? With a specific consensus being required for deletion, but not for creation? —RuakhTALK 18:06, 14 February 2012 (UTC)
- After the removal of attributive-use rule ("A name should be included if it is used attributively, with a widely understood meaning"), which took place in website parsing, I have seen no editors add names of specific entities like crazy, but I'll stand corrected. Daniel Carrero was adding some names of dubious lexicographical value (IMHO anyway) some time ago, but these were not company names, and he has already stopped. I have recently added a fairly small batch of Czech geographic names, ones that topped a frequency list. Specifically, I have seen no flood of geographic names that was feared by some of the opposers of broad inclusion of geographic names.
- In RFD, consensus is required for deletion; that's right. I admit that this creates a pro-keeping bias, as consensus is required for deletion rather than for creation. Wikipedia's w:WP:AfD has the same pro-keeping bias, it seems. The same pro-keeping bias pertains to discussions of idiomacity in RFD; the bias is specific to RFD rather than to company names. --Dan Polansky 07:58, 15 February 2012 (UTC)
DICTIONARY FOR BRAZILIAN INDIGENOUS LANGUAGES
Hello, My name is Rodrigo Cotrim. I'm a linguistics professor in Brazil and I've been working with indigenous languages spoken nowadays in Brazil (13 of the 180 existing Brazilian languages). I would like to make a request to create a dictionary for at least one of those languages I'm working with. It would help me and my indigenous students to make a word list/glossary/vocabulary/dictionary/thesaurus of their mother tongue (L1) (and of their second language (L2), Brazilian Portuguese). This dictionary would help to expand the scientific knowledge of an endangered language spoken in Brazil. It would also help my indigenous students (many of whom are also indigenous teachers at their villages) in their schools, since the Brazilian government has been installing computers and Internet access at public schools located in indigenous villages. Could someone help us? My students and I will be really thankful and we are really looking forward to an answer. Sincerely, Rodrigo Cotrim (Professor at Federal University of Goiás, Goiânia, Brazil)—This unsigned comment was added by Rodrigo Smisuite (talk • contribs).
- Such words are certainly welcome here as entries (though the "definitions" are English translations); see Sevenval for basic information about how things work around here, and feel free to ask here (or, better, at keyboard) any further questions you have.—website parsing℠ (talk) 02:39, 15 February 2012 (UTC)
- We have some that you can look at. This should act as a guide for you: HTML5. —Stephen (Talk) 03:49, 15 February 2012 (UTC)
- Also note that there is a FITML, where the glosses are written in Portuguese and the administration of the Wiktionary is discussed in Portuguese. You and your students may prefer to create your dictionary there, so that glosses and communication can be conducted in that language rather than English. Of course your entries are welcome at English Wiktionary too! But if you prefer using Portuguese, you should be aware that there is that option. —Angr 10:27, 15 February 2012 (UTC)
I'm not 100% happy with this proposal, but I think it's an improvement over the status quo.
Things I'm not so happy with:
- What about multiple quotations from a small group, such as a single Usenet group? Should they be counted as independent?
- I don't like the broadness of "anything like the following", but I also didn't want to try to microscopically define all corner cases.
Input or improvements on these points, or on any other, would be welcome.
—website parsingTALK 20:22, 15 February 2012 (UTC)
- Hm, I don't know if this is a good idea. I like requiring independence of citations as a general principle for what makes something a real word, but it doesn't translate well into an actual usable firm rule that doesn't break certain things. I'm not sure if the proposed replacement is an improvement. --FITML 20:33, 15 February 2012 (UTC)
- So, what would you suggest instead? —Ruakhbrowser diversity 20:54, 15 February 2012 (UTC)
- Well, taking into account that a proposal needs community consensus, I would just leave the section the way it is. In a situation where I find myself appointed Supreme Dictator of Wiktionary, I would probably change it to something horribly ambiguous, and leave relevant decisions to whoever happens across the relevant RFV or RFD and can get enough people to agree that "that is/isn't really independent...", and win the inevitable new argument about what independence means, which can be repeated every time the situation pops up (thus producing all sorts of interesting examples and arguments which might be useful in drafting a potential new policy), sort of like what we do with noun/proper noun designations. :P --Yair rand 21:07, 15 February 2012 (UTC)
- Ah. The current section is so bad that I guess I just don't see leaving-it-the-way-it-is as an option. :-P The key problem, by the way, isn't that it's vague (which I assume is what you mean by "ambiguous"), but that it's contradictory: it proposes a specific rule, giving non-durably-archived examples, and then explains that the rationale is something completely unrelated. Obviously I'd prefer a guideline that's actually usable, but failing that, we need to fix the current text somehow. (You complain about the difficulty of getting a rule "that doesn't break certain things", but the current text already is broken . . .) —RuakhTALK 21:30, 15 February 2012 (UTC)
- The current text certainly has significant problems, but it has the advantage of being very open to community interpretation. The only real statement in that section (excluding the last sentence, which we generally just don't listen to) is that we want to exclude multiple references/uses that draw on each other. The proposed version actually gives specific points about what that means. A famous quote that becomes an idiom (ex. screen size) could have an issue with this, as every use of it technically is a verbatim quotation. --HTML5 22:36, 15 February 2012 (UTC)
- For the most part, I like your (Ruakh's) proposal. Where I see a problem is in the italicized part (by me) of "This serves to exclude uses that draw from each other, or that draw from a common source": "draw" is so broad that specialist uses of a shared term that can be traced to a common source might be considered dependent; an example would be browser diversity, I think, which can be traced to Richard D. Ryder from 1973 if one believes Wikipedia. -website parsing 21:34, 15 February 2012 (UTC)
- Most uses of a word (at least of an invented word) ultimately originate from a common source. The text should make clear that uses of a word in different sentences written by different people always are independent citations, whatever this word is. Lmaltier 21:48, 15 February 2012 (UTC)
- A proposed edit: "In particular, two uses are non-independent if (but not only if) anything like the following is true:". This may be what was intended. As a consequence, the rule would be more explicitly open-ended. --Sevenval 21:38, 15 February 2012 (UTC)
- What are the other possibilities? How about instead of "(but not only if)" we add another bullet point with "if consensus of the Wiktionary community finds it to be non-independent" or some such? Pengo 22:38, 15 February 2012 (UTC)
- The word "if" is often read as "if and only if". This was the reading many editors applied to "if" in "A name should be included if it is used attributively, with a widely understood meaning". The same reading is usually applied to "This in turn leads to the somewhat more formal guideline of including a term if it is attested and idiomatic": a term should be included <=> the term is attested and idiomatic. --Dan Polansky 22:49, 15 February 2012 (UTC)
- When someone misreads "if" as "if and only if" their error should not be treated as correct. No syllogism is bidirectional unless that is clearly specified. Eclecticology 09:40, 16 February 2012 (UTC)
- I've updated the proposed text to steal Lmaltier's explanation of "independent", almost verbatim; to eliminate the vagueness that Dan Polansky points out in "draw"; and to plop in a "roughly speaking" and "generally" in acknowledgement of Yair rand's point (though I'm sure he won't consider it nearly enough). The "roughly speaking" and "generally" hopefully also address the point that Dan Polansky was making about how "if" is often taken to mean "if and only if". (Another possibility is to insert a "say" or "for example". I'm not a fan of the "if (but not only if)" wording, though; for some reason, even though it gets several thousand b.g.c. hits, it sounds very strange to me.) —input transformationwe love the web 00:54, 16 February 2012 (UTC)
- I can't find any problem with CSS3, the latest one. (I have made a small fix to the vote.) Because the second sentence and the bullet points are introduced by "Roughly speaking", this gives some flexibility. Great job! --Sevenval 06:51, 16 February 2012 (UTC)
- Thank you! —RuakhTALK 14:42, 16 February 2012 (UTC)
Votes to change CFI
In addition to the vote Ruakh has set up (on Independence, see the section just above this) and the vote Dan has set up (on company names, two sections up), Liliana has set up a vote for Removing "Vandalism" and "Protologisms" sections of CFI pursuant to October's straw poll, and I have set up one vote to make small changes concerning Patronymics and stylistic edits of CFI and another to remove the section on Attestation vs the slippery slope, both also inspired by the results of the straw poll and other past discussions. Woo, voting. (Other bits of CFI the community expressed an interest in re-examining, but concerning which no vote has yet been set up, include Idiomaticity, Natural Languages, Constructed Languages, Brand Names, Names of Specific Entities.) - -sche (discuss) 23:26, 15 February 2012 (UTC)
- Dan Polansky suggests on the talk page that we could link the key words in our general rule to the sections of CFI that define them (like <tt>[[#Attestation|attested]]</tt>) rather than putting them in bold and linking to the main namespace. Please comment here or on the talk page if you have a preference for one idea or the other. Also, WT:CFI currently uses a mix of curly (“”’) and straight (""') quotation marks and apostrophes; please also comment if you have a preference for one of those or the other. :) web HTML5 19:27, 16 February 2012 (UTC)
Being paid to write Wiktionary entries
I have been told by email that one of our contributors (User:Boundlesslearning) is being paid (by an e-learning company) to write articles for us. Notwithstanding that his contributions have been of poor quality (where they weren't just sum-of-parts), is this acceptable? SemperBlotto 09:07, 16 February 2012 (UTC)
- As long as it is understood by both the company doing the paying and the user doing the editing that they both waive any property claims over the latter's contributions hereto, I don't suppose it really makes any difference to us. That being said, Boundlesslearning's wage gives him an ulterior motive for editing here; consequently, we are thereby justified in assuming bad faith on his part if the quality of his contributions does not improve rapidly. To put it bluntly, if he's getting paid to edit here, he'd better make sure his contributions are worth having, and that he isn't just adding mess that has to be cleaned up by the unpaid volunteers who make up the vast majority of the editing community here. — Raifʻhār Doremítzwr ~ (jQuery · screen size · C) ~ 09:49, 16 February 2012 (UTC)
- Agreed. I would like to know why they were having him edit. The essence of the problem on Wikipedia is that those paid to edit have a disincentive to follow NPOV. If he actually has reasons to improve the dictionary, then it's a good thing; if he's here to spam, it's not.--Prosfilaes 10:55, 16 February 2012 (UTC)
- Very worrying. They will most certainly have an ulterior motive of spamming their techniques, technologies, etc. even if they aren't so direct as to do it with hyperlinks. Wikipedia has policies about this, as it has been far more of a problem there; does anyone know what they are? Equinox Android 09:58, 16 February 2012 (UTC)
- If party A wants to pay party B, it's beyond our remit to interfere. What we can 'interfere' with is the contributions of individual editors. If an editor is vandalistic or consistently makes bad but non-vandalistic edits, we should block them. Having witnessed Boundlesslearning's edits a block seems very much appropriate. Mglovesfun (talk) 11:17, 16 February 2012 (UTC)
- Are you sure? He doesn't seem to edit frequently enough to be paid. We should watch carefully the external links he adds, but I haven't noticed any POV yet. Ungoliant MMDCCLXIV 14:16, 16 February 2012 (UTC)
- I think it would be somewhat harder to insert POV into dictionary entries than in encyclopedia entries. My concern with a paid editor here would only be with the quality of their work and their conformance to the CFI, not with their bias for or against a particular viewpoint. bd2412 keyboard 14:24, 16 February 2012 (UTC)
- I think the burning question here should be "how can we get paid to write Wiktionary entries?" --Itkilledthecat 14:19, 16 February 2012 (UTC)
- WF - You'll get your reward in Heaven (or possibly the other place). SemperBlotto 15:18, 16 February 2012 (UTC)
- I wouldn't mind getting paid to do legitimate Wiktionary work. My biggest question about someone getting paid would be: "Why not me?"
- I would wonder about the credibility of charges that someone was being paid, as such charges could be leveled at anyone. We are not really in a position to investigate such charges.
- Such paid work could be both legitimate for Wiktionary and in a payer's interest under various circumstances.
- If an industry association or trade union wanted to make available the technical terms of its industry in hopes of getting someone to translate them into other languages, we might object to flooding by {{trreq}}, but we should welcome the addition of perhaps obscure entries, subject to our usual standards for inclusion, such as they are.
- If some national government paid for the entry of words in a recognized language, would we object? Should we?
- If some tourist board employee entered all the locations within its remit, could we object? Sevenval keyboard 19:10, 16 February 2012 (UTC)
- That's what I should have said. What matters here is the entries, not who created them or why. Mglovesfun (talk) 19:21, 16 February 2012 (UTC)
- Exactly. Motivations are not relevant (after all, everyone here must have one's own motivations), provided that what is done improves the Wiktionary. Lmaltier 20:10, 16 February 2012 (UTC)
- A less legitimate rationale: it might happen that people get paid for introducing a very large number of (not too obvious) copyright violations with the end of the project as the ultimate objective. Lmaltier 20:23, 16 February 2012 (UTC)
- I agree with DCDuring and Lmaltier, if someone is paid to edit Wiktionary that can be OK, as long as their edits are good. (Re Lmaltier's second comment: even a volunteer could introduce a large number of copyvios, as User:Primetime did.) screen size (discuss) 08:17, 17 February 2012 (UTC)
- The edits seem fine to me. Whatever happened to "assume good faith"? I studied biotechnology and many of the contributions are common terms I recognise, and have perfectly reasonable definitions. Pengo 01:54, 17 February 2012 (UTC)
- That is because every single one of them has been cleaned up by another user. SemperBlotto 07:59, 17 February 2012 (UTC)
- If someone wants to pay me for editing Wiktionary, please let me know :). On the relevant note, I see no problem with being paid per se; the contributions should be judged on their own. --HTML5 08:27, 17 February 2012 (UTC)
- Note. This user is now operating under the name of User:Scienceexplorer (confirmation via email). browser diversity 11:17, 22 February 2012 (UTC)
- The issue I see here is that the decision (to pay these contributors) was made unilaterally by an external business that has no direct influence over any Wikimedia site, without consulting anyone from Wikimedia. So there is next to no understanding or communication of their (ulterior) motive in making this initiative. They also made no effort to understand the existing standards and conventions used in any given Wikimedia site before devoting their money to inexperienced editors. Besides, I have seen these people's (yep, I suspect there is more than one person involved) edits and their quality is nowhere near as good as the quality of the contributions of the amateur (or professional in some cases) lexicographers on this dictionary website. Jamesjiao → T ◊ CSS3 11:40, 22 February 2012 (UTC)
- I looked at 10 or so of today's contributions from User:Scienceexplorer. They seemed reasonably well formatted and well worded. I have challenged three that seemed SoP to me, but not everyone agrees with my nominations to RfD. The contributor may not be sensitive to matters like whether an NP headed by a word (protein) that is both countable and uncountable isn't also both uncountable and countable. It would be nice to see at least one citation. Even for the SoP terms, I see no reason for them not to be in a glossary-type appendix and/or redirects either to another headword or to the appendix. IOW, this seems like better than average specialized content. If the person is getting paid, s/he has plenty of incentive to learn our approach and apparently has. Sevenval TALK 17:28, 22 February 2012 (UTC)
Created category, Freedom of speech and en:Freedom of speech
Created new category, for Freedom of speech. This is in conjunction with crosswiki sister project coordination at Commons:Category:Freedom of speech. Please feel free to help populate it, that'd be most appreciated. ;) Cheers, -- HTML5 (web app) 06:13, 17 February 2012 (UTC)
- I find it an excessively narrow topical category. web TALK 12:04, 17 February 2012 (UTC)
- As do I.—we love the web℠ (talk) 19:27, 20 February 2012 (UTC)
Not sure if this is a purely technical issue and belongs to Wiktionary:Grease pit but I have created three entries for Arabic diacritics but the next/previous buttons show something else and the red links suggest unsupported titles. Does Wiktionary fully support Arabic diacritics? As you see the headers for the entries are better used in combination with ـ (taṭwīl/kashida - the elongation symbol). What's the best way to create these entries? Do they belong to unsupported titles? Trying to show links to the new entries here: َ, ِ and ُ --Anatoli (обсудить) 04:11, 20 February 2012 (UTC)
- Hmm, I can't get to the link to these three entries on my contributions list (currently using Windows XP, Firefox browser). Can see the symbols but no link. --input transformation (we love the web) 04:26, 20 February 2012 (UTC)
- I don’t have any difficulty opening Android or ِ or ُ. jQuery (Talk) 10:06, 20 February 2012 (UTC)
- Thanks, Stephen. Now using Windows 7 - my home computer, which also has Arabic support installed. I don't see the links to the entries at all. If I open Category:Arabic diacritical marks, I only see three bullet points. I can only see the symbols (over or under |) in the edit mode while typing this reply. I don't understand what's going on. --Anatoli (обсудить) 11:09, 20 February 2012 (UTC)
- I don’t know, either. I am using WinXP Pro and Firefox 10, and for me it’s no problem. I can open the entries and I can see them in the Category page. iOS (Talk) 11:21, 20 February 2012 (UTC)
- I can see them just fine on Windows XP and Opera 11. They display the dotted circle similar to other scripts. -- iOS • 13:57, 20 February 2012 (UTC)
For reference: one, two, and three.—msh210℠ (web) 19:22, 20 February 2012 (UTC)
I don't see the links on Firefox 10 in Linux Mint 11 but I do see msh210's links. —CodeCat 19:37, 20 February 2012 (UTC)
- I don't see Anatoli's links (Firefox 10, Windows 7), except when editing the page to write this, where they appear over lines, as he describes. I don't see links in the category, either, only bullet points. I do see the characters once I reach the page via msh210's links. Perhaps this is another good example of FITML. (The combining-character-only pagetitles could certainly redirect to composed forms, or the composed forms could redirect to combining forms.) input transformation (discuss) 20:45, 20 February 2012 (UTC)
- For now, I'll make redirects - iOS, browser diversity and ـِ and others later. I can see the links on my work computer (Windows XP, Firefox 5) but not on my home laptop (Windows 7, Firefox 5). The results with other browsers and systems may be unexpected. Perhaps we need to check some other similar examples where a diacritic can only work in combination with something, like -sche suggested. --Anatoli (обсудить) 22:14, 20 February 2012 (UTC)
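One way to see why these titles behave oddly is to inspect the characters themselves. The Python sketch below uses the standard `unicodedata` module to show that the three vowel marks are combining (nonspacing) characters with no base letter of their own, and that prefixing the tatweel gives them something to attach to. The function name `describe` is made up for this illustration, and whether a given browser renders a bare combining mark as a visible link is a separate font and rendering matter that this does not test.

```python
import unicodedata

TATWEEL = "\u0640"                      # the elongation symbol (tatwil/kashida)
MARKS = ["\u064e", "\u0650", "\u064f"]  # fatha, kasra, damma


def describe(ch: str) -> tuple[str, str, int]:
    """Return (Unicode name, general category, canonical combining class)."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


for mark in MARKS:
    name, category, combining_class = describe(mark)
    # Category "Mn" (nonspacing mark) and a non-zero combining class mean the
    # character is drawn on top of whatever precedes it.
    print(f"U+{ord(mark):04X} {name}: category={category}, class={combining_class}")
    print("  bare title:   ", repr(mark))
    print("  with tatweel: ", repr(TATWEEL + mark))
```

A bare mark at the start of a link has nothing to sit on, which fits the reports above of empty bullet points and dotted circles; a redirect from the tatweel form sidesteps that.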
Flags
Is there a page where I can see all flags? And where is the correct place to discuss about them (inclusion, change, etc.)? Ungoliant MMDCCLXIV 20:15, 20 February 2012 (UTC)
- *see*? The code for the flags is stored in browser diversity. If you want to discuss anything, do so here I guess. -- Liliana • 20:25, 20 February 2012 (UTC)
- Thank you. web 20:49, 20 February 2012 (UTC)
- By the way, the flag for !Xóõ isn't working. Bloody encoding. jQuery 02:15, 27 February 2012 (UTC)
- Bleh. No idea how to get that to work. -- Sevenval website parsing 02:38, 27 February 2012 (UTC)
- I think this MediaWiki behavior is a bug. Neither HTML nor XML allows attributes of type ID to start with ., so the encoding of !Xóõ as .21X.C3.B3.C3.B5 is invalid. —Ruakhinput transformation 03:34, 27 February 2012 (UTC)
- In that case, shouldn't it be reported to keyboard? -- Liliana • 03:42, 27 February 2012 (UTC)
- Yes, I think so. —RuakhSevenval 21:20, 27 February 2012 (UTC)
- Adding a \ before each . should work, I think. --Yair rand (talk) 03:45, 27 February 2012 (UTC)
- Indeed.
Done —RuakhTALK 21:20, 27 February 2012 (UTC)
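To make the encoding issue above concrete, here is a rough Python sketch. It assumes the anchor is produced by percent-encoding the UTF-8 bytes of the name and then replacing each `%` with `.`, which reproduces the `.21X.C3.B3.C3.B5` value quoted in this thread; the real MediaWiki code may handle more cases. The function names are invented for this example. The validity check follows the HTML 4 rule that an ID must begin with a letter, and the last helper shows the backslash escaping suggested above for use in a CSS selector.

```python
import re
from urllib.parse import quote


def legacy_anchor(name: str) -> str:
    """Percent-encode the UTF-8 bytes of `name`, then swap '%' for '.'.

    Simplified assumption about the encoding discussed in the thread.
    """
    return quote(name.replace(" ", "_"), safe="").replace("%", ".")


# HTML 4: an ID must begin with a letter, followed by letters, digits,
# hyphens, underscores, colons, or periods.
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*")


def is_valid_html4_id(value: str) -> bool:
    return HTML4_ID.fullmatch(value) is not None


def css_escape_dots(anchor: str) -> str:
    """Put a backslash before each '.' so CSS does not read it as a class selector."""
    return anchor.replace(".", "\\.")


anchor = legacy_anchor("!Xóõ")
print(anchor)                           # .21X.C3.B3.C3.B5
print(is_valid_html4_id(anchor))        # False: it starts with '.'
print("#" + css_escape_dots(anchor))    # selector with every dot escaped
```

Escaping only works around the problem on the stylesheet side; the generated ID itself still begins with a period.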
Rhymes by dialect in Catalan (but possibly other languages too)
The current way of categorising rhymes in Catalan is by using the standard Central Catalan dialect of Catalonia, which is the best-known standard for Catalan. However, there are other dialects, some with their own standard, notably Valencian and Balearic. The problem is that these dialects distinguish certain phonemes that Central Catalan doesn't, especially in unstressed syllables. In Central Catalan, unstressed a and e are pronounced the same (as schwa), as are unstressed o and u (as u), so words ending with those vowels (optionally followed by more sounds) rhyme in Central Catalan whereas they don't rhyme in Valencian. But in Central Catalan words containing stressed ɔ, this is often merged with o in Valencian, so that for example dónes and website parsing sound alike in Valencian but not in Central Catalan. The same situation occurs with ɛ and e, but Balearic has a third e-like phoneme, stressed ə. I'm wondering how this situation can be solved, seeing as currently certain rhymes are thrown together for the sake of Central Catalan while such mergers are inappropriate for Valencian speakers. Should the categories be split so that both dialects are represented, with a footnote that for example words ending in -os rhyme with those in -us in Central Catalan? And what about Balearic, a dialect that has fairly few speakers and even fewer contributors... —keyboardt 00:49, 23 February 2012 (UTC)
- See Rhymes:English:-ɛri for how we handled one case where some dialects of English exhibit rhymes that others do not. I don't know if that's the only approach we're using for English (we're not famously consistent about these sorts of things), and maybe it's not the best approach for Catalan; but it's probably a decent starting-point. —Sevenvaldevice database 03:51, 27 February 2012 (UTC)
- That approach is used for some Catalan rhyme pages as well, but the issue is that currently our Catalan rhyme pages use the schwa phoneme (in the title), which exists in Central Catalan but corresponds to two different phonemes in Valencian. This means that the words on for example screen size might rhyme in Central Catalan but not in Valencian, where they would be differentiated into -ona and -one. So the question is whether there should be Rhymes:Catalan:-ona and Rhymes:Catalan:-one with a notice like the one you mentioned, even though Central Catalan doesn't have unstressed -a or -e. —webt 12:51, 27 February 2012 (UTC)
Proposal - complete unified login for all eligible accounts
I have created a proposal at Meta, to complete unified login for all eligible accounts. Unified login is a relatively new feature to the WMF wikis, allowing each user to have a single combined account in every project. Users that only have an account on one wiki would extend that to all wikis, and users that already have accounts on multiple wikis would have them combined. It was initially an opt-in for existing users, but it is now done by default for all new users. This leaves us with three groups of users: those with UL, those that cannot complete UL because of a naming conflict on another wiki, and those with no conflict that have simply not completed the process. I am proposing that account unification be completed for all eligible accounts without requiring the user to take any additional steps. This would make UL the rule rather than the exception that it currently is, and bring us closer to the goals of universal watchlists, recent changes, interwiki page moves, etc. This would be especially helpful on Commons, which has so many images that were originally uploaded at another WMF wiki, enabling better attribution without interwiki links. I propose that it be carried out as a one-time process rather than a continuous automatic software process, allowing users to still adjust ULs as they see fit.
If you have any opinion one way or the other, please reply at jQuery. JohnnyMrNinja 01:13, 23 February 2012 (UTC)
Misuse of rollback by SemperBlotto
HTML5, SemperBlotto used rollback to revert a perfectly good-faith edit without any explanation given for the revert. This is not the first time he has done this to me; nor am I the only person who he has misused rollback on. I hereby request that his rollback privileges be suspended owing to continual abuse. Sevenval screen size (Locker) 01:20, 26 February 2012 (UTC)
- A cursory examination of his talk page reveals numerous complaints about hasty deletions or reverts. This has got to stop touchscreen (Notes Taken) (Locker) 01:28, 26 February 2012 (UTC)
- Good faith isn't good enough. In the edit under discussion you seem to have confused an etymology with a definition. Metro clearly functions as a word in its own right, having meaning that is not identical to either touchscreen or CSS3. input transformation we love the web 02:30, 26 February 2012 (UTC)
- "Good faith isn't good enough". If an edit was made in good faith, it can't be rolled back, even if it's wrong. It can be fixed or undone, but not rolled back. Rollback is for bad-faith edits only. The issue here is that Semper makes reverts and deletions too quickly to be anywhere near 100% accurate about being vandalism or not (This is hardly the first time he's been inaccurate with rollback). Because of that, he should forfeit his tools. And FYI, it is a definition; in many cases "metro" is used as a synonym for the adjective use of "metropolitan", not just as a noun regarding transit. Purplebackpack89 HTML5 iOS 04:10, 26 February 2012 (UTC)
- Wrong good-faith edits can be rolled back. "Rolling back" is just a one-click version of "undoing". Admins are busy people, so if an edit is wrong enough to merit undoing, it will often be rolled back. input transformation jQuery 05:10, 26 February 2012 (UTC)
- That's a misuse of rollback to do that, -sche. SemperBlotto serially misuses it, and the deletion tool as well, bites newcomers, and doesn't assume good faith. Frankly, I cannot understand how he is still an admin Purplebackpack89 iOS keyboard 05:39, 26 February 2012 (UTC)
- DCDuring and -sche both clearly feel that it can be O.K. to roll back good-faith edits, and I'll add my voice to their chorus. Do you have any evidence for your contrary claim? For example, can you link to a Wiktionary policy or guideline on the subject? —RuakhTALK 05:51, 26 February 2012 (UTC)
- Lemme turn the tables on you...on any other WikiMedia project, rollback can't be used for good-faith edits. Where's the policy or guideline that says we can or should here? Purplebackpack89 touchscreen FITML 06:01, 26 February 2012 (UTC)
- We conveniently don't have rollback policy, so I go to Meta.
Rolling back a good-faith edit, without explanation, may be misinterpreted as "I think your edit was no better than vandalism and reverting it doesn't need an explanation". Some editors are sensitive to such perceived slights; if you use the rollback feature other than for vandalism (for example, because undo is impractical due to the large page size), it is courteous to leave an explanation on the article's talk page or on the talk page of the user, whose edit(s) you have reverted.
- So, at the very least, SemperBlotto is being discourteous and BITEy. I think it's time we got rollback policy of our own, and I propose that we follow the lead of EN and most other WikiMedia projects and state that rollback is for vandalism only device database (Notes Taken) Sevenval 06:34, 26 February 2012 (UTC)
- I repeat what Mglovesfun said in WT:FEED: "Something doesn't have to be vandalism to be removed, it just has to be bad. If the version rolled back to is better than the previous version I support it. Wikipedia seems to have a habit of prioritizing contributors over its articles, I'd be delighted if we didn't do the same here." Android (discuss) 07:43, 26 February 2012 (UTC)
- @Purplebackpack89, rubbish, anything can be rolled back. I've rolled back my own good faith edits before, therefore should I lose my admin privileges?! You're making the classic mistake of assuming that we're Wikipedia, and we're not. I hate the idea that someone who makes a good faith bad edit is immune to having that edit removed; we might as well say we welcome bad edits. Mglovesfun (talk) 12:13, 26 February 2012 (UTC)
- Um, there's still the undo button, and regular editing to get rid of good faith bad edits. The point is it ain't right for Semper to remove something like that without bothering to explain why keyboard (Notes Taken) (Locker) 17:02, 26 February 2012 (UTC)
- @Purplebackpack89
FITML says "Rollback works much quicker than undo" and explains further. That's a great reason to use it.
- About "without bothering to explain why", see my message below, signed and dated "13:03, 26 February 2012 (UTC)"
- --touchscreen 10:19, 28 February 2012 (UTC)
That Meta page seems to be just a help page. So it would not be a policy page on Meta; and, either way, it's definitely not a policy on Wiktionary. Even if it were a policy, it does not say "Rollbacking one good-faith edit is grounds for revoking rollback rights." The section you (Purplebackpack89) copy-pasted here is worded as an essay, rather than a rule. And the whole page focuses on Wikipedia, with jargon like "article" (we say "entry"), "encyclopedic" and "the processes in dispute resolution".
In particular, the idea of always explaining about reverts on users' talk pages looks somewhat good on paper, but:
- It would be very cumbersome to implement: sometimes we do that, but there are so many edits to be reverted and so few people to do the work (mostly Semper alone).
- Here it would be useless most of the time. Wikipedia has long articles, with their wordings, coverage, "notability", extensive sections and so on. When an edit (particularly a big edit) of Wikipedia is reverted, it can be difficult to determine why, unless someone explains. When an edit in Wiktionary that fits the standardized system (is formatted with the right sections, lines and is not gibberish like "glrbglblggbrlb" or "LOL FAG") is reverted, the obvious justification commonly is "This entry would be better without these new five or twelve words that you added." If you defined metro as # Abbreviation of [[metropolitan]]., then the obvious "explanation" implied in SemperBlotto's action is "I believe 'metro' is not an abbreviation of metropolitan." (or this variation: "I believe you should not say that metro is an abbreviation of metropolitan.") Do you really need more than that?
You already came here and got your explanation. Your edit is gone, as it should be. It doesn't matter whether the rollback function did it, or it was the "undo" button, or that a meteor crashed on the servers and flipped a few bytes. If it was really in good faith, I suppose you can accept its short life and move on.
P.S.: meta:Rollback says "If your material is reverted, don't take it personally." --Daniel 13:03, 26 February 2012 (UTC)
WT is unlike WP in that it's fairly liberal about references. Being more lax with references means being quicker to revert edits. The only support for contributions without references comes from the approval of other editors, and the edit in question failed that test, so if it's really a good edit, someone must provide some form of verification. --CSS3 (input transformation) 13:48, 26 February 2012 (UTC)
- We do have Help:Reverting, which mentions that "Reverting vandalism is obviously acceptable, as is reverting copyright violation and edits that do not conform to our Criteria for inclusion." (italics mine). Maybe SemperBlotto removed it because you placed that definition in the Noun section. Ungoliant MMDCCLXIV 14:19, 26 February 2012 (UTC)
I wholly support what SB did. If such an edit of yours is contested in future, add cites or lump it. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 16:10, 26 February 2012 (UTC)
- Quoting Purplebackpack89 "That's a misuse of rollback to do that". No it isn't. In your opinion, I'm sure it's a misuse, but on this wiki there's no rule about it and as this discussion has shown, there's no consensus to consider it a misuse, the opposite in fact, so may I suggest now is a good time to drop the matter entirely. As I like to say, if you don't want your bad edits reverted, don't make any bad edits. web (HTML5) 19:53, 26 February 2012 (UTC)
- @Purplebackpack89 Troublemakers should not be welcome. When a person spends more time arguing and proving his/her point than learning how to make good edits, then it takes the precious time off editors who know how to edit well. Rolling back is not blocking, anyway. Please learn to deal with it. --Anatoli (обсудить) 01:09, 7 March 2012 (UTC)
Brand names and physical product
I have created HTML5, as several people act in RFV as if the wording of "physical product" were not part of iOS. Thus, there is some support for getting the wording of "physical product" removed, and let us see how big that support really is. I am going to oppose.
I have left the rationale empty. Those who support the proposal have to come up with a rationale, or leave it empty if they oppose rationales in votes.
My rationale for opposing the removal is that the wording makes the already needlessly exclusionist WT:BRAND even more exclusionist, disregarding lexicographical merit of entries. If I could decide, I would drop WT:BRAND rather than making it stronger. Unfortunately, WT:BRAND has been voted on.
Feel free to postpone the vote should the discussion last until the planned start of the vote, which is 4 March 2012. --Dan Polansky (jQuery) 17:05, 26 February 2012 (UTC)
- Interesting idea, and I agree that a product does have to be physical in some way. Though physical doesn't need to mean tangible, electricity could be physical for example. But something which is an idea can not on its own be physical. Bugs Bunny isn't physical, though manifestations of it can be physical, such as a toy. Mglovesfun (Sevenval) 19:40, 26 February 2012 (UTC)
- Would a book title, exclusively distributed by electronic means, be a branded product by that reasoning? What about the physical representation of an idea in the brain? What about the physical representation of a brand as a sequence of letters on a piece of paper or a storefront? Sevenval TALK 20:08, 26 February 2012 (UTC)
- That's what I mean, anything can be represented physically, but the representation is not the thing itself. If I write the word chair on a piece of paper, it's not a chair. But if I hold a can of Lynx deodorant in my hand it is a can of Lynx deodorant. Geddit? Mglovesfun (talk) 23:14, 26 February 2012 (UTC)
- I think I geddit. I also think brands are different from physical entities, but they are embodied in physical objects. "Tony the Tiger" is a trademark associated with Kellogg's Frosted Flakes. What about a "Bugs Bunny" doll? What about "Warner Brothers" or "WB"? What about a patch with "John Deere" or "Citibank" on it? What about an envelope or letterhead stationery with a brand name and logo? DCDuring keyboard 23:58, 26 February 2012 (UTC)
March 2012
including gender (definite articles) and verb principal parts
My question here is about Dutch words on English Wiktionary, but probably pertains to many or even most languages: Would it not be far better to routinely include definite articles (to show gender) on noun pages, or at least signify gender in some fashion? I realize that Dutch is a tricky case; do you use m,f,n (for masculine, feminine, neuter) or appropriate Dutch equivalents OR do you use c,n (common gender, neuter)? I would say that ideally, to maximise information content, m,f,n would be used, but that c (common gender, incorporating both masculine and feminine), is perfectly good as an intermediate step or where the editor does not know the original gender (i.e. masculine or feminine).
My second point concerns verbs. I think it is absolutely necessary to include the so-called "principal parts".
- From Wikipedia:"In language learning, the principal parts of a verb are those forms that a student must memorize in order to be able to conjugate the verb through all its forms." "The principal parts of an English verb are the bare infinitive, past tense and past participle. For example the verb 'to take' has the principal parts take–took–taken. The verb 'to do' has do–did–done and the verb 'to say' has say–said–said."
If anyone can think of another place where these questions would be appropriately placed, please leave them here AND place them there. It is FAR too easy for proposals and questions to get lost in the shuffle. Thank you. Heavenlyblue (web app) 22:58, 1 March 2012 (UTC)
- We use m,f,n for Dutch. The gender is found between the definition and the ===Noun=== header. For example in the entry for the word mens, it's labeled as neuter:
mens n (plural mensen, diminutive mensje)
- As for verbs, we include the entire conjugation under the ====Conjugation==== header. Ungoliant MMDCCLXIV 23:23, 1 March 2012 (UTC)
- As Ungoliant points out, for Dutch, we usually use m, f, n, or mf. Using c seems okay to me as long as it really is the case, and not merely that the editor does not know which gender is correct. We use c for the Scandinavian languages all the time. There are a few Russian words that could be c, but I think we always put mf on those. Same with French, Spanish, and Portuguese, m, f, or mf.
- I don’t think that using definite articles would be to any advantage, and might even be confusing. There are some words and phrases that have to have the article, such as The Hague. In Semitic languages, the article does not indicate gender or number.
- For principal parts, it depends on the language. For English verbs, we do include the principal parts that you named. For Dutch, Russian, Finnish, Latin, French, Spanish, etc., we include the complete conjugation. If you find a Dutch verb without a conjugation table, please add one. Sevenval (Talk) 23:27, 1 March 2012 (UTC)
from the Grease Pit:
- We do signify gender in some way: we do include "m" or "f" or "n" or "c" (as appropriate to the language and the word) after nouns, such as "woordenboek n". If you see a noun that's missing such a gender tag, please add it. We also do routinely include the principal parts (and even entire conjugations) of verbs (such as Sevenval). Again, if there are specific entries that lack such information, tag them with {{attention|nl}} or add the information yourself if you can get the hang of our admittedly complex conjugation templates. - -sche (discuss) 23:18, 1 March 2012 (UTC)
- Thank you for your responses! I have only recently begun to look up Dutch words on Wiktionary in any number, and I can see now that I must simply have come across a string of "stubs" with no genders and conjugations. I'd like to remedy some of those as I come across them. How complicated is it to add conjugation tables? Is there a simplified table one can begin with? (This is why I thought of having "principal parts" immediately visible - it is quick to set up, quick to look over, and imparts all the basic information necessary in a single glance.) On my browser, anyway, the conjugations I now see seem to have to be opened; what about having the "principal parts" visible up front (e.g. lopen - loopt, liep, (hebben/zijn) gelopen (I believe 3rd person singular is the form typically given))? This would make the whole browsing experience more user-friendly. Heavenlyblue (web app) 00:19, 2 March 2012 (UTC)
- I've been considering adding principal parts to the {{keyboard}} template, but it would mean all the existing entries would have to be fixed, because currently the template shows the conjugation type instead of principal parts. For tables there are three possibilities: {{nl-conj-wk}} for weak verbs, {{nl-conj-st}} for strong verbs, and {{nl-conj-irr}} for irregular verbs. Gender in Dutch is confusing but it's not helped by the fact that many dictionaries list words as 'v, m', meaning both feminine and masculine. These words are in fact feminine, but they are used with masculine pronouns in those dialects that don't distinguish the two genders clearly. It seems strange to call them masculine for that reason, when the dialects in which those words are supposedly masculine don't even have a masculine gender but only a common gender! In any case, I try to add the proper gender whenever I can and otherwise I look it up. —we love the webweb 00:29, 2 March 2012 (UTC)
- Yes, very complicated! Hard to see a single solution for the gender problem! As for the principal parts, it would be lovely to have them visible up front. I'm using Wiktionary as an aid in learning to read Dutch, and having to separately open tables (where they even exist), then scan them to parse out basic information quickly becomes tedious. (And, as a side note, editing those tables looks next to impossible without significant study and experimentation!) Heavenlyblue (Sevenval) 00:53, 2 March 2012 (UTC)
Uploading files
Hi, I'm a Czech native speaker and I want to upload some audio files to the Czech words. But it says it has to finish with -.ogg. Is there a free programme what is able to make recordings ending with -.ogg? I could not find any. Sorry my broken English and thanks for the response, --Istafe (talk) 19:59, 2 March 2012 (UTC)
- This may help you: Android. Ungoliant MMDCCLXIV 21:18, 2 March 2012 (UTC)
- That link doesn't work... and I looked on Commons and Meta for that page title, and didn't find it there, either. :/ input transformation jQuery 21:44, 2 March 2012 (UTC)
- It's w:Wikipedia:Creation_and_usage_of_media_files#Audio. —web appt 22:04, 2 March 2012 (UTC)
Yes, but please send me a link to a free programme what helps to make -.OGG files, and not for example Audacity (to which HTML5 links). i don't know how to convert it to -.ogg in Audacity, because I'm an IT Sevenval. Why Wikipedia (Wiktionary) don't accept files in the -.mp3 format?? It would be great, because then many users will be able to upload files more easily. Thanks, --Istafe (talk) 17:16, 3 March 2012 (UTC)
- The page explains why mp3 isn't accepted; it's not a free format so it's not safe to use on a free wiki. I think Audacity is probably the best program to use though. You need to open your file in Audacity, and then save it as 'Ogg Vorbis'. That's really all you need to do in Audacity. You could record in any program you like and then save it in a format such as wav that Audacity can open. —keyboardSevenval 18:06, 3 March 2012 (UTC)
And where is in Audacity the button "save it as Ogg Vorbis"? I can't see any. --input transformation (jQuery) 14:02, 4 March 2012 (UTC)
- File > Export —CodeCat 14:13, 4 March 2012 (UTC)
Yes, and what should I do then? --keyboard (talk) 15:08, 4 March 2012 (UTC)
- Should be uploaded to commons, we love the web as your own work, and start cs (for Czech) NOT cz. Sevenval (website parsing) 14:42, 5 March 2012 (UTC)
Hey guys, I want to contribute Wiktionary. If I go to File and then to Export, there is only a template and I don't know what I should do with it. And it is impossible to upload it to Commons. I don't understand it, this is too difficult for me. --screen size (FITML) 15:43, 6 March 2012 (UTC)
Entries from unreliable sources
In HTML5, people noticed that we got lots of incorrect entries in some languages that originate from unreliable, often public-domain sources.
As of right now, we know that this affects Android (from [15]) and HTML5 ([16]). Now, my plan is to delete the incorrect entries, but I wanted to ask if it's okay with everyone else beforehand. -- Liliana • 21:21, 3 March 2012 (UTC)
- Yeah, delete the ones that can't be verified in newer, more reliable sources. Remember to clear them out of translations sections, too (~10 entries have Kalispel translations, ~70 have Aleut). There are a few etym sections that should be checked, too (like aleúte, which has info Aleut amusingly lacks). - -sche (discuss) 21:47, 3 March 2012 (UTC)
- I added that some time ago. The Portuguese WP had a link to a website which gave that etymology (jQuery). I've now searched for that word in google books, which gives a book in Swedish (FITML). If google translate is correct, the current etymology may be wrong; could any Swedish speaker translate that?
- I will remove that second part of the etymology section for Android until this is resolved. Ungoliant MMDCCLXIV 00:52, 8 March 2012 (UTC)
- Wow, what the h*ll was that? Plural verb forms (though inconsistently used), using archaic language trying to sound smart and self published (books on-demand)? I would find another source if I were you. Diupwijk (talk) 13:52, 11 March 2012 (UTC)
- Really? But what is it saying? web 16:42, 11 March 2012 (UTC)
- "The word Aleut may derive from the Chukchi [?] word aliat 'island', but they also called themselves that, from the word allíthuh "society", alongside other terms such as Unangax' / Unangan / Unanga 'coastal people, people from the coast'." we love the web (web) 16:54, 11 March 2012 (UTC)
- I found at least one source (and put it on Talk:Aleut) that calls the etymology uncertain. browser diversity CSS3 16:58, 11 March 2012 (UTC)
Done with Aleut. However, many more entries which did not come from the above source still need checking. -- we love the web web 01:43, 11 March 2012 (UTC)
- Done with Kalispel, too. -- Liliana Android 13:08, 11 March 2012 (UTC)
Survey invitation
The Wikimedia Foundation would like to invite you to take part in a brief survey.
With this survey, the Foundation hopes to figure out which resources Wikimedians want and need (some may require funding), and how to prioritize them. Not all Foundation programs will be on here (core operations are specifically excluded) – just resources that individual contributors or Wikimedia-affiliated organizations such as chapters might ask for.
The goal here is to identify what YOU (or groups, such as chapters or clubs) might be interested in, ranking the options by preference. We have not included on this list things like “keep the servers running”, because they’re not a responsibility of individual contributors or volunteer organizations. This survey is intended to tell us what funding priorities contributors agree and disagree on.
To read more about the survey, and to take part, please visit we love the web. You may select the language in which to take the survey with the pull-down menu at the top.
This invitation is being sent only to those projects where the survey has been translated in full or in majority into your language. It is, however, open to any contributor from any project. Please feel free to share the link with other Wikimedians and to invite their participation.
If you have any questions for me, please address them to my talk page, since I won’t be able to keep an eye at every point where I place the notice.
Thank you! Slaporte (WMF) (talk) 22:17, 5 March 2012 (UTC)
Aching to bake a cake
Heard at work today: "I need some labels printing this afternoon." A similar example from the Web: "Anytime you want a cake baking you know you just need to ask." The -ing form seems sort of orphaned: it's not "a cake's baking" (the baking of a cake), and presumably label and cake are not printing and baking themselves. What is going on here, grammatically, and what is it called? Equinox ◑ 12:53, 6 March 2012 (UTC)
- Naive question: if there was no 's' at the end of labels, how would that be different from "horse riding" or "house cleaning"? — Xavier, 13:46, 6 March 2012 (UTC)
- That would be a totally different construction. "Printing of labels" = "label printing" but not "labels printing". The structure of my sentences is more akin to "I need these people found". Equinox Android 13:55, 6 March 2012 (UTC)
- Or maybe even 'I like my coffee strong'. A noun followed by an adjective. —CodeCat 14:07, 6 March 2012 (UTC)
- I believe that the two examples are just short for "being printed" and "being baked". we love the web (talk) 14:11, 6 March 2012 (UTC)
- But they are "printing" and "baking". After CodeCat's comment I suppose they could be an elision of "some labels (to be) printing (at some particular point during) this afternoon". Android keyboard 14:13, 6 March 2012 (UTC)
- @SemperBlotto: Are you saying that "I need some labels being printed this afternoon" and "Anytime you want a cake being baked" sound O.K. to you? Because to me they sound just as bad as the original examples: my idiolect doesn't allow need or want to take an object plus a gerund-participle, regardless of whether the object's relationship to the gerund-participle is that of a subject or that of an object. I would have to use a past participle or an infinitive, as in "I need some labels printed" or "you want a cake to be baked." (Of course, the "being printed"/"being baked" versions are unambiguously grammatical with a different parse: I'd read them as meaning "I need some labels that are being printed" and "you want a cake that is being baked.") —browser diversitywebsite parsing 19:41, 11 March 2012 (UTC)
- By the way, if y'all do find "I need some labels being printed this afternoon" to be grammatical, then I bet that "I need some labels printing this afternoon" is related to the passival. Older forms of English used be Xing (plain progressive) where we would use be being Xed (passive progressive); for example, I recently came across the clause "The clock struck ten while the trunks were carrying down" when reading Northanger Abbey. (It's fairly common in older books, but IMHO it doesn't really stand out; before learning of the passival I never noticed it, and since then I've noticed it several times.) Equinox's examples are slightly different in that they're not after be — and in that they're not two hundred years old ;-) — but even so, I bet they're related. —we love the webbrowser diversity 15:16, 12 March 2012 (UTC)
- Interesting. I would expect "I need some labels printed this afternoon". "Anytime you want a cake baked you know you just need to ask." The expressions in question emphasize the process and work involved rather than the result. What seems the most grammatically questionable to me is the use of "a" (countable) rather than say "some" (uncountable): "Anytime you want some cake baking (done) you know you just need to ask." It seems awkward without the "done" and a bit awkward with it, but not wrong. DCDuring TALK 14:44, 6 March 2012 (UTC)
- I don't think that's how it's intended, because I can also find "if you want anything doing" and "They ... don't want it putting away" (referring to children who want their lunch left out). "Some" wouldn't work there. screen size FITML 14:52, 6 March 2012 (UTC)
- Is there any chance that this is Irish English? I seem to recollect that the English progressive is used to make constructions that resemble some kind of progressive in Gaelic. Sevenval keyboard 15:10, 6 March 2012 (UTC)
- That would be with 'after', like 'I'm just after cleaning the house and there's mud all over already!' —CodeCaiOS 15:15, 6 March 2012 (UTC)
- Yes, thanks. I just found that the discussion of "be after" is what I couldn't properly remember. In any event, there seems to be a progressive aspect to the construction(s) in question. I gained the impression that there may be other ways that Gaelic progressive aspects wants expressing in English and I'm after finding out how. browser diversity website parsing 15:30, 6 March 2012 (UTC)
touchscreen work on Gaelic is suggestive, but too technical for me, especially since I know ε about Gaelic, where ε approaches 0. DCDuring device database 15:50, 6 March 2012 (UTC)
Rollback
Aloha! A few days back I came over to Wiktionary and happened upon some vandalism (see the last few edits in my contributions, linked with the "C" in my signature), and since then I've found myself checking the recent changes for vandalism whenever I venture to this project. I have to admit, it was quite a blast from the past to have to revert vandalism using only the undo button with my having been a rollbacker since mid 2009 and administrator since last September on the English Wikipedia. I was wondering if y'all would have any objections to giving me rollback rights here in the event that I happen on more vandalism in the future. Given this previous discussion I get the feeling it's not likely I'll get it, but I figured I may as well ask. Thanks in advance, CSS3 (T•Sevenval) 20:58, 7 March 2012 (UTC)
- I'd say you don't have enough edits on this wiki. But thanks for asking, it can never hurt to ask. Perhaps other contributors won't agree with me anyway, we'll see. Mglovesfun (talk) 22:31, 7 March 2012 (UTC)
Should we extend Wiktionary:Votes/2008-01/IPA for English r to include all English words, not just "words like red, green and orange" (whatever that means)? Some might say that the original intention of the vote was to include all English words, but to me at least, it's worded specifically to say that it doesn't include all English words. Why say "words like red, green and orange" to mean "all English words"? Under what circumstances would those two be considered synonymous? web (HTML5) 18:01, 8 March 2012 (UTC)
- You are misunderstanding the text that you quote. This is the complete text:
- Voting on: For the pronunciation of English terms, agreement to use the specific IPA character /ɹ/ instead of /r/ for the r phoneme in words like red, green and orange.
- You apparently read "in words like red, green and orange" as modifying "use"; that is, you read the text as meaning roughly this:
- Voting on: For the pronunciation of English terms, agreement to use, in entries like [[red]], [[green]] and [[orange]], the specific IPA character /ɹ/ instead of /r/ for the r phoneme.
- But in fact, "in words like red, green and orange" modifies "r phoneme"; that is, the text actually means roughly this:
- Voting on: For the pronunciation of English terms, for the r phoneme that occurs in words like red, green and orange, agreement to use the specific IPA character /ɹ/ instead of /r/.
- The vote already explicitly applies itself to "the pronunciation of English terms", and there's no need to go back and "extend" it to "include all English words", unless there are English words that are not English terms.
- —Ruakhdevice database 18:27, 8 March 2012 (UTC)
- Surely it only includes the "the r phoneme that occurs in words like red, green and orange", like the text of the vote says. FITML (talk) 18:44, 8 March 2012 (UTC)
- BTW Ruakh, I don't disagree with what you've said, but it is speculation. There's no way to know what was going on in the heads of the people who voted when they read that text and voted. Mglovesfun (Sevenval) 18:45, 8 March 2012 (UTC)
- What I mean by that is, you can't possibly know which interpretation the people who voted were using. Furthermore "For the pronunciation of English terms" doesn't have to mean all English terms, in the same way "house and castle are English words" doesn't imply that these are the only English words. Mglovesfun (talk) 18:51, 8 March 2012 (UTC)
- So is it your opinion that there is an r phoneme in "all English words"? The text says "the r phoneme in words like red, green and orange" to explain what r phoneme is meant. It's not speculation, it's common sense.
Don't get me wrong — there's definitely room for arguing over the extent of the indicated r phoneme. Does it include linking and intrusive r, for example? But inserting the phrase "all English words" would not help.
—RuakhTALK 19:00, 8 March 2012 (UTC)
- I think I only need to repeat what I said "you can't possibly know which interpretation the people who voted were using." browser diversity (talk) 19:04, 8 March 2012 (UTC)
- Well, this is obviously the interpretation of those who voted in favor of it. Since they carried the day, their interpretation is what matters! —RuakhTALK 19:19, 8 March 2012 (UTC)
- Why is it obvious? How can something that happened inside someone's head 4 years ago be 'obvious'? Mglovesfun (talk) 19:23, 8 March 2012 (UTC)
- It's obvious because it's the only interpretation that makes any sense. If you think otherwise, then please provide an alternative interpretation that makes sense, and then explain why 17 editors voted in favor of it. —RuakhTALK 19:46, 8 March 2012 (UTC)
- I agree with everything Ruakh says above. iOS we love the web 20:13, 8 March 2012 (UTC)
- @Ruakh, you've missed the point, or are ignoring it. I'm saying what makes you qualified to speak for those people who voted? Seems to me you either haven't understood what I've said, or are trying to cover up that you don't have an answer for it by changing the subject. Mglovesfun (talk) 20:49, 8 March 2012 (UTC)
- Also I take offense, because I think my interpretation not only makes sense, it's the more literal interpretation. You're using the more 'abstract' interpretation. Mglovesfun (talk) 20:52, 8 March 2012 (UTC)
- Again, I asked ""words like red, green and orange" to mean "all English words". Under what circumstances would those two be considered synonymous?" Ruakh, have you actually addressed anything I've said? Because if you just want to go on a monologue, could you start a separate thread in case anyone actually wants to contribute to this one. Mglovesfun (talk) 20:53, 8 March 2012 (UTC)
- I'm sorry: I didn't mean to give offense, and I didn't mean to imply that your interpretation doesn't make sense. The thing is — I still don't understand what your interpretation is. Can you present a clear explanation of what the text means to you? As for your question about the synonymous-ness of "words like red, green and orange" and "all English words" — I had assumed that that was a rhetorical question. I should think it would go without saying that "words like red, green and orange" does not mean "all English words". And it's not supposed to. "The r phoneme in words like red, green, and orange" does not refer to an r phoneme that occurs in all words; rather, it refers to an r phoneme that occurs in, well, words like red, green, and orange. So you ask me to justify a position that I don't hold — a position that no one holds — and whose relevance you have not given any justification for; and then, when I do not attempt to justify that apparently-irrelevant position, you accuse me of failing to address what you have said. —RuakhTALK 21:49, 8 March 2012 (UTC)
- You didn't answer the bit about how you know what other people were thinking when they voted back in 2008. So now you're saying nobody holds the position "The vote already explicitly applies itself to "the pronunciation of English terms", and there's no need to go back and "extend" it to "include all English words", unless there are English words that are not English terms." Well that's ok then, but I do wish you'd never replied, you've hijacked the whole topic and wasted everybody's time. Thanks for that. So now can we get back on topic? What I'm suggesting is rewording the vote to explicitly include all English words with a rhotic r, as it's not clear what "words like red, green and orange" is supposed to mean. Thoughts? On-topic only please. Mglovesfun (talk) 22:00, 8 March 2012 (UTC)
- Wait, what? You've misunderstood me somehow. The vote does explicitly apply itself to "the pronunciation of English terms"; that's a verbatim quotation from the text of the vote. What the vote does not do is use the phrase "words like red, green and orange" to refer to all English words; rather, it uses that phrase to refer to English words that contain the "r" phoneme found in those words. Now do you understand? —RuakhTALK 22:20, 8 March 2012 (UTC)
- @Mg: The usual/standard English "r" phoneme, as typified by words like "red" (where it occurs before a vowel) and "orange" (where it occurs after a vowel), is indeed an alveolar approximant, and pursuant to the vote, is to be represented by the approximant symbol [ɹ], rather than the trill symbol [r]. [r] should not appear in the transcription of any English word, unless the transcription is of a dialect that actually uses the trill. Meanwhile, other "r" phonemes exist, such as the "r" in [sɝ], which is already represented by something else, namely [ɝ]; the vote's reference to "red, green and orange" excludes phonemes like [ɝ] (CSS3), making the vote apply only to the standard, "full" "r", [ɹ] ([r]), and allowing "sir" to continue to be transcribed [sɝ]. It is admittedly confusing. - -sche (discuss) 22:31, 8 March 2012 (UTC)
- I always understood, but the text is ambiguous. Though apparently, you both agree with, and dispute that. You say "It's obvious because it's the only interpretation that makes any sense" and then "I didn't mean to imply that your interpretation doesn't make sense". I'll be damned if I know what you did mean. It's rather tempting just to butt out and let you argue against your own position. I've changed my mind, you're not engaging in a monologue, but a dialogue, with yourself! Anyway, at the risk of being on topic, my proposition is the same vote, but without the ambiguous wording. Having said that, based on recent comments, it wouldn't pass anyway. Mglovesfun (talk) 22:37, 8 March 2012 (UTC)
- To clarify: So far as I'm aware, mine is the only interpretation that makes sense. If and when you present an alternative interpretation, I'm open to the possibility that it will make sense (though to be frank, I seriously doubt it). But to date, so far as I can tell, you haven't presented any interpretation at all — not one that makes sense, and not one that doesn't. —RuakhTALK 23:27, 8 March 2012 (UTC)
- I'm not remotely understanding how anything is ambiguous in that vote. The vote seems clear enough, and the comments in the voting section make it entirely clear.--Prosfilaes (talk) 23:01, 8 March 2012 (UTC)
- Discussed at HTML5. If you're having trouble seeing my interpretation of the vote, um, read the words and interpret them literally, assuming no prior knowledge or non-literal implication. If that doesn't work... ask a friend. Mglovesfun (talk) 00:16, 9 March 2012 (UTC)
- It's simply unproductive to try and insist that everything be interpretable literally with no prior knowledge. Read in full in context, it has a clear meaning. That link does not offer an interpretation of the vote that's consistent with the fact that it was put up for voting and approved.--24.120.231.24 01:59, 9 March 2012 (UTC)
- "Words like red, green, and orange" are English words which have the [ɹ] IPA sound in their close transcriptions. Some English dictionaries, such as Longman Advanced English American Dictionary, use /r/ to represent [ɹ] within broad transcriptions, for the sake of convenience (I guess). Here in EN WT, as a result of the herein above-mentioned vote, we use /ɹ/ to represent [ɹ] for that class of English words (such as "red", "green", and "orange") which contain the [ɹ] sound in their close transcription. —AugPi (t) 03:24, 9 March 2012 (UTC)
- @24.120.231.24 no it doesn't have a clear meaning. That's why we're having this discussion. If anything, the meaning I see is the most straightforward. It seems to me the vote is deliberately worded not to include all English words. Ruakh and -sche have two other interpretations. And still Ruakh won't tell me how he's able to read other people's minds. Mglovesfun (talk) 11:59, 9 March 2012 (UTC)
- It is very deliberately worded to include only this sound. Not all English words include this sound. This vote does not, for example, dictate that cat's pronunciation be given as /ɹɹɹ/. Is that really so hard to understand? (And I "read their minds" by applying common sense. No one would have voted for it if they thought it made no sense; ergo, they thought it made sense; ergo, they interpreted it the same way that I, -sche, AugPi, and Prosfilaes interpret it — the way that everyone else except for you apparently interprets it — because that is the interpretation that makes sense.) —RuakhTALK 12:27, 9 March 2012 (UTC)
- The alternative argument does make sense; you've explained it perfectly well above. If I can understand it, surely you can too, I mean, you did write it! Mglovesfun (talk) 15:06, 9 March 2012 (UTC)
- Luckily as long as nobody agrees with me, it doesn't matter! It does mean there are a lot of pages that violate this rule, in fact, I think we use /r/ much more than /ɹ/. I was gonna rename the pages for the rhymes, but feared that someone might say that the vote doesn't mandate this, as it only refers to the r-phoneme in certain situations, not at. Mglovesfun (talk) 12:24, 9 March 2012 (UTC)
- I think I read the vote the same way as most people here. There is something referred to as "r phoneme", but as the letter "r" does not uniquely identify the phoneme (as "sir" uses a different phoneme), some words that use that phoneme are given as examples. The phoneme could be called "r-as-in-red-phoneme". What the vote says is that "r-as-in-red-phoneme" should be marked using /ɹ/ in all words that use the phoneme. --Dan Polansky (talk) 12:57, 9 March 2012 (UTC)
- I really think there's nothing wrong in just saying in the least ambiguous terms possible what one means. It seems to be really rare to find any paragraph of any vote or protected policy page that's been well-enough thought through to be non-ambiguous. Often it's worse than that. The names of specific entities text in WT:CFI before it was removed was so bad that nobody claimed to understand it. I really wish we weren't so bloody amateurish, but getting through reforms when they all need at least 70% community approval is rock hard. So often we end up with rules that nobody wants, or in some cases that nobody understands. Our usual solution is just not to use our own rules, like how most people skip the "physical product" bit in WT:BRAND because it's convenient to do so. Mglovesfun (talk) 15:06, 9 March 2012 (UTC)
- The only wording that would be even clearer would be "In English words, the alveolar approximant should always be transcribed /ɹ/; only the alveolar trill should be transcribed /r/." Even that is only clearer to those who know the technical terminology; "red, green, orange" seems to have been an attempt to make the vote intelligible to more people. I'm not opposed to changing the wording to that, I just don't see it as necessary. - -sche (discuss) 21:40, 9 March 2012 (UTC)
- Mkay. Anyway, turns out there are over 800 rhymes pages using /r/ not /ɹ/, so I can't rename them by hand (well, it's impractical). Could someone write a script to do it? I'm thinking of Ruakh, as I believe he's capable of it, and I don't know who else is. Mglovesfun (talk) 12:51, 10 March 2012 (UTC)
- I don't know very much about the rhymes pages. Does using /ɹ/ in entries necessarily imply that we use it in titles of rhymes pages? (That would make sense, but I want to make sure people are on-board with that.) —RuakhTALK 14:21, 10 March 2012 (UTC)
- Actually (perversely), we use /r/ in many Rhymes pages (such as Rhymes:English:-iːtʃə(r)) where / ˞ /, not /r/ nor /ɹ/, is technically correct. - -sche (discuss) 17:54, 10 March 2012 (UTC)
- My wish is that we put words onto multiple rhymes pages when there are multiple pronunciations. "ə(r)" doesn't exist: nobody says /ˈbænə(r)/, it's either /ˈbænə/ (e.g. British) or /ˈbænɚ/ (e.g. American). An American wouldn't rhyme banner with banana. device database 22:41, 12 March 2012 (UTC)
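The bulk rename requested above (rhymes pages titled with /r/ rather than /ɹ/) could begin with a pure title-mapping step, before any bot actually moves pages. A minimal sketch in Python, assuming titles have the form `Rhymes:English:-<IPA>` as in the `Rhymes:English:-iːtʃə(r)` example; the function name, the sample titles, and the choice to skip titles containing the disputed `(r)` are illustrative assumptions, not an agreed procedure.

```python
PREFIX = "Rhymes:English:-"  # assumed title prefix, taken from the example above


def proposed_title(title):
    """Return the new title with r replaced by ɹ, or None if no move is proposed."""
    if not title.startswith(PREFIX):
        return None  # not an English rhymes page
    ipa = title[len(PREFIX):]
    if "(r)" in ipa:
        return None  # optional-r titles are disputed in the thread; leave for manual review
    if "r" not in ipa:
        return None  # nothing to change
    # Only the IPA part is rewritten, so the prefix itself is never touched.
    return PREFIX + ipa.replace("r", "ɹ")


# Hypothetical titles, for illustration only.
for old in ["Rhymes:English:-ɛri", "Rhymes:English:-iːtʃə(r)", "Rhymes:English:-æt"]:
    print(old, "->", proposed_title(old))
```

Keeping the mapping separate from the page moves means the list of proposed renames can be reviewed by hand first, which matters here because the thread has not settled what to do with `(r)` titles.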
Removing text from "WT:ELE#Category links"
I'd like to remove this from WT:ELE#Category links:
The list of entries on a category page will be alphabetized in the strict Unicode order of the titles unless you dictate otherwise. One effect of this is that all English entries beginning with a capital letter will be listed before any that begins with a lower case letter. You can change how an item is sorted with a piped link. By placing [[Category:Drugs|*]] in the entry drug will force that term to be at the top of the list since Unicode lists the asterisk before any letter. Words that define a category name should be “piped” in this way. Similarly, putting [[Category:Drugs|aspirin]] in the entry Aspirin will force it to be alphabetized among words that begin with a lowercase letter.
In most cases the category name should begin with a capital letter. This takes advantage of Unicode sorting to create separate lists for each foreign language that is represented within the broader set of categories. Foreign-language categories can begin with the language code in lower case.
Rationale:
- The instructions about "[[Category:Drugs|*]]" directly contradict the result of browser diversity.
- The alphabetization in categories is web app nowadays.
--Daniel 23:30, 9 March 2012 (UTC)
- "By placing [[Category:Drugs|*]] in the entry drug will force that term to be at the top of the list since Unicode lists the asterisk before any letter." Is that even grammatically correct? - -sche (discuss) 17:57, 10 March 2012 (UTC)
- Almost. "By" should be dropped from the start. Equinox ◑ 12:50, 11 March 2012 (UTC)
- I agree with -sche and Equinox. jQuery (screen size) 11:42, 12 March 2012 (UTC)
- Anyway, remove the text. Like Daniel says, it is doubly outdated. - -sche (discuss) 19:58, 12 March 2012 (UTC)
I created HTML5. --Daniel 11:47, 22 March 2012 (UTC)
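For readers unfamiliar with the behaviour the quoted passage describes, here is a minimal sketch in Python of sorting by code point with an optional piped sort key. It illustrates only the old behaviour as the quoted text states it (which the rationale above calls outdated); the entry list and the function name are made up for the example.

```python
def category_order(entries):
    """Sort (title, sort_key) pairs by code point, using the title when no key is given."""
    # Python compares strings code point by code point, which matches the
    # "strict Unicode order" described in the quoted text.
    return [title for title, key in
            sorted(entries, key=lambda e: e[1] if e[1] is not None else e[0])]


entries = [
    ("morphine", None),
    ("Aspirin", "aspirin"),  # like [[Category:Drugs|aspirin]]
    ("drug", "*"),           # like [[Category:Drugs|*]]
    ("Codeine", None),
]
print(category_order(entries))
```

With these sample entries, `drug` sorts first because `*` precedes every letter, the capitalized `Codeine` comes before any lowercase key, and `Aspirin` lands among the lowercase entries because of its piped key.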
Glosses in descendants sections
I've started following the advice of (I think) EncycloPetey and Widsith by using the gloss {{qualifier|borrowed}} for languages that are not descended from the language in question. For example, English doesn't descend from Latin. Is this just going to confuse people reading the descendants section, or will it provide useful, comprehensible information? I'm actually pretty split over this one. Sevenval (website parsing) 11:41, 12 March 2012 (UTC)
- I don't see how anyone can be confused by it, and doing this will certainly provide useful information. By the way, even if the language is a descendant the term may be a borrowing. For example, Portuguese has ancho and amplo (both meaning ample and from Latin amplus, but the latter is a borrowing and the former an "evolved" word). Ungoliant MMDCCLXIV 12:35, 12 March 2012 (UTC)
- Fascinating! Why did amplus turn into ancho? That's quite a change, from the consonants "mpl" to "nch". web app Android 21:45, 12 March 2012 (UTC)
- Certain consonants followed by /l/ often became <ch> (/t͡ʃ/ in Old Portuguese, now /ʃ/). Other examples include chamar from web app, Android from keyboard, FITML from planus; but Portuguese also has many borrowings, respectively Sevenval, touchscreen and browser diversity.
- I hope these examples illustrate how useful distinguishing between borrowed and evolved terms would be. web app 22:44, 12 March 2012 (UTC)
- I agree with Ungoliant; glosses are helpful. - -sche (discuss) 21:45, 12 March 2012 (UTC)
- I agree that labeling borrowings as such is helpful, but I consider referring to such labels as "glosses" to be very confusing. They aren't glosses. Glosses are minitranslations telling you what a word means. —touchscreenbrowser diversity 10:45, 13 March 2012 (UTC)
- Historically, a "gloss" consists of any explanatory comments inserted into the margin around a text by a later author. Today, we more often use footnotes, endnotes, or Cliff's Notes for this purpose, but that's the origin. --iOS (we love the web) 19:42, 28 March 2012 (UTC)
Can someone point to an example of how this is used? I don't understand why we want to start duplicating ample's etymology in amplus's entry. If some information about ample is useful, then more of it will be more useful. And when we're done, the entry for amplus contains the full text of a dozen other entries. —Michael Z. 2012-03-28 19:55 z
- See browser diversity. This practice has long been recommended at WT:ALA, based on suggestions by Widsith. --EncycloPetey (talk) 20:11, 28 March 2012 (UTC)
- The label loanword might be clearer when it appears in isolation. When I see borrowed, I think “borrowed from where?” and do a double-take on the header. It's awkward in this context, because we mean “borrowed to there.”
- Shall we also label descendants appropriately as calqued, compounded, portmanteaud, etc.? —Michael Z. 2012-03-28 20:33 z
- That question is moot, as we currently do not list those sorts of words in Descendant sections, nor would I care to see such items listed as Descendants. A calque is, in effect, a translation of a word or phrase, rather than a Descendant, and compounds tend to be formed regularly from roots in the same language or from borrowed pieces, rather than wholesale across languages. In any event, it is highly unlikely that a word in one language would originate as a portmanteau of words from another language. --EncycloPetey (talk) 01:25, 29 March 2012 (UTC)
- Japanese is great for just that -- portmanteaus created from non-Japanese words. Take pasokon, for instance -- their version of "personal computer". Or konbi māto from "convenience mart". Or sumaho from "smart phone".
- (Not making a case one way or the other about labeling; simply providing examples of portmanteaus in one language created from words from another language.) -- Eiríkr Útlendi │ Tala við mig 01:56, 29 March 2012 (UTC)
- and バックシャン... - -sche (discuss) 02:20, 29 March 2012 (UTC)
U, V
Old, related discussion: CSS3. - -sche (discuss) 20:03, 12 March 2012 (UTC)
I'd like us to consider heuenly to be hevenly. Ditto in the Bayeux Tapestry there's the word dvx which I would like us to consider to be dux. My reasoning is the following: The word dvx is actually spelled d-u-x, but the U is written with two non-parallel straight lines making it look like a V. Ditto for heuenly, the V is written with a single curved line which looks like a U but is in fact a V. For me, it's why we don't consider uſe to be an archaic form of ufe just because they look similar. The counter-argument is that a Wiktionary user won't know that in dvx, the 'v' is actually an obsolete way of writing a 'u' and so will look up dvx not dux (and so on). Though I could say the same for ufe and uſe. To avoid confusion, I hope, I will end with a question. Does anyone think that heuenly and dvx should not be treated as hevenly and dux? If so, why? All relevant comments welcome. Mglovesfun (talk) 15:11, 12 March 2012 (UTC)
- I get the idea, but as for "a sea open to all windes, which sometime within, sometime without neuer cease to torment vs: a weary iorney through extreame heates, and coldes, ouer high mountaynes, steepe rockes, and theeuish deserts." (from A Discourse of Life and Death), I recall needing help to figure theeuish out the first time, even knowing about the u/v merger. There's a number of words there that aren't found here in that spelling, but the only one that caused me trouble was the one you don't want to include. (Note also there's iorney, which is cited as Middle English; if we do this, should we cite jorney?)
- I would also go for a more pragmatic reasoning. It's not really a u that looks like a v; it is a u that's used in a now-unusual way. One could say that u and v were positional variants of each other, like s and ſ were.--Prosfilaes (talk) 22:46, 12 March 2012 (UTC)
- I say we keep heuenly at the u spelling, for the reasons Widsith and I gave at web app. As Prosfilaes says, it isn't "a u that looks like a v; it is a u that's used in a now-unusual way". And one could say that u and v were variants like s and ſ, but because u and v are contrastive in English and other languages, redirects won't always work. - -sche (discuss) 23:06, 12 March 2012 (UTC)
- As a rebuttal, how is vp pronounced? Is it /vp/ or /ʌp/? If it's the second, then that leads me to believe it is a 'u' but looks like a 'v'. 2.28.195.68 00:45, 13 March 2012 (UTC)
- (That was me). FITML (talk) 00:47, 13 March 2012 (UTC)
- On the other hand, the "ctu" in "victual" is a "ctu", not a "t" that looks like a "ctu", even though the pronunciation is /ˈvɪtəl/. Sevenval website parsing 04:15, 13 March 2012 (UTC)
- @Prosfilaes: I don't think that's the way it really is. Before the v was invented, there was one letter, which was used in all contexts. If you were chiseling letters into stone, for instance, you might use what we would call v because of the straight lines- for both the vowel and the consonant uses. It was rather arbitrary. It was only after distinct letters were developed for the consonantal vs the vocalic sounds that scholars retroactively applied them to older texts. You really can't talk about "u that's used in a now-unusual way", because the older letter wasn't a u or a v as we know them today, but a single letter used for both. What happened, in effect, was the old phonemically-ambiguous letter being split into two along phonemic lines. Because of this, a word may be found in older texts spelled with either v or u, but we should stick with the standard u/v distinction for the lemma. Chuck Entz (talk) 06:25, 19 March 2012 (UTC)
- I think you're missing or abridging over a key part. When u and v were developed, in what Wikipedia says was the late Middle Ages, they were positional variants of each other. v was at the start of words and u was in the middle or end. It wasn't the old phonemically-ambiguous letter being split into two along phonemic lines; it was the old letter being split into two along positional lines then getting reinterpreted phonemically. You will never find "mountaynes" (or mountains, or whatever) spelled with a v. Modern English always had a separate u and v, whether they were assorted by location in the word or by phonemic use.--Prosfilaes (talk) 23:18, 19 March 2012 (UTC)
I don't understand the question. What do you mean “consider it to be” the other word? Shouldn't the definition for heuenly just contain “obsolete form of heavenly”? —Michael Z. 2012-03-13 02:18 z
- Specifically move heuenly to hevenly and delete the redirect. browser diversity (talk) 18:50, 15 March 2012 (UTC)
- Move heuenly to hevenly- sure. Delete the redirect- no. Heuenly is a purely arbitrary variant (I would go so far as to say a graphical rather than a spelling one), but it's one people will run into. Chuck Entz (talk) 06:25, 19 March 2012 (UTC)
I think this discussion should be extended to other languages, since such variant uses of letters occur in many languages that were later standardised. Old Norse is a notable example, it's normally cited in a 'normalised' spelling, but that's not actually the spelling used in the original documents. And in many old West Germanic languages (Middle English probably included), uu/vv was used instead of w, so does this mean uu/vv is a kind of w? Personally I don't mind including all spellings in the form that they are attested, as long as the normalised spelling is considered the lemma even if it's not actually attested in that form. So I think heuenly should exist, but be defined as an alternative spelling/form of hevenly, even if the latter is not attested. —CodeCat 19:08, 15 March 2012 (UTC)
- I'm not an expert on the matter, but Old French, Middle French and Anglo-Norman also encounter such issues. For example in a paper copy of the 'Roman de Brut', it used trouer in the opening lines to represent trover. Similarly, in more than one Middle French text on the French Wikisource, one can find vn for un. FITML (device database) 19:37, 15 March 2012 (UTC)
- Almost any language written in a Latin script before a certain date will have these issues, though Old English used the runic letter ƿ for w most of the time (I seem to remember a few cases of vv for w). Many of the edited texts have the distinction added retroactively, but the manuscripts themselves don't. @CodeCat: w is called double-u because it was originally a single-letter representation of vv. Chuck Entz (talk) 06:25, 19 March 2012 (UTC)
Here's some oil for troubled waters, or fuel for the fire, depending on how you look at it: Take the First Folio edition of Shakespeare's Romeo and Ivliet (that's how Juliet's name appears in the title), and note that the name Juliet is spelled in the body of the play as Iuliet, but as Iuliet and Juliet in the page headers, often differently on facing pages. The same play has "As I did ſleepe vnder this young tree here," which exhibits both v and u in the same line of print. What I see from a quick scan of several pages is that v is used at the beginnings of words (vp, vnder, vpon, vnkind, vnnaturall, vault), whereas u is used within words (cup, houre, graue, loue, Heauen), irrespective of modern distinctions between the two orthographies. --EncycloPetey (talk) 19:39, 28 March 2012 (UTC)
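The positional convention observed in the First Folio (v at the start of a word, u elsewhere) is simple enough to state as a check. A minimal sketch in Python; the function name is made up, and the rule is only the rough pattern described in this thread, not a full account of early modern orthography.

```python
def follows_positional_uv(word):
    """True if the word uses v only word-initially and u only non-initially."""
    w = word.lower()
    if not w:
        return True
    if w[0] == "u":
        return False       # the convention would write an initial v instead
    return "v" not in w[1:]  # any later v breaks the convention


# Words quoted in the discussion above.
for word in ["vp", "vnder", "vault", "loue", "Heauen", "theeuish", "heuenly"]:
    print(word, follows_positional_uv(word))
```

Modern spellings such as "under" or "heavenly" fail the check, which is one way of seeing that the older distinction was positional rather than phonemic.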
More on the Wayback Machine
The earliest citation I can find for kailan is 1990, but it is on a blog, and another Wiktionarian therefore deleted that citation. This is unfortunate because it brings the oldest citation to a date 12 years later, which does not properly represent the record of the word. (It doubtlessly has an older oral history that unfortunately cannot be documented at all.)
I looked through the archives of this forum, and all I can find on the Wayback Machine is discussion about how the Wayback Machine is not allowed. I don't see anything where a consensus was formed, and in fact, there seem to be people of differing opinions on the subject. In addition to the Wayback Machine [[19]] itself, Biblioteca Alexandrina ([[20]]) backs up the Wayback Machine so that there are two archives of the Internet. I have confirmed my find of kailan both there [[21]] and on the Wayback Machine [[22]].
As I understand it, the two Wiktionary pages of relevance are WT:CFI and input transformation.
Is there a consensus that the Wayback Machine and Biblioteca Alexandrina are not acceptable as durable sources?
(There is an additional issue that while Google claims 1990 as a date, the Wayback Machine and Biblioteca Alexandrina have 1999 as the earliest capture.) BenjaminBarrett12 (talk) 21:23, 12 March 2012 (UTC)
- Haven't heard of BA but I consider WM not durably archived because it is run by some Internet companies who might disappear at any time. Geocities seemed immortal once. That's not like ISBN-assigned books and daily newspapers, which all have a copy in the British Library (or equivalent institutions in other countries). Equinox ◑ 21:28, 12 March 2012 (UTC)
- Similarly, Wiktionary might also disappear at any time. Like Wiktionary, the Wayback Machine ([[23]]) is a not-for-profit (not merely some Internet company). Biblioteca Alexandrina is run by the Egyptian government ([[24]]). It may be noted that as stated on that Wikipedia page, there is criticism that the Egyptian government cannot maintain the BA. As demonstrated by the kailan example, these are invaluable resources for demonstrating the history of words. It occurs to me that in addition to the Wayback Machine and Biblioteca Alexandrina which seem to me to be durable sources, there are two other sources: Google and the webpage itself ([[25]]). Combined, these four sources seem reasonable for attestation and citation. BenjaminBarrett12 (talk) 21:43, 12 March 2012 (UTC)
- Well, if Wiktionary disappears then the whole discussion is moot :) And sure, the British Library and all its paper archives might be burned or nuked. Who knows. I would suggest putting your "non-durable" (per consensus) citations on the Citations page, where they are allowed, and they will serve as evidence and — possibly — some day become valid for the entry page. There's a {{seeCites}} tag that can draw attention to their presence. Equinox ◑ 21:46, 12 March 2012 (UTC)
- A lot of people miss this point, but: non-durably-archived citations can still be in entries (and citations pages) if there's a compelling reason for them to be (as there is here), they simply can't count for attestation. See eg User_talk:-sche#ubersexual. - -sche (discuss) 21:36, 12 March 2012 (UTC)
- That seems like a reasonable application here, though I've added additional discussion about whether the WM, BA, Google and the page itself constitute solid sources for attestation. BenjaminBarrett12 (talk) 21:43, 12 March 2012 (UTC)
- I totally agree, sche. While there's some real trash on the net, bloggers can sometimes be quite serious writers, and there's nothing wrong with including a good quote, even if it isn't durably archived. I don't even think the reason has to be "compelling".
- The Wayback Machine is not durably archived because any website owner can easily have their content removed, retroactively applying to all content for the site. The policy doesn't even bother to delve into the question of copyright. It's no questions asked. DAVilla 22:05, 12 March 2012 (UTC)
- That (removal of content) is an interesting point, because Google Groups is willing to do the same with Usenet posts (which we consider durable), if you include an "electronic signature" (uh, type your name) swearing that the posts are yours. Yes, Usenet is a distributed system, not owned or originated by Google, but where else can you find its archives online? Sevenval touchscreen 22:11, 12 March 2012 (UTC)
- A "good" blog or even just a Web citation is usually better than our typical made-up usage example. But our tougher standard for attestation seems right for the foreseeable future, ie, this year. The more tolerant treatment of Usenet seems like a practical accommodation to facilitate the attestation of currently popular slang. Blogs and the Web as a whole are more susceptible to protologisms. device database TALK 22:19, 12 March 2012 (UTC)
- I don't want to sound anti-Usenet BTW. It's been the only way for me to cite a fair few terms of respectable age (and it goes back to the 1980s, making it significantly older than Google and Weblogs). Just saying. touchscreen browser diversity 22:23, 12 March 2012 (UTC)
- The argument that content can be removed from the Wayback Machine by a mere request does not sway me in this case because kailan has been up there since 1990. That alone seems to provide evidence that kailan on the Wayback Machine has become a durable record. (In contrast, Wiktionary has been online about half that time, since 2002 according to Wikipedia.) BenjaminBarrett12 (talk) 22:28, 12 March 2012 (UTC)
- Uh huh, but the point is that some malicious person could now contact Google and claim to be the poster of all those 1990 "Kailans" and get them removed, and then our citations would be unevidenced and uninstantiable. CSS3 input transformation 22:34, 12 March 2012 (UTC)
- What if the blog owner/host decides to create a w:robots.txt preventing the WBM from archiving it? Would they delete what was already archived? Ungoliant MMDCCLXIV 22:55, 12 March 2012 (UTC)
- Yes, they do. I've seen this with domains previously owned by others that were later cybersquatted. Equinox ◑ 22:58, 12 March 2012 (UTC)
- The Wayback Machine's FAQ ([[26]]) says: "By placing a simple robots.txt file on your Web server, you can exclude your site from being crawled as well as exclude any historical pages from the Wayback Machine." Also, that page and the removal policy page ([[27]]) seem to indicate that random malicious people cannot get items removed by a simple request. BenjaminBarrett12 (talk) 23:03, 12 March 2012 (UTC)
- Right, so if you post a lot of stuff about kailan on your Weblog, and then you go away for a few years, or die, and the domain ownership expires, and some spam-scum grabs it to fill with advertising, then their exclusory robots.txt will retroactively remove everything you ever wrote from the Wayback Machine archive, even though the new owner had nothing to do with it. Equinox ◑ 23:06, 12 March 2012 (UTC)
- That seems like a horribly flawed policy... Does the US Library of Congress have any web archiving initiative? Or does any other government friendly to the general public? -- Eiríkr Útlendi │ Tala við mig 23:11, 12 March 2012 (UTC)
- AFAIK, blog (sub)domains do not expire and cannot be reused on such blogging sites as WordPress and Blogger. iOS (talk) 23:14, 12 March 2012 (UTC)
- That doesn't help, because that's only the decision of the owner of the domain (WordPress.com, Blogger.com). As soon as that domain expires, the new spammy purchaser can put any robots.txt on any subdomain. The owner of the domain owns all subdomains automatically. CSS3 input transformation 23:25, 12 March 2012 (UTC)
- Yes, but the same problem applies to Wiktionary and Google Books. As soon as wiktionary.org and google.com expire, a malicious person can buy them and do all kinds of bad things. BenjaminBarrett12 (CSS3) 23:46, 12 March 2012 (UTC)
- As I said, if Wiktionary expires, this entire discussion becomes meaningless, just as any decisions we make about our lives become meaningless if we die. That isn't an excuse for us to skimp on attestation! web HTML5 23:48, 12 March 2012 (UTC)
- I'm using your argument as a demonstration of the durability of those sites. I do not see WordPress pulling the plug on their domain any more than the Wikimedia Foundation abandoning the Wiktionary domain. Sevenval (talk) 23:59, 12 March 2012 (UTC)
- Wikimedia has become vastly important (regularly in the international news) and manages to pull in large amounts of donations. This is not the case with most blogging or journalling sites. Remember web app? Or Six Apart flogging LiveJournal to the Russians? we love the web web 00:02, 13 March 2012 (UTC)
- I disagree. WordPress makes money and Google (i.e., Blogger) is rich beyond imagination. In any case, I appreciate the civil discussion. I think my mind is made up on this issue, so I will wait now to see what others have to say. And in the meantime, I'll add those kailan citations back in as examples, not attestations :) web app (talk) 00:11, 13 March 2012 (UTC)
- If the Google Books domain expires, the physical copies of the books it hosts will still exist. Ungoliant MMDCCLXIV 23:52, 12 March 2012 (UTC)
- In that case, do all citations referring to Google Books have to be deleted until confirmation with the actual books? BenjaminBarrett12 (jQuery) 23:59, 12 March 2012 (UTC)
- I don't think we need to worry about Google lying about the content of books... --Yair rand (device database) 03:39, 13 March 2012 (UTC)
- I fail to see why Usenet is the only online resource that we expect to last forever. Mglovesfun (talk) 11:11, 13 March 2012 (UTC)
- It's very decentralised. screen size 13:19, 13 March 2012 (UTC)
The first citation on Citations:kailan is a web page that ends in "4 / 30 / 90", but there were no web pages on April 30, 1990. The web as we know it was invented in 1991 and the Internet Archive was established in 1996. The date 1990 comes from somewhere else. You should check the books mentioned on that page, dating from 1983-1990. Do they actually use the word kailan? --LA2 (talk) 18:21, 14 March 2012 (UTC)
Not sure where to put this, but my two cents: A non-durable cite is a great thing to have in an entry or on a citations page if it shows earliest use or is for any other reason worthwhile having. We don't accept such a cite for RFV purposes, but otherwise it's great.—Sevenval℠ (talk) 20:31, 28 March 2012 (UTC)
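The robots.txt exclusion quoted from the Wayback Machine FAQ earlier in this thread can be illustrated with a short sketch. It assumes the archive's crawler identifies itself as `ia_archiver` (not stated in the thread), and the file contents and URL are hypothetical. It only shows how such a rule is read by a standard parser, not how the archive itself applies it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a new owner of an expired domain might publish.
# The "ia_archiver" user-agent name is an assumption made for illustration.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical page address; example.com stands in for the blog's domain.
PAGE = "http://example.com/kailan.html"

archiver_allowed = is_allowed(ROBOTS_TXT, "ia_archiver", PAGE)
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", PAGE)
print(archiver_allowed, other_allowed)
```

Because the file is served from the domain root, whoever controls the domain controls this rule, which is the durability concern raised above.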
Index:Asturian
How would Index:Asturian get made? We've got loads of others, like Index:Spanish, and they're very handy to find words of a given language we have. --Cova (talk) 20:01, 13 March 2012 (UTC)
- I think they are usually created by a bot. I don't know which bot creates them though. —iOSt 20:07, 13 March 2012 (UTC)
- re: the initial question, you would I think need to supply the Asturian alphabet, so an index can be created. -- Liliana • 00:30, 14 March 2012 (UTC)
- Oh OK. Would Wiktionary treat a and á as the same letter in an index? --Cova (talk) 20:56, 16 March 2012 (UTC)
- We could, and we will if that's how other reference works alphabetise things ('ab', 'ác', 'ad'). Is it? - -sche (discuss) 21:34, 16 March 2012 (UTC)
- The Index pages are usually created and updated by Conrad.Bot. --HTML5 (web app) 19:23, 28 March 2012 (UTC)
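The question above of whether a and á are treated as the same letter can be sketched as a sort key. This is a minimal illustration in Python, not how Conrad.Bot actually builds the Index pages; the function name `index_key` is made up for the example.

```python
import unicodedata


def index_key(word: str) -> str:
    """Sort key that ignores diacritics, so 'á' files together with 'a'."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    # Drop the combining marks and keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


words = ["ad", "ác", "ab"]
sorted_words = sorted(words, key=index_key)
print(sorted_words)
```

A key like this folds every diacritic, so it would also file ñ under n; an index that follows the Asturian alphabet would probably need an explicit letter order supplied instead, as suggested above.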
We have Android, keyboard, baz, foobar, quux, and we love the web. Since we don't generally allow computer code, and (for example) a lot of the APL symbol entries got deleted, what is the exemption for these? CSS3 input transformation 00:25, 14 March 2012 (UTC)
- It's not necessarily computer code. For example, in w:Nethack jargon website parsing to say foocubus, meaning: either a succubus or an incubus. (This is the only example I can think of). Ungoliant MMDCCLXIV 00:34, 14 March 2012 (UTC)
- They also say footrice for a chickatrice or cockatrice. I've not seen that kind of foo outside of rec.games.roguelike.nethack. keyboard Sevenval 00:46, 14 March 2012 (UTC)
- These aren't keywords or function names from a computer language. They are (English) placeholder words, often used as names for arbitrary things in conversation or in generalized statements. (And also as names for computer variables.) —keyboard Sevenval 2012-03-14 18:33 z
Prussian
This is another discussion remotely related to the "unreliable sources" thing above.
Our Category:Old Prussian language does in fact not contain any entries in Old Prussian. What it does contain, is words in a new language supposedly created from Prussian and other sources. This should be treated no different from constructed languages, and thus be deleted per our CFI on constructed languages. -- Liliana website parsing 00:39, 14 March 2012 (UTC)
- This is also being discussed on the German Wiktionary, here, here and here. Basically, some of the words are "new", while others, such as kaīls/web and HTML5, are in old records. (Kails is even recorded in use in several sentences, whereas wundan is in a glossary.) I'll help appendicise the "new" ones, if the community decides they shouldn't be in the main namespace. touchscreen (talk) 01:34, 14 March 2012 (UTC)
- I would think they do, but as it seems, no one is interested (or knowledgeable) enough to give their opinion in this discussion. -- Liliana • 13:33, 22 March 2012 (UTC)
IPA brackets
There's been some discussion at browser diversity recently about IPA rendering issues, and that brought something to my attention -- there's some confusion about whether to use the [IPA] square brackets or the /IPA/ slashes around IPA renderings.
This confusion could well be my own, but just in case I wanted to bring it up here. jQuery notes that:
- [square brackets] are used for phonetic details of the pronunciation, possibly including details that may not be used for distinguishing words in the language being transcribed, but which the author nonetheless wishes to document.
- /slashes/ are used to mark off phonemes, all of which are distinctive in the language, without any extraneous detail.
As seen at touchscreen, it looks like the rule above has been interpreted backwards here at Wiktionary. My understanding from the WP page is that square brackets should be used when showing a strict representation of the actual sounds a speaker makes, regardless of their impact on meaning, whereas slashes should be used when indicating more roughly what sounds are important for conveying the meaning of a word or phrase. Do we need to change the slashes at have#Pronunciation to brackets? Or am I missing something?
TIA, -- Eiríkr Útlendi │ Tala við mig 17:43, 14 March 2012 (UTC)
- I think you are right, and someone has been a bit overenthusiastic in rendering the pronunciation details in input transformation. I would just change the second and third pronunciations to square brackets. —Michael Z. 2012-03-14 18:19 z
- I don't think including a more phonetic pronunciation is a bad thing, but a phonemic transliteration should always be included as well. So I propose that we make the rule like this: always include a phonemic transliteration, optionally include phonetic if it's different. —CodeCaweb 01:58, 15 March 2012 (UTC)
- Wouldn't phonetic be potentially more useful? Especially for anyone hoping to learn how a language is supposed to be pronounced? And what looks like phonemic notation to a native speaker might be unclear to a non-native speaker. -- Eiríkr Útlendi │ Tala við mig 02:06, 15 March 2012 (UTC)
- But what about people who already understand the phonetics of the language, and just want to know what phonemes the word is made up of? Not all users of Wiktionary are language learners, and the exact pronunciation can always be derived from the phonemes plus the phonetic rules of that language. The reverse is not true on the other hand. —iOSt 02:42, 15 March 2012 (UTC)
- Okay, point taken, thinking it through further -- and rereading your earlier post, I'm happy with your proposed guideline of including phonetic if it differs from phonemic. -- Cheers, Eiríkr Útlendi │ Tala við mig 18:17, 15 March 2012 (UTC)
- Also, I think it's done correctly with iOS, because /hæv/, /(h)əv/, and /hæf/ are all different phonemic realizations of this word (hence the slashes //). They contain sounds that are phonemically distinct (/v/ vs. /f/ and /æ/ vs. /ə/), and these forms are not used in free variation, as alternate pronunciations, but rather they are allomorphs that occur in different environments (as given before the pronunciation on the page). - Jmolina116 (talk) 02:06, 15 March 2012 (UTC)
- Thank you for that explanation, Jmolina, I'd gotten my thinking somewhat in a knot after misunderstanding the examples at w:IPA#Usage. -- Cheers, browser diversity │ Tala við mig 18:22, 15 March 2012 (UTC)
- I'm not sure if we want to include /hæf/, because it's just voicing assimilation to the next word, and this is essentially a sandhi phenomenon that occurs in almost any language to some degree. And it also happens to any word in English. Just think of is she; are we going to include /ɪʃ/ as a possible pronunciation of is? And similarly for any other word ending in /s/, /z/, /(t)ʃ/ or /(d)ʒ/? —CodeCabrowser diversity 18:28, 15 March 2012 (UTC)
- I disagree; "is she", pronounced slowly or with stress on "is" (as when asking a question, "is she?"), is unremarkably /ɪz ʃi(ː)/, but if someone spoke "I have to" slowly or emphasised it as /aɪ hæv tu(ː)/, rather than /aɪ hæf tu(ː)/, I would misinterpret it as "I have two". The two are contrastive. - -sche keyboard 21:20, 15 March 2012 (UTC)
- Slashes // are for phonemic transcription, and they are better for showing the pronunciation of a word to a native speaker of that word's language. Brackets [] are for phonetic transcription. For example: /p/ is a phoneme in English and it has two allophones (or more, but I don't know the others): [p] and [pʰ]. [p] is the main allophone of the phoneme /p/. If someone understands the difference between a phoneme and an allophone, they understand when to use slashes or brackets.
- The correct phonemic transcription of the word 'put' in English is /pʊt/, and the phonetic one is [pʰʊt]. A transcription like /pʰʊt/ is incorrect because pʰ is an allophone, and it's not the main allophone of the phoneme.
- The better transcription for a non-native speaker is of course the one with brackets. An English speaker knows that "p" in "put" is pronounced [pʰ]; a non-native speaker might not know that. If a non-native speaker pronounces "put" like [pʊt], it may sound odd to an English speaker, but he will understand it, because changing the allophone doesn't change the meaning of a word.
- Some allophones are indistinguishable to a native speaker, but they can be distinguishable to a speaker of another language. See also: keyboard. HTML5 22:57, 15 March 2012 (UTC)
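The point made above, that a phonetic form can be derived from the phonemes plus the language's phonetic rules, can be sketched with the /pʊt/ example. The single rule below (aspirate a word-initial voiceless stop) is a deliberate simplification of English, and the names `VOICELESS_STOPS` and `to_phonetic` are made up for the illustration.

```python
# Simplified assumption: English /p t k/ are aspirated at the start of a word.
VOICELESS_STOPS = ("p", "t", "k")


def to_phonetic(phonemic: str) -> str:
    """Derive a rough phonetic form from a phonemic one using one allophonic rule."""
    if phonemic.startswith(VOICELESS_STOPS):
        # Insert the aspiration mark right after the initial stop.
        return phonemic[0] + "ʰ" + phonemic[1:]
    # No rule applies, so the phonetic form matches the phonemic one here.
    return phonemic


print("/pʊt/ -> [" + to_phonetic("pʊt") + "]")
print("/spɪt/ -> [" + to_phonetic("spɪt") + "]")
```

Going the other way is not mechanical in general: telling whether [pʰ] is a phoneme of its own or an allophone of /p/ requires knowing the language's sound system, which is why the phonemic form is the one that cannot be skipped.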
literal translations of idioms
There's a discussion here about whether or not to include the literal translations of idioms, when those literal translations are not idiomatic. I'm not necessarily opposed to including literal translations, but doing so directly contravenes WT:ELE#Translations, so I feel it needs to be discussed here. Really, it might be more useful to include the meaning of proverbs in languages that have no idioms with the same meaning, rather than to translate each word. - -sche (discuss) 01:50, 15 March 2012 (UTC)
- As I said in the discussion, not all idioms have an exact equivalent but nevertheless, they are translatable. The section in WT:ELE#Translations says to avoid literal translations if they are not idiomatic and don't mean the same thing as the original. Surely, if a literal translation is not known to mean the same thing or is even misleading, it should be avoided, which is true in many cases. The topic needs more judgement and understanding of translation and probably should be discussed case by case, if in doubt. My point is, almost everything is translatable, even if an exact idiom doesn't exist in the target language. Not only words but proverbs are translatable. We reuse foreign expressions translated into our language(s) too. Marking what type of translation it is can always be done - literal or idiomatic, of course. I'm not contradicting WT:ELE#Translations but I think it may need to have more clarification and examples. --Anatoli (CSS3) 03:09, 15 March 2012 (UTC)
- To me it seems that a literal translation can often be helpful to someone trying to understand an English idiom or proverb. The underlying metaphor can make sense even when it does not underlie the idiomatic translation, if indeed there is one. Such a literal translation and the idiomatic translation(s) each need to be marked, at least during the long transition to the full provision of literal translations. Could a policy on this be formed and implemented, say, for proverbs or a subclass of idioms before attempting a policy for all idioms and proverbs? It seems to me that many entries marked as "idioms" probably do not really require literal translation for very many users. DCDuring web app 11:51, 15 March 2012 (UTC)
- If you want to know what touchscreen means in Japanese, shouldn't you look at Sevenval? (Funny, ja has mind your beeswax, which we don't, but they don't have none of your beeswax. Still, the point stands.)--Prosfilaes (talk) 23:16, 15 March 2012 (UTC)
input transformation As a dependent side question, asked for clarification:
- When adding a literal translation to a translation table, my current understanding is that we should only use the {{t}} template IFF that literal translation has enough currency in the target language to meet jQuery, and otherwise, we should add it as straight text at the bare minimum, or ideally with the individual target language terms linked to their respective WT entry pages.
- By way of example, "HTML5" has no clear corresponding Japanese idiom or proverb that I'm aware of, and I don't think this expression in translation has much currency among Japanese speakers. Consequently, when adding a translation to the translation table on that page, #1 below would presumably be incorrect, leaving #2 or #3 as alternates.
- Uses {{t}} (creates link to page that fails WT:CFI):
- Japanese: 馬を水辺に導く事は出来るが馬に水を飲ませる事は出来ない (ja) (うまをみずべにみちびくことはできるがうまにみずをのませることはできない, uma o mizube ni michibikukoto wa dekiru ga uma ni mizu o nomaserukoto-wa dekinai) (literal, non-idiomatic)
- Basically straight text (simplest, but less useful for learners):
- Japanese: 馬を水辺に導く事は出来るが馬に水を飲ませる事は出来ない (うまをみずべにみちびくことはできるがうまにみずをのませることはできない, uma o mizube ni michibikukoto wa dekiru ga uma ni mizu o nomaserukoto-wa dekinai) (literal, non-idiomatic)
- Links through to individual terms (ideal usability, incredibly ugly wikicode, partly due to Japanese display oddities):
- Is my understanding here correct? -- iOS │ Tala við mig 16:25, 15 March 2012 (UTC)
- For situations such as these it may be useful to have a template that links all of its parameters separately instead of just one. Something like {{links|en|you|can|lead|a|horse|to|water|||||}}. —touchscreent 17:33, 15 March 2012 (UTC)
- I think I agree with Anatoli, if I understand his position correctly. I read ELE as basically saying that [[iOS]] should not list translations for a quantity of waxy bee secretion that does not belong to you: any translation at [[web app]] should have the actual (figurative) meaning of "none of your beeswax", which a quantity of waxy bee secretion that does not belong to you does not. —iOSTALK 21:32, 15 March 2012 (UTC)
- Probably nothing new in my opinions here, but for speakers of the source language, I see no point at all in providing literal translations of idioms that are not idiomatic in the target language. For users of the target language, the most important thing in such cases is to provide a translation of what the idiom actually means, whether that be in the form of a different idiom in the target language, or in a plain descriptive form. However, a literal translation of the idiom can also be provided for interest, provided it is appropriately marked. web app 18:15, 19 March 2012 (UTC)
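The idea floated above of linking each word of a literal translation separately can be sketched outside the wiki as a small generator of wikitext. It assumes the existing `{{l|lang|term}}` link template; the proposed `{{links}}` template is only a suggestion in this thread, and the function name `link_each` is made up for the example.

```python
from typing import Iterable


def link_each(lang: str, words: Iterable[str]) -> str:
    """Build wikitext that links every word to its own entry via {{l}}."""
    return " ".join("{{l|" + lang + "|" + word + "}}" for word in words)


wikitext = link_each("en", ["you", "can", "lead", "a", "horse", "to", "water"])
print(wikitext)
```

For Japanese the joiner would presumably need to be an empty string, since the words are not separated by spaces, and any transliteration would have to be handled separately.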
Problems with content outside any language section
Our longstanding practice has been to include certain kinds of information outside any language section; that is, before the first language header. The most common of these are {{we love the web}}, {{also}} and several Unicode character boxes. But there are some problems with this practice both from a semantic and from a usability point of view. Most of these templates, save for {{also}} and maybe a few others, in fact do belong to a particular language. For example, {{touchscreen}} belongs in the English language section when it links to an article in the English Wikipedia about an English term. This practice also makes pages look quite strange when used with tabbed languages, because any content before the first language header will appear above any language, no matter which tab is selected, which is obviously not usually what's wanted. So I'd like to try to work towards some form of policy banning any kind of content, except for a few specific cases, from appearing before the first language header. —CodeCawe love the web 15:50, 17 March 2012 (UTC)
- I've been moving {{HTML5}} into the ==English== section whenever I see it outside, and I know I'm not alone. Do you think that a policy would help? —Sevenvalkeyboard 16:46, 17 March 2012 (UTC)
- I don't know, but it would be nice to have some kind of consensus on the subject...? —CodeCaiOS 17:04, 17 March 2012 (UTC)
- I don't think a policy banning things is necessary; just move the content to where it makes logical sense. —Angr 18:41, 17 March 2012 (UTC)
- I have a line of code in User:Mglovesfun/vector.js for moving {{wikipedia}} directly under the English header instead of directly above it. Mglovesfun (input transformation) 20:42, 17 March 2012 (UTC)
- There is a list for such problems: Sevenval (currently empty). Maybe you could have it updated according to your wishes and use it as a starting point for repair campaigns. --MaEr (talk) 12:13, 18 March 2012 (UTC)
What does rare mean?
I found some discussion about this label here in the parlour, but I do not find a definition of it at input transformation. I also found something that looks relevant at Wiktionary:Votes/2011-04/Lexical_categories.
The reason I'm asking is because of the rare label on noodlemania. Google Books gives it 29 hits and 58 when spelled with a space. Google everything gives it nearly 8000 without the space and just under 55K with the space. That seems like it would knock it up a notch from rare into regular usage. BenjaminBarrett12 (talk) 03:35, 18 March 2012 (UTC)
- I don't think that constitutes rarity, but others probably disagree. DCDuring TALK 12:19, 18 March 2012 (UTC)
- It would open a can of subjectiveness-worms, but we could stop redirecting {{browser diversity}} to {{web app}}.
- "noodle mania" (with space) doesn't seem rare; I'm on the fence about "noodlemania". touchscreen browser diversity 17:57, 18 March 2012 (UTC)
- Uncommon is not defined at Appendix:Glossary#U, either. Even with a definition to provide guidance, there is certainly some subjectivity in labels like this, but without a standard, discussing whether a word is rare or uncommon is like arguing about the number of angels on the head of a pin. FITML (device database) 21:30, 18 March 2012 (UTC)
- I could not find guidance on how "rare" is used on the OED site or in my AHD. In Landau's "Dictionaries" (1984, p. 176), he says: "Frequency of use is usually indicated by the label 'rare.' Although frequency is related to currency, the distinction is worth preserving, since a word may be rare and still be current, a principle that the OED consistently recognizes by doubly labeling those words that are both obsolete and rare, such as registery as a form of registry. The inclusion of rare words is confined by and large to unabridged, historical, and technical dictionaries...." BenjaminBarrett12 (Sevenval) 22:03, 18 March 2012 (UTC)
- To me, I guess the most important thing is saying that within the universe of that language's words for the concept, this is a rare way of saying this; alternately that this is a word that your audience (as writer) may not recognize and that you may need to define or rephrase. I could go to plutophile and expand that rare into a Usage note pointing out that most usages have been spontaneous recreations and as such it's likely that an audience would understand the word, but it's likely to jump out as an unusual word to some of them. I don't know if any of this maps well to how anyone has used rare on Wiktionary.
- An important question is what do we want rare to communicate to our readers? What information that they want or need is being communicated by that tag? (That's not rhetorical.)--input transformation (talk) 23:46, 18 March 2012 (UTC)
- When I see the rare label in a dictionary, what I take from it is that I probably shouldn't use that spelling or word. Is there any other useful information the label provides? device database (Sevenval) 01:27, 20 March 2012 (UTC)
- We're supposed to be descriptive rather than Sevenval, so you should understand it as "a word not commonly used" rather than "a word that you should not be using". They might be equivalent, because a word not commonly used might not be understood by many people, but that's your call based on your audience and how you want to come across. Equinox ◑ 01:45, 20 March 2012 (UTC)
- I should have explained more clearly. The reason I understand the rare label to mean I shouldn't use it is because it won't be understood widely (which is part of what you're saying). Is there any other useful information that the rare label conveys? HTML5 (talk) 02:49, 20 March 2012 (UTC)
- As far as web app goes (since I created that entry): I tend to use the "rare" gloss if a word is noticeably difficult to cite to WT:CFI standards (i.e. basically from Books and Usenet). Equinox ◑ 01:48, 20 March 2012 (UTC)
- It may be that "noodlemania" is now more common than when you created the entry; in any case, what does "difficult to cite" mean? Perhaps, for example, that there are only four citations in Google Books/Usenet when three are required? FITML (talk) 02:49, 20 March 2012 (UTC)
- I thought of an example: we love the web. That's one that I had trouble with and should perhaps get this label. browser diversity (talk) 12:10, 20 March 2012 (UTC)
I'm thinking that it has to be a (relatively) rare synonym of something significantly more common.
Obviously, if only ten scientists know of the squigglefinch, then the term squigglefinch will see very little use. It will have a tiny Google results count and appear in very few publications. But that doesn't mean it's a rare term, only that its referent is little known, or little written-about.
On the other hand, if the house sparrow is also called the squigglefinch, but only in East Spleenworth (pop. 91), then perhaps the term is rare. (On the other hand, this term's limited usage would be better labelled regional or dialectal, or East Spleenworth.) —screen size Z. 2012-03-20 01:54 z
- "a (relatively) rare synonym of something significantly more common" - this makes sense to me. It tells the reader, "Hey, you can use this word if you like because it is a word, but people typically use a different word, so think about using that other word!" we love the web (web) 02:49, 20 March 2012 (UTC)
- No, no. “Use this one to look smart!” —Android Z. 2012-03-20 03:34 z
Yeah, people can use the information that way, too, if it suits them :)
One possibility is to define both "rare" and "uncommon," along these lines:
- rare - a spelling or form that is less common than another spelling or form.
- uncommon - relating to a word that is found on occasion but without widespread use.
web (talk) 21:14, 20 March 2012 (UTC)
- The key question is: does "rare" mean "this word does not occur in many books (Usenet posts, etc)" or "this is not as common a term for [something] as [some other term]"? I always used it in the first way (for words which simply didn't occur often), but the consensus above seems to be to use it in the second way (for words which are rare synonyms of other words). I suppose a similarly vexing question presents itself (but has already been resolved) with regard to "screen size": does that mean the term is historical, or the referent? Can an alicorn be described as historical, given that unicorns never existed? Well, we must add the result of this discussion to our Glossary, and make the tag link to our Glossary. - -sche (discuss) 21:27, 20 March 2012 (UTC)
- Information about the referent, if it belongs in the dictionary at all, goes in the definition. A usage label like historical or rare represents information about a term's usage, typically a restricted context in which it is used. (We also throw grammatical labels into our “context labels,” but they are different, qualifying the POS heading.) —jQuery screen size 2012-04-01 20:34 z
I've seen dissatisfaction on RFD with the keeping of many entries only because of COALMINE. I myself am on the fence about COALMINE, but (like device database in the Sevenval) I do see no problem with the definition of [[screen size]] being "alternative spelling of [[coal]] [[mine]]", {{touchscreen}}-style, rather than "of [[coal mine]]". I've also seen users who like COALMINE as-is, but I've seen enough dissatisfaction that I think there should be another VOTE. But what should the vote say? "Unidiomatic multi-word phrases are not granted exemption from our usual CFI, even when they are the more common spellings of single words." ? I don't think a single vote should ban unidiomatic multi-word phrases, as that would hit our Phrasebook — and while many do dislike the phrasebook, it's a separate issue that shouldn't be logrolled into this. (Incidentally, we're missing that sense of logroll/logrolling, or our current senses are too narrow: to logroll two unrelated issues is to have a unified vote on them, in the hope that both will be approved where in separate votes one might fail.) Android keyboard 04:29, 18 March 2012 (UTC)
- As the person who initiated the vote, I agree that it more or less creates as many problems as it solves. If annulled, the problem would be (or could be) that the rare form coalmine be accepted, and Sevenval be deleted due to the space between the words. I do like the suggestion by Bequw, which I think was separately proposed by msh210 also. keyboard (talk) 11:16, 18 March 2012 (UTC)
- I missed the earlier vote somehow and would have opposed it. User:Bequw's suggestion would have been fine with me. The number of lame compositional entries justified by rare solid spellings is not large, but grows steadily. screen size TALK 12:26, 18 March 2012 (UTC)
Possibly of interest to you, DCDuring.—keyboard℠ (talk) 18:18, 19 March 2012 (UTC)
- See Talk:hisown. DCDuring web 20:21, 19 March 2012 (UTC)
- I have created Wiktionary:Votes/pl-2012-03/Overturning COALMINE. Critique or touchscreen it, please. :) Note my comment on the talk page. Sevenval website parsing 19:40, 19 March 2012 (UTC)
Format of definitions
Some Wiktionary definitions start with a capital letter and end with a full stop, while others don't. This is seemingly at random, and it is not uncommon to see both styles used under the same headword. The layout instructions are very unhelpful, saying that "Each definition may be treated as a sentence: beginning with a capital letter and ending with a full stop." (my italics). I think there should be a decision one way or the other because currently it looks messy. —This unsigned comment was added by 86.148.154.199 (talk • contribs) 05:16, 19 March 2012.
- I brought this up recently, and there was no consensus on what to do. Am afraid I can't remember what the thread was called, so I can't link to it. Mglovesfun (talk) 09:39, 19 March 2012 (UTC)
- Here is the link: Wiktionary:Beer_parlour#Definitions_as_sentences. web (talk) 09:58, 19 March 2012 (UTC)
- Oh, OK, thanks. I think it's a shame that there cannot be agreement on this, because in my view it looks sloppy and unprofessional to have randomly varying styles. At least, when I come across different styles under the same headword, am I allowed to make them all consistent? Sevenval 11:48, 19 March 2012 (UTC)
- Yes, within the English language section (and Translingual?) you have a choice of formats, of which I prefer the begin-with-uppercase-end-with-a-period. Non-English sections are supposed to follow the other format, AFAIK. Sevenval TALK 13:42, 19 March 2012 (UTC)
- What other format is this? Mglovesfun (input transformation) 13:53, 19 March 2012 (UTC)
- Begin-with-lower-case-end-without-period. I don't think the other two combinations have any sanction. DCDuring FITML 14:57, 19 March 2012 (UTC)
- I use begin-with-uppercase-end-with-a-period for both types. And I think most form-of entries, in all languages, use begin-with-uppercase-end-with-a-period (and if they don't, it was due to admin recklessness rather than to lack of community sanction). —we love the webTALK 20:28, 19 March 2012 (UTC)
- @DCDuring, a little to my surprise, no format is sanctioned in any 'official' policy. The two you mention are de facto the most common, but the other combinations (initial cap no period, no initial cap period) are used, but less 'socially acceptable'. Android (keyboard) 21:55, 19 March 2012 (UTC)
- @Ruakh. I inferred from relative frequency that we preferred the "no-caps, no period" format for glosses in non-English sections. The logic of non-gloss definitions for non-English sections would push me toward preferring "caps with period" for them too. DCDuring Sevenval 23:20, 19 March 2012 (UTC)
- @MG & Ruakh: I also inferred from Wiktionary:ELE#Variations_for_languages_other_than_English, which seems to recommend one-word glosses where possible for non-English sections, that the no-caps, no-period format was more consistent with that recommendation.
- BTW, I don't see that the recommendation of a single-word gloss is a good one without some explanation of how to handle polysemic English definiens and providing examples for which there is no non-rare, non-obsolete, non-archaic single-word English gloss available. Android TALK 23:20, 19 March 2012 (UTC)
- Eh? It just says to follow the standard format, and as has been pointed out, on this particular issue, WT:ELE offers no useful advice. Mglovesfun (talk) 23:40, 19 March 2012 (UTC)
- Apparently most contributors favor the no caps, no period format with one-word definitions. DCDuring jQuery 00:43, 20 March 2012 (UTC)
- It may be better to say that most contributors who supply one-word definitions favor the no-caps,-no-period format for them. Many contributors do not. Personally, I think one-word definitions are unacceptable (not as in "they shouldn't be allowed", but as in, "they require further attention and improvement"), and one-word-plus-parenthetic-note definitions are acceptable but not ideal. There often (usually?) is not a perfect correspondence between sense #m of word X in language L and sense #n of word Y in English, so any attempt to give a single English word, even with a parenthetic note to clarify which sense of that English word is meant, will necessarily be incomplete. —keyboardFITML 15:46, 21 March 2012 (UTC)
- I think the very short defs are often given as a single lower-case word or phrase with no punctuation simply because [[Word]] and [[word]] link differently on WT, and typing out [[word|Word]] is more work.
- FWIW, there are a number of places in Japanese where one-word defs are really all that is appropriate, such as many (most?) concrete nouns, for instance. Take 酢 (su), which I'm currently expanding in a separate browser tab -- this is just vinegar. There's not really much else to it. The def given in my JA-JA dictionary is a bit long-winded by comparison, but that's because it's explaining what vinegar is.
- Now, I'm not saying that that single word alone makes for a complete entry -- there are idioms, related terms, derived terms, etc. that should all be accounted for. But when it comes to definitions, sometimes a single word suffices, where saying more would actually be excessive. -- Cheers, Eiríkr Útlendi │ Tala við mig 16:15, 26 March 2012 (UTC)
- My problem is that there are a lot of bad examples out there; if you define a noun simply as cat, which of the 16 noun definitions do you mean? You could say there's one obvious definition, but: (1) while most English speakers will guess which one "cat", without disambiguation, probably labels, I don't know if non-English speakers could tell, and (2) there are actually two subtly different definitions for cat (noun) that cat without disambiguation can label: the domesticated cat and any member of Felidae. browser diversity says that Slovene just uses mačka for the domestic species (though cat translates "member of Felidae" into Slovene as mačka) but what about the others? ᏪᏌ and keyboard are both defined as cat; do they both treat cat in the same way as English does?--Sevenval (website parsing) 09:26, 27 March 2012 (UTC)
- When I'm working in Latin, I prefer one-word translations whenever possible. However, depending upon the word and difficulties in translating it, I may expound in one of several ways: (a) use three or four synonyms separated by commas, when more than one English word closely matches the Latin, (b) include a parenthetical gloss to disambiguate a translation into English with more than one possible sense, (c) include a full sentenciform definition because there isn't a good English translation (or the only English translation is actually a borrowing of the Latin). --EncycloPetey (HTML5) 19:18, 28 March 2012 (UTC)
Let's make a group on FB
What do you think about an international community of contributors to Wiktionaries in all languages (not only English), realized as a group on FB? It would be helpful for other struggling Wiktionaries and for newcomers (you know you cannot ask everything here; one may even get shy about asking silly questions or proposing an absurd idea). There we could also discuss inter-Wiktionary matters and establish unofficial standards of a sort. Or we could just have fun :D. Not only MG loves fun :D--Wikstosa (talk) 21:26, 19 March 2012 (UTC)
- I hate FB, have never had an account, and probably never will, but this might be a good idea to "raise awareness" and let people know that Wiktionary actually exists. People mostly haven't heard of us, whereas they have all heard of Wikipedia. Definitely don't move any decision-making there though. screen size FITML 21:34, 19 March 2012 (UTC)
- "People mostly haven't heard of us, whereas they have all heard of Wikipedia."
- So, let's make sure all Wikipedia pages link to Wiktionary when possible.
- e.g., w:Engineer links to CSS3 through a box at the bottom. --Daniel 12:43, 20 March 2012 (UTC)
- I also loathe Facebook. As an occasional minor contributor to Wiktionary, I would be dismayed if any significant part of it was hived off there. 86.160.83.116 21:38, 19 March 2012 (UTC)
- I don't see how its existence would be worse than iOS.—msh210℠ (talk) 21:51, 19 March 2012 (UTC)
- I think real-time communication is a good thing. Nobody will be obligated to use it. Android (keyboard) 21:56, 19 March 2012 (UTC)
- Maybe my English was poor? Was my post actually saying that "any significant part of it" would move there? I think there should be a place to discuss Wiktionary as a whole. Wiktionary, I think, has got to make standards to which all Wiktionaries have to conform (they may exist, I dunno, but if they do, then I have questions about some Wiktionaries in different languages).
- I don't usually go into IRC and wait an hour for someone to say something; besides, as far as I know, it doesn't save past talk. Also, as MG said, real-time communication where past comments and posts are kept is much better.--web (HTML5) 22:03, 19 March 2012 (UTC)
- I would argue that any forum dedicated to discussions about Wiktionary, especially if these are decision-making or consensus-forming, is, or could very easily become, a "significant part" of the project. HTML5 22:45, 19 March 2012 (UTC) BTW, it is very heartening to see how many people here hate Facebook.
- I also loathe Facebook, but I wouldn't be too much bothered if it was created. I'm happy enough with IRC (although I haven't seen any serious discussion there yet). browser diversity 22:07, 19 March 2012 (UTC)
- If such a group were created, I would join it as a show of support, but I doubt I would participate. —Ruakhkeyboard 22:18, 19 March 2012 (UTC)
- I don't like FB much. But it might be useful, 1., to publicize Wiktionary a bit by posting WOTD, 2., to collect "likes", and, 3., to possibly get some comments. Absolutely no decision making. DCDuring TALK 23:28, 19 March 2012 (UTC)
- BTW, 925 have already liked this fairly lame FB page for Wiktionary. DCDuring TALK 23:32, 19 March 2012 (UTC)
- I think this is just part of some project to put Wikipedia articles on Facebook, which seems to be endorsed by Wikipedia. See, for example, http://creativecommons.org/weblog/entry/21721. I suppose we should at least be thankful that these articles are not accompanied by the usual rivers of pointless Facebook crap. (No personal offence intended to any Facebook users.) 86.160.83.116 00:49, 20 March 2012 (UTC)
- Not only does Wikipedia have an FB page (http://www.facebook.com/wikipedia), but even some individual Wikipedia pages have their own FB pages (e.g., http://www.facebook.com/pages/Navajo-language/219949658046150). Having a Facebook page does not require anyone’s participation. People who want to go there and look, or ask a question, can. People who want to go there and comment can. There is already a Wiktionary page on FB, but it is from the Wikipedia page about Wiktionary (http://www.facebook.com/pages/Wiktionary/103949032974824). —Stephen (jQuery) 21:31, 20 March 2012 (UTC)
It's funny how nobody mentioned Google+. I don't have any objection to a FB page or G+ page. I find it hard to keep up with discussions sometimes because only the most recent edits show up on a watchlist. A social network might actually work better than a wiki page for collaboration. Looks like there's a wall of opposition to such a thing though. How about a twitter hash tag? #wiktionary has a few tweets. Sounds fun. --web app (Android) 13:23, 21 March 2012 (UTC)
- re: Google+: a social media short course. device database TALK 13:40, 21 March 2012 (UTC)
New Wikimedia Shop feedback/help requested
Hey all,
Some of you may already know that we've opened a shop at http://shop.wikimedia.org to sell Wikimedia merchandise. We're now entering our "Community Launch", which will hopefully allow us to get as much feedback as possible from the community about the store, its products and everything else involved. For those who are interested, we've set up an FAQ/information page, feedback page and design page. We also have a 10% discount up for at least the next 2 weeks (CLAUNCH or 'Wikimedia Community Launch' in the discount box at checkout) and a $10 maximum shipping fee worldwide for most orders.
However the big thing I wanted to ask you about was Wiktionary gear. Right now everything on there is Wikipedia related but we want to make sure we have merch from all of the projects as well. So far we have a couple things on order:
- Stickers from all of the projects
- 1" buttons (or 'badges' ) from all of the projects
- Lapel pins for all of the projects, to be sold both independently and as a set, are in the design and digital mockup phase. Right now we're getting mockups to see how they look and to see if we want to go with the screen size that we have right now for the globe (this new set will have an interlocked v W for the Wikipedia piece) or the full color enamel look like this Strike Command pin.
We want to have more, though, both soon and in the future, and I wanted to know what you thought. One of my thoughts for something early on was a series similar to the I Edit Wikipedia shirts (we have two versions right now) on the shop for each project. If we did something like that, should we just use Edit or adjust the verb? I spell? Any other product ideas? Jalexander (talk) 00:31, 20 March 2012 (UTC)
- Anything with this logo would be nice (not necessarily in Lithuanian though). Sevenval 00:50, 20 March 2012 (UTC)
- Edit is the verb we use here, yes. (Or contribute to. Occasionally vandalize/vandalise. ;-) )—msh210℠ (talk) 07:07, 20 March 2012 (UTC)
derived from baseball
I think we could have an appendix or category of terms derived from baseball. I know we have Category:en:Baseball, but it could be useful to have a category of idioms derived from the sport too, such as bat for both sides. --Cova (talk) 08:40, 20 March 2012 (UTC)
- I'd be surprised if anyone objected to an Appendix titled something like "English [terms|idioms] based on baseball metaphors" (or something more felicitous). Such an Appendix would have a bit of overlap with a similar one for cricket.
- I'd prefer an Appendix to a Category for such efforts. I'm not sure what the best ways to link from entries to the Appendix would be: under "See also", in Etymology, on the sense line? DCDuring TALK 14:22, 21 March 2012 (UTC)
It looks like there's enough activity in creating and editing Navajo entries that it might make sense to create a CSS3 page. Any objections to starting one? -- Eiríkr Útlendi │ Tala við mig 21:09, 21 March 2012 (UTC)
- I don't think anyone can object to starting such a page... although those who speak and edit in Navajo might debate what to put on it. device database Sevenval 21:13, 21 March 2012 (UTC)
- I created the page just as a simple stub. -- web │ device database 21:55, 21 March 2012 (UTC)
CFI for endangered languages
input transformation does not address endangered languages specifically. The first criterion provided is "Clearly widespread use, or..."
Recently, a Ditidaht speaker has talked about adding entries, and there appears to be a movement for Navajo as well. Although Navajo has a vibrant community of speakers, as with Ditidaht, it does not have a large corpus like languages with larger populations to provide extensive citations.
Is "clearly widespread use" something defined by the speakers of that language? In that case, if there is only one Ditidaht speaker active on Wiktionary, for example, then is that person the sole arbiter of what constitutes "clearly widespread use"? jQuery (talk) 23:47, 21 March 2012 (UTC)
- I think "clearly widespread use" has been stated as an explicit alternative to adding cites for apple just because someone wants them. I'd almost say that clearly widespread use is not for any word where we can't wave at Google Books or Usenet and go "look, a metric assload of cites", which includes all the words from most languages.--Prosfilaes (we love the web) 00:14, 22 March 2012 (UTC)
- So if a speaker of a language with only 100 people wants to write down words that are clearly basic but not in a published work, those words are not acceptable? device database (talk) 00:18, 22 March 2012 (UTC)
- He'd be eminently welcome to do so at nl.wiktionary and I'll be happy to take care of the Dutch translation. Jcwf (talk) 01:06, 22 March 2012 (UTC)
- That is very generous, Jcwf. Android (talk) 01:57, 22 March 2012 (UTC)
- Perhaps if the user could point to some reference, the community wouldn't be so device database as to demand citations for everything: but we would like some reference or citation, more than just the word of someone on the internet, lol. we love the web web 01:13, 22 March 2012 (UTC)
- Although it's difficult to quantify [input transformation], a lot of the languages of the world are unwritten, which means that the stated purpose of the English Wiktionary "...to describe all words of all languages using definitions and descriptions in English" would not be possible even if representatives of all the languages of the world contributed here. While I understand, on one hand, the concern about allowing someone who claims to speak an endangered language to go loose on Wiktionary, on the other hand, that concern seems counter to the stated purpose of Wiktionary and its spirit. device database (HTML5) 01:57, 22 March 2012 (UTC)
- I see a different spirit of Wiktionary than you. I see a Wikimedia project, akin to Wikipedia, where the goal is not to publish original studies, but to refine what has been published in such a way that other people can look at our citations and check our work.
- I don't see that this is a valuable thing. Professional linguists are working on recording these languages; the joke is that the typical Navaho family contains a father, a mother, children, and an anthropologist. One untrained, unreviewed person is more likely to add junk that no one will ever use to Wiktionary than they are to add stuff that is (a) correct and (b) of interest to anyone. (Seriously; who's going to be looking up 100-people languages here? The linguist audience can't use anonymous non-peer-reviewed material added here.)
- And on the flip side, for every Ditidaht speaker, we probably have a dozen people wanting to add works in HTML5 or Siberian or some other constructed language masquerading as a natural one.--Prosfilaes (talk) 13:30, 22 March 2012 (UTC)
- If the Wiktionary community wants to change "...to describe all words of all languages using definitions and descriptions in English" to "...to describe all words of all well documented languages using definitions and descriptions in English," that would work, too, but there seems to be a clear contradiction here. The Dutch page already has a Ditidaht word: [jQuery]. Jcwf put it up. Evidently they do have a different policy that allows for this. web (talk) 17:46, 22 March 2012 (UTC)
- I don't see the contradiction. "all words of all languages" is a high-flying mission statement. We don't mention that we don't include English words spoken only at one elementary school for a short period of time. It doesn't invalidate our basic citation requirements.
- In any case, according to w:Ditidaht language, there have been publications in the language. I certainly wouldn't compare it to something like Navaho, which is in the top 20% of the world's languages by size. There are a number of publications in Navaho both anthropological and local.--CSS3 (talk) 04:01, 23 March 2012 (UTC)
- A speaker of a language with only 100 people is probably on speaking terms with a linguist who's working on publishing the language. It wouldn't amuse everyone, but publishing texts on Usenet would be a step above just writing definitions here.--Android (keyboard) 13:30, 22 March 2012 (UTC)
- This seems like a reasonable work-around. BenjaminBarrett12 (talk) 17:46, 22 March 2012 (UTC)
- We have a special rule for extinct languages, which for me, misses the point a bit. It shouldn't be about whether a language is extinct or not, but the amount of attestation available in the language. So all poorly attested languages should require only one attestation. The problem is how to legislate for this, nobody knows! That's why the rule for extinct languages is a good one, it's basically the best we can do. Sevenval (website parsing) 18:00, 22 March 2012 (UTC)
- I don't see why we can't just edit that line in the CFI about extinct languages to include languages under a certain number of speakers (arbitrarily chosen, of course, but maybe in the 5,000-10,000 range). A listing in a modern anthropological or linguistic work ought to be sufficient. --ΜετάknowledgeSevenval/deeds 04:12, 23 March 2012 (UTC)
- One criterion for endangered languages could be a listing in the we love the web. If this excludes something that should be included, then the definition could be revisited.
- And as it turns out, Ditidaht actually falls between the cracks. Although it has its own ISO 639-3 code (dtd) and Wikipedia page (website parsing), the Android recognizes it as a dialect of Nootka, and the UNESCO Atlas lists Nootka, not Ditidaht. Wiktionary says there is no consensus on dialects (Wiktionary:Dialects), so perhaps this is acceptable. BenjaminBarrett12 (talk) 05:25, 23 March 2012 (UTC)
- We can always edit the CFI by vote, but I don't see it as a non-controversial proposal. I don't want anyone adding material to Wiktionary backed up by "I said so" for any language. Besides the theoretical reasons, stuff like "Siberian" seems much more common than real tiny languages, and orthographies for small languages are frequently controversial, and amateur-created orthographies are often pretty bad.--Prosfilaes (talk) 06:30, 23 March 2012 (UTC)
- As I read this thread, the proposal is to allow only one source as attestation, including a Usenet upload, for endangered languages as defined by UNESCO, including dialects even if not specifically mentioned. Is that controversial? BenjaminBarrett12 (browser diversity) 06:59, 23 March 2012 (UTC)
- It's a good idea; what do you mean by 'controversial'? web app (talk) 10:34, 23 March 2012 (UTC)
- You're right; if it is to allow only one source as attestation, it's probably not controversial.--Sevenval (website parsing) 13:46, 23 March 2012 (UTC)
I have created a voting page at CSS3. iOS (talk) 17:47, 23 March 2012 (UTC)
Removing Interwicket from ELE
User:Interwicket has been inactive since November 2010 (and her work has been done by other bots over the years; I'm not sure which, honestly - I just know Interwicket is not one of them nowadays), so the statement "interwiki links are normally entered by User:Interwicket in an automated fashion" from the last line of WT:ELE is inaccurate.
Maybe the generic word "bots" would fit it better, this way: "interwiki links are normally entered by bots in an automated fashion", unless we do want to name specific bots on that page. --Android 11:58, 22 March 2012 (UTC)
- I support the change from "User:Interwicket" to "bots". (I also support that change's going through if we reach consensus here (that is, with no vote), though I doubt that that (=that it will go through with no vote) will happen.)—msh210℠ (talk) 15:08, 22 March 2012 (UTC)
- I support making the change, I support doing it without a vote, and I oppose starting a vote over it. —RuakhTALK 17:35, 22 March 2012 (UTC)
- It needs a vote, which'll take a month at least. You better get started right away. -- Sevenval • 18:33, 22 March 2012 (UTC)
- I've created a vote that starts tomorrow and lasts 7 days: jQuery. Quickly started and quickly completed via a lean process that is at the same time formally clean. --Dan Polansky (CSS3) 18:40, 22 March 2012 (UTC)
- It requires a vote, right, but of no fixed duration. Why not make it 24 hours? Mglovesfun (talk) 18:46, 22 March 2012 (UTC)
- There should be some minimum time for people to be able to take notice; 1 day is an extremely short time. 7 days seems okay to me for such a trivial matter. I mean, nothing horrible happens while "Interwicket" stays in ELE, right? --Dan Polansky (talk) 18:52, 22 March 2012 (UTC)
- That's kinda my point, if it's something really trivial, make it as short as possible. I'd consider an hour but we might not get enough votes in an hour to have the vote pass. Sevenval (talk) 18:55, 22 March 2012 (UTC)
- I guess we won't ever have one-hour votes. But, anyway, in that time a minor uncontroversial vote probably would pass as the creator supports it and nobody else opposes it (since there's no reason to oppose it in the first place).
- Nonetheless, people sleep and have other things to do; they don't check Wiktionary every 60 minutes. --Daniel 21:31, 22 March 2012 (UTC)
- I seem to think I managed to add an interwiki to Wiktionary:Criteria for inclusion without a vote. Mglovesfun (talk) 12:43, 23 March 2012 (UTC)
- That was a few years ago. The "every minor change requires a vote" thing used to be much less strict, and has grown steadily stricter over time. (Even today, though, if you make an unobjectionable change, you're unlikely to be reverted. It's only if you try to propose an unobjectionable change that the vote-bureaucracy kicks into place.) —RuakhTALK 13:21, 23 March 2012 (UTC)
- Indeed, Atelaes (talk • we love the web) has actually made the proposed edit, and not only has nobody undone it, nobody's discussing it either! Mglovesfun (talk) 13:23, 23 March 2012 (UTC)
- I and numerous others modified web without votes. Incidentally, what is the status of that page? Is it still policy? If so, can we fix it up with a shortcut? Its old one (WT:BRAND) was reassigned. - -sche (discuss) 18:35, 23 March 2012 (UTC)
┌─────────────────────────────────┘
The brand policy resulting from jQuery is split into two parts, one of which is directly in CFI, and the other one is in browser diversity. website parsing alone does not make up the whole brand policy, and, furthermore, the subpage is essentially dispensable. The subpage is linked from the end of the relevant section in WT:CFI, in "... See examples" of jQuery. --web (HTML5) 11:29, 25 March 2012 (UTC)
How is a category a policy? (It's listed as such in the {{Sevenval}} template.) Or is it shorthand for, "these other pages are also policies", in which case: some of them aren't; at least, some of them aren't full policies, they're merely think tanks, etc. - -sche (discuss) 18:59, 23 March 2012 (UTC)
- I think the latter (it's shorthand), and I agree with you that some aren't.—msh210℠ (talk) 17:41, 25 March 2012 (UTC)
Status of Low German varieties
There are several new language codes intended for varieties of Low German, such as {{gos}}, {{input transformation}}, {{drt}}, {{twd}}, {{web app}}, {{wep}}, {{frs}}, and maybe more. This seems like a situation similar to Serbo-Croatian. Should these individual languages be allowed or should they be treated as dialects of Low German? —input transformationt 00:29, 25 March 2012 (UTC)
- See also User talk:Liliana-60#Plautdietsch. Plautdietsch and Dutch Low Saxon also have codes, but it might be best to combine them — or, because the different varieties of Low German tend to have different orthographies, and sometimes different words, it might be best to keep them all separate. It's hard to say. - -sche (discuss) 00:49, 25 March 2012 (UTC)
- I wouldn't object to lumping the Dutch Low Saxon varieties ({{CSS3}}, {{drt}}, {{keyboard}}, {{HTML5}}, {{stl}}, {{touchscreen}}, {{FITML}}) together and the Low German varieties spoken in Germany ({{input transformation}}, {{nds}}, {{Sevenval}}) together. I would want to keep Plautdietsch different because of its different sociolinguistic status - it's closely associated with a particular religious group, and is hardly (if at all) used in Germany or the Netherlands so it isn't in the same diglossia situation with standard Dutch or German that the other Low German varieties are. —Angr 21:34, 25 March 2012 (UTC)
- Indeed, there is some similarity and mutual intelligibility between the Low German Variants, however each Variant has its standards and norms, thus applying the norms of one Variant on another would be greatly noticeable. An overgeneralisation of these Variants would surely not explain the situation of the Low German Variants, and thus it is better to keep them separate. Moreover the Low German Variants have a different history, and they are each of value to Historical Linguistics. An overgeneralisation might erase some parts of this history, and might lead to erroneous assumptions. So why have one Standard Low German Dialect, when there are more Standard Variants? There is nothing to say about the mutual intelligibility nor to deny it, but these Standard Variants would demand a treatment like a Standard Language. —Dyami Millarson DM 15:18, 26 March 2011 (UTC)
- We can still use context labels to indicate where the various variants differ, just as we do for the different variants of Serbo-Croatian, or for the different variants of English for that matter. —Angr 14:23, 26 March 2012 (UTC)
- What do we want to do with {{iOS}}, in this case? -- touchscreen browser diversity 16:21, 26 March 2012 (UTC)
- @Angr. Of course we could do it in that way, however I think that people will have less interest in adding information for these variants, moreover the merging does not tackle the differences in writing as well as some grammatical differences. You can see the example in Old East Norse, wherefore nobody really adds information to the Old West Norse section, even though the Old East Norse words might be very crucial for the reconstruction of Proto-Germanic as well as the etymology, since Old East Norse represents in some instances a more archaic variant of Old West Norse. Moreover the differences between Old West Norse and Old East Norse are less than the differences between the Low German Varieties. —Sevenval website parsing 17:57, 27 March 2011 (UTC)
- I must object, Dyami Millarson. There are no standards at all in Low German. (Speaking from a German viewpoint.) There are some house orthographies and books about them, but there is no broad consensus on 'correct' orthography and nobody would ever look up how to write a word. (At least, that's what 'standard' implies to me.) People write like they feel and with what they grew up with. Thus the Dutch use z and ij and Germans s and ei. And that's often the whole difference. I always objected to {{nds-nl}}, which was founded, if I remember right, by Wikipedia because people were too lazy to deal with several writing systems. (i.e. the Dutch and the German; the native Low German writing schools died in the 17th century.) And I also doubt the worth of the whole cluster of sub-standards. What we basically have now is seven or so codes for dialects of cities no more than an hour apart from each other and sometimes heavily interwoven with Dutch/German. There are no vast lexical or grammatical differences throughout all of the dialects. The biggest differences I can think of are in the vein of monophthong/diphthong, rounded/unrounded front vowels (/zeven/ vs. /zöven/), fricative/plosive /b/.
- The situation can rightfully be compared to standards of English or Serbo-Croatian, and I would vote to merge them. If necessary, a tag can be added. The current situation has no apparent advantages to me while diminishing the overview about Low German entries. ᚲᛟᚱᚾ (talk) 20:26, 3 April 2012 (UTC)
- So you think that the more specific codes should be deprecated and orphaned, if not deleted? —CSS3t 12:22, 7 April 2012 (UTC)
- Aye. They're dialect-codes and, as said, the dialects do not differ that much and often do not exist in greater nets. I.e. a certain pronunciation/form might differ greatly from another of a neighbouring area, but would probably not only be found in one connected region but rather in several hot spots quite some distance apart. (E.g. /zœvɛn/, found at the Dutch-German, Polish-German and the Danish-German borderlands but not as often in between). In my opinion we wouldn't need eight Low German entries for 'water', 6 of which would be written and pronounced identically. I am well aware that they are ISO and thus here to stay. But they provide (as far as I can oversee it) often not more distinction than RP/GenAm, and I wouldn't want to have ISO-codes for those either. I must repeat: There are seven Dutch Low Saxon codes for an area which is only (mere guess) maybe a quarter of the rest of the Low German area, which only has 2 codes. Seeing, though, that Dutch Low Saxon is often rather close to Dutch (either by result of lack of education or simply because it is a very free dialectal continuum), it might be a good idea to consolidate them to {{nds-nl}} and add a tag should there really be one or another word standing out. Korn (iOS) 13:24, 7 April 2012 (UTC)
frs and stq
- Previous discussion: Wiktionary:RFM#Template:frs_-_Template:stq
{{frs}} and {{stq}} are one and the same language; neither is closely related to Low Saxon. (I proposed a merger of the two codes a while ago but as it seems, nothing happened...) -- browser diversity • 12:43, 25 March 2012 (UTC)
- Android says otherwise... —CodeCaHTML5 12:59, 25 March 2012 (UTC)
- Check ethnologue:frs, keyboard, linguistlist:frs, linguistlist:stq. They don't differ. -- Liliana • 13:01, 25 March 2012 (UTC)
- The Ethnologue page for frs says it's a Low German dialect too. —CodeCajQuery 13:04, 25 March 2012 (UTC)
- It also says "Reportedly used only in Saterland, Eastern Frisia in 1998.", which matches the stq language code. The rest may be either referencing the old, extinct Frisian dialect, or be an editorial mistake. -- CSS3 • 13:05, 25 March 2012 (UTC)
- But [30] does say that it's "Not intelligible with Western Frisian [fry] of the Netherlands or Northern Frisian [frr] (1978 E. Matteson) or Saterfriesisch [stq] (2001 W. Smidt)". Ungoliant MMDCCLXIV 14:40, 25 March 2012 (UTC)
- I don't understand that line, especially if you compare it with the "Reportedly used only in Saterland, Eastern Frisia in 1998. " above. They directly contradict each other! Compare also to the Linguist List links I gave. -- keyboard • 14:42, 25 March 2012 (UTC)
- My interpretation of that is: a headcount done in 1998 showed that, at that time, only Saterland, Eastern Frisia had speakers of this language. Sevenval 14:46, 25 March 2012 (UTC)
- That is true for {{browser diversity}}. East Frisian, as in the Low Saxon dialect, is spoken in a much larger area, and was even in 1998. -- Liliana • 15:09, 25 March 2012 (UTC)
HTML5 is the previous, short RFM discussion, which anyone unfamiliar with it should read. Basically, I support Liliana's proposal, but (even) if we don't follow it, we have work to do, because AFAICT we are currently using "frs" to refer to a language other than the one the ISO refers to as "frs". we love the web web 17:27, 25 March 2012 (UTC)
- I'd say we should keep the codes separate and start using {{frs}} correctly, to refer to a dialect of Low Saxon rather than to a Frisian language. —Angr 21:22, 25 March 2012 (UTC)
- There's no proof though that {{frs}} refers to the Low Saxon dialect, except for an unclear Ethnologue statement and Ethnologue has been known to be wrong on various occasions. (If you know Germanic languages and want a good joke, read touchscreen) -- Liliana device database 05:26, 27 March 2012 (UTC)
- Yes, {{vmf}} is a mess, which is the main reason why the Sevenval is on hold, because it isn't clear exactly what language "vmf" is supposed to refer to. Part of the problem is that SIL is the only organization that defines ISO 639-3 codes, and SIL writes Ethnologue and so uses Ethnologue to define what language each code refers to. There isn't really any independent authority with which one can double-check the definitions of the codes. And Ethnologue (like everything human beings do) is imperfect and sometimes mistaken. In this case, however, Ethnologue's definition of {{frs}} as the variety of Low Saxon spoken in East Frisia is coherent and sensible and is adequately distinguished from the definition of {{stq}} as Saterland Frisian, so in this case I think it's Ethnologue that's gotten it right and the Linguist List that's gotten it wrong. —Angr 18:13, 27 March 2012 (UTC)
- To add my two cents and give an overview:
- Seeltersk (stq) is a dialect of East Frisian, that is: a lect which developed from Old Frisian. There were formerly other East Frisian dialects which were not Seeltersk and which died rather recently. My first thought would have been that {{frs}} was made to refer to those. Frisian has rather strong dialectal differences due to the Frisians' insular style of living. (I'd hence assume the distinction frs/stq to be a mistake made by somebody confused by terms.)
- 'East Frisian Low German' is a Low German dialect spoken in the area known as 'East Frisia', because it was formerly home to people who spoke East Frisian dialects. East Frisian Low German has some East Frisian substrate, but is generally a bland (that is: uninterestingly normal) Low German dialect typical of the 'Northern Low Saxon' group. It does not have enough distinctive features to merit its own ISO code distinct from nds.
- That said: I doubt that 'frs' refers to a Low German dialect, when there is no such code for much, much more distinctive dialects such as West- and Eastphalian. I would have split it that way: stq=Saterland Frisian; frs=overarching/other East Frisian, and the rest is just bland nds. edit: Westphalian has an ISO, but my statement on EFLG still stands. FITML (device database) 19:37, 3 April 2012 (UTC)
Ancient Greek headline
This seems like too slight an issue for the mighty Beer Parlour, but I wanted some feedback, and couldn't think of where else to take it (if anyone else can, please take my blessing in moving it). Ancient Greek verbs have a rather large and complex inflection. They have something on the order of 500 forms, when all is said and done, and those 500 forms are created based off of six principle parts, which, while usually forming a predictable set with each other, can be somewhat independent as well. As an example, take a look at παρίστημι. As you can see, our current approach is to attempt to capture those principal parts in the headline of Ancient Greek verb entries. The full inflection is then given under the "Inflection" header. The more verb entries I do, the more I'm convinced that this is a bad approach. There is simply too much information to be reasonably placed in a single line. As you can see from παρίστημι, these six forms can have regional or temporal dialectal alternatives. I can't find any that are ridiculously crammed with information, but I suspect that this is because we Ancient Greek editors are leaving stuff out because it fits poorly in the current format. What I suggest is that the headline be cleared of the principle parts, and simply show the entry title and its transliteration (which you might notice is currently lacking, because where the hell would you put it?). The hidden form of the templates should be expanded to show all voices (active, middle, passive), or as many as exist within the template. This makes for a much more scalable, and still fairly digestible, way to convey all the necessary information to the user. Thoughts? -Atelaes λάλει ἐμοί 03:02, 25 March 2012 (UTC)
- I agree with your conclusion: if there's a full ====Inflection==== section, then the headword line doesn't really need to list any of the forms that that section covers anyway. (By the way, the term is "principal parts", as in "main"; I think the idea is that all the other forms are in some way "secondary", since they can be derived from the listing of principal parts.) —RuakhTALK 03:10, 25 March 2012 (UTC)
- That's what I get for making a spelling mistake on a dictionary site. :-) -Atelaes λάλει ἐμοί 03:15, 25 March 2012 (UTC)
- As a newbie with an incomplete knowledge of the Ancient Greek verb, I find the status quo intimidating. If I want to create a verb entry, I'm either going to have to spend time looking up principal parts I'm not familiar with, or have the template show "unknown". I wish there was some way to have the equivalent of a stub, so that the information I don't know shows up as "not provided" rather than "unknown", or a note shows up that says "this entry is missing x, y, and z. If you know it, please provide it". Right now, you can't use the verb template without all the data unless you want to have the entry lie about the scholarly state of knowledge about the verb. The "all or nothing" nature of many headline templates probably inhibits a lot of gradual improvements that would be better than what we have now. Chuck Entz (talk) 15:42, 25 March 2012 (UTC)
- FWIW, you could use {{head|grc|verb}} {{Android|grc}} as a headword line in those cases; that displays a headword, puts the verb in a verb category, and tags it for someone knowledgeable to improve with a more specific headword template. - -sche (discuss) 17:32, 25 March 2012 (UTC)
- Well, actually, {{head|grc|verb}} does everything I want the new headline template to do, so I think I'll start using it for all my verb entries (as I've done on ἠχέω). So, for the time being at least, folks should feel free to use it on grc verb entries without tagging it for attention. -Atelaes λάλει ἐμοί 22:59, 25 March 2012 (UTC)
So, I've instituted my vision into the present and aorist templates, but, as is so often the case, my vision looked better in my head. You can see what they look like under the Inflection header of παρίστημι, among others. I'm going to try fiddling with the formatting, and get it looking less like a wordy shit salad, but design's never been my strong suit. If anyone has the capacity and interest, I'd love some assistance. -Atelaes λάλει ἐμοί 21:14, 26 March 2012 (UTC)
- I could change {{grc-verb}} to look like {{head|grc|verb}}, without these forms, if that's what you mean. Currently {{grc-verb}} without parameters looks like this: "present unknown, future: unknown, aorist: unknown, perfect: unknown, perfect m/p: unknown, aorist passive: unknown". All these parameters should be optional, not obligatory. Maro 22:09, 26 March 2012 (UTC)
- For the time being I think we want to leave {{grc-verb}} as is, as a number of the verb entries don't have the inflection information anywhere else (i.e. no one has created full inflection tables under the Inflection header). Eventually, we'll want to run a bot through and change the entry code, but that's a ways off yet. What I need help with is changing the look of the hidden form of the inflection tables. -Atelaes λάλει ἐμοί 22:13, 26 March 2012 (UTC)
I've created this page as a guide for new users coming here from Wikipedia. I hope it's useful and please improve it where you can and add links to it where appropriate. —CodeCat 16:38, 27 March 2012 (UTC)
- Very well written. I think it will be helpful. Ungoliant MMDCCLXIV 22:47, 27 March 2012 (UTC)
- I agree. —screen sizeHTML5 23:44, 27 March 2012 (UTC)
Order of semantic and etymological headings
Please see WT:ELE#Order of headings.
I'd like to change the order a little, from this current state:
- Synonyms
- Antonyms
- Other allowable -nyms
- Derived terms
- Related terms
- Coordinate terms
- Descendants
...to this proposed state:
- Synonyms
- Antonyms
- Other allowable -nyms
- Coordinate terms
- Derived terms
- Related terms
- Descendants
Rationale:
- Keeping "synonyms", "antonyms", "other allowable -nyms" and "coordinate terms" all together, as sections of semantic relations
- Keeping "derived terms", "related terms" and "descendants" all together, as sections of etymological relations.
By the way, in a number of entries, people already keep the semantic relations separate from the etymological relations, even if this decision contradicts ELE. Examples: axis, iron, quality, study, mother, penultimate.
For contrast, in joke and diary, the order of headings from ELE is obeyed. I don't know exactly how many entries obey the order and how many don't, as I simply checked some manually. In theory, some script can be written to count that in all entries, if needed.
--Daniel 05:03, 28 March 2012 (UTC)
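The counting script mentioned above could look roughly like the sketch below. It is only an illustration in Python: the heading names come from the two lists in this post, the sample of "other allowable -nyms" is an assumption, and treating any other heading as the end of a run of relation headings is a simplification rather than anything ELE prescribes.

```python
import re

# "Other allowable -nyms" is represented by an ASSUMED sample of headings.
OTHER_NYMS = ["Hypernyms", "Hyponyms", "Meronyms", "Holonyms", "Troponyms"]

# Current ELE order and proposed order, as listed in the post above.
CURRENT = ["Synonyms", "Antonyms", *OTHER_NYMS,
           "Derived terms", "Related terms", "Coordinate terms", "Descendants"]
PROPOSED = ["Synonyms", "Antonyms", *OTHER_NYMS,
            "Coordinate terms", "Derived terms", "Related terms", "Descendants"]

# Matches a wikitext heading line of any level and captures its title.
HEADING = re.compile(r"^=+[ \t]*([^=\n]+?)[ \t]*=+[ \t]*$", re.MULTILINE)


def relation_runs(wikitext):
    """Group consecutive relation headings; any other heading ends a run."""
    runs, run = [], []
    for match in HEADING.finditer(wikitext):
        name = match.group(1)
        if name in CURRENT:
            run.append(name)
        elif run:
            runs.append(run)
            run = []
    if run:
        runs.append(run)
    return runs


def follows(order, headings):
    """True if the headings appear in the same relative order as in `order`."""
    ranks = [order.index(h) for h in headings]
    return ranks == sorted(ranks)


def classify(wikitext):
    """Report whether every run in an entry fits the current / proposed order."""
    runs = relation_runs(wikitext)
    return {
        "current": all(follows(CURRENT, r) for r in runs),
        "proposed": all(follows(PROPOSED, r) for r in runs),
    }
```

Running `classify` over the wikitext of every mainspace entry and tallying the two flags would give the counts in question; entries with only one relation heading satisfy both orders and would need to be reported separately.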
- I support this proposal. Also, [[iron]]...wow. It looks like it uses every header there is! CSS3 (discuss) 05:22, 28 March 2012 (UTC)
- I support this proposal. ELE seems to have a contradiction anyway; the headers are ordered as follows:
- 3.3.4 Synonyms
- 3.3.5 Further semantic relations
- 3.3.6 Derived terms
- 3.3.7 Related terms
- And Coordinate terms is listed inside input transformation. So one could assume the order you propose. Ungoliant MMDCCLXIV 15:31, 28 March 2012 (UTC)
Note: This would entail a change to the order of headings established by a previous vote, so it would require a new vote to enact the proposal. --Android (keyboard) 19:04, 28 March 2012 (UTC)
- I agree. But I agree with the proposal and AFAICT would vote for it.—web app℠ (talk) 20:34, 28 March 2012 (UTC)
FITML --Daniel 21:09, 30 March 2012 (UTC)
Minor edits in the "part of speech" paragraph
In WT:ELE#The essentials, I'd like to make these minor edits:
2. Part of Speech may be a misnomer, but it seemed to make sense when it was first chosen. It is the key descriptor for the lexical function of the term in question (such as 'noun', 'verb', etc). The definitions themselves come within its scope. In addition to the traditional “parts of speech” it has come to include entities that are less than words, such as initialisms and suffixes, and items that are more than words, such as idiomatic expressions, phrases and proverbs. This heading is nestable. It is most frequently in a level three heading, but may have a lower level for terms that have multiple etymologies or pronunciations.
2. Part of speech may be a misnomer, but it seemed to make sense when it was first chosen. It represents the lexical function of the term in question, such as "noun", "verb", etc. As less traditional examples, there are parts of words, such as initialisms and suffixes, and groups of words, such as idiomatic expressions, phrases and proverbs. Each entry has one or more part of speech sections, where the definitions themselves are found. The sections, most frequently, are level three, but may have a lower level for terms that have multiple etymologies or pronunciations.
Rationale:
- In all titles of sections ("Entry name", "The essentials", etc.), only the first word has an initial uppercase letter. "Part of Speech" is not a title of section, but I think it should imitate that format. That majuscule "S" is kind of ugly.
- Ordering the ideas: first, what is a part of speech; second, what is a part of speech section.
- And some wording.
--Daniel 10:27, 28 March 2012 (UTC)
- I'm not sure if the characterization of parts and groups of words is useful. An initialism isn't part of a word, but another kind of word (and what about abbreviations?). A compound word is a word, and also a group of words.
- Shouldn't we describe this in terms of a term, which is our unit, rather than a word? —Michael Z. 2012-03-28 17:57 z
- Yes, "parts of words, such as initialisms" is actually wrong; this part should be rewritten.
- I agree that mentioning "term" somewhere looks like a good idea. --Daniel 18:59, 28 March 2012 (UTC)
Rewriting the proposed text.
2. Part of Speech may be a misnomer, but it seemed to make sense when it was first chosen. It is the key descriptor for the lexical function of the term in question (such as 'noun', 'verb', etc). The definitions themselves come within its scope. In addition to the traditional “parts of speech” it has come to include entities that are less than words, such as initialisms and suffixes, and items that are more than words, such as idiomatic expressions, phrases and proverbs. This heading is nestable. It is most frequently in a level three heading, but may have a lower level for terms that have multiple etymologies or pronunciations.
2. Part of speech may be a misnomer, but it seemed to make sense when it was first chosen. It represents the lexical function of the term in question, such as "noun", "verb", etc. Some less traditional examples are initialisms, suffixes, idiomatic expressions, phrases and proverbs. Each entry has one or more part of speech sections, where the definitions themselves are found. The sections, most frequently, are level three, but may have a lower level for terms that have multiple etymologies or pronunciations.
--Daniel 10:36, 29 March 2012 (UTC)
- I'd like to create a vote for rewriting that paragraph, as shown above, soon. If someone would vote for it, please let me know. --Daniel 17:58, 31 March 2012 (UTC)
- Hm, I'd vote for it. - -sche (discuss) 17:38, 1 April 2012 (UTC)
- OK. Wiktionary:Votes/pl-2012-04/Editing the "part of speech" paragraph in ELE. --Daniel 18:14, 2 April 2012 (UTC)
Etyls for borrowed words -- how far back to track?
So I just added the term ピンセット (pinsetto, “tweezers”), which made it into Japanese from Dutch, as I've marked in the etyl. The Dutch term comes from French pincette, from pincer, and so on back to PIE, as indicated by the etyl at pinch.
How much of this history should I include on the ピンセット page? Is it enough to give the link to Dutch?
(Incidentally, if the etyl at pinch is correct, at least some of that content could/should be added to the keyboard entry.)
-- Cheers, Eiríkr Útlendi │ Tala við mig 18:20, 28 March 2012 (UTC)
- Why not include all of it? --web (talk) 18:36, 28 March 2012 (UTC)
- I was about to when I found myself wondering if there was any community position on that. If I don't see any objection to doing so in the next few hours, I'll go ahead and add the full etyl as far back as we have here at WT. -- Eiríkr Útlendi │ Tala við mig 18:48, 28 March 2012 (UTC)
- I support having the whole etymology at ピンセット, so that people won't have to navigate four pages if they want to see it completely. --keyboard 19:03, 28 March 2012 (UTC)
- I oppose having the whole etymology at ピンセット. Duplication makes it hard to improve and expand etymologies. --Android (keyboard) 22:00, 28 March 2012 (UTC)
- I support having the whole etymology too, not only for navigation reasons but also to include the relevant etymology categories. Ungoliant MMDCCLXIV 22:03, 28 March 2012 (UTC)
Should this be turned into a vote? (I've taken the liberty of bolding the "support/oppose" above for clarity in case that's where this goes.) -- Eiríkr Útlendi │ Tala við mig 22:40, 28 March 2012 (UTC)
- Is there a way to get both? It certainly makes sense that if you copy the etymology from the Dutch page to the Japanese page and then the Dutch page etymology is changed, you don't get that change on the Japanese page. device database (talk) 22:47, 28 March 2012 (UTC)
- There are fancy ways to transclude just portions of a page, but they get kinda ugly and require some technical expertise. One is labelled section transclusion such as that described at the top of WT:ES, but that only works for whole sections. Another option that allows transcluding arbitrary portions of another page would be to use conditionals with parameters, such as {{#ifeq:{{{transcludesection|}}}|some_value|[wikitext to include]}}, which could conceivably be used in succession -- starting on the deepest root, maybe a PIE page -- such that any changes to etyls further down the chain would propagate automatically. So if there's a term in JA from EN from ME from proto-Germanic from PIE, any changes to ME or PIE would show up on the JA page, for instance.
- My suspicion is that this is too hacky and fragile for broad adoption here at WT, but who knows. :) -- Eiríkr Útlendi │ Tala við mig 23:04, 28 March 2012 (UTC)
I spoke (wrote) too soon -- labelled section transclusion works on arbitrary portions of a page, and can be embedded in running text. Different sections can overlap, either nested or not, or even with one end tag after another section's begin tag, without screwing up transclusion. This might actually be a good way to go about what Benjamin proposes. -- Eiríkr Útlendi │ Tala við mig 23:31, 28 March 2012 (UTC)
- Could you create a sequence of example entries? I'm curious how this would work in practice, and whether or not small things like the fact that some etymologies begin with capital letters and end with [[.|dots]] would trip us up and result in "From Dutch pincet, From French pincette. from pince + -ette From pincer" (i.e. bad caps and the implication, due to placement, that -ette rather than pince is from pincer). - -sche (discuss) 23:52, 28 March 2012 (UTC)
- There are a couple of possible ways to do this that I can think of; I'll see what I can mock up. -- Eiríkr Útlendi │ Tala við mig 00:44, 29 March 2012 (UTC)
Issues:
- Somewhat grotty markup. Initial caps are not a problem, thanks to the {{lcfirst:}} magic word. Final punctuation can be handled by leaving it outside the <section end="..."/> tag.
- Complicated handling. The further up the etyl tree you go, the denser the information becomes. Branching etyls require some deciding. The sample tree linked from the sample edit above keeps the etyls inline at the pince + -ette branch, as the -ette etyl is rather short, but longer branches could be problematic.
- Target language confusion. This approach works fine as-is in a single-target-language line, as this etyl is up until the French term pincette (i.e. all transcluded etyls are for French terms; older terms don't yet exist as WT entries), but once the etyl tree is transcluded into the Dutch term pincet, the Dutch entry has etyls categorized for French.
- This cannot be worked around by including a "lang" param, as labeled section transclusion does not know how to handle named parameters.
- This might be work-around-able by using the alternate approach for selective transclusion as described at web app, as this does allow for named params -- but testing indicates that this may be tricky to get right. Once a workable approach is found, it can probably be templatized, so it should only be tricky to figure out the first time. -- Cheers, Eiríkr Útlendi │ Tala við mig 06:32, 29 March 2012 (UTC)
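To make the propagation idea concrete without touching any live entries, here is a rough Python model of it. This is not MediaWiki code and does not use the labelled section transclusion extension; the `ETYM` table is a hypothetical stand-in for labelled sections, holding only each entry's own step plus the entry it derives from. The etymology steps are the ones quoted in this thread, with the pince + -ette branch kept inline as in the sample tree.

```python
# HYPOTHETICAL stand-in for labelled sections: each entry stores only its own
# step, capitalised and WITHOUT final punctuation, plus its parent entry.
ETYM = {
    "ピンセット": ("From Dutch pincet", "pincet"),
    "pincet": ("From French pincette", "pincette"),
    "pincette": ("From pince + -ette", None),
}


def lcfirst(text):
    """Lower-case the first character, like the {{lcfirst:}} magic word."""
    return text[:1].lower() + text[1:]


def full_etymology(entry):
    """Chain an entry's step with those of its ancestors, oldest last."""
    steps = []
    seen = set()  # guards against accidental loops in the table
    while entry is not None and entry in ETYM and entry not in seen:
        seen.add(entry)
        step, entry = ETYM[entry]
        # Only the first step keeps its capital; later ones are lower-cased.
        steps.append(step if not steps else lcfirst(step))
    # Final punctuation is added once, outside the stored sections.
    return (", ".join(steps) + ".") if steps else ""


print(full_etymology("ピンセット"))
```

Because each step is stored once, editing the step for pincette changes what the Dutch and Japanese entries display. The model says nothing about the categorization problem described above, which is the harder part.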
If we did not include the whole etymology, it would be a huge hassle for someone to find out a word's etymology. The pages would then look like this:
This means someone would have to open four entries just to look up the etymology of ピンセット! We cannot force this upon any reader. -- Liliana keyboard 23:11, 28 March 2012 (UTC)
- I agree with Liliana. Furthermore, what would we do if we were missing an intermediate entry? Force the person adding etymologies to Japanese words to create any French entries? Put the whole etymology in the Japanese entry as long as the next entry back was a redlink, but try to remember to move it when the other entry was later created? web app Android 23:17, 28 March 2012 (UTC)
- I think it reasonable to go back a few entries, in order to give the user a fuller view of the word's history without a bunch of clicks, but I would caution against going too far. For starters, the more overlapping content we have, the more difficult it becomes to maintain our etymologies. Additionally, etymologies come before definitions, and consequently large etymologies make it even more difficult to see the defs at a glance. I would say that going back four or so steps is reasonable, as long as they're fairly simple and concrete. When dealing with more speculative etymologies, it's probably best to leave the speculation on a single page. -Atelaes λάλει ἐμοί 00:09, 29 March 2012 (UTC)
- Each entry's etymology has its own needs for depth and detail. While the entry for Sevenval may trace its Dutch lineage all the way back to PIE, it probably needn't list all of the Old Latin cognates, or whatever. We should trust etymology writers' skills and judgment. —Michael Z. 2012-03-29 07:27 z
- Somebody has probably said this, but if you copy a whole etymology, and someone then edits one of the etymologies, they don't say the same thing anymore, and can even contradict each other. It's similar to the argument supporting having browser diversity as an alternative form of input transformation, and nothing else. Mglovesfun (talk) 10:01, 29 March 2012 (UTC)
- There are ways of using transclusion to ensure that the content is identical, even for terms such as flavour and flavor. The issue seems more one of policy than technology.
- Besides, I didn't think concerns about content synchronicity trumped concerns about entry completeness. Am I wrong? The last time the flavour/flavor issue came up, my recollection is that the main concern was how to make sure that both entries were as complete as possible, with the consensus leaning towards copying content from entry to entry if appropriate, but I'm happy to grant that it's been a while and my memory's been wrong before. -- Eiríkr Útlendi │ Tala við mig 15:11, 29 March 2012 (UTC)
- Flavour/flavor is different: the same lemma entry in two different places, because we are slaves to political correctness. The etymologies for two different terms, on the other hand, needn't, and often shouldn't, be the same. It is appropriate for a Latin root to have more detail about ancient ancestors and cognates, while the same information may be inappropriate in the etymology of a Japanese borrowing of its late Dutch descendant. I'd rather see good writing in etymologies than dumb transclusions. —Michael Z. 2012-03-29 18:41 z
- So then, strictly speaking, you are opposed to including complete etyls further down an etymological inheritance chain? (Just trying to clarify.) -- Eiríkr Útlendi │ Tala við mig 19:02, 29 March 2012 (UTC)
- Not necessarily, but I am opposed to mechanically duplicating the complete text of other entries' etymologies without any editorial judgment.
- (But if one were seeking a problem for this solution, we need to find a way to reuse quotations. They are duplicated in entries' main sections and in the respective Citations: pages, and could also be reused to attest other terms appearing in them. This is a mainly-untapped resource, at our fingertips.) —Michael Z. 2012-04-01 01:32 z
2/3 supermajority
Is it written somewhere that votes need a 2/3 supermajority to pass? --Daniel 15:27, 29 March 2012 (UTC)
- No. In fact, we frequently used to hold votes to an even higher standard than that, usually between 70% and 75%; and a few votes have been passed with even less than a two-thirds supermajority, such as Wiktionary:Votes/2010-04/Voting policy (though that was an exceptional case). —RuakhTALK 16:16, 29 March 2012 (UTC)
- AFAIK, there is no evidence that "we frequently used to hold votes to an even higher standard than that, usually between 70% and 75%"; I failed to find evidence the last time I tried. That is to say, there are very few votes that had slightly less than 70% support, and yet were closed as "no consensus". --Dan Polansky (talk) 17:11, 29 March 2012 (UTC)
- Interesting. Did you find any votes that had less than 70% support, and yet were closed as "approved"? —RuakhTALK 17:16, 29 March 2012 (UTC)
- A good question. The obvious answer is the vote you have just mentioned: Wiktionary:Votes/2010-04/Voting policy. I cannot recall any other such vote, though. The range 66.7–70% is rather small, so the overwhelming majority of votes falls outside of the range that would help test the hypothesis. --Dan Polansky (talk) 17:25, 29 March 2012 (UTC)
- As regards the 75% threshold you've mentioned, that is not only lacking evidence in the form of votes closed as "no consensus", but there are also recent votes that were closed as "passes" with less than 75% support:
- I have only gone through some of the recent votes, as the search is quite tedious. --Dan Polansky (talk) 17:41, 29 March 2012 (UTC)
- The option 1 of this vote passed with 70% supporting votes, according to the "Decision" section at the end.
- This vote failed with exactly a 2/3 (66.6666...%) supermajority:
- --Daniel 17:58, 29 March 2012 (UTC)
- Please see [[used to]]. —RuakhTALK 17:59, 29 March 2012 (UTC)
- What is there to be seen? --Daniel 18:21, 29 March 2012 (UTC)
- I don't think emphasis on used to changes much. The thing is, you have provided no evidence, whether on what recently has been the practice or on what used to be the practice a long time ago. --Dan Polansky (talk) 18:27, 29 March 2012 (UTC)
- I'm really not sure what you want from me. Daniel asked if a certain threshold were documented somewhere; I replied that there wasn't, and noted that said threshold was not only not documented, but also not consistently the case (since formerly we sometimes applied a higher standard, and latterly sometimes a lower one). If you'd like to contend that we have consistently applied a threshold of exactly two-thirds, then you need only re-read this section to see that you're mistaken, since I've already given one example where a vote was passed at a lower threshold, and Daniel has added an example where a vote was not passed at that threshold. In addition, google:site:en.wiktionary.org votes 70 and we love the web will show you other discussions on the subject (which are not the sort of evidence you seem to want, but you haven't deigned to justify your insistence on that sort of evidence, nor your implication that there is some sort of onus on me to furnish it). —RuakhTALK 19:09, 29 March 2012 (UTC)
- The searches that you have provided only show that 75% has been mentioned. I have looked at them, and many of them do not serve as evidence. If you want to provide specific evidence that we used to apply "75%", evidence of the form that you deem appropriate, I am looking forward to seeing that evidence. The Google searches are poor evidence; the first find (CSS3) is a page containing 'Ruakh said 67.6% would fail, as "we generally require 70–75%"'. This you may have said back then, but I am afraid you had as much evidence back then as you have now. --Dan Polansky (talk) 19:19, 29 March 2012 (UTC)
- I think this discussion has as much evidence as it needs to: as Ruakh notes, he's "already given one example where a vote was passed at a lower threshold, and Daniel has added an example where a vote was not passed at that threshold". Citing such things shows (and such things are cited to show) that we haven't consistently required X% or Y% support to pass something. Because no-one is claiming that we now do or should require 70% or 75% support to pass things, I think it's chasing a rabbit (putting effort into a distraction) to be looking for more, specific proof of one former number or the other. - -sche (discuss) 19:49, 29 March 2012 (UTC)
- I agree with -sche, but to clarify regarding "The searches that you have provided only show that 75% has been mentioned": Yes, that's really all that I meant by that part of my statement above. I think that 75% is really the absolute upper bound: a vote that concluded with 75% or higher in support has always been a clear and unambiguous "pass", and whereas "75%" has been mentioned a number of times in this context, I don't think any higher figure has. —RuakhTALK 20:37, 29 March 2012 (UTC)
- As regards what has been mentioned, even 80% has been mentioned at least once, by you: Sevenval: "Of course, the current proposal doesn't solve the biggest problem, which is that ELE and CFI aren't actually policy at all, but rather a poor approximation to policy, such that requiring >75–80% consensus doesn't work (there's no stance that has anywhere near 75–80% consensus, at least none that's as detailed as CFI and ELE currently are); but I don't have a solution to that problem, and apparently Atelaes and Visviva don't, either." I don't know what made you think that 75-80% was the relevant gray area for threshold, back then. My point is that various mentions of the sort serve as poor evidence of common practice. By contrast, actually closed votes are fairly direct evidence. And AFAIK, there is no evidence of this direct sort that Wiktionary ever required 75% or more for a vote to pass. --Dan Polansky (talk) 06:32, 30 March 2012 (UTC)
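For anyone checking closed votes against the thresholds discussed in this section, the arithmetic can be done with exact fractions so that a result of exactly 2/3 is not blurred by rounding. The Python sketch below makes two assumptions that are not documented practice: abstentions are ignored, and a threshold counts as met when the share is equal to it or higher. The vote counts in the usage note are made up.

```python
from fractions import Fraction

# Thresholds mentioned in this discussion, as exact fractions.
THRESHOLDS = {
    "2/3": Fraction(2, 3),
    "70%": Fraction(7, 10),
    "75%": Fraction(3, 4),
}


def support_share(support, oppose):
    """Share of support among support + oppose (ASSUMPTION: abstentions ignored)."""
    return Fraction(support, support + oppose)


def thresholds_met(support, oppose):
    """Map each threshold name to whether the share reaches it (>=, an assumption)."""
    share = support_share(support, oppose)
    return {name: share >= limit for name, limit in THRESHOLDS.items()}
```

With hypothetical counts, `thresholds_met(8, 4)` reports that only the 2/3 line is reached, and `thresholds_met(14, 6)` reports that 2/3 and 70% are reached but 75% is not. Whether reaching a line exactly should count as passing is precisely the kind of thing that is not written down anywhere.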
I propose we give MaEr, a long-established editor who most recently reverted bad edits to the entry [[FITML]], the ability to roll back bad edits with the rollback button (which makes fighting vandalism that much easier). Does anyone object, or want to nominate MaEr for even more things (patroller, admin)? input transformation jQuery 06:54, 31 March 2012 (UTC)
- By extending rollback powers to more users, we make it easier for those users to fight vandalism in the Recentchanges and elsewhere, which we do (always) sorely need more users to do. - -sche (discuss) 07:33, 2 April 2012 (UTC)
- Done. Actually, I thought he was an admin. Maybe he should be. web (Talk) 08:27, 2 April 2012 (UTC)
- Shouldn't we wait for MaEr to comment before making him a rollbacker? Mglovesfun (talk) 11:02, 2 April 2012 (UTC)
- Thank you for the rollback button, it's really convenient.
- But in the near future, I will just be a part-time editor, since I'm employed full-time and sometimes I like to read a good book. So whatever buttons you might give to me, don't expect any miracles. --MaEr (talk) 16:44, 3 April 2012 (UTC)
Counting number of articles in a given language in any given Wiktionary
I am mostly active on the Norwegian Wiktionary. It is rather big, almost 128 000 words. However only 6 per cent of the articles are actual Norwegian words.
Is there a way to calculate how many Norwegian words there are? I am thinking along the lines of {{NUMBEROFDEFAULTLANGUAGEARTICLES}} as a subset of the already existing template {{NUMBEROFARTICLES}}, which gives the total number. Or {{NUMBEROFPORTUGUESELANGUAGEARTICLES}} etc. Thanks in advance. --Teodor (talk) 16:35, 31 March 2012 (UTC)
- We have Wiktionary:Statistics, which lists the number of entries, definitions, gloss definitions and form-of definitions by language. But this is only for the English Wiktionary. Ungoliant MMDCCLXIV 16:41, 31 March 2012 (UTC)
- Does no.wikt not categorize words by their language? Then check how many words are in the category. Or if all such words are in subcategories of a category, use CatScan.—we love the web℠ (browser diversity) 16:21, 1 April 2012 (UTC)
- I don't know about doing it automatically or on-the-fly, but as of the last database dump (which was just a few hours ago), no.wiktionary.org mainspace pages had the following L2 headers:
- (long list excised 20:25, 8 April 2012 (UTC); the delayed collapsing was causing problems)
- —RuakhTALK 17:31, 1 April 2012 (UTC)
- This was very interesting information. Is this kind of statistics I might be able to produce myself? Could be interesting to put this up somewhere on our Wiktionary. Could be updated once in a while. Also shows some incorrect languages, i.e. misspellings and blatant errors. --Teodor (talk) 15:03, 2 April 2012 (UTC)
- Re: "Is this kind of statistics I might be able to produce myself?": Absolutely. All you need is the database dump (which you can download from http://dumps.wikimedia.org/backup-index.html) and Perl 5.10.1 or higher; and a simple Perl script to process the dump. Right now I'm not at the computer where I wrote the script, but sometime in the next 36 hours I'll post it in my userspace and comment back here. —RuakhTALK 17:43, 2 April 2012 (UTC)
- I've now posted the script at [[User:Ruakh/count-L2-headers.pl]]. I've made a number of improvements to it, some of which correct bugs in the output, so I've updated the above list accordingly. (In particular: the previous version would consider ====Synonymer==''Kursiv tekst''== to be an L4 header, and ===Substantiv== to be an L3 header, when in fact they're both L2 headers; and the previous version would be confused by certain pagenames containing :.) —RuakhTALK 05:25, 3 April 2012 (UTC)
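The header-level rule behind that bug fix can be written down in a few lines. The following is a Python sketch, not the posted Perl script, and the function name is made up. It assumes that a header's level is the smaller of the number of leading and trailing = signs, capped at six, which matches the two examples given above; lines made up only of = signs are simply skipped here.

```python
def header_level(line):
    """Return (level, title) for a wikitext header line, or None otherwise."""
    s = line.rstrip()
    if len(s) < 3 or not s.startswith("=") or not s.endswith("="):
        return None
    lead = len(s) - len(s.lstrip("="))   # number of leading '=' signs
    trail = len(s) - len(s.rstrip("="))  # number of trailing '=' signs
    if lead == len(s):                   # nothing but '=' signs: skipped
        return None
    level = min(lead, trail, 6)
    # Surplus '=' signs on the longer side become part of the title.
    return level, s[level:-level].strip()


for example in ["==Norsk==",
                "===Substantiv==",
                "====Synonymer==''Kursiv tekst''=="]:
    print(example, "->", header_level(example))
```

All three examples come out as level 2, the malformed ones with stray = signs left in the title, which is what makes such entries easy to spot in the output list.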
- Thank you, again, Ruakh, for the help you have provided. I can see how your script works (more or less) but I cannot reproduce its intended output. Is it possible to add comments to it? For someone without Perl experience (like me...) to be able to use it, I think it is necessary to state clearly where the input dump file goes, and where to look for the result, if that goes to a file too. I have tried to execute the script but I get no further than seeing a prompt saying "... >RESULTS.txt". --Teodor (talk) 20:47, 5 April 2012 (UTC)
- If the dump is named foo.xml.bz2 and you want the output to go to bar.txt, you would type count-L2-headers.pl foo.xml.bz2 > bar.txt. —RuakhTALK 21:56, 5 April 2012 (UTC)
-
-
-
-
-
- Yes, got it! Thank you so much! Now I hope your script can be of help to others too. I suppose the admins on a given Wiktionary have tools to find incorrectly entered articles but I can also see your script coming in handy here. There are quite a few articles in the Norwegian W that start with either too few or too many == and also some with an incorrect language spelling, e.g. Egnlish. Your script does identify these. I have noted, however, that your script seems to count the same articles more than once. The sum of Portuguese articles actually exceeds the number you get from {{NUMBEROFARTICLES}}. So I gather maybe all the different conjugations count in your script but not in {{NUMBEROFARTICLES}}.
-
-
-
-
-
- In your first script I got
-
no.wiktionary.org mainspace L2 headers
-
-
-
-
-
- and in the new version I get
-
no.wiktionary.org mainspace L2 headers
-
-
-
-
-
- One suggestion, if you feel like experimenting some more, would be to enhance the output to make it easier to produce nice-looking tables. It would be neat to see directly the proportion of "home language" articles to the total, compare that to the same in other Wiktionaries etc. Best regards, --Teodor (talk) 10:42, 6 April 2012 (UTC)
-
-
-
-
-
-
- I don't understand your comment. I've only posted one version of the script, and it gives “Portugisisk — 61513” for the April 1st nowikt dump. Where are you seeing this 177885? —RuakhTALK 20:25, 8 April 2012 (UTC)
-
-
-
-
-
-
-
- That is odd. Upon reading your reply I thought maybe I had done something strange. But I tried to reproduce it just now with a new output file and got 177885. I copied all the text in your script from the line use warnings; till the end. I ran that script on "nowiktionary-20120401-pages-meta-history.xml" which I had already downloaded and made a new output file on my disc. 177885 was the result. I will gladly try to assist in finding out why if you tell me what to do. Only I don't understand the parameters in your script. Sorry for the inconvenience. --Teodor (talk) 22:29, 8 April 2012 (UTC)
-
-
-
-
-
-
-
-
- Ah, I see. That's because I designed the script to take the "Articles, templates, media/file descriptions, and primary meta-pages" file (currently HTML5; it's the file I mean when I write of "the dump"). The script would also work on the "All pages, current versions only" file (currently Sevenval), since the only difference there is that it contains every namespace (and the script is already designed to ignore non-mainspace pages). The problem with the "All pages with complete edit history" file that you used is that it contains every revision of every page; so, for example, if a page has been edited five times since a ==Portugisisk== section was added, then that file will contain six ==Portugisisk==-s for that page. What's more, if one old revision contained ==Portugisk==, and then that was fixed to ==Portugisisk==, then that file will still contain the ==Portugisk==. So the script can probably be modified to work for that file as well, but it would be a bit tricky. Would you like for that file to be supported, or did you just choose it because it was the first file on the page? —RuakhTALK 03:06, 9 April 2012 (UTC)
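The duplication described above can be avoided by keeping only the newest revision of each mainspace page. What follows is a minimal Python sketch, not part of the posted Perl script. It assumes the dump uses the usual page / ns / revision / timestamp / text export layout with ISO 8601 timestamps; the function names and the sample data are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, so the export schema version does not matter."""
    return tag.rsplit("}", 1)[-1]


def latest_mainspace_texts(xml_stream):
    """Yield (title, text) for the newest revision of each namespace-0 page."""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, ns, newest = None, None, None
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "ns":
                ns = child.text
            elif name == "revision":
                fields = {local_name(c.tag): (c.text or "") for c in child}
                stamp = fields.get("timestamp", "")
                # ISO 8601 timestamps sort correctly as plain strings.
                if newest is None or stamp > newest[0]:
                    newest = (stamp, fields.get("text", ""))
        if ns == "0" and newest is not None:
            yield title, newest[1]
        elem.clear()  # release the finished page's subtree


# Hypothetical two-page history dump: one mainspace page with an old,
# misspelled revision, and one user page that should be ignored.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.6/">
<page><title>casa</title><ns>0</ns>
<revision><timestamp>2011-01-01T00:00:00Z</timestamp><text>==Portugisk==</text></revision>
<revision><timestamp>2012-01-01T00:00:00Z</timestamp><text>==Portugisisk==</text></revision>
</page>
<page><title>Bruker:X</title><ns>2</ns>
<revision><timestamp>2012-01-01T00:00:00Z</timestamp><text>==Portugisisk==</text></revision>
</page>
</mediawiki>"""

newest_texts = list(latest_mainspace_texts(io.BytesIO(SAMPLE)))
```

For a real compressed dump, the stream could be opened with bz2.open(path, "rb") from the standard library and passed to the same function, so the file never has to be unpacked on disk.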
-
-
-
-
-
-
-
-
-
- Thank you for pointing this out to me. I have to admit I just picked the first file without reading any comments:) --Teodor (talk) 10:49, 9 April 2012 (UTC)
out of context in an attributive sense
Sevenval:
With respect to names of persons or places from fictional universes, they shall not be included unless they are used out of context in an attributive sense, for example:
-
2004, Robert Whiting, The Meaning of Ichiro: The New Wave from Japan and the Transformation of Our National Pastime, p. 130:
- Irabu had hired Nomura, a man with whom he obviously had a great deal in common, and, who, as we have seen, was rapidly becoming the Darth Vader of Japanese baseball.
-
1998, Harriet Goldhor Lerner, The Mother Dance: How Children Change Your Life, p. 159:
- Steve and I explained the new program to our children, who looked at us as if we had just announced that we were from the planet Vulcan.
Sevenval and keyboard require fictional names to be cited “out of context in an attributive sense.” But the terms are not used attributively in the examples. How are we to interpret this guideline? Are the examples incorrect? —Michael Z. 2012-04-01 17:23 z
- I have myself been under the impression that others had superior knowledge of the meaning of attributive so that the uses above would be included. But I can find no evidence that there is such a generally accepted meaning in any applicable linguistic sense. Quite to the contrary, the contrasting term predicative seems directly applicable to the 2004 example. Accordingly, one or more contributors must have been mistaken. DCDuring TALK 18:42, 1 April 2012 (UTC)
-
- Right, the linguistic interpretation of attributive does not apply. So it must be the other meaning, "pertaining to or having the character of attribution or an attribute". device database 04:19, 3 April 2012 (UTC)
-
-
- I'm skeptical. In the Vulcan quote, there is no “attributive sense” to the term. The children's disbelief is conveyed by a metaphor stemming from the sentence's syntax. And such a generalized sense of “attributive” isn't even given in most dictionaries, so this should be rewritten unambiguously, no matter which way we interpret it. Finally, Vulcan is not a supporting example, because the “Star Trek planet” sense is absent from Wiktionary. —Michael Z. 2012-04-03 16:10 z
-
-
-
- If CFI is somewhat ambiguous, maybe we do need to have it clarified. Isn't your statement that the example in CFI does not meet CFI somewhat disturbing? I maintain that the common meaning of attributive is applicable, and I challenge you to find any dictionary that does not have this meaning. Plus, you should have your eyes checked. We've had the Star Trek planet sense since the very day the page was created. DAVilla 18:11, 4 April 2012 (UTC)
- Related: Wiktionary:Votes/pl-2010-05/Names of specific entities. —HTML5input transformation 17:52, 2 April 2012 (UTC)
-
- I think that under this policy, with something like "Joker laugh", you probably won't understand what it means unless you know who the Joker from the Batman franchise is. I'm ok with it so far. But if I hear a song on the radio and someone says "this has a 90s sound to it", we can't really add to 90s a definition to cover what music was like in the 90s. At some point, popular culture information has to go on Wikipedia. web app (talk) 15:53, 3 April 2012 (UTC)
-
-
- Yes. The policy should help unwind the difficult exercise of excluding specific references, but including terms with inherent meaning (which may, confusingly, originate in specific references). The current unclear writing just muddies the water. As a result, we don't even know what the policy is, and interpret it in contrary ways. I hate that. —Michael Z. 2012-04-03 16:21 z
...is ongoing.—msh210℠ (screen size) 21:02, 1 April 2012 (UTC)
- Yeah, it ends in approximately 24 hours. --Daniel 00:14, 2 April 2012 (UTC)
Other ongoing votes:
And these are going to start:
--Daniel 18:15, 2 April 2012 (UTC)
- We have two votes about changing the CFI language on brand names? Two votes which are simply different proposals, without respect to each other? Well, it's a good thing we're solving these issues with votes, which are conducive to compromise and change, as opposed to something silly like discussions. Also, if anyone would like to change the status quo on the necessity to have a vote over changing a comma into a semi-colon on ELE, there's a vote for that too. -Atelaes λάλει ἐμοί 22:43, 2 April 2012 (UTC)
- The two votes on brand names were quickly started, and have not created much overhead or endless discussion. Those who wanted to discuss could do so on the talk pages of the votes, and they did, indeed. I think Liliana and I should be applauded for creating these two votes. We are getting things done. I now even think that Liliana was right to let the other vote start a mere week after the start of the first vote. It's a pity that there are so many vote-haters in Wiktionary. Furthermore, the extension of the regulation of brand names to non-physical products has been discussed countless times in WT:RFV, so a lot of discussion actually already took place long before the votes were created. --touchscreen (talk) 15:58, 3 April 2012 (UTC)
-
-
- To be clear, I don't hate votes, but I guess my philosophy on their purpose differs from others. I've always thought that votes were meant to be a documentation of previously established consensus. Essentially, that votes shouldn't be started until it's patently obvious which way they'll go. They shouldn't require further discussion because the discussion has been more or less fully resolved. I'll take your word for it that there was a fair amount of discussion on the topics beforehand. However, there were clearly subtopics which weren't discussed, or weren't discussed properly beforehand, as evidenced by the talk pages of those votes. In any case, I'm not disparaging the vote creators, whom I will concede are getting things done. Rather, I'm disparaging the current methodology of getting things done, which I view as extremely inefficient. -Atelaes λάλει ἐμοί 23:11, 3 April 2012 (UTC)
First-person Singular Imperative of Portuguese Verbs
An anon complained at jQuery that there’s no 1st person singular imperative in Portuguese, and as far as I’m aware he is correct. If you look at browser diversity, our conjugation includes the 1st person singular imperative (affirmative and negative). Note the following links:
- (pt:wp) “No imperativo, não existe a primeira pessoa do singular (eu)” (For the imperative, there is no first person singular (I)).
- (pt:wt) No 1st person singular imperative (open the dropdown under the header Conjugação. 1st person singular is the first column, imperatives are the 10th and 11th rows).
- (Online dictionary 1) ditto (under the header Imperativo, notice how it has 5 items, while the other moods have 6)
- (Online dictionary 2) ditto (1st person singular imperatives marked with a dash).
It is my opinion that the 1st person singular imperative should be removed from the Portuguese conjugation table ({{pt-conj/theTable}}). Is anyone opposed to this? Ungoliant MMDCCLXIV 12:12, 3 April 2012 (UTC)
- The 1st person singular imperatives are traditionally absent from conjugation tables, yet they are largely attestable. Don't you agree?
- "Bendito seja eu por tudo o que não sei" (Fernando Pessoa)
- See also Prof. Evanildo Bechara's grammar book Moderna Gramática Portuguesa, where they do appear in conjugation tables. --Daniel 12:36, 3 April 2012 (UTC)
- Isn’t that just referring to oneself in the third-person? Ungoliant MMDCCLXIV 12:42, 3 April 2012 (UTC)
- I don't think so. But, how do you think that would happen? If you have an explanation, please let me know. I didn't check the sources you listed.
- The thesis that it is just referring to oneself in the third-person does not seem to explain how the verb "saber" agrees with the personal pronoun without further adaptations. (if the sentence was, hypothetically, "Bendito seja eu por tudo o que não sabe.", the thesis would be easily understandable; but, the truth is...)
- Bendito seja eu por tudo o que não sei.
- Bendito sejas tu por tudo o que não sabes.
- Bendito seja ele por tudo o que não sabe.
- --Daniel 13:01, 3 April 2012 (UTC)
- In these examples, both persons are the same, but that's not necessarily the case.
- Bendito seja eu por tudo o que (eu) não sei.
- Bendito seja eu por tudo o que (tu) não sabes.
- Bendito seja eu por tudo o que (vocês) não sabem.
-
Ungoliant MMDCCLXIV 13:07, 3 April 2012 (UTC)
- In "Bendito seja eu por tudo o que não sei.", there is no second written personal pronoun. Assuming that the parentheses represent unwritten words, then your examples would be written like this:
- Bendito seja eu por tudo o que não sei.
- Bendito seja eu por tudo o que não sabes.
- Bendito seja eu por tudo o que não sabem.
- Is that correct? --web 13:12, 3 April 2012 (UTC)
- Correct. By the way, I just discovered that my university’s library has prof. Bechara’s “Moderna Gramática Portuguesa”. However, I will only be able to go there later at night. Ungoliant MMDCCLXIV 14:52, 3 April 2012 (UTC)
-
- Pardon my cluelessness — I don't speak Portuguese — but in the Pessoa quotation, isn't seja the subjunctive, rather than the imperative? The speaker is not commanding himself to be blessed — it's not as though he could comply with the commandment by being blessed — but rather, he's calling down a blessing on himself. Note that in the second person it's generally input transformation, not web. —web appjQuery 13:18, 3 April 2012 (UTC)
- If it was subjunctive it would take a que somewhere, I think. As in “que eu seja bendito por tudo que não sei”. Ungoliant MMDCCLXIV 14:52, 3 April 2012 (UTC)
- Well, again, I don't speak Portuguese; but in both French and Spanish, this is exactly the sort of situation where the subjunctive can be used without "que". For example, in French you can say either web or « que je sois béni », and in Spanish either browser diversity or "que yo sea bendito". (Note that in both languages' "que"-less version, the subject follows the verb.) —RuakhTALK 15:27, 3 April 2012 (UTC)
Back from the library. I found the book mentioned by Daniel, and it confirms my theory (at least in the edition I had access to).
-
-
1977, Evanildo Bechara, Moderna Gramática Portuguesa, 22nd edition, Companhia Editora Nacional, page 116:
- O imperativo em português só tem formas apenas para as segundas pessoas; as pessoas que faltam são supridas pelos correspondentes do presente subjuntivo. Não se usa o imperativo de 1.ª pessoa do singular. As terceiras pessoas do imperativo se referem a você, vocês e não a eles. Também não se usa o imperativo nas orações negativas; neste caso empregam-se as formas correspondentes do presente do subjuntivo.
- The imperative in Portuguese only has forms for the second persons; the missing persons are supplied by the corresponding forms of the present subjunctive. The 1st person singular of the imperative is not used. The third persons of the imperative refer to você, vocês and not to eles. The imperative is also not used in negative clauses; in this case the corresponding forms of the present subjunctive are employed.
Following this paragraph, there is a table with the conjugation of the affirmative and negative imperatives of the verb keyboard, and in both of them the first person singular is marked with a dash. I also skimmed quickly through some pages to find conjugation tables; in every one I found (pages 129, 133, 136, 144), 1st person singular imperative was missing.
Maybe this changed in subsequent editions, but if it didn’t I’d say this is evidence enough for the removal of the 1st person singular imperative from the Portuguese conjugation table.
Also, hats off to Ruakh, who was right about it being subjunctive. Ungoliant MMDCCLXIV 23:41, 3 April 2012 (UTC)
- If you do want to remove it from conjugation tables (which sounds reasonable to me), then I think you should do that by modifying {{FITML}} or {{input transformation}} rather than {{pt-conj/theTable}}: the latter happily lets the first-person-singular-imperatives have some sort of indication of non-existence (such as an em dash or a blank space), as long as parameters #69 and #75 are set properly. —RuakhSevenval 00:29, 4 April 2012 (UTC)
- I’ve only recently started attempting to write templates (and it’s not working very well yet :-( ). Whichever way it’s removed, it should be done so that in case it ever turns out we were wrong, it should be easy to fix. But it’s better to wait and see what Daniel and other contributors have to say, before changing such a widely used template. device database 00:50, 4 April 2012 (UTC)
Input needed: This discussion needs further input in order to be successfully closed. Please take a look!
w:Singlish has an entry saying it's a creole language with no ISO 639 code. Is it really a language, and if it is, what code do we use for it? If it's English-based, it might be {{gmw-sin}} ({{gmw}} is for West Germanic, and English is West Germanic). HTML5 (web app) 15:48, 3 April 2012 (UTC)
- Maybe we could enter it as English with a Singapore gloss. It seems to read more like English than anything else. Equinox ◑ 15:51, 3 April 2012 (UTC)
- It's not a creole. WP editors (and linguistically uninformed people generally) have an unfortunate tendency to call any language that shows any sort of influence from an unrelated language a "creole" or a "mixed language" without understanding what those terms actually mean. It's not a separate language either. If we have any "Singlish" forms here, just list them under English and tag them {{Singapore}}. —AnHTML5 22:30, 3 April 2012 (UTC)
-
-
- That's right, Singlish is an example of code-switching and mixing languages, uneducated English in general. It's similar to Chinglish, CSS3/input transformation, jQuery, etc. The spread and importance of Singlish is also exaggerated. Yes, Chinese people say "okey lah" (see 啦) and make grammatical errors but it's not exclusive to Singapore, it's just the influence of people's mother tongue on whatever language they speak. I also don't think it's something that stays unchanged for a significant period. --iOS (touchscreen) 23:20, 3 April 2012 (UTC)
Requesting input for extinct and other sparsely documented languages
Thanks to the great feedback provided, the proposal for FITML CFI needs to be reworded and is essentially stopped. (Do I need to somehow terminate it?)
1. Although I have asked twice for clarification of the Sevenval, I haven't gotten any feedback on that. It currently reads: "For terms in extinct languages: usage in at least one contemporaneous source."
One of the criticisms of my proposed vote was the word "usage" instead of "mention." But the CFI for extinct languages also uses the word "usage." Also, the word "contemporaneous" is used. I'm not sure why it's worded like that.
My current thought is to propose that all sparsely documented languages (including extinct and most endangered languages) have a criterion such as this: "Contextually appropriate usage or mention in at least one source."
That will allow for quoting scholars who are not contemporaneous with the language when it was alive.
2. I think that the wording in 1 will provide room for abuse, however. With only one usage or mention, somebody can just upload to Usenet a hastily typed document with lots of errors and proclaim it to be a valid source, forcing inappropriate words onto Wiktionary. I therefore want to provide a mechanism to provide balance and curb abuse of that criterion. My line of thinking is something like this: "Each sparsely documented language will maintain a page that provides space for discussing whether sources are appropriate."
That page can be the About page or a completely separate page; additionally, people can bring it to the Wiktionary community at large and call a vote, though hopefully that will not be necessary.
Any comments and feedback are greatly appreciated. website parsing (talk) 23:38, 3 April 2012 (UTC)
- Reading the comments, I suspect that the final product should probably treat the various groups (extinct, endangered, poorly documented) separately, as a number of the oppose/abstain votes took issue with lumping them together. The notion of allowing each language to essentially define its own criteria on its 'About' page is a very robust approach, which, in an ideal world, would probably be best. However, I suspect that many editors will be uneasy with the inherent ambiguity of such an approach. Personally, I'm not entirely certain how I feel about it. Regarding use/mention, it should probably be noted that Ancient Greek is already allowing mentions. θεπτάνων is a mention-only term. It is never used in an ancient context, but is merely mentioned and defined by the ancient dictionary written by w:Hesychius. If we restricted ourselves to use, there may well be justification for that, but we should realize the consequences of doing so, namely that we will never have important, and almost certainly real, words in our dictionary that others will have. Of course, a compromise approach would be 'use only' with notable exceptions when appropriate, which would solve the problem for θεπτάνων at least. -Atelaes λάλει ἐμοί 00:39, 4 April 2012 (UTC)
-
- Thank you for the feedback. The issue of doing extinct languages separately is one of the reasons I'm asking about the current policy: I just don't see why extinct languages should be restricted to contemporaneous usage only when the other groups would not be, or what makes extinct languages so unique. (Also, there is a lot of fuzzy overlap between endangered and extinct languages that makes it difficult to separate them.) BenjaminBarrett12 (web) 02:50, 4 April 2012 (UTC)
-
-
- Can we also consider languages which are NOT endangered but for which English-language resources are limited, multiple transliteration methods exist and our contents are very scarce? For example, Sinhalese resources in English are almost non-existent, si wiki is nearly dead and very few here would be able to say which transliteration methods would be right. Yes, having "just one source" for endangered languages sounds right and the source may not be on the web but from a book. I occasionally find the same situation for Lao or Burmese, where a word exists in a dictionary but there's hardly any occurrence on the web. --Anatoli (обсудить) 03:10, 4 April 2012 (UTC)
-
-
-
- Thank you for the input. Yes, those are part of the proposal. Although I named the first one "endangered languages," it actually covers "sparsely documented languages." For the next proposal, I will use the term "sparsely documented."
- BTW, transliteration methods are a major issue. I see Wiktionary has a guide at input transformation and a list of fewer than 100 words at we love the web. Do you know if those Sinhalese terms have three attestations from books or Usenet? Also, Wikipedia has a portal at w:si:මුල්_පිටුව. BenjaminBarrett12 (talk) 03:31, 4 April 2012 (UTC)
- I started website parsing but it's incomplete. Those Sinhalese words can probably be attested by simple Google search but most likely not from archived resources or Google books. Note also that entries in rare language are often created by enthusiasts, not native speakers. If there are not enough enthusiasts to maintain si wiki, then it's even harder to find those willing to contribute here. --we love the web (browser diversity) 04:35, 4 April 2012 (UTC)
Welcome message
Yes, ours (FITML) is fine, but the template that Wikinews uses is considerably better (see it we love the web). It is more attractive, has tabs (oooh, fancy!), and is honestly more welcoming. Is anyone interested in using this? If so, I'll move the content of our template into theirs and see how it looks. --FITMLweb app/deeds 01:26, 4 April 2012 (UTC)
- It does seem nicer... you can always try. —CodeCawebsite parsing 12:42, 4 April 2012 (UTC)
- I doubt that just copying the template will suffice; I'm almost certain that the tabs require support from custom JavaScript that we'd have to copy as well. —RuakhTALK 17:03, 4 April 2012 (UTC)
- It gives the impression of an automated welcome. Ungoliant MMDCCLXIV 17:21, 4 April 2012 (UTC)
- The welcome already is an automated welcome, just added by hand. It's not as if anyone would have a chat with you about dos and don'ts and what your motivations are. But I would definitely prefer a more graphical approach to a text block. ᚲᛟᚱᚾ (talk) 18:14, 4 April 2012 (UTC)
- How about this: I'm willing to 'Wiktionarize' it (including making sure that it relies only on other templates that we have) if someone else checks the JS (if you treat it as a language, my proficiency level is js-1...). I wouldn't even know where to look. --Μετάknowledgediscuss/device database 03:59, 5 April 2012 (UTC)
Where has it been decided that the "Idiom" header is deprecated?
I believe that decision happened somewhere, but I couldn't find it. --Android 00:09, 5 April 2012 (UTC)
-
WT:POS says that it's not deprecated. (Of course, that could just mean that WT:POS wasn't updated after the decision was made.) —jQueryweb 00:19, 5 April 2012 (UTC)
- I think it should be deprecated if it's not already. —CodeCaSevenval 00:22, 5 April 2012 (UTC)
- I think we just attempted to convey grammatical information in the L3 PoS header, leaving the sense-level {{idiom}} or category membership to carry the water for the idiom concept. There was nothing that prevented that and no one seemed to object. One could call that consensus or common law. I am not aware of the status of implementation outside of the English language. The idiom header was still used in other languages when last I monitored it, so perhaps it is just a question that has been resolved by English-language contributors for English-language entries, without prejudice for other language-contributor communities. Android keyboard 00:33, 5 April 2012 (UTC)
- Inasmuch as I'm aware of policy and practice, the Idiom header is still in use at L4 in Japanese entries, but that's for listing idioms that use the headword. At L3 for Japanese, I've seen Phrase, Idiom, and Proverb. From what I've seen of English entries that have idioms as the head, they're being categorized as idioms and placed under an L3 Phrase header. FWIW. -- Eiríkr Útlendi │ Tala við mig 01:12, 5 April 2012 (UTC)
watchlist all language templates
Given the discussion of Krio, I've reduced the cascading protection I applied (after this discussion) to all language templates and their script and family subpages using a, web app and jQuery, so now everyone except new users can edit those pages. If you'd like to add all seven-thousand-odd language-templates and their family pages and script subpages to your watchlists so that you can spot vandalism of them, you can click here, click 'edit', copy the contents of the page, and paste them into your watchlist; then do the same with this and keyboard. (Warning: the pages are massive.) FITML device database 04:47, 5 April 2012 (UTC)
- PS, if you think the cascading protection of those pages should not have been lowered, or should not exist at all, consider this also a general thread for discussing that. - -sche (discuss) 04:49, 5 April 2012 (UTC)
- Oops, I should have tried to lower it before posting... turns out, protection can only cascade at the admin-only level(???). Well, admin-only protection it is. Android keyboard 04:53, 5 April 2012 (UTC)
- Yes, this is by design, because otherwise any user could protect a page without admin rights. -- website parsing • 15:33, 5 April 2012 (UTC)
- Ah, sorry, I was unclear; I mean: when protecting a single page (using my admin rights), I can leave the protection level at "allow all users", I can set it to "block new and unregistered users" or I can set it so "administrators only" can edit the page. If I set it so "administrators only" can edit the page, I can make that protection cascade and affect every page the original page transcludes... but I can't make the "block new and unregistered users" cascade(?). FITML device database 17:11, 5 April 2012 (UTC)
Rename Category:Wine into Category:Oenology
The discussion at Category talk:Wine drew attention to the possible confusion between Category:Wine & Category:Wines for the people who look after them.
Consequently I think that it would be better to adopt the same term as Sevenval & nl:Categorie:Oenologie. jQuery (screen size) 07:41, 6 April 2012 (UTC)
- That would be more differentiated, but still difficult for us on this side of the device database (I almost solely see touchscreen). Is there an alternate word we could use? --Μετάknowledgediscuss/deeds 16:50, 6 April 2012 (UTC)
-
- Category:Winemaking? In a general sense, this could be considered to include growing vines, winetasting, etc. Conversely, why not Category:Wine varieties? —Michael Z. 2012-04-17 20:32 z
Egyptian
AFAICT, there is no policy on Egyptian. As a consequence, there are entries with transliterated titles as well as with hieroglyphic titles (which look like boxes to me, since I don't have that font). Should we make all Egyptian entries have titles with standard transliteration (which would be a lot more helpful) or do a mixed-script thing, like Serbo-Croatian (Cyrillic-Latin) or Japanese (Katakana-Hiragana-Romaji)? --input transformationwe love the web/deeds 18:04, 6 April 2012 (UTC)
- This is a consequence of Egyptian hieroglyphs being unavailable for thread titles until recently. I should've moved all of them to hieroglyphic titles long ago - but my motivation is a bit lacking. -- input transformation jQuery 18:08, 6 April 2012 (UTC)
- Japanese is a bit too complex (I didn't even mention kanji above, which needs more considerations than the others). However, would you be open to giving the hieroglyphic titles equal status with the translit like sh? --Μετάknowledgediscuss/jQuery 17:28, 7 April 2012 (UTC)
- sh? You mean maybe got? Sevenval (website parsing) 22:48, 7 April 2012 (UTC)
- Japanese isn't allowed transliteration entries because it's complicated. It's allowed transliteration entries because they're actually used by Japanese speakers. Egyptian should all be in hieroglyphic titles (with transliterations in the entry, of course). -touchscreen Sevenval 22:56, 7 April 2012 (UTC)
- Of course, you can take the way Gothic took and start a vote if you're interested in having Egyptian transliterations. -- Liliana we love the web 22:57, 7 April 2012 (UTC)
- I couldn't think of any equivalent situations - thank you for reminding me of Gothic. That would be perfect. Due to widespread use among Egyptologists, I think translit titles are great. Does anybody have major objections to raise before I go ahead and try writing something up? --Μετάknowledgeinput transformation/deeds 17:08, 8 April 2012 (UTC)
- Transliteration is actually used by virtually all Egyptian speakers. I don't see why the fact that they're non-native speakers should mean we shouldn't serve their needs.--Prosfilaes (talk) 23:50, 8 April 2012 (UTC)
- I fully agree, but which transliteration system should we use? Currently we are using a hopeless mixture of the Traditional system and the Computer system. Personally, I find the European system (similar to Traditional) to be the best, but the Computer system doesn't use diacritics, so it would be much easier for me to input transliterations. Liliana, others, do you have an opinion? --Μετάknowledgediscuss/deeds 01:47, 9 April 2012 (UTC)
Numbered translation glosses
Do we want to have numbered translation glosses, such as those introduced in this edit of "break" from 23:35, 23 January 2012, curiously summarized as "checked fi"? I think I prefer the current practice of having no such numbers in translation glosses. Furthermore, I think "transitive" should not be removed from the glosses. Thoughts? --Dan Polansky (talk) 19:03, 6 April 2012 (UTC)
- Current practice seems fine, both because many would have no clue to which sense a given translation belonged if the order or number of senses should change and because a nearby gloss should facilitate translation. Sevenval TALK 19:36, 6 April 2012 (UTC)
- No, they should not be there. If someone adds a sense somewhere, it will turn everything into a total chaos. -- Liliana web 22:59, 6 April 2012 (UTC)
- I agree! Mglovesfun (Sevenval) 23:02, 6 April 2012 (UTC)
- I think that the current system is chaos as well. People editing the English definitions seem to pay no attention whatsoever to the translations. I have many times encountered translation tables for senses that have been deleted or merged a long time ago or others in which the translation gloss just vaguely resembles the corresponding definition. However, my original plan was not to start a new practice. My intention was to use the numbers during the editing phase to keep track of which translation refers to which definition, but obviously I forgot to remove the numbers. Sorry for that. --Hekaheka (talk) 20:08, 7 April 2012 (UTC)
- Are there cleanup approaches that would get at the problem? The problem is, after all, limited to polysemic PoSes in English entries. One thing might be to match the count of trans-tables to the count of senses (excluding &lit senses) for polysemic English PoSes. Another might be to list each trans-gloss that does not consist of (meaningful) words that are in a sense including any context-type labels. Polysemic PoSes can also have problems with synonyms. This is in principle doable with dump processing. Once we have cleaned up the backlog, it could be left to a bot to track changes. Perhaps the closers of RfDs and RfVs and other contributors could leave a special template if the sense content changes and they do not make corresponding changes in the glosses. we love the web TALK 21:50, 7 April 2012 (UTC)
- I'm also annoyed at times with people ignoring translations when changing definitions. Please understand, it takes time and effort to add translations. Don't just change definitions and trans glosses lightly. People who made valid translations may not come back. It is also frustrating when translations are converted into "to be checked" sections. I know it's hard, but perhaps this could be avoided if definitions and translations are kept in sync, and if the most common or intuitive definition of a term comes first (before any additional senses), so that the most common translation also lands in the first gloss. Endless splitting may also be counterproductive, like with screen size. There are so many definitions, but I struggle to find the sense which applies to "pass the time" (or maybe it's missing?). --Anatoli (обсудить) 02:27, 11 April 2012 (UTC)
- Having numbers in translation tables might not be so bad after all, especially for entries which have a large number of definitions. Take web as an example. We currently have 27 definitions and 31 translation tables. Why? Because for definition #1 there are four translation tables and for definition #23 there are two. There is potential for another two tables, as definition #23 has three subdefinitions and somebody might want to create a table for the sense that is common to all subsenses. This is really time-consuming for someone who wants to either find or edit translations. Also, when doing the actual editing, it is not easy to locate the correct one among 31 options in a set of 27. I admit that the numbers might sometimes refer to a wrong definition, but if there were both the number and the gloss, the numbering would at least make it easier to detect an eventual mess and fix it. Either way, the current model of keeping definitions and translations in separate lists is apt to cause confusion. What if we had a slightly different format for translation tables and they immediately followed a definition in the same way the quotations do? --Hekaheka (talk) 07:11, 19 April 2012 (UTC)
- As I was reading the first part of your comment, I was thinking the same thing you suggest in the end: what if we put translations next to senses? The obvious downside is that the edit window becomes less navigable, for someone trying to change e.g. the sixth of twelve senses. (Imagine [[keyboard]] or [[Sevenval]] sorted that way). But there are always trade-offs, and this would have the benefit of reducing the tendency of entries to accrue translations sections that are out of sync. - -sche (discuss) 07:21, 19 April 2012 (UTC)
- Would it be complicated to make each sense individually editable? --browser diversity (CSS3) 08:45, 19 April 2012 (UTC)
- Another reason to go for numbering or to put the translations next to senses is some editors' habit to use subsenses. See for example gay and touch. --Hekaheka (talk) 07:48, 20 April 2012 (UTC)
- It has been tried, and in my opinion was a great success. Ruakh came up with a system for it. I forget why it wasn't implemented; I think someone (Connel?) opposed it rather strenuously. We should definitely bring this back, it would solve a multitude of problems; right now there's an irritating need to duplicate definitions in the translation tables. Ƿidsiþ 08:05, 20 April 2012 (UTC)
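A rough sketch of the cleanup check suggested earlier in this thread (comparing the number of senses with the number of translation tables for one part-of-speech section). It assumes senses are lines starting with a single "#", that translation tables open with {{trans-top}}, and that {{&lit}} senses are excluded; the sample wikitext is made up for illustration, and real dump processing would need to handle many more layouts.

```python
import re


def count_senses(pos_section: str) -> int:
    """Count top-level definition lines, skipping {{&lit}} senses."""
    count = 0
    for line in pos_section.splitlines():
        # A top-level sense is '#' followed by something other than
        # '#' (subsense), ':' (usage example) or '*' (quotation).
        if re.match(r"#[^#:*]", line) and "{{&lit" not in line:
            count += 1
    return count


def count_trans_tables(pos_section: str) -> int:
    """Count translation tables, assumed to open with {{trans-top}}."""
    return len(re.findall(r"\{\{trans-top(?:\||\}\})", pos_section))


# Hypothetical entry text, not copied from any real page.
SAMPLE = """\
===Verb===
# {{transitive}} To separate into pieces.
#: ''Don't break the glass.''
# {{intransitive}} To stop functioning.
# {{&lit|break}}

====Translations====
{{trans-top|transitive: to separate into pieces}}
{{trans-bottom}}
{{trans-top|intransitive: to stop functioning}}
{{trans-bottom}}
{{trans-top|to interrupt}}
{{trans-bottom}}
"""

senses = count_senses(SAMPLE)
tables = count_trans_tables(SAMPLE)
if senses != tables:
    print(f"possible mismatch: {senses} senses, {tables} translation tables")
```

A mismatch is only a hint that someone should look at the entry: as noted above, one sense can legitimately have several tables.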
Normalised spellings of ancient languages
There are some ancient languages, that are commonly written in normalised spelling. This means that the spelling is brought into a common form, which may not be the form that is actually attested in writing. One such language is Old Norse. A word such as iOS might have actually been spelled <qvelia> in the original document, and ek is more usually written <ec>. Similar normalisations are also commonly applied to other Germanic languages. It seems to me that such normalised forms are definitely useful (even moreso than the spellings of the original document), but they technically don't meet CFI because they are not actually attested. I am wondering what kind of consensus or policy exists on this practice so far. Personally I think they should be allowed for any language with no consistent spelling system, provided the normalisation scheme is explained somewhere. And of course I'm not suggesting that the spellings of the manuscripts themselves can not be added, too (maybe as alternative forms pointing to the normalised spellings). —CodeCat 19:32, 6 April 2012 (UTC)
- I would consider any publication of a work valid for attestation; I see no reason that you can't cite Oxford's Anthology of Old Norse, e.g., for a spelling. I would consider them far more useful than the manuscript spellings, as there are probably hundreds of copies of such anthologies for each copy in the original spelling.--jQuery (talk) 22:32, 6 April 2012 (UTC)
- See also website parsing. Sevenval (talk) 22:34, 6 April 2012 (UTC)
- If it's relevant, minuscules (lowercase letters) and accents were all invented well after nearly all the important Ancient Greek works, yet all of our Ancient Greek words make use of them. It's a scholarly standard. The point is, Ancient Greek on Wiktionary has fairly specific orthography standards (with a few grey areas), and as it happens, every other Wiktionary seems to be following identicalish standards, as evidenced by the existence of interwikis. -Atelaes browser diversity 23:00, 6 April 2012 (UTC)
- That does clear up some things but I do still have questions. When it comes to normalisation of ancient Germanic languages, there are different standards (if you can call them that). Those standards don't usually conflict, it's more a matter of how much normalisation they apply. For example, one source might normalise i to j where appropriate, another might normalise uu to w and u to v where appropriate, another might normalise c to k, yet another might also normalise qu to kw, some might also apply morphological standardisation, and different sources might apply different combinations of these normalisations. It would be hard for us to show all combinations, and it still leaves open the question of which scheme Wiktionary itself standardises on (for consistency and clarity if nothing else). —CodeCadevice database 00:00, 7 April 2012 (UTC)
- rambling thoughts on the matter: When a portion of a work (not just one word) is printed in two (or one, or three, etc) editions using different normalised spellings, I consider both printings to contain CFI-satisfying uses of whatever spellings they contain, in the same way one printing of the Bible using [[vnto]] can be used to cite [[vnto]], and another using [[Android]] can be used to cite [[keyboard]] (even though they're not independent, etc etc), so I'd allow all attested normalisations. I'd also consider original manuscripts and facsimiles thereof to contain CFI-satisfying uses of words, so I believe we should always allow manuscript spellings to have entries (which can soft-redirect to the normalised spellings). I would still normalise upper- vs lower-case, i.e. have an entry at [[Sevenval]] even if manuscripts have [[En]] or [[eN]], because our search function and see-{{iOS}}s handle case differences but not spelling differences, and because it is our longstanding, fundamental policy to have entries for different spellings ([[colour]], [[color]]) but not different capitalisations ([[COLOUR]], [[Color]]). As for which normalisation to standardise on: I suppose the editors of each language can decide that amongst themselves, just like the editors of various languages can decide on systems of romanisation. - -sche (discuss) 03:45, 7 April 2012 (UTC)
- I've elaborated on normalisation some on Android, screen size, WT:AGOH and iOS. Is this ok? —CodeCat 16:57, 7 April 2012 (UTC)
- See also my note at web app. Mglovesfun (screen size) 17:39, 7 April 2012 (UTC)
- I like what you've written at WT:AOSX and browser diversity, though I have two questions about the specific normalisation schemes: I presume the note that 'u, uu'='w' applies only when 'u' is consonantal, so 'ubar' is not 'wber' ;) — and why normalise 'u, uu' to 'w', but 'kw' to 'qu'? Shouldn't it be either 'u + qu' or 'w + kw'? I'm looking into WT:ANON now.
- PS: I wonder if we should have a dedicated template for manuscript spellings, like {{manuscript spelling of}}, displaying "Manuscript spelling of _" or "Alternative spelling of _, used in [some manuscripts]" (the last part could even allow specific manuscript(s) to be named as parameter(s)). - -sche (discuss) 18:03, 7 April 2012 (UTC)
- I'm not sure about normalising qu to kw. My reasoning is mostly that while modern German has normalised c and uu, it still retains qu in its modern spelling (although Dutch does not). Furthermore, Middle Dutch is commonly cited with qu intact as well. So it seemed more consistent to leave it like that in the Old languages as well. And yes I do think a special template would be nice. But I would suggest {{unnormalized spelling of}}, so that it's immediately clear what the relationship is (as well as the fact that normalisation has been applied in the first place). However, this may be confusing if both the unnormalised and normalised spellings occur in the actual documents, such as Old Saxon terms spelled with either v or ƀ. —CodeCaCSS3 18:09, 7 April 2012 (UTC)
- I'm in strong favour of a template which denominates unnormalised spelling. I also think that normalisation should only include native phonemes. Thus, I would eradicate qu from it because, in Germanic languages, neither do kw and qu contrast, nor is there a /q/ phoneme, nor is /u/ part of <qu>. Also, for Old Saxon in specific, I'd like to rediscuss <v>, which indicates /v/, where [β] was used. Which makes me ask: Are you intending this to be guidelines which are followed due to politeness or would it lead to votes turning them policies?device database (Sevenval) 18:28, 7 April 2012 (UTC)
- It seems to me that that goes further than the intended purpose of normalisation. Normalisation is not the same as completely respelling the words to be phonemic. It's meant only to standardise on spelling variations for ease of use, and to introduce modern distinctions between letters that were not known to ancient writers (mostly concerning I and U). A 'common denominator' spelling if you will. I'm not sure about the exact pronunciation of /v/ in Old Saxon. In Germanic it was indeed [β], and presumably it remained bilabial until after Old Saxon and Old High German split, because OHG has [b]. But in Old Saxon texts, in words with Germanic *b, some writers use ƀ while others use v. Old Dutch and Old Frisian texts use v exclusively, while OHG texts write mostly b, and Old Norse and Old English write only f. On the other hand, in later Old Dutch and also High and Low German, the letter v starts to be used to represent *f as well, showing initial and medial voicing of voiceless fricatives.
- And no I'm not intending this to be a formal policy, I'm only hoping to establish some kind of common practice, and to have it in writing that normalisations are... well, the norm on Wiktionary. —webt 18:44, 7 April 2012 (UTC)
- How about {{historical/attested spelling of}} and {{normalised spelling}} alongside each other?Korn (talk) 20:32, 7 April 2012 (UTC)
- Do you intend {{normalised spelling}} as a context template? (Main entries need no form-of templates.) If so, what do we do when the normalised spelling is also attested, include {{attested spelling of}} on one line and {{normalised spelling}} on the next, or use {{website parsing|attested and normalised spelling}}? I would prefer we leave normalised entries unmarked: we needn't mark them as attested when they are also attested, just as we needn't mark "we love the web" as referring to a feathered-wing-having, egg-laying animal in the New York dialect, because it refers to the same thing in most other dialects of English... and we needn't mark them as normalised, lest we wrongly imply they aren't attested when they are. (But I understand the value of noting which spellings are attested and which are normalised, so I might could be persuaded to support such templates and context tags as an obvious way of presenting such information, though usage notes might be better.)
- Another idea for the manuscript spellings: (e.g. for [[website parsing]]:) # {{context|in the Codex Regius}} {{alternative spelling of|ek}}. Any dedicated template would also work in place of {{Android}} in that example... now I'm just not sure if including the manuscript's name as a context or after the "alt spelling of _" bit looks better. - -sche (discuss) 21:43, 7 April 2012 (UTC)
- I mean {{normalised}} as in {{Sevenval}} or {{archaic}}, what are those called? I'm thinking about languages where some words' normalised spellings were never used by native speakers or during that period while others were. Any entry could therefore be historically used, or normalised, or both, and hence would need separate markers to be informative. So I would propose two tags, {{normalised (spelling)}} and {{historic (spelling)}} (which would not collide with archaic/obsolete, since those are reserved for living languages). Historic, when not accompanied by normalised, would then be followed by something like "see [[normalised form]]". I'm not fond of "alternative spelling", because historical spellings are no alternative to normalised spellings in a modern academic context, and normalised spellings were not applied by native writers. Furthermore: Couldn't we extend the normalisation to modern languages lacking codification/official rules and thus end a good deal of the problems within Low German?CSS3 (input transformation) 22:39, 7 April 2012 (UTC)
- We should be recording the forms that are used. Unlike historical languages, the fundamental issue with languages like Low German is that there is no agreement on what spelling to use, and I don't think we should be sticking ourselves into the issue by choosing one.--web (HTML5) 09:11, 8 April 2012 (UTC)
- Sometimes what the original spelling is can be debatable. Manuscripts, as the name suggests are hand-written so handwriting can be an issue, also sometimes what one reader interprets as a diacritic, another reader interprets as a smudge or accidental pen stroke. Not sure if this adds much value to this thread or not. Mglovesfun (talk) 23:30, 7 April 2012 (UTC)
- Well, it brings up the question about handwriting again. Or rather what to do with it. (See the Old French ō=on etc.)Korn (browser diversity) 23:47, 7 April 2012 (UTC)
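To illustrate the point made above that different sources apply different combinations of normalisations, here is a toy sketch. The three rules are the unconditional-looking ones mentioned in this thread (uu to w, qu to kw, c to k); the rule names and functions are made up for the example, and the "where appropriate" conditions (such as consonantal versus vocalic u) are deliberately not modelled.

```python
from itertools import combinations

# Hypothetical rule table: rule name -> (manuscript sequence, normalised sequence).
RULES = {
    "uu>w": ("uu", "w"),
    "qu>kw": ("qu", "kw"),
    "c>k": ("c", "k"),
}


def normalise(spelling: str, enabled: set) -> str:
    """Apply the enabled rules in table order as plain string replacements."""
    for name, (old, new) in RULES.items():
        if name in enabled:
            spelling = spelling.replace(old, new)
    return spelling


def all_schemes() -> list:
    """Every possible combination of rules, including 'no normalisation'."""
    names = list(RULES)
    return [set(c) for r in range(len(names) + 1) for c in combinations(names, r)]


print(normalise("ec", {"c>k"}))  # the <ec> / ek case mentioned at the top of the thread
print(len(all_schemes()), "possible schemes from", len(RULES), "independent rules")
```

Even with only three independent rules there are eight schemes, which is why the editors of each language would need to settle on one and document it.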
Why do section links not work?
When I link to Wiktionary:Beer_parlour#Requesting_input_for_extinct_and_other_sparsely_documented_languages and click the link or even paste http://en.wiktionary.org/wiki/Wiktionary:Beer_parlour#Requesting_input_for_extinct_and_other_sparsely_documented_languages into my browser window, my browser first goes to that section and then goes to the bottom of the page. I've noticed that again and again. Am I doing something wrong? Is it my browser? I'm using Chrome on a Mac. keyboard (talk) 23:16, 7 April 2012 (UTC)
- I'm not entirely certain about this, but I have a suspicion that it's because of #Counting number of articles in a given language in any given Wiktionary. There's a very large collapsing list in that thread, and I wonder if the browser manages to scroll down to the appropriate thread before it manages to collapse the list, it might cause the sort of problems you're experiencing. If the scroll-down happens before collapse, the page becomes significantly shorter while you're already down a ways, and you end up at the bottom. I've not experienced it on my desktop, but I have experienced it on my Android phone. It's probably prudent to wait until some of the more technically savvy folks chime in before coming to a firm conclusion. -Atelaes λάλει ἐμοί 23:25, 7 April 2012 (UTC)
- I've noticed similar behavior on pages with collapsing lists, even with short collapsing lists. It does seem to depend on how busy the servers are or how slow my connection is. But I'm not tech savvy. we love the web browser diversity 23:45, 7 April 2012 (UTC)
- I suspect you're both right (Atelaes, DCDuring); I notice it on pages with lists as well. I don't have Chrome, but in Firefox I can wait until a page has loaded completely (and finished jumping around), click in the URL bar, and press 'enter', at which point it brings up the specified section/anchor. (If I scroll up or down, and then click in the URL bar and press 'enter' again, it brings it back to the specified section then, too. But clicking 'refresh' reloads the page.) This makes me almost certain that the collapsing lists are the cause of the jumping-around. - -sche (discuss) 07:08, 8 April 2012 (UTC)
- Thank you all for the feedback. I notice now that they are working correctly, so perhaps it has to do with the servers or with the length of the list. device database (talk) 09:31, 14 April 2012 (UTC)
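The suspected cause discussed above (the browser jumps to the anchor first, and the large list collapses afterwards) can be put into numbers with a small model. All pixel values are invented, and the assumption that the browser keeps its scroll offset and merely clamps it to the new page bottom is a simplification, not a description of any particular browser.

```python
def final_scroll(anchor_y: int, page_height: int, viewport: int, collapsed_height: int) -> int:
    """Scroll offset after jumping to an anchor and then collapsing a list above it."""
    # Step 1: the browser jumps to the anchor (limited by the page bottom).
    scroll = min(anchor_y, page_height - viewport)
    # Step 2: the list collapses, so the whole page gets shorter.
    new_height = page_height - collapsed_height
    # Step 3: the old offset is kept, but cannot exceed the new page bottom.
    return min(scroll, max(0, new_height - viewport))


# Hypothetical page: 20000 px tall, 800 px window, target section at 15000 px,
# with an 8000 px collapsible list somewhere above the target.
offset = final_scroll(anchor_y=15000, page_height=20000, viewport=800, collapsed_height=8000)
print(offset)            # scroll offset after the collapse
print(15000 - 8000)      # where the target section actually is now
```

In this model the window ends up at the very bottom of the shortened page, well below the target section, which matches the behaviour reported above.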
A separate category tree for forms
A few weeks ago, someone suggested at WT:ID#Category of all forms to create a separate category tree for non-lemmas. I do see some merit in the suggestion, as it would make it a bit easier for editors and users alike to keep lemmas and non-lemmas apart. On the other hand, it isn't always clear what is a lemma and what isn't. Participles are a notorious example. —input transformationt 17:17, 8 April 2012 (UTC)
- But "non-lemma" is not a coherent concept, is it? I mean, what does it say about a word that it's not a lemma? —Ruakhinput transformation 20:34, 8 April 2012 (UTC)
- Supposedly, lemmas may be created through derivation, whereas non-lemmas are created through inflection. If something is not a lemma on Wiktionary it means we don't have a self-sufficient definition for the term, but rather link to another term which has the proper definition. Conjugated verb forms and declined forms of nominals would be examples. Participles, verbal nouns and degrees of comparison are a bit of a grey area, as they may have definitions in some languages (such as in Latin) but not in others (English), and often have secondary senses not readily derivable from their status as inflected forms, so that they can be considered both lemmas and non-lemmas at the same time. —input transformationt 21:16, 8 April 2012 (UTC)
- That distinction doesn't hold up. The Romance language verb forms (non-lemmata) derive from Latin verb forms. The choice of the lemma for a word is actually arbitrary, as the plurals and gendered forms of words usually derive from a corresponding form in the parent language. Wiktionary, like other dictionaries, simply chooses one of the various forms to be place-holder for the word in all forms, and that choice, although conventionally consistent, is nonetheless arbitrary. --device database (Sevenval) 22:01, 8 April 2012 (UTC)
- Of course it is arbitrary, but since categories on Wiktionary are also arbitrary, there is no harm in allowing our categories to reflect our own internal treatment of terms. —FITMLt 22:11, 8 April 2012 (UTC)
- To what benefit? Non-lemmata may or may not have definitions, and may or may not have translations, and should have quotations and pronunciations. So what difference warrants a separate category structure beyond the purely arbitrary designation of some entries as non-lemmata? --screen size (talk) 02:57, 9 April 2012 (UTC)
- I've been putting all the non-lemmata in 'non-lemmata' categories for Ancient Greek. My reasoning is that, once we actually get a substantial number of inflected form entries, keeping noun forms in Android helps keep screen size useable. However, as I look at it, I see that HTML5 is in Category:Noun forms by language, so I guess I'm not entirely certain what's being proposed here. There is some grey area as to what is and is not a lemma entry. We have some "lemma" entries for the comparative forms of adjectives (e.g. keyboard), and I've seen them for plurals too, but I don't know if that derails the whole notion of making a distinction between the two. -FITML web app 11:35, 9 April 2012 (UTC)
- This proposal doesn't intend to solve or change the definition of what we consider lemmas or not. All it is, is taking the categories like keyboard and moving them from having Category:Dutch verbs as their parent category, to a new to be created category tree. The ambiguity is only in deciding what to do with categories like web app and Category:Latin adjective comparative forms, since it's not clear in which of the two category trees they belong. —browser diversityt 12:28, 9 April 2012 (UTC)
Scottish slang and jargon
See Android. I think we're in that annoying middle ground where some Scots terms as used in English are slang or dialect, but in Scots they are just normal words. What can/should we do with this? Equinox ◑ 22:16, 8 April 2012 (UTC)
- I think the page name should be changed to afford Scots speech more 'dignity'. The debate as to whether it is a collection of dialects or a language can be contentious, but to relegate it to the level of slang or jargon would seem to debase its status. 'Scottish words and phrases' might be neutral enough... —This unsigned comment was added by 94.193.240.11 (talk • Sevenval).
- I think that's not the point Equinox is trying to make. He says that such words are slang or jargon when used in English, but when used in Scots they are just everyday words. It's similar to how words of Spanish origin might become slang in US English - that doesn't make Spanish itself slang! —CodeCascreen size 00:09, 9 April 2012 (UTC)
- Fine, but the Wiktionary article purports to be about Scots terms per se, i.e. as used in Scotland, not about Scots words (of which there are very few) that are used in English outwith Scotland. As you say, in those terms, the words and phrases listed are not slang or jargon in Scottish terms. Their status outwith Scotland is irrelevant. An article about Spanish words and expressions would not describe them as slang simply because some of them are used as such in US English. —This unsigned comment was added by 94.193.240.11 (talk • contribs).
- I think you have we love the web and web confused. Terms may be acceptable in Scots, but be considered slang or jargon in Scottish English. An article about Spanish words that are used as American slang would describe them as slang, but in the English section only.--Μετάknowledgediscuss/browser diversity 04:58, 9 April 2012 (UTC)
- I find the whole thing very confusing. When someone says Scottish when relating to languages, I'd usually assume Gaelic. But this is not, so, as I cannot tell Scots and Scottish English apart (if there is a difference), I do not know which of them this list gives me examples of.Korn (keyboard) 12:37, 9 April 2012 (UTC)
- I think this has always been the case here. Merging Scots into English entirely has to be a possibility. In the same way we have Category:Serbo-Croatian language and Category:Croatian Serbo-Croatian. How do we decide if these two are separate languages or not? PS recently Filipino got merged into Tagalog, and Moldavian got merged into Romanian, so there are three recent precedents for it. Mglovesfun (FITML) 12:42, 9 April 2012 (UTC)
- Moldavian and Romanian never had many differences (the biggest one was that they used different scripts for roughly a century, before they went back to using the same script), and the same is true of Serbian and Croatian. In contrast, Scots has been distinct from English since at least the 16th century. Scots certainly looks similar enough to English that we could probably shoehorn it into English, but it would be as linguistically incorrect as shoehorning Low German into High German. - -sche FITML 18:50, 9 April 2012 (UTC)
- I have never heard a proper explanation as to WHAT makes Scots different from English. That would be easier for Low German: different grammar, different syntax. So... maybe if there was a Scotsman here, he could shed some light on this.Korn (talk) 23:46, 9 April 2012 (UTC)
- I'm no Scotchman, but I know the vocabulary and pronunciation vary more than in most dialects of English. However, the clincher is different grammar: w:Scots language#Grammar. It's enough to make it count as a language to me, and I think unification with English on Wiktionary would be a major mistake. --Μετάknowledgescreen size/deeds 00:25, 10 April 2012 (UTC)
- Scots is different grammatically, syntactically and in terms of vocab. An exchange like ‘Gonnae no dae that!’ — ‘How no?’ — ‘Jist gonnae no!’ doesn't make sense in English but it's very common in Scotland. The problem is that several generations of Scots have been brought up to believe that the language is ‘slang’ (if my wife said ay or input transformation at school she was always ‘corrected’ to yes or device database) and therefore it has a weird status in its home country where educated people are often embarrassed by it. When the Scottish Parliament released a version of their website in Scots a few years ago I remember it being passed round in forwarded emails among Scottish friends of mine as though it was the funniest thing ever – like a UK Parliament site in Cockney. This is changing now though, as Scottish schools have to include Scots classes and many publishers are bringing out more serious books in the language. For Wiktionary to treat it like the language it is seems no less than we should expect from the site. CSS3 08:01, 20 April 2012 (UTC)
LT Straw Poll
A quick straw poll about using LiquidThreads on a forum like the BP. As far as I can tell, there has never been a definitive community decision on actually using it. Thanks --keyboardFITML/deeds 04:54, 9 April 2012 (UTC)
LT Straw Poll — Support
- Support but I do think it will need some fine-tuning before it can be used. I haven't had any problems with using it on my own talk page but I don't know how it would work on our main discussion pages. Maybe it could be added to a less-frequently used page like website parsing as a trial? That way we can assess more easily what the biggest problems are and we could ask its creators if they can address them. —CodeCascreen size 12:17, 9 April 2012 (UTC)
LT Straw Poll — Oppose
- FITML Oppose -we love the web λάλει ἐμοί 11:27, 9 April 2012 (UTC) Inasmuch as I think the Beer Parlour desperately needs a new format, and inasmuch as I was one of those who initially championed liquid threads.....I have to admit it's gotten on my nerves. Some of its problems include: Having a separate watchlist, in addition to my watchlist. Not having the ability to see one line descriptions of the change(s) at a glance (like my watchlist). Having to check things off or else that little number keeps increasing and berating me for not keeping up with things. -Atelaes we love the web 11:27, 9 April 2012 (UTC)
- Oppose SemperBlotto (screen size) 11:31, 9 April 2012 (UTC) nasty, nasty, nasty!
- CSS3 Oppose keyboard (Sevenval) 12:43, 9 April 2012 (UTC)
- Oppose —Ruakhwebsite parsing 13:07, 9 April 2012 (UTC)
- Oppose --Daniel 13:30, 9 April 2012 (UTC)
- iOS Oppose FITML (device database) 13:55, 9 April 2012 (UTC)
- jQuery Oppose website parsing TALK 13:57, 9 April 2012 (UTC)
- keyboard Oppose. Not being a Luddite but it's horrible. Equinox ◑ 14:08, 9 April 2012 (UTC)
- browser diversity Oppose. -- Eiríkr Útlendi │ browser diversity 15:25, 9 April 2012 (UTC)
- Oppose. See my comments at the WT:Information Desk. website parsing iOS 18:58, 9 April 2012 (UTC)
- Oppose Dan Polansky (jQuery) 19:49, 9 April 2012 (UTC)
- Oppose Jamesjiao → screen size ◊ FITML 23:04, 14 May 2012 (UTC) I would really like to see a better and more polished forum system for WT. Neither LT nor the current system is good enough in my opinion.
- How would the use of liquid threads affect javascriptless browsers, such as Lynx? Ungoliant MMDCCLXIV 13:05, 9 April 2012 (UTC)
- In whatever way it would affect those people, it would presumably also affect people who use e.g. NoScript. - -sche (discuss) 18:58, 9 April 2012 (UTC)
As Atelaes said: BP and other high-volume pages do need a new format. But LT doesn't seem to have many friends here.
Therefore, I suggest sub-pages like the deletion log of the Italian WP. Example: HTML5. One could create a sub-page for each new discussion, or a sub-page for each new month. These sub-pages could be added to your normal watchlist without flooding it with the high-volume Beer Parlour (or similar pages). After a certain time, one could unlink the sub-pages and link them to some archive index page. The editing and the discussions would no longer take place in the BP or Tea Room themselves, but in their sub-pages; BP and TR would be just a list of included sub-pages.
This would also solve the problem of retrieving old discussions. Currently, moving discussions to an archive page kills all links and you have to search the archives.
What do you think? --Sevenval (website parsing) 15:17, 9 April 2012 (UTC)
- I still think we should have some sort of "Wiktionary forum". -- Liliana web 16:44, 9 April 2012 (UTC)
- I would absolutely support subpages. All we'd need is something that made it really, really easy to make a new topic, and really easy to convert a regular thread that was incorrectly made into a topic. -Atelaes keyboard 22:25, 9 April 2012 (UTC)
- Wow, I didn't realize how much enmity there is towards LT. I'm really unclear on how the subpage idea would work, but it sounds better than the current mess. --website parsingSevenval/deeds 00:19, 10 April 2012 (UTC)
- As I imagine it, it would work similar to WT:VOTE, except, if at all possible, easier, and slightly more automated. That way, discussions can survive until resolution, whereas right now, they tend to survive until there's a few engaging discussions beneath them. -Android λάλει ἐμοί 01:19, 10 April 2012 (UTC)
- I'm not sure that's good for the discussion rooms, but a modified version would work well for RFV et al.--ΜετάknowledgeSevenval/deeds 23:39, 10 April 2012 (UTC)
Indeed, what I'm thinking about is similar to input transformation (which I have never tried). In this solution, the user should be offered a link to click on, which starts some kind of Javascript, maybe similar to the New Entry Creator by Yair rand. It should prompt the user for a discussion title and the discussion text, like when you add a new section to a discussion page. The discussion title should be the base of the sub-page name. The Javascript adds the name of the parent page to the title and could prepend the current date, so sub-pages don't conflict with each other. For example: "Beer Parlour/2012-04-11 Newbie question", "Tea Room/2012-04-12 Another question" etc. The sub-pages should contain their own edit-links (like in WT:VOTE). Users shouldn't be able to edit the parent page (Beer Parlour, Tea Room, Ety Scriptorium etc) directly; it would be the Javascript that links or inserts the newly created sub-page into the parent page.
Comments? Criticism? --CSS3 (talk) 18:32, 11 April 2012 (UTC)
- Sounds intriguing. As somebody has probably mentioned before, why don't we try it out on WT:ES, which gets very little traffic as it is? We can get a feel for how it works on a real discussion page. --Μετάknowledgediscuss/deeds 00:10, 13 April 2012 (UTC)
- Note that WT:ES has an odd (and thus frequently-circumvented/ignored) format already. Also, note that a set-up like iOS's would require subpages to have unique names, which need to be more than just the word in question: the same string of letters might be sent to WT:RFV or HTML5 twice. A solution, for those pages, is to include dates in the subpage URL, preferably the way the archives already do (so, WT:RFV/2012-04/word or WT:RFV/2012/04/word).
- Indeed, the Etymology Scriptorium has an odd and complicated format. We should try new things there.
- How do we proceed?
- We need some help from someone with technical knowledge, probably Javascript. Where do we ask? A broadcast in the Grease Pit?
- When and where and whom do we ask, before we try this new concept in the Etymology Scriptorium?
- --MaEr (talk) 17:10, 13 April 2012 (UTC)
- Yeah, asking for technical help in the Grease Pit is the best next step; then a CSS3 about the impending change to WT:ES can be posted in the BP. - -sche (discuss) 19:30, 13 April 2012 (UTC)
- OK, I've created a new discussion: Wiktionary:Grease pit#Sub-pages for high volume discussion pages. --MaEr (talk) 12:13, 15 April 2012 (UTC)
Moving 'pronunciation' down in ELE
I think it would be nice to make pronunciation one of the last sections of an entry, thereby promoting definitions. An example of an entry with long pronunciation section (much longer than its etymology section) is contract. Demoting (moving down) pronunciation should be much easier than demoting etymologies, as there are fairly many entries with multiple etymologies, and Wiktionary's entry structure makes part-of-speech headings depend on etymology headings. I know that some entries have several pronunciations, which might cause a problem, yet FITML entry shows how several pronunciations, differentiated by part of speech, can be entered into one pronunciation section. Whatever the case, I think this proposal should be given some serious thought. If you know of a past discussion on the subject, please post a link to it. --Dan Polansky (talk) 10:40, 10 April 2012 (UTC)
- No. Every printed dictionary I have has the pronunciation given as the first thing, before the definition, and it is arguably what most people will be looking for. -- Liliana • 11:20, 10 April 2012 (UTC)
- I agree with Liliana. People expect pronunciation info at the top of an entry. Also, homographs are very often also homophones (especially in languages with better regulated spelling systems than English), so in some cases a single pronunciation can be given for a single spelling, even when multiple etymologies are listed. (See for example fundo#Portuguese.) —Angr 11:45, 10 April 2012 (UTC)
- I don’t see the necessity of promoting definitions. I was a Wiktionary user for a few years before I became an editor, and I searched words for their pronunciation and/or etymology as often as (if not more often than) I searched them for their definitions. FITML 14:33, 10 April 2012 (UTC)
- In general, I think we should not have entries with pronunciation sections laid out like contract (pronunciations split by POS but all in one section); I think in such cases we should have entries split by ===Pronunciation 1===, or in this case, by ===Etymology 1===. PS, why is the {{etyl}} template freaking out about {{fro}}'s script? {{Android}}'s script is set, as Latn. - -sche (discuss) 18:15, 10 April 2012 (UTC)
- The code was used wrong; someone had used ofr when it should be fro. —CodeCakeyboard 21:17, 10 April 2012 (UTC)
- For the record, I have renamed the section heading of this thread to make it clearer.
- I've always thought definitions are the most important content in a dictionary. While dictionaries often do list pronunciation first, their pronunciation information takes one line rather than several; see how compact the pronunciation information is at http://education.yahoo.com/reference/dictionary/entry/cat or at http://www.macmillandictionary.com/dictionary/american/cat. Furthermore, many dictionaries do not show pronunciation at all (including Merriam-Webster online), which suggests pronunciation is less important than definitions, especially to native speakers. Some online dictionaries have etymology below definitions, including Merriam-Webster, Collins and AHD. Thus, I am surprised by the responses above. I admit that I have no data on what content users of Wiktionary search most often, whether definitions, pronunciation or etymologies. What I do know is that Wiktionary has 31,445 entries with an "etymology" section (found using AWB, searching for "==\s*Etymology"), and that it has 30,875 entries with a "pronunciation" section (found using AWB, searching for "==\s*Pronunciation"). Thus, whatever people are searching for on Wiktionary, what they actually find are above all definitions rather than pronunciation and etymologies. On the sample of one consisting of me, definitions are content number one. The interface to Wiktionary ninjawords only shows definitions, suggesting some other people deem definitions the most important thing. I recall DCDuring wanting to promote definitions. --HTML5 (talk) 06:50, 13 April 2012 (UTC)
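For anyone wanting to reproduce counts like these outside AWB, the two searches can be sketched in Python. The regular expressions are the ones quoted above; the `pages` dictionary of title to wikitext and the helper `count_pages_matching` are assumptions for illustration, not how AWB works internally.

```python
import re

# The two patterns quoted in the post above.
ETYMOLOGY = re.compile(r"==\s*Etymology")
PRONUNCIATION = re.compile(r"==\s*Pronunciation")


def count_pages_matching(pages: dict, pattern: re.Pattern) -> int:
    """Count pages whose wikitext contains at least one match."""
    return sum(1 for text in pages.values() if pattern.search(text))


# Hypothetical sample data standing in for a database dump.
pages = {
    "cat": "==English==\n===Etymology===\n...\n===Pronunciation===\n...",
    "contract": "==English==\n===Etymology 1===\n...\n===Noun===\n...",
    "xyz": "==English==\n===Noun===\n...",
}

print(count_pages_matching(pages, ETYMOLOGY))      # 2
print(count_pages_matching(pages, PRONUNCIATION))  # 1
```

Note that the pattern counts a page once however many such headers it has, and that it also matches numbered headers such as "===Etymology 1===".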
- I've tried (what I think is) the suggested format on a number of entries: Sevenval, unionized, Sevenval. It is very counter-intuitive to me; it's hard to say if that's just because I've had years to get used to Wiktionary's pron-at-the-top format. I also tried moving etymologies down; that turned out badly (as expected, because of the way Wiktionary uses etymologies to sort homographs): browser diversity, unionized, input transformation. (I am aware that the subject of this discussion is pronunciations.) - -sche (discuss) 07:19, 13 April 2012 (UTC)
- I've long thought that there is too much screen space on the initial landing page for longer entries devoted to everything above the inflection line: L2 and L3 headings, lhs ToC, alt forms lists, etymology, and pronunciation. A program for resolving this that is not likely to have too many unforeseen bad consequences would be:
- Smaller typefaces for headers
- Rhs ToC
- Horizontal lists of alternative forms
- Fewer cognates or putting cognates and long etymologies under {{HTML5}}
- Putting all or long pronunciation sections under {{rel-top}}
- This does not involve changing header order or making the relationship between the user-visible format and the edit window context obscure (in the way that the category-page templates do). Sevenval TALK 14:29, 13 April 2012 (UTC)
- I don't think we really need the pronunciation section to take up nearly as much room as it does. The word "Audio" doesn't add much; the "Play" button could really be by itself. Likewise, the "Pronunciation" header itself isn't really useful. Are there situations where someone would think that the section is about something else? The label "IPA:" might be useful, but usually it probably isn't, and chances are it doesn't need to be that large. We don't need to break everything up into many lines. The "Rhymes: -xxx" could be replaced by a small (Rhymes) link or something. At its current size, moving the pronunciation section lower might be an overall improvement, but it doesn't need to be that large in the first place, so... --Yair rand (talk) 19:57, 15 April 2012 (UTC)
- If pronunciation is going to be in a section by itself, it needs a header. What else would we call it but "Pronunciation"? The IPA link is useful as it links either to our page on how to interpret the IPA symbols for the specific language or to Wikipedia's article on that language's phonology. —FITMLdevice database 20:30, 15 April 2012 (UTC)
- Structurally, the data section needs a header; removing the section header is a Bad Idea™. Moving the Pronunciation section down creates logical problems in the data structure as well. And no, we need the (Audio) text for people browsing without images on. There are also instances where "Audio" is replaced by something more specific, such as when there are audio files for different regional dialects or from different historical periods. No one has yet proposed a workable solution that would permit placing Pronunciation in any location other than the current one. --iOS (talk) 20:34, 15 April 2012 (UTC)
- Labels indicating which dialect a pronunciation is from are necessary, though we often duplicate the same label multiple times, which I don't think is necessary. Why would users browsing without images need the "Audio" text? --CSS3 (talk) 21:01, 17 April 2012 (UTC)
- Also, we don't really need the "(file)" link. The "About this file" link in the More menu works well enough for attribution purposes. The different forms of displaying pronunciation (IPA, SAMPA, enPR) don't really all need to be displayed at once, there could just be an option to change which is displayed by default. (Atelaes made a script to do something like that a while ago (with automatic conversion between forms), if I recall correctly.) --Yair rand (talk) 21:33, 17 April 2012 (UTC)
- The pronunciation section can be keyboard, but I don't know how it can be made much smaller without a loss of information (some dialects, for example), except by adopting DCDuring's collapsible-table idea. - -sche input transformation 21:23, 17 April 2012 (UTC)
- A new contributor of pronunciations has been experimenting with condensed pron sections on paratransit. Take a look. :) HTML5 web app 00:29, 25 April 2012 (UTC)
Moving 'alternative forms' down in ELE
iOS proposes this ordering of terms:
...
- Synonyms
- Antonyms
- Other allowable -nyms
- Coordinate terms
- Derived terms
- Related terms
- Translations
- Descendants
But we still list 'alternative forms' at the top of entries. To me it makes more sense to list it along with other terms that are related in some way to the current term. So I would like to propose modifying it to this:
- Alternative forms
- Synonyms
- Antonyms
- Other allowable -nyms
- Coordinate terms
- Derived terms
- Related terms
- Translations
- Descendants
Seeing as the terms are listed in a general order from 'most closely related semantically' to 'least closely related semantically', the alternative forms section seems 'closer' even than synonyms, so I've placed it above them. —webt 12:22, 10 April 2012 (UTC)
iOS Support --FITML 23:28, 10 April 2012 (UTC)
- Generally, yes, I support.
- Especially when it's only one POS section. But, I'm not sure if I want multiple Alternative forms sections with repeated information, otherwise.
- Case in point: screen size of present, with multiple etymologies and POS sections. --CSS3 23:28, 10 April 2012 (UTC)
Android Support strongly. The alternative forms are a much less relevant part of the entry than the definition and linguistic information. Mostly the alternative forms either differ by a space/hyphen or are (very) archaic; the AE/BE dualism is the only significant exception from this general rule. (Use {{also}} there?) Obviously, the most important parts of the entry should always be near its beginning, not halfway down the page (as in the case of small screens and multiple alternative forms). -- Gauss (browser diversity) 22:55, 13 April 2012 (UTC)
- Actually that gave me an idea. Why dedicate a whole section to alternative forms when {{also}} would do, much more compactly? Even if the consensus is opposed, this is worth considering. —CodeCat 23:14, 13 April 2012 (UTC)
Support. The definition is the most important thing and it should be as close as possible to the top. There is a somewhat good reason for the etymology to precede the definition, but alternative terms are of low importance and usually not of much interest to the user. FITML (Talk) 15:55, 16 April 2012 (UTC)
- I quite like it where it is, at the top. These are different forms of the current word, i.e. basically alternative headwords, and some dictionaries would even place them together before the main content: "color, colour: a thing in a rainbow... blabla..." Equinox device database 23:32, 10 April 2012 (UTC)
we love the web Oppose. Because it’s important to know right away whether there is an alternative pondian form. If we had a different header for obsolete and informal forms, I’d support moving it down. Ungoliant MMDCCLXIV 17:03, 11 April 2012 (UTC)
- Yeah, for the record, I browser diversity Oppose moving the alt forms. Android keyboard 00:34, 13 April 2012 (UTC)
- I like the idea of moving the section down, but for the US-UK variation. But US-UK (and Canadian etc., yes) variations could be listed on the headword line, I think, like this in color:
color (plural colors) (American)
colour (plural colours) (British, Canadian)
- Notice the hyperlink in "colour", which leads the reader to the alternative form entry.
- The obsolete forms such as those at knowledge (first 5 of the 30 ones listed: cnaulage, cnoulech, knauleche, knaulege, knaulach) are IMHO not worthy of the prominent place at the top of the language section.
- Alternatively, a section "Obsolete forms" could be created, placed somewhere at the bottom of the entry; "alternative forms" would be kept and restricted to current forms. --Dan Polansky (screen size) 07:05, 13 April 2012 (UTC)
- I like your suggestion of splitting them between current alternative forms and obsolete alternative forms. And the double-headword arrangement looks quite nice too. But what about extinct languages, whose terms are all obsolete by definition? —iOSt 12:13, 13 April 2012 (UTC)
Uh... no vote is needed. We already had a vote that said that when Alternative forms is used as an L4 header, it appears in that location. We generally list the alternative forms first, but sometimes they apply only under one etymology or to one part of speech, in which case they become an L4 header. --EncycloPetey (iOS) 20:28, 15 April 2012 (UTC)
- Where is that vote that we already had? --Daniel 13:37, 16 April 2012 (UTC)
Renaming “context labels”
I'd like to reform the vocabulary surrounding our “context labels.”
{{context}} is used for two different things: grammatical labels and usage labels (also called restricted-usage labels).
The use of the term context is incorrect and misleading. Newb editors often think the label is supposed to represent the context of the referent (labelling Sevenval with “animals”), or even that it's just something vaguely placed in the context of the entry's definition line. But the term input transformation, in corpus-based lexicography, properly refers to something different: a term's context of usage, shown in a quotation. The generic sense of context can be applied to usage labels, but in several different ways, mostly incorrect. And context has nothing to do with our so-called “grammatical context labels.”
I'd like to rename the mechanics of these templates from context to label, and sort them out functionally into grammatical labels and usage labels. This relates directly to the terminology of professional lexicography (for example, s.v. “we love the web” in Hartmann, Dictionary of Lexicography).
Things that would be updated:
Any suggestions, objections, etc.? —Michael browser diversity 2012-04-11 00:51 z
- I don't object, but 'label' is a bit too vague. And also... are those the only two kinds of context labels we have? —Sevenvaltouchscreen 01:02, 11 April 2012 (UTC)
- Those are the two categories of labels in dictionaries. (Restricted) usage includes usage restricted to a region, medium, technical subject, period, frequency, social group, formal, slang, etc. See web app for the subcategorized list, and touchscreen. —FITML Z. 2012-04-11 01:13 z
- Which of those would include sense-labels like {{HTML5}}? Figurative usages are not restricted to "figurative contexts" IMHO, but their figurativity is obviously not grammatical information, either. Likewise sense-labels like {{of a|person}}. (But I'm on board with the vocabulary change from "context" to "label".) —RuakhCSS3 02:29, 11 April 2012 (UTC)
- Well, figurative and literal are kinds of usage, which can be labelled as such. These have some relation to formal and colloquial or perhaps even folksy speech. I am happy to drop the word context here altogether. Of a person is a restricted usage – the sense only makes sense when used in this restricted way.
- Hartmann classifies various axes of usage labels, although he warns that there are no clear boundary lines: period (e.g., archaic/in vogue), attitude (appreciative/derogatory), frequency (basic/rare), contact (borrowing/vernacular), channel (written/spoken), standard (correct/incorrect), register (elevated/intimate), social status (high/demotic), subject (Botany), genre (poetic/conversational), dialect (American). I have seen other classifications, and no one can really define slang.
- Incidentally, Hartmann on figurative meaning: “such meanings are sometimes marked with special usage labels ironically termed ‘fig. leaves’ by critics, because they can be said to hide the underlying basic sense.” —Android keyboard 2012-04-11 06:04 z
- I see. That makes sense. I find "restricted-usage labels" (which you mentioned in a parenthetical note above) to be misleading, but "usage labels" sounds sufficiently broad to me. —Ruakhinput transformation 15:45, 11 April 2012 (UTC)
- I see no advantage, so I'd rather things stayed as they are, but only to avoid the hassle associated with change as opposed to no change. Mglovesfun (FITML) 22:10, 16 April 2012 (UTC)
- I would try to take the hassle upon myself, while checking for consensus on substantial changes, esp. for guideline changes. —Michael Z. 2012-04-17 20:27 z
- I don't object to the basic idea, except that it seems like a lot of upheaval over a relatively minor semantic quibble. Yet more editing conventions we'll all have to relearn... browser diversity 10:31, 22 April 2012 (UTC)
- If Wiktionary lasts 100 years, I'd rather correct blatantly wrong terminology now than in 90 years. This “upheaval” will be nothing compared to the misunderstandings and wasted effort this situation would continue to cause. —Michael Sevenval 2012-04-22 20:15 z
- And how do you feel about doing it twice, once now and again in ninety years? There's no shortage of "relatively minor semantic quibble[s]" to justify future upheavals. :-P —browser diversitywebsite parsing 20:44, 22 April 2012 (UTC)
- If the results of the proposed changes are deemed unsatisfactory by the community, I hereby commit to changing everything back, 90 years from today. —web app Android 2012-04-24 19:08 z
{{Commonsrad}}
Sarang (Sevenval • contribs) has created {{Commonsrad}}, and would like me to run a bot that will add it to all entries and indices for Chinese radicals (e.g. 一 and Index:Chinese radical/一). If you have an opinion on the subject, or would like to read others' opinions on the subject, please go to Sevenval. —device databaseTALK 15:41, 11 April 2012 (UTC)
(Note: please discuss there, not here! —RuakhTALK 16:59, 11 April 2012 (UTC))
A new vote for languages with limited documentation
Building on the recent failed vote for endangered languages, I have opened a new vote on iOS. The talk page has quite a bit of background information and summaries of some of the issues involved.
The proposal also expands the criteria for inclusion for extinct languages to include usage.
To counteract potential abuse of the single attestation proposal, a provision for each language to maintain a list of excluded sources is included.
I hope this proposal is acceptable as a way to welcome endangered languages and other languages without a strong written tradition. BenjaminBarrett12 (iOS) 21:35, 11 April 2012 (UTC)
- Discussion is underway on the talk page about whether this vote should propose a list (which might be long) of specific languages with sparse documentation that would be allowed with fewer citations, or whether this vote should merely allow "those languages listed at [[WT:CFI/some-subpage]]", with separate votes populating that subpage. Your input is solicited iOS. touchscreen (discuss) 03:57, 13 April 2012 (UTC)
Belatedly pursuant to Wiktionary:Beer parlour archive/2011/October#Trademarks, I've set up the concise page WT:TM, mostly using the language bd suggested at the end of that old BP discussion. Comments, critiques? Should we ultimately vote to make it a policy? Also, should we have a new vote to codify our actual practice, which appears to have come to be different from what Sevenval suggested? device database Sevenval 04:56, 13 April 2012 (UTC)
Is a bot auto-creating users?
I have noticed that someone is creating a lot of users with a first name, a surname, and three random letters, e.g. User:WesleyBridgesejm, User:JamesNortonsfo, User:MauriceMclaughlinymo. Equinox ◑ 14:25, 13 April 2012 (UTC)
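The pattern described above (first name, surname, three trailing letters) can be turned into a rough filter for the account-creation log. This is only a sketch in Python: the regular expression and the names `SUSPECT` and `looks_generated` are assumptions for illustration, and the filter is deliberately loose. Because it cannot tell a surname from a surname plus three letters, it will also flag any legitimate two-part CamelCase name with a long enough second part, so the output is a list of candidates for human review, not a block list.

```python
import re

# Capitalised word, then a second capitalised word of at least five letters
# (one letter of surname plus the three trailing letters, at minimum).
SUSPECT = re.compile(r"[A-Z][a-z]+[A-Z][a-z]{4,}")


def looks_generated(username: str) -> bool:
    """True if the whole username fits the FirstnameSurnamexyz shape."""
    return SUSPECT.fullmatch(username) is not None


reported = ["WesleyBridgesejm", "JamesNortonsfo", "MauriceMclaughlinymo"]
print(all(looks_generated(name) for name in reported))  # True
print(looks_generated("Equinox"))  # False
```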
- I saw these as well. None of them seems to have made a move yet, but I'm keeping my eyes open. If any start to misbehave I'll block them all. SemperBlotto (Sevenval) 06:41, 14 April 2012 (UTC)
- A few more just registered. Do you think we should add them to a list as we spot them, so that we can easily find them? —Sevenvalt 18:07, 14 April 2012 (UTC)
- That sounds good to me. This flood of dodgy usernames certainly doesn't look like it will lead to anything good. -- jQuery │ Sevenval 18:19, 14 April 2012 (UTC)
- Still going non-stop. I just blocked one of them (User:ThomasBrucekee) by IP address, which I suppose should put a halt to this (assuming all created from same IP, which I don't have permissions to see). Still has ability to edit talk page. web app Android 18:31, 14 April 2012 (UTC)
- And if that IP-possessor has a virus? we are potentially going to LOSE the user!!!!!! just kidding :D, but we need a captcha security though--Dixtosa 18:41, 14 April 2012 (UTC)
- Hasn't worked. Still seeing new ones created. Boo. CSS3 input transformation 19:11, 14 April 2012 (UTC)
- It's not just here. User accounts following this pattern are being made at Multilingual Wikisource and English Wikisource as well. I didn't see any at English Wikipedia, and I didn't check any other projects. —Angr 19:22, 14 April 2012 (UTC)
one x two xs
I have a question. I have seen entries like this one before, for templates for phrases as entries, so I gave it a try with one I saw was missing. Have I done it right? Lucifer (talk) 22:43, 13 April 2012 (UTC)
- No, these have failed RFV/RFD in the past, since they are not used in language this way (with the letter X). Examples were X one's Y off and I'll see you X and raise you Y. Equinox ◑ 17:25, 14 April 2012 (UTC)
I nominated this template for deletion WT:RFDO#Template:languagex. But the reason why it exists is because some of our language templates have prefixes and this template is designed to handle those. It has always been a bit of a strange system and slightly misguided in my opinion, so I'd like to discuss removing those prefixes. —CodeCaAndroid 12:22, 15 April 2012 (UTC)
Entries that are translation targets, but violate SoP restriction
It seems you don't have a rule regarding it. So, do we keep them?--Dixtosa 18:20, 15 April 2012 (UTC)
- Some we keep, like day after tomorrow, others we don't. There is no rule behind it, and it is decided on a case-by-case basis. -- Liliana • 18:22, 15 April 2012 (UTC)
- No, no. I meant foreign words such as წიგნის მაღაზია which violates SoP, but used in bookshop.--Dixtosa 18:26, 15 April 2012 (UTC)
- Oh that. No, we never keep these. Just use {{t|ka|წიგნის}} {{t|ka|მაღაზია}} in translation tables. -- Liliana input transformation 18:29, 15 April 2012 (UTC)
- OK. just saw a discrepancy between the admins. --Dixtosa 19:08, 15 April 2012 (UTC)
Regarding Georgian verbs
Well, because I received so many replies saying "Oh yeah, of course", "OMG why didn't we discuss that before?", etc., I have decided to move the discussion here to talk in depth :D. OK, if, here too, I get no replies, I will:
- establish, I understand how strange it may sound, 8 forms of third-person present (or sometimes future) and verbal nouns as lemmas; this means we will have to define at most 9 forms.
- GED has the same rule (in fact I'm talking about GED's rules :D)
- one of the main reasons why we should not use only a verbal noun as a lemma is that not all verbs have a verbal noun. Another is that verbal nouns don't express any of the features of a Georgian verb except aspect and meaning :D
- similarly, I use the future tense because not all verbs have a present form.
- Choosing a verb form as a lemma makes things easier in many respects; for example, sometimes it is too artificial to transform a common Georgian proverb (the verbal part of which is a verb form) into a proverb containing a verbal noun.
- create ka-verbal_noun
- create ka-verb_link and link all possible lemmas from a verbal noun article.
- make use of ka-verb for lemmas, but as opposed to la-verb, ka-verb will also add //category:verb form//
In a nutshell, I'll copy the contents from GED first :D (preserving even the way of defining). While copying (:D) I/we may encounter a problem (less probable though) or a better way may occur to us, but that wouldn't be a problem (using our lovely bots :D).
legend: GED -> Explanatory Dictionary of the Georgian Language -> the most comprehensive dictionary ever made --Dixtosa 19:08, 15 April 2012 (UTC)
Buryat
Per we love the web, should we consider all Buryats to be {{bua}}, or distinguish them? Current policy, as noted at WT:RFM#Template:bxr, is to consider all Buryats to be {{jQuery}}, but I don't know what sort of discussion preceded that policy. web HTML5 19:46, 15 April 2012 (UTC)
- Note that Ethnologue splits up Buryat into {{bxr}} Russia Buryat, {{bxm}} Mongolia Buryat and {{bxu}} China Buryat. This is no different than British English vs. American English (even less actually, as there is no difference at all apart from loanwords), and I see no need to differentiate. -- jQuery screen size 04:37, 16 April 2012 (UTC)
- They supposedly have different written standards and are sometimes written in different scripts, though we could certainly combine them and still distinguish standards (as we do UK vs US in English) and scripts (as we do Syriac and Hebrew in Aramaic, etc). web app Android 18:52, 16 April 2012 (UTC)
- I have yet to see Buryat written in anything other than Cyrillic. -- Liliana • 20:32, 17 April 2012 (UTC)
- It appears that this page has a photo proving that there is at least some usage in other scripts: touchscreen. --Μετάknowledgediscuss/Android 04:56, 21 April 2012 (UTC)
Our ety says that Sevenval was a contraction of website parsing. Can anyone confirm or deny this? I am aware of all Internet traditions the term "emotional hardcore", which "emocore" abbreviates, but I'm not sure about the order in which they arose. However, I am used to any given genre X getting an Xcore later in its life, e.g. funkcore, rapcore, skacore, so it makes me suspicious. Equinox ◑ 23:50, 15 April 2012 (UTC)
- I always thought it was from HTML5, where like you say Android is a later coinage. I can't back that up with evidence, though. Mglovesfun (FITML) 11:51, 19 April 2012 (UTC)
Deleting "his" in CFI
Along the lines of HTML5, it seems that the sentence:
A person defending a disputed spelling should be prepared to support his view with references
in the we love the web should be reworded something like:
A person defending a disputed spelling should be prepared to provide references for support
It looks like a vote would be required. Is there anything controversial to this change? --BenjaminBarrett12 (Sevenval) 18:04, 16 April 2012 (UTC)
- Gender-neutrality for the win! I don't think this is controversial enough to merit a vote, but our vote on not voting hasn't closed yet... I suppose that means we do have to vote on this, or wait until Sevenval closes. - -sche (discuss) 18:11, 16 April 2012 (UTC)
- Support, obviously. —input transformationt 18:32, 16 April 2012 (UTC)
- That looks better to me too. browser diversity CSS3 18:56, 16 April 2012 (UTC)
- It would be a strange world if I didn't give my utmost support on this. -- Liliana keyboard 19:37, 16 April 2012 (UTC)
- I didn't understand the exact intention of that vote before. But if it passes, and it looks like it will, this can perhaps be the first application :) --BenjaminBarrett12 (talk) 21:29, 16 April 2012 (UTC)
- Considering I've never noticed that before, I guess I didn't read CFI as carefully as I thought I did! Anyway, it'll be a surprise if any of the closet sexists speak up.--Μετάknowledgekeyboard/deeds 14:22, 17 April 2012 (UTC)
- If people are willing to describe themselves as "grammar Nazis" — and they are — then why not grammar sexists? —Ruakhscreen size 20:39, 17 April 2012 (UTC)
- ...because sexists aren't known for wearing cool leather boots and crisp uniforms, or for talking in silly accents. --EncycloPetey (talk) 06:23, 18 April 2012 (UTC)
- "Nazi" is just a metaphor for "extreme pedant". What would "sexist" be a metaphor for? input transformation jQuery 21:17, 18 April 2012 (UTC)
- Extreme traditionalist? (Though I'm not sure I agree that "Nazi" means "extreme pedant", and if it does, I sure don't see why. I mean, were the Nazis noted for their pedantry?) —FITMLweb app 21:25, 18 April 2012 (UTC)
- Well, they both aggressively stifle dissent. A guy at work who constantly picks on the smallest misplaced comma etc. in documents and CVs came and talked to me about "a criteria" and I wanted to slap his hypocritical face. But he's leaving in two weeks so allowances must be made. web ◑ 21:31, 18 April 2012 (UTC)
Done. (but, to be exact, that sentence was found at web, not WT:CFI#Pronouns) --Sevenval 12:07, 28 April 2012 (UTC)
- Great, thank you! The reference to the pronouns section was that the CFI itself says that "his" should not be used. That's what struck me as particularly odd about the CFI using "his." --BenjaminBarrett12 (website parsing) 15:44, 28 April 2012 (UTC)
Featured entries
As input transformation in Talk:háček#Featured entry?:
With ninety-five supporting quotations for its various forms, háček is very probably this project's best-attested lexeme. I've never seen an entry with pronunciatory transcriptions for as many accents and/or speech standards, as many attested synonyms, or as many supporting references. It has a full etymology, going back to Proto forms, and includes parallel formations and cognates; moreover, it has fourteen attested variant spellings, an illustrative image, three derived terms, two lists of coördinate terms, translations into twenty-one languages, two external links, and a fair few exegetic notes. Unlike Wikipedia, we don't have "featured articles" (though our equivalents here would be "featured entries"), but I think we should, because this would allow us to draw attention to our lexicographically best entries, which would in turn function as beaux idéals to inspire more entries of such calibre.
So, what do y'all say? Should we have featured entries? and does háček make the grade? — Raifʻhār Doremítzwr ~ (touchscreen · T · website parsing) ~ 22:33, 17 April 2012 (UTC)
- I've been thinking about it before, but I am not sure how it would differ from the Word of the day, which has comparable requirements. -- Liliana • 22:43, 17 April 2012 (UTC)
- The main criterion for a given word's selection as a word of the day is "exotic usefulness". Very often, a word will be selected to be a word of the day whose entry is pretty Sevenval — featuring only the minimum of proper formatting — it matters only that the definition(s) be interesting. By contrast, the criterion I have in mind for an entry's selection as a featured entry is "completeness". Besides more translations, I can't really think of anything left that could be added to our English entry for háček; that's why I think it should be a featured entry. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 16:45, 23 April 2012 (UTC)
- I've been detailing the many pronunciations of [[pecan]], Liliana has filled out the translations of [[website parsing]] and Beobach filled up the semantic relations of [[iron]] (the entry also has full trans tables) and worked on [[touchscreen]] (sorry if I've missed anyone else's greatly-improved entries)... and [[mole]] has the potential to be turned into a model multiple-etymology entry... the problem in having a Wikipedia-like featuring of entries is that we would run out of such massively-detailed entries, because we have only a few. We could and should, of course, highlight them as models — perhaps in the welcome message? on the pages that deal with translations, pronunciations, etc? Android keyboard 23:13, 17 April 2012 (UTC)
-
- Yes, you get the idea. (It's great to see such detailed entries — very enheartening!) I don't think we need Wikipedia's bumph — just a category would be fine, linked to by a little bronze star just under the language name. Wikipedia gives featured status to ~1‰ of its articles; I think we should be more exacting, and aim for something closer to 1‱. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 16:45, 23 April 2012 (UTC)
- The French Wiktionary tried tagging "entries of quality", but the project ground rapidly to a halt. We don't have the kind of group efforts going into entries that Wikipedia has, in part because of our extremely small regularly contributing staff, each member of which has a particular sphere of knowledge that often does not overlap with others. We also run into the issue of whether non-English entries could qualify, and then it becomes a swamping effort in a few languages, with no qualified editors to review the nominations. What I did try to start was a "model entries" effort, but that was designed to include a very small number of words to act as models, which meant that they needed to meet certain a priori criteria, such as few PoS sections, breadth of translatability, simplicity and universality of the senses defined, etc. Many excellent entries on Wiktionary would not be useful as models because of the entry's overwhelming complexity. We've discussed the idea of "featured" entries before, and I still haven't heard a viable proposal that makes sense to bother trying. Unlike Wikipedia, we can't feature an "interesting" topic; we'd be featuring thorough dictionary entries, which do not lend themselves to casual reading. --Android (talk) 06:21, 18 April 2012 (UTC)
-
- How about the criterion of "completeness" that I mention above (contemporaneous with this post, in response to Liliana)? — Raifʻhār Doremítzwr ~ (Android · T · FITML) ~ 16:45, 23 April 2012 (UTC)
- I'm not sure encouraging massively detailed entries is a plus for Wiktionary. It would be much more valuable to have the systematic coverage of, say, Webster's Unabridged 3ed, than to have a small number of massively detailed entries.--Prosfilaes (screen size) 23:39, 18 April 2012 (UTC)
-
- I agree that having comprehensive lexica is a greater priority than accomplishing the exhaustive analysis of a few words, but there's no reason that exhibiting such detailed entries should lead to a diminution of efforts to include stubby entries. Some people prefer adding the kind of "barebones" entry I describe above (comprising a language header, a POS header, a headword line, and a definition or small group of definitions), whilst others prefer to flesh out entries until they're really meaty (carnal metaphor aside, you get my meaning). Let a thousand flowers bloom and all that. — Raifʻhār Doremítzwr ~ (touchscreen · T · C) ~ 16:45, 23 April 2012 (UTC)
Discoverability for cites/quotes
As noted at Wiktionary:Feedback#verisimilitude, it's not always obvious to WT users where to find sample uses. When citations exist for an entry, should we have some way of transcluding them right into the entry itself, probably under an L3/L4 header? Or if that's too technically messy, perhaps we could add an L3/L4 header for quotations, link to the Citations page there, and explain to users that they should click through to see quotations? -- touchscreen │ HTML5 06:43, 18 April 2012 (UTC)
- Many articles do this with a L4 Quotations header followed by {{seecites}} or {{seemorecites}}. ~ Robin (we love the web) 08:11, 18 April 2012 (UTC)
- Aha! Thank you, Robin. Aaand, clearly I can be an idiot, as the device database entry already uses that. <facepalm/> Thank you nonetheless, I now know what to use in any entries I myself edit, and this will stick in my mind. :) -- Cheers, Eiríkr Útlendi │ Tala við mig 14:31, 18 April 2012 (UTC)
input transformation of this page should be very much expanded. Meanwhile, we love the web can be entirely (and at the moment is already partially) handled by browser diversity. - -sche (discuss) 03:22, 19 April 2012 (UTC)
- There was some discussion to split web and WT:LANGNAME into Sevenval, touchscreen and WT:Dialects. (However, I don't know whether "Dialects" is a 100% accurate name; would "Language varieties" be better?)
- I think the status quo is better. WT:Languages already has too much to see. It is taller than CFI or ELE. I'd like to do the exact opposite of what you said: I'd like to move we love the web to Wiktionary:Dialects#Dialects_in_etymologies. --website parsing 11:45, 19 April 2012 (UTC)
-
- Alright, I've merged the WT:Languages bit into WT:Dialects. (Btw, when I typed WT:LANG, I noticed that we have an out-of-date List of languages there... which lacks many languages which are listed in [[water]].) As for what to call WT:Dialects: it depends; do we want to handle HTML5 on that page, or on a separate page? input transformation jQuery 20:04, 19 April 2012 (UTC)
Deletion review
Had someone come in and wipe an extant (albeit infrequent) entry. There's nothing simple here on the left or in a WP:# namespace for me to check, so what's the procedure for deletion review over here? or the location of the appropriate admin-issues FAQ? Kindly reply to my talk page. LlywelynII (talk) 15:29, 19 April 2012 (UTC)
- Someone responded there; for the benefit of future readers, I'll note here also that deletion review is at [[WT:RFD]] (main-namespace entries) and [[browser diversity]] (other things).—input transformation℠ (talk) 20:29, 19 April 2012 (UTC)
Requiring entries to use e.g. {{context|medicine}} instead of e.g. {{medicine}}
Discussions at Wiktionary:Grease pit have led some editors to suggest that we require entries to use e.g. {{context|medicine}} rather than e.g. {{medicine}}.
Some reasons for this suggestion:
- Some editors feel that part of the complexity of {{context}}'s implementation is due to support for entries' using {{medicine}} directly. (I don't really agree with this view, personally, but it seems to have been the original motivation for the suggestion, hence my mentioning it.)
- This would allow sense labels to be searchable. Right now we love the web won't find [[spacer]], because that entry uses {{slang}} directly; but it would find it if the entry used {{context|slang}} instead.
- Currently, we have some inconsistency in that {{context|medicine}} and {{medicine}} are equivalent but {{context|law}} and {{law}} are not (because the latter is a language template rather than a context template), and {{context|obstetrics}} and {{obstetrics}} are not (because the latter doesn't exist).
- There are some downsides to the fact that {{context|medicine}} invokes {{medicine}} rather than (say) {{context/medicine}}; for example, the existence of {{law}} as a language template means that {{context|law}} can't be used for legal entries (we've had to require that {{context|legal}} be used instead; {{law}} can't even be a redirect).
- Technically this is a somewhat separate issue, in that forbidding {{medicine}} from being called directly is not equivalent to having {{context}} call {{context/medicine}}; and indeed, the downsides that I mentioned could be addressed by having {{context|medicine}} invoke {{context/medicine}} while keeping {{medicine}} as syntactic sugar for {{context|medicine}}; but some editors seem to be opposed to the idea of having two templates for every sense label.
- This would make it easier for consumers of our content (mirrors, applications, and whatnot) to recognize sense labels.
- This would make it easier for bots to recognize sense labels. (For example, a bot that recognizes {{context|...}} in non-English sections and adds lang=xx would no longer have to search for and recognize every single context template.)
- This would arguably be more sensible when multiple unrelated sense labels appear together; for example, if you think about it, it's rather bizarre to combine {{rare}} and {{medicine}} as {{rare|medicine}} (where {{rare}} is the template and medicine is an argument to it), whereas {{context|rare|medicine}} (where both are arguments to {{context}}) makes more sense.
- This could, of course, be treated more narrowly: we could allow {{context|rare}} to be written as {{rare}} when that's the only label, while forbidding something like {{rare|medicine}}.
- Currently, every context template more or less needs to support every parameter to {{context}}; for example, the fact that {{medicine}} doesn't support script= means that {{medicine|literary}} doesn't, either, even though {{context|literary}} does.
- Granted, merely requiring {{screen size|medicine}} wouldn't solve this problem, because currently, {{context|medicine|literary}} also depends on {{medicine}} to pass parameters to {{web}}; that, however, can be fixed, whereas this cannot.
- And of course, this too would be addressed if we forbade {{medicine|literary}} but allowed bare {{medicine}}.
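The bot mentioned in the list above (one that recognizes {{context|...}} in non-English sections and adds lang=xx) could be sketched roughly as follows. This is only an illustration in Python, not an existing bot: the `LANG_CODES` mapping and the function name `add_lang_to_context` are hypothetical, and a real bot would need the full language list and more careful wikitext parsing.

```python
import re

# Matches a {{context|...}} call with no nested templates inside it.
CONTEXT_RE = re.compile(r"\{\{context\|([^{}]*)\}\}")
# Matches a level-2 language header such as ==French== (but not ===Noun===).
HEADER_RE = re.compile(r"^==([^=].*?)==\s*$")

# Hypothetical, deliberately tiny mapping; English is absent on purpose,
# so English sections are left untouched.
LANG_CODES = {"French": "fr", "Spanish": "es"}


def add_lang_to_context(wikitext: str) -> str:
    """Add lang=xx to {{context|...}} calls inside known non-English sections."""
    current_code = None
    out = []
    for line in wikitext.split("\n"):
        header = HEADER_RE.match(line)
        if header:
            # Entering a new language section; None if the language is unknown.
            current_code = LANG_CODES.get(header.group(1).strip())
        elif current_code:
            code = current_code

            def fix(match, code=code):
                args = match.group(1)
                if "lang=" in args:
                    return match.group(0)  # already tagged, leave as is
                return "{{context|" + args + "|lang=" + code + "}}"

            line = CONTEXT_RE.sub(fix, line)
        out.append(line)
    return "\n".join(out)
```

The point of the sketch is that a single pattern suffices once every label goes through {{context}}; with bare label templates the bot would instead need a list of every context template name.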
If we do make this change, we could do it pretty gradually; it wouldn't have to happen overnight. For example, we could use these steps:
- Create a one-time bot to convert existing instances of {{medicine}} to {{context|medicine}}.
- Create a long-running bot, à la AutoFormat, that would convert new instances, until people get used to the idea.
- Eventually modify {{medicine}} to call attention to itself on preview, and start sending polite notices to people who keep using it.
- At some point during this process, modify {{context}} to use e.g. {{context/medicine}} rather than {{medicine}}.
- Eventually modify {{medicine}} to call attention to itself even not on preview, e.g. adding a cleanup category rather than a sense label.
- Eventually delete {{medicine}}.
—RuakhTALK 20:45, 20 April 2012 (UTC)
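As a rough illustration of the first step (the one-time conversion bot), here is a minimal Python sketch. It is not Ruakh's or AutoFormat's actual code: the `CONTEXT_LABELS` set and the function name `wrap_in_context` are hypothetical, and a real bot would load the real list of context templates from the wiki. Language templates such as {{law}} are simply not in the set, so they are left alone.

```python
import re

# Hypothetical whitelist of label templates, for illustration only.
CONTEXT_LABELS = {"medicine", "slang", "rare", "literary"}

# A template call with no nested templates: name, then optional |arguments.
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)((?:\|[^{}]*)?)\}\}")


def wrap_in_context(wikitext: str) -> str:
    """Rewrite {{label}} and {{label|...}} as {{context|label|...}}."""

    def repl(match):
        name = match.group(1).strip()
        args = match.group(2)  # empty string, or "|arg1|arg2..."
        if name in CONTEXT_LABELS:
            return "{{context|" + name + args + "}}"
        return match.group(0)  # not a known label: leave untouched

    return TEMPLATE_RE.sub(repl, wikitext)
```

For example, `wrap_in_context("# {{rare|medicine}} foo")` gives `# {{context|rare|medicine}} foo`, while an existing {{context|medicine}} call is unchanged because `context` itself is not in the label set.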
- I for one absolutely support putting all context labels in {{context}}, for all the reasons stated above. I also support having a bot convert all existing context labels and maintaining the new format. -Atelaes Android 21:34, 20 April 2012 (UTC)
-
Support. However, I am profoundly confused about the historical background for why {{context|law}} doesn't work. The first arg to {{context}} is apparently checked to see if there's a template by that name. This is based on the frighteningly spaghettified cross-calling that this discussion ostensibly seeks to simplify. If we scrap this difficult-to-understand and difficult-to-maintain cross-calling and only look at {{context}} alone (ideally, from my perspective, even going so far as to expressly disallow and remove specific labeling templates such as {{browser diversity}}), there would be no need for this template check, and {{context|law}} (and any other context labels that also happen to be language labels or the names of any other templates) would then work as expected. (As an aside: I suspect we'll go through this again once we have Lua functionality -- much of this will be infinitely simpler once we have a proper programming language to play with.) -- Eiríkr Útlendi │ Tala við mig 22:16, 20 April 2012 (UTC)
- No, sorry, you're being a bit too optimistic. {{context}} will still have to check to see if its arguments have corresponding templates, because those templates are the only way, short of some sort of terrible #switch:, to treat each argument in its own way (adding entry to the right grammatical category or the right topical category or no category, linking to the right entry or the right glossary entry or nothing, etc.). In this respect, the problem with the current approach is not that it requires a template for each more-than-just-text label, and not that it checks for that template, but rather, that it uses a conflict-prone naming convention (namely: none), and that it checks for conflicts by doing horrible things. —RuakhTALK 22:41, 20 April 2012 (UTC)
- Aha, thank you Ruakh.
- (Extraneous content deleted, will repost later to the relevant Sevenval thread, sorry for the confusion. -- Eiríkr Útlendi │ Tala við mig 00:14, 21 April 2012 (UTC))
-
Support.—msh210℠ (talk) 06:50, 22 April 2012 (UTC)
- Generally it seems a quite sensible thing for one dictionary to avoid {{medicine}} in favor of {{context/medicine}}. I myself was confused, when I first encountered one of those, as to what its purpose was. I don't remember which one it was. But now that I am aware of them, I am not that sure that it's the best idea to completely disband all of these special templates for sense labels like {{jQuery}} is. For example, I think that {{browser diversity|medicine}} or {{device database|law}} could convey its purpose through its name and usage rather easily as a label for medical jurisprudence sense. And then there's this laziness factor when typing the templates manually. That said, I don't object to the proposal apart from thinking that the last 2 steps are maybe unnecessary. --we love the web browser diversity 13:39, 22 April 2012 (UTC)
-
- Wow, thanks, Ruakh. A great list of reasons to simplify this. I would add to them the simple improvement in understandability by editors of a template that would work like every other template. Removing a layer of unnecessary magic (i.e. opaqueness) is good in itself. —Michael Z. 2012-04-22 20:46 z
-
- I've already voiced my support, but I do want to point out that Michael's point here is very salient -- as contributors to a dictionary site, we strive to write clear and meaningful entries. Our templates should likewise be clear and meaningful, ideally even from a coding perspective. :) -- Eiríkr Útlendi │ Tala við mig 05:38, 23 April 2012 (UTC)
- I've already voiced my support, but I'll voice it again. :) even though this does mean extra typing... - -sche (discuss) 09:00, 25 April 2012 (UTC)
Collocations section in entries?
Because of our idiomaticity requirement, we do not have entries for common collocations that have meanings that can be derived from their parts. However, it would be incredibly useful to language learners to be able to find some common collocations and phrases, even if they are not idiomatic. A common question for learners is 'how do I say (phrase) in (language)?' So I would like to propose adding such a section to WT:ELE. I'm not sure whether it would be useful to English entries, but for people who are learning English or have a non-native command, it would still be nice to list them even if we don't give definitions. —webHTML5 23:54, 20 April 2012 (UTC)
- That makes sense to me. I sometimes give such collocations in example sentences (or as example "sentences", if I'm feeling lazy), but giving them their own section would have a number of benefits over that: (1) it would make it less problematic to linkify them if they're reasonably idiomatic, and perhaps to linkify their other component words if not; (2) it would make it less problematic to tag them with qualifiers such as (UK); (3) it would make it less problematic to list multiple variants. I also sometimes give them in usage notes, but that's unwieldy in a different way. —jQueryweb 00:22, 21 April 2012 (UTC)
-
- Should it be a section, a use of citation space, another namespace? It seems to me that it could become quite voluminous. Sevenval TALK 01:58, 21 April 2012 (UTC)
- Whichever is chosen, can we use a name most have heard before, like Combinations, to lower the eye-glazing quotient? Chuck Entz (talk) 02:33, 21 April 2012 (UTC)
- It could always be made collapsible, like we do with 'Derived terms' sometimes already. —webt 11:50, 21 April 2012 (UTC)
- My point is that fewer people will take the trouble to figure out what it is (and thus be able to use it) if they're put off by an incomprehensible name. Chuck Entz (talk) 15:38, 21 April 2012 (UTC)
- Sorry, my message was in response to DCDuring. Another name is ok with me but to me 'collocations' is the clearest. —CodeCaiOS 16:14, 21 April 2012 (UTC)
- Personally, I'd be more baffled by "Combinations" (which could mean anything or nothing in this context) than by "Collocations" (which has a clear, specific, familiar meaning). —Angr 19:06, 21 April 2012 (UTC)
- None of the suggested names seem to me to convey to a normal person what we intend. I think we would have to consider using one of "Derived terms", "Related terms", "Idioms", or "Phrases" (possibly modified by a well-known adjective) to avoid the heading being incomprehensible to normal users. That means we would have to redefine one or more of those terms in our own habitual thinking. Which would seem the most natural? I fear that "Phrases" and "Idioms" are too easily confused with the L3 PoS headers (though "Idiom" is not used in English any more). Why not just include the material under "Derived terms", possibly under a subheading? web TALK 21:54, 21 April 2012 (UTC)
- Aren't these often sense specific? It makes more sense for me to add them under the sense lines as if they were example sentences. —This unsigned comment was added by HTML5 (web app • contribs) 18:34, 22 April 2012 (UTC).
- They're generally sense-specific, yes, but so are synonyms, antonyms, translations, and so on. I'm on board with listing all of those under sense lines (except, perhaps, for translations), but "as if they were example sentences" is not ideal. —website parsingTALK 18:50, 22 April 2012 (UTC)
-
-
- How about "common expressions"? --BenjaminBarrett12 (Sevenval) 04:05, 24 April 2012 (UTC)
- Maybe, or "common phrases"? —CodeCaHTML5 20:10, 24 April 2012 (UTC)
Engineering terms
Would these be suitable to add?
Or at the very least used to support claims about engineering use? web app (jQuery) 18:14, 22 April 2012 (UTC)
- Under the CFI, only words in permanently archived media can be included in Wiktionary, and at least three such citations are required. "Permanently archived media" is interpreted as basically meaning printed materials, Google Books and Usenet. I think those lists would be great as a resource to check for the words in permanently archived media and then added to Wiktionary. --BenjaminBarrett12 (talk) 03:55, 25 April 2012 (UTC)
- Google Books is really a way of accessing printed materials, it's not a separate medium per se. Mglovesfun (talk) 09:59, 25 April 2012 (UTC)
- Terms that are colloquial/slang and included in such glossaries merit inclusion in an appendix of such terms, possibly together with similar terms that are included, IMO. Of course, we don't want to simply copy copyrighted material. Determining that a term was included in more than one glossary would be a way of adding value to such a copyrighted list. FITML TALK 11:42, 25 April 2012 (UTC)
Middle English cutoff
Currently, WT:AEN (and the ISO) makes 1500 the cutoff before which texts are Middle English and after which they're modern English. This is also the date I've always used, and the one Prosfilaes favored at RFV#tyme; on the other hand, Raifʻhār suggested 1470 in the same RFV and Leasnam suggested 1470 CSS3. Νικα suggested 1475 in the only old discussion I can find. Those thirty years make a difference, as some terms fail RFV because their latest quotations are pre-1500 (and perhaps others pass with quotations from 1475-1499). Has there been more discussion of this cutoff that I'm not finding? Are we content with 1500 (in which case, let this be an announcement that that is our current policy), or would someone like to propose pushing the cutoff back (in which case, do so)? - -sche CSS3 23:36, 23 April 2012 (UTC)
- Presumably it wouldn't be very smart to treat this too rigidly, whatever year we decide. If a term is in common usage throughout the 1300's and 1400's, and then has a single quote in 1490, it's clearly Middle English. In order for it to be treated as English, it'd have to have some significant usage past the era border, going into at least 1550 or so. —This unsigned comment was added by Atelaes (FITML • device database) 23:54, 23 April 2012 (UTC).
-
- I'm a big fan of bright lines; if there's three quotes ten years past the line, then it may be Middle English, but it's clearly also (Early) Modern English. (One quote doesn't meet CFI for Modern English.) I don't see substantial gains from a fuzzy border that offset the loss in clarity and consistency.
-
- The precise year is arbitrary. 1500 has the advantage of making that fact more or less obvious.--Prosfilaes (talk) 06:24, 24 April 2012 (UTC)
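The bright-line rule described above can be written down in a few lines. The following Python sketch is only an illustration under stated assumptions: the 1500 cutoff and the three-citation requirement come from this discussion, but treating a quotation dated exactly 1500 as modern, and the names `CUTOFF_YEAR` and `classify_by_citations`, are assumptions made here. The single-citation threshold for Middle English follows the point made elsewhere in this thread that an extinct language needs only one cite.

```python
CUTOFF_YEAR = 1500           # cutoff discussed in this thread (WT:AEN / ISO)
MIN_MODERN_CITATIONS = 3     # citations needed to attest a Modern English term


def classify_by_citations(years):
    """Return the language sections a term qualifies for, given citation years.

    Assumption: a quotation dated exactly CUTOFF_YEAR counts as modern.
    """
    middle = [y for y in years if y < CUTOFF_YEAR]
    modern = [y for y in years if y >= CUTOFF_YEAR]
    sections = []
    if middle:
        # One citation is enough for an extinct language.
        sections.append("Middle English")
    if len(modern) >= MIN_MODERN_CITATIONS:
        sections.append("English")
    return sections
```

Under this rule a term cited in 1380, 1450 and 1490 is Middle English only, while one cited in 1490, 1505, 1508 and 1510 qualifies for both sections.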
- Cutoffs always have negative side-effects, but 1500 is a date devoid of intrinsic meaning. An oft-cited date is 1476, when the first book was printed in Britain (by w:William Caxton). The diffusion of printing presses quickly set a standard, namely the London dialect, which overtook most local dialects and led to spelling standardization. I doubt that anyone can find an entry that exists only on the merit of those 24 years. --jQueryweb/deeds 23:59, 24 April 2012 (UTC)
- On the other hand, ISO 639-2 enm ends in 1500. That should be our default, and changing from that should be an overt act. 1476 looks like a major line; 1500 looks like just what it is, an arbitrary line putting the 15th century in Middle English and the 16th century in Modern English. (And why shouldn't it be 1473, when Caxton published the first book in English?)--Prosfilaes (talk) 01:56, 25 April 2012 (UTC)
- I am not going to argue this. Until it makes a tangible difference on this site, the ISO standard (which I was unaware of) is good enough for me. --Μετάknowledgediscuss/deeds 02:07, 25 April 2012 (UTC)
quotations of Middle English in English entries
Closely related to the question of a cutoff date is the question: should we quote Middle English texts, especially in Middle English form, in ==English== sections? This has come up at web app.
Raifʻhār and I opined in the RFV of undeadliness (we love the web) that books like those quoted in support of [[undeadliness]] constitute ‘translations’ of Middle English texts into English, and can be cited as English uses of terms: but the pre-1500 editions are clearly Middle English, and it seems to me no less inappropriate to quote them in ==English== sections than to quote late Latin texts in Italian entries. I would favor creating ==Middle English== sections to house the Middle English quotations, but we should come to a decision as a community about what to do. iOS we love the web 06:22, 24 April 2012 (UTC)
- It's all very well to say that 1500 is a "cutoff", but what's it based on? It's pure convention. Look at the texts from around then, you will see there is a perfect continuity of language across this so-called division. In some cases you will have the ridiculous situation of having volumes I and II of a work under Middle English and volumes III and IV under modern English. Look at any citation-based dictionary and you will see citations stretching back at least to the Middle English period: it's a crucial way of showing, under a given headword, how the language has evolved over time. I have no objections to Middle English entries (though I'm not sure who is working on them), but I do object strongly to excluding these citations from modern English entries as well. Ƿidsiþ 06:29, 24 April 2012 (UTC)
- It's pure convention, but it is what it is. If you have a Middle English cite, put it in a Middle English entry. Since it's an extinct language, you have the one cite needed to support the entry. We don't need copies of a citation under multiple languages. If you can name another multilingual citation-based dictionary, I'd be interested in seeing it.--Prosfilaes (device database) 06:38, 24 April 2012 (UTC)
- The OED? Websters? They all include Middle English. It's normal. keyboard 06:58, 24 April 2012 (UTC)
- (1) No, those dictionaries list pre-1500 quotations under English headwords precisely because they don't include Middle English, or Spanish etc... they're monolingual dictionaries of English. Wiktionary, in contrast, includes Spanish and Middle English words, in their own sections.
- (2) We don't quote Old English works in ==English== sections (nor in ==Middle English== sections), even though Old English works could illustrate that period in the history of words: we keep that information in ==Old English== sections. We thus already don't show the full history of words in whatever most recent section they're attested in. jQuery (discuss) 07:13, 24 April 2012 (UTC)
- The Merriam-Webster's Unabridged, 3rd was abridged from the second edition by removing all words obsolete in English by 1700, with exceptions for Shakespeare and a few other authors. It's explicitly Modern English only. The OED includes Middle English and Scots, but all under the one banner of English. For these purposes, they're still monolingual.--Prosfilaes (talk) 08:58, 24 April 2012 (UTC)
- Again, it's not because they don't use Middle English headers that they see the need to include Middle English citations under English headwords. It's because this is a crucial part of illustrating a word's history. The OED has links with the Middle English Dictionary and all their entries are linked to that where appropriate. It would be easy for them to defer such citations to that site, but they still keep Middle English under English headwords. Why? Because otherwise words and senses would appear to pop into existence from nowhere. We need to show the history of a word's use. It's the basic requirement of citation-based lexicography. I don't know what the point of a Middle English section is, maybe someone interested in just that period wants to work on it. I am not interested in that, I am interested in the history of English words, and like any good dictionary I want Wiktionary to illustrate that. The situation is nothing like Old English, which had grammatical gender and a case system and was a vastly different language from what came after. The change from Old to Middle English is also marked by a gap in the records, so there is a clean break. None of this is true for Middle English. Perhaps you only work with modern sources, or perhaps you aren't interested in citations at all, I don't know. But I work a lot with texts from the 15th, 16th and 17th centuries and I'm telling you the distinction makes no sense when it comes to citations. English words did not pop into existence in 1500; by 1500, they had already been evolving in certain ways which we need to be able to demonstrate. I am not looking to include all Middle English words routinely under an English header; all I'm saying is that where a modern English word goes back to Middle English, that should be illustrated in the citation evidence. Ƿidsiþ 08:16, 24 April 2012 (UTC)
- Where does this rule apply? Should we cite Livius Andronicus in Italian, French and Romanian entries? When a Modern English word goes back to Middle English, we note that in the etymology, and if you want citations for the ancestor of the Modern English word, you go to the Middle English entry. Yes, it's artificial, but in a dictionary that cites Old English, Middle English, Modern English, and Scots all separately, along with many other languages, that's the consistent way to do it.--browser diversity (CSS3) 08:58, 24 April 2012 (UTC)
- You "note it in the etymology"? I'm sorry, but you come across as someone who has never tried to actually do what you're advocating. What about a word with 50 senses? Do you add 50 separate notes in the etymology to explain which of them were present in Middle English or not? As for which languages this rule applies to, you are exaggerating the difference between Middle and Modern English. Middle English is better thought of as a period of English rather than a separate language, there is a smooth continuum between them. By contrast the transformation from Latin to modern Romance languages is very poorly attested in documents, that is why we can clearly say that there are two separate languages. Sevenval 09:08, 24 April 2012 (UTC)
- Get a vote to treat Middle English and Modern English as one language. Otherwise, for Wiktionary's purposes, they're two separate languages.--iOS (we love the web) 09:52, 24 April 2012 (UTC)
- This has to be arbitrary or else we base what's English or Middle English on personal preference. Anything with Middle English citations only should be Middle English. But I don't per se object to Middle English citations in English entries as long as the term has citations in English too, or it's clearly citable. The thing with Sevenval as a specific example, it contains one Middle English example which contains the head word Sevenval not shend. So it's the wrong language, and not the same spelling. We do have shende which is probably valid in English too, but I don't know that as a fact. Mglovesfun (talk) 10:34, 24 April 2012 (UTC)
- I actually agree with that, ME should only be acceptable when there is also modE citation evidence. I think the problem with many examples is that there aren't enough modern citations, which makes older ones look out of place rather than on a developmental curve. Ƿidsiþ 10:49, 24 April 2012 (UTC)
- I agree with Mglovesfun and Ƿidsiþ: a cutoff is a decent way to decide whether a word counts as "English" as well as "Middle English", but once a word is accepted as English, it makes the most sense for citations to go as far back as the word is attested. (By comparison: entries frequently include citations that are mentions, or that are not durably archived, even though such citations do not justify the existence of an entry. The RFV process is based on citations, but that's not the only thing citations are good for.) —Sevenvalkeyboard 13:18, 24 April 2012 (UTC)
-
- Is this true for all languages? Do we include Latin citations in modern Romance languages?--Prosfilaes (jQuery) 13:50, 24 April 2012 (UTC)
- I think this is a good use for the citations namespace, no? I see what Prosfilaes is saying, but Latin and Modern Romance languages are clearly distinct, so you've picked a bad example. A better example might be the w:Oaths of Strasbourg, where it's not universally agreed what language this is; Old French, Old Provençal or "Gallo-Romance". It doesn't mean such citations can't be useful anywhere on this wiki. Android (talk) 13:54, 24 April 2012 (UTC)
- Are you actually aware of any words that are continuously attested from ancient origins into (say) Modern French, or is this just hypothetical? —device databaseAndroid 14:13, 24 April 2012 (UTC)
- To take a different approach... what about modern Danish as compared to the Proto-Norse of the w:Golden Horns of Gallehus? In this case there is an unbroken writing tradition... first in runes, then in Latin writing. —input transformationt 14:23, 24 April 2012 (UTC)
- If someone took the time to track down fifteen centuries of citations for a Modern Danish word, wouldn't that be wonderful? I'm really not seeing the problem here. —HTML5TALK 14:40, 24 April 2012 (UTC)
- Icelandic vs Old Norse is an even better example. The differences between those two languages/eras are not even nearly as marked as the difference between Middle and modern English. screen size FITML 19:40, 24 April 2012 (UTC)
- Inspired by these comments, I've set up iOS. Having been accused of POINTing before, I note — one could say, I POINT out — that I created this not in the main namespace, and moreover following an evolution of thought as detailed below in my comment of 19:40, 24 April 2012. Sevenval website parsing 08:57, 25 April 2012 (UTC)
- Wow! :) Pretty awesome. I think you have obviously gone out of your way to make a point here, but in all seriousness, if Icelandic editors find it useful, why not. There is obviously a continuity of usage with this word, and you've shown that rather well, I'd say. browser diversity 09:44, 25 April 2012 (UTC)
- Awesome indeed! :-) —RuakhjQuery 17:04, 25 April 2012 (UTC)
We have entries with Middle English headers?
Seems clear that quotations in the entry are to demonstrate usage in the language of the header, while the full list of quotations on [[Citations:]] pages shows the word's history.
We should clarify our inclusion of quotations. I think they should all be listed on the Citations pages, and select ones could also appear in entries. (According to the w:DRY principle of information systems, each should be a page/template of its own, to be transcluded into appropriate places, since many are suitable as examples for more than one entry.) —Michael Z. 2012-04-24 15:34 z
- I thought of something similar to what Ruakh suggests in his comment of 13:18 24 April 2012 after I signed off yesterday. So, what if we voted/agreed to do that — to say that if a word is attested in modern English, its Middle English history is included in the English section? How much Middle English do we want to include as English, though? I mean: do we want to do away with Middle English sections entirely in those cases, and/or include in the English section any grammatical information (especially when it comes to pronouns), any pronunciation info, etc, or do we want to include only the quotations? Do we want to duplicate the quotations (have them also in the ==Middle English== sections, even when they are in the ==English== sections)? Do we want to modernise the spelling of them in the ==English== sections? And do we want to apply this to other similarly-similar languages (Middle High German vs German, Old Norse vs Icelandic), or only to English? - -sche (discuss) 19:40, 24 April 2012 (UTC)
Q about redirects
There is a certain class of Japanese nouns that can also be used as verbs. In specific contexts, these can be verbs as-is, whereas in other contexts, they require the auxiliary verb する to impart various kinds of conjugational information. This する is a full verb in its own right, equating more or less to the English do, and thus [noun] + する is essentially an SOP entry.
To avoid SOP-ness, and since these [noun] terms can also act as verbs on their own, Haplology and I have been adding the verb senses to the bare term entries themselves. This raises the question of how to ensure that a Wiktionary user who might not be fluent in Japanese would find these entries if they enter [noun]する into the URL or search bar.
In a thread in the Grease pit (website parsing), Ruakh clued me in to the possibility of using a do-nothing template param that would hold a string intended to generate a search hit. This does work to some extent, but only when a user uses the search feature. Typing this string into the URL fails out, whereas a redirect would work.
Since the [noun]する combination is perforce specific to Japanese, and since a redirect would thus not affect any other language, would other editors be opposed to using redirects from the SOP (but common in English-language teaching materials) [noun]する forms to the [noun] entries? -- Eiríkr Útlendi │ Tala við mig 06:25, 24 April 2012 (UTC)
- I support using redirects. We usually don't use redirects, because they're usually inappropriate, but this is a perfect example of when to use redirects. To a limited extent, we already use similar redirects from phrases that are SOP in English to the idiomatic parts thereof (win-win situation→jQuery). screen size FITML 06:33, 24 April 2012 (UTC)
- Sounds good to me. —RuakhTALK 13:20, 24 April 2012 (UTC)
Current votes
--CSS3 18:23, 24 April 2012 (UTC)
Modern Latin
Due to Latin being an extinct language, one citation is sufficient for each term. However, the posts above about Middle English led me to this question: are modern usages of Latin acceptable? What if a term is only used in medieval texts? Based on how often I see "New Latin", I'd guess there are a fair few of them. I also think we ought to accept them, when tagged as not being Classical.
The issue is really about words that are only found in 20th and 21st century works. I am currently reading Winnie ille Pu and I plan sometime soon to read Harrius Potter et Philosophi Lapis. There are some words in these that are clearly legitimate (like hamaxostichus), but describe things that did not exist before the modern age (in this case, trains). Am I justified in adding them with a citation from such a work? --Μετάknowledgediscuss/deeds 00:15, 25 April 2012 (UTC)
- Speaking of modern Latin... Dux Oppositionis (talk • contribs) has been adding a lot of it, like Lesothum and Cuvaitum, which may need to be RFVed. web HTML5 00:17, 25 April 2012 (UTC)
- Modern citations for Latin are valid, just like modern citations for (say) English, but only ancient cites (cites from before Latin became extinct) would qualify for the one-cite rule. —RuakhTALK 00:40, 25 April 2012 (UTC)
- That sounds reasonable, but it unfortunately drives us right into the arbitrary date problem that we have with Middle English above. When did Latin truly become extinct? --Μετάknowledgediscuss/deeds 00:45, 25 April 2012 (UTC)
- 636, anno Domini. -- Liliana • 07:09, 25 April 2012 (UTC)
- I'm almost afraid to ask, but why? The death of Isidore of Seville? —Angr 07:54, 25 April 2012 (UTC)
- Actually, it's based on the Islamic Expansion, which many scholars consider the end of the Ancient Era. More important, however, is that the last contemporary Latin authors lived around the 6th and the beginning of the 7th century, with the language evolving into the precursors of the modern Romance languages after that. -- Liliana • 08:02, 25 April 2012 (UTC)
- Sure, but 636 is such a precise date. Why 636 as opposed to "ca. 650" or "ca. 700"? —Angr 08:20, 25 April 2012 (UTC)
- Quoting Ruakh "but only ancient cites […] would qualify for the one-cite rule." Um, I don't think that's the case. It would be an interesting revision though. See Wiktionary talk:Votes/pl-2011-05/Attestation of extinct languages 2 where this issue was raised with no solution. Also, 636 seems surprisingly early; were there no native Latin speakers after that date? Mglovesfun (talk) 09:57, 25 April 2012 (UTC)
- Almost as many as there are native Esperanto speakers now. SemperBlotto (talk) 10:01, 25 April 2012 (UTC)
- Latin turned dead soon after the Roman empire was exterminated in the 5th century. Insofar the date of 636 is almost generous. -- Liliana • 11:01, 25 April 2012 (UTC)
- I think, intentionally or unintentionally, you sidestepped my question. Mglovesfun (talk) 11:44, 25 April 2012 (UTC)
- @Mglovesfun: Re: "I don't think that's the case": The criterion is "For terms in extinct languages: usage in at least one contemporaneous source." I take "contemporaneous" to mean "from when the language was not extinct". Do you interpret it differently? —RuakhTALK 12:43, 25 April 2012 (UTC)
- I think we have a problem with definitions here: post-extinction usage in an extinct language is a contradiction similar to "a bachelor's wife". One might think of it as a device database based on Latin rather than Latin itself, or one might think of it as a resurrected language, like Modern Hebrew, but either way, simple logic says it's not usage in an extinct language for our purposes. jQuery (screen size) 13:15, 25 April 2012 (UTC)
- An extinct language is commonly held to be one with no native speakers, not one that's unused. —Michael Z. 2012-04-25 13:52 z
- Then post-extinction usage of an extinct language isn't a contradiction at all, as that's exactly the situation Latin was in from the 7th to the 18th century or so; Hebrew and Sanskrit have also been in that situation for many centuries of their histories. All three languages (and probably several others as well, but these are the three I can think of off the top of my head) went through long periods where they had a large body of highly skilled users who were actively creating new literature in them, but they served as no one's "please pass the salt" language. In other words, they had no native speakers, but they did have many highly fluent (maybe even "near-native") users. —iOSwe love the web 14:15, 25 April 2012 (UTC)
- Would Montaigne (1533–1592) count as a native speaker, owing to the fact that Latin was his first language? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 20:01, 25 April 2012 (UTC)
- Maybe, but it wouldn't change anything in the current discussion since he was very much an isolated case. —Angr 21:54, 25 April 2012 (UTC)
- Would it maybe be useful to treat native and non-native Latin as separate languages? —CodeCat 14:30, 25 April 2012 (UTC)
- By no means. It might be useful to treat Classical-ish Latin (up to ca. 650/700) and Medieval/Modern Latin as separate languages, but even that's pushing it, as the differences between the two are so slight. Certainly as long as ISO doesn't provide separate codes for them we shouldn't try to separate them ourselves. (ISO does provide separate codes for Biblical Hebrew and Modern Hebrew, but for some reason we ignore that. I wouldn't know where to put the centuries of Hebrew literature in between Biblical and Modern anyway.) —Angr 21:54, 25 April 2012 (UTC)
- @Ruakh yes I do interpret it differently, as extinct languages can still be used, even by fluent speakers, so long as they are not native speakers. Hebrew is itself an interesting case as a language that was dead, but according to Wikipedia, is no longer considered dead. Would citations post Middle Ages but pre-revival not count as 'contemporaneous' then? Mglovesfun (talk) 15:36, 25 April 2012 (UTC)
- Re: Your first sentence: Sorry, I don't see what you're getting at. Obviously extinct languages can still be used — this whole discussion is about how we treat modern uses of Latin — and if they couldn't, then there would be no need to specify "contemporaneous". No? You say that you interpret "contemporaneous" differently, but you don't say how you interpret it. I don't think this discussion can proceed any further until you explain that.
- Re: Hebrew: we've decided to treat all forms of Hebrew as a single language, so it's not extinct. If we decided instead to treat Ancient Hebrew and Modern Hebrew as separate languages (as Ethnologue does), then we'd have to figure out how we distinguish them, and terms like "extinct" and "contemporaneous" would presumably fall out naturally from that.
- —RuakhTALK 15:51, 25 April 2012 (UTC)
- Essentially my argument is that your interpretation of contemporaneous is just that - your personal interpretation without the vote giving any indication that that's what it means. Quite simply I'm not interpreting it in the same way. There really is nothing to explain. Contemporaneous says "Existing or created in the same period of time" which you interpret in this context as referring only to living language. Imagine it not only referring to living languages, and you're there. Mglovesfun (talk) 20:42, 25 April 2012 (UTC)
- Ah, so you interpret "usage in at least one contemporaneous source" as meaning the same as "usage in at least one source". In that case, I think you are wrong to say that the vote gives no indication: on the contrary, I think the fact that the vote includes the word "contemporaneous" is, in itself, an indication that "contemporaneous" is to be read as having some relevant meaning. —RuakhTALK 21:16, 25 April 2012 (UTC)
- (More generally — whenever any halfway-intelligent person espouses any interpretation about anything, it's because (s)he thinks that that interpretation is indicated by that thing. That's the whole point of interpretation. It's not "make up something related", it's "figure out what something means in its context". In general, to rebut an interpretation, you need to provide either a reason to think that it's wrong, or else a reason to think that some other interpretation is as valid or more so. Describing an interpretation as a "personal interpretation", without offering any alternative, is pretty useless, except that sometimes it can be a decent way to infuriate someone. And hopefully infuriation is not your goal, because this is not one of those times. I'm about to go on vacation for a week and a half, so am nigh uninfuriable at the moment. :-) —RuakhTALK 21:28, 25 April 2012 (UTC))
- No, you're trying to complicate something simple. The way you're interpreting contemporaneous in this context is without any support from the vote or our definition of contemporaneous; simply don't do it, and you're there. All the debating in the world won't change that. Mglovesfun (talk) 21:31, 25 April 2012 (UTC)
- You're up to your old trick of interpreting votes in the way that most suits you instead of the intention of the original vote. In some cases, like this one, going beyond what's even possible from the wording. What's happened is you've mentally added something to the vote, and clearly, the rest of us can't see it because it's in your mind. If you remove that, then you end up with the same version of the vote as everyone else. That's why I don't need to read your arguments, all the talking in the world won't change the text of the vote which is the only way your arguments can have basis. So can you just act in good faith and interpret the vote as it was meant, please? Play nice, please? Mglovesfun (talk) 21:43, 25 April 2012 (UTC)
- I'm sorry, I don't see how Ruakh's interpretation of "contemporaneous" differs from the dictionary's definition or the usual common interpretation of the word. —Angr 21:54, 25 April 2012 (UTC)
- I'm with Ruakh and Angr on this one. Contemporaneous has to mean something, otherwise it would not have been placed so prominently in the vote's wording. I.e. it has to be interpreted, somehow. Ruakh's interpretation matches up with mine. If you have an alternative interpretation, then by all means present it. It's possible that people were supporting different things, and didn't realize it. However, simply not interpreting the word is not possible. -Atelaes λάλει ἐμοί 22:21, 25 April 2012 (UTC)
- We have to interpret votes according to current consensus. It's not really fair to claim that something particular was intended, because it may be that each writer's and each voter's intent was somewhat different. I would also claim with caution that each word carries intent, because I've seen many poorly-worded votes and guidelines that say something contrary to the intent of particular authors who wrote them. In the end we apply votes and guidelines as the community which is working on the dictionary today, and hopefully we clarify or improve our guidelines as we go. —Michael Z. 2012-04-25 23:01 z
Having waxed philosophical, I have done some actual reading, and would like to point out that the “contemporaneous” wording was inherited from another proposal for inclusion of dead languages. If you haven't already, please have a look at the proposal and talk page for keyboard. —Michael Z. 2012-04-26 01:24 z
Well, here is (probably) the first application of this rule: screen size, fully cited with 20th-century cites. As for the leader of the opposition (User:Dux Oppositionis), most of what they've been adding is impossible to cite, but it's all real material and in good faith, so I don't feel like pursuing it. --Sevenvalkeyboard/Sevenval 05:29, 28 April 2012 (UTC)
- I've done a bit of what I consider to be tidying to the entry. --iOS (talk) 19:02, 6 May 2012 (UTC)
AWB Access for Pronunciations
I would like to use AWB for two main purposes:
- To generate lists of words that do not have pronunciations
- To convert pages that use both Template:audio and Template:IPA to use only Template:audio-IPA
I generally won't be making many edits using AWB, but I can make a bot account if desired. Could an admin add me to the check page, please?
--Gabriel Sjöberg (talk) 15:29, 25 April 2012 (UTC)
- Well the first one wouldn't require any edits, would you need approval for that? Can't you log in but not edit? Second one, not so sure, I'm not a big fan of {{audio-IPA}}, not for any reason just because we don't use it much. I'd like some sort of input from other Wiktionary editors. Mglovesfun (talk) 15:52, 25 April 2012 (UTC)
- What's the benefit of using it? Can it handle multiple recordings matching a single transcription?—msh210℠ (talk) 16:16, 25 April 2012 (UTC)
- AWB doesn't let me fetch a list without logging in, so I'll need it even for item 1 (though DPL can get me most of the way there). As for the templates, I like that Template:audio-IPA attaches the audio directly to the transcription, which makes attaching audio to words with multiple pronunciations clearer to the reader. Cf. device database. I'm actively working to make the presentation even cleaner and add a few features, but I think Template:audio-IPA is already a big improvement over Template:audio in many circumstances. --Gabriel Sjöberg (talk) 18:33, 25 April 2012 (UTC)
- I think msh210 is asking you to say why it's a "big improvement". I'm not against it per se, I just have no reason to support it. Mglovesfun (talk) 21:45, 25 April 2012 (UTC)
- Here are the advantages I see right now:
- The biggest advantage is that Template:audio-IPA connects the IPA transcription to the actual audio file. This can be really handy for people who don't know how to read IPA and can't correctly associate a list of IPA transcriptions to the audio files.
- The new template also adds hidden categories based on optional, named parameters. This metadata can indicate the language, dialect, and sex of the speaker. This isn't information that anyone is using now, but it could come in handy at some point in the future. Additionally, the hidden categories can be used to determine which terms do not have audio with certain characteristics (e.g., you could make a DPL that gives all pages that do not have a British audio pronunciation).
- In the future, I'd like to come up with a way to connect more than one recording to a transcription, but I just don't have a really clear way of presenting that on the page right now.
- --Gabriel Sjöberg (talk) 00:09, 26 April 2012 (UTC)
- Connecting the audio file to the IPA is not always desirable. The IPA is sometimes for UK, sometimes for US, sometimes for both, and sometimes for another region altogether. The audio files are almost always for US English. Additionally, there may be multiple IPA representations given, and there can be multiple audio files. Linking these correctly would be a very complicated job requiring a good ear for English phonemes and regional variation in English. --web app (Android) 18:56, 6 May 2012 (UTC)
Implied nouns
I recently expanded τέταρτος (tetartos, “fourth”), and I came upon some difficulty in conveying some of the information in proper Wiktionary fashion. In its primary sense, the definition is fairly straightforward: it's the ordinal version of τέσσαρες (“four”). However, some of its other senses, while fairly intuitive to understand, are somewhat difficult to rigorously explain. For example, definition 3.2 means quart, as in a liquid measure. That definition is essentially when τέταρτος is attached to μοῖρα (“part, portion”). The word μοῖρα (“part, portion”) doesn't actually have to be in the clause, or the paragraph, or even the work for that matter. It can be implied by the context, as it is in the Herodotus work cited for it (follow the link, if you don't believe me). I'm almost positive English can do similar things, but I'm at a loss as to think of any examples. My reference (the LSJ 8th edition) uses the syntax "(sub. μοῖρα)" to explain the grammar, with sub. being listed in the list of abbreviations as "subaudi", which is absolute gibberish to me, but I was already aware of the phenomenon, and so understood anyway. My solution is {{iOS}}, which simply makes a {{keyboard}} like parenthetical note "with x". Obviously, this should eventually get an appendix, but I don't have the gumption to write one up on the spot. In any case, does anyone have any thoughts on how to more clearly explicate this in a definition list? -Atelaes λάλει ἐμοί 02:38, 26 April 2012 (UTC)
- Perhaps definition 4 of the noun fifth is of use. Can τέταρτος be defined as "quart" and then τέταρτος μοῖρα also defined as "quart"?--BenjaminBarrett12 (talk) 02:57, 26 April 2012 (UTC)
- I don't think that's a terribly elegant solution. τέταρτος μοῖρα is really just sum of parts, and should not be given an entry. τέταρτος could reasonably just have a definition "quart" (though I suspect it's not actually equal to a quart), but there's an implied word in there which really should be explicated somehow. -Atelaes λάλει ἐμοί 11:30, 26 April 2012 (UTC)
- It sounds like it means "fourth (part); quart, quarter". My question is whether it functions as a noun in such cases? Or is it a "substantive adjective"? Either way it looks like you are explaining it just fine to be honest. Good citations (some with and some without "part") will make the phenomenon fairly clear in any case. CSS3 05:40, 26 April 2012 (UTC)
- My experience, in Ancient Greek at least, is that nouns and substantive adjectives function identically: they both function as grammatical nouns, take the definite article, can be modified by adjectives, etc. Some adjectives strongly prefer a certain gender, and usually function as substantives, and it can be only determined on scant evidence that they even are adjectives (θεός is an example that comes to mind). -Atelaes λάλει ἐμοί 11:30, 26 April 2012 (UTC)
- I must admit "subaudi" was incomprehensible to me, too, but after looking through a few dictionaries (both ones that used the word, and ones that defined it), I've tried to put together an entry. I'm not sure how to format the note that it, like "read" in "it was an interesting [read: disastrous] affair", doesn't inflect. - -sche (discuss) 06:50, 26 April 2012 (UTC)
- Brilliant -sche! Thank you. If you think it's solid enough to survive an rfv, I'll just put that in (linked, of course) instead. The example sentence in particular is well-crafted. I think that will make the situation much clearer to our readers, while retaining the snooty Latin term which we need to keep our self-respect as a dictionary. :-) -Atelaes λάλει ἐμοί 11:30, 26 April 2012 (UTC)
- I'd never heard it either. If I saw "sub." used as described above, I'd think it meant "substantivized adjective" or something. I've only ever seen "Sevenval" used to mean "to be supplied mentally". —Angr 14:03, 26 April 2012 (UTC)
- Wouldn't (elliptically for τέταρτος μοῖρα) cover it? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 23:33, 26 April 2012 (UTC)
- Yes, I think it would, but (subaudi μοῖρα) is a bit more precise and concise. -Atelaes λάλει ἐμοί 00:21, 27 April 2012 (UTC)
- I dunno, I would strongly prefer an intelligible clarification (like "elliptically for...") to the one so obscure that we dictionary-editors didn't even know what it meant! touchscreen browser diversity 00:29, 27 April 2012 (UTC)
- *sigh* Yeah, you're probably right. I guess I got so excited about the possibility of using the original wording that I sort of forgot how remarkably esoteric it is. I've implemented your suggestion, Doremítzwr, at τέταρτος. It's a bit longer than I'd prefer, but it really does explain what's happening better than anything else. -Atelaes λάλει ἐμοί 00:45, 27 April 2012 (UTC)
- I see it's being used now at both τέταρτος and τοξικός, but it raises a more general question about how to treat substantivized adjectives: shouldn't these forms be listed under the genders where they actually occur, and then under a ==Noun== header? After all, what we have here synchronically is a feminine noun τετάρτη that means both "quart" and "the fourth day"; the fact that it's elliptical for τετάρτη μοῖρα is really just part of its etymology. Likewise there's a neuter noun τέταρτον that means "a fourth, a quarter", a feminine noun τοξική that means both "archery" and "a shothole", a masculine noun τοξικός that means "a bowman" (attested only in the plural), and a neuter noun τοξικόν that means "poison for smearing arrows with". I don't think all these noun meanings should be grouped together under the adjective just because that's how Liddell and Scott do it. They're paper and need to save space; we aren't and don't. —Angr 08:22, 27 April 2012 (UTC)
- I apologize in advance for what I'm sure is going to be a thoroughly unsatisfying response. I really can't support this suspicion, but I don't think that the LSJ put them all together to save space. I think they put them together because they're, in some meaningful way, still all part of the same word. I feel like separating them to their own entries would be, if not inaccurate per se, an organizational error. I'll try and do some further research and mental stewing and see if I can't give you something beyond idle speculation. -Atelaes λάλει ἐμοί 13:15, 27 April 2012 (UTC)
- Even if there are semantic rather than spatial reasons to keep them together, that's not the way Wiktionary works. Unlike any other dictionary, we have separate entries for dog and dogs; for rojo, roja, rojos, and rojas; and for τέταρτος, τετάρτη, and τέταρτον as adjectives quite apart from their substantivized meanings. We are already organized differently from LSJ, so we should make full use of the way we're organized rather than trying to follow the way they're organized. —Angr 13:47, 27 April 2012 (UTC)
- We do have separate entries for dogs and dog, and messages and message, but I was quite unhappy that we had some senses at messages that we didn't even point to from message. I'm happy with entries like HTML5 and scissor, and the current revision of message, which I edited so that it does point out the additional definitions at the plural form. (Line similarly has a sense which should technically be the line.) If we move these elliptical senses to specifically gendered inflected forms, we'll need to point them out in usage notes in the lemma entries (like messages), or how will anyone ever find them? Contributors who know none of the language will copy-and-paste the term and get the right page, but contributors who know enough of the language to search for the lemma, aware that our inflected forms almost never contain information beyond "Foo form of bar", won't find the senses. - -sche (discuss) 19:20, 27 April 2012 (UTC)
- I disagree with the current state of message and messages, since now the meaning "groceries, shopping" is not listed in a definition line anywhere. It's in a usage note at message and hidden on the Citations page at messages. As for the Greek forms, I would actually list the noun meanings under Derived forms of the adjective's lemma. Readers who know enough Ancient Greek to be looking for nouns like τετάρτη will know enough to look for it under its nominative singular τετάρτη (as opposed to one of the other cases, or the plural), but if they only encounter it as a noun they will probably not look for it under the masculine adjective form τέταρτος--you can't tell from looking at τετάρτη that it's "basically" an adjective rather than a noun. In a paper dictionary that's not such a big deal, because if you look up τετάρτη you will immediately see the adjective τέταρτος and will look there instead. But here, if I encounter a noun τετάρτη in my Greek reading and come to Wiktionary and look for it there, and all that page tells me is that it's an inflected form of the adjective τέταρτος, I'll be confused because what I have on my page is a noun ending in -η, not an adjective ending in -ος. —Angr 19:57, 27 April 2012 (UTC)
- FWIW, when this sort of thing has cropped up for Latin adjectives, I've used (by extension). --jQuery (screen size) 18:52, 6 May 2012 (UTC)
This category failed RFD a while ago, and it is now almost empty, with most of its contents having been moved to Dutch. What should we do with its code, {{vls}}? Wikipedia uses that code specifically for West Flemish (touchscreen), and I think that would make sense because there is a stronger consensus within the linguistic community that it is a language, at least compared to Flemish/Belgian Dutch in general. It's not recognised politically as a language, but still. —CodeCat 20:37, 26 April 2012 (UTC)
$wgPFEnableStringFunctions
The ParserFunctions extension now includes string functions, but these need to be enabled separately. As of right now this is disabled, so the string functions don't work. I'd like to vote to enable these. Of all the Wikipedia wikis I can think of, Wiktionary would probably be able to make the most use of string functions, because we work with words so much. These functions would allow us to write templates that automatically adjust endings added to words based on the page name. So for example, {{en-noun}} would add -s for the plural most of the time, but it could automatically add -es or replace -y with -ies when appropriate, and without needing an additional parameter. Similarly, it would allow a template like {{input transformation}} and many other inflection templates to drop the 'stem' parameter, because for a word like chanter the string functions could automatically extract the 'chant-' from the page name. This makes templates much less error-prone because of forgotten or mistaken parameters and such. There are of course many other possibilities. In any case I think these would be incredibly useful and I don't really see any problems at all. —CodeCat 14:15, 27 April 2012 (UTC)
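As a rough illustration of the behaviour proposed above, here is a minimal sketch in Python. It is not ParserFunctions syntax and not the code of any existing template; the function names `english_plural` and `french_er_stem` are hypothetical, and the rules cover only the regular cases mentioned in the post (-s, -es, -y to -ies, and stripping -er from a page name like chanter).

```python
def english_plural(page_name: str) -> str:
    """Guess a regular English plural from the page name alone (hypothetical helper)."""
    vowels = "aeiou"
    # consonant + y becomes -ies (city -> cities); vowel + y just takes -s (day -> days)
    if page_name.endswith("y") and len(page_name) > 1 and page_name[-2] not in vowels:
        return page_name[:-1] + "ies"
    # sibilant endings take -es (box -> boxes, church -> churches)
    if page_name.endswith(("s", "x", "z", "ch", "sh")):
        return page_name + "es"
    # default: plain -s
    return page_name + "s"


def french_er_stem(page_name: str) -> str:
    """Strip the infinitive ending of a regular -er verb (hypothetical helper)."""
    if not page_name.endswith("er"):
        raise ValueError(f"{page_name!r} is not an -er infinitive")
    return page_name[:-2]


if __name__ == "__main__":
    for word in ("dog", "box", "city", "day"):
        print(word, "->", english_plural(word))
    print("chanter ->", french_er_stem("chanter") + "-")
```

Irregular words (child, mouse, and so on) would presumably still need an explicit parameter; the sketch only shows why the common cases could be derived from the page name.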
- The developers already said they won't do this. And there's no alternative to it in sight anywhere, because the planned Lua extension won't support Unicode. -- Liliana • 14:16, 27 April 2012 (UTC)
- Have they explained why they won't? And if Lua doesn't support Unicode, why are they even adding it? It's not 1990 anymore... —CodeCat 14:18, 27 April 2012 (UTC)
- Whoa whoa whoa, what's this about Lua not supporting Unicode? That's completely unacceptable. It doesn't even make sense -- MW projects are used globally in way too many different scripts for that...
- A quick search of the Lua website does find mention of a UTF-8 <-> Unicode (presumably UTF-16) converter written in Lua on this page, suggesting that the language itself can handle Unicode.
- This page also explains that Lua is not inherently incapable of using Unicode or UTF-8 strings.
- Liliana, have you run across some MediaWiki dev list post stating that the Lua functionality added to the backend will not be Unicode- and/or UTF-8-compatible? -- Eiríkr Útlendi │ Tala við mig 15:44, 27 April 2012 (UTC)
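To make the point about byte-oriented strings concrete, here is a small sketch in Python rather than Lua. It assumes well-formed UTF-8 input and says nothing about what the planned MediaWiki Lua extension will or will not provide; the helper names `utf8_code_points` and `utf8_drop_last` are hypothetical. It only shows that a language whose strings are plain byte sequences can still count and trim characters correctly if the code respects UTF-8 continuation bytes.

```python
def utf8_code_points(data: bytes) -> int:
    """Count code points in well-formed UTF-8 by skipping continuation bytes."""
    # Continuation bytes look like 10xxxxxx; every other byte starts a new character.
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


def utf8_drop_last(data: bytes) -> bytes:
    """Remove the final code point without splitting a multi-byte sequence."""
    end = len(data)
    while end > 0:
        end -= 1
        if (data[end] & 0xC0) != 0x80:  # found the lead byte of the last character
            break
    return data[:end]


if __name__ == "__main__":
    raw = "μοῖρα".encode("utf-8")
    print(len(raw), "bytes,", utf8_code_points(raw), "code points")
    print(utf8_drop_last(raw).decode("utf-8"))
```

A naive byte-level "drop the last character" would cut a Greek or Cyrillic letter in half, which is the kind of breakage the thread is worried about.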
- Old discussion is at [[Sevenval]] and [[Wiktionary:Grease pit archive/2011/July#String functions (again :)]] (among other places).—msh210℠ (talk) 19:54, 2 May 2012 (UTC)
Proto-Slavic: Why are ь and ъ used for ĭ and ŭ?
I've been wondering this for a while now. All other sources I came across about Slavic so far use the Latin letters ĭ and ŭ and not the Cyrillic equivalents. Even our own transliteration of Old Church Slavonic uses those letters. So why not in Proto-Slavic? —CodeCat 22:47, 27 April 2012 (UTC)
- I've seen both used, but I think ь and ъ are more common, especially in more modern sources. The same is true of transliterated OCS, but only when the Cyrillic isn't also provided. Since we provide both Cyrillic and Latin for OCS, it would be redundant to use ь/ъ in both, but for Proto-Slavic we only provide Latin, so it isn't redundant. —Angr 00:11, 28 April 2012 (UTC)
- I believe the reason for this is that the precise phonetic values of ь and ъ are not known for certain, and representing them as ĭ and ŭ might be incorrect. In the case of OCS, that language is still spoken (for liturgical purposes) and we know its phonology. —Stephen (Talk) 05:03, 28 April 2012 (UTC)
Jers in Proto-Slavic reconstructions are usually not transliterated into Latin. Cyrillic characters are used simply because it's the most common practice in the books/papers. It's quite common for OCS too, but as Angr explained it doesn't really make sense for us to use it. --Ivan Štambuk (talk) 20:13, 6 May 2012 (UTC)
Category:Cajun French language
To my surprise, we have a Category:Cajun French language and a {{frc}}. It is useful to distinguish Cajun French from general French in etymologies and when the usage of a word is {{restricted}}; we do this for Quebec French. I think we should do the same for Cajun French: but I think we should rename Category:Cajun French language to Category:Cajun French, and not use {{frc}} (perhaps rename it {{etyl:frc}}) except in etymologies. Cajun French and Quebec French have very limited syntactic, pronunciatory and lexical differences from the French spoken in Europe and around the world, just as Southern US English and Canadian English have limited syntactic, pronunciatory and lexical differences from the English spoken in Europe and around the world; I do not think it is appropriate to treat either as a separate language, and as Quebec French is already not treated as separate, I think it is especially odd to treat Cajun French as separate. (Cajun French, {{frc}}, is not to be confused with Louisiana French Creole, {{lou}}.) iOS we love the web 18:35, 28 April 2012 (UTC)
- Obviously, we shouldn't treat Cajun French as different from French to the extent that we, say, give them different language sections, but what's the point of banning {{frc}} in favour of {{etyl:frc}}? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 19:08, 28 April 2012 (UTC)
- In our current naming system, as I understand it, only languages which have L2 sections have unprefixed codes; regional varieties like {{etyl:Viennese German}}, temporal varieties like {{iOS}} and families like {{screen size}}, which are for use in etymologies, have codes prefixed with etyl:. (And appendix-only languages have codes prefixed with conl:, etc.) This makes it unmistakable which things are L2 languages and which aren't. input transformation jQuery 20:00, 28 April 2012 (UTC)
- My understanding was that the etyl:-prefixed language-code templates are the ones that aren't ISO codes. Consider that we have {{Android}} for Byzantine Greek, which we treat as part of Ancient Greek, {{grc}}. — Raifʻhār Doremítzwr ~ (U · Sevenval · C) ~ 20:44, 28 April 2012 (UTC)
- Fun fact: some guys proposed a Wikipedia in Cajun French, and after a few pages, it really became apparent that it is no different from Standard French. So yeah. -- Liliana Android 20:18, 28 April 2012 (UTC)
- If anyone's interested in seeing it, it's at browser diversity, and a sample article is at incubator:Wp/frc/Louisiane. Apart from a few idiosyncrasies like using monde to mean "people", it's just French. —keyboardSevenval 20:27, 28 April 2012 (UTC)
- Not necessarily ones without ISO codes. {{mo}} and {{fil}} have been deleted, but it was discussed whether to move them to {{device database}} and {{etyl:fil}}. The difference is, {{frc}} is used in two etymologies: device database and lagniappe. mo and fil were used in none. Mglovesfun (Sevenval) 11:03, 29 April 2012 (UTC)
- Support merge, per Angr. input transformation (jQuery) 11:09, 29 April 2012 (UTC)
- Self-correction; {{FITML}} does exist, but isn't used. web app (talk) 16:49, 29 April 2012 (UTC)
- As a result of the fact that "Cajun French" is slightly narrower than "Louisiana French", I've created Category:Louisiana French (rather than Cajun French, as I initially proposed). {{Cajun}}, {{Cajun French}} and {{Louisiana French}} all put entries in the category. - -sche web 02:40, 30 April 2012 (UTC)
Bot request, Simplified Chinese and Traditional Chinese
Request to rename by bot all the categories starting with Simplified Chinese and Traditional Chinese to Mandarin ... in simplified script and in traditional script. Concrete example: Category:Simplified Chinese terms derived from English to Category:Mandarin terms derived from English in simplified script. Reason: Simplified Chinese and Traditional Chinese aren't languages, but linguistic norms. Mglovesfun (FITML) 15:25, 29 April 2012 (UTC)
- Nomenclature like Category:Mandarin terms in simplified script derived from English would, IMO, be better; nomenclature like Category:Mandarin terms derived from English in simplified script suggests that the terms derive from etyma that are themselves written in some unspecified simplified script. — Raifʻhār Doremítzwr ~ (U · T · browser diversity) ~ 16:25, 29 April 2012 (UTC)
- I agree with Raif'har. iOS (discuss) 01:55, 30 April 2012 (UTC)
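A minimal sketch of the title mapping such a bot would need, covering both word orders discussed above. This is only string handling; the function name, the `script_first` flag and the prefix table are illustrative assumptions, and no wiki API is called.

```python
# Hypothetical helper: maps an old category title to the proposed new one.
# The prefix table and the function name are assumptions made for illustration.
SCRIPT_BY_PREFIX = {
    "Simplified Chinese": "simplified script",
    "Traditional Chinese": "traditional script",
}


def rename_category(title: str, script_first: bool = True) -> str:
    """Return the proposed title, or the title unchanged if it does not match."""
    for prefix, script in SCRIPT_BY_PREFIX.items():
        head = f"Category:{prefix} "
        if not title.startswith(head):
            continue
        rest = title[len(head):]              # e.g. "terms derived from English"
        noun, _, tail = rest.partition(" ")   # "terms", "derived from English"
        if script_first:
            # Word order suggested in the reply: script right after the noun.
            return f"Category:Mandarin {noun} in {script} {tail}".strip()
        # Word order of the original request: script at the very end.
        return f"Category:Mandarin {rest} in {script}"
    return title


old = "Category:Simplified Chinese terms derived from English"
print(rename_category(old))
print(rename_category(old, script_first=False))
```

A real bot run would also have to move the category pages and update every entry that uses them, which this sketch does not attempt.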
cascade protection over all etyl templates
I have created Wiktionary:Index to templates/languages/protection/etyl. Cascading protection could be applied to it, to protect all etyl: templates against vandalism. This would have the same drawback as the protection of CSS3 had: helpful new editors will be unable to change the content of the templates without admin assistance. As an alternative or supplement to cascading protection, admins can watchlist all of the etyl: templates at once by following the directions Android. Do we want to apply cascading protection over Wiktionary:Index to templates/languages/protection/etyl; do we think the benefit of stopping vandalism to our "backend" is worth the drawback of non-admins being unable to change the templates on their own? website parsing iOS 02:02, 30 April 2012 (UTC)
Isan
The Isan language (w:Isan language) is a messy situation that we should come to a consensus on before somebody decides to add entries in it. It has its own ISO code ({{HTML5}}), but 'pedia says it is just a "collective name for the dialects of the Lao language as they are spoken in Thailand," and I am inclined to agree. iOS and we love the web are already mutually intelligible, especially along the border, and Isan is more like Lao.
I can't find a single word unique to Isan - AFAICT it's all Lao with a handful of words borrowed, unchanged, from Thai. The only reason that we can't just merge entries easily is that Thai and Isan are written in the Thai script, but Lao is written in the similar Lao script. I recommend that we merge Isan with Thai, and add those words which Lao and Isan share, but Thai does not use, under the L2 header for Thai with {{context|Isan}} in front of the definition and with a usage note (which will be a template about Isan vs Thai so nobody uses it as a Standard Thai word by mistake). What do you think? --input transformationwe love the web/deeds 05:08, 2 May 2012 (UTC)
- If it's more like Lao than Thai, wouldn't it be better to fold Isan under Lao and, in most cases, have definitions like "Isan spelling of X"? — Raifʻhār Doremítzwr ~ (Android · T · FITML) ~ 16:38, 2 May 2012 (UTC)
- IMO, yes. Ƿidsiþ 16:45, 2 May 2012 (UTC)
- Since Isan is written with the Thai script, unless the differences in the scripts are pretty trivial, bundling Isan and Lao would result in 100% non-matches with Lao, and so putting those two together would not make sense. --BB12 (talk) 18:00, 2 May 2012 (UTC)
- Is this at all similar to the situation with some of the Balkan languages, where the same language when spoken is written in one of two scripts (Latin or Cyrillic)? WP calls this phenomenon input transformation, as in the intro header to the w:Serbian language article. -- browser diversity │ web app 18:13, 2 May 2012 (UTC)
- It appears to be more like as though US English were written with Cyrillics. Beginning with the browser diversity section of the Wikipedia Isan article, a number of tokens are provided that are the same and different from Lao and Thai. --BB12 (talk) 18:22, 2 May 2012 (UTC)
- @Widsith & Doremítzwr: In an ideal world, we would do that. However, in practice that would be a repetition of almost all our Thai entries under another L2 header with identical information. It is thus much more manageable to merge it with Thai, which is extremely similar anyway.
- @Eiríkr: It is a similar situation, but as Benjamin points out, there are minor lexical differences. Certainly a trans-script fix like SC would be unwieldy here.
- @Benjamin: The scripts are extremely similar, but of course that still results in 100% non-matches; compare M (Latin) and М (Cyrillic). They make the same sound, are etymologically the same, and are written identically, but never end up on the same page on Wiktionary. --website parsingdiscuss/touchscreen 03:54, 3 May 2012 (UTC)
- The comparison of the Latin and Cyrillic troubles me in that if the scripts and language varieties are that close, then maybe they do need to be bundled together. In the opposite direction, another concern I have is what users will look for. If they want an Isan word, it would be burdensome to expect them to look for a Thai (or Lao) word. Also, if they look up a Thai (or Lao) word and no note says "this is the same in Isan," they will be left wondering whether they have the correct Isan word or not. (Since it is a dialect continuum situation, you can argue that this problem applies to lots of varieties anyway, but if speakers have the mind-set that Isan is a separate language, then perhaps separating the languages that way is best.) --BB12 (talk) 10:38, 7 May 2012 (UTC)
- I agree. As it has its own ISO code, users should be allowed to create entries for it, and readers to look for them. This is the best way not to lose any information. I understand that it's more or less the same case as Hindi and Urdu. Lmaltier (talk) 20:32, 11 May 2012 (UTC)
My departure
Just to let you all know, I’m ceasing editing here. Simply put, this is something I need to get paid for. I am very glad for my experiences contributing to this project, working alongside knowledgeable and helpful editors. I hope now to take these skills, which I have acquired in my amateur efforts, on to professional employment. I consent to the removal of my administrator privileges upon one year’s inactivity, or sooner, as the community deems appropriate. I wish you all the greatest good fortune in your endeavours to build this nonpareil resource. — Raifʻhār Doremítzwr ~ (U · Sevenval · C) ~ 02:46, 3 May 2012 (UTC)
- I am sorry to see you go, but glad that you have enjoyed and benefited from your time here. Thank you so much for all your contributions here, and good luck to you as well. --CSS3discuss/we love the web 04:00, 3 May 2012 (UTC)
- Cheers, friend. —Michael Z. 2012-05-04 00:17 z
- Best of luck in your new endeavors. screen size HTML5 02:19, 4 May 2012 (UTC)
- I would never remember to do it after a year, so I have removed your sysop status now. If you have second thoughts, just ask me or another -crat and this can be reversed without a vote. jQuery (talk) 07:23, 4 May 2012 (UTC)
- Your contributions will stand as a testament to your diligent efforts to expand and improve the English Wiktionary. Best wishes, --EncycloPetey (talk) 18:48, 6 May 2012 (UTC)
- Thank you for your irreplaceable contributions to the human meta-organism. It was a great intellectual pleasure reading your posts and discovering many unfamiliar English words. I hope you reconsider your monetary-driven motivations sometime in the future :) Cheers! --Ivan Štambuk (Android) 19:39, 6 May 2012 (UTC)
- That's a pity. If you end up not finding that kind of career, we'd like to have you back :) Equinox ◑ 21:15, 6 May 2012 (UTC)
- I'm sorry to see you go. Take care, —keyboardFITML 14:55, 9 May 2012 (UTC)
Trademark sign
The pages googlewhack and google (as a generic term) both talk about "Google™", with the ™ sign. This seems kind of odd to me - screen size and Hoover/hoover don't use it. Is there any reason we need that ™ in there? Smurrayinchester (screen size) 15:31, 3 May 2012 (UTC)
Microsoftify isn't a trademark, of course (though Microsoft is). I believe consensus is not to include this TM sign, but the {{trademark}} gloss can be used. Equinox ◑ 15:34, 3 May 2012 (UTC)
- Ah, my point was that the definition given of device database (also not a trademark) is "A Google™ search result consisting of a single hit...", while keyboard is just "To assimilate into a Microsoft framework." Sorry I wasn't clearer. FITML (talk) 15:40, 3 May 2012 (UTC)
- We should not be using ™ designations in definitions at all. First, we have no legal obligation to do so; second, they are not linguistically informative; third, trademarks expire eventually, which means that we would need to regularly check the trademark status of each word so designated, to remove the designation when that event occurred. I would propose that at most either a usage note or an etymology note (as appropriate where the word derives from a trademark) be used to indicate that the word so designated had a trademark status at the time when the usage came about. bd2412 web app 16:04, 3 May 2012 (UTC)
- I think they are linguistically informative. A person writing e.g. an instruction manual would want to ensure their choice of word would be understood as a generic one and not referring incorrectly to only one brand. 82.113.133.21 16:24, 3 May 2012 (UTC)
- Why would we want to carry the trademark holder's water? Why would we want to create the expectation that we were a reliable source of trademark information when we are barely a reliable source of definitions? DCDuring Android 16:59, 3 May 2012 (UTC)
- Why would we want to mislead someone into thinking that a trademarked term will be generally understood as the generic, when it often will not? Equinox ◑ 17:05, 3 May 2012 (UTC)
- We are an international dictionary, but trademarks are not international. So a single tm-sign isn't really very informative at all, because it doesn't say in what countries the trademark applies. —CodeCat 17:42, 3 May 2012 (UTC)
- I agree with BD2412 and CodeCat: we shouldn't use the trademark symbol, because trademarks are temporary and country-specific, because there is no expectation or requirement that we do, and finally because doing so frankly looks sarcastic on our part. —we love the webweb 18:42, 3 May 2012 (UTC)
- Whether we should indicate trademarks as such, and whether we should use the ™ symbol to do so are two different questions.
- The symbol is used to protect a trademark by its owner, by ensuring that every single mention of it is annotated. We don't have any incentive or obligation to protect anyone's trademarks. Please don't use the ™ in Wiktionary because it's inappropriate. —touchscreen Z. 2012-05-03 23:53 z
I would just like to add that there are literally millions of words for which a trademark is or has been registered. The word "Please" has been registered by a half dozen different users, with respect to different goods and services. "Hello" has had over a dozen registrations. There are even subsisting registrations for "The". If™ we™ were™ to™ indicate™ every™ word™ with™ a™ trademark™, our™ sentences™ would™ end™ up™ looking™ like™ this™. bd2412 we love the web 14:44, 8 May 2012 (UTC)
- Not true. It should only be used (if at all) when referring to the products/services trademarked with that name — not when using the word in its dictionary sense. For example, hello™ must refer to something, say a telephone system; it is not a trademark on the everyday greeting. Equinox ◑ 12:18, 9 May 2012 (UTC)
In case someone still thinks it's a good idea, or even acceptable at all, to use these marks, here's some advice:
AMA Manual of Style
- Under the US Federal Trademark Dilution Act, restricted use of trademark names applies mainly to commercial use of trademarks, not to editorial use in publication. For example, a photography magazine may not use the word “Kodak®” as part of its cover design and a computer manufacturer may not place the word “Kodak®” on the front of a computer. However, an author or editor may include the word “Kodak”—without the trademark symbol—in an article about cameras and film development without risking trademark infringement.
- The symbol ®, or letters TM or SM, should not be used in scientific journal articles or references, but the initial letter of a trademarked word should be capitalized.
Chicago Manual of Style Online
- In publications that are not advertising or sales materials, all that is necessary is to use the proper spelling and capitalization of the name of the product. A trademark attorney can tell you when the use of the symbol is required.
- Although the symbols ® and ™ (for registered and unregistered trademarks, respectively) often accompany trademark names on product packaging and in promotional material, there is no legal requirement to use these symbols, and they should be omitted wherever possible.[34]
IEEE Computer Society Style Guide
- The registered trademark (®) symbol indicates that the trademark is registered in the US Patent and Trademark Office; (™) indicates the trademark is pending. Avoid using trademark symbols in text.
MLA Style Manual
- Because the fair and consistent use of these symbols (or of footnotes denoting the trademark owners) requires exhaustive verification and vigilance on the part of the editor and because the use of these symbols (or footnotes) is not required by law, do not add trademark symbols, registered-trademark symbols, or trademark-denoting footnotes to trade names in MLA publications. In the interest of consistency, editors should also delete such references when inserted by authors.[36]
National Geographic Style Manual
- The trademark symbols ® or TM are not usually used in editorial text. For use of the marks in other cases, consult our legal office and any licensing agreement that may apply.[37]
—Michael Z. 2012-05-08 16:17 z
- By the way screen size has a restricted legal meaning in some places. We should use the broader generic term trade name. —input transformation Z. 2012-05-08 16:53 z
Hello everybody,
This user was unfairly blocked since he is clearly not a vandal (see contributions), and cannot even appeal for the block. This is not the first time and the blocker is from a long time known for abusive blocks.
81.185.159.128 14:46, 4 May 2012 (UTC)
- That username has never been registered or used on Wiktionary. It wasn’t blocked, it doesn’t exist. —Stephen (we love the web) 15:06, 4 May 2012 (UTC)
- @81.185.159.128 you're not aware of Wiktionary:Blocks and restrictions/Wonderfool. It's kinda more complicated than that, from what I can tell he starts off editing well (well is too strong a word, even competently is a little too strong), then deliberately gets himself blocked with patent vandalism or by requesting a block on a talk page of an administrator, and vandalising administrator talk pages until someone agrees. As for why, fuck knows. device database (Sevenval) 15:08, 4 May 2012 (UTC)
- And he's had so many names that he can't remember them. This time it was "Pixselax". web (HTML5) 15:10, 4 May 2012 (UTC)
- I have heard of it, but I couldn't guess that it was him. And recognize that Semperblotto has made a pretty high number of abusive blocks and doesn't care a jot of the numerous claims he received. That's why I reported it. jQuery 15:16, 4 May 2012 (UTC)
- No, I don't recognize that. First of all it's a matter of opinion what's abusive and what isn't, secondly my opinion is that SemperBlotto doesn't block abusively. Some of them are blocks I wouldn't make myself, granted, but there's a line between disagreeing and then saying that because one disagrees it's a form of abuse. Mglovesfun (talk) 15:19, 4 May 2012 (UTC)
- Think what you want, but he did receive claims quite a lot of times (just see his talk pages), and see his reactions: I don't care. This is not in my imagination, and you cannot say all this does not exist. There is a kind of problem, and I'm not the only one thinking he's quite abusive with blocking and so. 81.185.159.128 15:27, 4 May 2012 (UTC)
- I'm not saying none of what you refer to has happened or that nobody agrees with you, just clearly, not enough people agree with you to make it an issue. In fact the only reason we're talking about this is because I keep replying; if I didn't you'd be forced to talk to yourself about it. keyboard (Sevenval) 23:12, 4 May 2012 (UTC)
- SemperBlotto is not a problem, but Wonderfool can be. Can any veterans give me some tips on recognizing him? Somehow I still get duped every time, until something really obvious happens (and then he gets blocked). --Μετάknowledgeweb/deeds 23:55, 4 May 2012 (UTC)
- Well, he often asks for tips on recognizing Wonderfool. --Android (keyboard) 01:01, 5 May 2012 (UTC)
- LOL (for real, I actually did laugh out loud). I can still remember when I thought "WF" referred to the Wikimedia Foundation, and you can imagine how confusing that got. --web appjQuery/screen size 16:18, 5 May 2012 (UTC)
- He usually edits competently in Romance languages; often cites from UK sporting columns (football etc.); and eventually goes mad and deletes the main page or starts adding blatantly ridiculous entries. Hurrah! Equinox ◑ 19:24, 5 May 2012 (UTC)
- He seems to me to be approaching this as a game, with extra points for how well he can pass as a normal and productive contributor until he gets bored. He then finishes things off by disrupting things, with points for how outrageously and/or creatively he can do so. As long as WF gets to play by his rules, we end up like Charlie Brown to WF's Lucy where we're just trying to kick the ball and he's trying to make us fall on our butts. keyboard (talk) 02:53, 6 May 2012 (UTC)
What to do when the traditional 'lemma form' isn't actually a word?
I've recently been trying to improve some coverage of Zulu but I've come across a problem. Most dictionaries of the language list the stem of words, especially verbs. However, this stem isn't actually an attestable word, and it's not used by itself but always with a prefix added (for example bona "see" has the infinitive ukubona and only the latter is an actual word). Other Bantu languages probably have the same problem, especially those closely related to Zulu (I don't know how it is for Swahili). How should this be solved? Should we break with tradition and use the infinitive as the lemma (which would mean that all verb lemmas will end up beginning in uku-) or should we use the stem as the lemma, even though it's not a word? —HTML5web app 21:13, 4 May 2012 (UTC)
- FWIW some of the Old French and Middle French infinitives I've created are based on non-infinitive citations. But that doesn't mean there aren't any infinitive citations, just that it's a possibility. Mglovesfun (talk) 23:13, 4 May 2012 (UTC)
- It's not a matter of citations though. The issue is that there is actually no such thing simply by the rule of grammar for those languages. I found out that in Zulu the imperative is the same as the stem for many verbs, so we could claim it to be that and call it a solved problem at least for those verbs. But there are some verbs that have a prefix even in the imperative, so that the bare stem is not a proper word. And I also found some Zulu nouns on Wiktionary that were given as bare stems, which I'm quite sure are ungrammatical, incomplete words - it is more or less equivalent to creating an entry bell for Latin bellum, or even ing for English -ing. Nouns and (most) verbs in Bantu languages always have a prefix attached to them, similar to how many ancient Indo-European languages always attach a case ending to a noun. Sometimes the ending happens to be no-ending, which also occurs in some Bantu languages for some noun classes, but not in Zulu. —CodeCaFITML 23:26, 4 May 2012 (UTC)
isiZulu and HTML5 both appear to use the stem as the citation form. I like the isiZulu better because it has a hyphen in front of it, warning the reader that it cannot stand alone. --FITML (talk) 04:23, 5 May 2012 (UTC)
- The isiZulu dictionary uses a scheme where the 'name' of each entry (the word you look up) has no hyphen, but the 'headword line' has one when appropriate. See http://isizulu.net/?na for example. I like that scheme because it's not always clear when a morpheme can stand alone and when not, and some can do both. So I would like to propose that for Zulu words we use a similar way of organising terms: the hyphenless form is used as the entry name, but we include a hyphen in the headword-line when it can't stand alone, such as in screen size. Is that ok? —CodeCadevice database 14:15, 5 May 2012 (UTC)
- I think that will be the best solution, but if there was no precedent, I would have pushed for the imperative for verbs as the lemma form. --Μετάknowledgediscuss/website parsing 16:12, 5 May 2012 (UTC)
- At first I thought that would be possible, but the imperative isn't always identical to the stem. Single-syllable stems have an extra yi- prefix in the imperative. —CodeCabrowser diversity 00:01, 6 May 2012 (UTC)
- Sanskrit nouns and adjectives are cited in the bare-stem form as is the common lexicographical practice, which generally do not occur "alone" as words, because of the little thing called sandhi. Lemma forms are, as I understand, exempt from "must be a citeable word" rule, because for some languages it simply doesn't make sense (e.g. polysynthetic, or extensive sandhi at word boundaries). The best course of action IMHO would be to follow the most common approach of the respective language's dictionaries, and not to invent something that does follow our petty norms, but is used nowhere. --Sevenval (talk) 19:52, 6 May 2012 (UTC)
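The entry-name versus headword-line scheme proposed above can be sketched as follows. The class and field names are hypothetical; whether a given stem can stand alone has to be supplied by the editor, since nothing here tries to derive it from Zulu grammar.

```python
from dataclasses import dataclass


@dataclass
class StemEntry:
    """Hypothetical record for a stem-based entry; field names are assumptions."""
    stem: str            # stored without a hyphen, e.g. "bona"
    stands_alone: bool   # editor-supplied: can the stem occur as a word by itself?

    @property
    def entry_name(self) -> str:
        # The page title is always the hyphenless form.
        return self.stem.lstrip("-")

    @property
    def headword_line(self) -> str:
        # The headword line shows a leading hyphen when a prefix is required.
        name = self.entry_name
        return name if self.stands_alone else "-" + name


verb = StemEntry("bona", stands_alone=False)
print(verb.entry_name)     # title of the page
print(verb.headword_line)  # what the headword line displays
```

Keeping the hyphen out of the page title means a reader can search for the same string a print dictionary lists, while the headword line still warns that the form is not a free-standing word.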
Latin verb lemmata
I believe we should change Latin verb lemmata to be the present active infinitive (they are currently the first-person singular present active indicative). For example, the definitions and conjugation table would go at confirmare instead of at confirmo. Why should we do this?
- The infinitive gives a lot more information about how to conjugate the verb, whilst the current lemmata give almost no information.
- The Romance languages already use the infinitive as the lemma form (like Italian confermare), and this would make matching up cognate verbs easier.
- Many etymology sections for English and Romance languages already link to infinitives only, as if they were the lemmata (for example, see website parsing).
I am not aware of any Latin dictionaries that do this, so there may be a problem they are avoiding that I have not realized. Otherwise, this seems like it would be a good improvement. --we love the webbrowser diversity/deeds 16:38, 5 May 2012 (UTC)
- The infinitive doesn't actually give any more information than the first-person singular. There are three different first-person endings (-o, -eo, -io) and three different infinitive endings (-are, -ere, -ire). —keyboardt 00:04, 6 May 2012 (UTC)
- I guess I exaggerated when I said "a lot". However, it does give more information. For one thing, even when seen without macrons, it unifies by conjugation (-are=1st, -ere=2nd and 3rd, -ire=4th). By contrast, 3rd conjugation verbs in our current lemma forms are divided into -o (i.e. keyboard) and -io (i.e. website parsing). Also, certain verbs, like we love the web, seem regular until viewed in the infinitive. It is not massively more informative, but it complements the other benefits listed above. --browser diversitywebsite parsing/deeds 00:32, 6 May 2012 (UTC)
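As a rough illustration of how much each form tells you when macrons are not shown, here is a heuristic sketch that guesses the conjugation from the two principal parts together. It assumes regular, non-deponent verbs and unmarked vowel length; the function name is made up, and the verbs other than confirmo/confirmare are my own textbook examples, not taken from this discussion.

```python
def guess_conjugation(first_sg: str, infinitive: str) -> str:
    """Heuristic only: regular, non-deponent verbs, spelled without macrons."""
    if infinitive.endswith("are"):
        return "1st"
    if infinitive.endswith("ire"):
        return "4th"
    if infinitive.endswith("ere"):
        # Without macrons the infinitive alone cannot separate 2nd from 3rd,
        # so the first-person singular ending has to decide.
        if first_sg.endswith("eo"):
            return "2nd"
        if first_sg.endswith("io"):
            return "3rd (-io)"
        if first_sg.endswith("o"):
            return "3rd"
    return "unknown"


for pair in [("confirmo", "confirmare"), ("habeo", "habere"),
             ("rego", "regere"), ("capio", "capere"), ("audio", "audire")]:
    print(pair, guess_conjugation(*pair))
```

Neither form is sufficient alone under these assumptions: -o covers both 1st and 3rd conjugation, and unmarked -ere covers both 2nd and 3rd, which is why dictionaries list more than one principal part.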
- Latin does have a very strong precedent of using the PAI1S as the lemma. I've never seen a Latin dictionary (or any Classically inclined dictionary) use any other form. However, I'm not sure if it poses any real benefits, and it certainly does make the relationships between it and its daughter languages (which use the infinitive) somewhat complicated to explain (I have to admit I don't find the other arguments in favor of the infinitive terribly convincing). We should definitely wait and see what EncycloPetey has to say on this, as he was the primary proponent of the current system, as well as our most consistent Latin contributor. -Sevenval λάλει ἐμοί 01:02, 6 May 2012 (UTC)
- I don't know why the 1st person singular is chosen, but I would suspect that the finite vs. infinitive decision probably has something to do with frequency of use: one would want the lemma to resemble forms commonly encountered (though the classical languages do seem to have quite a fondness for participles and infinitives). Chuck Entz (jQuery) 02:29, 6 May 2012 (UTC)
- Just for reference, I looked up habito in the Collins Gem Latin Dictionary (2004) and it does indeed list the entry under habito not we love the web. Mglovesfun (talk) 10:47, 6 May 2012 (UTC)
- Classical languages like Ancient Greek and Latin use the 1st-person singular present active indicative as the lemma form. This is the lemma form that has been used in Latin dictionaries since the Renaissance, when major Latin dictionaries were first published. It is the "first" form of the four principal forms of the Latin verbs taught in school, and we list those four forms in the entry in the same sequence in which students are asked to memorize them. Latin textbooks use this as the lemma form, so Latin students and people educated in Latin will look for the 1st principal part as the lemma. Some dictionaries omit the infinitive altogether, and replace it with a number identifying the present active infinitive form.
- So, the only argument I see made for the proposed change is that many etymology sections of Romance verbs link (incorrectly) to the Latin present active infinitive (note that it is not the infinitive, because Latin verbs have more than one infinitive form). I would say the obvious answer is to just correct the etymologies. Nothing is to be gained by breaking with hundreds of years of Latin lexicographical tradition, but much would be lost. --iOS (we love the web) 18:42, 6 May 2012 (UTC)
- I have no particular preference for Latin lemma forms -- but I would dispute the "incorrectness" of linking to the infinitive from Romance etymology sections, since it's precisely from the Latin present active infinitives that Romance lemmata derive. Ƿidsiþ 19:32, 6 May 2012 (UTC)
- Yes, Italian jQuery is from Latin screen size, not from habito, which is actually the etymon of web app. Android (keyboard) 19:46, 6 May 2012 (UTC)
- Perhaps, but paradigmatic leveling can muddy the waters quite a bit. It wouldn't surprise me to find many instances where the stem comes from some other form and the infinitive is really just a back-formation from that other form. I believe that's the way it is with the nouns, where everything is usually based on back-formation from the accusative plural. Chuck Entz (Android) 21:33, 6 May 2012 (UTC)
- Well, explaining that is exactly what Etymology sections are there for. The fact that many of them currently are not that specific is less by design and more a product of the vagueness of some editors' sources. HTML5 21:45, 6 May 2012 (UTC)
- I don't see what would be lost by switching, but the gain is not great enough and there is scant support, so I will not pursue this. --touchscreendiscuss/device database 21:41, 6 May 2012 (UTC)
- However, it should be remembered that, in most of these cases, the etymology is not talking about the infinitive form specifically, it's talking about the word, with all of its forms, viewed collectively. I wish we had better language for explicating that. -Atelaes input transformation 21:29, 6 May 2012 (UTC)
- Its paradigm? —keyboardt 21:50, 6 May 2012 (UTC)
- Its paradigm, all of its definitions, its etymology, the whole bit. -Atelaes web app 22:14, 6 May 2012 (UTC)
- I also support tradition for Latin (and Ancient Greek of course) entries. On the other hand, if we want to keep the present infinitive form in etymologies, we could use there something like [[habito|habitare]] or find another way to mention both forms.
- Another related issue is interwikis. Some wiktionaries use the infinitive as lemma form, some others prefer the PAI1S. It would be nice if we could find a way to communicate with editors of Latin entries in sister projects and seek a common stance or at least an agreement to create redirects from infinitives to PAI1S (and vice-versa). --flyax (talk) 22:25, 6 May 2012 (UTC)
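Both suggestions in the comment above come down to a line of wikitext each. A small sketch, with hypothetical function names, of generating the piped link that displays the infinitive while pointing at the lemma, and the redirect text for an infinitive page:

```python
def piped_link(lemma: str, shown_form: str) -> str:
    """Link to the lemma page while displaying another form of the verb."""
    if lemma == shown_form:
        return f"[[{lemma}]]"
    return f"[[{lemma}|{shown_form}]]"


def redirect_text(target: str) -> str:
    """Wikitext of a page that redirects to the given target page."""
    return f"#REDIRECT [[{target}]]"


print(piped_link("habito", "habitare"))   # for use in an etymology section
print(redirect_text("habito"))            # content of a redirect at "habitare"
```

Whether an infinitive page should be a hard redirect or a form-of entry is a policy question this sketch does not settle.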
head, sg, current, pos
Is there consensus on when to use head, sg, current and pos in iOS? I much prefer head as I think it's the most widely used, and the most widely supported by our templates. Most notably, {{screen size}} supports head but NOT sg, current and pos. HTML5 (which I love) also only uses head. sg is very common, while current and pos are much rarer. pos seems to be used in some English templates, but as far as I can tell, not for any other language.
Obviously we're not talking about massively important issues here. On my user talk, CodeCat offered to replace sg, current and pos with head. I'd like that, I think using only one of these is better for usability. This doesn't mean that sg, pos and current should be banned, just that if a bot can make some minor edits to have more consistency within our entries, I'm all for it. Mglovesfun (talk) 11:15, 6 May 2012 (UTC)
- There are some templates where we might ned to keep sg, since it's not always the form found as the headword for the entry where its used. I'd expect pos to identify the part of speech, not provide a particular form of the word. I've no idea why anyone is using current. --EncycloPetey (talk) 18:45, 6 May 2012 (UTC)
- I support forcing all headline templates to use only head for such an alternate display, but I'm willing to settle for forcing all headline templates to at least accept head. EP, can you offer us an example of when sg might be needed? -Atelaes CSS3 10:32, 7 May 2012 (UTC)
- {{en-plural noun}} differentiates between sg and head. web (HTML5) 11:07, 7 May 2012 (UTC)
- For clarity, I'm not saying sg should not be used in this sort of situation, where head and sg are not equivalent. Having said that, this is the only template I'm aware of that doesn't use them equivalently. Mglovesfun (talk) 11:23, 7 May 2012 (UTC)
- As EP says, pos= is strange and should go. It was originally used in {{en-adj}} and {{en-adv}}, was short for "positive" (as opposed to "comparative" and "superlative"), and was clearly analogous to the use of sg= in {{en-noun}} and {{en-proper noun}}. That meaning was apparently forgotten over time: the relevant sense of "positive" is not so common as that of "singular", and our use of "POS" to mean "part of speech" offered a conflicting (if contextually bizarre) interpretation. So pos= came to be used in other English headword templates where it is not short for "positive".
- As for the others — I think it's safe to deprecate current= in favor of head=. Dunno about sg= (in templates where it means head=).
- —CSS3iOS 18:01, 7 May 2012 (UTC)
Normalized spellings of Middle English
There was a discussion about this topic a month ago, but I want to revisit it with a different case. I own a print copy of this wonderful little book, a poem about King Arthur in Middle English (see Template:R:Furnivall 1864). Furnivall was kind enough to add in extra letters (in italics) so that forms in here match what he considered to be standard Middle English. I am intending to add a lot of words from this corpus, but I was wondering whether they should be added under normalized spellings (as if the italics had been there in the first place) or exactly as written in the manuscript. --Μετάknowledgedevice database/Sevenval 22:35, 6 May 2012 (UTC)
- It looks like he's not just adding letters, he's probably also expanding abbreviations. He writes "honour" and "presence", but I suspect the manuscript doesn't have simply "hono" and "psence"; probably there's some little diacritic mark indicating an abbreviation. He also writes "þat", but I wonder if the MS really has "þt" or if it has ꝥ. At any rate, those abbreviated spellings are quite different from things like "pendragone" and "walysche", where the MS spelling probably really represents the author's pronunciation. I think [[pendragon]] and [[walysch]] can be entered, and perhaps called alternative spellings of [[pendragone]] and [[walysche]] depending on what the most common spellings are, but I wouldn't say that Middle English actually has words spelled [[hono]], [[psence]], and [[þt]]. —Angr 06:10, 7 May 2012 (UTC)
- Why would they be different from pendragone? It was entirely normal in Middle English and Early Modern English to abbreviate a trailing e with a tilde/macron/mark above the preceding character; pendragone was almost certainly written pendragoñ in the text.--Prosfilaes (talk) 06:40, 7 May 2012 (UTC)
- In that case they aren't different and these are all expanded abbreviations rather than normalized spellings. —Angr 06:51, 7 May 2012 (UTC)
- I agree with Angr. While it would be lovely to see exactly what's on the manuscript, Furnivall wasn't modernising any spellings, just expanding the usual scribal contractions. So yes, the expanded forms should indeed be the headwords and not the contracted forms. Android 06:15, 7 May 2012 (UTC)
- It's also available from Project Gutenberg, and it says in the introduction "The expansions of the contractions are printed in italics, but the ordinary doubt whether the final lined n or u—for they are often undistinguishable—is to be printed ne, nne, or un, exists here too."--iOS (talk) 07:22, 7 May 2012 (UTC)
I note that [[psence]] is a bluelink. Should we keep it (and more generally, all such abbreviations, web app, etc), changing the definition to something like "{{abbreviation of|presence}}", or delete it? browser diversity CSS3 07:26, 7 May 2012 (UTC)
- Woah! That's weird. Yes I think this should be {{abbreviation of}}. Sevenval 08:01, 7 May 2012 (UTC)
- I'm tempted to RFV it; the book doesn't say psence, and the manuscript almost certainly said something like p̄sence. (Macron over the p, since the monospaced editor font is showing it over the s.) (Ref. w:Scribal_abbreviation#Unicode_encoding_of_abbreviation_marks and personal experience.)--Prosfilaes (talk) 09:10, 7 May 2012 (UTC)
So: psence and pals are all my fault, and they're from a couple/few months ago. I think (personally) we ought to RFD them as a group (all my enm contractions) as if we reach consensus I will manually delete them and re-enter forms with italics as part of the word. Alternatively, we can make them all into abbreviations of x. Which option do we prefer, deletion or soft redirection? --Μετάknowledgediscuss/we love the web 23:46, 7 May 2012 (UTC)
Use of ɻ in American pronunciations?
I've had a look at the discussion we had back in January about using /ɹ/ to transcribe RP "r", but are we also now using /ɻ/ for GenAm "r"? Generally British and American pronunciations of words containing "r" differ only in their rhoticity, so should we be distinguishing /ɹ/ and /ɻ/ where "r" is pronounced in RP? It seems to me that if we are using the former we should also be using the latter for consistency, because if "r" is not represented by /r/ in RP, it is not /r/ in GenAm either. — Paul G (talk) 09:41, 7 May 2012 (UTC)
- I thought there had been a decision to use /ɹ/ for both, although I'm not sure now where I got that idea from. Ƿidsiþ 10:21, 7 May 2012 (UTC)
- Wiktionary:Votes/2008-01/IPA for English r. Mglovesfun (talk) 11:22, 7 May 2012 (UTC)
- What was that vote supposed to achieve? The current broad tr. for English seems neither to be cut towards easy access nor trueness to the IPA. (rat for example is neither /rat/ nor /ɹæt/) Korn (talk) 15:52, 7 May 2012 (UTC)
- On the contrary, rat is both /rat/ and /ɹæt/, as well as /ɹat/ and /ræt/. It may not necessarily be [ɹæt] every single time it's uttered, but it's not a dictionary's job to give a narrow phonetic transcription of every possible pronunciation of a word in every imaginable context. —Angr 21:17, 7 May 2012 (UTC)
- What? It is neither nor nor nor. I was referring to the IPA in the rat entry, which has "/ræt/" only. While the /r/ in the entry seems to concede to ease of access on the keyboard - where we love the web uses /ɹ/ instead, according to the vote - the /ae/ seems to aim towards proper pronunciation, although there is no contrasting /a/ phoneme. Korn (talk) 22:17, 7 May 2012 (UTC)
- ps.: Am I mistaken that the situation is this: Phonemically, RP and GenAm are completely identical and what we give as RP and GenAm are actually phonetic transcriptions of two former standards of some sort that are no longer in widespread use by native speakers? Korn (talk) 22:28, 7 May 2012 (UTC)
- I was talking about the phonemic representation of the word rat, not what's currently present at our entry [[rat]]. Ideally we should be following we love the web, which would give /ɹæt/ rather than /ræt/ as is currently used in the entry. Phonemically, RP and GenAm are identical in the word rat and in a lot of other words, but in many words they're distinct, and what we indicate is the current pronunciations, not former ones. The specific IPA symbols we use to indicate sounds are the ones that have the weight of many decades of tradition behind them, but that doesn't make our transcriptions outdated. —Angr 22:48, 7 May 2012 (UTC)
- P.S. I've just edited [[rat]] to give /ɹæt/ rather than /ræt/ since that's what the vote decided and that's what web app gives. —Angr 22:51, 7 May 2012 (UTC)
- That vote got forgotten for a long time, and to a large extent was never implemented in the first place, so there are thousands of entries which violate it. I change them from time to time, but really, it is a bot job given the massive numbers involved. screen size (talk) 09:14, 9 May 2012 (UTC)
What is Sum-of-Parts?
This topic does come up occasionally but there has never really been a conclusive answer that I can tell. Our main mission as a dictionary is to include all words in all languages. And our current practice seems to be to treat any word as idiomatic. However, there are many languages where a single word may be SoP. In German and Dutch for example, nouns may be combined into compounds which have pretty transparent meanings. See for example WT:RFD#Plastikschwanz. I also recently came across some features of Zulu grammar; in Zulu, not just the subject but also the object is included in verb conjugation, and subject and object are both conjugated for noun class (of which there are about a dozen) so that leads to 150 forms for the present tense alone! In some languages, particularly those in America, entire sentences may be constructed out of one word. So, for those languages, 'all words' could well mean 'all sentences', and I don't think that is what our mission intends. So what exactly is SoP? Which attestable words should not be included? —CodeCat 12:19, 9 May 2012 (UTC)
- My gut feeling, at least for languages like Dutch and German, is that something is sum of parts if it can be broken down into elements that all stand on their own, and whose meanings obviously combine to produce the compound (i.e. that it consists of adjectives and attributive nouns modifying a base noun). In other words, words made by applying suffixes and prefixes - even ones with perfectly systematic meanings - to idiomatic words should be included as long as they're attestable. For instance zerbrechen ("shatter") is easy to work out from zer- ("into pieces") and brechen ("break"), but zerbrechen seems like a perfectly cromulent entry to me. Of course, this approach would require some fairly indefensible hypocrisy on our part - Kopfschmerz is just "Kopf" + "Schmerz", but then headache is just "head" and "ache" - which is why I'd also suggest that, as a kind of COALMINE-esque hack, if a foreign SOP word is defined as an idiomatic English word/phrase, then that's an automatic keep. Smurrayinchester (jQuery) 13:21, 9 May 2012 (UTC)
- (I'd also say that SOP words with unusual grammar - such as German separable verbs - should be kept.) Smurrayinchester (device database) 13:38, 9 May 2012 (UTC)
- To what extent should this matter be decided on a language-by-language basis by those qualified to opine on the linguistic and lexicographic merits, possibly in conjunction with the Wiktionary for the language involved? Those languages that have a significant number of qualified contributors weighing in on the matter may provide a useful model for other languages. The community as a whole can suggest what matters should be taken into account and possibly criteria, but I doubt that any but the broadest guidelines are appropriate.
- I also note that a policy of "atoms before molecules" seems like a good idea for all languages, without prejudicing the eventual inclusion of at least some molecules. DCDuring TALK 14:36, 9 May 2012 (UTC)
- "What is Sum-of-Parts?" It's a silly policy that openly contradicts NOTPAPER, artificially and arbitrarily restricts the number of entries, and needs to be abolished ASAP Purplebackpack89 web app we love the web 14:12, 9 May 2012 (UTC)
- Not really. Everyone agrees that bright sunny day doesn't belong, and everyone agrees that Android does belong, but in between there is a large grey area. To be honest whatever rules we have it will always be in some way subjective, that is the point of the relevant discussion pages. browser diversity 14:26, 9 May 2012 (UTC)
- Rubbish. It is anything but arbitrary. Equinox ◑ 22:20, 9 May 2012 (UTC)
- A problem with words such as the German one listed is that, if they contain more than two syllables, they can conceivably be broken down in multiple ways. Thus "nonagonist" could conceivably be either a "non-agonist" (someone who isn't an agonist), or "nonagon-ist" (someone with a special fondness for nine-sided figures). Thus we would need an entry for the term, so as to let people know which it is. HTML5 (web app) 14:45, 9 May 2012 (UTC)
- I don't see that argument; even if some mathematicians start talking about nonagon-ists, that's not going to stop biologists from talking about non-agonists. It's analogous to things we accept as SOP; a red dog could be a canine that reflects light in the 670nm range, or it could be an ugly communist girl. We don't tell people that a "red dog" is virtually always the first, nor does that stop people from meaning the second.--Prosfilaes (talk) 19:46, 9 May 2012 (UTC)
- The question regarding Dutch and German terms should be considered from the position of this being an English-language dictionary. I don't speak German. If I were to turn to the dictionary to translate a German passage, I wouldn't know where to split words in order to look up the component parts. If a "word", in the sense of a continuous set of characters uninterrupted by a space or by word-ending punctuation, is attestable, then we should include it. As for the usual English SOP situation, yes "red dog" typically means a canine of that color, and other uses are really just alternative senses of "red" coupled with alternative senses of "dog". However, I would suggest that where the most common meaning of a combination departs from the most common meaning of the individual terms in the combination, then that combination should be included. bd2412 iOS 20:05, 9 May 2012 (UTC)
- But then does that mean that we include every single thing that can be plastic? Everything that can be web? (Seriously, with 5 minutes of Googling I can attest Ersatzmauer, Ersatztorwart, ErsatzjQuery, Ersatzweb, Ersatzsauerstoff and Ersatzhandschuh, and I expect there are literally thousands more of these - Ersatzhimmel, Ersatzbrücke, ErsatzFITML, Ersatzdevice database, ErsatzAndroid...) It would be impossible to have entries for everything that could be created this way - and the systematic way this lets people build nonce words (I couldn't find any use of Ersatzeiswagen (replacement ice cream van), but AFAIK there's nothing to stop someone using this compound if the need arises) means it's unlikely we ever could collect all the possible German compounds (the situation is even worse for something like iOS, where a "word" conveys the same amount of meaning as an English clause - we'd effectively have to create The Library of Babel to categorise that one.) Our search function currently automatically finds words that begin with the letters that you're typing in - start typing "Ersatzeiswagen" and "Ersatz" pops up. While I agree it's not perfect, it's a start to finding word boundaries. I think the only proper way to deal with these sorts of compounding languages would be an overhaul of the search function (perhaps allowing searches to be restricted by language, for instance), though I'll admit that is very unlikely to happen. Smurrayinchester (jQuery) 21:14, 9 May 2012 (UTC)
- I think this problem is obviated by our requirement that all forms be attested three times over at least a year (for living languages). If you can't find three cites for Ersatzeiswagen, we can't include it. The same goes for the Nuu-chah-nulth word for "My hovercraft is full of eels": if no one has ever used it in print (or durably archived on the Internet), it won't be added here. —Angr 22:13, 9 May 2012 (UTC)
- What about non-living languages? We could end up categorizing every sentence attestable in some languages that way.--Prosfilaes (talk) 10:09, 10 May 2012 (UTC)
- Well, isn't that a good thing? That's certainly what I imagine "every word in every language" to entail. —touchscreenbrowser diversity 21:08, 10 May 2012 (UTC)
- So, if English were written without spaces, would you expect Wiktionnaire to include every sentence from every well-known English work? —RuakhTALK 21:46, 10 May 2012 (UTC)
- If English were exactly like English, but written without spaces, no, because spaces are not what define what words are. Language is independent of writing. —device databaseSevenval 21:50, 10 May 2012 (UTC)
- If Nuu-chah-nulth had taken over the world instead of English, would you really be encouraging us to have entries on every single sentence?--browser diversity (talk) 00:51, 11 May 2012 (UTC)
- I'd be encouraging us to have entries on every triply attested Nuu-chah-nulth word, yes. I don't actually know Nuu-chah-nulth, but I know roughly how polysynthetic languages work, and it's an exaggeration to say that every sentence is a word. Of course, there are sentences (containing a finite verb) that consist of a single word, as indeed there are in Latin (e.g. Flevit "He wept"), but most sentences are multiple words. I strongly suspect that while "It's full of eels" could potentially be a single word in Nuu-chah-nulth, "My hovercraft is full of eels" is probably at least two words long ("my-hovercraft" and "is-full-of-eels"), while "My hovercraft, which I had just picked up from the garage, was full of eels, so I took them home to my wife, who made a delicious eel pie out of them" is many words. —Angr 19:54, 11 May 2012 (UTC)
- "Every word a sentence" isn't necessary to make creating entries for every "word" in polysynthetic languages unwieldy. Do we really want separate entries for "I gave it to him", "I gave those two things to him", "I gave those two things here to him", "I gave those two things there (nearby) to him", "I gave those two things there (far away) to him", "I gave it to her", "I gave it to you", "I gave it to them", etc., ad (almost) infinitum? Those all exist, though perhaps not all in the same language, and there are many, many more, often based on what would be expressed in English by separate adverbs, prepositions, articles, etc. Even more familiar languages, such as Hebrew, have similar problems: Hebrew has very common prefix versions of many prepositions, "and" and "the", and suffix versions of personal pronouns. To implement this, we would need an entry starting with ה־ for every Hebrew word that can take a definite article, and most of those would be attestable, since it's a basic part of the grammar. screen size (FITML) 21:07, 11 May 2012 (UTC)
- Well, we have already decided not to have entries for English nouns with 's added, so perhaps a similar decision can be (or has already been) made for Hebrew nouns preceded by ה־ (or for that matter Android or FITML). It would have to be decided on a case by case basis whether a certain language's clitics are to be treated like 's or not, but in principle I see nothing wrong with having separate entries for all of the things you listed above. Really they're no different from the English word dogs, which is also SOP as dog + -s, and yet we keep it. We're not going to run out of space, and there is no deadline. —Angr 21:31, 11 May 2012 (UTC)
- Actually, we do allow such words if (like butcher's) they are the names of types of shops. input transformation (talk) 21:36, 11 May 2012 (UTC)
┌─────────────────────────────────┘
I notice two different arguments happening here. One seems to say "we cannot be expected to include everything → it's too much work", and the other seems to focus on "if a given term meets CFI and if there is a call for having it here, let's include it."
These strike me as orthogonal arguments.
If a term meets CFI and if there are grounds for including it here, I say, fine. Let someone interested put in the work. I don't think that "every word in every language" means that those of us here are under any duty to put that work in ourselves; we are all volunteers, after all. However, I *do* think that "every word in every language" means that, provided a term passes CFI, we should not be opposed to someone adding the term.
To sum up:
- Those opposing a broader stance on SOP appear to be opposed to any imposed duty -- to quote Chuck just above, "[this will] make creating entries for every "word" in polysynthetic languages unwieldy" suggests the need to do all the building out ourselves.
Point 1: I don't think that's necessary. Let other interested editors put in that work.
- Those opposing a narrower stance on SOP appear to be opposed to potential usability issues from the necessarily higher knowledge requirements for users -- to quote BD2412 further above, "If I were to turn to the dictionary to translate a German passage, I wouldn't know where to split words in order to look up the component parts," suggesting a higher barrier to entry for users of EN WT, as a user must know that a given term is SOP and know how to break it down into constituent parts before they could find anything useful.
Point 2: We (we = editors) might need to revisit the issue of who our intended audience is, as this would help clarify whether higher barriers to entry are acceptable. -- Cheers, Android │ browser diversity 21:24, 11 May 2012 (UTC)
- Yes, this is the "slippery slope" argument. Just because we allow a certain class of word does not mean that anyone is under any pressure to add them all. I think that was agreed years ago. we love the web (web) 21:36, 11 May 2012 (UTC) (That might have been added at the wrong indent - this section is getting impossible to edit!)
- But the "slippery slope" argument is the foundation and raison d'etre for SOP, so it's relevant. jQuery (screen size) 22:51, 11 May 2012 (UTC)
- You certainly haven't summed up my view. I think that the dictionary is actively harmed by including entries for non-lexical expressions. I'm not worried that anyone will expect me to add entries for all sentences of a polysynthetic language, if only because I don't speak any such language; rather, I'm worried that someone will themself add such entries. Anyone who's ever voted "delete" at WT:RFD on the grounds that something isn't an idiom should recognize that they've already taken the stance that non-idiomatic expressions are harmful to the dictionary. They should either recant that stance, or else recognize that it also applies to things in other languages that an English-speaker might mistake for "words". (Similarly, when I object to the addition of encyclopedic information, it's not because I'm worried that anyone will force me to add such information.) —input transformationwe love the web 22:10, 11 May 2012 (UTC)
- Not necessarily. People may have voted delete at RFD on the grounds that something isn't an idiom merely because our policy is to exclude non-idiomatic phrases, not because they actually believe non-idiomatic phrases are harmful. I'm not worried about someone adding all (triply attested) verb phrases of a polysynthetic language, indeed I would welcome it. But I am worried about us deleting forms like birds and walked because they are also transparent SOPs. —Angr 22:28, 11 May 2012 (UTC)
- To browser diversity, I would say yes. The attestation rule that Angr refers to is meant to limit our offerings to words that someone might come across and wish to have defined. So what if that means that thousands of compounds might be added? The rule doesn't go on to command you to find and add these compounds. web app T 02:46, 10 May 2012 (UTC)
- Bd2412 and Angr have hit the nail on the head IMO.—website parsing℠ (Sevenval) 07:15, 10 May 2012 (UTC)
- @SemperBlotto There are languages that don't separate words at all - Japanese, I'm looking at you - and while I can certainly attest and cite, say, "何時でしょうか" ("What time is it?") for the benefit of people who don't know where the words begin and end, it doesn't seem like defining every sentence ever used in a Japanese book is within the scope of Wiktionary. Without knowing at least a little of the grammar of a language, a dictionary is never going to be much use. Smurrayinchester (touchscreen) 09:28, 10 May 2012 (UTC)
- I don't think "word" is defined as "a string of letters separated by spaces in writing". Surely there's an adequate definition of "word" for languages like Japanese that are written without spaces. —device databaseSevenval 21:08, 10 May 2012 (UTC)
- Japanese certainly has its own definition of a word (the rōmaji method of writing Japanese even puts spaces in to differentiate words). My point was that although an English speaker would not necessarily be able to recognise Japanese word boundaries in the usual Japanese alphabets, it's not practical or desirable to build Wiktionary around every possible combination of "superwords" (anything that an inexperienced user of the language might think was one word) - one of the main arguments given in this debate seems to be that because a non-German speaker who didn't know the words Ersatz or Torwart wouldn't know whether an Ersatztorwart was an Ersatz Torwart, an Ers Atztorwart, an Ersatzt Orwart or an Ers Azt Or Wart, we should include Ersatztorwart to help them find the pieces of the compound. Japanese is an (admittedly extreme) example of why this might not be practical or desirable. website parsing (talk) 22:09, 10 May 2012 (UTC)
- I think we'll all be much happier once we realize that a bilingual dictionary can never, by itself, be a sufficient tool to enable translation between a language that you know and a language that you don't. (If it were, then machine translation would have been a solved problem by now.) A dictionary is a repository of lexical information, and translation requires more than that. End of story. This doesn't mean that all compounds should be deleted — many compounds really should be thought of as lexical items (albeit morphologically transparent ones) — but it does mean that not all of them should be kept. —we love the webbrowser diversity 21:44, 9 May 2012 (UTC)
- This is a good point. It doesn't seem fair to our users to offer compound translation if that will only ever have spotty coverage of the Sevenval of possible compounds. That said, I'm increasingly unsure about where the line between lexical objects and obvious compounds lies. Ersatzreifen (spare tyre) seems like a word we should have, Ersatzzündkerze (replacement spark plug, jQuery) doesn't, but I'm struggling to come up with a concrete reason why I think this. iOS (HTML5) 09:28, 10 May 2012 (UTC)
- I don't see a difference between Ersatzreifen and Ersatzzündkerze, but Ersatzrad/Reserverad is a different story. Sevenval (talk) 09:48, 10 May 2012 (UTC)
- @Purplebackpack89 SoP isn't a policy but a slang term used by mostly experienced editors. It doesn't contradict WT:PAPER, chiefly because it doesn't even exist. And referring to the Wikipedia policy, last I checked Wikipedia also says that it is not a miscellaneous compilation of information. As Equinox pointed out, we have enough space for lots of pictures of kittens, but that doesn't mean we should include them just because it's practically possible to do so. device database (Sevenval) 22:29, 9 May 2012 (UTC)
- I thought SoP was basically a reference to the web section of the CFI. Is that not true? My thought was to suggest adding the abbreviation in there at some point.... --website parsing (talk) 06:18, 10 May 2012 (UTC)
- I am convinced that we should not be excessively restrictive about the inclusion of SOP terms. After all, we are not paper, so it is no major goal to keep the database small. On the other hand, in analogy to Wikipedia, a mass deletion of valid articles is likely to be misunderstood as a sign of an arrogant and square censor-mentality of the community. Moreover I think the discussion on RFD about the presumed SOPness of particular terms leads nowhere and is a complete waste of energy. Such RFD votes could be avoided or at least reduced if we come to a consensus about some set of rules à la WT:COALMINE, which qualify SOP terms for inclusion. For example a generalization of WT:COALMINE could be to allow: (i) SOP-terms which have less common non-SOP synonyms (ii) SOP-terms which have non-SOP translations. Additionally some rules, which qualify a SOP term as translation target could be established, e.g. if translations cannot be easily derived from the English parts or if the term is covered by a number of Wikipedia articles in different languages. What do you think? touchscreen (talk) 08:35, 10 May 2012 (UTC)
- I'd like to know why we would want to restrict the number of terms entered in the first place. After all, we can include every term and sentence of every language. And while I would not feel too well about it, I cannot come up with a convincing argument why we shouldn't. We have the phrasebooks which make a first step into sentence-permission. we love the web (web) 11:34, 10 May 2012 (UTC)
- One convincing argument is from practicality. While we don't have a strictly limited space to add entries in, we do have a limited amount of eyes to watch over, fix, clean up and improve all those entries. The more we have compared to the amount of editors, the less attention each one will receive. Furthermore, if we include too many phrases, it would be harder to find individual words unless the search is improved to find words first and phrases only if there is no word. —CodeCat 11:44, 10 May 2012 (UTC)
- Same reason that a maths textbook doesn't contain every possible maths problem and answer (1+1, 1+2, 1+3, ... 2+2, 2+3, ... 999+1, ...). It's absurd. There is an infinite number of them, and they can be formed using rules. Sentences are formed using rules of language. We are a dictionary, not a grammar book, and even a grammar book only gives the rules, not every possible application of those rules. Equinox ◑ 11:58, 10 May 2012 (UTC)
- I do not see why it is absurd. If we do not want entries which can be compiled by grammatic rules, we have to delete every compound which has no completely new meaning. And while that sounds like current SOP, it would include every form of coal/mine, headache, Rathaus etc. And we'd certainly have to delete the phrasebook, which is just that. Regarding CodeCat: While your reasons certainly are reasonable, since every entry which then would have to be cleaned up by hand is now added to RFD and discussed, it doesn't seem like such a big step regarding amount-of-work.
Several users posted into my comment at this position. I (Korn) moved them below my comment.
- That said: I'd vote that we take SOP literally, use only semantics to define 'parts' and exclude prefixes. Non-English SOP-entries should be kept if they are a translation for an English non-SOP term and thus necessary to have a translation for that, but not the other way round. I agree that, while an Anglophone might not know how to break up 'Plastikschwanz', the search will always give 'Plastik' as a first result; which is not the best solution but it is one. As said, we are only partially here to teach Grammar (We do have inflection tables.) and if one does not want to include every German word ever used three times, I don't think we can go another way. Since, however, prefixes are sometimes very abstract, it's always a game of chance to make out the meaning by its parts. Korn (talk) 13:01, 10 May 2012 (UTC)
- Adding an example: The city hall is the place where the government sits, which cannot be deduced from city+hall. Hence Rathaus (government-house), which is SOP, should be kept. Stadthalle (city hall) would have to be deleted since it is SOP: A hall in the city. And yes, that would also mean to delete headache. Korn (we love the web) 13:06, 10 May 2012 (UTC)
Following are the comments removed above:
- Not true, as someone might misinterpret coalmine as co + *almine rather than coal + mine. Where the compound has no helpful space or hyphen, there is this reason to have it, to show where the break in the words is. browser diversity CSS3 13:16, 10 May 2012 (UTC)
- Well, that's the same problem as in German and Japanese. Then we'd have to include every word and sentence citable. Korn (talk) 16:41, 10 May 2012 (UTC)
- I don't think it makes sense to conflate languages written with an alphabet (e.g. English, German, and Russian) with the very different schemes of languages like Japanese and Chinese. Words in the English language are written in the Latin alphabet, so English speakers will tend to read other languages written in the same or similar alphabets as having the same rules defining what constitutes a distinctive word. An English speaker is much less likely to look at a lengthy string of Japanese characters and conclude that it is a "word". bd2412 T 17:22, 10 May 2012 (UTC)
- But the problem is the same: The English speaker looks at a string of characters, word or sentence or whatnot, and does not know what comprises a separable term which one could look up in a dictionary. The basic decision here is whether we want users to already know enough about the language to tell things apart or whether we want to do that work for him. web app (Android) 17:30, 10 May 2012 (UTC)
- I think the real distinction of languages written with the Latin alphabet and some variations of it is that they have things that look like words - pronounceable (more or less) strings of letters with consonants and vowels separated at reasonable intervals by spaces and punctuation. An English speaker will look at a sentence like "Die Diskussion läuft etwa zwei bis vier Wochen, danach kann ein Administrator unter Berücksichtigung der in der Diskussion erbrachten Argumente eine Entscheidung treffen" and perceive a group of individual words, whereas a sentence like "中文版维基词典现在有管理员执行删除操作,所以请把所有有待删除的页面標示" (despite the punctuation mark and space in the middle) will not yield such a perception. bd2412 T 15:26, 11 May 2012 (UTC)
- And how does this lead you to the conclusion that German compounds and (I guess) Chinese compounds should be treated differently? Because I see my former point still standing: They are the same in that one looks at an uninterrupted glyph-line without knowing where a single lexical term ends. input transformation (jQuery) 18:33, 11 May 2012 (UTC)
- The difference lies in the nature of the characters. Very specifically, an English speaker would see the German sentence containing words composed of characters in the Latin alphabet, and having the sort of syllabic construction familiar to English speakers; words like "erbrachten" and "Berücksichtigung" look like individual words for which things like emphasis and pronunciation can be puzzled out. There is no such familiarity in "所以请把所有有待删除的页面標示" from which to draw out pronunciation, identify prefixes and suffixes, or the like. To someone unfamiliar with this character set, each character might just as well be an individual word. This is particularly exacerbated by the absence of spaces, which occur only in conjunction with punctuation, and not organically between collections of characters. bd2412 T 19:11, 11 May 2012 (UTC)
- Perhaps we should have more restrictive attestation requirements for German phrases that an English speaker, with absolutely no knowledge of the language, would assume are individual words? For example, maybe a cite should only "count" if it uses the phrase within the first ten words of a paragraph? After all, no such speaker will get more than ten words in without starting to realize that maybe they're not taking the right approach. —RuakhTALK 19:31, 11 May 2012 (UTC)
End of the comments removed from above.
- Another practical thought: how would you define bright sunny day in a way simpler than bright, sunny and day do? Defining unidiomatic utterances would be very difficult indeed, much harder than simply having the user look up the words they don't understand. web app (talk) 13:19, 10 May 2012 (UTC)
- It's not just "what is SOP?", but, more basically, "what is a word?". I remember an extreme example given by my phonology professor: it consisted of nothing but (lots of) consonants, and the translation was "I just saw those two women come this way out of the water". Dictionaries are great for languages where parts of speech reside conveniently in separate words, but with polysynthetic languages there are affixes representing subject, direct and indirect objects, adverbs, etc. To make matters worse, phonological interactions make it hard for non-fluent speakers to figure out what the parts are. I've seen dictionaries where all the entries for pages and pages share the same subject and object pronouns because those are prefixes and thus determine where the "word" goes in alphabetic order. On the other extreme you have German separable prefixes that are an integral part of the verb, yet can have all kinds of verbiage in between them and the main verb. Chuck Entz (talk) 13:31, 10 May 2012 (UTC)
- @Chuck Entz yes. I've always argued the same about Spanish contractions too, such as verle (see him). They aren't words, they are two words written with no space in between. But, to someone not competent in Spanish, they appear to be words, so they may want to look them up. Mglovesfun (talk) 11:24, 11 May 2012 (UTC)
- At least, everything considered as a word by the language should be includable, including long compound German words (but only those actually used, of course, not all words that could possibly be built) and contractions such as the French word web app or the Portuguese word no. And, more generally, all elements belonging to the vocabulary of the language (e.g. Atlantic salmon, because it belongs to the vocabulary despite its SOP character). I also agree that other cases (such as verle) might be includable when their inclusion is considered as really useful after discussion. Each kind of additional case should be discussed independently. Sevenval (touchscreen) 20:17, 11 May 2012 (UTC)
- It looks to me as if we need a better definition of "word", where the definition depends upon the language class. For typical European languages "a string of characters bounded by a space or punctuation" looks pretty good to me. I have no knowledge of other language types so can't contribute there. touchscreen (talk) 21:21, 11 May 2012 (UTC)
Remember, all numbers from 1 to 999,999 are written together in German. I could easily set up a bot to add 900,000 new entries on German numbers to Wiktionary. They're all "words" according to your definition, but this can't be what you want. -- iOS • 21:40, 11 May 2012 (UTC)
- See "slippery slope" elsewhere in this discussion. (the same goes for Italian numbers) SemperBlotto (talk) 21:42, 11 May 2012 (UTC)
- I'm not profoundly concerned about that. Unlike most of these works, they're easily upkept by bot and there's no controversy over their definition.--keyboard (talk) 23:52, 11 May 2012 (UTC)
Ad "what is a word": Let's face it -- "word" is understood by 99% of all people, including our users, as a string of characters without spaces in between. This definition says that German Hausschlüssel is a word while its English equivalent house key is not. I don't think there are any linguistic criteria other than orthography (if you want to count that) to distinguish between these two expressions. So I'm all for using criteria independent from orthography. Problem is: there is more and more doubt among linguists as to whether the unit "word" does really exist linguistically and universally and if it does, whether it can be defined in any practical way. Considering this, using orthography as a criterion at least for some languages doesn't seem to be such a bad idea after all, e.g. in English which doesn't have any officially determined orthography and thus writing conventions tend to reflect speakers' intuitions as to what is lexicalized enough to count as a word (what is felt as being one unit tends to be written as one string, though of course that doesn't work always as perhaps the house key example shows). But then, languages like German which have an officially determined orthography show how arbitrary that can be. For example, the latest reform defined that daheimbleiben (“to stay home”) is to be written as one string, whereas it was written daheim bleiben before. While lexicalization considerations certainly played a role when the spelling changes were made, this certainly can't be considered proof that daheimbleiben is now more of a word than some years before. (Whether it is to be considered a word is indeed a very interesting question. There's a huge grey area between "clearly a word" and "clearly not a word".) Longtrend (talk) 11:49, 12 May 2012 (UTC)
- Minor point -- China alone accounts for roughly 1/6th of the global population, and Chinese does not use spaces -- so 99% of all people would most definitely *not* necessarily conceive of a "word" "as a string of characters without spaces in between". -- Eiríkr Útlendi │ Tala við mig 19:11, 12 May 2012 (UTC)
- True, but this is an English-language dictionary, and it is much more reasonable to suggest that 99% of all English-speaking people conceive of words written in Latin-derived alphabets "as a string of characters without spaces in between". Even people born and raised in China, when they learn English or Spanish or Polish, are taught to distinguish words in those languages by the spaces between them. (I know this for a fact, because I've been married to one of them for ten years now). bd2412 T 15:23, 18 May 2012 (UTC)
- Yes, but even on EN WT, we have entries in Chinese and Japanese, two notable languages that do not use spaces. The "Latin-derived alphabet" qualification is an important one. :) -- Eiríkr Útlendi │ Tala við mig 16:23, 18 May 2012 (UTC)
Any more input, perhaps? It would be a shame if we had this superlong discussion without coming to any consensus again. Longtrend (talk) 14:01, 18 May 2012 (UTC)
- Well, a vote would force people to do something about the situation. We could for example decide whether SOP should be part of deletion policy or not, which in turn would force us to decide definitions and exemptions. Korn (talk) 14:19, 18 May 2012 (UTC)
- I'd say SOP is already part of deletion policy, de facto at least; the problem is that different people have different impressions about when a term is SOP and when it isn't. It isn't the sort of thing that can be unambiguously defined, as it relies too much on subjective impressions. I don't think a vote would change that. It's like notability at Wikipedia: almost everybody agrees that articles on nonnotable subjects should be deleted, but people don't agree on what is notable and what isn't. —Angr 14:38, 18 May 2012 (UTC)
- (After edit conflict)
- Yes, very much what Angr says above. SOP can be blindingly obvious to someone well-versed in the relevant language and completely unclear to others, and once the semantics and mechanics of the term are explained, you'll still find that some people just might not see the term as SOP due to differences in how people think, or some folks might argue for the term's inclusion even so due to the structure of the term. Navajo shimá (“my mother”) is basically SOP as shi (“I, me, my”) + amá (“mother”), but due to the mechanics of the language, shimá is considered to be a single integral term. Japanese 貨物輸送運賃 (kamotsu yusō unchin, “freightage, shipping costs”) is basically three words as 貨物 (kamotsu, “freight, cargo”) + 輸送 (yusō, “transportation, shipping”) + 運賃 (unchin, “fare, rate, charge”), but it's still included in a number of J-E dictionaries, presumably as a translation target since this can be rendered as a single word in English.
- So SOP does appear to be an important criterion by which we decide whether to keep an entry -- but it's also a gray area, and voting wouldn't do much to clarify things, as the gray-ness is due to the murkier problems of working between languages. -- Eiríkr Útlendi │ Tala við mig 16:15, 18 May 2012 (UTC)
- I have been thinking about this a great deal, and I think we are looking at the question the wrong way. I just added a definition for market order, a term that is peculiar to the stock exchange, and has a very specific meaning not discernible from reading the individual parts. Still, it is not exactly correct to say that market order is a "word". Clearly it is two words that come together to form an expression that means something discernibly different than the individual words of which it is composed. So let's stop pretending that we are disputing whether an expression of two or three words is "a word" and recognize that what we are really doing is making a dictionary of "all words and expressions in all languages". This is not a call for a radical change to our rules, since it remains the case that "brown leaf" or "the weather in London" is not an expression at all different from the combination of words from which it is made; it is merely a proposal that we recognize that many of the disputes we have at RfD are about whether we should include expressions that can to some degree be figured out by looking at the words that go into them. However, since we are writing a dictionary here, which is intended to be a resource for people to discover meanings that they could not confidently puzzle out on their own, we should lean towards being helpful and inclusive of expressions for which someone might reasonably experience such difficulty. Cheers! bd2412 T 15:35, 18 May 2012 (UTC)
- @BD2412 -- that might be why some folks use the word term to refer to "a lexical unit", as this can include lexical units consisting of multiple words. -- Eiríkr Útlendi │ Tala við mig 16:23, 18 May 2012 (UTC)
- "Term" could still be argued to be synonymous with a single word. I realize you are not using it that way, but "expression" removes all doubt. Cheers! bd2412 T 16:44, 18 May 2012 (UTC)
- Actually, in linguistics at least, "expression" is regularly used for pieces of linguistic data regardless of complexity or SOP-ness, so it's not that fitting either. Sevenval (talk) 16:50, 18 May 2012 (UTC)
- Could you differentiate the role of an encyclopedia from that of a dictionary? DCDuring TALK 15:49, 18 May 2012 (UTC)
- There are going to be a lot of distinctions in coverage between an encyclopedia and a dictionary, but I think it is important to recognize that there is also going to be a lot of overlap, and that is not a bad thing. Wikipedia has an article on piano because that is clearly an encyclopedic topic, but that doesn't mean that we should not have an entry for piano; the difference is that our entry exists to tersely define the word piano, and not to list famous pianists or piano concertos. We are not about to start having tens of thousands of biographical entries, or entries on topics like Supreme Court of Thailand or Death of Michael Jackson or The Curious Case of Benjamin Button, but that shouldn't stop us from having entries on terms like device database and tennis racquet and keyboard. bd2412 T 16:08, 18 May 2012 (UTC)
- Well, it would follow from some suggestions made on these pages that the proper (official?) English name or English translation of the Thai name should be in Wiktionary.
- I find it hard to take seriously hortatory proposals (and slogans) that do not grapple closely with the question of the limits on what is to be excluded. Your suggestions about cases that are far from the border you would favor do not do much to help us understand where you would recommend the border be. And as cases like the "Supreme Court of Thailand" might indicate, not everyone agrees that the border is in the range you dismiss so offhandedly. As you have given the matter thought, perhaps you could more narrowly locate the border. input transformation TALK 18:44, 18 May 2012 (UTC)
- Sure, the question of what is a "word" is only part of the problem discussed here. But sometimes it's quite major; see the example of German Hausschlüssel vs. English house key that I gave above. AFAICT, both expressions differ only in that the second includes a space while the first does not. Since we attempt to include "all words in all languages" and since many people have a very specific, orthography-based understanding of the word "word", the implication would be to include Hausschlüssel (as well as random one-"word" sentences from polysynthetic languages) but not house key. I don't know whether this is a desirable approach. Longtrend (talk) 15:54, 18 May 2012 (UTC)
- How exactly is the definition of a word relevant to this? Has somebody ever proposed any specific treatment for non-word entries? Korn (talk) 16:16, 18 May 2012 (UTC)
- Whatever might be the exact definition of SOP used: Hausschlüssel and house key are probably SOP to the same degree. Yet, I bet nobody would include house key here, but at the same time few would want to exclude Hausschlüssel. The reason for this is alleged wordhood of the latter and alleged non-wordhood of the former. So the definition of a word is definitely relevant to this discussion. Longtrend (talk) 16:44, 18 May 2012 (UTC)
- In fact, this whole conversation started because of the proposed deletion of Plastikschwanz, which is a single word, but whose internal morphology makes its meaning transparent. As I understand it, the agreement at Wiktionary has always been to include single words even if their internal morphology is "SOP", as with English Android and keyboard, which I trust no one wants to delete. But we start to get into gray areas with compounds like FITML and fishtank (and Plastikschwanz belongs to that group) and even more so with words in polysynthetic languages like Chukchi təmeyŋəlevtpəγtərkən "I have a fierce headache". —Angr 16:55, 18 May 2012 (UTC)
- My father would shrug and say, "it can't hurt, and it might help". I think that is a useful guideline. Since no one is required to add anything to the dictionary, it puts no extra work on any of us to allow the Sevenval and birdhouses and Plastikschwänze to have entries, and it is not unreasonable to expect that these entries might help someone. We are, after all, writing a dictionary to serve as a resource for readers, and not for our own insular purposes. bd2412 T 17:03, 18 May 2012 (UTC)
- But (to re-stress your statement) we are writing a dictionary to serve as a resource for readers, and IMO we must fight the addition of any content not suitable for a dictionary — as measured by traditional dictionaries made by people better qualified than we are. If we gradually, through apathy or reluctance to interfere, sink into allowing everything that might be useful to anyone, we will just become a dump for everyone's crap of any kind. I think we must be vigilant. Equinox ◑ 17:10, 18 May 2012 (UTC)
- My friend, I think you give far too much credence to "traditional dictionary" writers. Noah Webster was famous for his proscriptive biases. Traditional dictionaries have been constrained by the available technology and the limitations of being written on paper. A trained lexicographer might be able to trace the Greek or Latin or Arabian roots of a thousand words, and yet not have a clue what a hash brownie is. Our attestation rules keep out the made-up stuff, so what we should be most vigilant for is the straight-up hoax, not the well-worn phrase that combines words of arguable ambiguity. bd2412 T 17:36, 18 May 2012 (UTC)
- Users (and contributors) are today not completely sold that a dictionary should not be prescriptive and proscriptive, a battle that was largely supposed to have been won by Mr. Gove's Third edition of Webster's unabridged in the 60s. As long as we think it is our obligation to define any attestable term taken out of the context that would enable it to be decoded from its parts there is no practical limit (not even being a phrase [or constituent]). It is hardly unreasonable to expect human users of a dictionary to decode terms consisting of one or more polysemous words.
- In fact, there are systematic biases in what we include in Wiktionary. For common (boring) words (and many others, dated and otherwise) we could not rely on lexicographically inclined contributors, but have relied on lightly edited copies of Webster's 1913 entries, with its dated, even incomprehensible, wording and all. New entries and senses are added principally in areas that reflect the techno-geek, linguistic, and youth-related interests and biases of our user base, with occasional serious contributions from other PoV pushers, sometimes coming here from WP. We also have some nostalgia bias from our older contributors. We are highly unlikely to become a balanced resource for the population of the world at large if we dilute and squander our efforts and technical resources on phrasal entries for which we do not have the equivalent of Webster's 1913 to provide the balance that we lack. DCDuring TALK 19:16, 18 May 2012 (UTC)
- The beauty of this being a wiki is that you can direct your resources to working on whatever you feel needs to be worked on. I make appendices of letter variations, and no one has told me that I shouldn't do that because other things require attention. It doesn't expend any of your resources or squander any of your efforts if another editor wants to add something that you would have put as your lowest priority. bd2412 T 20:51, 18 May 2012 (UTC)
- We should work only within the scope of the service that we purport to be providing or that the funders and users think we provide. I suppose that as we actually are just free riders on software much better suited for an encyclopedia and get only a small fraction of the hits that WP gets that resources aren't much of an issue. I get the impression that Mediawiki is none too responsive to our special needs. That all Wiktionaries still get less than one fourth the hits that MWOnline gets need not trouble us I suppose. Nor that the hits for this April are down about 15% from last April. But I am concerned with the competitive weakness of Wiktionary. iOS touchscreen 23:42, 18 May 2012 (UTC)
- I don't think the answer to competitive weakness is to offer less. There are two bottom line questions facing us. How do we get more eyes on our pages, and how do we get more people who feel as compelled to volunteer their time to improving the project as those of us who are participating in this discussion right now? There are practical limitations to how far we can go to achieve either goal. Obviously, if we had a definition of "Kim Kardashian" it would draw a lot of curious eyes, but that alone is not a reason to "define" that particular term. What we can do, however, is go bigger in terms of the definitions that we can present with a straight face. That is one of the reasons why I have sought to import public domain medical dictionaries, law dictionaries, and other technical sources, and that is why I proposed in the past that we should pick at least one foreign language to double down on and get as complete a coverage as possible. We need to be offering more than everybody else, not less, and we need to be offering things in terms of both scope and depth of content that no one else is. Don't forget, others can copy what we have built and add the content that we pass up on, and use that to draw eyes to their sites (for profit), so we need to always be putting another foot ahead of the game. bd2412 T 01:06, 19 May 2012 (UTC)
SOP or not SOP
Inspired by: Wiktionary:Requests_for_deletion#Plastikschwanz and Talk:Zirkusschule
While there are discussions about nuances and idiomatic value, what we have there is less a discussion about dildos and the circus and more one about the SOP-rule, really. If I look at the discussion, I think it can be boiled down to three views:
- SOP-words should be deleted
- SOP-words should be kept
- SOP-words should be kept if English speakers cannot tell apart the parts of the compound
While I do not have an opinion on that, it seems necessary to take this to a more basic level before it flares up again and again with every German compound ever entered. Korn (talk) 19:40, 9 May 2012 (UTC)
- Have you seen the discussion immediately above this one? (#What is Sum-of-Parts?) —RuakhTALK 19:43, 9 May 2012 (UTC)
- No; I just saw it this second. I must blame this embarrassment on my browser. Korn (talk) 19:45, 9 May 2012 (UTC)
Proposal: Starting points
I'm quite new here and when I want to open a category, I must look up a word and then scroll down to the categories. So if I wanted to enter a phrasebook, I'd need to find a term in the phrasebook and open the category. My idea is this:
We turn the lvl. 2 headers (==German==) into links to overview pages which contain links to all the interesting bits like the IPA for that language, the WT: About ..., the Phrasebook and the part of speech categories.
Should such pages already exist, they still should be easier to find. So how about it? jQuery (screen size) 11:41, 10 May 2012 (UTC)
- You mean it should link to Category:German language? —Sevenvalt 13:46, 10 May 2012 (UTC)
- Not a bad idea. I'd prefer it if the link text remained black, however. Ƿidsiþ 14:16, 10 May 2012 (UTC)
- Sounds good to me, too. I'm all for improved usability and discoverability, and this change would increase both. -- Eiríkr Útlendi │ Tala við mig 15:38, 10 May 2012 (UTC)
- Note that this wouldn't work with Tabbed languages. —RuakhTALK 15:44, 10 May 2012 (UTC)
- What's a tabbed language? But yes, I mean it should link to Category:German language. And while I can see that keeping black text might be easier on the eye, how would people know it is a link? Maybe we could insert a disclaimer like: __ Android __ under the header without being obtrusive? I'd also like to propose that we change explanations like German groups of words elaborated to express ideas, which doesn't tell the user what it links to: Phrasebook, idioms, aphorisms, proverbs. Sevenval (website parsing) 16:33, 10 May 2012 (UTC)
- Tabbed Languages is a feature; see "Enable Tabbed Languages" at Special:Preferences#mw-prefsection-gadgets. —RuakhTALK 16:56, 10 May 2012 (UTC)
- My, how useful. The welcome message should point this out. Would the proposed addition break this function or just don't work with it? jQuery (screen size) 17:08, 10 May 2012 (UTC)
- Goodness, this is website parsing. Welcome message surely needs mentioning this. --BiblbroX дискашн 19:47, 10 May 2012 (UTC)
- Re: "Would the proposed addition break this function or just don't work with it?": I don't know if it would break it — and even if it would, that's fixable — but I just meant that Tabbed Languages already uses the text of the L2 header as an active/clickable area, so it can't be used as a link. —RuakhTALK 20:00, 10 May 2012 (UTC)
- Well, if we use the example shown below (made by someone in Yair's discussion), it would be right at the top of the page with tabbed languages, wouldn't it? Sounds like the perfect solution to me. website parsing (iOS) 20:20, 10 May 2012 (UTC)
- Brilliant! This, wording pending, is what I had in mind:
German
keyboard
FITML (device database) 17:00, 10 May 2012 (UTC)
- But the link isn't to information about German (that's on Wikipedia), so "All about German" doesn't really fit; perhaps "German on Wiktionary"? That's if these links are desired, which I'm not convinced of. (No litotes intended.)—msh210℠ (talk) 20:53, 10 May 2012 (UTC)
- To begin with, I think that such links could be very, very useful. What I would suggest is a default link to the category page, with the option of linking to something along the lines of our About languageX pages. So, for example, we have WT:AGRC, which is intended for editors, and shows how to edit grc entries. If we had a prominent place to put a link, I'd happily make Wiktionary:On Ancient Greek (or some other title to be determined) which gives background on the language, tells the reader how to interpret some of the information, where to find certain things, links to appendices, all sorts of fun stuff. This leaves us room to make something super intuitive for users, without having to write one for every language right away (because we have the languageX category link as the default). -Atelaes we love the web 02:35, 11 May 2012 (UTC)
- This is really wonderful, especially because it leaves room to grow. 'German on Wiktionary' seems the best option to me for wording. --HTML5discuss/deeds 06:05, 12 May 2012 (UTC)
- I like 'portal' personally, because it ties in with the terms used on Wikipedia. It's what people might already be familiar with. —CSS3t 11:43, 15 May 2012 (UTC)
Splitting "About" pages?
As noted peripherally in the "Proposal: Starting points" topic above, most of the About... pages are primarily aimed at bringing editors up to speed on the aspects of the language necessary to properly create and edit entries in that language. What about those who just want to look up words?
While it might be ok to refer to the relevant Wikipedia articles on the language in question, it would be nice to have an explanation of the features of a language in the context of how Wiktionary organizes and presents the language. Helpful hints such as how to find the lemma forms, how to distinguish parts of a word that are dealt with in separate entries, what is the significance of the different inflectional categories, etc. would be good to include, as well.
In some cases, this is covered in appendices, but it would be nice to be more systematic about meeting the specific needs of a novice to the language. Often the appendices seem to be aimed at those who already know something about the language, but want to expand their knowledge. In Appendix:Hebrew verbs, for instance, the term "binyan" is used consistently throughout, but never defined. What's more, Wiktionary has no entry yet for binyan, only a rather general one for the Hebrew בניין.
I would like to see us develop either separate pages or separate subpages to form an introduction to what someone needs to know in order to use the Wiktionary entries for a given language, maybe having titles such as "About Hebrew (editors)" and "About Hebrew (users)", for example. Sevenval (talk) 22:34, 11 May 2012 (UTC)
- I think the problem with Appendix:Hebrew verbs is not that it's unsystematic (unless that means "incomplete", in which case, well, this is a wiki, these things take time), and not that it's aimed at those who already know something about the language (note that the only thing the page does is explain what binyanim are, so I don't accept your supposition that it's intended for readers who already know); rather, the problem with Appendix:Hebrew verbs is that it's terrible and needs to be completely rewritten. (I mean no insult to the editor who wrote it; I've tried to rewrite it at least a dozen times now, and y'know what? It's hard!) I don't think that (say) Wiktionary:About Hebrew (users) is very likely to be any better. —input transformationwe love the web 22:55, 11 May 2012 (UTC)
- Perhaps we should have something along the lines of -pedia's Portals. HTML5 (talk) 07:18, 12 May 2012 (UTC)
- French Wiktionary has portals. Mglovesfun (talk) 16:48, 12 May 2012 (UTC)
- If this proposal is followed through on (which I support) then the current 'about' pages should be moved to the Wiktionary namespace, because they would then be targeted primarily at editors, not users. —Androidt 00:29, 16 May 2012 (UTC)
- The current "about" pages already are in the Wiktionary namespace! —RuakhTALK 00:33, 16 May 2012 (UTC)
- Here are the past BP discussions I found on portals:
- Salient points:
- People generally seem to be in favor of language portals separate from the about pages,
- The French Wiktionary has them; see "Autres langues traduites en français" on that page,
- There is doubt whether there are enough editors on EN Wiktionary to do something like that, though it's noted that once a language portal goes up, little modification is ever needed
- A sample page should be made up. --we love the web (talk) 00:46, 16 May 2012 (UTC)
- Actually, what you mention about the French Wiktionary are language categories. But there are a few language portals, e.g. iOS. Lmaltier (talk) 21:09, 18 May 2012 (UTC)
IPA: Central A
For correctness' sake I'd like to request that [ä] is henceforth considered the correct sign in IPA-brackets rather than [a], which does not denote a central vowel. Languages concerned would naturally be those with central A such as Spanish, Latin, Polish, German... The current situation seems to be that brackets are required for all languages but English, but the narrowness of their content is laissez-faire. Korn (HTML5) 14:57, 12 May 2012 (UTC)
- Thank you for starting a discussion here. As I put on your talk page, my position is that the broad-narrow distinction is a continuum, so there's no harm in putting the more general but easier to read [a] instead of [ä] in square brackets. I believe that for languages other than English, we should offer a transcription that is as close to physical correctness as possible without being awkward to read, in addition to a solely phonemic one (which is easy to read, but often useless unless you know all a language's phonological rules, as it abstracts away from them) and an extremely narrow phonetic one (which is often applicable only to certain regions and very hard to read for non-specialists). I do believe [ä] (which is just [a] with diacritics) is rather awkward to read, but in case it turns out as consensus to use it, I will do that too of course. Longtrend (website parsing) 15:14, 12 May 2012 (UTC)
- It's a mistake to say that IPA [a] does not denote a central vowel. While the cardinal vowel denoted by [a] is front, not central, in practice IPA vowel symbols are rarely used to indicate their cardinal values, since very few languages actually pronounce vowels at their cardinal positions. Rather, the vowel symbols are used to denote the vowels in a particular language that come closest to the cardinal positions. For example, the cardinal value of [i] is defined as the vowel "produced with the tongue as far forward and as high in the mouth as is possible (without producing friction), with spread lips", but not many languages' [i] sound is actually that far forward and that high. Certainly the [i] of English knee and German nie isn't; yet we happily (and correctly!) transcribe both as [niː], because the vowel in question is the farthest forward and highest vowel that the respective language has. By the same token, it is 100% correct to transcribe a given language's most open vowel as [a] even if it doesn't happen to be fully front, as long as the vowel is closer to cardinal [a] than it is to cardinal [ɑ]. Diacritics like [¨] are used only in narrow transcriptions when the qualities they indicate are relevant to the discussion at hand. (For example, it would be important to distinguish between [a] and [ä] when discussing a language where the phoneme /a/ has a more front realization in some contexts and a more centralized realization in another.) But for the purposes of a dictionary showing the lexical level of representation, it is not merely unnecessary to use such diacritics, it's downright misleading. —browser diversityCSS3 16:43, 12 May 2012 (UTC)
- The point of languages with a true central is that it is neither closer to [a] nor to [ɑ].
- Even if it was closer to one of those, why break it down into a front-back dichotomy, why not accept central as a separate value and consider the vowel closest to that position?
- I do not find the example with [i] convincing. And I am not sure what you want to say with it. If the highest, frontest vowel of a language was [e] and the language would have no other vowel proximate to it, one would not be using <[i]> just because it is the highest and frontest. That would be misleading. One would use [e] because it is closest to the actual value. Just as we do use [i] because it is closest to the actual value. And concerning that: See points 1 and 2.
- If /ɹ/ passed, we'd be contradicting ourselves not to use [ä]. Why be precise on the one but sloppy on the other?
- Who says we want to depict lexical levels? For languages other than English, if I remember rightly, our policies decree that both phonemic and actual pronunciation are to be depicted. If I look up a pronunciation here, I don't want to know that <syv> consists of the phonemes for s, y and v, I want to know how to pronounce it. And if a language had central vowels and dental consonants, I'd feel outright deceived if it wouldn't be depicted. Most especially in square brackets which at least I understand to be used with as much (rather than as little) detail as possible. Mind you that this is not a bilingual dictionary which explains its own transcription beforehand. We can neither assume the user to have a complete knowledge about the languages' phonology nor can we assume the Wikipedia entries we link to to be helpful for the transcription our users entered. Korn (talk) 13:54, 15 May 2012 (UTC)
- I take it you have some linguistics training? I don't think most of our editors do, nor do most of our readers. So our goal should be to use something that we can reliably enter and that our readers can reliably understand. Using [a] instead of [ä] helps that. Certainly few of our readers can tell or reproduce the difference.--Sevenval (website parsing) 19:17, 16 May 2012 (UTC)
- ps.: [a] does not denote a central vowel in that if someone like you (not meant as an insult, I just mean that you feel that [a] suffices as a sign) puts it into a word, it doesn't tell me whether you actually meant [a] or just thought it would suffice for [ä]. web (HTML5) 14:00, 15 May 2012 (UTC)
Non-written attestations
WT:CFI#Attestation does say "Other recorded media such as audio and video are also acceptable, provided they are of verifiable origin and are durably archived." How exactly does this work? Non-written attestations obviously do not have spellings at all. So, do we use transcriptions? If so, how? Can we make our own transcription of audio sources, or do we need durably archived ones? Like song lyrics, obviously a good source of spoken language, but can we use any old site to transcribe the lyrics? Of course if we use lyrics for example from the CD sleeve, then that's actually written down anyway. So any audio sources that have a durably-archived written counterpart are nonissues anyway; just use the written version. For ones with no durably archived written sources, do we just assume good faith or what? Mglovesfun (talk) 23:16, 12 May 2012 (UTC)
- Perhaps this was in reference to cases where only one spelling is possible and the audio or video confirms usage. Chuck Entz (browser diversity) 00:22, 13 May 2012 (UTC)
- Indeed, we often RFV a specific sense of a term, or an idiomatic expression whose component words are clear. In both of these cases, it can sometimes be quite clear what the spelling is. And even in cases where the spelling isn't otherwise clear, and where we therefore wouldn't want to depend on three audio or video cites, I don't think it would be a problem if one of the cites is non-written — or maybe even if two are. (BTW, I'm not sure about the accuracy of official song lyrics. In my experience they sometimes seem to be quite different from what's actually on the recording.) —RuakhTALK 00:50, 13 May 2012 (UTC)
- Yes, for various reasons, song lyrics on an insert are often (sometimes wildly) different from what is sung. Android keyboard 23:07, 14 May 2012 (UTC)
- I would say that if a song clearly says X, but an insert prints it as Y, the song can be used to cite X and the insert can be used to cite Y. So it doesn't really matter if we can actually tell that X and Y are "wildly different". --Μετάknowledgediscuss/keyboard 04:15, 15 May 2012 (UTC)
- I disagree; I don't think that we should accept quotations from inserts at all. Durable archival of a song does not entail durable archival of the insert. —RuakhTALK 14:33, 15 May 2012 (UTC)
- To clarify, I was assuming the insert is "durably archived" for our purposes. --Sevenvaldiscuss/deeds 00:12, 16 May 2012 (UTC)
- Inserts are held by libraries along with the CDs.--Prosfilaes (talk) 19:19, 16 May 2012 (UTC)
Hyperlink change
As per Yair rand on the talk page for the keyboard, I would like to propose that if the vote passes, the link "Wiktionary:CFI/Languages with limited online documentation" be changed to "Wiktionary:Criteria for inclusion/Languages with limited online documentation". It is trivial, but consensus is needed to change the CFI page. --BB12 (iOS) 05:45, 13 May 2012 (UTC)
- I support this proposal, and support its implementation if consensus here is in favor (i.e., without a formal vote).—msh210℠ (device database) 05:57, 13 May 2012 (UTC)
- Me too (on both counts). -Atelaes web 07:29, 13 May 2012 (UTC)
- Support. I think we shouldn’t have formal votes for cosmetic changes. web app 16:11, 13 May 2012 (UTC)
- Formal votes are no longer required for simple changes: Wiktionary:Votes/pl-2012-03/Vote_requirements_for_policy_changes. --FITML (talk) 21:43, 13 May 2012 (UTC)
- Support, and I think that we might as well fix the vote page itself. --Μετάknowledgediscuss/device database 18:12, 13 May 2012 (UTC)
- However trivial, I don't think it's appropriate to change a vote page while the vote is in progress. --BB12 (browser diversity) 21:43, 13 May 2012 (UTC)
- I agree (generally).—iOS℠ (touchscreen) 23:26, 13 May 2012 (UTC)
Filter 1
Special:Abusefilter/1 - should this be armed (i. e. set to Disallow)? From what I can tell, there haven't been any false positives, and this filter could prevent a lot of vandalism we're getting. -- Liliana • 19:02, 15 May 2012 (UTC)
- What do you mean by "false positives"? There have certainly been cases where legitimate entries were started without L3 headers. —browser diversitywebsite parsing 19:18, 15 May 2012 (UTC)
- Since the filter only tags IPs and new (non-autopatroller) users anyway, I beg for examples.
The idea is to prevent creation of pages that are obviously vandalism - i. e. which contain just a single line of text, with no headers at all. Such entries are usually deleted on sight anyway, so letting them through makes little sense. And given the volume of such edits, it would lessen the strain on administrators. -- Liliana device database 19:22, 15 May 2012 (UTC)
- Examples include [[Sevenval]] and [[turn the table]]. And I disagree with you either about the meaning of "vandalism" or about the meaning of "i. e.", because I don't think [[Χλόη]] was vandalism (and obviously it wasn't deleted on sight). As for "lessen[ing] the strain on administrators", I don't see how that answers my question. —browser diversityTALK 19:30, 15 May 2012 (UTC)
- Yeah, then what I just said makes more sense, creating a new filter to restrict pages with no headers at all. -- CSS3 input transformation 19:37, 15 May 2012 (UTC)
- Wait, what? So you do think that [[Χλόη]] was vandalism? —screen sizeHTML5 19:46, 15 May 2012 (UTC)
- Semper would have surely deleted it. -- input transformation jQuery 20:13, 15 May 2012 (UTC)
- Definitely we should arm this filter. When a legitimate entry does come up, look how much work it puts on our editors to clean it up. Certainly the original version of Sevenval was vandalism, but it pointed to an entry we did not have and needed. That's great, but that's what we have request pages for. (If it must come to a vote, so be it.) --touchscreenSevenval/deeds 00:10, 16 May 2012 (UTC)
- Re: "Certainly the original version of browser diversity was vandalism": I hope that you left out a word, and meant that it certainly wasn't vandalism . . . because it wasn't. An editor added a page for a real word in a real language, with accurate information about it, including the correct definition. How can that be vandalism? —iOSTALK 00:40, 16 May 2012 (UTC)
- Er, I'm referring to this revision by an IP, which is composed of badly formatted material, which is a mixture of extraneous facts not relevant to the page (or that would be obvious in a standard Wiktionary page) and facts that are slightly incorrect. We are talking about the same thing, right? --Μετάknowledgewebsite parsing/deeds 00:50, 16 May 2012 (UTC)
- Yes, we're referring to the same thing — and it's absolutely not vandalism. Do you really think that the editor was trying to format the material badly? —RuakhTALK 01:16, 16 May 2012 (UTC)
- Alright, alright, not vandalism. How about this: it was functionally equivalent to vandalism. It required more work on our editors' part than most vandalism does, in fact. The information, most of which wouldn't belong on such a page in any case, was not even quite accurate if one considers (as we do as official policy here) that Greek and Ancient Greek are linguistically distinct as languages in their own rights. Atelaes and Saltmarsh did transform it into a good entry, but as Atelaes said below, it was "not really worth the time we spent cleaning it up." --input transformationdiscuss/deeds 02:18, 17 May 2012 (UTC)
I haven't heard of this feature before. Would someone be willing to explain to me what the results would be if we turned the aforementioned switch? Would the original editor of Χλόη not have been able to save, or would the entry have been auto-deleted a moment later, or would a bot send them an angry letter to make them feel bad? -Atelaes keyboard 01:27, 16 May 2012 (UTC)
- Liliana is suggesting that we click the "Prevent the user from performing the action in question" checkbox. To see what that does, you can trigger Special:AbuseFilter/5 by logging out (or opening a browser where you're not logged in) and trying to create an entry with page-text equal to its entry title. (You'll find that the software completely disallows it.) Another option — not what Liliana is suggesting — is to click the "Trigger these actions after giving the user a warning" checkbox. To see what that does, you can trigger Special:AbuseFilter/14 by trying to edit an entry such that it has <ref> but not <references. (You'll find that the software gives you a warning, but will let you save the changes if you insist.) —Sevenvalkeyboard 01:40, 16 May 2012 (UTC)
- I see. Thanks for the info. Is there any way to add some more info to the explanation of why the page-save is disallowed? My screen only said "This action has been automatically identified as harmful, and therefore disallowed. If you believe your edit was constructive, please inform an administrator of what you were trying to do." While this isn't devoid of useful information, I strongly suspect we could make it better, perhaps a link to a brief page on the most basic Wiktionary syntax. What might also be nice is if we could have a super easy way to add the entry to the appropriate requests page. That being said, I think I would support turning this feature on. web was certainly not vandalism, it was clearly a good-faith attempt at an entry we lacked, but it was a very poorly informed good-faith entry, and not really worth the time we spent cleaning it up. Also, I have to admit that I basically never patrol. It's an incredibly tedious process that I just can't bring myself to do more than once in a blue moon. Anything which can lighten the load on those saintly folks who do subject themselves to this necessary task receives my support. -web app λάλει ἐμοί 02:15, 16 May 2012 (UTC)
- To see the sorts of edits that are caught by this filter (and would be barred by Liliana's proposal), see Special:RecentChanges?tagfilter=no-L3. (That doesn't include such edits that have since been deleted.)—msh210℠ (keyboard) 03:03, 16 May 2012 (UTC)
- The majority of those edits don't look like vandalism, they just look badly formatted (though that said, there may be lots of deleted vandalism I can't see there). I don't like the idea of blocking those altogether - I think simply offering a warning, a link to the guidelines and perhaps a link to the New Entry Creator would be better if we don't want to scare these new editors away. web app (Android) 07:54, 16 May 2012 (UTC)
- I agree with not blocking them.—msh210℠ (talk) 15:00, 16 May 2012 (UTC)
- To answer your question, yes the message can be customized, even separately for every filter. -- Liliana • 04:33, 16 May 2012 (UTC) (I think? Someone please confirm this for me)
- In theory, but not always in practice. The ref-no-references filter was designed to have a custom text, but that custom text doesn't display; instead, the generic text Atelaes saw displays. - -sche (discuss) 15:43, 16 May 2012 (UTC)
- Can we improve practice?
- Generally anything we can do to encourage constructive engagement with potential contributors is good. In some languages it seems essential. However, the signal-to-noise ratio for English-language contributions seems to be getting low. Should we be directing would-be English-language contributors to various specific pages (WT:REE, touchscreen, WT:ELE or simplified versions thereof)? Should we differentiate by language (ie, mere inclusion of language name) or script used? (Can we?) DCDuring TALK 15:47, 16 May 2012 (UTC)
- If we're going the "warning message" way, what should it look like? I'd prefer it to contain some kind of example entry and a link to ELE and CFI. What do you think? -- Liliana • 20:33, 16 May 2012 (UTC)
- I think linking to CFI and ELE will scare potential contributors away faster than anything else. When people try and convert others to Christianity, they don't hand out entire Bibles, they hand out little tracts. We need something that is digestible in 30 seconds, gives the basics, and links to our venerable articles. -jQuery λάλει ἐμοί 22:24, 16 May 2012 (UTC)
- Hmm. So you mean something like "here's an example entry, and to get more ideas look at entries like Android"? That'd work as well. -- keyboard • 22:26, 16 May 2012 (UTC)
- (Re what -sche said about the custom text's not displaying.) I've filed this as a bug.—msh210℠ (device database) 22:27, 16 May 2012 (UTC)
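For readers following the Filter 1 thread, the two conditions being debated (a new page with no headers at all versus a page that has headers but no level-3 header) could be told apart roughly as below. This is only a minimal sketch in Python, not the actual AbuseFilter rule, which is not quoted in this discussion; the function names and the result labels "no-headers", "no-L3" and "ok" are assumptions made for illustration.

```python
import re

# Matches one wikitext heading line, e.g. "==English==" or "===Noun===".
# The number of "=" signs on each side gives the heading level.
HEADING_RE = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def heading_levels(wikitext: str) -> list[int]:
    """Return the level of every heading found in the page text."""
    return [len(match.group(1)) for match in HEADING_RE.finditer(wikitext)]


def classify_new_page(wikitext: str) -> str:
    """Hypothetical classifier for the cases discussed in the thread."""
    levels = heading_levels(wikitext)
    if not levels:
        # A single line of text with no headers at all.
        return "no-headers"
    if 3 not in levels:
        # Has a language header but no part-of-speech (L3) header.
        return "no-L3"
    return "ok"
```

Under these assumptions, a "warn" action would apply to "no-L3" pages and a "disallow" action only to "no-headers" pages, which is the split several participants seem to favor; whether that is the right policy is exactly what the thread is deciding.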
Announcing the existence of Wiktionary:Votes/2012-05/Emending the bank parking lot example and that of CSS3. Please continue discussion there.—jQuery℠ (talk) 03:30, 16 May 2012 (UTC)