I have the following HTML snippet:
<ul class="addetailslist">
<li class="addetailslist--detail">
Art<span class="addetailslist--detail--value" >
Weitere Kinderzimmermöbel</span>
</li>
<li class="addetailslist--detail">
Farbe<span class="addetailslist--detail--value" >
Holz</span>
</li>
<li class="addetailslist--detail">
Zustand<span class="addetailslist--detail--value" >
In Ordnung</span>
</li>
</ul>
These are 3 different attributes:
- "Art" (en: Type) with value "Weitere Kinderzimmermöbel"
- "Farbe" (en: Color) with value "Holz"
- "Zustand" (en: Condition) with value "In Ordnung"
My current attempt to parse this looks like this:
type Ad struct {
Details []string `goquery:".addetailslist--detail--value,text"`
[..]
}
var CONDITIONS = []string{"Neu", "Gut", "Sehr Gut", "In Ordnung"}
var COLORS = []string{"Beige", "Blau", "Braun", "Bunt", "Burgunderrot",
"Creme", "Gelb", "Gold", "Grau", "Grün", "Holz", "Khaki", "Lavendel",
"Lila", "Orange", "Pink", "Print", "Rot", "Schwarz", "Silber",
"Transparent", "Türkis", "Weiß", "Sonstige"}
[..]
for _, detail := range advertisement.Details {
switch {
case slices.Contains(CONDITIONS, detail):
advertisement.Condition = detail
case slices.Contains(COLORS, detail):
advertisement.Color = detail
default:
advertisement.Type = detail
}
}
So, this works, kinda.
But the obvious problem is that it will fail if there are overlaps (e.g. a Type value that also occurs as a Color) or if the site adds or removes values. I'd have to constantly monitor these lists and update my code.
As far as I understand the DOM, the attribute names "Art" or "Zustand" are just plain text inside the <li> elements. Of course I could write manual Go code to parse this (using a tokenizer or regexes). But look at what the string looks like if I extract the whole text of the list using goquery:".addetailslist,text":
Art
Weitere Kinderzimmermöbel
Farbe
Holz
Zustand
In Ordnung
I could try to trim it and parse it line-wise. But how stable would that be? Any tiny change might break my code.
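To make the line-wise idea concrete, here is a minimal sketch of what I have in mind. It assumes that every attribute name and every value sits on exactly one non-empty line of the extracted text; `parseDetails` is just a name I made up for illustration, and it only uses the standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// parseDetails pairs up the non-empty lines of the extracted list text:
// the first line is an attribute name, the second its value, and so on.
// Assumption: each name and each value occupies exactly one non-empty line.
func parseDetails(text string) (map[string]string, error) {
	var lines []string
	for _, line := range strings.Split(text, "\n") {
		if trimmed := strings.TrimSpace(line); trimmed != "" {
			lines = append(lines, trimmed)
		}
	}
	// An odd number of lines means the name/value pairing assumption broke.
	if len(lines)%2 != 0 {
		return nil, fmt.Errorf("expected name/value pairs, got %d lines", len(lines))
	}
	details := make(map[string]string, len(lines)/2)
	for i := 0; i < len(lines); i += 2 {
		details[lines[i]] = lines[i+1]
	}
	return details, nil
}

func main() {
	// Sample input shaped like the extracted text, with stray whitespace.
	text := "\n  Art\n  Weitere Kinderzimmermöbel\n\n  Farbe\n  Holz\n  Zustand\n  In Ordnung\n"
	details, err := parseDetails(text)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(details["Art"], "/", details["Farbe"], "/", details["Zustand"])
}
```

At least this would report an error when the number of lines is odd, instead of silently assigning values to the wrong attribute. It would still break if a value ever wrapped onto a second line.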
Maybe there's a better way. Do you have an idea?
Any help would be much appreciated!
Tom