SCRF Analytics and Tag Mining?
In my day job, I do text analysis with an unsupervised learning algorithm called “topic models.” These models basically take texts and allow you to cluster them automatically. After clustering them you can discover latent topics contained within the texts and label those topics appropriately. The model can also automatically assign labels to texts.
I was thinking: would you all be interested in having me run all the SCRF posts through a topic model to see what kind of new labels/tags might emerge? The cool thing about this is that once the code is written, you can continuously estimate new models to discover new tags and, at least in theory, automatically apply those tags proactively and retroactively.
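To make the idea concrete, here is a minimal sketch of one common topic model, latent Dirichlet allocation (LDA), fitted with collapsed Gibbs sampling in plain Python. It is an illustration under assumptions, not the notebook mentioned in this thread: the function name `fit_lda`, the hyperparameters `alpha` and `beta`, and the toy `posts` are all hypothetical, and a real run would use a proper library and real SCRF posts.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (proportions, topic_word): per-document topic proportions and
    per-topic word counts. All names and defaults here are illustrative.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token from the counts, then resample its topic.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed share of each topic within each document.
    proportions = [
        [(c + alpha) / (len(doc) + alpha * num_topics) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return proportions, topic_word


# Toy stand-ins for forum posts (hypothetical data).
posts = [
    "stablecoin peg collateral stability peg".split(),
    "stablecoin collateral peg risk".split(),
    "dao governance vote proposal vote".split(),
    "dao vote governance treasury".split(),
]
proportions, topic_word = fit_lda(posts, num_topics=2)
for k, counts in enumerate(topic_word):
    print(k, [w for w, _ in counts.most_common(3)])
```

The top words per topic are what a human would read to choose a label, and the largest entry in each row of `proportions` is a candidate tag for that post.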
Also topic models have some cool plots associated w/ them that can be posted on SCRF. Here’s an example of one such plot using tweets: https://alexisperrier.com/assets/LDA_topic_7.png. The bubbles represent how many documents fall into a topic and the distance between them represents how similar they are in terms of keywords.
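The two quantities behind a plot like that can be sketched in a few lines. This assumes bubble size is the count of documents whose dominant topic is that bubble, and it uses Jensen-Shannon divergence as the keyword distance; that is one common choice for such plots, and whether the linked example used it is an assumption. The function names and toy numbers are hypothetical.

```python
import math
from collections import Counter


def topic_sizes(dominant_topics):
    """Count how many documents have each topic as their dominant topic."""
    return Counter(dominant_topics)


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two word distributions.

    p and q are {word: probability} dicts; 0 means identical keyword
    profiles, 1 means no shared keywords.
    """
    total = 0.0
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        m = 0.5 * (pw + qw)
        if pw > 0:
            total += 0.5 * pw * math.log2(pw / m)
        if qw > 0:
            total += 0.5 * qw * math.log2(qw / m)
    return total


# Toy topic-word distributions (hypothetical numbers).
topic_a = {"stablecoin": 0.5, "peg": 0.3, "risk": 0.2}
topic_b = {"dao": 0.6, "vote": 0.3, "risk": 0.1}
sizes = topic_sizes([0, 0, 1, 0, 1])  # dominant topic of five documents
distance = js_divergence(topic_a, topic_b)
```

A plotting tool would then project the pairwise distances into two dimensions and scale each bubble by its entry in `sizes`.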
@jmcgirk @zube.paul @Larry_Bates @Rich @eleventh
That sounds AMAZING!
We have finally hit a point where we have enough material for something like this. Six months ago I would have been hesitant, since it might have been premature and limited the scope. Now I believe this type of analysis would be extremely beneficial, giving us an iterative round of statistical analysis rather than just human-inferred direction.
Thanks @Larry_Bates! @Rich is there a SCRF API that I can connect to to collect posts and put them into a database?
A few more that have come up in engagement for everyone’s consideration:
- Artificial Intelligence / Machine Learning / Deep Learning
- Use Cases
- Case Studies
- Incentives / Incentivization
Yah, I think we can try to augment it a bit as I’m not convinced that the BIS is willing to provide a balanced definition considering their public positioning in the past.
Good old Investopedia has a pretty decent definition, I think:
The term central bank digital currency (CBDC) refers to the virtual form of a fiat currency. A CBDC is an electronic record or digital token of a country’s official currency. As such, it is issued and regulated by the nation’s monetary authority or central bank. As such, they are backed by the full faith and credit of the issuing government.
Discourse exposes hooks to almost everything.
This seems to work:
curl -X GET "https://www.smartcontractresearch.org/posts.json" | jq
If you need anything more detailed, I think you can generate an API key from your user account. Ping me in chat if you have any problems.
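For collecting posts into a database, the same endpoint can be read from Python. This is a rough sketch under assumptions: it expects the payload to hold a `latest_posts` list whose items carry `id`, `topic_id`, `username`, `created_at`, and `raw` or `cooked` fields, which should be checked against an actual response. The table layout and function names are hypothetical.

```python
import json
import sqlite3
import urllib.request

POSTS_URL = "https://www.smartcontractresearch.org/posts.json"


def fetch_latest_posts(url=POSTS_URL):
    """Download and decode the JSON payload (this makes a network request)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def extract_posts(payload):
    """Turn the payload into (id, topic_id, username, created_at, body) rows.

    Assumes a "latest_posts" list; missing fields are stored as None.
    """
    rows = []
    for post in payload.get("latest_posts", []):
        body = post.get("raw") or post.get("cooked")
        rows.append(
            (
                post.get("id"),
                post.get("topic_id"),
                post.get("username"),
                post.get("created_at"),
                body,
            )
        )
    return rows


def store_posts(conn, rows):
    """Insert or update posts in a SQLite table keyed by post id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id INTEGER PRIMARY KEY, topic_id INTEGER, username TEXT, "
        "created_at TEXT, body TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
```

A typical run would be `store_posts(sqlite3.connect("scrf.db"), extract_posts(fetch_latest_posts()))`, where `scrf.db` is a made-up file name. With an API key, Discourse expects it in the `Api-Key` and `Api-Username` request headers, which could be added to the `headers` dict.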
Awesome, thanks Rich! Looking forward to digging into this. I’ll post the Python notebook that I used to create the model as a link. Hope it leads to something interesting.
Revising some new proposed tags given feedback.
Central Bank Digital Currency: A CBDC refers to the virtual form of a fiat currency. A CBDC is a programmable electronic record or digital token of a country’s official currency that is issued and regulated by the nation’s monetary authority or central bank. Source
Blockchain Ethics: A subfield of ethics that examines the moral and social impacts of blockchain-based technologies, which include, but are not limited to: cryptocurrency, non-fungible tokens (NFTs), decentralized identity, decentralized autonomous organizations (DAOs), smart contracts and oracles.
Thanks for solidifying the CBDC definition. I now have some confidence that it is in strong enough shape for the glossary. Would you be willing to make the pull request to the Glossary? Unless @Rich objects.
Regarding the ethics definition, I’m not sure that the source clearly defines blockchain ethics as being a distinct thing. I know there has been a lot of discussion about this, but I still feel like I’m reading applied ethics to various realms of blockchain more than anything, unless I am misreading something. I’m going to try to get some more people to weigh in on this also. My primary reservation is that this is so broad that it doesn’t have much tag utility because we could be discussing ethics in each and every summary. Additionally, I’m not sure we have made a case here for Blockchain Ethics being a distinct concept.
I don’t know if this has been discussed already somewhere else, but wouldn’t it be very convenient if there were a way to directly link a common term that researchers add in the background section of summaries to the GitHub glossary? I’ve noticed that people writing summaries often try to write a definition of common terms (e.g. DAO) from scratch, and since agreed-upon definitions of these terms already exist, this might be redundant.
This could make the writing process faster, as well as provide the reader with more content if they wish to dig deeper.
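One way this could work is a small script that links the first mention of each glossary term in a summary. The sketch below is only an illustration: the glossary address, the anchor scheme (lowercased term with hyphens), and the function name are all assumptions, not how the SCRF glossary is actually laid out.

```python
import re

# Hypothetical glossary location; the real glossary URL would go here.
GLOSSARY_URL = "https://example.org/glossary"


def link_glossary_terms(text, terms, base_url=GLOSSARY_URL):
    """Turn the first mention of each glossary term into a Markdown link.

    Matching is case-sensitive and whole-word. The text is scanned once,
    so already-inserted links are never matched again.
    """
    if not terms:
        return text
    # Longest terms first so "smart contract" wins over "contract".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b")
    seen = set()

    def replace(match):
        term = match.group(1)
        if term in seen:
            return term  # later mentions stay as plain text
        seen.add(term)
        anchor = re.sub(r"[^a-z0-9]+", "-", term.lower()).strip("-")
        return f"[{term}]({base_url}#{anchor})"

    return pattern.sub(replace, text)


linked = link_glossary_terms(
    "A DAO votes on proposals. Each DAO relies on a smart contract.",
    ["DAO", "smart contract", "contract"],
)
```

Linking only the first mention keeps summaries readable, and a writer could still add a one-line definition where the glossary entry is not enough.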
This is a great idea! Thank you for suggesting it. Not only does this address the writing efficiencies that you’re talking about, but I think it also amplifies the value of the glossary SCRF is working on producing and updating.
There must be a way to do this easily. I’m going to start doing some looking, but anyone who has more GitHub repo experience might already know the way to do this. I suppose we also could just be linking to the source from the Glossary, but I like your idea better.
This is a succinct and simple navigation guide. It’s absolutely easy to follow and understand with perfect clarity. Kudos to the team.
I read through a few more posts (1, 2, 3, 4, 5) and came up with these:
- user-risks / security / risks
- deanonymization / identity (anonymity currently exists)
- ens (Ethereum Name Service) / domains
- stablecoins / stability / death-spiral
- economics (game-theory exists but doesn’t apply to all topics)
- automation / IIoT / IoE / industry / industry 4.0 (IoT exists)
- standardization / standards (Interoperability currently exists)
- artificial intelligence / AI / Machine Learning / ML / Deep Learning
- open-access / open-source
I cross-checked the terms with the ones we have in the glossary to avoid repetition.
Which ones do you think we should consider adding?
My fundamental concern is that keyword search sucks once the text corpus grows past a threshold volume. I propose using the DLT ontological map below, based on the ACM classification codes. This means a search for wallet can disambiguate between (wallet isa DLTcomponent) and (wallet subclassOf accessControl).
It’s a bit of an upfront PITA, but as the TL;DR summaries grow it will allow more powerful, pinpoint searches.
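As a rough sketch of how such a search could disambiguate, the two wallet senses can be stored as (subject, relation, class) triples and each summary tagged with the exact sense it discusses. Everything beyond the two wallet triples is hypothetical: the extra `oracle` triple, the summary ids, and the function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical triples; a real map would follow the proposed ontology.
TRIPLES = [
    ("wallet", "isa", "DLTcomponent"),
    ("wallet", "subclassOf", "accessControl"),
    ("oracle", "isa", "DLTcomponent"),
]

# Hypothetical summaries, each tagged with the sense it discusses.
SUMMARIES = {
    "summary-1": {("wallet", "isa", "DLTcomponent")},
    "summary-2": {("wallet", "subclassOf", "accessControl")},
    "summary-3": {("oracle", "isa", "DLTcomponent")},
}


def build_index(triples):
    """Map each term to the list of (relation, class) senses it can take."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[subject].append((relation, obj))
    return index


def senses(index, term):
    """Return every sense recorded for a search term (empty if unknown)."""
    return list(index.get(term, []))


def search(summaries, term, relation, obj):
    """Return ids of summaries tagged with the term in one specific sense."""
    wanted = (term, relation, obj)
    return sorted(sid for sid, tags in summaries.items() if wanted in tags)


index = build_index(TRIPLES)
```

A search box could call `senses(index, "wallet")` first, ask the user which sense they mean, and then call `search` with that choice instead of matching the bare keyword.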
This is a good list. We should also be adding definitions to these and sources to those definitions. Do you have any start on those?
@drllau The same question here. For any of these that you think are critical to have, could you share the definition and the source of that definition?
@jasonanastas did a pretty good job of conveying that in this thread, so it might make for a good example.
Yes, I like the format in the example. A modified copy of my earlier post is below.
Since it is a large list, we might later consider having a separate document (similar to the glossary) where we aggregate all previously proposed tags and definitions until approved.