Umbrella IT Talks About Neural Networks and Text Detection Application

Umbrella IT

Sultan, our Head of the Mobile Development Department, told us about the Text Detection application, neural networks, and artificial intelligence.

Interviewer: Hello, Sultan! Please tell us about the Text Detection app.

Sultan: You could say that the Text Detection project is built entirely on neural networks. To be clear, we did no machine learning of our own; we used ready-made, pre-trained models. Working with neural networks is always fascinating, and the approaches can differ.

The Text Detection app is simply an integration of two different tools: Vision on the one hand and Tesseract on the other. Two neural networks. We work with them exclusively as consumers. We did not need to build any deep learning based text detection ourselves: we simply give these neural networks input and take their output. For example, Vision helps us detect characters and their locations, and we then overlay colored rectangles on the camera viewfinder. Blue rectangles are letters, red ones are words.
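The overlay step described above can be sketched in a few lines. This is a minimal Python illustration, not the app's actual code (the app itself is an iOS project). It assumes what Vision documents for its bounding boxes: coordinates are normalized to 0..1 with the origin in the lower-left corner. The names `NormalizedBox`, `to_view_rect` and `overlay_rects` are hypothetical, and the sketch ignores aspect-ratio scaling of the camera preview.

```python
from dataclasses import dataclass


@dataclass
class NormalizedBox:
    """Rectangle in normalized image coordinates (0..1), origin at the lower-left."""
    x: float
    y: float
    width: float
    height: float


def to_view_rect(box, view_width, view_height):
    """Map a normalized box to (left, top, width, height) in view pixels.

    The view is assumed to use a top-left origin, so the vertical axis is flipped.
    """
    left = box.x * view_width
    top = (1.0 - box.y - box.height) * view_height
    return (left, top, box.width * view_width, box.height * view_height)


def overlay_rects(words, view_width, view_height):
    """Build the list of colored rectangles to draw over the viewfinder.

    `words` is a list of (word_box, [character_boxes]) pairs.
    Words are drawn in red, individual letters in blue.
    """
    rects = []
    for word_box, char_boxes in words:
        rects.append(("red", to_view_rect(word_box, view_width, view_height)))
        for char_box in char_boxes:
            rects.append(("blue", to_view_rect(char_box, view_width, view_height)))
    return rects


# Hypothetical detection result: one word made of two letters.
word = NormalizedBox(0.25, 0.5, 0.5, 0.25)
letters = [NormalizedBox(0.25, 0.5, 0.25, 0.25), NormalizedBox(0.5, 0.5, 0.25, 0.25)]
for color, rect in overlay_rects([(word, letters)], 400, 800):
    print(color, rect)
```

The only non-obvious part is the vertical flip: a box that sits near the bottom of the image in normalized coordinates ends up near the bottom of the view only after subtracting its top edge from 1.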

The first screenshot shows the result of Vision. Vision is part of iOS and of its SDK, and comes “out of the box” with iOS 11 and later. Vision is a framework that contains ready-made neural network models. It can do a lot of things, and its list of features includes text detection.

Interesting fact: Vision and Core ML are native components, in other words, there is no need to use a third-party library from some strange people or a paid service. Moreover, such solutions work offline, which widens their range of uses and improves processing speed.

Text Detection also takes a third-party route through the open-source Tesseract. This is a very well-known OCR engine. If you need to convert text in a picture into machine-readable text, then Tesseract is exactly what you need! It has many wrappers and APIs, suitable for both mobile apps and the web.

The second screenshot illustrates the result of Tesseract’s text detection in images. By and large, we can’t influence the outcome: we just pass a frame from the camera buffer to Tesseract, and it either recognizes the text in the image or it doesn’t. It is fair to say that recognition often fails, because the technology is imperfect and the result depends on many factors: the quality of the source image, the lighting, the font and the shooting angle.

The thing is, the Text Detection app is very simple and essentially consists of two functions; in fact, nothing more was ever planned. This is not an end product, it's just a demo whose main purpose is to demonstrate the options for image and text detection and recognition, and to explore what Apple has added to iOS 11 in terms of machine learning and neural networks for text detection.

I: Will the app’s image and text detection functions be improved in any way?

S: No, it won’t. It was a summer evening. One of our project managers said that the customer set the task to get an app that could recognize text and convert it into a readable form.

The PM asked: “Is it possible to implement this?” I replied that yes, it was. These are two different tasks, but if you combine them, you get a solution. And I did it. It literally took me a couple of hours. I built it and showed it. And he said: “Whoa, my goodness! How is it possible to make a text detection app!” Later, we showed it to our solution architect Kate and to the customer. Everybody checked it out and oohed. And, basically, that was that. The text detection could be improved using deep learning, but the app was never meant to be anything global.

When all the talk started about the GitHub publication, I immediately recalled the Text Detection app, because all our other projects are very complex. We always do a lot of projects for our customers, so we have no time to develop interesting things for ourselves.

So, in about 4, I have shared with you all the basics of the Text Detection app.

As for in-house developments of the Mobile Department, I would rather mention such an interesting and refined thing as iOS Clean Architecture with Coordinator pattern. Our developer really did a great job and presented a very interesting implementation of the internal architecture for iOS apps. At the same time, it will not delight any average user. Such things can’t be sold to a customer since there is no nice way to present them. This is one of the paradoxes of the IT industry.

A reader of letters on the screen, created in 2 hours, will seem an amazing product to someone. At the same time, iOS Clean Architecture with Coordinator pattern is indeed an impressive thing for developers, but it's an absolutely unsaleable product. That's the dilemma.

That's why I want to step aside from the conversation about the app (in the end, it's quite simple and consists partly of reused code) and raise the issue of neural networks and machine learning in general. Neural networks are our future and our present. They are already everywhere, though we are not always aware of this. The Text Detection app is just a small demonstration of the way neural networks work, nothing more.

I: Text Detection is like a small drop in the ocean of neural networks.

S: Yes, we have used the new functionality of iOS 11 in it: the ability to work with neural network models at the native level and offline. These calculations are performed by the graphics subsystem through the Metal API. To onlookers, who are increasingly focused on the frameless design of the new iPhone, these may not look like much, but they are important and promising innovations. I don’t know whether it makes any sense to talk about neural networks, neurons, and their training, because this is public information and many people already know it.

I also can’t help but mention that some people consider the neural networks to be the artificial intelligence, or at least, a big step towards the artificial intelligence.

I: Please tell me, what do you think about it?

S: I see fundamental differences between a full AI and a modern neural network. I believe, at present, there is no artificial intelligence. At all. Paradoxical as it may be, the term is used all the time – smartphones are released, a new Huawei is released, which has a shell supposedly fully based on the artificial intelligence and stuff.

The whole point is that the concept of the artificial intelligence is somewhat blurred. Modern sophisticated algorithms, trained through the use case approach, where everything is based on the analysis of huge arrays of empirical data, can very quickly and effectively solve some problems. To beat a man at chess or even at Go is no longer a problem. And these systems are partly AI, however, their problem lies in the narrowness of their specialization.

A fully-fledged artificial intelligence must be able to solve the whole range of problems encountered by the intellect of a living person. However, from the mathematical point of view, the complexity of such an algorithm is incredible, and for now such systems are possible only in theory.

It is even funny to talk about it, but modern neural networks, naturally, have no self-awareness, while self-awareness is a component of the artificial intelligence.

I: Actually, it is something like an attempt to mirror the way a biological process happens, perhaps, the same biochemistry.

S: Yes, but of course, there is no intelligence behind this.

I: I agree it is simply automatic. Much as a person walking down the street does not think about how his movements are happening.

S: Something like that. Many people admire the neural networks. A neural network can, say, find a cat in any photo, but there is no intelligence behind it. All the neural networks are based on machine learning and... these all are just comparisons. Simply put, it happens as follows: 10,000 images of cats are uploaded to the neural network, and now it can find cats, but it can’t find dogs at the same time, and again it can’t be conscious of self.

If the neural network first recognizes cats, then dogs, then elephants, then... this does not make it smarter, in the broad sense of the word. It just acquires more instructions for recognizing some objects. And all the neural networks come to this. And with the current development, I would not say that it gives them intelligence. In my opinion, at this stage, there are no serious preconditions for this.

As I see it, the artificial intelligence, most likely, will not become a product of modern classical neural networks. A fundamentally different approach is required here. At present, the neural networks are just a complex algorithm for accomplishing routine operations.

It should be noted that the development of the neural networks is also a fairly routine operation because the uploading of 10,000 cats is done manually.

We are talking about some kind of automation, we can’t say to the neural network: “Look for cats!” And that’s all. It does not know them, so everything needs to be prescribed manually and many people find it strange – how so? In our technologically advanced age... to upload cats manually... Yes, that’s the way it is.

I: I would say that intelligence is the ability to draw more subtle distinctions. Let's say an apple can be described in terms of its shape or color. For example, to describe the color of an apple, you can list many shades, or you can poetically describe the taste of an apple, etc. Thus, in my view, the manifestation of intellect is not just when a person says that an apple is a meal or a fruit; intellect is manifested when a person can describe an object comprehensively. Am I right?

S: Yes, you are. Usually, the neural networks don’t manipulate any complex sets of properties. If we talk about visual neural networks, such as in Text Detection, the neural networks rely on a certain visual image.

Roughly speaking, some part of the neural network gives a signal with a degree of probability: “Well, probably, this is an apple; with a probability of 40% this looks like an apple.” Another part of the neural network says: “It doesn’t look like an apple.” Eventually, they give a combined result of 73%, which means that the depicted object is an apple. This is how it basically works (although, to improve the accuracy, it’s possible to perform text detection in images using deep learning). I have described the process in a deliberately simplistic way, and in reality everything is somewhat more complicated, but the neural networks actually have no intelligence at all; it's just complicated algorithmization.
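The idea of several partial signals being merged into one probability can be illustrated with a single artificial neuron. This is a deliberately tiny Python sketch, not a description of how Vision or Tesseract work internally: the signals, weights and bias below are made-up numbers, and `apple_probability` is a hypothetical name.

```python
import math


def sigmoid(z):
    """Squash any real number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-z))


def apple_probability(signals, weights, bias):
    """Combine partial signals into one probability.

    Each signal is one "part of the network" voting for (positive)
    or against (negative) the object being an apple. The weights say
    how much each vote counts; the sigmoid turns the weighted sum
    into a value that can be read as a probability.
    """
    z = bias + sum(w * s for w, s in zip(weights, signals))
    return sigmoid(z)


# Hypothetical votes: "roundish shape", "not the right texture", "red color".
signals = [0.4, -0.3, 0.9]
weights = [1.5, 1.0, 1.2]
p_apple = apple_probability(signals, weights, bias=0.0)
print(f"apple: {p_apple:.0%}")
```

With no evidence either way (all signals at zero and no bias) the result is exactly 50%; positive votes push it up, negative votes push it down. A real network stacks many thousands of such units, but each one does nothing more than this arithmetic.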

The miracle of the neural networks is that everything happens very quickly. More classical approaches to building algorithms while trying to maintain the accuracy of calculations are much slower. Now, this happens in a snap and it's an achievement indeed, it is very cool.

As a mathematical model, the neural networks are an outstanding technology that really opens up new possibilities for data analysis; however, there is no intelligence behind all this, in the usual sense.

I believe, any person who closely deals with the neural networks could say: “No, it's not that someday it will become SkyNet, so there is no need to be afraid of it now”.

Our Text Detection app is just an example of the way the neural networks operate.

The essence of the machine learning approach is making a decision with a certain degree of probability, by analyzing the input data and comparing it with what the neural network has learned in order to find some match. Image recognition is one particular example of such an algorithm.

Earlier, we recognized QR codes, and even earlier there were bar codes. Over the years, our algorithms have become very fast, and they now allow us to recognize an object as a whole.

From my philistine point of view, it's not far from the bar code. Therefore, the neural networks will evolve and we will keep using them frequently. There are many options for using the neural networks, and often these are really interesting and exciting challenges.

I: Many people are afraid that machines will enslave people, as in the Terminator movie.

S: Yes, some people think that we are standing on the threshold of this.

I: Elon Musk has mentioned the danger of artificial intelligence enslaving people, emphasizing that he himself works in technology and therefore knows what he is talking about. Could it be just a marketing ploy on his part?

S: Yes, by the way, I'm very interested in this issue, because some people involved in technology, including Zuckerberg, see no potential threat to humankind in modern neural networks. Personally, I think that when you look closely, the two are cosmically far apart: the current technology on one side, and on the other an artificial intelligence capable of awareness and consciousness, and therefore supposedly of some opposition to mankind.

As for Musk’s views – I wonder what information he relies on and why he draws such conclusions. Perhaps, given the pace of modern technology development, he believes that the danger is closer than it seems from the outside. But, in any case, I'm sure it won’t happen in the next 10 years, so everything is ok.

And again, all the subtleties of implementing the neural networks are dry mathematics, some sort of a collective mix, some variation on the subject of human brain, but we can’t create anything like that within machine intelligence, simply because we have not fully explored and got to know ourselves yet.

I: It is rather funny – a man hasn’t got to know his brain, but at the same time he tries to create an analog of the brain, which is very paradoxical. And, theoretically, a man can’t do this, simply because he does not fully know how his brain works. Next, this is projected on machines, but again, it will manifest itself quite differently in the machines.

S: In a way. Another question is that among the people involved in engineering, basically, there are not so many who really strive to create something of the kind.

Maybe from the outside, it looks like this. For example, if I want to create something capable of recognizing a text, then it may be deduced that I will want to go further and come up with something that will write a text, will recognize more texts better, and then, perhaps, will write a story all by itself. However, I don’t want this, neither do those who use Tesseract or Vision. I think, in turn, they don’t aspire to this either.

Engineers treat the neural networks very pragmatically. They have tasks and they solve them. There is no goal to reduce everything to some intellectual growth of the neural networks, the neural networks must simply work.

Currently, the neural networks don’t work very well. With Core ML, I experimented with getting an app I had created to recognize the coffee machine in our office. And, no matter how hard I tried, every time I got the result that it was an ATM with about 70% probability. Though it surely looks similar, it's not an ATM.
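One likely reason for a result like this is that a classifier can only answer with labels it was trained on. The Python sketch below illustrates that with a softmax over a fixed label list. The labels and raw scores are invented for the example and are not taken from any real Core ML model; `classify` is a hypothetical helper.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]


# Hypothetical label set: note that "coffee machine" is not in it.
LABELS = ["ATM", "vending machine", "refrigerator"]

# Made-up raw scores for a photo of an office coffee machine.
label, probability = classify([2.0, 0.8, 0.1], LABELS)
print(label, f"{probability:.0%}")
```

Because the probabilities always sum to 1 across the known labels, the model has no way to say "none of these". It simply picks the closest look-alike, and can do so with fairly high confidence.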

I: Yes, it turns out that some visual feature is triggered, while a man perceives things a little differently.

S: Of course, that's the difference of approaches. A man manipulates a complex set of knowledge about an object, while a neural network analyzes, say, a visual image by a banal comparison of the object with its input database.

I: Quite entertaining. You can put a dog's suit on a cat and the neural network will produce the result of “a dog”.

S: The point is that verdict rendering is quite complex. For example, in the Apple face recognition technology Face-ID, the analysis is also performed by complex neural networks. And there you can experiment with the weight of decisions, covering some part of the face – “Oh, it will recognize it!” covering the other part of the face – “Oh, it won’t recognize it!”

There are certain criteria for a successful recognition, and, therefore, a full visual contact with the user's face is not required. If you wear glasses and a scarf – it will recognize you, but if you cover your mouth and a part of your nose with a scarf – there won’t be enough data for analysis and comparison. However, being based on a neural network, this technology is also constantly self-learning, which makes it so convenient to the end user.

As soon as the neural network starts recognizing everything with a 100% probability, it will be able to find a person by his/her photo, or a place on a map using a photo of some forest or mountain. However, it is important to understand that even after learning to make only the right decisions and bringing the comparison-based analysis to perfection, no neural network will ever become fundamentally smarter, in the broad sense of the word, and it will remain just an applied tool.

In conclusion, I would like to come back to what I have already said – despite all the colossal benefits and most importantly, the prospects, the neural networks and the whole approach to machine learning don’t bring us any closer to creating the very artificial intelligence, so often described by science fiction writers in their books. And who knows, this may be for the better for all of us.

I: Thank you very much for your time! It was very interesting!

S: I'm always ready to share my experience and knowledge, so please feel free to contact me again and I'll be happy to help!